2018 Data Pro Assessment

Correctly answering hundreds of questions would only begin to adequately measure one's true competency as a "data" professional, but it's fun to try with a sample set. Test your knowledge and awareness of popular big data tools, technology and terms in this month's edition of Website Magazine's Quiz Time!

1. Apache Hive is an open-source data warehouse infrastructure that...

a) provides tools for data summarization, query and analysis.
b) enables access to data via SQL.
c) is designed to support the analysis of large datasets.
d) All of the above.

2. Cluster analysis ("clustering") is a statistical classification technique or activity that involves grouping a set of objects or data so that those in the same group (called a cluster) are similar to each other, but different from those in other clusters.

a) True
b) False

3. Data mining is the process of analyzing hidden _________ according to different perspectives for categorization into useful information.
a) anomalies in data
b) patterns in data
c) stolen data
d) None of the above.

4. Hadoop, a distributed data management platform and open-source software framework for storing and processing big data, was designed to scale linearly to large clusters of thousands of commodity computers.

a) True
b) False

5. Metadata is data that...
a) is shared under a Creative Commons license.
b) describes other data.
c) should never be used for images and videos.
d) All of the above.

6. R is a _________ programming language for ________ analysis.
a) non-visual, narrative
b) enterprise-level, human resources
c) open-source, statistical
d) None of the above.

7. One petabyte is...

a) an extremely large unit of data.
b) equal to 1,000 terabytes.
c) both A and B.
d) None of the above.

8. Pattern recognition occurs when an algorithm locates recurrences or regularities within large data sets.
a) True
b) False

9. Natural language processing (NLP) involves tasks such as...
a) identifying sentence structures
b) detecting keywords
c) extracting relationship data
d) All of the above.

10. This process, and the technology that supports it, is used to remove inaccurate data:
a) data cleansing
b) metadata analysis
c) progressive enhancement
d) None of the above.

Answers: 1. D; 2. A; 3. B; 4. A; 5. B; 6. C; 7. C; 8. A; 9. D; 10. A
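
For readers who want to check the arithmetic behind question 7, here is a minimal sketch in Python. It assumes decimal (SI) units, where each step up is a factor of 1,000; the binary pebibyte (PiB), which equals 1,024 tebibytes, is a different unit. The names UNITS, to_bytes and convert are hypothetical helpers made up for this illustration.

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
# Assumption: the quiz uses decimal units, not binary ones (KiB, MiB, ...).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def to_bytes(value, unit):
    """Convert a value in the given decimal unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, src, dst):
    """Convert a value from one decimal unit to another."""
    return to_bytes(value, src) / to_bytes(1, dst)


if __name__ == "__main__":
    print(convert(1, "PB", "TB"))  # terabytes in one petabyte
    print(to_bytes(1, "PB"))       # bytes in one petabyte
```

Run as a script, it prints the number of terabytes and the number of bytes in one petabyte, which should agree with answer 7.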