Final 2016
INF2190: Data Analytics
Final Test, Section 0101
December 6, 2016, 6:30pm-8:30pm

This is a closed-book, closed-notes exam. You have 120 minutes. This booklet contains 10 pages, including the cover page and two pages of scratch paper at the back.

PLEASE WRITE YOUR NAME ON EACH PAGE!

Last name:
First name:
Student Number:

Problem 1 (out of 30)
Problem 2 (out of 20)
TOTAL (out of 50)

Name:

1. PROBLEM 1 (30 points)

For each of the following statements, indicate whether it is true (T) or false (F). Each correct answer is worth 1 point.

1 Divisive hierarchical clustering is more expensive (needs more time to execute) than agglomerative hierarchical clustering.
2 Classification is an unsupervised technique.
3 If we are absolutely certain that an event will occur, then it has maximum information (entropy).
4 A classification tree (decision tree) always has a fixed number of levels, i.e., we always descend the same number of nodes from its root in order to reach a leaf. This number is set by the user.
5 Information Gain is a distance measure between database records.
6 When doing classification, a special attribute called "class label" is needed.
7 Divisive hierarchical clustering does not need the number of clusters, k, to be given as input.
8 The average of a set of numbers cannot be calculated in a map-reduce (parallel) environment.
9 A switch is a device that facilitates communication among rack servers.
10 "map" is a function that is determined by the programmer.
11 Counting the number of words in a text corpus cannot be done in a distributed (parallel) environment.
12 The World Wide Web is a directed graph whose nodes are the web pages and whose edges are the hyperlinks between them.
13 In link analysis, pages with a higher number of outgoing links are more important.
