CPS216 Data-Intensive Computing Systems - Fall 2011
Assignment 5 - Solutions

Problem 1

The MapReduce job:

1. Mapper: if the input file is R, test whether R.B < 20 and, if so, output K2 = A with the composite value V2 = (0, tagR); otherwise (the input file is S) output K2 = A with the composite value V2 = (S.B, tagS). (We are not interested in the R.B values, which is why we output 0 for records from R.)
2. Combiner: looks at the list of values and extracts the tag of each. If there are values with tagR, only one record with K2 = A and V2 = (0, tagR) is sent; for the records with tagS, we output K2 = A and V2 = (max(S.B), tagS).
3. Reducer: for every key A, checks whether there is at least one record with value (0, tagR); if there is, it computes the maximum of the local maximums (max(S.B)).

Problem 2

1. The plans are equivalent.
2. The plans are inequivalent. If S or T has a duplicate value of A that matches a record in R, then the final result will also have duplicate values of A, which never happens in plan (a). If the matching records in S and T have no duplicate values of A, the result is equivalent to plan (a).
3. The plans are inequivalent. Similar to question 2, if the matching records in T have no duplicate values of A, the result is equivalent to plan (a).
4. The plans are equivalent.

Problem 3

1. To do the first TNLJ, the number of getNext() calls on the scans of R and S is 6 + 6 * (2 + 1) + 1 = 25, which generates 4 records as the intermediate result. To do the...
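
As an illustration of the Problem 1 job, the three steps can be simulated in plain Python. This is only a sketch under assumptions that are not in the original solution: records are (A, B) pairs, each input split is tagged with its file name, and the names mapper, combiner, reducer, and run_job are hypothetical, not part of any MapReduce framework. It also assumes that a key with a tagR record but no tagS record produces no output.

```python
from collections import defaultdict

TAG_R, TAG_S = "tagR", "tagS"

def mapper(filename, record):
    """Emit (A, (value, tag)) pairs as described in step 1."""
    a, b = record
    if filename == "R":
        if b < 20:                      # keep only R records with R.B < 20
            yield a, (0, TAG_R)         # R.B itself is not needed downstream
    else:
        yield a, (b, TAG_S)             # S records carry S.B

def combiner(key, values):
    """Step 2: at most one tagR marker and one local max(S.B) per key."""
    values = list(values)
    if any(tag == TAG_R for _, tag in values):
        yield key, (0, TAG_R)
    s_values = [v for v, tag in values if tag == TAG_S]
    if s_values:
        yield key, (max(s_values), TAG_S)

def reducer(key, values):
    """Step 3: output the max of the local maximums if a tagR record exists."""
    values = list(values)
    has_r = any(tag == TAG_R for _, tag in values)
    s_values = [v for v, tag in values if tag == TAG_S]
    if has_r and s_values:              # assumption: no S values means no output
        yield key, max(s_values)

def run_job(splits):
    """Simulate the job; splits is a list of (filename, records) map tasks."""
    shuffled = defaultdict(list)
    for filename, records in splits:
        local = defaultdict(list)
        for record in records:
            for k, v in mapper(filename, record):
                local[k].append(v)
        for k, vs in local.items():     # the combiner runs once per map task
            for k2, v2 in combiner(k, vs):
                shuffled[k2].append(v2)
    result = {}
    for k, vs in shuffled.items():
        for k2, v2 in reducer(k, vs):
            result[k2] = v2
    return result
```

For example, with one split of R and two splits of S, only keys that have a qualifying R record and at least one S record appear in the result:

```python
splits = [
    ("R", [(1, 5), (2, 30), (3, 10)]),
    ("S", [(1, 7), (1, 9), (2, 100)]),
    ("S", [(1, 4), (3, 2), (4, 8)]),
]
print(run_job(splits))
```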