CPS216 Data-Intensive Computing Systems - Fall 2010
Written Assignment 2

Total points = 100.
Due date: Tuesday, Nov. 30, 2010 (5.00 PM).
Submission: In class, or email solutions in pdf or plain text to [email protected]. You can also drop off the solutions at Gang’s office: N303A North Building.

Do not forget to indicate your name on your submission. State all assumptions. For questions where descriptive solutions are required, you will be graded both on the correctness and clarity of your reasoning. Email questions to [email protected] and [email protected]

Question 1 (Points 15)

Pig has an ILLUSTRATE operator/command that Chris Olston covered in his talk. You can find more details on the ILLUSTRATE operator in:
(a) Page 307 of Tom White’s book, and
(b) the paper that describes the algorithm used by the ILLUSTRATE operator, available at: http://research.yahoo.com/pub/2807

Consider the following Pig Latin script taken from: http://developer.yahoo.com/blogs/hadoop/posts/2010/01/comparing pig latin and sql fo/

    Users = load 'users' as (name, age, ipaddr);
    Clicks = load 'clicks' as (user, url, value);
    ValuableClicks = filter Clicks by value > 0;
    UserClicks = join Users by name, ValuableClicks by user;
    Geoinfo = load 'geoinfo' as (ipaddr, dma);
    UserGeo = join UserClicks by ipaddr, Geoinfo by ipaddr;
    ByDMA = group UserGeo by dma;
    ValuableClicksPerDMA = foreach ByDMA generate group, COUNT(UserGeo);
    store ValuableClicksPerDMA into 'ValuableClicksPerDMA';

(a) What are the important challenges faced by the ILLUSTRATE operator for a Pig Latin script like the one above?
(b) Give one complete example of the output that the ILLUSTRATE operator may produce for the
