AS (loc, wban, date:chararray, temp, count);
grunt> data_flt = FILTER data_raw BY date != 'YEARMODA';
grunt> data = FOREACH data_flt GENERATE (long)loc, (int)wban, date, (float)temp, (int)count;
grunt> temps = FOREACH data GENERATE ((temp-32.0)*5.0/9.0);
grunt> temps_group = GROUP temps ALL;
grunt> max_temp = FOREACH temps_group GENERATE MAX(temps);
grunt> DUMP max_temp;
(43.27777862548828)
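The grunt session above can be mirrored in plain Python on a few in-memory rows. This can help when checking the Fahrenheit-to-Celsius arithmetic before running a job on the cluster. It is a local sketch, not Pig: the function name `max_celsius`, the `SAMPLE` rows and their header values are hypothetical, and each comment names the Pig statement it imitates.

```python
from typing import Iterable, List, Tuple

# One raw record as a plain LOAD would see it: (loc, wban, date, temp, count),
# every field still a string. The sample rows, including the header, are hypothetical.
Row = Tuple[str, str, str, str, str]

SAMPLE: List[Row] = [
    ("STN", "WBAN", "YEARMODA", "TEMP", "COUNT"),  # header line to be filtered out
    ("10010", "99999", "20100101", "23.0", "8"),
    ("10010", "99999", "20100102", "50.0", "6"),
    ("10010", "99999", "20100103", "212.0", "3"),
]


def max_celsius(rows: Iterable[Row]) -> float:
    """Mirror FILTER, the casting FOREACH, the F-to-C FOREACH, GROUP ALL and MAX."""
    # FILTER data_raw BY date != 'YEARMODA': drop the header line.
    data_flt = [r for r in rows if r[2] != "YEARMODA"]
    # FOREACH data_flt GENERATE (long)loc, (int)wban, date, (float)temp, (int)count
    data = [(int(loc), int(wban), date, float(temp), int(cnt))
            for loc, wban, date, temp, cnt in data_flt]
    # FOREACH data GENERATE ((temp-32.0)*5.0/9.0)
    temps = [(temp - 32.0) * 5.0 / 9.0 for _, _, _, temp, _ in data]
    # GROUP temps ALL collects every value into a single bag; MAX picks the largest.
    return max(temps)


print(max_celsius(SAMPLE))  # → 100.0
```

Python floats are double precision, so a value that went through Pig's `(float)` cast can differ from this sketch in the last digits.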
Also, the mean daily temperatures were obtained by averaging a variable number of measurements: that number is given in the 5th column, variable count. You might want to filter out all mean values obtained from fewer than, say, 5 measurements. This is left as an exercise to the reader.

5.4 Some extra Pig commands

Some relational operators
FILTER    Use it to work with tuples or rows of data
FOREACH   Use it to work with columns of data
GROUP     Use it to group data in a single relation
ORDER     Sort a relation based on one or more fields
...

Some built-in functions
AVG       Calculate the average of numeric values in a single-column bag
COUNT     Calculate the number of tuples in a bag
MAX/MIN   Calculate the maximum/minimum value in a single-column bag
SUM       Calculate the sum of values in a single-column bag
...
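One possible take on the exercise above, combined with the built-in functions just listed: the Python sketch below keeps only the days whose mean comes from at least 5 measurements, then computes the equivalents of AVG, COUNT, MAX, MIN and SUM over the Celsius values. In Pig itself the filtering step would be a statement such as `data_ok = FILTER data BY count >= 5;` placed before the Celsius conversion. The Python code only mirrors that dataflow in memory; the function name `summarize`, the `min_count` parameter and the sample rows are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

# Typed rows as produced by the casting FOREACH: (loc, wban, date, temp_f, count).
TypedRow = Tuple[int, int, str, float, int]

ROWS: List[TypedRow] = [  # hypothetical sample data
    (10010, 99999, "20100101", 23.0, 8),
    (10010, 99999, "20100102", 50.0, 6),
    (10010, 99999, "20100103", 212.0, 3),  # only 3 measurements: filtered out
]


def summarize(rows: Iterable[TypedRow], min_count: int = 5) -> Dict[str, Optional[float]]:
    """Keep rows with at least min_count measurements, then aggregate in Celsius."""
    # Equivalent of FILTER data BY count >= min_count
    kept = [r for r in rows if r[4] >= min_count]
    # Equivalent of FOREACH ... GENERATE ((temp-32.0)*5.0/9.0)
    temps = [(r[3] - 32.0) * 5.0 / 9.0 for r in kept]
    if not temps:
        # Nothing survived the filter: report COUNT 0 and no other aggregates.
        return {"AVG": None, "COUNT": 0, "MAX": None, "MIN": None, "SUM": None}
    # Equivalent of GROUP ... ALL followed by each built-in on the single bag.
    return {
        "AVG": mean(temps),
        "COUNT": len(temps),
        "MAX": max(temps),
        "MIN": min(temps),
        "SUM": sum(temps),
    }


print(summarize(ROWS))  # → {'AVG': 2.5, 'COUNT': 2, 'MAX': 10.0, 'MIN': -5.0, 'SUM': 5.0}
```

The threshold is a parameter because the text suggests 5 only as an example. When no row passes the filter, the sketch returns a COUNT of 0 and None for the other aggregates instead of raising an error.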
Hands-on block 6
Extras

6.1 Installing your own Hadoop

The Hadoop community has its main online presence in:

Although you can download the latest source code and release tarballs from that location, we strongly suggest using the more production-ready Cloudera distribution:

Cloudera provides ready-to-use Hadoop Linux packages for several distributions, as well as a Hadoop Installer for configuring your own Hadoop cluster, and also a VMware appliance preconfigured with Hadoop, Hue, HBase and more.
Appendix A
Additional Information

Hadoop Homepage
Internet:

Cloudera Hadoop Distribution
Internet:

Documentation
Tutorial: mapred_tutorial.html
Hadoop API:
Pig:

Recommended books
Hadoop: The Definitive Guide, Tom White, O'Reilly Media, 2010 (2nd Ed.)
Hadoop in Action, Chuck Lam, Manning, 2011