Using Python

Data:


A,1170,1,"Microsoft Mail","/mail"
A,1149,1,"Advanced Data Connector","/adc"
A,1029,1,"Clip Gallery Live","/clipgallerylive"
A,1221,1,"Microsoft TV Program Information","/mstv"
A,1039,1,"Internet Service Providers","/isp"
A,1034,1,"Internet Explorer","/ie"
A,1265,1,"Source Safe Support","/ssafesupport"
A,1129,1,"ActiveX Data Objects","/ado"
A,1193,1,"Office Developer Support","/offdevsupport"
A,1219,1,"Corporate Advertising Content","/ads"
A,1030,1,"Windows NT Server","/ntserver"
A,1100,1,"MS in Education","/education"
A,1210,1,"SNA Support","/snasupport"

The first letter of each line specifies what information the line contains. If it is A, the line gives the attributes of the website whose ID is in the 2nd column. If it is C, the 2nd column is the visitor ID. We are interested in the sites visited, so the lines we care about start with 'V'. For every line starting with V, the 2nd column is the site ID and the 3rd is the number of visits (always 1 in this data).
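The record layout described above can be sketched as a small parser. This is only an illustration: the helper name parse_record is made up here, and the 'V' sample line is a hypothetical example built from the description, since the excerpt shows only 'A' lines.

```python
import csv


def parse_record(line):
    """Split one comma-separated record into (record_type, remaining_fields).

    csv.reader is used instead of str.split so that quoted fields such as
    "Microsoft Mail" are handled correctly. Blank lines give (None, []).
    """
    row = next(csv.reader([line]), [])
    if not row:
        return None, []
    return row[0], row[1:]


# An 'A' line taken from the data above.
print(parse_record('A,1170,1,"Microsoft Mail","/mail"'))

# A hypothetical 'V' line: site ID 1000, one visit.
record_type, fields = parse_record("V,1000,1")
if record_type == "V":
    site_id, visits = fields[0], int(fields[1])
    print(site_id, visits)
```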

Question 4: Write an MRJob task that outputs each site ID with its total number of visits. Then use Unix sort to find the 10 most frequently visited sites.
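One possible shape for such a job is sketched below. It is not the posted answer. It assumes the third-party mrjob package is installed, and the names SiteVisitCount and visit_pairs are chosen here for illustration. The fallback assignment in the import guard exists only so the helper function can be tried without mrjob.

```python
import csv

try:
    from mrjob.job import MRJob  # third-party package: pip install mrjob
except ImportError:
    MRJob = object  # assumption: placeholder so visit_pairs works without mrjob


def visit_pairs(line):
    """Yield (site_id, visit_count) for a 'V' line; yield nothing otherwise."""
    row = next(csv.reader([line]), [])
    if len(row) >= 3 and row[0] == "V":
        yield row[1], int(row[2])


class SiteVisitCount(MRJob):
    """Sum the visit counts per site ID."""

    def mapper(self, _, line):
        # Emit (site_id, 1) for visit lines and skip 'A' and 'C' lines.
        yield from visit_pairs(line)

    def combiner(self, site_id, counts):
        # Pre-aggregate on the mapper side to reduce shuffled data.
        yield site_id, sum(counts)

    def reducer(self, site_id, counts):
        # Final total number of visits for this site.
        yield site_id, sum(counts)
```

To run it as a script, the file would end with a guard that calls SiteVisitCount.run() when __name__ == "__main__". Assuming the script is saved as site_visits.py and the data as data.txt (both hypothetical names), the top 10 could then be taken with something like: python site_visits.py data.txt | sort -k2,2nr | head -n 10. This relies on mrjob's default output of one tab-separated key and value per line.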

