20-Assignment-4-RSS - CS107 Spring 2007 Handout 20 April...

CS107 Handout 20
Spring 2007
April 25, 2007

Assignment 4: RSS News Feed Aggregation

Virtually all major newspapers and television news stations have bought into Al Gore’s most famous invention ever: the Internet. What you may not know is that all of these media corporations serve up RSS feeds summarizing the news stories that’ve aired or gone to press in the preceding 24 hours. RSS news feeds are XML documents with information about online news articles. If we can get the feeds, we can get the articles, and if we can get the articles, we can build a database of information similar to that held by news.google.com. That’s precisely what you’ll be doing for Assignment 4.

Due: Thursday, May 3rd at 11:59 p.m.

This week’s assignment has you index a few hundred online news articles. Indexing a news article amounts to little more than breaking the content down into the individual words, and noting how many times each word appears. If a particular word appears a good number of times and it isn’t so common as to appear in virtually every other web page, then said word is probably a good indicator as to what the web page is all about.

Once everything’s been indexed, you can talk to the database and ask for a list of stories about a specific person, place, or thing. If you’re curious what bipartisan issues are surfacing over the war in Iraq, you can just ask your friendly neighborhood database and it’s sure to come back with a lot:

Please enter a single search term [enter to break]: Iraq
We found 165 articles with the word "Iraq". [We'll just list 10, though.]
  1.) "UN criticises Iraq human rights" [search term occurs 20 times]
      "news.bbc.co.uk/2/hi/middle_east/6591151.stm"
  2.) "Iraqi oil wealth 'going untapped'" [search term occurs 19 times]
      "news.bbc.co.uk/2/hi/business/6570623.stm"
  3.) "Cheney, Reid trade accusations" [search term occurs 12 times]
      "www.boston.com/articles/2007/04/25/cheney_reid_trade_accusations"
  4.) "Sunni group says it staged attack" [search term occurs 10 times]
      "www.philly.com/philly/news/Sunni_group.html"
  5.) "Instead of a policy, a wall" [search term occurs 7 times]
      "www.boston.com/ /articles/2007/04/25/instead_of_a_policy"
  6.) "Rebels use "new methods" in Diyala bombings" [search term occurs 7 times]
      "seattletimes.nwsource.com/html/nationworld/2003680063_iraq25.html"
  7.) "The house that Jacques built" [search term occurs 5 times]
      "news.bbc.co.uk/2/hi/europe/6578581.stm"
  8.) "Democrats still silent on gun control" [search term occurs 5 times]

This note was uploaded on 01/14/2010 for the course CS 107 taught by Professor Cain,g during the Spring '08 term at Stanford.
