hw6 - CS 124 / LINGUIST 180 - Winter 2010 Homework 6:...

CS 124 / LINGUIST 180 - Winter 2010
Homework 6: Jeopardy: Question Answering and XML
Due: Friday, Feb 25 9:30am

Can you make a system to compete with IBM's Jeopardy-playing "Watson"? This homework has two parts, both making use of Wikipedia.

The first 100 MB of Wikipedia in XML format is here:

/usr/class/cs124/enwiki-20081008-pages-articles-first-100mb.xml

This file is very large, and you may not want to download it. In particular, do not submit it, or you will very likely run us out of quota. (Opening it in an editor may also cause problems. We suggest viewing it with less instead of emacs or vi.)

Part 1 - Parsing XML.

Part 1 is to take this Wikipedia XML file and produce a summary of the titles, sorted by last-modified timestamp. It will probably be easiest to use xmlstarlet. XMLStarlet is a command-line utility for processing XML files. To use it, edit your ~/.cshrc and add /usr/class/cs124/bin to make it look like:
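Separately from the xmlstarlet setup, the Part 1 task itself (list page titles ordered by last-modified timestamp) can be sketched in Python. This is only an illustration, not the assignment's intended solution. It assumes the dump follows the usual MediaWiki export layout, where each `page` element holds a `title` and a `revision` containing a `timestamp` in ISO 8601 form; the function name `titles_by_timestamp`, the helper `local`, and the in-memory `SAMPLE` data are hypothetical. The namespace URI in the sample is also an assumption, and the code ignores namespaces so that the exact URI does not matter.

```python
import io
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def titles_by_timestamp(source):
    """Return (timestamp, title) pairs sorted by timestamp, oldest first.

    `source` is a filename or a binary file object. The file is streamed
    with iterparse, and each page's subtree is cleared once it has been
    read, to keep memory use low on a large dump.
    """
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, stamp = None, None
        for node in elem.iter():
            name = local(node.tag)
            if name == "title" and title is None:
                title = node.text
            elif name == "timestamp":
                stamp = node.text  # assumed: one revision per page in this dump
        if title is not None and stamp is not None:
            rows.append((stamp, title))
        elem.clear()  # drop the finished page's children
    # ISO 8601 timestamps in the same time zone sort correctly as plain strings.
    rows.sort()
    return rows


# Tiny made-up sample standing in for the real dump (assumed layout).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page><title>B</title><revision><timestamp>2008-10-02T00:00:00Z</timestamp></revision></page>
  <page><title>A</title><revision><timestamp>2008-09-01T12:00:00Z</timestamp></revision></page>
</mediawiki>"""

for stamp, title in titles_by_timestamp(io.BytesIO(SAMPLE)):
    print(stamp, title)
```

To run the same function on the real file, pass its path in place of the `io.BytesIO(SAMPLE)` object. Streaming is used here because the handout warns that the file is too large to open comfortably in an editor.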


This document was uploaded on 06/01/2011.
