hw2 - CS 124 / LINGUIST 180 - Winter 2011 Homework 2:...

CS 124 / LINGUIST 180 - Winter 2011
Homework 2: Language Identification
Due: Thursday Jan 20 9:30am

In order to extract any kind of information from text, the first thing we have to know is what language the text is in. In this assignment you are going to use character N-gram grammars to solve the problem of language identification. Given a document, your goal is to say what language it is written in.

We will give you a set of training documents (one in each of 10 languages) and a set of development test documents. You will be graded on an unseen set of 10 test documents. To make the problem tractable, we guarantee that each test document will be in one of the 10 languages you have seen in the training set.

The data you will use is 10 translations of part of the Universal Declaration of Human Rights, which has been translated into many languages. We've set up the data for you locally, so you don't need to download it yourself; see below.
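To make the idea of character N-gram language identification concrete, here is a minimal sketch in Python. It is not the assignment's reference solution: the choice of trigrams (n = 3), add-one smoothing, the space padding, and all names (`char_ngrams`, `CharNgramModel`, `train_models`, `identify`, `SAMPLE_TRAINING`) are assumptions made for illustration. The two sample training strings are Article 1 of the Universal Declaration of Human Rights in English and Spanish, standing in for the real training files.

```python
import math
from collections import Counter

# Toy stand-ins for the training documents (hypothetical; the real
# assignment supplies one training file per language).
SAMPLE_TRAINING = {
    "english": "All human beings are born free and equal in dignity and "
               "rights. They are endowed with reason and conscience and "
               "should act towards one another in a spirit of brotherhood.",
    "spanish": "Todos los seres humanos nacen libres e iguales en dignidad "
               "y derechos y, dotados como están de razón y conciencia, "
               "deben comportarse fraternalmente los unos con los otros.",
}


def char_ngrams(text, n):
    """Return all character n-grams of the lowercased, space-padded text."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Character n-gram model with add-one smoothing (an assumed choice)."""

    def __init__(self, text, n, vocab_size):
        self.n = n
        self.vocab_size = vocab_size
        self.ngram_counts = Counter(char_ngrams(text, n))
        # Count how often each (n-1)-character context was seen.
        self.context_counts = Counter()
        for gram, count in self.ngram_counts.items():
            self.context_counts[gram[:-1]] += count

    def log_prob(self, text):
        """Sum of log P(char | previous n-1 chars) over the text."""
        total = 0.0
        for gram in char_ngrams(text, self.n):
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def train_models(training_texts, n=3):
    """Build one model per language, sharing a single alphabet size."""
    alphabet = {" "}
    for text in training_texts.values():
        alphabet.update(text.lower())
    vocab_size = len(alphabet) + 1  # one extra slot for unseen characters
    return {lang: CharNgramModel(text, n, vocab_size)
            for lang, text in training_texts.items()}


def identify(text, models):
    """Return the language whose model gives the text the highest score."""
    return max(models, key=lambda lang: models[lang].log_prob(text))
```

With models built by `train_models(SAMPLE_TRAINING)`, calling `identify` on a new string returns the key of the best-scoring language. With training text this small the result is only reliable for input that closely resembles the training sentences; the real training documents are much longer, and other smoothing methods or N-gram orders may work better.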