cs221-pa2

CS221 Problem Set #2 Programming Assignment

CS 221, Autumn 2009
Problem Set #2 Programming Assignment
Due by 9:30am on Tuesday, October 27. Please see the course information page on the class website for late homework submission instructions. SCPD students can also fax their solutions to (650) 725-1449. We will not accept solutions by email or courier.

Programming part (35 points)

1 Overview

For this problem, your programming team (1 to 3 people) will use an ensemble of decision trees to build a handwritten digit recognizer. The set of handwritten characters is supplied for you. They come from the post office in Buffalo, NY, where a classifier like yours would greatly speed up delivery. The goal of the recognizer is to get 100% of the characters correctly classified as the appropriate digit, but the problem with real-world data is that it is noisy. Some of these digits could not even be pre-classified by people, so getting as close as possible to 100% is a more feasible goal.

The overall structure of the problem set is as follows: We consider three types of classifiers for the data. The first is a single decision tree. The second is an ensemble of decision trees constructed using the Bagging algorithm. The last is an ensemble of decision trees constructed using the AdaBoost algorithm. In this problem set, we will explore varying the classifiers along different dimensions: the size of the training set, the depth of the decision tree(s), and the number of elements in the ensemble.

Note that this project involves running computationally intensive experiments, so you might want to start early to give yourself plenty of time to optimize your code and to obtain all the results.

1.1 Data Set

The data set consists of images of centered hand-written digits. Each image is a 14 × 14 matrix where each pixel's intensity is in the range (0, 255), which we have normalized to be in the range (0, 1).
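The normalization step just described can be sketched as follows. This is a minimal illustration, not the assignment's starter code: the array names are invented, and the raw image here is randomly generated rather than read from the course data files.

```python
import numpy as np

# Hypothetical raw image: a 14x14 array of integer pixel intensities
# in the range 0..255 (stands in for one image from the data set).
raw_image = np.random.randint(0, 256, size=(14, 14))

# Normalize each intensity into the range [0, 1] by dividing by 255,
# matching the preprocessing described above.
normalized = raw_image / 255.0

assert normalized.shape == (14, 14)
assert normalized.min() >= 0.0 and normalized.max() <= 1.0
```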
For now, let's just consider the case of distinguishing 0s from 1s; we'll label 1 the positive class.

Note that we are also providing you with your own independent test set, which you will use to evaluate your algorithm. You must not use your test data to train or tune the parameters of your model. The test data is there to gauge how well your classifier performs on unseen data. If you use it to train, your graphs will not show the desired behaviors and you won't get full credit.

1.2 Decision Trees

Since our data involves continuous variables, we need to extend our notion of splits in a decision tree. We use the following approach to build a decision tree on this input: Each node in the tree
chooses a single pixel and a threshold value. If the value of that pixel in an image is greater than the threshold, it sends the digit down the left branch. Otherwise, it sends it down the right. We use decision trees in two ways. The decision tree learning algorithm, as described in class,
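The split rule above can be sketched as a small class. This is an illustrative sketch only; the class and attribute names are invented here and are not taken from the assignment's starter code.

```python
import numpy as np

class SplitNode:
    """One internal decision-tree node: a single pixel plus a threshold.

    Per the rule in the text: if the chosen pixel's value is greater
    than the threshold, the example goes down the left branch;
    otherwise it goes down the right.
    """

    def __init__(self, row, col, threshold):
        self.row = row          # pixel row in the 14x14 image
        self.col = col          # pixel column in the 14x14 image
        self.threshold = threshold

    def route(self, image):
        """Return 'left' if the pixel exceeds the threshold, else 'right'."""
        if image[self.row, self.col] > self.threshold:
            return "left"
        return "right"

# Example: route two synthetic normalized images through one node.
node = SplitNode(row=7, col=7, threshold=0.5)
bright = np.full((14, 14), 0.9)   # every pixel above the threshold
dark = np.full((14, 14), 0.1)     # every pixel below the threshold
print(node.route(bright))  # -> left
print(node.route(dark))    # -> right
```

A full tree would be built by recursively choosing, at each node, the (pixel, threshold) pair that best separates the training examples, down to some maximum depth.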

This note was uploaded on 12/15/2009 for the course CS 221 taught by Professors Koller and Ng during the Fall '09 term at Stanford.

