
CS 221, Autumn 2009
Problem Set #2 Programming Assignment

Due by 9:30am on Tuesday, October 27. Please see the course information page on the class website for late homework submission instructions. SCPD students can also fax their solutions to (650) 725-1449. We will not accept solutions by email or courier.

Programming part (35 points)

1 Overview

For this problem, your programming team (1 to 3 people) will use an ensemble of decision trees to build a handwritten digit recognizer. The set of handwritten characters is supplied for you. They come from the post office in Buffalo, NY, where a classifier like yours would greatly speed up delivery. The goal of the recognizer is to get 100% of the characters correctly classified as the appropriate digit, but the problem with real-world data is that it is noisy. Some of these digits could not even be pre-classified by people, so getting as close as possible to 100% is a more feasible goal.

The overall structure of the problem set is as follows: We consider three types of classifiers for the data. The first is a single decision tree. The second is an ensemble of decision trees constructed using the Bagging algorithm. The last is an ensemble of decision trees constructed using the AdaBoost algorithm. In this problem set, we will explore varying the classifiers along different dimensions: the size of the training set, the depth of the decision tree(s), and the number of elements in the ensemble.

Note that this project involves running computationally intensive experiments, so you might want to start early to give yourself plenty of time to optimize your code and to obtain all the results.

1.1 Data Set

The data set consists of images of centered handwritten digits. Each image is a 14 × 14 matrix where each pixel’s intensity is in the range (0, 255), which we have normalized to be in the range (0, 1).
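To make the normalization concrete, here is a minimal Python sketch. The supplied data is already normalized, so this is illustration only. The function name `normalize_image`, the division by 255, and the row-major flattening into 196 features are assumptions made for this example, not part of the assignment's starter code.

```python
from typing import List

IMAGE_SIDE = 14          # images are 14 x 14, per the data set description
MAX_INTENSITY = 255.0    # assumed maximum raw pixel intensity


def normalize_image(raw: List[List[int]]) -> List[float]:
    """Flatten a 14 x 14 grid of raw intensities into 196 values scaled to 0..1.

    Hypothetical helper: the row-major ordering is an assumption.
    """
    if len(raw) != IMAGE_SIDE or any(len(row) != IMAGE_SIDE for row in raw):
        raise ValueError("expected a 14 x 14 image")
    # Divide every pixel by the maximum intensity, reading rows top to bottom.
    return [pixel / MAX_INTENSITY for row in raw for pixel in row]
```

With this layout, the pixel at row r and column c ends up at index 14 * r + c of the feature vector.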
For now, let’s just consider the case of distinguishing 0s from 1s; we’ll label 1 the positive class. Note that we are also providing you with your own independent test set, which you will use to evaluate your algorithm. You must not use your test data to train or tune the parameters of your model. The test data is there to gauge how well your classifier performs on unseen data. If you use it to train, your graphs will not show the desired behaviors and you won’t get full credit.

1.2 Decision Trees

Since our data involves continuous variables, we need to extend our notion of splits in a decision tree. We use the following approach to build a decision tree on this input: Each node in the tree chooses a single pixel and a threshold value. If the value of that pixel in an image is greater than the threshold, it sends the digit down the left branch. Otherwise, it sends it down the right.
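The routing rule at a single node can be sketched in Python as follows. The names `SplitNode`, `goes_left`, and `partition` are hypothetical, and images are assumed to be flat sequences of normalized pixel values. The sketch only shows how a given pixel and threshold route an image; it does not show how the pixel and threshold are chosen.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SplitNode:
    """One internal node: a pixel index and a threshold (hypothetical class)."""
    pixel: int
    threshold: float

    def goes_left(self, image: Sequence[float]) -> bool:
        # Strictly greater than the threshold goes left; everything else goes right.
        return image[self.pixel] > self.threshold


Example = Tuple[Sequence[float], int]


def partition(node: SplitNode,
              images: Sequence[Sequence[float]],
              labels: Sequence[int]) -> Tuple[List[Example], List[Example]]:
    """Split labeled images into (left, right) groups according to the node's rule."""
    left: List[Example] = []
    right: List[Example] = []
    for image, label in zip(images, labels):
        (left if node.goes_left(image) else right).append((image, label))
    return left, right
```

Note that an image whose pixel value equals the threshold is sent right, since the rule in the text requires the value to be greater than the threshold to go left.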