Statistics 133 - Homework 3
Due 11pm Thursday, Oct. 22 on bSpace

Text Mining the State of the Union Address

The goal of this homework is for you to analyze the “State of the Union” U.S. presidential addresses from 1776 to the present. To do this you will need to prepare the data in a form that is suitable for statistical analysis. In particular, you will be studying how various summary statistics (number of sentences, average words per sentence) change across time, and how the distribution of individual words varies among the presidents.

You’ll need the R package Rstem to complete this assignment. This package is an interface to C code which “stems” words, so that, for example, “run” and “running” are counted as the same word. This package is already available on the lab machines, and if you are using one you may load the package in R with library(Rstem). If you have your own UNIX or Mac machine, you can install it using the instructions below.[1] If you are using Windows, you may find it a bit tricky to install, since it’s not available on CRAN. You can complete most of the assignment, but I’d suggest switching over to one of the lab machines to finish.

Also make sure you have all the files for this assignment in your working directory. You will need the code in SOU.R, HW3.R, and SJD.R. You may examine or change the working directory from within R, using the functions getwd and setwd.

Step 1: Text Processing

• For background, read the research notes on the State of the Union by Gerhard Peters on the UCSB American Presidency Project website: http://www.presidency.ucsb.edu/sou.php ...
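The setup and summary statistics described above can be sketched in a few lines of R. This is only an illustrative sketch, not the code in SOU.R, HW3.R, or SJD.R: the toy `speech` string, the variable names, and the sentence-splitting rule (break after `.`, `!`, or `?`) are all assumptions made for the example. The stemming step uses `wordStem` from Rstem and is skipped when the package is not installed.

```r
# Files the assignment says must be in the working directory.
needed <- c("SOU.R", "HW3.R", "SJD.R")

# Report the current working directory and any files that are absent.
cat("Working directory:", getwd(), "\n")
absent <- needed[!file.exists(needed)]
if (length(absent) > 0) {
  cat("Missing files:", paste(absent, collapse = ", "), "\n")
  # setwd("path/to/hw3")  # hypothetical path; point this at your own folder
}

# Load Rstem only if it is installed, so the sketch still runs without it.
has_rstem <- requireNamespace("Rstem", quietly = TRUE)

# Toy text standing in for a speech (assumption: not real data).
speech <- "The state of the union is strong. We are running ahead. We run together!"

# Assumed rule: a sentence ends at ., ! or ? followed by whitespace.
sentences <- unlist(strsplit(speech, "(?<=[.!?])\\s+", perl = TRUE))

# Lower-case each sentence and split it into words on non-letters.
words <- lapply(sentences, function(s) {
  w <- unlist(strsplit(tolower(s), "[^a-z]+"))
  w[w != ""]
})

# The two summary statistics named in the assignment.
n_sentences <- length(sentences)
avg_words <- mean(lengths(words))
cat("Sentences:", n_sentences, " Average words per sentence:", avg_words, "\n")

# Stem the words so that related forms are counted together.
if (has_rstem) {
  stems <- Rstem::wordStem(unlist(words), language = "english")
  print(table(stems))
}
```

Applying the same steps to each address in turn would give one value of each statistic per speech, which can then be plotted against the year of the address.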
- Fall '08