Writing your own
So far we have relied on the built-in functionality of R to
carry out our analyses. We will cover:
How to write your own functions
How to use flow control mechanisms like if and for
Debugging your code when something goes
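As a minimal sketch of the first two items above, the function below combines a user-defined function with if and for. The name count_above and its cutoff argument are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: count how many values in x exceed a cutoff.
count_above <- function(x, cutoff = 0) {
  # Fail early with a clear message if the input is the wrong type
  if (!is.numeric(x)) {
    stop("x must be numeric")
  }
  n <- 0
  for (value in x) {
    # Skip missing values, count the rest that exceed the cutoff
    if (!is.na(value) && value > cutoff) {
      n <- n + 1
    }
  }
  n
}

result <- count_above(c(3, -1, 7, NA, 0), cutoff = 0)
print(result)
```

When a call like this fails, the base R tools traceback() and browser() are common starting points for finding out where it went wrong.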
Classification & Decision Trees
Department of Statistics
We are given a data matrix X with either continuous or
discrete variables such that each row Xi
Reading and Writing Data Files
Unstructured vs Structured
State of the Union Speeches
State of the Union Address
George Washington
December 8, 1790
Fellow-Citizens of the Senate and House of Representatives
Odds and Ends
Web Caching, Simulation,
When you use a search engine to look for a Web
page, the search engine looks through its cache.
The cache is created by regularl
eXtensible Markup Language
XML package in R
Handy functions for parsing XML
readHTMLTable: reads an HTML table into R
xmlParse: read an XML file into R
xmlValue: retrieve text content of a node
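A small sketch of xmlParse and xmlValue working together, assuming the XML package is installed. The XML snippet is made up for illustration, and xpathSApply (also from the XML package, not listed above) is used here to apply xmlValue to every matching node.

```r
library(XML)  # assumes the XML package is installed

# Made-up XML content, used only for illustration
xml_text <- "<catalog>
  <item><name>apple</name><price>1.5</price></item>
  <item><name>pear</name><price>2</price></item>
</catalog>"

# asText = TRUE tells xmlParse the input is a string, not a file name
doc <- xmlParse(xml_text, asText = TRUE)

# Apply xmlValue to each node matched by the XPath expression
item_names <- xpathSApply(doc, "//name", xmlValue)
prices <- as.numeric(xpathSApply(doc, "//price", xmlValue))

print(item_names)
print(prices)
```

xmlValue always returns text, so numeric content has to be converted explicitly, as with as.numeric above.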
we have seen so far
R uses control flow to describe a
shell commands: command line interface to
the operating system
regular expressions: describe a pattern but
not how to find
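To illustrate the point about regular expressions, the sketch below only states what a date line looks like; base R's grepl and sub do the searching. The sample strings and the pattern itself are assumptions made for this example.

```r
# Sample strings, chosen only for illustration
lines <- c("December 8, 1790", "George Washington", "January 8, 1790")

# Pattern: capitalized month name, day, comma, four-digit year
date_pattern <- "^[A-Z][a-z]+ [0-9]{1,2}, [0-9]{4}$"

# grepl reports which elements match the pattern
has_date <- grepl(date_pattern, lines)

# sub with a capture group pulls the year out of the matching lines
years <- sub(".*, ([0-9]{4})$", "\\1", lines[has_date])

print(has_date)
print(years)
```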
Geographic Data: longitude and latitude of
the county center
Population Data from the census for each
Election results from 2008 for each county
(scraped from a Website)
Want to mat
Probability allows us to quantify statements about
the chance of an event taking place.
For example -
Flip a fair coin
1. What's the chance it lands heads?
2. Flip it 4 times, what propor
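The second question is cut off in the source, so the sketch below rests on an assumption about what is being asked: it looks at the number and proportion of heads in 4 flips of a fair coin, once exactly with dbinom and once by simulation with sample. The seed and the number of repetitions are arbitrary choices.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

p_heads <- 0.5  # a fair coin lands heads with probability 1/2

# Exact probabilities of 0, 1, 2, 3, 4 heads in 4 flips
exact <- dbinom(0:4, size = 4, prob = p_heads)

# Simulate 4 flips many times and record the proportion of heads each time
prop_heads <- replicate(10000, {
  flips <- sample(c("H", "T"), size = 4, replace = TRUE)
  mean(flips == "H")
})

print(exact)
print(mean(prop_heads))
```

The simulated average proportion of heads should sit close to 0.5, and the exact probabilities sum to 1.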
Data frames, Lists, Matrices
AND the Apply Family of Functions
2012 Summer Olympics
2012 Olympic Athletes
Lab made this
data explorer for
the Guardian. It
includes data on
Vectors, and Subsetting
Think in terms of variables: an ordered
collection of measurements on a group of
Care about the kind of measurement values: it
informs the type of
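A short sketch of these ideas in R: each variable is stored as a vector, the kind of measurement determines the vector's class, and subsetting picks out elements by position, by exclusion, or by a logical condition. All values and variable names below are made up for illustration.

```r
# Three variables measured on the same four (made-up) individuals
height_cm <- c(170.2, 165.0, 181.5, 158.3)          # quantitative measurement
sport     <- c("rowing", "judo", "rowing", "diving") # qualitative measurement
medal     <- c(TRUE, FALSE, FALSE, TRUE)             # yes/no measurement

# The kind of measurement shows up as the class of the vector
classes <- c(class(height_cm), class(sport), class(medal))

# Subsetting by position, by exclusion, and by logical condition
second     <- height_cm[2]
not_first  <- height_cm[-1]
medalists  <- height_cm[medal]
rowers     <- height_cm[sport == "rowing"]

print(classes)
print(rowers)
```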
Why is graphics in this course?
Good graphics today requires the computer
Visualization enters every step of the data
Data cleaning: are there anomalies?