Assignment on evolution
The driving model in evolution is the r/K model. Basically, the story is
that populations grow exponentially (at rate r) until they reach
their carrying capacity (K). So the population growth curve looks like a
logistic curve.
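A minimal sketch of that logistic curve, assuming the standard logistic equation dN/dt = rN(1 - N/K) with starting population N0. The helper name `logistic_growth` and the parameter values are hypothetical, chosen only for illustration.

```r
# Closed-form solution of the logistic equation dN/dt = r * N * (1 - N / K)
# (hypothetical helper; the parameter values below are illustrative only)
logistic_growth <- function(t, N0, r, K) {
  K / (1 + ((K - N0) / N0) * exp(-r * t))
}

t <- seq(0, 20, by = 0.5)
N <- logistic_growth(t, N0 = 10, r = 0.5, K = 1000)

# Early on, growth is close to exponential (N0 * exp(r * t)); later it levels off at K
plot(t, N, type = "l", xlab = "time", ylab = "population size")
abline(h = 1000, lty = 2)   # dashed line at the carrying capacity K
```

At t = 2 the curve is still close to the pure exponential 10 * exp(1), while by t = 20 it sits just below K.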
First practice in R
February 7, 2012
Let's look at a simple data set from your first regression class here at Penn.
First grab the file:
http://www-stat.wharton.upenn.edu/~waterman/fsw/datasets/txt/Cleaning.txt
Once you have this file, you can analyse it either u
Obama vs the SP500
First the data itself:
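> obama <- read.csv("obama.csv")
> sp500 <- read.csv("sp500.csv")
> obama$rowIndex <- seq_len(nrow(obama))       # remember obama's original row order
> both <- merge(sp500, obama, by = "Date")     # join on Date; result comes back sorted by Date
> both <- both[order(both$rowIndex), ]         # restore obama's original row order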
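Why the saved row index matters: `merge()` returns its rows sorted by the key, not in the order of either input. A minimal sketch with two tiny made-up data frames standing in for obama.csv and sp500.csv (the `Close` column and all values are hypothetical):

```r
# Toy stand-ins for obama.csv and sp500.csv (made-up values, not real data)
obama <- data.frame(Date  = c("2012-01-03", "2011-12-30", "2012-01-04"),
                    Obama = c(1, 2, 3), stringsAsFactors = FALSE)
sp500 <- data.frame(Date  = c("2011-12-30", "2012-01-04", "2012-01-03", "2012-01-05"),
                    Close = c(10, 30, 20, 40), stringsAsFactors = FALSE)

obama$rowIndex <- seq_len(nrow(obama))       # remember obama's original order
both <- merge(sp500, obama, by = "Date")     # inner join: 2012-01-05 is dropped
both$Date                                    # "2011-12-30" "2012-01-03" "2012-01-04"
both <- both[order(both$rowIndex), ]         # restore obama's original order
both$Date                                    # "2012-01-03" "2011-12-30" "2012-01-04"
```

`merge()` also accepts `sort = FALSE`, but its documentation says the rows then come back in an unspecified order, so an explicit index column is the safer way to keep the original ordering.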
Basic plots:
plot(both$Obama
Class: Naive Bayes for Linguistics
Simple regression (R-squared of 0.54), then a 5-variable multiple regression
(R-squared of 0.73). Using 80 variables we have a simple regression
(R-squared = 1.0) and a naive Bayes. Last is a naive Bayes with 100s of variables.
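The notes do not say how many observations were used. As a hedged illustration of how adding variables pushes R-squared toward 1.0, the sketch below assumes 81 observations and 80 predictors that are pure noise: R-squared never decreases as columns are added, and with 81 coefficients (intercept plus 80 slopes) for 81 observations the fit is exact.

```r
# Hypothetical setup: 81 observations, 80 predictors of pure noise
set.seed(1)
n <- 81
y <- rnorm(n)
X <- matrix(rnorm(n * 80), nrow = n, ncol = 80)

# R-squared of an OLS fit of y on the first k columns of X
r_squared <- function(k) {
  fit <- lm(y ~ X[, 1:k, drop = FALSE])
  1 - sum(residuals(fit)^2) / sum((y - mean(y))^2)
}

sapply(c(1, 5, 40, 80), r_squared)   # creeps up with k, reaching 1 at k = 80
```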
200th Darwin day: Heterocarpy in daisies
Dean Foster
Avi Shmida
February 13, 2012
1: Review of Evolution
Genes are in it for themselves.
(Read the original Darwin, The Origin of
Species, or a modern version, Dawkins, The
Selfish Gene, or my favorite, The exte
Second R Practice
January 18, 2012
For this assignment we will be using the famous Boston housing data set.
You can download it here:
http://www-stat.wharton.upenn.edu/~magarick/471/boston.dat
Descriptions of the variables are here:
http://www-stat.wharton.u
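A version of the Boston housing data also ships with R's recommended MASS package as `MASS::Boston`. It is an assumption that it matches `boston.dat` column for column, so compare against the variable descriptions before relying on it. A minimal sketch of loading it and running a first regression:

```r
library(MASS)            # recommended package distributed with R; provides Boston
data(Boston)

dim(Boston)              # 506 rows, 14 columns
str(Boston)              # variable names and types

# Example: regress median home value (medv) on percent lower-status population (lstat)
fit <- lm(medv ~ lstat, data = Boston)
summary(fit)
```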
#=
# INDEXING / SUBSETTING VECTORS WITH INTEGERS
#
# In what follows we will learn two of the four ways of solving the
# task of pulling data out of a larger data vector. Subsetting
# datasets is one of the most important tasks in any analysis o
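A minimal sketch of subsetting with integers, assuming the two ways meant here are positive indices (keep these positions) and negative indices (drop these positions). The vector `x` is a made-up example.

```r
x <- c(10, 20, 30, 40, 50)   # made-up example vector

# Positive integers: keep the listed positions, in the order given
x[2]           # 20
x[c(1, 3, 5)]  # 10 30 50
x[c(5, 1)]     # 50 10
x[2:4]         # 20 30 40

# Negative integers: drop the listed positions
x[-1]          # 20 30 40 50
x[-(1:2)]      # 30 40 50
x[c(-1, -5)]   # 20 30 40
```

Positive and negative indices cannot be mixed in one subscript; `x[c(-1, 2)]` is an error.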
#=
# CONCEPT: NUMERIC VECTORS AND FUNCTIONS TO CREATE THEM
# - This is our first 'composite data structure'.
#   (It will be followed by matrices, arrays, lists and data frames.)
#   We have seen simple examples: ladders with spacings of +/-1.
#
# - A numeric
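A short sketch of the usual ways to build numeric vectors in base R. The colon operator gives the "ladders" with spacing +1 or -1; the other values are arbitrary.

```r
# Ladders with spacing +1 and -1 via the colon operator
up   <- 1:5                              # 1 2 3 4 5
down <- 5:1                              # 5 4 3 2 1

# Other common constructors
v  <- c(2.5, -1, 7)                      # combine arbitrary values
s1 <- seq(0, 1, by = 0.25)               # 0.00 0.25 0.50 0.75 1.00
s2 <- seq(10, 100, length.out = 4)       # 10 40 70 100
r1 <- rep(c(1, 2), times = 3)            # 1 2 1 2 1 2
r2 <- rep(c(1, 2), each = 2)             # 1 1 2 2
z  <- numeric(3)                         # 0 0 0 (zero-filled vector of length 3)

length(s1)                               # 5
```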
#=
# CHARACTER DATA / STRINGS / TEXT DATA
#
# - We use synonymously:
#     'text data' = 'string data' = 'character data'
# - Character data has many uses:
#   . It can label groups of data.
#     Examples: gender groups (female, male)
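A minimal sketch of character data used as group labels, plus a few basic string operations. The labels and the `height` measurements are made up for illustration.

```r
# Made-up example: character data used as group labels
gender <- c("female", "male", "female", "female", "male")
height <- c(160, 180, 165, 170, 175)    # hypothetical measurements in cm

table(gender)                 # counts per group: female 3, male 2
g <- factor(gender)           # factor: character labels with a fixed set of levels
levels(g)                     # "female" "male"
tapply(height, g, mean)       # mean height per group: female 165, male 177.5

# A few basic string operations
toupper(gender[1])            # "FEMALE"
nchar("female")               # 6
paste("group", levels(g))     # "group female" "group male"
```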
#=
# - SOME BASICS: HISTORY, NUMBERS, OPERATIONS, FUNCTIONS -#
#
# WHY R?
#   . We need standards; R is one of them.
#   . Huge developer community
#   . New stats algorithms appear first as R packages.
#   . Growing user community, also in industry
#   . Po
#=
# CONCEPT: VARIABLE NAMES AND ASSIGNMENT OF VALUES/DATA
# - As in math, we can use 'variable names' to point
#   to values and data structures.
#
#   Examples:
x <- 1.2     # preferred
x = 1.2      # same, less preferred
1.2 -> x     # ok but rarely used
x            # Printing t
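A small sketch of what "pointing to values" means in practice: from the user's point of view, R assignment behaves like a copy, so re-assigning or modifying one name does not change another name that was assigned from it.

```r
x <- 1.2
y <- x        # y gets the value of x
x <- 5        # re-assigning x ...
y             # ... leaves y unchanged: 1.2

v <- c(1, 2, 3)
w <- v
w[1] <- 100   # modifying w does not modify v
v             # 1 2 3
```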
#=
# CONCEPT: VECTOR COMPUTATIONS
# - The simplest form of data, consisting of a single variable,
#   can be stored in a vector. (Later we will introduce matrices
#   and data frames for data with more variables.)
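A minimal sketch of computing with a single-variable vector: arithmetic applies elementwise, and summary functions reduce the whole vector to one number. The data values are made up.

```r
x <- c(3, 1, 4, 1, 5, 9)     # made-up single-variable data

# Elementwise (vectorized) arithmetic: no explicit loop needed
x * 2                        # 6 2 8 2 10 18
x + 1:6                      # 4 3 7 5 10 15
(x - mean(x)) / sd(x)        # standardized values

# Summaries of the whole vector
sum(x)                       # 23
mean(x)                      # 3.833333
range(x)                     # 1 9
```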
#   Two typical tasks on such simple data are:
#=
# INDEXING / SUBSETTING VECTORS WITH INTEGERS
#
# In what follows we will put all three data types to work for the
# task of pulling data out of a larger data vector. Subsetting
# datasets is one of the most important tasks
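A minimal sketch of subsetting one vector three ways, assuming the "three data types" are numeric (positions), character (element names) and logical (TRUE/FALSE masks). The named vector is made up.

```r
x <- c(a = 1.5, b = -2, c = 3, d = 0)   # made-up named numeric vector

x[c(1, 3)]          # numeric indices: elements a and c
x[c("b", "d")]      # character indices: select by name
x[x > 0]            # logical indices: keep elements where the condition is TRUE
x[c(TRUE, FALSE)]   # a short logical index is recycled: elements a and c
```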
#=
# RANDOM NUMBERS:
# - Random numbers are the food of stochastic simulations.
#   They are essentially 'iid' draws from probability distributions.
#   Even though they 'look' truly random and distributed according
#   to the desired distribution, they are rea
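Random numbers in R come from a deterministic pseudo-random generator, so fixing the seed with `set.seed()` reproduces exactly the same draws. A small sketch; the seed value and sample sizes are arbitrary.

```r
set.seed(42)
a <- rnorm(5)
set.seed(42)
b <- rnorm(5)
identical(a, b)        # TRUE: same seed, same "random" numbers

# Draws from other distributions
u <- runif(10000)                        # uniform on (0, 1)
mean(u)                                  # close to 0.5
k <- rbinom(10, size = 1, prob = 0.3)    # ten Bernoulli(0.3) draws (0 or 1)
RNGkind()[1]                             # generator in use (default "Mersenne-Twister")
```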
RIC: Risk Inflation Criterion
February 29, 2012
Administrivia
Status so far
The model
$$Y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_p x_p + \epsilon$$
where $\epsilon$ is $N(0, \sigma^2)$.
Notation: here the subscripts identify which variable we are talking
about, not which observation we are talking a
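A hedged simulation sketch of this model: column j of `X` plays the role of variable x_j (so, as in the notation, the subscript indexes variables, while rows index observations). All parameter values are hypothetical; fitting with `lm()` recovers estimates of alpha and beta_1, ..., beta_p.

```r
set.seed(1)
n <- 200                          # number of observations (rows)
p <- 3                            # number of variables (columns)
alpha  <- 1                       # hypothetical intercept
beta   <- c(2, -1, 0.5)           # hypothetical slopes beta_1, ..., beta_p
sd_eps <- 1.5                     # hypothetical error SD (sigma)

X <- matrix(rnorm(n * p), nrow = n, ncol = p)          # column j holds variable x_j
Y <- alpha + drop(X %*% beta) + rnorm(n, mean = 0, sd = sd_eps)

fit <- lm(Y ~ X)
coef(fit)      # estimates of alpha and beta_1, ..., beta_p
sigma(fit)     # estimate of sigma
```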
Status so far
The model
$$Y_i = \alpha + \beta x_i + \epsilon_i$$
where the $\epsilon_i$ are iid and $\epsilon_i \sim N(0, \sigma^2)$.
First we discussed fitting $\alpha + \beta x_i$.
Then we discussed the residuals
Now we want to discuss how to estimate the error in
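The notes break off before saying which error is meant. As a hedged sketch, here are two standard quantities computed by hand from the residuals and checked against `summary()`: the estimate s of sigma, which divides by n - 2 because two parameters (alpha and beta) were fitted, and the standard error of the slope, s / sqrt(sum((x_i - mean(x))^2)). The simulated data and parameter values are made up.

```r
set.seed(2)
n <- 50
x <- runif(n, min = 0, max = 10)
y <- 3 + 0.8 * x + rnorm(n, sd = 2)      # hypothetical alpha = 3, beta = 0.8, sigma = 2

fit <- lm(y ~ x)
e   <- residuals(fit)

s       <- sqrt(sum(e^2) / (n - 2))          # estimate of sigma
se_beta <- s / sqrt(sum((x - mean(x))^2))    # standard error of the slope estimate

c(s = s, se_beta = se_beta)
summary(fit)$sigma                           # matches s
coef(summary(fit))["x", "Std. Error"]        # matches se_beta
```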
Why we care
If the normal linear model holds
Administrivia
HW 2 due next Tuesday
I put up notes (in Rnw) which have examples of all the bootstraps
I talked about in class last time.
Nice article by Jim Manzi on experiments.
Science is observation
Piaget did wonders for child psychology from just o
Class: Doglegs / piecewise linear / bent stick
February 7, 2012
Story time: Publishing books
Information wants to be free
I could tell you who said that, but the wiki is down today.
Academics write papers for free.
Most musicians (as in numbe
Class: CCA
April 4, 2012
Administrivia
Last time: Prize for compression
Lit review due today
One shot learning
Descriptors
POS: Noun / verb
Tone: formal / informal
Gender: male / female
Sentiment: better / worse
etc
Fill in the blank
New word come
Administrivia
Homework due today.
Suggested readings
Wiki article.
How to bootstrap in R.
More theoretical description of bootstrap
Efron came up with the idea. See, for example, Efron, Bradley, and
Tibshirani, Robert J. An
Class: bootstrap
Administrivia
HW 2 due next Tuesday
The magic of bootstrap
Suppose you are considering two measures of center:
mean
median
Which is better? Theory is very difficult to do, so use the bootstrap!
Consider a fat-tailed Y:
> Y <- c(rnorm(80, 0, 1), rnorm(20, 0, 10))
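A minimal bootstrap sketch for this comparison: resample Y with replacement many times, recompute each estimator on every resample, and compare the spread of the results. The seed and the number of resamples B are arbitrary choices.

```r
set.seed(3)                                   # arbitrary seed, for reproducibility
Y <- c(rnorm(80, 0, 1), rnorm(20, 0, 10))     # fat-tailed sample, as above
B <- 2000                                     # number of bootstrap resamples

boot_mean   <- replicate(B, mean(sample(Y, replace = TRUE)))
boot_median <- replicate(B, median(sample(Y, replace = TRUE)))

sd(boot_mean)      # bootstrap standard error of the mean
sd(boot_median)    # bootstrap standard error of the median
```

For a fat-tailed sample like this one, the median's bootstrap standard error typically comes out noticeably smaller than the mean's, which is the sense in which it is the better measure of center here.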
CAPM: Berndt CAPM
Dean Foster
February 7, 2012
Administrivia
Start on next homework. It walks you through using all the ideas
we have talked about in class so far.
No write-ups necessary. Just practice R.
Story: Pair programming
The Mythical Man-Month
If
These exercises are meant to show you computing techniques. Use simple prints
to confirm that they worked for you, and keep the write-ups short: no detailed
descriptions, but a sentence here and there is nice. (Include your R script.)
Here are some useful R commands: .R.
Class: Rare counts
April 2, 2012
Administrivia
Lit review due Wednesday.
Lyle Ungar (Computer Science), Dean Foster (Wharton Statistics), and Mark Liberman (Linguistics) are looking for a student
to work this summer on an exploratory resea