Statistics W4240: Data Mining
Columbia University
Spring, 2014
Version: January 19, 2016. The syllabus is subject to change, so look for the version with
the most recent date.
Course Description
Massive data collection and storage capacities have led to n
Publication 560
Cat. No. 46574N
Department of the Treasury
Internal Revenue Service
Retirement Plans for Small Business
(SEP, SIMPLE, and Qualified Plans)
For use in preparing 2014 Returns
Contents
What's New
Reminders
Summary of the Employee Retirement Income Security Act (ERISA)
Patrick Purcell, Specialist in Income Security
Jennifer Staman, Legislative Attorney
May 19, 2009
Congressional Research Service
7-5700
www.crs.gov
RL34443
CRS Report for Congress
Prepared for M
5. Defined Benefit and Defined Contribution Plans: Understanding the Differences
Introduction
Both defined benefit and defined contribution pension plans offer various advantages to
employers and employees. The features of each are generally distinct and
Introduction to Statistical Inference
Statistics W4107 Spring 2016
Section 001: TR 6:10pm–7:25pm; 312 Mathematics
Instructor: Ronald Neath (rcn2112@columbia.edu)
Office hours: Time and location to be announced
Course description: A calculus-based introductio
Introduction to Statistical Inference
Statistics W4107 Spring 2016
Assignment 6
Reading:
By Tuesday, March 29, read Sections 5.1–5.5, 6.1–6.3.1, and Chapters 7–10 of Casella &
Berger; and/or Appendix A and Chapters 1–5 of Abramovich & Ritov.
For Thursday, Mar
Introduction to Statistical Inference
Statistics W4107 Spring 2016
Assignment 3
Reading:
By Tuesday, February 9, read Sections 5.1–5.5 & 6.1–6.3.1 and Chapter 7 of Casella & Berger;
and/or Appendix A, Sections 5.3–5.5, and Chapters 1–2 of Abramovich & Ritov.
Introduction to Statistical Inference
Statistics W4107 Spring 2016
Assignment 5
Reading:
By Thursday, March 10, read Sections 5.1–5.5, 6.1–6.3.1, and Chapters 7–9 of Casella &
Berger; and/or Appendix A, Sections 5.3–5.5, and Chapters 1–4 of Abramovich & Ritov.
Introduction to Statistical Inference
Statistics W4107 Spring 2016
Assignment 4
Reading:
By Thursday, February 25, read Sections 5.1–5.5, 6.1–6.3.1, and Chapters 7–8 of Casella &
Berger; and/or Appendix A, Sections 5.3–5.5, and Chapters 1–2 & 4 of Abramovich & Ritov.
Objective Analysis. Effective Solutions.
Commentary (The Hill), December 18, 2015
How Terrorists Get Here
Flowers and candles are displayed at a makeshift memorial after the December 2, 2015, shooting in San Bernardino, California
Photo by Patrick T. Fall
1. Run this code, and describe the resulting plot.
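/* Bernoulli(p) pmf computed as a difference of cdf values */
data bern_pmf;
  p = 0.5;
  do x = 0 to 1;
    /* For an integer-valued X, P(X = x) = F(x) - F(x - 1) */
    pmf = cdf('BERNOULLI', x, p) - cdf('BERNOULLI', x - 1, p);
    output;
  end;
run;

/* Bar chart of the pmf; DISCRETE gives one bar per value of x instead of binned midpoints */
proc gchart data=bern_pmf;
  vbar x / discrete sumvar=pmf;
run;
quit;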
2. Change the value of p from 0.5 to 0.2.
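For checking the plot: the data step evaluates the Bernoulli pmf as a difference of cdf values, which works because X takes only integer values. Assuming the chart shows one bar per value of x, the bar heights should be:

```latex
\begin{aligned}
P(X = x) &= F(x) - F(x-1), \quad x \in \{0, 1\}, \\
F(-1) &= 0, \quad F(0) = 1 - p, \quad F(1) = 1, \\
P(X = 0) &= (1 - p) - 0 = 1 - p, \qquad P(X = 1) = 1 - (1 - p) = p.
\end{aligned}
```

With p = 0.5 the two bars are equal (0.5 each); with p = 0.2 they are 0.8 at x = 0 and 0.2 at x = 1.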
Quiz Review
A sample of heights, recorded in feet, has a sample variance equal to
1. If the heights were recorded in inches, what then would be
the sample variance?
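A worked version of the unit change, as a sketch: converting feet to inches multiplies every observation, and hence the mean, by 12, so every deviation from the mean is multiplied by 12 and every squared deviation by 144.

```latex
\begin{aligned}
y_i &= 12\,x_i, \qquad \bar{y} = 12\,\bar{x}, \\
s_y^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2
       = \frac{1}{n-1}\sum_{i=1}^{n} 144\,(x_i - \bar{x})^2
       = 144\, s_x^2 = 144 \text{ square inches}.
\end{aligned}
```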
Probability
The first chapter of the text touches on what statistical analysis is and how study
designs may go awry, and it notes that, to do justice to these topics, an understanding
of probability theory is needed.
Probability
[Diagram: Known; Unknown; Data; Model; Method; Analytic Goal; Design; Results and Error; Estimation; Conclusion; Context; Statistical Theory; Gap in Knowledge; Decision Theory]
Getting to the right method
The right model
Recall the identity
$$\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n\bar{x}^2.$$
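Before the geometry, a purely algebraic check (a sketch; it uses only the fact that $\sum_{i=1}^{n} x_i = n\bar{x}$): expand the squared deviations.

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
   = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 .
\end{aligned}
```

Adding $n\bar{x}^2$ to both sides gives the identity.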
There is a very nice geometric intuition behind the identity, and it is the
purpose of this note to try to develop that intuition.
To follow the intuition, one needs a little bit of linear algebra. S
1. A sample of heights, recorded in feet, has a sample variance equal to 1. If the heights were recorded
in inches, what then would be the sample variance?
(a) 12
(b) 144
(c) 1/12
(d) 1/144
2. Which of the following is always true?
(a) P (A B ) = P
This supplementary note treats formulae for expectations and variances
in terms of conditional expectations and variances. They're only useful if
you plan on taking additional theory and methods courses. But if you do
go on, they should prove quite usef
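The two formulas usually meant here are the law of total expectation and the law of total variance. They are stated below in their standard form, assuming the relevant expectations and variances exist; the note's own notation may differ.

```latex
\begin{aligned}
E[X] &= E\big[\,E[X \mid Y]\,\big], \\
\operatorname{Var}(X) &= E\big[\operatorname{Var}(X \mid Y)\big] + \operatorname{Var}\big(E[X \mid Y]\big).
\end{aligned}
```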
STAT 4150
Practice Quiz Solutions
February 19, 2014
1. A fair coin is flipped repeatedly until a head has been seen five times. What is the
probability that the fifth head occurs on the tenth flip?
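One way to set up the count, as a sketch: the fifth head falls on the tenth flip exactly when the first nine flips contain exactly four heads and the tenth flip is a head. Since the flips are independent and the coin is fair:

```latex
P(\text{5th head on flip } 10)
  = \binom{9}{4}\left(\tfrac{1}{2}\right)^{9} \cdot \tfrac{1}{2}
  = \frac{126}{1024} \approx 0.123 .
```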
There
Let $y_i = x_i + a$; then $\bar{y} = \bar{x} + a$.
Derivation.
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{i=1}^{n} (x_i + a)$$
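The remaining step can be completed as follows (a sketch; the only fact used is that the constant $a$ is added $n$ times):

```latex
\begin{aligned}
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} (x_i + a)
         = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{1}{n}\,(n a) \\
        &= \bar{x} + a .
\end{aligned}
```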
All the people
Some wear dark, others light
Accidents happen
Rate of dark/light in accidents
Rates of accidents in groups
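Assuming the slide contrasts the rate of dark clothing among accidents, P(dark | accident), with the accident rate within the dark-clothing group, P(accident | dark), Bayes' rule links the two. The event names are labels introduced here, not the slide's.

```latex
\begin{aligned}
P(\text{accident} \mid \text{dark})
  &= \frac{P(\text{dark} \mid \text{accident})\, P(\text{accident})}{P(\text{dark})}, \\
\frac{P(\text{accident} \mid \text{dark})}{P(\text{accident} \mid \text{light})}
  &= \frac{P(\text{dark} \mid \text{accident}) \,/\, P(\text{light} \mid \text{accident})}
          {P(\text{dark}) \,/\, P(\text{light})}.
\end{aligned}
```

So a large share of dark clothing among accidents indicates higher risk only when it exceeds the share of dark clothing among all people.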
Data often comes in tables
Rows are Records
One per subject, person, observation, item,
exp
Second Textbook Assignment
Solutions
2.5. This one could be a bit tedious too. I used the SAS code bundled with the assignment
for this. (JingJing Zou, our T.A., has code in R, if you are interested.) But the
Problem 5. To find the number of configurations of working/failed for the
four components, one might simply write them all out and count:
WWWW, WWWF, WWFW, WWFF, et cetera.
(Here, the notation WWWF, for example, would mean that the first three
c
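The enumeration can be cross-checked with the multiplication principle: each component has two possible states, W or F, and the states of the four components combine freely, so

```latex
\underbrace{2 \times 2 \times 2 \times 2}_{\text{one factor per component}} = 2^4 = 16 \text{ configurations.}
```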
One way to think about random variables.
Some observation or series of numeric observations.
All possible values: a sample space. A subset of Cartesian
coordinates.
Events in the sample space have probabilities.
The probabilities can be characterized by d
Problem 6.
First, what is ? The trick with this kind of problem is to choose the constant
so that the density integrates to 1. (Why? Because the integral is the
probability of the whole sample space, which, by convention, as embodied in
the axioms of prob
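The density in Problem 6 is not reproduced here, so the sketch below uses hypothetical notation: suppose the density has the form c·g(x) for a known nonnegative function g. Requiring total probability 1 pins down c:

```latex
\int_{-\infty}^{\infty} c\, g(x)\, dx = 1
  \quad\Longrightarrow\quad
  c = \frac{1}{\int_{-\infty}^{\infty} g(x)\, dx}.
```

For instance, with g(x) = x on [0, 1] and zero elsewhere, the integral of g is 1/2, so c = 2.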
The setup for the axioms of probability: the sample space and
a probability function. What are these, mathematically?
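For reference, the standard (Kolmogorov) form of the answer: the sample space S is the set of all possible outcomes, and a probability function P assigns a number to each event A ⊆ S (in general, to each event in a suitable collection of subsets of S) so that

```latex
\begin{aligned}
&\text{(1)}\quad P(A) \ge 0 \ \text{for every event } A, \\
&\text{(2)}\quad P(S) = 1, \\
&\text{(3)}\quad P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)
  \ \text{for pairwise disjoint events } A_1, A_2, \ldots
\end{aligned}
```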
The sample space is
Random Variables and Expectation
Random Variable (one way to think about them).
Numeric (?) outcome of (random!) observation.
The random variable induces a probability setup
The sample space is all possible outcome
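Since the slide pairs random variables with expectation, the standard definitions may help: a sum against the pmf p(x) in the discrete case, an integral against the density f(x) in the continuous case. The Bernoulli case gives a one-line example:

```latex
\begin{aligned}
E[X] &= \sum_{x} x\, p(x), \qquad E[X] = \int_{-\infty}^{\infty} x\, f(x)\, dx, \\
X \sim \text{Bernoulli}(p):\quad E[X] &= 0 \cdot (1 - p) + 1 \cdot p = p.
\end{aligned}
```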
Problem 1.2
One might presume that automobile and telephone owners were a substantially different
population from the population of individuals who voted in the election and that the proportion
preferring Truman amon