Homework 1
Jonathan Auerbach
February 13, 2015
Problem 1. Seven students volunteered for a comparison of study guides for an advanced course in
mathematics. They were randomly assigned, four to study guide A and three to study guide B. All were
instructed
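The random assignment described above (four of seven students to guide A, three to guide B) is the setting for a permutation test. A minimal sketch follows, assuming made-up scores: the values in `guide_a` and `guide_b` are hypothetical and are not the homework data. It enumerates all C(7, 4) = 35 possible assignments and reports a one-sided p-value for the difference in group means.

```python
from itertools import combinations

# Hypothetical exam scores, invented for illustration only (not the homework data).
guide_a = [5, 7, 8, 9]   # four students assigned to study guide A
guide_b = [1, 2, 6]      # three students assigned to study guide B


def mean(values):
    return sum(values) / len(values)


pooled = guide_a + guide_b
observed = mean(guide_a) - mean(guide_b)

# Enumerate every way of choosing which 4 of the 7 students receive guide A.
diffs = []
for idx in combinations(range(len(pooled)), len(guide_a)):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diffs.append(mean(group_a) - mean(group_b))

n_assignments = len(diffs)  # C(7, 4) = 35
# One-sided p-value: share of assignments whose difference is at least the observed one.
# The small tolerance guards against floating-point rounding in the comparison.
p_value = sum(d >= observed - 1e-12 for d in diffs) / n_assignments

print(n_assignments, round(p_value, 4))
```

With only 35 equally likely assignments, the smallest attainable one-sided p-value is 1/35, which is why exact enumeration is feasible here and why very small p-values are impossible in such a small study.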
Homework 2
Question 1.
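(a) If we know $\mu$ to be 0, the unbiased estimator of the variance is
$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2$,
which is
$\frac{\sigma^2}{n} \sum_{i=1}^{n} \left( \frac{X_i}{\sigma} \right)^2 = \frac{\sigma^2}{n} \sum_{i=1}^{n} Z_i^2$.
The variables $Z_i$ are independent standard normal random variables, so $n\hat{\sigma}^2/\sigma^2$ has a chi-square
distribution with n degre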
Review of Basic Statistical
Procedures
Inference:
Estimation
Approaches:
Frequentist
Hypothesis
Testing
Bayesian
Type I error: H0 is rejected when it is true.
Type II error: H0 is not rejected when it is false.
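The two error types can be made concrete with a small calculation. The sketch below assumes a setup that is not in the source: a one-sided z-test of H0: mu = 0 against H1: mu = 0.5, with known sigma = 1, n = 16, and level 0.05. It uses `statistics.NormalDist` from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

# Assumed setup (not from the source): one-sided z-test of H0: mu = 0 against
# H1: mu = 0.5, with known sigma = 1, n = 16 observations, and level alpha = 0.05.
alpha, mu1, sigma, n = 0.05, 0.5, 1.0, 16

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)  # reject H0 when Z > z_crit

# Type I error: probability of rejecting H0 when H0 is true.
type_1 = 1 - std_normal.cdf(z_crit)

# Under H1 the test statistic is N(shift, 1) with shift = mu1 * sqrt(n) / sigma.
shift = mu1 * sqrt(n) / sigma
# Type II error: probability of failing to reject H0 when H1 is true.
type_2 = std_normal.cdf(z_crit - shift)

print(round(type_1, 3), round(type_2, 3))
```

The Type I error rate is fixed by the chosen level, while the Type II error rate depends on the alternative, the sample size, and the noise level; increasing `n` in this sketch shrinks `type_2`.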
Note: $\Lambda \in [0,1]$
Wilks' Theorem: $-2 \ln(\Lambda)$ a
Advanced Data Analysis
Spring 2013
STAT W4201 ADVANCED DATA ANALYSIS
Office Hours: Friday: 5:00 PM - 6 PM,
and by appointment
TA: Chien-Hsu Huang
Prerequisites:
At least two of the following courses are
Jingchen Liu
Inference of multiple linear regression
Diagnostics tools: the four assumptions
Studentized residual, leverage, Cook's distance
Robust variance estimator
Large p small n problem
We cannot afford to put in all the potential explanatory
va
Jingchen Liu
Assumptions of linear model
Linearity
Normality
Equal variance
Independence
Seek alternatives
A natural way to change the mean
Location shift
Natural exponential family
Canonical link function
Logistic function: canonical link
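For a Bernoulli response, the canonical link is the logit (log-odds), and its inverse is the logistic function. The sketch below illustrates that pairing; the function names `logit` and `logistic` are chosen here for illustration and are not from the slides.

```python
from math import exp, log


def logit(p):
    """Canonical link for a Bernoulli mean: the log-odds of p, for 0 < p < 1."""
    return log(p / (1 - p))


def logistic(eta):
    """Inverse link: maps a linear predictor eta to a probability in (0, 1)."""
    return 1 / (1 + exp(-eta))


# A location shift on the log-odds scale always stays inside (0, 1)
# once it is mapped back to the probability scale.
baseline = 0.3
shifted = logistic(logit(baseline) + 1.0)

print(round(logistic(0.0), 3), round(shifted, 3))
```

Shifting the mean on the link scale rather than on the probability scale is what keeps fitted probabilities valid, which is one motivation for replacing the identity link of the linear model.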
Advanced Data Analysis
Fall 2015
IMPORTANT NOTICE!
Due to over-enrollment, class open ONLY
to Stat MA students who will graduate in
Dec 2015. NO EXCEPTION!
Stat MA and MAFN students must be in
their last semester to take the course.
The course will be off
Survival Analysis
Properties of Survival Times
Note: Neither the hazard function nor the cumulative hazard function
is a probability. Both are measures of risk. The greater
the value of the cumulative hazard function, the greater the
risk of fa
Linear Models (cont.)
OLS estimators are BLUE: best in the class of linear
unbiased estimators.
Model Validation
Linearity
Independence
Normality
Homoscedasticity
Influential Points
Robust Statistics
Multicollinearity
Ridge Regression
Principal Component
Analysis of Variance
Examples: Data on blood pressure reductions
for patients receiving 3 different drugs.
Six patients randomized to each drug
Sitting diastolic pressure measured before
randomization (baseline) and after 4 weeks of
Generalized Linear Models II
Suppose there is a function g such that
Logistic Regression
Denote
Residuals
Interpretation of Coefficients
Overdispersion
Overdispersion (cont)
Special Cases
Implementation
R
R, glmnet package fitt:
glm(formula, family=poiss
Stat W 4201, Spring 2015
Assignment #2: Due February 13
1. Suppose that $\{X_1, X_2, \dots, X_n\}$ is a random sample from $N(\mu, \sigma^2)$. Construct a 95% confidence interval
for $\sigma^2$ under the following scenarios:
(a) $\mu$ is known to be 0.
(b) $\mu$ is unknown.
Fix n = 10 and =
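The two scenarios in this problem lead to chi-square intervals for the variance with n and n - 1 degrees of freedom respectively. The sketch below checks their coverage by simulation for n = 10. Two things are assumptions: the true standard deviation is set to 1 because the value in the assignment is cut off, and the chi-square quantiles are hard-coded from standard tables rather than computed.

```python
import random

# Chi-square quantiles copied from standard tables (assumed inputs, not computed here).
CHI2_10 = (3.247, 20.483)  # 2.5% and 97.5% points, 10 degrees of freedom
CHI2_9 = (2.700, 19.023)   # 2.5% and 97.5% points, 9 degrees of freedom


def ci_known_mean(xs, mu=0.0):
    """95% interval for the variance, mean known; valid for n = 10 only."""
    ss = sum((x - mu) ** 2 for x in xs)  # ss / sigma^2 ~ chi-square(10)
    return ss / CHI2_10[1], ss / CHI2_10[0]


def ci_unknown_mean(xs):
    """95% interval for the variance, mean unknown; valid for n = 10 only."""
    xbar = sum(xs) / len(xs)
    ss = sum((x - xbar) ** 2 for x in xs)  # ss / sigma^2 ~ chi-square(9)
    return ss / CHI2_9[1], ss / CHI2_9[0]


def coverage(ci, sigma=1.0, n=10, reps=4000, seed=1):
    """Fraction of simulated samples whose interval contains the true variance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        lo, hi = ci(xs)
        hits += lo <= sigma ** 2 <= hi
    return hits / reps


cov_known = coverage(ci_known_mean)
cov_unknown = coverage(ci_unknown_mean)
print(cov_known, cov_unknown)
```

Both coverage estimates should land near 0.95, up to simulation noise. The known-mean interval is typically a little shorter because it uses one more degree of freedom.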
Homework 4
Question 1.
The null hypothesis is $\mu_1 = \mu_2 = \dots = \mu_{12}$ (reduced model).
The alternative hypothesis is that the means are different for different bones (full model).
We run the ANOVA F-test in R:
b1 = subs
Categorical Data Analysis
Test the relevant hypothesis.
> prop.test(13,100,0.15,alt="l")
1-sample prop test with continuity correction
X-square = 0.1765, df = 1, p-value = 0.3372
95 percent confidence interval:
0.0000000 0.2009056
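The numbers in this output can be reproduced by hand, which clarifies what the continuity correction does. The sketch below is an independent recomputation with the Python standard library, not the source of the output above. It assumes the usual construction: a Yates-corrected score statistic, a lower-tail normal p-value for alt="l", and a one-sided Wilson score upper bound with continuity correction.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0 = 13, 100, 0.15  # values from the prop.test call above
p_hat = x / n
std_normal = NormalDist()

# Yates continuity correction: shrink |x - n*p0| by 0.5 (but never past zero).
yates = min(0.5, abs(x - n * p0))
z = (abs(x - n * p0) - yates) / sqrt(n * p0 * (1 - p0))
x_squared = z ** 2

# alt="l" tests H1: p < p0, so the p-value is the lower tail of the signed z statistic.
signed_z = -z if p_hat < p0 else z
p_value = std_normal.cdf(signed_z)

# One-sided 95% upper bound: Wilson score interval with continuity correction.
zq = std_normal.inv_cdf(0.95)
z22n = zq ** 2 / (2 * n)
p_u = p_hat + yates / n
upper = (p_u + z22n + zq * sqrt(p_u * (1 - p_u) / n + z22n / (2 * n))) / (1 + 2 * z22n)

print(round(x_squared, 4), round(p_value, 4), round(upper, 7))
```

The statistic, p-value, and upper confidence limit agree with the reported 0.1765, 0.3372, and 0.2009056 to the printed precision. Since the p-value is well above 0.05, the data give no evidence that the true proportion is below 0.15.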
Example. Suppose the man
Jingchen Liu
One sample test
Paired t-test, signed-rank test
Two sample test
T-test, Welch test, permutation test, Wilcoxon rank sum
Assumptions
Parametric vs. nonparametric
Robustness vs. efficiency
Resistant
1.
Are there significant differences amo
Jingchen Liu
Linear combinations
Linear contrast
Linear trend
Multiple comparison
Tukey-Kramer Procedure and Studentized range distribution
Scheffé's Procedure
LSD
Bonferroni
Risk of data snooping
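The risk of data snooping and the Bonferroni adjustment listed above can be illustrated with a short calculation. The sketch below assumes m independent tests, which is a simplification (the Bonferroni bound itself does not require independence); the function names are chosen here for illustration.

```python
def familywise_error(alpha, m):
    """P(at least one false rejection) for m independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_level(alpha, m):
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / m


m = 10  # hypothetical number of comparisons
unadjusted = familywise_error(0.05, m)
adjusted = familywise_error(bonferroni_level(0.05, m), m)

print(round(unadjusted, 4), round(adjusted, 4))
```

With ten unadjusted comparisons at level 0.05 the chance of at least one false rejection is about 40%, while testing each at 0.05/10 keeps it just under 5%. The price is reduced power for each individual comparison.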
Linearity
Equal variance
Independence
Normality
Call:
Jingchen Liu
The Gaussian distribution and t-distribution
The Z-statistic and t-statistic
Paired t-test (One sample t-test)
$X_1, \dots, X_n \sim N(0,1)$
two sample t-test
$X_1, \dots, X_n \sim N(0,1)$
$Y_1, \dots, Y_m \sim N(0,1)$
Hypothesis testing for two sample comparison
Permutation test
Two samp
Jingchen Liu
Two-group comparison in presence of multiple
Homework 3
Jonathan Auerbach
March 10, 2015
Problem 1. A sunscreen sunlight protection factor (SPF) of 5 means that a person who can tolerate Y
minutes of sunlight without the sunscreen can tolerate 5Y minutes of sunlight with the sunscreen. The data
(bel
Jingchen Liu
Experimental Study: randomized design
Observational Study: not randomized design
Make inference on a population
Randomly select samples
Make causal inference
Randomized experiment
Experimental study
Observational study
Confounding v
Jingchen Liu
Calibration
Regression effect
The four assumptions:
linearity, equal variance, normality, independence
Graphical diagnostic
Lack-of-fit test: analysis of variance
$R^2$: the fraction of variation that can be explained by the
explanatory variables
Jingchen Liu
Causal inference
Experimental study and observational study
Confounding variables
Sampling from the population
Hypothesis testing
Null hypothesis and alternative hypothesis
Logic: start with the null and try to reach some contradictio