STAT 300
Assignment 1 Solution
Solutions to STAT300 Written Assignment No. 1
Note: Due to minor variations in how the methods are implemented (e.g., different ways
to handle ties, different approxim
Review: Fisher's Exact Test
Contingency tables are used to summarize data from two or more
categorical variables. The simplest contingency table is a 2 × 2
table for two binary variables. The 2 × 2 tables
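As a rough illustration of Fisher's exact test on a 2 × 2 table, the sketch below computes a two-sided p-value from the hypergeometric distribution with all margins held fixed. The function name `fisher_exact_two_sided` and the cell labels a, b, c, d are hypothetical, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention, not necessarily the one used in the course.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum over tables that are no more likely than the observed table
    # (small relative tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))
```

For example, `fisher_exact_two_sided(3, 1, 1, 3)` sums the probabilities of the top-left cell being 0, 1, 3, or 4 out of the five possible tables.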
Statistics: General Principles
Statistics is about learning from data, i.e., getting useful
information from data.
First step
The first important step is to carefully collect the data. For example,
how
Summary: Normal Probability Plots
Many statistical methods assume that data follow normal
distributions. Examples include:
one-sample or two-sample t-tests
ANOVA models
linear regression models
In dat
Goodness-of-Fit Test
In practice, data may or may not follow a statistical model.
A model is an assumption, which may or may not fit the data.
In statistical analysis, when we assume a model, we need to
Summary: Density Curve Fitting
We can use the χ² goodness-of-fit test method for categorical data to
check if continuous data follow a known distribution (or model).
The basic idea is to divide continuou
Summary: Goodness-of-Fit in Contingency Tables
Contingency tables are used to summarize data from two or more
categorical variables.
A common hypothesis to be tested is that the variables are independ
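A minimal sketch of the usual χ² statistic for testing independence in a contingency table follows. Expected counts are taken as (row total × column total) / n. The function name `chi_square_independence` is hypothetical, and the sketch assumes every expected count is positive.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df
```

The statistic would then be compared with a χ² distribution on `df` degrees of freedom.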
Summary: Power of a test
Neyman-Pearson principle of evaluating tests: first control
Type I error (i.e., significance level α), then maximize power.
That is, with the same significance level, the higher the
Summary: Permutation Test
No distributional assumption for the data.
The permutation test is another alternative to the two-sample
t-test. It is also an alternative to the Wilcoxon rank sum test.
It i
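One way to picture a two-sample permutation test is the sketch below. It enumerates every way of splitting the pooled data into two groups of the original sizes and reports the fraction of splits whose absolute difference in means is at least as large as the observed one. The function name `permutation_test` and the choice of test statistic are assumptions for illustration. Full enumeration is only practical for small samples; a random subset of splits is typically used otherwise.

```python
from itertools import combinations


def permutation_test(x, y):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = list(x) + list(y)
    n_x, n_all = len(x), len(pooled)
    total = sum(pooled)
    observed = abs(sum(x) / n_x - sum(y) / len(y))

    n_splits = 0
    n_extreme = 0
    for idx in combinations(range(n_all), n_x):
        s = sum(pooled[i] for i in idx)
        # Difference in means for this relabelling of the pooled data.
        stat = abs(s / n_x - (total - s) / (n_all - n_x))
        n_splits += 1
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            n_extreme += 1
    return n_extreme / n_splits
```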
Summary: Wilcoxon rank sum test
The Wilcoxon rank sum test is a distribution-free
(nonparametric) version of the two-sample t-test.
The test is useful when (i) the data do not follow normal
distributi
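As a sketch of the statistic behind the Wilcoxon rank sum test, the code below ranks the pooled observations (tied values share the average of their ranks) and sums the ranks belonging to the first sample. The helper names `midranks` and `rank_sum` are hypothetical, and midranks are only one of several ways to handle ties.

```python
def midranks(values):
    """Ranks 1..N of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum(x, y):
    """Sum of the pooled-sample ranks that belong to the first sample x."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[:len(x)])
```

Under the null hypothesis the rank sum of a sample of size n1 out of N pooled observations has mean n1(N + 1)/2, so a value far from that suggests a shift between the two groups.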
Summary: Kruskal-Wallis test
The two-sample t-test is used to compare two population means.
The analysis of variance (ANOVA) is used to compare three or
more population means.
The distribution-free (n
Summary: Bootstrap Tests
Bootstrap: distribution-free, works for any samples.
The 95% bootstrap confidence interval for population mean μ (or
median m) is (u0.025, u0.975), where u0.025 and u0.975 are t
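A minimal sketch of a percentile-type bootstrap interval is given below, assuming that u0.025 and u0.975 are taken from the sorted bootstrap replicates of the statistic. The function name `bootstrap_ci`, the default of 2000 resamples, the fixed seed, and the simple index-based percentile rule are all assumptions made for illustration.

```python
import random
from statistics import mean, median


def bootstrap_ci(data, stat=mean, n_boot=2000, seed=0):
    """Approximate 95% percentile bootstrap interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    replicates = sorted(
        stat([rng.choice(data) for _ in range(n)]) for _ in range(n_boot)
    )
    lower = replicates[int(0.025 * n_boot)]
    upper = replicates[min(n_boot - 1, int(0.975 * n_boot))]
    return lower, upper
```

Passing `stat=median` gives the corresponding interval for the median.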
Review: Experimental Design
There are generally two types of studies: observational studies
and designed experiments.
Causation (cause-and-effect) may be obtained in experiments
but not in observation
Linear Model Review
Regression models are among the most useful statistical methods.
The simplest regression models are linear models:
y = β0 + β1 x + e.
In linear regression models, the parameters may be e
Motivation: Bootstrap
In statistical analysis, the mean and the standard deviation
(SD) are the two most important statistics. They measure the
center and the variation of data respectively. They prov
Review for Midterm Exam
Time: 50 minutes, in class.
One formula/cheat sheet (letter-size, two-sided, handwritten
or typed)
A simple calculator, UBC ID
Two versions of exam
Cover materials up to last
Summary: Interaction in Two-factor Designs
In one-way ANOVA, we study the effect of one factor on the
response. The factor has two or more levels. This corresponds to
the simplest experimental design:
Review: ANOVA Part II
When the assumptions for an ANOVA model do not hold, such as
data not normal or variances not equal, we can use the
corresponding nonparametric Kruskal-Wallis test.
An ANOVA mode
Summary: Analysis of Two-way Designs
In the analysis of two-way ANOVA, possible interaction
between the two factors should be taken into account.
In a simple 2 × 2 table, the interaction effect can be e
Summary: Multiple Comparison
ANOVA is used to compare three or more groups. In the
ANOVA setting, we are often also interested in comparing all
pairs of groups (i.e., pairwise comparison), especially if th
Summary: Contrast
ANOVA models are used to compare three or more
populations/groups. Sometimes, we may also be interested in
comparing two groups, or the average effect of two groups
with a third grou
Review: ANOVA
The Analysis of Variance (ANOVA) method is often used to
compare three or more groups (or populations), while t-tests are used
to compare one or two groups (or populations).
Let μ1, μ2,
Summary: The Sign Test
The data are not assumed to follow any parametric distribution
(i.e., the test is distribution free).
The hypotheses are about the population median (or mean).
The test statisti
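The sign test can be sketched as follows: count the observations above the hypothesized median, drop any that equal it, and compare the count with a Binomial(n, 1/2) distribution. The function name `sign_test` and the doubling rule for the two-sided p-value are assumptions for illustration.

```python
from math import comb


def sign_test(data, m0):
    """Two-sided sign test p-value for H0: population median = m0."""
    above = sum(1 for v in data if v > m0)
    below = sum(1 for v in data if v < m0)
    n = above + below  # observations equal to m0 are discarded
    if n == 0:
        return 1.0

    def binom_cdf(k):
        # P(X <= k) for X ~ Binomial(n, 1/2)
        return sum(comb(n, i) for i in range(k + 1)) / 2 ** n

    p_lower = binom_cdf(above)                                     # P(X <= above)
    p_upper = 1.0 - binom_cdf(above - 1) if above > 0 else 1.0     # P(X >= above)
    return min(1.0, 2 * min(p_lower, p_upper))
```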
Activity Solution: Properties of Regression Estimators
We have a bivariate sample (x1, y1), ..., (xn, yn) from two variables X
and Y. The regression line of Y on X can be written
y = b0 + b1 x
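Assuming b0 and b1 are the ordinary least-squares estimates, they can be computed as b1 = Sxy / Sxx and b0 = ȳ − b1 x̄. The sketch below does this directly; the function name `least_squares` is hypothetical, and it assumes the x values are not all equal.

```python
def least_squares(x, y):
    """Least-squares intercept b0 and slope b1 for the line y = b0 + b1 x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1
```

One property that follows directly from the formula for b0 is that the fitted line always passes through the point of sample means (x̄, ȳ).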
Activity Solution: Introducing Logistic Regression
The relationship between correlation and linear regression is very much
akin to the relationship between odds ratio (discussed last time) and logisti
Activity Solution: Interaction in Two-factor Designs
Consider an experiment involving two factors, A and B, each with two
levels + and −, say. These levels could correspond to high and low, for
instance,
Activity Solution: Normal Probability Plots
A study was conducted in Florida between 1968 and 1972 to assess the effectiveness of cloud seeding. On each of 52 days during the study period, it
was det
Activity Solution: ANOVA Review
A company manufactures rubber handles for axes, and is testing four
different machines for the manufacturing of the rubber. The outcome of
interest is the tensile strengt
Activity Solution: Analysing Sums of Squares
Having familiarized ourselves with the basic ANOVA setup, we now learn
some general properties about the sums of squares. First some notation is
Chapter 14: Nonparametric Tests
Introduction
robustness
The most commonly used methods for inference about the means of quantitative response variables assume that the variables in question have nor
Non-Parametric Univariate Tests: Wilcoxon Signed Rank Test
Wilcoxon Signed Rank Test
This is another test that is a non-parametric equivalent of a 1-Sample t-test. The Wilcoxon
Signed Rank procedure