One-Way Analysis of Variance (ANOVA)
One-Way Analysis of Variance (ANOVA) is a method for comparing the means of a
populations. This kind of problem arises in two different settings:
1. When a independent random samples are drawn from a populations.
2. Whe
Confidence Intervals
A confidence interval provides a simple summary of
how precisely a parameter, denoted $\theta$, is estimated.
In many situations, a $(1 - \alpha)100\%$ confidence interval is of the form
$(\hat{\theta} - s \, t_{\alpha/2},\ \hat{\theta} + s \, t_{\alpha/2})$
where
$\hat{\theta}$ is an estimate of $\theta$
$s$ is its standard error
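The interval form above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `t_confidence_interval` and the numbers are hypothetical, and the critical value is passed in by hand (2.262 is the tabulated upper 2.5% point of the t distribution with 9 degrees of freedom).

```python
def t_confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) = estimate -/+ t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


# Hypothetical numbers: estimate 10.0, standard error 0.5, and
# t_crit = 2.262 (t table, upper 2.5% point, 9 degrees of freedom).
lower, upper = t_confidence_interval(10.0, 0.5, 2.262)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

The critical value is a parameter here so that the sketch needs only the standard library; in practice it would be read from a t table or computed by a statistics package.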
1. Are the following expressions defined? If so, state whether the result is a scalar or a vector.
Assume F is a sufficiently differentiable vector field and f is a suffic
1. Question 1
In a study to compare the visibility of paints used on highways, four different types of
paint were tested at three different locations. It is to be expected that paint wear will
differ depending on
Week 7 Advanced SQL JOINS, Subqueries
Learning Objectives
In advanced SQL, you will learn:
How to use the SQL JOIN operator syntax
About the different types of subquer
SQL Functions
Data manipulations might be required to decompose data
elements
E.g. Address contains House # and Street Name
SQL provides special functions to manipulate strings,
numbers and dates
Numeric
String
Date and Time
Conversio
Week 8 Advanced SQL (contd.)
Sample Database
Subqueries
Subqueries can be used in several places in a SELECT
statement
WHERE subqueries
IN subqueries
HAVING subquerie
Two ways to look at Contingency Tables
Species distributions by Sample
Sample distributions by Species
Minitab: Stat -> Tables -> Chi-Square Test (Two-Way Table in Worksheet)
Test Statistic is 4.601, DOF = 6, P-value=
Correlation and Regression
a scatterplot is used to assess the
relationship between two variables
each point shows the values of the two
variables (xi, yi) measured on the same
individual
look for the overall pattern and for
striking deviations from
Hypothesis Testing
basic ingredients of a hypothesis test are
1. the null hypothesis, denoted Ho
2. the alternative hypothesis, denoted Ha
3. the test statistic
4. the data
5. the conclusion
the hypotheses are usually statements about the values of
Wilcoxon Rank-Sum Test, also known as the Mann-Whitney test
Rank the data. That is, replace the data values by their ranks, from
smallest to largest. For example, the pH samples are:
Group 1: 8.53 8.52 8.01 7.99 7.93
Group 2: 7.85 7.73 7.58 7.40 7.35
Comparing two means, paired experiment
many studies are comparative
they compare outcomes from one group with outcomes from another
(e.g. two different medical treatments)
in the matched-pairs design each subject in one group is paired with a similar su
Exercise 1
a) Identify the serious data redundancy problems in the file structure
shown below
b) How could you improve the EMP_NAME and EMP_PHONE
contents shown below?
c) Identify the various data sources in the file shown below. What
new files should you
Stat 2080 Assignment 9 Solutions Winter 2010
1. An exercise physiologist used skinfold measurements to estimate the total body fat, Y,
expressed as a percentage of body weight, X1, for 19 participants in a physical fitness
program. The body fat percentag
1
A climber was interested in the relationship between altitude x (measured in meters) and the boiling
point of water y. She measured the boiling point at several altitudes on a particular mountain,
leading to the following data. (Note one value of y
Consumer Reports produced a Rating for a large number of breakfast cereals. Some of
the variables measured on each cereal were the amount of Fat, Fiber, Sugars (in grams),
and the amount of Vitamins (as a percent of the recommended daily amount). A par
3. A chemical engineer is investigating the effect of process operating temperature (T) on
product yield (Y). The study results in the following data; use your knowledge of least
squares regression to construct a linear model for predicting yield
STAT 2080/MATH 2080/ECON 2280
Sample Final Questions
1. (4 pt) One hundred volunteers who suffer from severe depression are available for a study.
Fifty are selected at random and are given a new drug while the other 50 are given
an existing drug. A psychi
Introduction, 1 sample t-test and t-interval
Central Limit Theorem
Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean $\mu$
and variance $\sigma^2$. Then if $n$ is sufficiently large (rule of thumb: $n > 30$), $\bar{X}$ has
approximately a normal distribution
Permutation Test
for the Two Sample Problem
we wish to compare results for two groups
of experimental units
the first group could be some subjects
who have been given a treatment, whereas
the second group has not
in some cases we are unable to assume
tha
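A two-sample permutation test can be sketched in Python as below. This is a minimal sketch under stated assumptions: the test statistic (absolute difference in group means), the function name `permutation_p_value`, and the data are all hypothetical choices for illustration, not taken from the notes. It enumerates every way of splitting the pooled observations into groups of the original sizes, which is only practical for small samples.

```python
from itertools import combinations


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(treated, control):
    """Exact two-sided permutation p-value for the absolute difference
    in means, enumerating every relabelling of the pooled data."""
    pooled = treated + control
    n_total, n_treated = len(pooled), len(treated)
    observed = abs(mean(treated) - mean(control))
    as_extreme = 0
    total = 0
    for idx in combinations(range(n_total), n_treated):
        chosen = set(idx)
        g1 = [pooled[i] for i in range(n_total) if i in chosen]
        g2 = [pooled[i] for i in range(n_total) if i not in chosen]
        # Small tolerance guards against floating-point round-off.
        if abs(mean(g1) - mean(g2)) >= observed - 1e-12:
            as_extreme += 1
        total += 1
    return as_extreme / total


# Hypothetical measurements for a treated and an untreated group.
treated = [12.1, 11.4, 13.0]
control = [9.8, 10.2, 10.9]
p = permutation_p_value(treated, control)
print(p)
```

With three observations per group there are 20 possible splits; for larger samples one would typically draw random relabellings instead of enumerating them all.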
A nonparametric measure of correlation
we have seen that Pearson's correlation
coefficient
measures only linear association
between variables
can be greatly affected by outlying
values
Spearman's correlation coefficient is
designed to overcome these problems
t
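One standard way to compute Spearman's coefficient is to apply Pearson's formula to the ranks of the data; the sketch below does this in Python. The function names and the data are hypothetical, chosen so that the relationship is perfectly monotone but not linear and includes one outlying value.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks from smallest to largest; ties get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        avg = (i + 1 + j) / 2  # average of 1-based ranks i+1..j
        for k in range(i, j):
            result[order[k]] = avg
        i = j
    return result


def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y increases with x, with one outlying value.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 1000]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(r_pearson, r_spearman)
```

Because y increases whenever x does, the ranks agree exactly and the rank-based coefficient equals 1, while Pearson's coefficient is pulled below 1 by the outlying value.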