Variables and
Their Measurements.
A variable is quantitative if the values it takes are numeric values.
The values of a quantitative variable represent different magnitudes
of that variable.
What are examples of quantitative variables?
Finding Z values
Many inferential methods use z-values corresponding to certain
probabilities from a normal distribution. This entails the reverse use
of the z-table.
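The reverse lookup can also be done in software instead of with a printed table. The following is a minimal sketch, assuming Python's standard-library statistics.NormalDist as a stand-in for the z-table. The helper name z_for_lower_tail and the probabilities 0.975 and 0.05 are illustrative choices, not values from these notes.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def z_for_lower_tail(prob):
    """Return the z-value whose lower-tail probability P(Z < z) equals prob."""
    return standard_normal.inv_cdf(prob)

# Illustrative lower-tail probabilities (assumed, not taken from the notes).
z_975 = z_for_lower_tail(0.975)
z_05 = z_for_lower_tail(0.05)

print(round(z_975, 2), round(z_05, 3))
```

A lower-tail probability above 0.5 gives a positive z-value and one below 0.5 gives a negative z-value, which is a quick sanity check on any table lookup.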
Starting with a lower tail probability (the probability of being less
than), which is li
Analyzing association between
categorical variables
Recall, there is an association between two variables if the
distribution of the response variable changes in some way as the
value of the explanatory variable changes.
We are now going to learn methods
Significance test
for a mean
For quantitative variables, significance tests usually refer to the
population mean μ.
The null hypothesis has the form H0: μ = μ0, where μ0 is a particular
value for the population mean.
The hypothesized value in H0 is a particular
2-way
contingency tables
              Party Identification
Gender      Dem.    Rep.    Total
Females      300     100      400
Males        150      50      200
Total        450     150      600
Some notation:
We will define nij as the number of observations in row i and in
column j. Thus n21 = 150.
We define ni+ as the numb
Why pair?
Example: The subjects in the previous study were asked to
rate their hunger at the end of each two week period.
Find the P-value for a paired sample t
test.
A. P = .013
B. P = .026
C. P = .073
D. P = .42
$$SE_{\bar{d}} = \frac{s_d}{\sqrt{n_d}}, \qquad t_s = \frac{\bar{d} - 0}{SE_{\bar{d}}}$$
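The paired-sample test statistic can be computed step by step from the differences within each pair. The sketch below assumes made-up hunger ratings for five subjects (the study's actual data are not given here), and the function name paired_t_statistic is hypothetical. It stops at the test statistic and degrees of freedom; the P-value would then be read from a t table.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """Return (mean difference, SE of the mean difference, t statistic, df)."""
    differences = [a - b for a, b in zip(first, second)]
    n_d = len(differences)
    d_bar = mean(differences)
    s_d = stdev(differences)          # sample SD of the differences (n - 1 divisor)
    se_d = s_d / math.sqrt(n_d)       # standard error of the mean difference
    t_s = (d_bar - 0) / se_d          # null hypothesis: mean difference is 0
    return d_bar, se_d, t_s, n_d - 1

# Hypothetical ratings for the same five subjects under two conditions.
period_1 = [5, 7, 6, 8, 4]
period_2 = [3, 6, 6, 5, 2]

d_bar, se_d, t_s, df = paired_t_statistic(period_1, period_2)
print(round(d_bar, 2), round(se_d, 3), round(t_s, 3), df)
```

Working with the differences removes subject-to-subject variation, which is the reason for pairing in the first place.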
Pearson chi-square
test of independence
If the null is true, the expected cell counts μij should be close to
the observed cell counts nij.
The larger the errors, the greater the evidence against the
null hypothesis.
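One way to make "the larger the errors" concrete is the standard Pearson statistic, which sums the squared differences between observed and expected counts, each divided by the expected count. The sketch below assumes that standard definition, with expected counts computed as (row total × column total) / n. The function name pearson_chi_square and the counts in the table are made up for illustration.

```python
def pearson_chi_square(observed):
    """Return (expected counts, X^2 statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: (row total * column total) / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum of squared errors, each scaled by its expected count.
    x2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(observed, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, x2, df

# Hypothetical 2x2 counts, made up for illustration.
table = [[30, 20], [10, 40]]
expected, x2, df = pearson_chi_square(table)
print(expected, round(x2, 3), df)
```

When every observed count equals its expected count, the statistic is exactly 0; it grows as the table moves away from independence.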
The Pearson Chi-square test of independe
Comparing
two proportions
As with means, hypothesis tests comparing two proportions generally
test whether the proportions are the same. That is, we generally
have
H0: p1 = p2
This is again equivalent to testing
H0: p1 - p2 = 0
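A common large-sample way to test this hypothesis is a z test that pools the two samples to estimate the shared proportion under the null. The sketch below assumes that pooled approach and a two-sided alternative; the notes may proceed differently. The function name two_proportion_z_test and the counts are hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z statistic, two-sided P-value) for H0: p1 - p2 = 0."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 the proportions are equal, so pool both samples to estimate it.
    pooled = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se0
    # Two-sided P-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 60 of 100 successes in group 1, 45 of 100 in group 2.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(round(z, 3), round(p_value, 4))
```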
To calculate the p-value, we
ANOVA
We discussed before how to compare the means of two groups, μ1
and μ2.
We will now discuss comparing the means of several groups.
The mean of a quantitative response variable is compared among
groups that are categories (or levels) of an explanatory v
Normal
Error Model
No matter what distribution the error terms εi have (and hence the
Yi), the least squares method provides unbiased point estimates of
β0 and β1.
These estimates have minimum variance among all unbiased
linear estimators.
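For simple linear regression, the least squares point estimates have a closed form. The sketch below assumes the usual formulas, slope = Sxy / Sxx and intercept = y(bar) - slope * x(bar). The function name least_squares_line and the data are made up for illustration.

```python
def least_squares_line(x, y):
    """Return (b0, b1), the least squares estimates of the intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, roughly following Y = 2X.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(x, y)
print(round(b0, 3), round(b1, 3))
```

Nothing in this calculation uses a distributional assumption about the errors, which matches the point that the estimates themselves do not depend on normality.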
However, to do con
ANOVA
The F statistic is the ratio of the estimated variance between groups
over the estimated variance within groups.
If the estimated variance between groups is large compared to the
estimated variance within groups, the F statistic will be large.
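The ratio can be computed directly from the group data. The sketch below assumes the usual one-way ANOVA definitions: MS(between) = SS(between) / (number of groups - 1) and MS(within) = SS(within) / (total observations - number of groups). The function name one_way_f and the three groups are made up for illustration.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F statistic, df between, df within) for a one-way layout."""
    all_values = [y for group in groups for y in group]
    grand_mean = mean(all_values)
    # Between-group variation: group means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: observations about their own group mean.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_f(groups)
print(round(f_stat, 3), df_between, df_within)
```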
Thus
Relations between variables
A functional relationship between two variables is expressed by a
mathematical formula.
For an independent variable X and a dependent variable Y, a
functional relationship has the form
Y= f(X)
For example, if we sell widgets fo
ANOVA
One assumption of the ANOVA model is that the standard
deviation is the same for each group. This common standard
deviation is denoted by σ. Thus we assume that
σ1 = σ2 = ... = σI = σ
homoskedasticity: The variances for all the groups are equal
heterosked
ANOVA
Example: Two-week weight gains (lbs) of lambs on three different
diets
Step 1: State H0 and Ha
Step 3: SS(within) = 210
Step 4: df(within) = 9
Step 5: MS(within) = SS(within) / df(within) = 210 / 9 = 23.333
Step 6: Calculate the between sums
SS(bet
Confidence Intervals
for means
Like a confidence interval for a proportion, the confidence interval for a
mean has the form: point estimate ± margin of error.
The margin of error for μ is $z_{1-\alpha/2}\,\sigma_{\bar{y}}$, where
$$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$$
Thus, in the long run
$$\bar{y} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$$
will contain
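The interval "point estimate ± margin of error" can be sketched numerically. The example below assumes the population standard deviation is known, so the normal multiplier applies. The function name z_confidence_interval and the sample values (mean 50, standard deviation 10, n = 100) are made up for illustration.

```python
import math
from statistics import NormalDist

def z_confidence_interval(y_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) for a mean when the population SD sigma is known."""
    alpha = 1 - confidence
    # Multiplier with lower-tail probability 1 - alpha/2 (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return y_bar - margin, y_bar + margin

# Hypothetical sample: mean 50 from n = 100 observations, with sigma = 10.
low, high = z_confidence_interval(50.0, 10.0, 100)
print(round(low, 2), round(high, 2))
```

Raising the confidence level widens the interval, and increasing n narrows it.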
Comparing
two means
To compare two populations, we estimate the difference between their
parameters.
To compare population means μ1 and μ2, we treat μ1 - μ2 as the
parameter.
The point estimate of this parameter is the difference in the sample
means:
y1(bar)
Significance test
for a proportion
When the sample size is large, the sampling distribution of p(hat) is
approximately normal with mean p, the true parameter.
We are interested in testing a null hypothesis of the form H0: p = p0
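Because p(hat) is approximately normal for large samples, the test statistic is a z-score that measures how far p(hat) falls from the hypothesized value. The sketch below assumes the usual standard error under the null, sqrt(p0 * (1 - p0) / n), and reports a two-sided P-value. The function name one_proportion_z_test and the counts are hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Return (p_hat, z statistic, two-sided P-value) for a null value p0."""
    p_hat = successes / n
    # Standard error computed under the null hypothesis.
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value

# Hypothetical sample: 60 successes in 100 trials, null value p0 = 0.5.
p_hat, z, p_value = one_proportion_z_test(60, 100, 0.5)
print(p_hat, round(z, 3), round(p_value, 4))
```

For a one-sided alternative, the P-value would be a single tail probability rather than twice the tail.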
The alternative can be eith
Definitions relating samples to populations
Population
Parameter: A characteristic of a population (e.g. true mean, true SD)
Sample
Statistic: A characteristic of a sample (e.g. sample mean, sample SD)
A statistic is an estimate, based on a sample, of an un
Example
APD did a study of the relationship between the number of
accidents at intersections and panhandlers.
They found that there were a high number of accidents at
intersections that had a high number of panhandlers.
They concluded that there needed to
Good graphics boxplots
Boxplot: visual representation of the five-number
summary (Min, Q1, Median, Q3, Max)
[Figure: boxplot on an axis running from 60 to 80]
This distribution is
skewed to the:
A. Right
B. Left
Where would the mean be?
Probability: With a random sample or a randomized experiment, the
probability an observation has a particular outcome is the proportion of
times that outcome would occur in a very long sequence of observations.
Probability helps us describe the outcome o
Binomial Distribution
For categorical data, the following conditions are often true:
1) Each observation falls into one of two categories (success or
failure, heads or tails)
2) The probabilities for the two categories are the same for each
observation. W
Describing
variability of data
A measure of center alone is not adequate for numerically describing
data for a quantitative variable.
It describes a typical value, but not the spread of the data about that
typical value.
Consider two nations, A and B. Bot
Point Estimation
We want to estimate parameters from our data. A point estimate is
a single number that is used to estimate the parameter.
An interval estimate is an interval of numbers around the point
estimate, within which the parameter value is believ
Descriptive statistics
The purpose of descriptive statistics is to summarize data in order to
make it easier to assimilate the information.
Tables and graphs are used to show the number of times various
outcomes occur.
For categorical data, the categories
Confidence intervals: Locating an invisible man!
The man is within 1 SE about two thirds of the time
95% confidence interval
The man is within 2 SEs of the dog about 95% of the time
We know where the dog is, and we'd like to estimate where the man is.
We know that 95% of the time the man is within 2 S
The Normal Distribution
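If a random variable Y follows a normal distribution with mean μ and standard
deviation σ, then we write:
Y ~ N(μ, σ)
The density curve of the distribution of a normal variable Y is given by:
$$f(y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{y-\mu}{\sigma}\right)^{2}}$$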
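The normal density can be evaluated directly from its formula. The sketch below writes the formula out by hand and compares it with the pdf method of Python's standard-library statistics.NormalDist. The function name normal_density and the values mean 70, standard deviation 5 are made up for illustration.

```python
import math
from statistics import NormalDist

def normal_density(y, mu, sigma):
    """Evaluate the normal density with mean mu and standard deviation sigma at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical distribution: mean 70, standard deviation 5.
mu, sigma = 70.0, 5.0
by_formula = normal_density(72.0, mu, sigma)
by_library = NormalDist(mu, sigma).pdf(72.0)

print(round(by_formula, 5), round(by_library, 5))
```

The density is highest at the mean and symmetric about it, so values equally far above and below the mean give the same height.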
Normal distribution
The empirical rule applies to var
Note that the mean of y(bar) is the same as the mean
of the population. That is, the mean of y(bar)
is the same as the mean of a single observation.
Note that the standard deviation of y(bar) is
smaller than the standard deviation of the
population.
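Both facts about y(bar) can be checked by simulation. The sketch below draws many samples from a normal population and summarizes the sample means. The population values (mean 100, standard deviation 15), the sample size 25, and the number of repetitions are made-up settings, and the comparison value sigma / sqrt(n) is the standard formula for the standard deviation of a sample mean.

```python
import math
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the simulation is repeatable

# Hypothetical population and sampling settings.
mu, sigma, n = 100.0, 15.0, 25
repetitions = 5000

# Draw many samples of size n and record each sample mean.
sample_means = [
    mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(repetitions)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_sd = sigma / math.sqrt(n)

print(round(mean_of_means, 2), round(sd_of_means, 2), theoretical_sd)
```

The average of the sample means should land close to the population mean, while their spread should be close to sigma / sqrt(n), well below the population standard deviation.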
We would not
Significance Tests
In statistics, a hypothesis is a statement about a population. It is
usually a statement that a parameter takes a particular numerical
value or falls in a certain range of values.
A significance test uses data to summarize the evidence
Significance test
for a mean
When the alternative hypothesis is one sided, technically the null
hypothesis is as well.
The null and alternative hypotheses are complements and comprise
all possibilities, such as "it is raining" vs "it is not raining" or the me
SQL
SELECT
DISTINCT
WHERE
AND OR
IN
BETWEEN
LIKE
ORDER BY
COUNT
GROUP BY
HAVING
ALIAS
CONCATENATE
SUBSTRING
TRIM
CREATE TABLE
CONSTRAINT
NOT NULL
UNIQUE
CHECK
CREATE VIEW
CREATE INDEX
ALTER TABLE
DROP TABLE
TRUNCATE TABLE
INSERT INTO
UPDATE
DELETE FROM