Stats Study Guide
Bivariate statistics:
Means
For a real-valued random variable X, the mean is the expectation of X. Note that
not every probability distribution has a defined mean (or variance); the Cauchy
distribution, for example, has neither.
For a data set, the mean is the sum of the observations divided by the number of
observations. The mean is often quoted along with the standard deviation: the mean
describes the central location of the data, and the standard deviation describes the spread.
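As a small illustration of the data-set definition above, the sketch below computes the mean and standard deviation with Python's standard library. The observations are hypothetical values chosen only to make the arithmetic easy to follow.

```python
import statistics

# Hypothetical observations, chosen only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Mean: sum of the observations divided by the number of observations.
mean = sum(data) / len(data)

# Sample standard deviation divides the squared deviations by n - 1;
# the population version (pstdev) divides by n.
sample_sd = statistics.stdev(data)
population_sd = statistics.pstdev(data)

print(mean, sample_sd, population_sd)
```

Which standard deviation to quote depends on whether the data are treated as a sample or as the whole population.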
T-test
A t-test is any statistical hypothesis test in which the test statistic has a Student's
t distribution if the null hypothesis is true. It is applied when sample sizes are small enough
that using an assumption of normality and the associated z-test leads to incorrect
inference. A common use is to test the null hypothesis that the means of two
normally distributed populations are equal. Given two data sets, each characterized by its
mean, standard deviation and number of data points, we can use some kind of t-test to
determine whether the means are distinct, provided that the underlying distributions can be
assumed to be normal.
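One way to sketch the two-sample case is to compute the t statistic directly from each data set's mean, standard deviation and number of data points. The function name `two_sample_t`, its `equal_var` flag and the input numbers are assumptions made for this illustration. The pooled form is the classic Student's test, and the alternative branch is Welch's variant, which drops the equal-variance assumption.

```python
import math

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, equal_var=True):
    """Return (t statistic, degrees of freedom) from summary statistics."""
    v1, v2 = sd1 ** 2, sd2 ** 2
    if equal_var:
        # Student's t-test: pool the two sample variances.
        df = n1 + n2 - 2
        pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
        se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    else:
        # Welch's t-test: no equal-variance assumption,
        # degrees of freedom from the Welch-Satterthwaite formula.
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return (mean1 - mean2) / se, df

# Hypothetical summary statistics for two samples of ten points each.
t, df = two_sample_t(5.0, 2.0, 10, 3.0, 2.0, 10)
t_welch, df_welch = two_sample_t(5.0, 2.0, 10, 3.0, 2.0, 10, equal_var=False)
print(t, df, t_welch, df_welch)
```

The statistic is then compared against a Student's t distribution with `df` degrees of freedom, for instance using a printed table of critical values, to decide whether to reject the null hypothesis of equal means.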
ANOVA
In statistics, analysis of variance (ANOVA) is a collection of statistical models, and
their associated procedures, in which the observed variance is partitioned into
components due to different explanatory variables.
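The partitioning idea can be sketched for the simplest one-way case, where a single explanatory variable splits the data into groups. The total sum of squares then separates into a between-group and a within-group component. The group labels and observations below are hypothetical.

```python
# Hypothetical observations grouped by one explanatory variable (three levels).
groups = {
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 3.0, 4.0],
    "c": [5.0, 6.0, 7.0],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)

# Total variation of every observation around the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

ss_between = 0.0  # variation of the group means around the grand mean
ss_within = 0.0   # variation of observations around their own group mean
for values in groups.values():
    group_mean = sum(values) / len(values)
    ss_between += len(values) * (group_mean - grand_mean) ** 2
    ss_within += sum((x - group_mean) ** 2 for x in values)

# F statistic: ratio of the between-group and within-group mean squares.
k = len(groups)
n = len(all_values)
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(ss_total, ss_between, ss_within, f_stat)
```

Here `ss_total` equals `ss_between + ss_within` up to floating-point rounding, which is the partition the definition describes.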
Correlation (bivariate, partial, distances), Nonparametric tests
A cross tabulation (often abbreviated as cross tab) displays the joint distribution of two
or more variables. Cross tabs are usually presented as a contingency table in a matrix format.
Whereas a frequency distribution provides the distribution of one variable, a contingency
table describes the distribution of two or more variables simultaneously. Each cell shows
the number of respondents that gave a specific combination of responses, that is, each cell
contains a single cross tabulation.
The following is a fictitious example of a 3 × 2 contingency table. The variable
"Wikipedia usage" has three categories: heavy user, light user, and non-user. These
categories are all-inclusive, so the columns sum to 100%. The other variable, "underpants",
has two categories: boxers and briefs. These categories are not all-inclusive, so the rows
need not sum to 100%. Each cell gives the percentage of subjects that share that
combination of traits.
                   boxers   briefs
heavy Wiki user    70%      5%
light Wiki user    25%      35%
non Wiki user      5%       60%
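A table of this kind can be built by counting each combination of responses and then converting the counts to column percentages. The sketch below uses a small set of hypothetical survey responses. These are not the data behind the table above, so the percentages differ.

```python
from collections import Counter

# Hypothetical survey responses as (Wikipedia usage, underpants) pairs.
responses = [
    ("heavy", "boxers"), ("heavy", "boxers"), ("light", "boxers"),
    ("non", "briefs"), ("non", "briefs"), ("light", "briefs"),
    ("heavy", "briefs"), ("non", "boxers"),
]

# Each cell of the cross tab is the count of one combination of responses.
counts = Counter(responses)
rows = ["heavy", "light", "non"]
cols = ["boxers", "briefs"]

# Convert counts to column percentages so that each column sums to 100%.
col_totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}
col_pct = {
    (r, c): 100.0 * counts[(r, c)] / col_totals[c] for r in rows for c in cols
}

for r in rows:
    print(r, [f"{col_pct[(r, c)]:.0f}%" for c in cols])
```

A `Counter` returns zero for a combination that never occurs, so empty cells need no special handling.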
Cross tabs are frequently used because:
1. They are easy to understand. They appeal to people who do not want to use more
sophisticated measures.
2. They can be used with any level of data: nominal, ordinal, interval, or ratio; cross
tabs treat all data as if it were nominal.
3. A table can provide greater insight than single statistics.
4. They solve the problem of empty or sparse cells.
5. They are simple to conduct.
The chi-square distribution (also chi-squared or χ² distribution) is one of the most
widely used theoretical probability distributions in inferential statistics, i.e. in
 Fall '06
 Banaciewicz
 Statistics, Normal Distribution, Null hypothesis, linear relationship
