Hypothesis Testing
Recall that a hypothesis is an unproven conclusion:
something that must be tested and evaluated.
There are two hypotheses in a hypothesis test:
I. The null hypothesis (H0): an assertion or statement we
would like to evaluate statistically.
III. The Point-Biserial Correlation Statistic
Used to measure the linear statistical association between
two variables (X and Y) where one of the variables is
a binary or dichotomous variable (with values of 0 or 1).
Commonly represented as rpb, where rp
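A minimal Python sketch of one common computing formula for the point-biserial coefficient: r_pb = ((M1 - M0) / s) * sqrt(n1 * n0 / n^2), where M1 and M0 are the means of the continuous variable in the groups coded 1 and 0, and s is its standard deviation computed with divisor n. The function name and the sample data are illustrative assumptions, not part of the original notes.

```python
import math

def point_biserial(binary, values):
    """Point-biserial correlation between a 0/1 variable and a continuous one."""
    if len(binary) != len(values):
        raise ValueError("both variables need the same number of observations")
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    if n1 == 0 or n0 == 0:
        raise ValueError("both categories (0 and 1) must be observed")
    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    mean_all = sum(values) / n
    # Standard deviation of the continuous variable with divisor n.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("the continuous variable has no variation")
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / n ** 2)

# Made-up example: group membership (0/1) and a continuous score.
r_pb = point_biserial([0, 0, 1, 1], [1, 2, 3, 4])
```

With the divisor-n standard deviation, this gives the same value as the Pearson coefficient computed on the 0/1 codes directly.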
Continuous Probability Distributions
Most widely used: Normal probability distribution
(also known as the Gaussian distribution)
Used to model normally distributed continuous random variables
in raw form (X) or standardized form (Z, the Z-scores of X)
It is a bell shaped distri
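As an illustration of the raw (X) and standardized (Z) forms, the following Python sketch converts raw values to Z-scores and checks that a normal probability is the same in either form. The data, the function name, and the choice of the divisor-n standard deviation are assumptions made for the example.

```python
import math
from statistics import NormalDist

def z_scores(xs):
    """Standardize raw values X into Z-scores: Z = (X - mean) / sd."""
    n = len(xs)
    mean = sum(xs) / n
    # Standard deviation with divisor n; an assumption of this sketch.
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]

raw = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5 and sd 2
z = z_scores(raw)

# The same probability in raw and standardized form: P(X <= 7) = P(Z <= 1).
p_raw = NormalDist(mu=5, sigma=2).cdf(7)
p_std = NormalDist().cdf(1.0)
```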
Correlation: An Overview of Some Widely
Used Measures and Coefficients
Correlation Indices: An Overview
I. The Pearson Product Moment Coefficient
Used to measure the linear statistical association between
two variables (X and Y). It is an index that prov
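A short Python sketch of the Pearson coefficient, written from the usual definition r = S_xy / sqrt(S_xx * S_yy), where S_xy is the sum of cross-products of deviations from the means. The function name and the example values are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys):
        raise ValueError("X and Y need the same number of observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products.
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("each variable must vary")
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up examples: a perfect positive and a perfect negative linear relation.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3], [3, 2, 1])
```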
III. Chi-Square (χ²):
a multiple-use statistic
χ² and Goodness of Fit
Chi-square (χ²) as a goodness-of-fit test:
The χ² statistic is used to compare observed values O_i (or
frequencies) to expected values E_i for i = 1, ..., k categories
or classes, where expected valu
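A minimal Python sketch of the goodness-of-fit computation, assuming the standard form of the statistic: the sum over the k categories of (O_i - E_i)^2 / E_i. The die-rolling counts are made up for illustration, and the function name is hypothetical.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same k categories")
    if any(e <= 0 for e in expected):
        raise ValueError("expected values must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up example: 60 rolls of a die, 10 rolls expected per face if it is fair.
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6
statistic = chi_square_gof(observed, expected)
degrees_of_freedom = len(observed) - 1  # k - 1 when no parameters are estimated
```

Here the statistic works out to 1.0 on 5 degrees of freedom, far below the usual 5% critical value of about 11.07, so these counts give no evidence against a fair die.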
Probability
Probability refers to the likelihood or chance
of something occurring or being observed
P(A): the probability of event or outcome A,
where P(A) may be expressed as a ratio:
\[ P(A) = \frac{\text{number of favorable outcomes}}{\text{total number of possible outcomes}} \]
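The ratio can be sketched in Python as a simple count, assuming every possible outcome is equally likely. The die example and the function name are illustrative assumptions.

```python
from fractions import Fraction

def probability(favorable, possible):
    """P(A) as favorable outcomes over all possible, equally likely outcomes."""
    return Fraction(len(favorable), len(possible))

# Made-up example: rolling an even number with one fair six-sided die.
die = range(1, 7)
even = [face for face in die if face % 2 == 0]
p_even = probability(even, die)
```

Using `Fraction` keeps the result as an exact ratio (here 1/2) rather than a rounded decimal.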
Not
Equality of Variance Testing
Equality of variance testing (EVT) relies on the F-distribution,
a class of continuous probability distributions based on the
F-statistic (or F-ratio) and named in honor of Ronald A. Fisher
(1890-1962).
The F-statistic or F-ratio is a ratio of variances fo
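A Python sketch of one common form of the F-ratio for comparing two samples: the ratio of their sample variances (divisor n - 1). Placing the larger variance in the numerator is a convention assumed here, not something stated in the notes; the function name and data are hypothetical.

```python
from statistics import variance

def f_ratio(sample_a, sample_b):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    var_a = variance(sample_a)  # sample variance, divisor n - 1
    var_b = variance(sample_b)
    if min(var_a, var_b) == 0:
        raise ValueError("both samples must vary")
    if var_a >= var_b:
        return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1
    return var_b / var_a, len(sample_b) - 1, len(sample_a) - 1

# Made-up samples: the second has twice the spread of the first.
f_value, df_num, df_den = f_ratio([1.0, 2.0, 3.0, 4.0, 5.0],
                                  [2.0, 4.0, 6.0, 8.0, 10.0])
```

The resulting F would then be compared against the F-distribution with the two returned degrees of freedom.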
Non-Parametric test procedures
Note: Non-Parametric test procedures are typically used
when the normality assumption cannot be validated.
Non-parametric statistics, by definition, are not constrained
by the assumption of normality (that is, that a variabl
A Quick Primer on Some
Widely-Used Notation
Σ (Sigma): the summation operator
(simply means to add up)
Consider the expression
\[ \sum_{i=1}^{n} X_i \]
which means to add up the values of a
variable X from observations 1 through n,
where n denotes the sample size
(for i
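The summation operator maps directly onto a loop. A small Python sketch with made-up observations:

```python
x = [4, 7, 1, 8]  # made-up observations X_1 ... X_n
n = len(x)        # sample size

total = 0
for i in range(1, n + 1):  # i runs from 1 through n, as in the notation
    total += x[i - 1]      # Python lists are 0-indexed, so X_i is x[i - 1]

# The built-in sum() performs the same addition in one call.
assert total == sum(x)
```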
Student's t-distribution: correcting the
Normal Distribution probability values
for small samples
Recall that the standard normal variable (for the sampling
distribution of the mean of X)
\[ Z = \frac{\bar{X} - \mu}{\hat{\sigma} / \sqrt{n}} = \frac{\bar{X} - \mu}{\hat{\sigma}_{\bar{X}}} \]
has an approximate standard normal
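A Python sketch of the standardized sample mean, assuming the hatted term is the standard deviation estimated from the sample (divisor n - 1), so that the denominator is the estimated standard error of the mean. The function name, the data, and the hypothesized mean are illustrative assumptions.

```python
import math
from statistics import mean, stdev

def standardized_mean(sample, mu):
    """(sample mean - mu) divided by the estimated standard error of the mean."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # sigma-hat / sqrt(n)
    return (mean(sample) - mu) / standard_error

# Made-up small sample (n = 8) tested against a hypothesized mean of 4.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
statistic = standardized_mean(sample, mu=4)
```

For a small sample like this one, the statistic would be referred to Student's t-distribution with n - 1 degrees of freedom rather than to the standard normal.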
I. Univariate Descriptive Statistics
Summary measures, procedures, and/or tools
used to highlight the general characteristics or
distributional qualities of variables
Distribution refers to the propensity of
individual observations of a variable (X) to
t
Descriptive Measurements of Dispersion
* spatial dispersion/concentration
  - entropy
* traditional
  - IR range
  - variance & standard deviation
  - coefficient of variation
* centrographic-based dispersion
  - standard distance
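Several of the measures listed above can be sketched in Python. The formulas are the usual textbook ones and are assumptions here, since the notes do not spell them out: variance and standard deviation with divisor n, the coefficient of variation as standard deviation over mean, and standard distance as the root mean squared distance of points from their mean center. All names and data are hypothetical.

```python
import math
from statistics import mean, pstdev, pvariance

def coefficient_of_variation(xs):
    """Standard deviation relative to the mean (unitless)."""
    return pstdev(xs) / mean(xs)

def standard_distance(points):
    """Root mean squared distance of (x, y) points from their mean center."""
    n = len(points)
    center_x = sum(x for x, _ in points) / n
    center_y = sum(y for _, y in points) / n
    squared = sum((x - center_x) ** 2 + (y - center_y) ** 2 for x, y in points)
    return math.sqrt(squared / n)

# Made-up data: one variable and four point locations on a square.
values = [2, 4, 4, 4, 5, 5, 7, 9]
var = pvariance(values)                # variance, divisor n
sd = pstdev(values)                    # standard deviation
cv = coefficient_of_variation(values)  # sd / mean
sdist = standard_distance([(0, 0), (2, 0), (2, 2), (0, 2)])
```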
Relative Entropy
Entropy is a mea