Frequency table:
class width = ((highest value) - (lowest value)) / (number of classes), rounded up
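The class-width formula needs parentheses around the subtraction before dividing. Below is a minimal Python sketch. The helper name `class_width` and the sample scores are illustrative. Rounding up with `math.ceil` is an assumed convention, since some texts round up to a more convenient number instead.

```python
import math

def class_width(data, num_classes):
    """Class width = (highest - lowest) / number of classes, rounded up.

    Rounding up to a whole number is an assumed convention; textbooks differ.
    """
    raw = (max(data) - min(data)) / num_classes
    return math.ceil(raw)

scores = [12, 15, 21, 27, 33, 38, 41, 47, 52, 58]  # illustrative data
print(class_width(scores, 5))  # (58 - 12) / 5 = 9.2, rounded up to 10
```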
Scattergram:
direction, form, strength, unusual features
Mode
The value that occurs most frequently or, for grouped data, the class with the
highest frequency. A data set may be unimodal, bimodal, multimodal, or
have no mode.
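One way to find every mode is Python's `statistics.multimode` (Python 3.8+). The sketch below wraps it in a hypothetical helper `modes`. It assumes the common textbook convention that a data set whose distinct values all occur equally often has no mode.

```python
from statistics import multimode

def modes(data):
    """Return every mode of data; an empty list stands for 'no mode'."""
    counts = {value: data.count(value) for value in set(data)}
    # Assumed convention: several distinct values that all occur equally
    # often means the data set has no mode.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    return multimode(data)

print(modes([2, 3, 3, 5]))     # [3]     unimodal
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  bimodal
print(modes([4, 7, 9]))        # []      no mode
```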
Midrange
The value midway between the highest and lowest values in the
original data set: Midrange = (highest value + lowest value)/2
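The midrange is a one-line computation. In this small Python sketch, the helper name and the data are illustrative.

```python
def midrange(data):
    """(highest value + lowest value) / 2"""
    return (max(data) + min(data)) / 2

print(midrange([3, 8, 10, 21]))  # (21 + 3) / 2 = 12.0
```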
Interquartile range (IQR) = Q3 - Q1
Mean: $\bar{y} = \frac{\sum y}{n} = \frac{\text{Total}}{n}$
Standard dev.: $s = \sqrt{\frac{\sum (y - \bar{y})^2}{n-1}}$
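The mean and sample standard deviation formulas translate directly into code. This Python sketch uses hypothetical helper names and illustrative data, and checks the hand-written formula against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

def sample_mean(ys):
    # y-bar = (sum of y) / n
    return sum(ys) / len(ys)

def sample_sd(ys):
    # s = sqrt( sum of (y - y-bar)^2 / (n - 1) )
    y_bar = sample_mean(ys)
    return math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (len(ys) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative data
print(sample_mean(data))          # 5.0
print(sample_sd(data))            # about 2.138, same as statistics.stdev(data)
```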
Five-number summary:
min, Q1, median, Q3, max (displayed with a boxplot)
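The following Python sketch computes the five-number summary and the IQR = Q3 - Q1. It assumes the "median of each half" quartile method, leaving the middle value out of both halves when n is odd. This is one common convention, and software packages can give slightly different quartiles. The helper name and data are illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]        # values below the median
    upper = xs[(n + 1) // 2 :]  # values above the median
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

summary = five_number_summary([1, 3, 4, 6, 7, 8, 10, 12, 15])
mn, q1, med, q3, mx = summary
print(summary)  # (1, 3.5, 7, 11.0, 15)
print(q3 - q1)  # IQR = 7.5
```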
On calc.: STAT → CALC → 1-Var Stats
Median and mode are resistant to outliers (not strongly affected by them); the mean is not.
Measures of center paired with their measures of spread:
-Midrange and Range
-Median and Interquartile Range
-Mean and Standard Deviation
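Left-skewed:
long tail to the left; the majority of the data fall to the right of the mean and cluster at the upper end of the
distribution (mean < median)
Right-skewed:
long tail to the right; the majority of the data fall to the left of the mean and cluster at the lower end
of the distribution (mean > median)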
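The mean is pulled toward the long tail, so comparing the mean with the median gives a rough hint about skew direction. This Python sketch is a heuristic only, not a formal test. The helper name and the data sets are illustrative.

```python
from statistics import mean, median

def skew_direction(data):
    """Rough skew guess from the mean-median comparison (heuristic)."""
    m, md = mean(data), median(data)
    if m > md:
        return "right"
    if m < md:
        return "left"
    return "roughly symmetric"

incomes = [20, 22, 25, 26, 28, 30, 95]  # illustrative: long tail to the right
print(skew_direction(incomes))           # right
```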
Z-score (standardized value):
$z = \frac{y - \bar{y}}{s}$
(Standardizing changes the center: the mean becomes 0. It changes the spread: the standard
deviation becomes 1. It does not change the shape of the distribution.)
(Ordinary values: z-score between -2 and 2, i.e., within 2 standard deviations of the mean)
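Standardizing every value shows the effect on center and spread directly. This Python sketch uses a hypothetical helper `z_scores` and illustrative data. After standardizing, the mean is 0 (up to floating-point rounding) and the sample SD is 1.

```python
import statistics

def z_scores(ys):
    """Standardize: z = (y - y-bar) / s, using the sample SD s."""
    y_bar = statistics.mean(ys)
    s = statistics.stdev(ys)
    return [(y - y_bar) / s for y in ys]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data
zs = z_scores(data)
print(statistics.mean(zs))   # essentially 0: the center moves to 0
print(statistics.stdev(zs))  # essentially 1: the spread becomes 1
```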
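68-95-99.7 Rule: in a Normal model, about 68% of values fall within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs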
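The rule's percentages are rounded areas under the standard normal curve. This sketch recovers them with Python's `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1
for k in (1, 2, 3):
    within = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {within:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```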
Find the area (percentile) under the normal curve: 2nd → DISTR → normalcdf
If z has a standard normal distribution:
- P(a < z < b) = normalcdf(a, b)
- To find P(z < a), enter normalcdf(-5, a)
- To find P(z > a), enter normalcdf(a, 5)
(-5 and 5 stand in for the infinite tails; the area beyond 5 SDs is negligible.)
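Outside the calculator, the same areas are differences of CDF values. The `normalcdf` function below is a hypothetical Python stand-in built on `statistics.NormalDist`, not the calculator's own code. Its argument order mirrors the calculator's (lower, upper, optional mean and SD).

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under a normal curve between lower and upper (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

a = 1.5
print(normalcdf(-5, a))  # P(z < 1.5), about 0.9332
print(normalcdf(a, 5))   # P(z > 1.5), about 0.0668
print(normalcdf(-1, 1))  # P(-1 < z < 1), about 0.6827
```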
Correlation coefficient: $r = \frac{\sum z_x z_y}{n-1}$
Check the Straight Enough Condition and the Outlier Condition (if there are outliers, report r with and without them)
Measures the strength of a linear association. Properties:
1. -1 ≤ r ≤ 1; the sign tells the direction of the association. Values of exactly ±1 are rare.
2. The value of r does not change if all values of either variable are converted to a different scale (like changing the unit of measure or using z-scores).
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear association.
5. r has no units.
6. The value of r is sensitive to outliers.
On calc.: STAT → CALC → LinReg(ax+b) (press ENTER twice)
• $r^2$ gives the fraction of the data's variation accounted for by the model.
• $1 - r^2$ is the fraction of the original variation left in the residuals.
x is the independent variable (predictor variable)
y is the dependent variable (response variable); ŷ is its predicted value
$\hat{y} = b_0 + b_1 x$ ($b_0$ = y-intercept, $b_1$ = slope)
$\hat{y} = mx + b$ (algebra text) or $\hat{y} = ax + b$ (calculator)
$b_1 = r\,\frac{s_y}{s_x}$
$b_0 = \bar{y} - b_1 \bar{x}$
Residual=e=data-model=y- ŷ
Std. dev. of residuals: $s_e = \sqrt{\frac{\sum e^2}{n-2}}$
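Residuals and $s_e$ follow from the fitted line. In this Python sketch, the helper name is hypothetical and the data are illustrative. The values b0 = 47.2 and b1 = 5.4 are the least-squares coefficients for this particular data set, so the residuals sum to (essentially) zero.

```python
import math

def residuals_and_se(xs, ys, b0, b1):
    """Return residuals e = y - y-hat and s_e = sqrt(sum(e^2) / (n - 2))."""
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s_e = math.sqrt(sum(e * e for e in residuals) / (len(xs) - 2))
    return residuals, s_e

# b0 = 47.2 and b1 = 5.4 are the least-squares values for this small data set.
resid, s_e = residuals_and_se([1, 2, 3, 4, 5], [52, 60, 61, 70, 74], b0=47.2, b1=5.4)
print(resid)  # about [-0.6, 2.0, -2.4, 1.2, -0.2]
print(s_e)    # about 1.966
```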
A residual plot should stretch horizontally with even scatter throughout: no
bends and no outliers (it should look boring).
Regression line in z-scores: $\hat{z}_y = r z_x$
Sample Surveys
Sample data must be collected in an appropriate way,
such as through a process of random selection.
(Only the sample size matters, not the size of the population.)
Sampling frame
A list of individuals from which the sample is drawn. It must match the
population to avoid the risk of bias.
Parameter
a numerical measurement describing some characteristic of a