Correlations ("r") 6/16/09 1:20 PM
Correlation
Interpreting the "Correlation Coefficient" ("r" & "r2")
Definition: "Correlation" is a quantitative index, a standard statistical measurement of the degree of relationship or
association between two sets of numbers (variables), describing how closely they track or are related to
one another. The notion does not necessarily imply causation, since no direction of influence is known or
can be assumed. In fact, often both variables are "caused" by some other independent variable(s) not
being measured. The concept has been in wide use across the sciences since the 1880s, when it was first
popularized by Sir Francis Galton in explaining the relationship between children's height and that of their
parents. It is perhaps the most widely used single analytic procedure in the behavioral sciences.

Explanation: Correlation is most commonly measured by a "Pearson Product Moment Correlation Coefficient" (normally
referred to in shorthand by the symbol "r"). It is calculated as a number ranging between -1.00 and
+1.00. A measure of +1.00 represents a perfect positive correlation, indicating that the two
sets of numbers form an identical pattern. (One example of a +1.00 correlation coefficient might be a
comparison of two sets of numbers where one set represents the height in inches of a group of individuals
while the other set represents the height in centimeters of the same group of individuals.) A measure of
-1.00 represents a perfect negative correlation, indicating that the two sets of numbers form a perfect inverse
relationship. (One example of a -1.00 correlation coefficient might be a comparison of two sets of numbers
where one set represents the total number of dollars in your bank account(s) for each day of the month
and the other represents the total number of dollars being spent from your bank accounts on each day of
the month.) It is rare to find correlations of +1.00 or -1.00 in social or educational research. A correlation of
0.00 means there is no linear relationship whatever between the variables. The Pearson correlation coefficient
is named after Karl Pearson, who further developed the concept and its formal calculation in the 1890s
following the pioneering work of Galton.

Limitations & Characteristics: "Correlation" assumes that there is a linear relationship between the two sets of numbers. If the
relationship is curvilinear, the "r" will give false and misleading readings that substantially underestimate
the relationship.

The easy way to test whether the relationship is linear is to plot a scatter diagram and see if the
"points" scatter in a more or less linear direction. On a scatter diagram, the coefficient reflects the direction
in which the general pattern of plotted points slopes and the width of the ellipse that encloses those points. The width of
the ellipse indicates the extent of the relationship and hence the magnitude, or absolute value, of the
coefficient.

Some analysts advise removing any "outlier" cases from consideration, treating them a priori as
aberrations so that they do not bias the relationship remaining among the more "normal" cases.

The two variables being correlated must always be paired observations for the same set of individuals or
objects, such as height and weight of a single group of individuals. Each case to be included must be
represented by a value in each variable.

The variables being correlated must be measured on an interval or ratio scale. Categorical data cannot
be properly measured with this tool. A "point-biserial correlation coefficient" (rpb) can be used to compare
interval or ratio data in one variable to dichotomous (two-category nominal) data in the other. [Other tools are available
to measure the relationship between two nominal variables.]

The homogeneity of the group can affect the correlations. If a group is sufficiently homogeneous on either
or both variables, the variation will tend toward zero. In this case, one would be, in effect, dividing by zero
and the formula becomes meaningless. The variable will have been reduced to a constant. In other words,
there must be enough variation or heterogeneity in the scores to allow a relationship to manifest
itself.

While the number of observations used in the calculation does not influence the value of the coefficient, it
does affect the accuracy with which the coefficient estimates the relationship.

Typical Interpretation: One old classic and typical interpretation of "r" uses five easy "rules of thumb" to answer the question
"When is a correlation coefficient 'high' and when is it 'low'?" as follows:

"r" ranging from zero to about .20 may be regarded as indicating no or negligible correlation.
"r" ranging from about .20 to .40 may be regarded as indicating a low degree of correlation.
"r" ranging from about .40 to .60 may be regarded as indicating a moderate degree of correlation.
"r" ranging from about .60 to .80 may be regarded as indicating a marked degree of correlation.
"r" ranging from about .80 to 1.00 may be regarded as indicating high correlation.
[A. Franzblau (1958), A Primer of Statistics for Non-Statisticians, Harcourt, Brace & World
(Chap. 7).] Italics in original.

Other more recent scholars explain, simply, "as a rule of thumb, we can say that correlations of less than
.30 indicate little if any relationship between the variables." [See: Hinkle, Wiersma, & Jurs (1988), Applied Statistics for the Behavioral Sciences, 2nd
ed., Houghton Mifflin Co.]

Advanced Interpretation: A more precise interpretation arising from the correlation coefficient is recommended by some statisticians
and requires one further calculation. If the "r" is multiplied by itself, or "squared," the product, commonly known as "r2" (read "r square"), will indicate approximately the proportion of the variation in the "dependent" variable that is
associated with the "independent" variable. Technically, "r2" is called the "coefficient of determination." Thus, for example, a correlation coefficient ("r") of .50 would yield a coefficient of determination ("r2") of
.25, so that in such a case 25% of the variation in the dependent variable might be considered as being
associated with the variation in the independent variable. The coefficient of determination indicates the
proportion of "shared" variance between the variables, irrespective of causality. It is the proportion of the
variance in one variable associated with the variance of the other variable.

Predictive Validity: With regard to predicting a dependent variable from the values of an independent variable, Franzblau says,
"the relationship between the size of a correlation coefficient and its predictive value is not a directly
proportional one. The lower correlation figures are of almost no value in prediction; the moderate ones are
only slightly better; the marked coefficients are somewhat but not very much better. Only as we advance
into the high correlation range do the predictive values rise to usable levels... Coefficients below .40 do not
yield a guess even 10% better than chance. To yield a prediction which is 25% better than a chance or
random guess, the correlation must be at least .66; to be 50% better than chance, a correlation of at least
.86 is needed; to be 75% better than chance, the coefficient must rise as high as .97." (ibid. p. 88).

Savannah State University
Office of Institutional Research & Planning
Summer, 2002
http://irp.savstate.edu/irp/glossary/correlation.html
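The two calculations described under Advanced Interpretation and Predictive Validity can be sketched in Python as follows. The coefficient of determination is simply "r" squared. The glossary does not say how Franzblau derived his "better than chance" figures. This sketch assumes the classical index of forecasting efficiency, 1 - sqrt(1 - r^2), because it closely reproduces the quoted thresholds. Both function names are hypothetical.

```python
import math


def coefficient_of_determination(r):
    """Proportion of variance shared by the two variables: r squared."""
    return r * r


def forecasting_efficiency(r):
    """Assumed formula: 1 - sqrt(1 - r^2), the proportional reduction in
    prediction error compared with guessing without the predictor."""
    return 1.0 - math.sqrt(1.0 - r * r)


# Correlation values mentioned in the text above.
for r in (0.40, 0.50, 0.66, 0.86, 0.97):
    shared = coefficient_of_determination(r)
    gain = forecasting_efficiency(r)
    print(f"r = {r:.2f}  r2 = {shared:.2f}  better than chance = {gain:.0%}")
```

Under that assumption, correlations of .66, .86, and .97 come out at roughly 25%, 49%, and 76% better than chance, and .40 at about 8%, in line with the quoted passage. A correlation of .50 shares 25% of the variance yet improves prediction by only about 13%, which is the non-proportional relationship Franzblau describes.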