9.6 REGRESSION AND CORRELATION ANALYSIS

In Chapter 7 we discussed the dependence of a random variable on another. A correlation coefficient was introduced and described the degree of linear correlation between a dependent variable and an independent variable. In many engineering problems, we may have to go beyond simply finding the correlation. In fact, the interest may be in investigating whether a relationship exists that would describe the dependent variable Y in terms of the independent variable X. It is noted that the basis for establishing such a relationship is a set of data compiled for the dependent and independent variables. A regression analysis refers to the analysis of data to arrive at this relationship. The correlation and regression analyses are conceptually different; however, they are related. Any established relation between Y and X is within the range of data compiled. The validity of such a relation is evaluated through the correlation between Y and X.

In problems where only one independent variable is involved, the regression analysis is called simple regression. When several independent variables X_i (i = 1 to q) are involved, the regression is said to be multiple. Furthermore, recall that the correlation coefficient (ρ) described in Chapter 7 is a measure of linear correlation between X and Y. As a result, if a nonlinear relation between Y and X exists, the correlation coefficient will be substantially less than 1 (recall that |ρ| = 1 is an indication of a perfect linear correlation between X and Y). In most engineering applications, the regression analysis is primarily conducted to arrive at an empirical equation for the dependent variable. The equation is then adapted for use in design and decision-making problems when the exact relation between Y and X is not known or cannot easily be found through theoretical formulations.
Regression analysis problems can be classified into one of the following cases:

• Linear regression of the dependent variable Y on the independent variable X.
• Linear regression of the dependent variable Y on q independent variables X_i (i = 1, 2, ..., q).
• Nonlinear regression of Y on X, or of Y on several independent variables X_i.

In addition, we may be interested in investigating the correlation between Y and X (or between Y and X_i) in these cases. When the data set involves ranks, or the sample size is small enough that it can easily be ranked, a measure of linear correlation can also be obtained using the ranks. In any correlation and regression analysis, it is important to note that the statistical samples play a crucial role. A regression equation is only valid within the range of the data collected. As such, one should not extrapolate the regression relationship beyond the boundaries of the compiled data. The correlation coefficient represents a statistical parameter for which confidence levels may be established.

We begin the discussion with simple linear regression analysis. This is followed by the nonlinear analysis and a comprehensive discussion on correlation analysis.

9.6.1 Simple Linear Regression Analysis

This problem involves regression analysis of a dependent variable Y on an independent variable X. In theory, we will be interested in finding E(Y|x), which yields the expected value of Y given a specific outcome for X such as x. We can write:

E(Y|x) = a + bx    (9.41)

in which a and b are constants. The estimates for a and b are found using the least square analysis. Suppose in a data acquisition session, we compile n pairs of sample values for Y and X. These pairs will be (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). Each x_i corresponds to a specific value for Y such as y_i. The pairs of values for X and Y are obtained in an experiment, through field data collection, by surveys, and so on. A plot of y_i versus x_i often appears in the form of a cluster of points. If a strong linear correlation between X and Y exists, the data will cluster close to each other and show a linear trend. Very weak correlation between X and Y will result in a large scatter in the plotted points and no specific trend between the two. Figure 9.5 shows these two situations. Assuming the exact values for the constants a and b are known, Eq. 9.41 can also be plotted along with the scatter plot of y_i versus x_i. As seen in Figure 9.6, for any given value of X such as x_i, there are two values for Y. These are y_i, which is obtained from the data, and y_i′, which is from Eq. 9.41. The difference between the two values, Δ_i = y_i′ − y_i, is called the residue. In conducting the least square method, the sum of squared values of the residues is minimized to obtain estimates for a and b. These estimates will represent the best-fit equation for the data. Denoting the sum of squares of the residues as SSE,
Denoting the sum of squares of the residues as SSE, l
l
l
i
l
1‘
l
l i

I
i ”5.3;." .~ .u h. oui{i‘w 270 _ ' Hypothesis Testing, Analysis of Variance, Regression Chap. 9 (a) Strong Correlation  (b) Weak Correlation Figure 9.5 Strong vs. weak condition. (a) Strong correlation; (1)) weak correlation. 1‘: Figure 9.6 Regression line. 55;: = E (y?  3102 = E (a + bxi — 3:02 (9.42)
' . iﬁl i=1
The minimization involves the following operations:

∂SSE/∂â = 0
∂SSE/∂b̂ = 0

These result in the following two equations for a and b. Note that because the equations provide only the estimates for a and b, the symbols â and b̂ are used to indicate that these are only estimates:

â = m_y − b̂ m_x    (9.43)

b̂ = [n Σ_{i=1}^{n} x_i y_i − (Σ_{i=1}^{n} x_i)(Σ_{i=1}^{n} y_i)] / [n Σ_{i=1}^{n} x_i² − (Σ_{i=1}^{n} x_i)²]    (9.44)
Here, m_x and m_y are the estimates of the mean values for X and Y, respectively (see Eq. 8.10). Other variations of Eq. 9.44 can also be found. The proofs of Eqs. 9.43 and 9.44 are left to the reader as an exercise problem. Notice that Y|x is a random variable whose mean value is E(Y|x). An unbiased estimate for the variance of this random variable is denoted by s²_y|x and is computed as follows:

s²_y|x = (1/(n − 2)) Σ_{i=1}^{n} (y_i′ − y_i)² = (1/(n − 2)) Σ_{i=1}^{n} (a + bx_i − y_i)² = SSE/(n − 2)    (9.45)
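The estimates in Eqs. 9.43 through 9.45 can be sketched in code. The following is a minimal illustration in plain Python; the function name `least_squares_fit` and the toy data are ours, not the text's:

```python
def least_squares_fit(x, y):
    """Return (a, b, s2) per Eqs. 9.43-9.45: intercept, slope, and the
    unbiased variance estimate s^2 = SSE / (n - 2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    # Eq. 9.44: slope estimate
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Eq. 9.43: intercept estimate from the sample means
    a = sum_y / n - b * sum_x / n
    # Eq. 9.45: sum of squared residues divided by (n - 2)
    sse = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, y))
    return a, b, sse / (n - 2)

# Points lying exactly on y = 1 + 2x: the fit should recover a = 1, b = 2
# with zero residual variance.
a, b, s2 = least_squares_fit([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
```

With real (scattered) data, s2 would be positive and its square root is the s_y|x drawn about the regression line below.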
along with the regression line on the Y — X graph.'We must emphasize that this
standard deviation is different from 5)., since 3y is the standard deviation of Ywithout
any reference to X. As it is explained in the next section the two standard devia»
tions, i.e. rm, and 5)., are related to each other through the correlation coefficient. 9.6.2 Correlation Coefficient Recall from Section 7.3.2 that the linear correlation coefficient is computed from
Eq. 7.36. In theory, Eq. 7.36 requires information on the joint probability density
(or mass) function ofX and Y. An estimate for p can be found using the data com
piled for X and Y. Assuming a uniform joint probability distribution for X and Y,
the covariance of X and Y can be written as: Covtm = E[(X — your — m3 ="31‘20r— maca — my) (946a) An unbiased estimate for the covariance can be found when n — 1 instead of n is
used: C0v(X,Y) = 1 :(xrmiwimy) ._ (9.46b) fit—11.. Using .3 and 3),, i.e., the estimates for the standard deviation of X and Y, and denot
ing the correlation coefficient as r as an estimate of p, in light of Eq. 7.36: i (I; ”' matri _ my)
"1 (9.47a) OI mm... News. Muhammad Hypothesis Testing, Analysis of Variance, Regression Chap. 9 Exes — ”(171me
1 i=1 , a. m” _ 1 . My (9.4%) From Eqs. 9.44 and 9.47, it can be shown that: z (I, ~ mxjvi h my)s 3.:
i=1 1 J J’
COnsidering the unbiased estimate for 5%. (Eq. 9.45), and in light of Eq. 9.47c, it can
be shoWn that: n ~— 1 '
3%; = n _ 253(1 — r2) {9.48)
Solving this equation for r2
' 2
2 I” ~ 2 '3le
= 1 “ .47d
’ . n — 1 s; (9 )
For large values of n, the ratio (n _ 2)](n — i) = 1 and _
2 ..
s V
r2 .= 1 — :1 (9.47e)
For a perfect correlation, r = 1 and s_y|x = 0. This means that there will be no scatter about the regression line. However, it is emphasized that this does not mean s_y is zero, because s_y is an estimate of the standard deviation of Y without any reference to its relationship to X.

Example 9.19

In studying the productivity in construction sites discussed earlier, we want to establish a linear regression equation between the productivity (dependent variable) and the work shift hours/day (independent variable). The productivity is measured as the percentage of work completed as opposed to the scheduled work to be completed within a specific time period. In a survey, nine observations were made and the results in Table E9.19A were obtained.

(a) Establish the regression equation E(Y|x) = a + bx.
(b) Estimate the standard deviation of Y|x.
(c) Estimate the correlation coefficient.

Solution:

(a) Equations 9.43 and 9.44 are used for estimating a and b. Table E9.19B summarizes the statistical values needed for the a and b computation. For simplicity, we use a and b to represent the estimates (without the symbol ^).

b = (9 × 5915 − 72.5 × 745)/(9 × 605.75 − (72.5)²) = −3.977
m_x = 72.5/9 = 8.056
m_y = 745/9 = 82.778
a = 82.778 + 3.977 × 8.056 = 114.817

TABLE E9.19A  PRODUCTIVITY VERSUS WORK SHIFT HOURS/DAY

Productivity (%), Y    Work Shift Hours/Day, X
90                     7
85                     6.5
75                     9.5
83                     7
88                     7
78                     10
90                     7.5
87                     7
69                     11
TABLE E9.19B

x_i    y_i    x_i²     y_i²     x_i y_i
7      90     49       8100     630
6.5    85     42.25    7225     552.5
9.5    75     90.25    5625     712.5
7      83     49       6889     581
7      88     49       7744     616
10     78     100      6084     780
7.5    90     56.25    8100     675
7      87     49       7569     609
11     69     121      4761     759

Σx_i = 72.5   Σy_i = 745   Σx_i² = 605.75   Σy_i² = 62097   Σx_i y_i = 5915

E(Y|x) = 114.817 − 3.977x
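As a check, the estimates in part (a) can be reproduced from the sums in Table E9.19B. The sketch below is ours; tiny differences from the text (e.g., 114.81 versus 114.817) come from the text's rounded intermediate values:

```python
# Sums taken from Table E9.19B
n = 9
sum_x, sum_y = 72.5, 745.0
sum_xx, sum_xy = 605.75, 5915.0

# Eq. 9.44 (slope) and Eq. 9.43 (intercept from the sample means)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
a = sum_y / n - b * sum_x / n
```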
(b) In this part we utilize Eq. 9.45. By expanding this equation we obtain

Var(Y|x) = [na² + b²Σx_i² + Σy_i² + 2abΣx_i − 2aΣy_i − 2bΣx_i y_i]/(n − 2)
= [9(114.817)² + (−3.977)²(605.75) + 62097 + 2(114.817)(−3.977)(72.5) − 2(114.817)(745) − 2(−3.977)(5915)]/7
= 84.01/7 = 12.00

The standard deviation is s_y|x = 3.46.

(c) In this part we can either use Eq. 9.47a or 9.47b. We first must compute s_x and s_y:

s_x² = (Σx_i² − n m_x²)/(n − 1) = [605.75 − 9(8.056)²]/8 = 2.707, and s_x = 1.645. Similarly, s_y² = [62097 − 9(82.778)²]/8 = 53.403, and s_y = 7.308. Thus, using Eq. 9.47b,

r = [(5915 − 9 × 8.056 × 82.778)/(1.645 × 7.308)]/(9 − 1) = −0.90

Figure E9.19 presents the data and the regression line. Notice that s_y|x = 3.46 is shown as two parallel lines above and below the regression line.
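Parts (b) and (c) can likewise be checked numerically. The sketch below (ours) computes r from Eq. 9.47b and then s_y|x through Eq. 9.48; it carries full precision, so values agree with the text only to the displayed digits:

```python
import math

x = [7, 6.5, 9.5, 7, 7, 10, 7.5, 7, 11]
y = [90, 85, 75, 83, 88, 78, 90, 87, 69]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Unbiased sample variances of X and Y
sx2 = (sum(v * v for v in x) - n * mx ** 2) / (n - 1)
sy2 = (sum(v * v for v in y) - n * my ** 2) / (n - 1)

# Eq. 9.47b: correlation coefficient estimate
r = (sum(xi * yi for xi, yi in zip(x, y)) - n * mx * my) / (
    (n - 1) * math.sqrt(sx2) * math.sqrt(sy2))

# Eq. 9.48: standard deviation about the regression line from s_y and r
s_yx = math.sqrt((n - 1) / (n - 2) * sy2 * (1 - r ** 2))
```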
Figure E9.19  Productivity vs. work shift hours per day.

9.6.3 Strength of Linear Correlation

The least square method used in establishing a straight-line fit to the X,Y data is merely a mathematical tool that always results in a linear equation between E(Y|x) and x. However, one must use this equation with caution, since in reality there may be no linear relation between Y and X, or at best the linear relation may be weak. It is therefore necessary to investigate how well the straight line obtained through the least square analysis fits the data. Of course, the correlation coefficient can be used as a measure to determine how well the linear equation fits the data. More elaborate methods are conducted by employing a statistic called the coefficient of determination and by using an analysis of variance. Both of these methods are explained in this section.

In Section 9.6.2, we established a direct relationship between the estimates of b (i.e., the slope of the regression line) and r (see Eq. 9.47c). Equation 9.47c indicates that b and r always have the same algebraic sign. Positive values of r will always be accompanied by a rise in the regression line (positive slope), while a negative r will be accompanied by a negative slope. We further derived an expression between the correlation coefficient r, the random variation in Y about x (as denoted by s²_y|x), and the variation in Y alone (as denoted by s²_y). The following notations are introduced:

S_xy = Σ_{i=1}^{n} (x_i − m_x)(y_i − m_y)

S_xx = Σ_{i=1}^{n} (x_i − m_x)²

S_yy = Σ_{i=1}^{n} (y_i − m_y)²

Knowing that SSE corresponds to s²_y|x and S_yy to s²_y, Eq. 9.47e can also be written as

r² = (S_yy − SSE)/S_yy    (9.47f)

The numerator in this equation is a measure of the variability in Y that is due to the linear relationship with X. This is to say that r² is the ratio of the measure of variation
in linearity of Y to the measure of total variation in Y (without any reference to X). This ratio is called the coefficient of determination. The maximum value for this ratio is 1. If r² is large (closer to 1), it can be concluded that there is a strong linear relation between X and Y. Smaller r² values (i.e., below 0.4) indicate a somewhat weaker linear relation.

The analysis of variance can also be used as a means to test whether the straight-line equation obtained for Y|x is associated with statistically significant variation in Y. To conduct ANOVA, we need to identify components such that the sources of the variability can be recognized through these components. Recall from Section 9.6.1 that the sum of squares of the difference between the observed values of Y and those from the linear equation was shown as SSE, where

SSE = Σ_{i=1}^{n} [y_i − (a + bx_i)]²

It can be shown that

SSE = S_yy − b S_xy

or

S_yy = b S_xy + SSE

Thus the total variation in Y (i.e., S_yy) is now written in terms of two components. These are: (1) the variation that is attributed to the linear regression between Y and X (i.e., bS_xy); and (2) the error, or residual, sum of squares (unexplained error). Denoting bS_xy as SSR, we notice that if the relationship between X and Y is very close to linear, then most of the variability in Y will be in SSR. The relative size of SSR to SSE can then be used as a basis for ANOVA.

In order to develop a model to conduct ANOVA, we assume that k specific
values of the random variable X are selected. These are x_1, x_2, ..., x_k. Considering the regression of Y on X, there will be k random variables Y|x_i, where i = 1, 2, ..., k. At this point, we assume that these random variables are independent normal with a common variance σ². If the regression of Y on X is linear, then the mean values of these k random variables must all be located on the line represented by the regression equation. The regression line representing the mean values can be expressed with an expression μ_Y|x = α + βx, as shown in Figure 9.7. We further assume that a random sample of size n_i is selected from each normal distribution describing the random variable Y|x_i. If y_ij is denoted as the jth element of the random sample i, then it has a mean value equal to α + βx_i and a variance equal to σ². There will also be an observed value associated with Y_ij; however, this observed value will differ from the mean of Y_ij by a random amount Δ_ij. Denoting this observed value as y*_ij, we can write

y*_ij = α + βx_i + Δ_ij    (9.49)

Figure 9.7  Regression line representing the mean values.

This equation is valid for i = 1, 2, ..., k and j = 1, 2, ..., n_i. Note that if β = 0, then the variation in Y will not be due to the linear model, since all variation in Y will be random. On the other hand, if β ≠ 0, Eq. 9.49 will depend on x_i. This means that a portion of the variation in Y will be due to the regression line. The null and alternate hypotheses can then be written as follows:

H_0: β = 0
H_1: β ≠ 0

If the null hypothesis cannot be rejected, it indicates a lack of linear regression at the significance level tested. However, if the null hypothesis is rejected, then there is enough statistical evidence to suggest that the linear regression is valid.

To establish the test statistics for ANOVA, we utilize SSR and SSE as described earlier to define: (1) MSR, which is called the regression mean square; and (2) MSE, which is referred to as the error mean square. The degrees of freedom associated with these statistics are 1 and n − 2, respectively, where n = n_1 + n_2 + ⋯ + n_k. We can thus write:

MSR = SSR/1  and  MSE = SSE/(n − 2)    (9.50)

in which SSR = bS_xy and SSE = S_yy − SSR. Conceivably, if the null hypothesis is true, then the observed value of the ratio MSR/MSE will be close to 1. If the null hypothesis is rejected, the ratio is said to be inflated, and as such, the regression equation is linear. This ratio follows an F distribution function with degrees of freedom equal to 1 and n − 2. In practice, the necessary calculations for ANOVA are arranged in tabular form as shown in Table 9.3.

Example 9.20

In Example 9.19:
(a) Discuss the strength of the linear regression.
(b) Conduct an ANOVA to further investigate the statistical evidence of the linear relationship between Y and X.

TABLE 9.3  ANOVA FOR LINEAR REGRESSION

Variation Source   Degrees of Freedom   Sum of Squares SS      Mean Square MS   F
Regression         1                    SSR = bS_xy            SSR/1            MSR/MSE
Error              n − 2                SSE = S_yy − bS_xy     SSE/(n − 2)
Total              n − 1                S_yy

TABLE E9.20  ANOVA RESULTS

Variation Source   Degrees of Freedom   SS             MS       F       f_0.95(1,7)
Regression         1                    SSR = 344.96   344.96   29.35   5.59
Error              7                    SSE = 82.26    11.75
Total              8                    427.22
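The entries of Table E9.20 can be re-derived from the raw data of Example 9.19; a sketch (ours) follows. Note that full-precision arithmetic gives F ≈ 28.6 rather than the tabulated 29.35, because the worked example carries rounded intermediates; either way F far exceeds the critical value f_0.95(1,7) = 5.59:

```python
x = [7, 6.5, 9.5, 7, 7, 10, 7.5, 7, 11]
y = [90, 85, 75, 83, 88, 78, 90, 87, 69]
n = len(x)

# Centered sums S_xy, S_xx, S_yy
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
s_xx = sum(xi * xi for xi in x) - sum(x) ** 2 / n
s_yy = sum(yi * yi for yi in y) - sum(y) ** 2 / n

b = s_xy / s_xx            # slope estimate (Eq. 9.44 in centered form)
ssr = b * s_xy             # regression sum of squares, SSR = b * S_xy
sse = s_yy - ssr           # residual (error) sum of squares
msr, mse = ssr / 1, sse / (n - 2)
f_ratio = msr / mse        # compare against the F(1, n - 2) critical value
```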
Solution:

(a) In this part, we compute r² to equal 0.81. This ratio is large enough to suggest that the linear relationship between X and Y is strong.

(b) To conduct ANOVA we must compute S_xy and S_yy:

S_xy = Σ(x_i − m_x)(y_i − m_y) = Σx_i y_i − n(m_x m_y) = 5915 − 9 × 8.056 × 82.778 = −86.74
S_yy = Σ(y_i − m_y)² = Σy_i² − n(m_y)² = 62097 − 9(82.778)² = 427.22

Based on these, we have SSR = bS_xy = −3.977 × (−86.74) = 344.96, and SSE = S_yy − SSR = 427.22 − 344.96 = 82.26. Thus

MSR = SSR/1 = 344.96, and
MSE = SSE/(n − 2) = 82.26/7 = 11.75.

The test statistic is

F = MSR/MSE = 344.96/11.75 = 29.35.

The critical value of the F distribution for ν_1 = 1 and ν_2 = 9 − 2 = 7 degrees of freedom at the 0.05 significance level is f_0.95(1,7) = 5.59. Since F = 29.35 is larger than the critical value, the ratio F is inflated, and as such the null hypothesis is rejected. This means there is enough statistical evidence to have a linear regression of Y on X. The ANOVA results are summarized in Table E9.20.

9.6.4 Multiple Linear Regression

Occasionally, in certain engineering problems, it may be necessary to establish the
linear regression between the dependent variable Y and q independent variables X_1, X_2, ..., and X_q. The regression equation in this case is written as:

E(Y|x_1, x_2, ..., x_q) = a + b_1 x_1 + b_2 x_2 + ⋯ + b_q x_q    (9.51)

The procedure for estimating the constants a and b_j (where j = 1, 2, ..., q) is the same as that in single linear regression. Upon establishing SSE, the sum of the squares of the differences between the observed values of Y (i.e., y_i) and the theoretical values from Eq. 9.51, SSE is minimized. This will result in q + 1 simultaneous equations that are then solved for a and b_j (j = 1, 2, ..., q). It can be shown that

a = m_y − Σ_{j=1}^{q} b_j m_j    (9.52)

and that the constants b_j can be obtained from the following q simultaneous equations:

b_1 Σ(x_1i − m_1)(x_1i − m_1) + b_2 Σ(x_1i − m_1)(x_2i − m_2) + ⋯ + b_q Σ(x_1i − m_1)(x_qi − m_q) = Σ(x_1i − m_1)(y_i − m_y)
b_1 Σ(x_2i − m_2)(x_1i − m_1) + b_2 Σ(x_2i − m_2)(x_2i − m_2) + ⋯ + b_q Σ(x_2i − m_2)(x_qi − m_q) = Σ(x_2i − m_2)(y_i − m_y)
...
b_1 Σ(x_qi − m_q)(x_1i − m_1) + b_2 Σ(x_qi − m_q)(x_2i − m_2) + ⋯ + b_q Σ(x_qi − m_q)(x_qi − m_q) = Σ(x_qi − m_q)(y_i − m_y)    (9.53)
in which m_j is the mean value of the variable X_j (j = 1, 2, ..., q), and m_y is the mean value of Y. Furthermore, in Eqs. 9.53 the summations are from i = 1 to n (note that n is the size of each of the variables Y and X_j), and a term such as x_ji means the ith element of the variable X_j. For simplicity, in Eqs. 9.52 and 9.53 we have dropped the symbol ^, which is often used to describe the estimates for a and b. The variance of Y in the regression equation can also be obtained as in the case of the single linear correlation, i.e.,

Var(Y|x_1, x_2, ..., x_q) = SSE/(n − q − 1)    (9.54)

in which

SSE = Σ_{i=1}^{n} (y_i − a − b_1 x_1i − b_2 x_2i − ⋯ − b_q x_qi)²    (9.55)

9.6.5 Nonlinear Regression

In many engineering problems, the data set may show a nonlinear trend between Y
and X. In these problems, it is desirable to establish a nonlinear regression between Y and X. Usually the scatter plot of the Y data versus the X sample values will suggest the type of function that may be suitable for the relation between X and Y. If it is determined that the relation follows a specific type of function, such as g(x), then the regression equation can simply be established by linearization. However, if the type of function cannot be identified, a polynomial may be attempted in order to establish the regression equation. In this case, one may wish to closely examine the scatter plot of Y versus X to decide on the order of polynomial that would best fit the data. In this section these two methods are explained.

Regression by Using a Pre-Determined Function. Suppose the scatter plot of Y versus X suggests that the relationship is in the form of a function g(x). For example, this function is exponential or logarithmic (i.e., g = e^x or g = ln x). The regression of Y on X can then be written as:

E(Y|x) = a + b g(x)    (9.56a)

The linearization is achieved by substituting an auxiliary variable z for g(x). Since z = g(x), Eq. 9.56a becomes:

E(Y|z) = a + bz    (9.56b)

Constants a and b can now be estimated from Eqs. 9.43 and 9.44. However, in these
equations all x_i will be substituted by z_i = g(x_i).

Nonlinear Regression Using a Polynomial Function. In this case, the regression equation is written as a polynomial of mth order:

E(Y|x) = a_0 + a_1 x + a_2 x² + ⋯ + a_m x^m    (9.57)

Once m is selected, a procedure similar to the linear regression method is used to minimize the sum of squares of the differences between the observed values of Y (i.e., y_i) and the theoretical values from Eq. 9.57. This results in m + 1 simultaneous equations for computing the estimates for the constants a_0, a_1, ..., a_m, in the form of

∂SSE/∂a_j = 0  for j = 0 to m

These m + 1 equations can be written in matrix form as

[C]{a} = {R}    (9.58)

in which it can be shown that

[C] = | n         Σx_i         Σx_i²        ...   Σx_i^m      |
      | Σx_i      Σx_i²        Σx_i³        ...   Σx_i^(m+1)  |
      | ...                                                    |
      | Σx_i^m    Σx_i^(m+1)   Σx_i^(m+2)   ...   Σx_i^(2m)   |

and

{R} = | Σy_i       |
      | Σy_i x_i   |
      | Σy_i x_i²  |
      | ...        |
      | Σy_i x_i^m |

and

{a} = | a_0 |
      | a_1 |
      | a_2 |
      | ... |
      | a_m |
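Assembling and solving the system of Eq. 9.58 can be sketched as follows (plain Gaussian elimination; the helper names are ours, not the text's). The test data lie exactly on y = 1 + 2x + 3x², so a second-order fit should recover those coefficients:

```python
def solve_linear_system(A, rhs):
    """Gaussian elimination with partial pivoting for [C]{a} = {R}."""
    n = len(rhs)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        a[i] = (M[i][n] - sum(M[i][j] * a[j] for j in range(i + 1, n))) / M[i][i]
    return a

def polynomial_fit(x, y, m):
    """Build [C] and {R} of Eq. 9.58 for an mth-order polynomial and solve."""
    C = [[sum(xi ** (j + k) for xi in x) for k in range(m + 1)]
         for j in range(m + 1)]
    R = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(m + 1)]
    return solve_linear_system(C, R)

coef = polynomial_fit([0, 1, 2, 3, 4], [1, 6, 17, 34, 57], m=2)
```

In practice the power sums in [C] can become ill-conditioned for large m, which is one reason to keep the polynomial order low.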
For more information on regression analysis in general, see Myers (1990).

Example 9.21

Bus fares in suburban communities of a major metropolitan area are a function of rides
per bus line per day. In a survey of eight communities, the data in Table E9.21A were gathered on the bus fare in dollars (dependent variable, Y) and the number of rides per line per day (independent variable, X). A regression equation in the form of Y = a + b ln(x) has been suggested for the data. Compute the estimates for a and b and discuss the correlation of Y on ln(X).

Solution:

We introduce Z = ln(X) and establish a linear regression between Y and Z. Table E9.21B summarizes the results.

b = (8 × 43.004 − 57.693 × 5.95)/(8 × 416.30 − (57.693)²) = 0.396
m_z = 57.693/8 = 7.212
m_y = 5.95/8 = 0.744
a = 0.744 − 0.396 × 7.212 = −2.112

E(Y|z) = −2.112 + 0.396z = −2.112 + 0.396 ln(x)

To compute the correlation coefficient between Y and Z, we can either use Eq. 9.47a or 9.47b. We first must compute s_z and s_y.

TABLE E9.21A

Rides/Line/Day    Fare ($)
1050              0.60
1100              0.75
1255              0.70
1270              0.65
1420              0.80
1530              0.75
1625              0.85
1750              0.85
TABLE E9.21B

x_i     z_i = ln(x_i)   y_i     z_i²     y_i²     z_i y_i
1050    6.957           0.60    48.40    0.360    4.174
1100    7.003           0.75    49.04    0.563    5.252
1255    7.135           0.70    50.91    0.490    4.995
1270    7.147           0.65    51.08    0.423    4.646
1420    7.258           0.80    52.68    0.640    5.806
1530    7.333           0.75    53.77    0.563    5.500
1625    7.393           0.85    54.66    0.723    6.284
1750    7.467           0.85    55.76    0.723    6.347

Σz_i = 57.693   Σy_i = 5.95   Σz_i² = 416.30   Σy_i² = 4.485   Σz_i y_i = 43.004

s_z² = (Σz_i² − n m_z²)/(n − 1) = [416.30 − 8(7.212)²]/7 = 0.0281, and s_z = 0.168. Similarly, s_y² = [4.485 − 8(0.744)²]/7 = 0.0081, and s_y = 0.09. Thus, using Eq. 9.47b,
r = [(43.004 − 8 × 7.212 × 0.744)/(0.168 × 0.09)]/(8 − 1) = 0.74
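The linearized fit can be sketched as below, working from the (rounded) sums of Table E9.21B so the result matches the text's arithmetic. A caveat worth noting: the slope here comes from a small difference of large sums, so carrying full-precision logarithms of the raw x-values instead gives a slightly different b (about 0.41):

```python
# Sums from Table E9.21B, where z = ln(x) per the substitution of Eq. 9.56
n = 8
sum_z, sum_y = 57.693, 5.95
sum_zz, sum_zy = 416.30, 43.004

# Eqs. 9.44 and 9.43 applied to the (z, y) pairs
b = (n * sum_zy - sum_z * sum_y) / (n * sum_zz - sum_z ** 2)
a = sum_y / n - b * sum_z / n
# Fitted model: E(Y|x) = a + b * ln(x)
```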
The r² factor is about 0.6. This indicates a reasonable strength for the relationship.

9.6.6 Spearman's Rank Correlation Coefficient

In estimating the correlation coefficient between X and Y, we may find out that the data consist of ranks, or that the data set is small and can be ranked readily. In these cases, Spearman's rank correlation coefficient is used for measuring the correlation. Suppose we have compiled n sets of values for X and Y denoted by (x_i, y_i), where i = 1 to n. We rank both the X and Y data in ascending order. The ranks for the X values are denoted by r_xi, and those for Y by r_yi. Thus we now have n pairs of ranks such as (r_xi, r_yi). For tied scores, we use the average of the ranks for those scores that are tied. Spearman's rank correlation coefficient r_s is obtained from the following equation:

r_s = [nΣr_xi r_yi − (Σr_xi)(Σr_yi)] / √{[nΣr_xi² − (Σr_xi)²][nΣr_yi² − (Σr_yi)²]}    (9.59)

The summations in Eq. 9.59 are each from i = 1 to n. The results from this equation are usually slightly different from those from Eqs. 9.47. For large n, the two results are in close agreement with each other. If there are no ties in ranks, the Spearman coefficient is given by the following simpler equation:

r_s = 1 − 6Σd_i² / [n(n² − 1)]    (9.60)

in which d_i = r_xi − r_yi and again the summation is from i = 1 to n. As discussed previously, values of r_s close to ±1 indicate a strong correlation between X and Y.

Example 9.22

In Example 9.19, compute the correlation coefficient using Spearman's equation.
Solution:

We need to rank the values of X and Y in ascending order from 1 to 9. Table E9.22 summarizes the ranks for X (i.e., r_xi) and the ranks for Y (i.e., r_yi), along with r_xi², r_yi², and r_xi r_yi. Note that tied scores receive the average of the ranks in the tied-score group. Using Eq. 9.59,

r_s = (9 × 192.25 − 45 × 45)/√{[9 × 280.00 − (45)²][9 × 284.50 − (45)²]} = −0.57
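The ranking with tied-score averaging and Eq. 9.59 can be sketched as follows (helper names ours); applied to the Example 9.19 data it reproduces r_s = −0.57:

```python
import math

def average_ranks(values):
    """1-based ranks in ascending order; tied scores get the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i+1..j+1
        i = j + 1
    return ranks

def spearman(x, y):
    """Rank correlation coefficient per Eq. 9.59 (handles ties)."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    num = n * sum(a * b for a, b in zip(rx, ry)) - sum(rx) * sum(ry)
    den = math.sqrt((n * sum(a * a for a in rx) - sum(rx) ** 2)
                    * (n * sum(b * b for b in ry) - sum(ry) ** 2))
    return num / den

rs = spearman([7, 6.5, 9.5, 7, 7, 10, 7.5, 7, 11],
              [90, 85, 75, 83, 88, 78, 90, 87, 69])
```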
TABLE E9.22  RANKED DATA

Work Shift Hrs/Day, X   Ranks r_xi   Productivity %, Y   Ranks r_yi   r_xi²    r_yi²    r_xi r_yi
7                       3.5*         90                  8.5**        12.25    72.25    29.75
6.5                     1            85                  5            1.00     25.00    5.00
9.5                     7            75                  2            49.00    4.00     14.00
7                       3.5*         83                  4            12.25    16.00    14.00
7                       3.5*         88                  7            12.25    49.00    24.50
10                      8            78                  3            64.00    9.00     24.00
7.5                     6            90                  8.5**        36.00    72.25    51.00
7                       3.5*         87                  6            12.25    36.00    21.00
11                      9            69                  1            81.00    1.00     9.00
Sum                     45                               45           280.00   284.50   192.25

*This is the average of ranks 2, 3, 4, and 5, which are tied.
**This is the average of ranks 8 and 9, which are tied.

SUMMARY

In this chapter, we reviewed several useful statistical methods for the analysis of
engineering data. The focus of the discussion was on hypothesis testing, comparing means from two or more populations, the analysis of variance, and correlation and regression analysis. The discussion of these methods is rather compact; we presented only those methods that are commonly used in statistical analysis of data. The applications of these methods in engineering are numerous. One may wish to apply these methods as statistical evidence that the data sets compiled in two groups are either identical or different. Furthermore, an engineer may wish to apply these methods in connection with providing the necessary evidence as to whether a presumed notion is valid. In general, we recommend the use of the methods presented in this chapter to those engineers who are involved in research that requires field or laboratory data compilation.

REFERENCES

GUTTMAN, I., S.S. WILKS, and J.S. HUNTER (1982). Introductory Engineering Statistics, Third
edn., John Wiley & Sons, New York, NY.

KLIR, G.J., U.H. ST.CLAIR, and B. YUAN (1997). Fuzzy Set Theory: Foundations and Applications, Prentice Hall PTR, Upper Saddle River, NJ.

MILTON, J.S., and J.O. TSOKOS (1983). Statistical Methods in the Biological and Health Sciences, McGraw-Hill Publishing Company, New York, NY.

MYERS, R.H. (1990). Classical and Modern Regression with Applications, 2nd edn., Duxbury Press, Boston, Massachusetts.

NELSON, M., and W. ILLINGWORTH (1990). A Practical Guide to Neural Nets, Addison-Wesley Publishing Company, Inc.