Sample variance

$$ s^2 = \frac{\sum f\,(X_i - \bar{X})^2}{n - 1} $$

An unbiased estimator of the population value. This is the average squared deviation; dividing by n − 1 gives a better estimate of the population value.

Sample standard deviation

$$ s = \sqrt{\frac{\sum f\,(X_i - \bar{X})^2}{n - 1}} $$

Again, n − 1 gives a better estimate of the population value.

More displays of data
• Stem-and-leaf display
• Boxplots
• Scatterplots
  • With bivariate distributions (when we have two variables)
  • Correlations, correlation coefficient (r)

Stem-and-leaf display
You are listing the numbers as well as graphing them (providing "pictures of them").
Let's say we have the following scores and would like to see the distribution:
55, 56, 54, 68, 67, 67, 69, 62, 70, 73, 74
42, 42, 43, 40, 41, 50, 52, 22, 31, 33, 39
38, 36, 38, 37, 37, 38, 40, 21, 58, 59, 51
The stem is the first digit of each number; the leaves are the second digits of those numbers.

Stem  Leaves
7     0 3 4
6     2 7 7 8 9
5     0 1 2 4 5 6 8 9
4     0 0 1 2 2 3
3     1 3 6 7 7 8 8 8 9
2     1 2

Boxplots
Graphs both central tendency and dispersion measures:
• Minimum score, 25th percentile
• Median (50th percentile), 75th percentile, and maximum score
Talking about the interquartile range here…
For the 33 scores above:
• N = 33
• Minimum score is 21, maximum score is 74
• 25th percentile: (N + 1)(.25) = (34)(.25) = 8.5, the 8½th score, halfway between the 8th and 9th scores (both are 38, so it is 38)
• Median (50th percentile): (34)(.5) = 17, the 17th score = 43
• 75th percentile: (N + 1)(.75) = (34)(.75) = 25.5, the 25½th score, halfway between the 25th and 26th scores (59 and 62, so it is 60.5)
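The percentile positions above follow an (N + 1) × p rule, taking the point between two neighbouring scores when the position is fractional. A minimal Python sketch of that rule follows; the function name `percentile_n_plus_1` is hypothetical (chosen for illustration), and linear interpolation between neighbours is an assumption that matches the "halfway between" cases worked out here. Other software may use different percentile conventions, so results can differ slightly.

```python
def percentile_n_plus_1(scores, p):
    """Percentile for 0 < p < 1 using the (N + 1) * p position rule."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) * p        # 1-based position, may be fractional
    lower = int(position)         # whole part of the position
    fraction = position - lower   # how far to move toward the next score
    if lower < 1:                 # position falls before the first score
        return ordered[0]
    if lower >= n:                # position falls at or after the last score
        return ordered[-1]
    below = ordered[lower - 1]    # convert 1-based position to 0-based index
    above = ordered[lower]
    return below + fraction * (above - below)


scores = [55, 56, 54, 68, 67, 67, 69, 62, 70, 73, 74,
          42, 42, 43, 40, 41, 50, 52, 22, 31, 33, 39,
          38, 36, 38, 37, 37, 38, 40, 21, 58, 59, 51]

print(min(scores), max(scores))            # 21 74
print(percentile_n_plus_1(scores, 0.25))   # 38.0
print(percentile_n_plus_1(scores, 0.50))   # 43.0
print(percentile_n_plus_1(scores, 0.75))   # 60.5
```

These five values (minimum, 25th percentile, median, 75th percentile, maximum) are the ones a boxplot draws.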
[Figure: boxplot of the 33 scores on an axis from 10 to 80, marking the maximum score (74), 75th percentile (60.5), median (43), 25th percentile (38), and minimum score (21).]

Scores from previous exercise (testing for promotion)
300 420 450 300 300 300 400 400 400 570
488 490 450 500 590 585 565 575 435 300
350 300 620 520 550 510 510 620 750 320
342 449 450 400 406 480 900 600 600 840
800 850 900 500 900 400 580 550 645 540

Bivariate Distribution
Each pair is (number of weeks employees attended training sessions, score from previous exercise, testing for promotion):
(2, 300)  (5, 420)  (2, 450)  (5, 300)  (4, 300)
(4, 300)  (8, 400)  (4, 400)  (4, 400)  (8, 400)
(4, 570)  (5, 488)  (6, 490)  (5, 450)  (11, 500)
(10, 590)  (3, 585)  (10, 565)  (12, 575)  (15, 580)
(9, 435)  (8, 300)  (3, 350)  (2, 300)  (2, 620)
(12, 520)  (12, 550)  (7, 510)  (7, 510)  (7, 550)
(8, 620)  (7, 750)  (8, 320)  (7, 342)  (7, 449)
(2, 450)  (3, 400)  (1, 406)  (1, 480)  (9, 645)
(16, 900)  (14, 600)  (16, 600)  (11, 840)  (10, 800)
(11, 850)  (15, 900)  (14, 500)  (17, 900)  (7, 540)

[Figure: scatterplot of Score (200 to 1000) against Weeks attended training sessions (0 to 20); r = .629.]

Different degrees of positive correlation
[Figure: three scatterplots of Variable 2 against Variable 1, showing r = +1.0, r = +.5, and r = +.014.]

Different degrees of negative correlation
[Figure: three scatterplots of Variable 2 against Variable 1, showing r = −1.0, r = −.49, and r = −.036.]

Pearson's (r) correlation coefficient
• Also known as the product-moment correlation
• Correlation, not to be confused with causation
  • X can cause Y, but Y could cause X
  • Or, a third variable is influencing X and Y
  • Or, a complex set of interrelated variables influences X and Y
  • Or more than one of these scenarios can apply…
  Minium et al. (1993)
• Use only for linear relationships (e.g., not for curvilinear)
• One equation we can use…

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{(SS_X)(SS_Y)}} $$

Correlation coefficient
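As a rough illustration of the equation above, here is a minimal Python sketch that computes r as the sum of cross-products of deviations divided by the square root of SSx times SSy. The function name `pearson_r` is hypothetical, and the five (X, Y) pairs are the ones used in this section's worked example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviation scores: SP / sqrt(SSx * SSy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of the deviations from each mean
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]
ys = [1, 3, 3, 6, 4]
print(round(pearson_r(xs, ys), 3))  # 0.783
```

The sketch assumes both lists have the same length and that neither variable is constant (otherwise the denominator is zero).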
X     Y     X − X̄      Y − Ȳ       (X − X̄)²   (Y − Ȳ)²   (X − X̄)(Y − Ȳ)
            (X̄ = 3)    (Ȳ = 3.4)
1     1     −2         −2.4        4          5.76       4.8
2     3     −1         −.4         1          .16        .4
3     3     0          −.4         0          .16        0
4     6     1          2.6         1          6.76       2.6
5     4     2          .6          4          .36        1.2
Sum                                10         13.2       9

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{(SS_X)(SS_Y)}} = \frac{9}{\sqrt{(10)(13.2)}} = \frac{9}{\sqrt{132}} = \frac{9}{11.49} = .783 $$

Exercise
Your supervisor would like to see whether the number of hours of computer training employees received has a positive association with their overall technical proficiency scores generated during their annual performance appraisals.
• Get central tendencies and dispersion (use n − 1 in the denominator of your equations for the variance and standard deviation)
• Graph distributions, generate tables
• Find the correlation coefficient
• Graph a scatterplot
• Graph a boxplot and a stem-and-leaf display for each variable
• What is the Z score for 50 hrs of training? For 350 on the technical proficiency score?

50 52 53 24 29 42 32 35 36 38 39 22 45 60 62 64 68 63
162 165 200 300 500 321 600 600 750 165 400 400 405 220 230 240 500 520 630 500
68 69 62 63 65 68 42 40 40 41 33 55 23 27 28 29 70 74 52
120 120 158 152 165 154 300 300 322 700 650 650 450 650 120 160 100 100 200
26 28 22 19 16 66 62 42 32 60
450 450 450 800 800 200 200 200 250 100
61 65

[Figure: histogram of VAR00003 in bins from 10 to 19 through 70 to 79; vertical axis from 0% to 35%.]
VAR00003
Valid       Frequency   Percent   Valid Percent   Cumulative Percent
10 to 19    2           4.0       4.0             4.0
20 to 29    10          20.0      20.0            24.0
30 to 39    8           16.0      16.0            40.0
40 to 49    7           14.0      14.0            54.0
50 to 59    5           10.0      10.0            64.0
60 to 69    16          32.0      32.0            96.0
70 to 79    2           4.0       4.0             100.0
Total       50          100.0     100.0

[Figure: histogram of VAR00004 in bins from 100 to 199 through 800 to 899; vertical axis from 0% to 30%.]

VAR00004
Valid        Frequency   Percent   Valid Percent   Cumulative Percent
100 to 199   14          28.0      28.0            28.0
200 to 299   9           18.0      18.0            46.0
300 to 399   5           10.0      10.0            56.0
400 to 499   7           14.0      14.0            70.0
500 to 599   5           10.0      10.0            80.0
600 to 699   6           12.0      12.0            92.0
700 to 799   2           4.0       4.0             96.0
800 to 899   2           4.0       4.0             100.0
Total        50          100.0     100.0

[Figure: scatterplot of VAR00002 (0 to 1000) against VAR00001 (0 to 80); r = .517.]

[Figure: boxplot on an axis from 10 to 80, N = 50: maximum 74, 75th percentile 62.25, median 42, 25th percentile 31.25, minimum 16.]
Mean = 45.88, sd = 16.89, multiple modes… (32 is the smallest)

[Figure: boxplot on an axis from 0 to 1000: maximum 800, 75th percentile 500, median 300, 25th percentile 165, minimum 100.]
Mean = 355.58, sd = 208.79, Mode = 200

Stem  Leaves
7     0 4
6     0 0 1 2 2 2 3 3 4 5 5 6 8 8 8 9
5     0 2 2 3 5
4     0 0 1 2 2 2 5
3     2 2 2 3 5 6 8 9
2     2 2 3 4 6 7 8 8 9 9
1     6 9

Stem  Leaves
8     00 00
7     00 50
6     00 00 30 50 50 50
5     00 00 00 00 20
4     00 00 05 50 50 50 50
3     00 00 00 21 22
2     00 00 00 00 00 20 30 40 50
1     00 00 00 20 20 20 52 54 58 60 62 65 65 65

Z score for 50? Z score for 350?
• (50 − 45.88)/16.89 = +.24
• (350 − 355.58)/208.79 = −.027
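To tie the summary statistics to the Z scores, here is a minimal Python sketch, not a definitive solution to the exercise. It assumes the 50 training-hours values can be read directly off the first stem-and-leaf display, uses `statistics.stdev` (which divides by n − 1), and introduces a hypothetical helper `z_score`. For the proficiency score it simply plugs in the mean and sd reported above rather than recomputing them.

```python
import statistics

# Training hours as read off the stem-and-leaf display (stems 1 through 7)
hours = [16, 19,
         22, 22, 23, 24, 26, 27, 28, 28, 29, 29,
         32, 32, 32, 33, 35, 36, 38, 39,
         40, 40, 41, 42, 42, 42, 45,
         50, 52, 52, 53, 55,
         60, 60, 61, 62, 62, 62, 63, 63, 64, 65, 65, 66, 68, 68, 68, 69,
         70, 74]


def z_score(x, mean, sd):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mean) / sd


mean_hours = statistics.mean(hours)
sd_hours = statistics.stdev(hours)   # sample sd, n - 1 in the denominator

print(round(mean_hours, 2))                              # 45.88
print(round(sd_hours, 2))                                # 16.89
print(round(z_score(50, mean_hours, sd_hours), 2))       # 0.24

# Proficiency score: uses the reported mean and sd as given
print(round(z_score(350, 355.58, 208.79), 3))            # -0.027
```

A positive Z score means the value lies above the mean, a negative one below it; 50 hours is about a quarter of a standard deviation above the mean, while a score of 350 is almost exactly at the mean.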