Topic #2 Basic Statistics – Part 1
REVIEW
REVIEW ON OWN
• Summarizing Qualitative Data
• Summarizing Quantitative Data
• Measures of Location and Variability/Dispersion
Measures of Association Between Two Variables
Random Variables
Discrete Probability Distributions
Expected Value and Variance
Slide 1
Summarizing Qualitative Data
Frequency Distribution
Relative Frequency
Percent Frequency Distribution
Bar Graph
Pie Chart
What does “Qualitative” mean?
Slide 2
Frequency Distribution
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Slide 3
Example: Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are shown below.
Below Average, Average, Above Average, Above Average, Above Average, Above Average, Above Average, Below Average, Below Average, Average, Poor, Poor, Above Average, Excellent, Above Average, Average, Above Average, Average, Above Average, Average
How many rated worse than average? How many rated better than average?
Slide 4
Example: Marada Inn
Frequency Distribution
Rating           Frequency
Poor              2
Below Average     3
Average           5
Above Average     9
Excellent         1
Total            20
How many rated worse than average? How many rated better than average?
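The tally in the frequency distribution can be reproduced from the raw ratings. The following is a minimal Python sketch, not part of the original slides; the variable names (`ratings`, `classes`, `frequency`) are chosen here for illustration only.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn sample, in the order listed.
ratings = [
    "Below Average", "Average", "Above Average", "Above Average",
    "Above Average", "Above Average", "Above Average", "Below Average",
    "Below Average", "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average", "Average",
    "Above Average", "Average", "Above Average", "Average",
]

# Classes in worst-to-best order, so the table prints like the slide.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Count how many ratings fall in each class.
counts = Counter(ratings)
frequency = {c: counts[c] for c in classes}

# Answer the two questions posed on the slide.
worse_than_average = frequency["Poor"] + frequency["Below Average"]
better_than_average = frequency["Above Average"] + frequency["Excellent"]

for c in classes:
    print(f"{c:<15}{frequency[c]:>3}")
print(f"{'Total':<15}{sum(frequency.values()):>3}")
print("Worse than average:", worse_than_average)
print("Better than average:", better_than_average)
```

Because the classes are nonoverlapping, every rating is counted exactly once and the frequencies sum to the sample size.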
Slide 5
Example: Marada Inn
The GM of Marada Inn has a goal that no more than 10% of all guests will rate their stay as worse than average. How is the inn doing?
Slide 6
Relative Frequency and Percent Frequency Distributions
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Slide 7
Example: Marada Inn
Relative Frequency and Percent Frequency Distributions
Rating           Relative Frequency   Percent Frequency
Poor              .10                  10
Below Average     .15                  15
Average           .25                  25
Above Average     .45                  45
Excellent         .05                   5
Total            1.00                 100
Slide 8
Bar Graph
A bar graph is a graphical device for depicting qualitative data that have been summarized in a frequency, relative frequency, or percent frequency distribution.
On the horizontal axis we specify the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency scale can be used for the vertical axis.
Using a bar of fixed width drawn above each class label, we extend the height appropriately.
The bars are separated to emphasize the fact that each class is a separate category.
Slide 9
Example: Marada Inn
Bar Graph
[Figure: bar graph of the Marada Inn quality ratings. Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency (1 to 9).]
Slide 10
ECO 6416 Grade Distribution
[Figure: bar graph of grade percentages. A 0%, B 0%, C 25%, D 50%, F 25%.]
Slide 11
ECO 6416 Grade Distribution
[Figure: bar graph of grade percentages for A, B, C, D, F. Data labels: 100%, 0%, 0%, 0%, 0%.]
Slide 12
Pie Chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
Slide 13
Pie Chart
Slide 14
Example: Marada Inn
Pie Chart
[Figure: pie chart of quality ratings. Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]
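The relative frequencies, percent frequencies, and pie-chart sector angles for the Marada Inn ratings can be recomputed from the frequency counts. This is a minimal Python sketch added for illustration; the dictionary names are assumptions, not part of the slides.

```python
# Frequency counts from the Marada Inn frequency distribution.
frequency = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

total = sum(frequency.values())  # sample size (20 guests)

# Relative frequency: fraction of all items that belong to the class.
relative = {c: f / total for c, f in frequency.items()}

# Percent frequency: relative frequency multiplied by 100.
percent = {c: 100 * f / total for c, f in frequency.items()}

# Pie-chart sector angle: relative frequency multiplied by 360 degrees.
degrees = {c: 360 * f / total for c, f in frequency.items()}

# Share of guests who rated their stay worse than average,
# for comparison with the GM's 10% goal.
worse_percent = percent["Poor"] + percent["Below Average"]

for c in frequency:
    print(f"{c:<15}{relative[c]:>6.2f}{percent[c]:>6.0f}{degrees[c]:>7.0f}")
print("Percent worse than average:", worse_percent)
```

The "Average" class has a relative frequency of .25, so its sector takes up 90 degrees, matching the pie chart rule stated above.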
Slide 15
Summarizing Quantitative Data
Frequency Distribution
Relative Frequency and Percent Frequency Distributions
Histogram
Slide 16
Example: Hudson Auto Repair
The manager of Hudson Auto would like to get a better picture of the distribution of costs for engine tune-up parts. A sample of 50 customer invoices has been taken and the costs of parts, rounded to the nearest dollar, are listed below.
 91   71  104   85   62   78   69   74   97   82
 93   72   62   88   98   57   89   68   68  101
 75   66   97   83   79   52   75  105   68  105
 99   79   77   71   79   80   75   65   69   69
 97   72   80   67   62   62   76  109   74   73
Slide 17
Example: Frequency Distribution Table
This is what a frequency distribution table looks like:
Cost ($)   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Percent Frequency
50-59          2            .04                    2                      4
60-69         13            .26                   15                     30
70-79         16            .32                   31                     62
80-89          7            .14                   38                     76
90-99          7            .14                   45                     90
100-109        5            .10                   50                    100
Totals        50           1.00
Slide 18
Frequency Distribution
Guidelines for Selecting Number of Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes.
Guidelines for Selecting Width of Classes
• USE CLASSES OF EQUAL WIDTH
• Approximate Class Width = (Largest Data Value − Smallest Data Value) / Number of Classes
Slide 19
Example: Hudson Auto Repair
Frequency Distribution
If we choose six classes: Approximate Class Width = (109 − 52)/6 = 9.5 ≅ 10
Cost ($)   Frequency
50-59          2
60-69         13
70-79         16
80-89          7
90-99          7
100-109        5
Total         50
Would it be wrong to use 50-59, 60-79, 80-89, 90-99, 100-109?
Slide 20
Example: Hudson Auto Repair
Relative Frequency and Percent Frequency Distributions
Cost ($)   Relative Frequency   Percent Frequency
50-59           .04                   4
60-69           .26                  26
70-79           .32                  32
80-89           .14                  14
90-99           .14                  14
100-109         .10                  10
Total          1.00                 100
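The Hudson Auto class-width calculation and frequency distributions can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the first class is assumed to start at 50 and the width is rounded up to 10, as in the tables above; the variable names are illustrative.

```python
# The 50 parts costs ($) from the Hudson Auto Repair sample.
costs = [
    91, 71, 104, 85, 62, 78, 69, 74, 97, 82,
    93, 72, 62, 88, 98, 57, 89, 68, 68, 101,
    75, 66, 97, 83, 79, 52, 75, 105, 68, 105,
    99, 79, 77, 71, 79, 80, 75, 65, 69, 69,
    97, 72, 80, 67, 62, 62, 76, 109, 74, 73,
]

num_classes = 6

# Approximate class width = (largest - smallest) / number of classes.
approx_width = (max(costs) - min(costs)) / num_classes

width = 10   # approximate width rounded up to a convenient value
lower = 50   # assumed lower limit of the first class

# Tally each cost into its class: 50-59 is class 0, 60-69 is class 1, ...
counts = [0] * num_classes
for cost in costs:
    counts[(cost - lower) // width] += 1

n = len(costs)
print("Approximate class width:", approx_width)
for k, f in enumerate(counts):
    lo = lower + k * width
    hi = lo + width - 1
    label = f"{lo}-{hi}"
    print(f"{label:<10}{f:>4}{f / n:>7.2f}{100 * f / n:>6.0f}")
```

Equal-width classes are what make the integer division `(cost - lower) // width` a valid way to find the class index; with unequal widths an explicit list of class limits would be needed.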
Slide 21
Histogram
Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis and the frequency, relative frequency, or percent frequency is placed on the vertical axis.
A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
Slide 22
Example: Hudson Auto Repair
Histogram
[Figure: histogram of parts costs. Horizontal axis: Cost ($), 50 to 110. Vertical axis: Frequency, 2 to 18.]
Slide 23
NBA Salaries, few yrs ago
[Figure: histogram of NBA salaries. Horizontal axis: Salaries ($100,000), 2 to 42. Vertical axis: Number of Players, 0 to 40.]
Slide 24
Measures of Location
Mean
Median
Mode
Percentiles
Quartiles
Slide 25
Example: Apartment Rents
Given below is a sample of monthly rent values ($) for one-bedroom apartments. The data are a sample of 70 one-bedroom apartments in a particular city. The data are presented in ascending order (reading down each column).
425  440  450  465  480  510  575
430  440  450  470  485  515  575
430  440  450  470  490  525  580
435  445  450  472  490  525  590
435  445  450  475  490  525  600
435  445  460  475  500  535  600
435  445  460  475  500  549  600
435  445  460  480  500  550  600
440  450  465  480  500  570  615
440  450  465  480  510  570  615
Slide 26
Mean
The mean of a data set is the average of all the data values.
If the data are from a sample, the mean is denoted by \(\bar{x}\):
\[ \bar{x} = \frac{\sum x_i}{n} \]
If the data are from a population, the mean is denoted by μ (mu):
\[ \mu = \frac{\sum x_i}{N} \]
Slide 27
Example: Apartment Rents
Mean
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{34,356}{70} = 490.80 \]
Slide 28
Median
The median of a data set is the value in the middle when the data items are arranged in ascending order.
If there is an odd number of items, the median is the value of the middle item.
If there is an even number of items, the median is the average of the values for the middle two items.
Slide 29
Example: Apartment Rents
Median
Since 70 is even and ½ of 70 = 35, average the 35th and 36th data values:
Median = (475 + 475)/2 = 475
Slide 30
Example: Apartment Rents
ALTERNATIVE METHOD
Median = 50th percentile
i = (p/100)n = (50/100)70 = 35, so average the 35th and 36th data values (see “Percentiles” slide):
Median = (475 + 475)/2 = 475
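The mean and median of the apartment rents can be checked directly. The following Python sketch is an illustration added to the slides' worked example; it types the 70 rents row by row as they appear in the data table and sorts them, and the variable names are assumptions.

```python
import statistics

# The 70 monthly rents ($), typed row by row as laid out in the data table.
raw = [
    425, 440, 450, 465, 480, 510, 575,
    430, 440, 450, 470, 485, 515, 575,
    430, 440, 450, 470, 490, 525, 580,
    435, 445, 450, 472, 490, 525, 590,
    435, 445, 450, 475, 490, 525, 600,
    435, 445, 460, 475, 500, 535, 600,
    435, 445, 460, 475, 500, 549, 600,
    435, 445, 460, 480, 500, 550, 600,
    440, 450, 465, 480, 500, 570, 615,
    440, 450, 465, 480, 510, 570, 615,
]

rents = sorted(raw)  # ascending order, as the median requires
n = len(rents)

# Sample mean: sum of the values divided by the sample size.
total = sum(rents)
mean = total / n

# Median for an even n: average of the two middle values
# (positions n/2 and n/2 + 1, i.e. zero-based indexes n/2 - 1 and n/2).
median_by_hand = (rents[n // 2 - 1] + rents[n // 2]) / 2

# The standard library gives the same result.
median = statistics.median(rents)

print("n =", n)
print("sum =", total)
print("mean =", mean)
print("median =", median)
```

The mean exceeds the median because the largest rents pull the average upward, while the median depends only on the middle positions.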
Slide 31
Mode
The mode of a data set is the value that occurs with greatest frequency.
Slide 32
Example: Apartment Rents
Mode
450 occurred most frequently (7 times)
Mode = 450
Mean = $491   Median = $475   Mode = $450
Slide 33
Percentiles
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less (and at least (100 − p) percent of the items take on this value or more).
• Arrange the data in ascending order.
• Compute index i, the position of the pth percentile: i = (p/100)n
• If i is not an integer, round up. The pth percentile is the value in the ith position.
• If i is an integer, the pth percentile is the average of the values in positions i and i+1.
Slide 34
Example: Apartment Rents
90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585
Slide 35
Example: Apartment Rents
90th Percentile = 585
• So, 90% of sample values should be less than or equal to 585.
• True? 63 of the 70 values (90%) are less than 585, so the statement is true!
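The percentile rule from the Percentiles slide can be sketched as a small Python function. This is an illustrative implementation of that specific rule only (other textbooks and libraries use different interpolation conventions); the function name `percentile` and the restriction to whole-number p strictly between 0 and 100 are assumptions made here.

```python
# The 70 monthly rents ($), typed row by row as laid out in the data table.
raw = [
    425, 440, 450, 465, 480, 510, 575,
    430, 440, 450, 470, 485, 515, 575,
    430, 440, 450, 470, 490, 525, 580,
    435, 445, 450, 472, 490, 525, 590,
    435, 445, 450, 475, 490, 525, 600,
    435, 445, 460, 475, 500, 535, 600,
    435, 445, 460, 475, 500, 549, 600,
    435, 445, 460, 480, 500, 550, 600,
    440, 450, 465, 480, 500, 570, 615,
    440, 450, 465, 480, 510, 570, 615,
]
rents = sorted(raw)


def percentile(sorted_values, p):
    """Return the pth percentile using the index rule i = (p/100)n.

    Assumes p is a whole number with 0 < p < 100 and that
    sorted_values is already in ascending order.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    n = len(sorted_values)
    # Integer arithmetic avoids floating-point surprises when testing
    # whether i = p * n / 100 is a whole number.
    whole, remainder = divmod(p * n, 100)
    if remainder != 0:
        # i is not an integer: round up and take the value in that position.
        position = whole + 1
        return sorted_values[position - 1]
    # i is an integer: average the values in positions i and i + 1.
    return (sorted_values[whole - 1] + sorted_values[whole]) / 2


p90 = percentile(rents, 90)
p50 = percentile(rents, 50)
at_or_below = sum(1 for value in rents if value <= p90)

print("90th percentile:", p90)
print("50th percentile (median):", p50)
print("Values at or below the 90th percentile:", at_or_below, "of", len(rents))
```

Positions on the slides are counted from 1, so the function subtracts 1 when indexing the Python list.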
Slide 36
Quartiles
Quartiles are specific percentiles
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile
Slide 37
Measures of Variability
Range
Variance
Standard Deviation
Slide 38
Range
The range of a data set is the difference between the largest and smallest data values.
It is the simplest measure of variability.
It is very sensitive to the smallest and largest data values.
Slide 39
Example: Apartment Rents
Range
Range = largest value − smallest value
Range = 615 − 425 = 190
Slide 40
Variance
The variance is the average of the squared differences between each data value and the mean.
If the data set is a sample, the variance is denoted by \(s^2\):
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]
If the data set is a population, the variance is denoted by \(\sigma^2\):
\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \]
Slide 41
Variance Example
Salaries (x) ($1000)   (x − x̄)   (x − x̄)²
200                     −200       40,000
300                     −100       10,000
400                        0            0
500                      100       10,000
600                      200       40,000
x̄ = 400                            sum = 100,000
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{100,000}{4} = 25,000 \]
Slide 42
Standard Deviation
The standard deviation of a data set is the positive square root of the variance.
It is measured in the same units as the data, making it more easily comparable to the mean than the variance is.
If the data set is a sample, the standard deviation is denoted s:
\[ s = \sqrt{s^2} \]
If the data set is a population, the standard deviation is denoted σ (sigma):
\[ \sigma = \sqrt{\sigma^2} \]
Slide 43
Standard Deviation Example
Recall \(s^2\) = $25,000
\[ s = \sqrt{s^2} = \$158 \]
Salaries   (x − x̄)
200         −200
300         −100
400            0
500          100
600          200
Sum of the absolute deviations = 200 + 100 + 0 + 100 + 200 = 600
Average distance from mean = 600/4 = $150
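The variance and standard deviation of the salary example can be recomputed step by step. This Python sketch is an added illustration; it follows the sample formulas (dividing by n − 1), and the variable names are assumptions.

```python
import math
import statistics

# Salaries from the variance example, in $1000.
salaries = [200, 300, 400, 500, 600]

n = len(salaries)
mean = sum(salaries) / n

# Deviation of each value from the mean, and its square.
deviations = [x - mean for x in salaries]
squared_deviations = [d ** 2 for d in deviations]

# Sample variance divides the sum of squared deviations by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)

# Sample standard deviation is the positive square root of the variance.
sample_std = math.sqrt(sample_variance)

# The "average distance" figure on the slide: sum of absolute deviations
# divided by 4 (that is, n - 1).
average_distance = sum(abs(d) for d in deviations) / (n - 1)

print("mean =", mean)
print("sum of squared deviations =", sum(squared_deviations))
print("sample variance =", sample_variance)
print("sample standard deviation =", round(sample_std, 2))
print("average distance =", average_distance)

# Cross-check against the standard library's sample formulas.
print("statistics.variance =", statistics.variance(salaries))
print("statistics.stdev =", round(statistics.stdev(salaries), 2))
```

The standard deviation (about 158) and the average distance (150) are close but not identical, which is why the standard deviation is described only loosely as an average distance from the mean.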
Slide 44
More About Variance & Std. Deviation
Standard Deviation
• This measure is a number that shows how widely the values are dispersed about the mean of the distribution.
• It shows (loosely) the average distance any variable value is from the variable's mean.
• It is in the same units of measurement as the original distribution values.
• For instance, if the variable is "revenue per month," measured in dollars, then the standard deviation will also be in dollars.
Slide 45
Dispersion
A small standard deviation shows that your values are grouped closely about the mean.
• s = 1
A large standard deviation shows that your values are dispersed widely about the mean.
• s = 5
Slide 46
Standard Deviation Example
Season winning percentages for two teams over four seasons
NOTE: same mean winning percentages.
A: .450, .555, .345, .650   mean = .500   standard deviation = 0.13
B: .200, .350, .800, .650   mean = .500   standard deviation = 0.27
Which team is more unpredictable? How can you tell?
Slide 47
Dispersion
Variance
• This is the square of the standard deviation.
• It is therefore also a measure of how widely values are dispersed about the mean of a distribution.
One goal of statistical testing is to explain much of the variance (or variation) in the variable that you are examining or explaining.
Slide 48
Who Uses This
Quality-control specialists have traditionally regarded observations more than three standard deviations from the mean as candidates for further examination.
A common measure of risk in investments is the Beta, a calculated value based upon the standard deviation.
Slide 49
Does blood pressure predict life expectancy?
Do SAT scores predict college performance?
Does understanding statistics make you a better person?
Slide 50
Measures of Association between Two Variables
Scatter Diagrams
Correlation Coefficient
Slide 51
Scatter Diagrams
Thus far we have focused on methods that are used to summarize the data for one variable at a time.
Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables.
A scatter diagram is one method for summarizing the data for two variables simultaneously.
Slide 52
Example: Pittsburgh Panthers Football Team
Scatter Diagram
The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored (in a game).
x = Number of Interceptions   y = Number of Points Scored
1                             14
3                             24
2                             18
1                             17
3                             27
(Note: these data are for 5 different games. For example, in game #1 they made 1 interception and scored a total of 14 points in that game.)
Slide 53
Example: Panthers Football Team
What happens to points scored as interceptions rise?
Scatter Diagram
[Figure: scatter diagram. Horizontal axis: Number of Interceptions (x), 0 to 3. Vertical axis: Number of Points Scored (y), 0 to 30.]
Slide 54
Covariance
The covariance is a quantitative measure of association between two variables.
The formula for the sample covariance is
\[ s_{xy} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \]
For each observation on variable x, calculate the deviation from the mean of x. Do the same for variable y.
Compute the product of the deviations of x and y, observation by observation.
Sum the products, and divide by sample size less one.
Slide 55
Example: Pittsburgh Panthers Football Team
Covariance
        x (Int)   y (Pts)   x dev   y dev   Product
           1        14       −1      −6        6
           3        24        1       4        4
           2        18        0      −2        0
           1        17       −1      −3        3
           3        27        1       7        7
Mean       2        20
Sum                                           20
n − 1                                          4
Cov                                            5
For future reference, note that the std devs are 1 for x and 5.34 for y.
Slide 56
Covariance
A positive covariance indicates that when x is above its mean, y also tends to be above its mean; when x is below its mean, y also tends to be below its mean.
Conversely, a negative covariance indicates that when x is above its mean, y tends to be below its mean; when x is below its mean, y tends to be above its mean.
While the sign of the covariance is easily interpreted, its magnitude depends on the units of measurement of the variables.
Ex: if x is advertising and y is sales, the absolute magnitude of the covariance depends on whether the units are dollars or millions of dollars, whereas the sign is the same regardless.
Slide 57
Correlation Coefficient
The correlation coefficient is a “units-free” measure of linear association.
The coefficient can take on values between −1 and +1.
• Values near −1 indicate a strong negative linear relationship.
• Values near +1 indicate a strong positive linear relationship.
• A value of zero indicates no linear relationship.
Slide 58
Correlation Coefficient (Cont)
The coefficient can take on values between −1 and +1.
• Values near −1 indicate a strong negative linear relationship.
• Values near +1 indicate a strong positive linear relationship.
• A value of zero indicates no linear relationship.
If the data sets are samples, the coefficient is \(r_{xy}\), where \(s_x\) and \(s_y\) are the std. devs. of x and y:
\[ r_{xy} = \frac{s_{xy}}{s_x s_y} \]
If the data sets are populations, the coefficient is \(\rho_{xy}\):
\[ \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \]
Slide 59
Correlation Coefficient: Computation
In the interceptions / points example, \(s_{xy} = 5\), \(s_x = 1\), \(s_y = 5.34\):
\[ r_{xy} = \frac{5}{(1)(5.34)} = 0.94 \]
What does this mean?
Slide 60
Example: Panthers Football Team
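As a check on the figures for this example, the sample covariance and correlation coefficient for the five Panthers games can be recomputed from the raw data. This is a minimal Python sketch added for illustration; it applies the sample formulas directly, and the variable names are assumptions.

```python
import math

# Five games: interceptions made (x) and points scored (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 27]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance: sum of the products of paired deviations,
# divided by sample size less one.
products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
s_xy = sum(products) / (n - 1)

# Sample standard deviations of x and y.
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

# Correlation coefficient: covariance scaled by both standard deviations,
# which removes the units of measurement.
r_xy = s_xy / (s_x * s_y)

print("means:", x_bar, y_bar)
print("products of deviations:", products)
print("covariance s_xy =", s_xy)
print("s_x =", s_x, " s_y =", round(s_y, 2))
print("correlation r_xy =", round(r_xy, 2))
```

Rescaling y (for example, counting points in tens) would change the covariance but leave the correlation coefficient unchanged.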
Describe the relationship between points scored & interceptions
Scatter Diagram & Correlation Coefficient
[Figure: scatter diagram. Horizontal axis: Number of Interceptions (x), 0 to 3. Vertical axis: Number of Points Scored (y), 0 to 30. Correlation = .94]
Slide 61
Correlation Coefficient (cont.)
Examples
• Team ERA vs. number of games lost
  • correlation = 0.84
  • What does this mean?
• Turnovers vs. number of points scored
  • correlation = −0.67
  • What does this mean?
Slide 62
Exercise
College Athletic Departments
• Questions #6 to #9
Slide 63
Random Variables
A random variable is a numerical description of the outcome of “an experiment.”
A random variable can be classified as being either discrete or continuous depending on the numerical values it assumes.
• A discrete random variable may assume either a finite number of values or an infinite sequence of values.
• A continuous random variable may assume any numerical value in an interval or collection of intervals.
Slide 64
Discrete vs. Continuous Random Variables (cont.)
During the 2003-04 NFL season, the Tampa Bay Bucs had 2 passes intercepted in three games.
Game   Opponent       No. of Interceptions
#1     Philadelphia   1
#2     Carolina       1
#3     Atlanta        0
They Th...