Unformatted text preview: Basic Statistical Analysis Basic in Excel in
A quick look at Chapters 2-3 Statistics in Excel 1 Topics Covered
Computing simple statistics Computing
Mean, standard deviation Mean, Five-number summary Five number Exploratory graphics Exploratory
Histogram Histogram Box Plot Box Using the statistics Using
Intervals for the range of the data Intervals Outlier detection Outlier
Statistics in Excel 2 1 Example: bankruptcy cases
BankruptAndPop.xls BankruptAndPop.xls Number of cases filed by state and state population (in 1000s), for four years. n=51 (50 states plus DC) years. Statistics in Excel 3 Some of the data Statistics in Excel 4 2 Using Excel functions
1. 2. 3. 4. 5. Count Minimum Maximum Average Standard Deviation Usage: =function(DataRange)
Statistics in Excel 5 Bankruptcies in 2000
Count Minimum Maximum Average Std. Dev. =COUNT(H5:H56) =MIN(H5:H56) =MAX(H5:H56) =AVERAGE(H5:H56) =STDEV(H5:H56) Count Minimum Maximum Average Std. Dev. 51 1425 160564 24702.961 27229.841
6 Statistics in Excel 3 Other commonly-used summaries
Quartiles (25th, 50th and 75th percentiles) in Excel: =quartile(H5:H56,1) yields 6331.0 6331.0 Minimum and maximum Minimum Five number summary: the three quartiles, the min and the max quartiles, Statistics in Excel 7 Computing quartiles
1st Quartile 2nd Quartile (Median) 3rd Quartile Book Rules Excel Rules 5590.0 6331.0 16742.0 16742.0 36424.0 33923.0 Textbook rules on page 103. Textbook Why are the values different from those produced by the Excel function Quartile? Quartile Statistics in Excel 8 4 Easier ways to compute these
A statistical add-iin program will do this n statistical from menu-driven commands. from driven Windows PCs: Data Analysis Toolkit comes with Excel 2003, 2007, 2010. comes Even better: these computers should be able to run PhStat. able Apples: can‘t run Excel add-iins. A ns. Apples: template can work in any version. template Statistics in Excel 9 ExploreData.XLS
Computes common summary statistics, a Box Plot and Histogram Box Copy your column of data into column A of the “Data” worksheet. “Results” sheet will be updated. Statistics in Excel 10 5 The Data worksheet Statistics in Excel 11 Results sheet Statistics in Excel 12 6 Exploratory graphics Exploratory
The template includes two Excel charts. The A “schematic” diagram called a Box Plot. 2. A bar graph or histogram.
1. These help us understand how the data are These distributed. distributed.
Statistics in Excel 13 Results for 2000 Statistics in Excel 14 7 Using the average and SD
The standard deviation is the “typical The typical amount” that observations differ from their amount that mean mean We can use the two together to get a quick We description of where the actual data values are. Statistics in Excel 15 Chebychev’s rule (page 112)
For a data set with any shape: For any 1. At least 75% of the data values fall At within two standard deviations of the mean mean 2. At least 90% fall within three standard At deviations deviations Statistics in Excel Statistics 16 8 Bankruptcies in 2000
Average = 24703 Average 2-stdev interval 3-stdev interval StdDev = 27230 StdDev Statistics in Excel 17 The inequality is very conservative
A look at the data shows only one value outside the 3 standard deviation interval (California at 160564). (California This was 1/51 or about 2% of the data This The inequality says that no more than 10% (about 5 values) would be this far out. (about Statistics in Excel 18 9 The empirical rule (page 111)
For bell-shaped distributions: For bell 1. About 95% of the data values lie within About two standard deviations from the mean mean 2. About 99% lie within three standard About deviations deviations This might be a better description tool for This bankruptcy rates (following shortly) rates
Statistics in Excel 19 Outliers (not in textbook)
Outliers are data values that appear to be disconnected from the main body of data disconnected They may be extreme due to some important special cause or simply because of some measurement or recording error of Statistics in Excel 20 10 Detection of potential outliers
Find interquartile range = Q3 - Q1 Upper fence = Q3 + 1.5*IQR Lower fence = Q1 - 1.5*IQR Values beyond fences are potential outliers outliers Statistics in Excel 21 Outliers for Bankruptcies in 2000 IQR and Fences? Outliers? Statistics in Excel 22 11 This is part of the output Fences appear on Box Plot
Statistics in Excel 23 Box Plot of 5-number summary Statistics in Excel 24 12 Producing a Box Plot
Not a standard Excel chart type Not PhStat has one in Descriptive Statistics PhStat A workbook is in the Excel templates. workbook It actually is a “tricky” X-Y plot.
Statistics in Excel 25 Box Plot in PhStat
Bankruptcies in 2000 Bkrpt2000 1420 21420 41420 61420 81420 101420 121420 141420 161420 Statistics in Excel 26 13 Frequency distributions (2.4)
A frequency distribution and bar graph will frequency show us more detail. show 1. Define a set of “bins” or categories. Define 2. Get a count of how many observations Get fall in each category. fall A histogram is a column bar graph of this histogram distribution. distribution.
Statistics in Excel 27 Guidelines for categories
Each class grouping has the same width Each Determine the width of each interval by Determine
w = interval width = highest value − lowest value number of classes Intervals never overlap Intervals Round up the interval width to get desirable interval endpoints interval
Statistics in Excel 28 14 How many bins?
Page 26: k is between 5 and 15 Page This reflects Sturges’ rule: Sturges k = 1 + 3.3 log(n) 3.3 log(n For n=100, log(100) = 2, k = For For n= 51, k = For
Statistics in Excel 29 Our data
Sample size: ________ Number of bins: _______ Smallest: __________ Largest: __________ Interval width: ________ Statistics in Excel 30 15 The graph, after titles added
Bankruptcy Cases Filed by State, 2000
35 Number of States 30 25 20 15 10 5 0 25000 50000 75000 100000 125000 150000 175000 Num ber of cases Statistics in Excel 31 The shape of the data
Things we would like to know, in general:
1. 2. 3. Does the data clump together? If not, how does it spread out? Are there unusual cases high or low? Statistics in Excel 32 16 Bankruptcies are highly skewed
Skewed “to the right” because tail on right. Shape not surprising because the data are driven by population size. driven Hard to say what is the “typical” number of number bankruptcies because the answer needs to reflect population size. to We should try to fix that and create a new histogram. histogram.
Statistics in Excel 33 Ideal shape of a histogram
80 70 60 Frequency 50 40 30 20 10 0 -3 -2 -1 0 1 2 3 Normally Distributed Data Statistics in Excel 34 17 The ideal shape is the “bell” curve The
High degree of symmetry High No unusual cases high or low No Easy to summarize (mean = median) Easy We might get this from bankruptcy rates. Create a new column, dividing bankruptcies by population. by Statistics in Excel 35 ExploreData.xls for Rates Statistics in Excel 36 18 Scaling the Box Plot
It might be nice to have both graphs with the same X-axis. the To change the frequency distribution specify a new set of 7 bin limits. specify To change the Box Plot, right-click on any To click number for the X-Axis. number Format axis and specify a new scale. Statistics in Excel 37 19 ...
View Full Document
This note was uploaded on 02/12/2011 for the course QMB 3250 taught by Professor Thompson during the Spring '08 term at University of Florida.
- Spring '08