l5 - Box Plots: A graphical way to show the...


Box Plots: A graphical way to show the shape of a distribution

Example: weights of 57 children

[Figure: Boxplot of children's weights. Vertical axis: Weight (lbs), 0 to 80. Labelled features, from top to bottom: possible outlier, upper adjacent value, upper quartile, median, lower quartile, lower adjacent value.]

The QQ plot: Comparison of the shapes of the distributions of two samples, or of a sample and a theoretical distribution

• Comparing quantiles of two populations
• Result: if the QQ plot is linear, then what does it imply?

QQ-plot: chest size data

[Figure, two panels. (a) Histogram: Chest (in) from 35 to 50 on the horizontal axis, Density from 0.00 to 0.15 on the vertical axis. (b) QQ plot: Theoretical Quantiles from −4 to 4 on the horizontal axis, Sample Quantiles from 35 to 45 on the vertical axis.]

Review Keywords: Box-plots, QQ-plots.

Bivariate Data

• Suppose two variables (X, Y) are measured on a sample of n units to give data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
• The variables X and Y might be associated
• Example: $y_i$ is the response variate for unit i, while $x_i$ is the corresponding explanatory variate
• The problem: Is there an associative relationship between the two variates? If there is, is the relationship a causal one?

Graphical Solutions:

• Scatter plot
• More than two variables: scatter plot matrix

Example: (from the notes) Scatterplots

[Figure, two panels. Scatterplot of Votes/Seats: Votes % (40 to 55) on the horizontal axis, Seats % (30 to 80) on the vertical axis. Plot of Seats by Time: Year (1900 to 1960) on the horizontal axis, Seats % (30 to 80) on the vertical axis.]

Scatter Plot matrix

[Figure: Scatter-plot matrix of Year (1900 to 1960), Votes % (40 to 55) and Seats % (30 to 80).]

Algebraic Measures of Association:

• Categorical Variates: The ideas from STAT 230 (the idea of independence)
• Continuous Variates: The correlation coefficient

Example: Investigating the relationship between being a smoker and risk of heart attack

Risk (rows) by Smoker (columns)

Risk     Yes    No    Total
High      42    12       54
Low        7    39       46
Total     49    51      100

Dependence & Relative Risk

• Two events A and B are said to be independent if $P(A \mid B) = P(A \mid \text{not } B) = P(A)$
• We can measure dependence by calculating the ratio $P(A \mid B) / P(A \mid \text{not } B)$ and seeing how far it is from 1
• What is true for probabilities also holds for proportions. So use

  (Proportion of smokers who are high risk) / (Proportion of non-smokers who are high risk)

• The above measure is called Relative Risk

Relative Risk = (42/49) / (12/51) = 3.64 > 1. How high is too high?

Correlation

• A commonly used measure of the strength of a linear relationship between two continuous variables $(x_i, y_i)$, $i = 1, 2, \ldots, n$
• This is usually denoted by r and is defined as

  $r = \dfrac{S_{XY}}{\sqrt{S_{XX} S_{YY}}}$

  where

  $S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2$
  $S_{YY} = \sum_{i=1}^{n} (y_i - \bar{y})^2$
  $S_{XY} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

Properties of the correlation coefficient:

• $-1 \le r \le 1$
• When the points lie exactly on a straight line with positive slope, r = 1; with negative slope, r = −1
• When there is a reasonably strong positive (negative) relationship, r will be appreciably positive (negative)
• When there is no relationship, r will be near zero
• Is the converse true?

Time Series Data

• The investigation concerns a process
• We have data points of the form (t, y(t))
• t can be discrete (more likely) or continuous (weekly sales figures, number of visits to the doctor's office every month, etc.)
• Scatter plot
• $Y_t = T_t + S_t + \varepsilon_t$, where $T_t$ = trend term, $S_t$ = seasonal term, $\varepsilon_t$ = error term

Example: the Wall Street crash

[Figure, two panels. Dow Jones Index (1920−1941): Years on the horizontal axis, Dow Jones Index (0 to 400) on the vertical axis. Dow Jones Index (1929−1931): Years on the horizontal axis, Dow Jones Index (0 to 400) on the vertical axis.]

...
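The two algebraic measures of association in these slides, the relative risk for the smoking table and the correlation coefficient r, can be sketched in Python as below. This is only a minimal illustration: the function names `relative_risk` and `correlation` are made up for this example, and the small x/y data sets are invented (they do not come from the course notes). Only the counts 42, 49, 12 and 51 are taken from the smoking table.

```python
import math


def relative_risk(a_given_b, n_b, a_given_not_b, n_not_b):
    """Ratio of two conditional proportions: P(A|B) / P(A|not B)."""
    return (a_given_b / n_b) / (a_given_not_b / n_not_b)


def correlation(xs, ys):
    """Sample correlation r = S_XY / sqrt(S_XX * S_YY)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Counts from the smoking table: 42 of 49 smokers and 12 of 51
# non-smokers are high risk.
rr = relative_risk(42, 49, 12, 51)
print(f"relative risk = {rr:.2f}")

# Invented data (an assumption, not from the slides).
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [3, 5, 7, 9, 11])    # exact line, positive slope
r_down = correlation(xs, [10, 8, 6, 4, 2])  # exact line, negative slope

# A perfect but non-linear relationship, y = x**2 on symmetric x values.
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_up, r_down, r_curve)
```

The last data set bears on the question "Is the converse true?": the points follow y = x² exactly, yet S_XY is zero, so r is zero. A value of r near zero therefore rules out a linear relationship, not every relationship.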

This note was uploaded on 01/27/2011 for the course STAT 231 taught by Professor Cantremember during the Winter '08 term at Waterloo.
