Introduction to Statistical Thought
Michael Lavine
February 21, 2013

Copyright © 2005 by Michael Lavine

Contents

List of Figures
List of Tables
Preface

1 Probability
  1.1 Basic Probability
  1.2 Probability Densities
  1.3 Parametric Families of Distributions
    1.3.1 The Binomial Distribution
    1.3.2 The Poisson Distribution
    1.3.3 The Exponential Distribution
    1.3.4 The Normal Distribution
  1.4 Centers, Spreads, Means, and Moments
  1.5 Joint, Marginal and Conditional Probability
  1.6 Association, Dependence, Independence
  1.7 Simulation
    1.7.1 Calculating Probabilities
    1.7.2 Evaluating Statistical Procedures
  1.8 R
  1.9 Some Results for Large Samples
  1.10 Exercises

2 Modes of Inference
  2.1 Data
  2.2 Data Description
    2.2.1 Summary Statistics
    2.2.2 Displaying Distributions
    2.2.3 Exploring Relationships
  2.3 Likelihood
    2.3.1 The Likelihood Function
    2.3.2 Likelihoods from the Central Limit Theorem
    2.3.3 Likelihoods for several parameters
  2.4 Estimation
    2.4.1 The Maximum Likelihood Estimate
    2.4.2 Accuracy of Estimation
    2.4.3 The sampling distribution of an estimator
  2.5 Bayesian Inference
  2.6 Prediction
  2.7 Hypothesis Testing
  2.8 Exercises

3 Regression
  3.1 Introduction
  3.2 Normal Linear Models
    3.2.1 Introduction
    3.2.2 Inference for Linear Models
  3.3 Generalized Linear Models
    3.3.1 Logistic Regression
    3.3.2 Poisson Regression
  3.4 Predictions from Regression
  3.5 Exercises

4 More Probability
  4.1 More Probability Density
  4.2 Random Vectors
    4.2.1 Densities of Random Vectors
    4.2.2 Moments of Random Vectors
    4.2.3 Functions of Random Vectors
  4.3 Representing Distributions
  4.4 Exercises

5 Special Distributions
  5.1 Binomial and Negative Binomial
  5.2 Multinomial
  5.3 Poisson
  5.4 Uniform
  5.5 Gamma, Exponential, Chi Square
  5.6 Beta
  5.7 Normal
    5.7.1 The Univariate Normal Distribution
    5.7.2 The Multivariate Normal Distribution
    5.7.3 Marginal, Conditional, and Related Distributions
  5.8 The t Distribution
  5.9 Exercises

6 Bayesian Statistics
  6.1 Multidimensional Bayesian Analysis
  6.2 Metropolis, Metropolis-Hastings, and Gibbs
  6.3 Exercises

7 More Models
  7.1 Random Effects
  7.2 Time Series and Markov Chains
  7.3 Survival analysis
  7.4 Exercises

8 Mathematical Statistics
  8.1 Properties of Statistics
    8.1.1 Sufficiency
    8.1.2 Consistency, Bias, and Mean-squared Error
  8.2 Information
  8.3 Exponential families
  8.4 Asymptotics
    8.4.1 Modes of Convergence
    8.4.2 The δ-method
    8.4.3 The Asymptotic Behavior of Estimators
  8.5 Exercises

Bibliography

List of Figures

1.1 pdf for time on hold at Help Line
1.2 p_Y for the outcome of a spinner
1.3 (a): Ocean temperatures; (b): Important discoveries
1.4 Change of variables
1.5 Binomial probabilities
1.6 P[X = 3 | λ] as a function of λ
1.7 Exponential densities
1.8 Normal densities
1.9 Ocean temperatures at 45°N, 30°W, 1000m depth
1.10 Normal samples and Normal densities
1.11 hydrographic stations off the coast of Europe and Africa
1.12 Water temperatures
1.13 Water temperatures with standard deviations
1.14 Two pdf's with ±1 and ±2 SD's.
1.15 Permissible values of N and X
1.16 Features of the joint distribution of (X, Y)
1.17 Lengths and widths of sepals and petals of 150 iris plants
1.18 correlations
1.19 1000 simulations of θ̂ for n.sim = 50, 200, 1000
1.20 1000 simulations of θ under three procedures
1.21 Monthly concentrations of CO2 at Mauna Loa
1.22 1000 simulations of a FACE experiment
1.23 Histograms of craps simulations

2.1 quantiles
2.2 Histograms of tooth growth
2.3 Histograms of tooth growth
2.4 Histograms of tooth growth
2.5 calorie contents of beef hot dogs
2.6 Strip chart of tooth growth
2.7 Quiz scores from Statistics 103
2.8 QQ plots of water temperatures (°C) at 1000m depth
2.9 Mosaic plot of UCBAdmissions
2.10 Mosaic plot of UCBAdmissions
2.11 Old Faithful data.
2.12 Waiting time versus duration in the Old Faithful dataset
2.13 Time series of duration and waiting time at Old Faithful
2.14 Time series of duration and waiting time at Old Faithful
2.15 Temperature versus latitude for different values of longitude
2.16 Temperature versus longitude for different values of latitude
2.17 Spike train from a neuron during a taste experiment. The dots show the times at which the neuron fired. The solid lines show times at which the rat received a drop of a .3 M solution of NaCl.
2.18 Likelihood function for the proportion of red cars
2.19 ℓ(θ) after Σ y_i = 40 in 60 quadrats.
2.20 Likelihood for Slater School
2.21 Marginal and exact likelihoods for Slater School
2.22 Marginal likelihood for mean CEO salary
2.23 FACE Experiment: data and likelihood
2.24 Likelihood function for Quiz Scores
2.25 Log of the likelihood function for (λ, θ_f) in Example 2.13
2.26 Likelihood function for the probability of winning craps
2.27 Sampling distribution of the sample mean and median
2.28 Histograms of the sample mean for samples from Bin(n, .1)
2.29 Prior, likelihood and posterior in the seedlings example
2.30 Prior, likelihood and posterior densities for λ with n = 1, 4, 16
2.31 Prior, likelihood and posterior densities for λ with n = 60
2.32 Prior, likelihood and posterior density for Slater School
2.33 Plug-in predictive distribution for seedlings
2.34 Predictive distributions for seedlings after n = 0, 1, 60
2.35 pdf of the Bin(100, .5) distribution
2.36 pdfs of the Bin(100, .5) (dots) and N(50, 5) (line) distributions
2.37 Approximate density of summary statistic w
2.38 Number of times baboon father helps own child
2.39 Histogram of simulated values of w.tot

3.1 Four regression examples
3.2 1970 draft lottery. Draft number vs. day of year
3.3 Draft number vs. day of year with smoothers
3.4 Total number of New seedlings 1993 – 1997, by quadrat.
3.5 Calorie content of hot dogs
3.6 Density estimates of calorie contents of hot dogs
3.7 The PlantGrowth data
3.8 Ice cream consumption versus mean temperature
3.9 Likelihood functions for (µ, δ_M, δ_P) in the Hot Dog example.
3.10 pairs plot of the mtcars data
3.11 mtcars: various plots
3.12 likelihood functions for β_1, γ_1, δ_1 and δ_2 in the mtcars example.
3.13 Pine cones and O-rings
3.14 Pine cones and O-rings with regression curves
3.15 Likelihood function for the pine cone data
3.16 Actual vs. fitted and residuals vs. fitted for the seedling data
3.17 Diagnostic plots for the seedling data
3.18 Actual mpg and fitted values from three models
3.19 Happiness Quotient of bankers and poets

4.1 The (X_1, X_2) plane and the (Y_1, Y_2) plane
4.2 pmf's, pdf's, and cdf's

5.1 The Binomial pmf
5.2 The Negative Binomial pmf
5.3 Poisson pmf for λ = 1, 4, 16, 64
5.4 Rutherford and Geiger's Figure 1
5.5 Numbers of firings of a neuron in 150 msec after five different tastants. Tastants: 1=MSG .1M; 2=MSG .3M; 3=NaCl .1M; 4=NaCl .3M; 5=water. Panels: A: A stripchart. Each circle represents one delivery of a tastant. B: A mosaic plot. C: Each line represents one tastant. D: Likelihood functions. Each line represents one tastant.
5.6 The line shows Poisson probabilities for λ = 0.2; the circles show the fraction of times the neuron responded with 0, 1, ..., 5 spikes for each of the five tastants.
5.7 Gamma densities
5.8 Exponential densities
5.9 Beta densities
5.10 Water temperatures (°C) at 1000m depth
5.11 Bivariate Normal density
5.12 Bivariate Normal density
5.13 t densities for four degrees of freedom and the N(0, 1) density

6.1 Posterior densities of β_0 and β_1 in the ice cream example using the prior from Equation 6.4.
6.2 Numbers of pine cones in 1998 as a function of dbh
6.3 Numbers of pine cones in 1999 as a function of dbh
6.4 Numbers of pine cones in 2000 as a function of dbh
6.5 10,000 MCMC samples of the Be(5, 2) density. Top panel: histogram of samples from the Metropolis-Hastings algorithm and the Be(5, 2) density. Middle panel: θ_i plotted against i. Bottom panel: p(θ_i) plotted against i.
6.6 10,000 MCMC samples of the Be(5, 2) density. Left column: (θ* | θ) = U(θ − 100, θ + 100); Right column: (θ* | θ) = U(θ − .00001, θ + .00001). Top: histogram of samples from the Metropolis-Hastings algorithm and the Be(5, 2) density. Middle: θ_i plotted against i. Bottom: p(θ_i) plotted against i.
6.7 Trace plots of MCMC output from the pine cone code on page 365.
6.8 Trace plots of MCMC output from the pine cone code with a smaller proposal radius.
6.9 Trace plots of MCMC output from the pine cone code with a smaller proposal radius and 100,000 iterations. The plots show every 10'th iteration.
6.10 Trace plots of MCMC output from the pine cone code with proposal function g.one and 100,000 iterations. The plots show every 10'th iteration.
6.11 Pairs plots of MCMC output from the pine cones example.
6.12 Trace plots of MCMC output from the pine cone code with proposal function g.group and 100,000 iterations. The plots show every 10'th iteration.
6.13 Pairs plots of MCMC output from the pine cones example with proposal g.group.
6.14 Posterior density of β_2 and γ_2 from Example 6.3.

7.1 Plots of the Orthodont data: distance as a function of age, grouped by Subject, separated by Sex.
7.2 Plots of the Orthodont data: distance as a function of age, separated by Subject.
7.3 Percent body fat of major (blue) and minor (purple) Pheidole morrisi ants at three sites in two seasons.
7.4 Residuals from Model 7.4. Each point represents one colony. There is an upward trend, indicating the possible presence of colony effects.
7.5 Some time series
7.6 Y_{t+1} vs. Y_t for the Beaver and Presidents data sets
7.7 Y_{t+k} vs. Y_t for the Beaver data set and lags 0–5
7.8 coplot of Y_{t+1} ∼ Y_{t−1} | Y_t for the Beaver data set
7.9 Fit of CO2 data
7.10 DAX closing prices
7.11 DAX returns
7.12 Survival curve for bladder cancer. Solid line for placebo; dashed line for thiotepa.
7.13 Cumulative hazard and log(hazard) curves for bladder cancer. Solid line for thiotepa; dashed line for placebo.

8.1 Mean Squared Error for estimating Binomial θ. Sample size = 5, 20, 100, 1000. α = β = 0: solid line. α = β = 0.5: dashed line. α = β = 1: dotted line. α = β = 4: dash–dotted line.
8.2 The Be(.39, .01) density
8.3 Densities of Ȳ_in
8.4 Densities of Z_in
8.5 The δ-method
8.6 Top panel: asymptotic standard deviations of δ_n and δ′_n for Pr[X ≤ a]. The solid line shows the actual relationship. The dotted line is the line of equality. Bottom panel: the ratio of asymptotic standard deviations.

...