Regression Models for Count Data in R

Achim Zeileis (Universität Innsbruck), Christian Kleiber (Universität Basel), Simon Jackman (Stanford University)

Abstract

The classical Poisson, geometric and negative binomial regression models for count data belong to the family of generalized linear models and are available at the core of the statistics toolbox in the R system for statistical computing. After reviewing the conceptual and computational features of these methods, a new implementation of hurdle and zero-inflated regression models in the functions hurdle() and zeroinfl() from the package pscl is introduced. It re-uses the design and functionality of the basic R functions, just as the underlying conceptual tools extend the classical models. Both hurdle and zero-inflated models are able to incorporate over-dispersion and excess zeros, two problems that typically occur in count data sets in economics and the social sciences, better than their classical counterparts. Using cross-section data on the demand for medical care, it is illustrated how the classical as well as the zero-augmented models can be fitted, inspected and tested in practice.

Keywords: GLM, Poisson model, negative binomial model, hurdle model, zero-inflated model.

1. Introduction

Modeling count variables is a common task in economics and the social sciences. The classical Poisson regression model for count data is often of limited use in these disciplines because empirical count data sets typically exhibit over-dispersion and/or an excess number of zeros. The former issue can be addressed by extending the plain Poisson regression model in various directions, e.g., by using sandwich covariances or by estimating an additional dispersion parameter (in a so-called quasi-Poisson model). Another, more formal way is to use a negative binomial (NB) regression.
All of these models belong to the family of generalized linear models (GLMs, see Nelder and Wedderburn 1972; McCullagh and Nelder 1989). However, although these models can typically capture over-dispersion rather well, in many applications they are not sufficient for modeling excess zeros. Since Mullahy (1986) and Lambert (1992), there has been increased interest, both in the econometrics and the statistics literature, in zero-augmented models that address this issue with a second model component capturing zero counts. Hurdle models (Mullahy 1986) combine a left-truncated count component with a right-censored hurdle component. Zero-inflation models (Lambert 1992) take a somewhat different approach: they are mixture models that combine a count component and a point mass at zero. An overview of count data models in econometrics, including hurdle and zero-inflated models, is provided in Cameron and Trivedi (1998, 2005).
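
The difference between the two zero-augmented approaches can be sketched through their probability functions. The notation below is an assumption introduced for illustration and does not appear in the text above: $f_{\mathrm{count}}$ is the density of the count component (e.g., Poisson or NB) with regressors $x$ and coefficients $\beta$, and $f_{\mathrm{zero}}$ is the density of the zero component with regressors $z$ and coefficients $\gamma$. In the zero-inflated case, $\pi$ stands for the probability of the point mass at zero.

```latex
% Hurdle model: zeros come only from the hurdle component,
% positive counts only from the count component truncated at zero.
f_{\mathrm{hurdle}}(y; x, z, \beta, \gamma) =
\begin{cases}
  f_{\mathrm{zero}}(0; z, \gamma) & \text{if } y = 0, \\[4pt]
  \bigl(1 - f_{\mathrm{zero}}(0; z, \gamma)\bigr)
  \dfrac{f_{\mathrm{count}}(y; x, \beta)}{1 - f_{\mathrm{count}}(0; x, \beta)}
  & \text{if } y > 0.
\end{cases}

% Zero-inflated model: a mixture of a point mass at zero (weight \pi)
% and the untruncated count component (weight 1 - \pi).
f_{\mathrm{zeroinfl}}(y; x, z, \beta, \gamma) =
  \pi \cdot I_{\{0\}}(y) + (1 - \pi) \cdot f_{\mathrm{count}}(y; x, \beta),
\qquad \pi = f_{\mathrm{zero}}(0; z, \gamma).
```

Under this sketch, a hurdle model has a single source of zeros, whereas a zero-inflated model has two: the point mass and the count component itself, since $f_{\mathrm{count}}(0; x, \beta) > 0$.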