12 Latent variable methods for ordinal data

Many datasets include variables whose distributions cannot be represented by the normal, binomial or Poisson distributions we have studied thus far. For example, the distributions of common survey variables such as age, education level and income generally cannot be accurately described by any of the above-mentioned sampling models. Additionally, such variables are often binned into ordered categories, the number of which may vary from survey to survey. In such situations, interest often lies not in the scale of each individual variable, but rather in the associations between the variables: Is the relationship between two variables positive, negative or zero? What happens if we “account” for a third variable? For normally distributed data these types of questions can be addressed with the multivariate normal and linear regression models of Chapters 7 and 9. In this chapter we extend these models to situations where the data are not normal, by expressing nonnormal random variables as functions of unobserved, “latent,” normally distributed random variables. Multivariate normal and linear regression models can then be applied to the latent data.

12.1 Ordered probit regression and the rank likelihood

Suppose we are interested in describing the relationship between the educational attainment and the number of children of individuals in a population. Additionally, we might suspect that an individual’s educational attainment may be influenced by their parents’ education level. The 1994 General Social Survey provides data on variables DEG, CHILD and PDEG for a sample of individuals in the United States, where $\mathrm{DEG}_i$ indicates the highest degree obtained by individual $i$, $\mathrm{CHILD}_i$ is their number of children and $\mathrm{PDEG}_i$ is a binary indicator of whether or not either parent of $i$ obtained a college degree.
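The chapter’s central idea, an observed ordinal variable expressed as a function of an unobserved normal one, can be sketched as follows. This is a minimal NumPy sketch assuming an ordered-probit-style construction: a latent $Z_i = \beta x_i + \epsilon_i$ with standard normal errors is cut at increasing thresholds $g_1 < \cdots < g_4$ to produce one of five ordered categories. The function name `latent_to_ordinal`, the coefficient `beta` and the threshold values are hypothetical illustration choices, not quantities from the text or the GSS data.

```python
import numpy as np

def latent_to_ordinal(z, cutpoints):
    """Map latent values z to ordered categories 1, ..., K.

    With increasing cutpoints g_1 < ... < g_{K-1}, category k is assigned
    when g_{k-1} < z <= g_k, taking g_0 = -inf and g_K = +inf.
    """
    # side="left" counts the cutpoints strictly below each z;
    # adding 1 makes the categories start at 1.
    return np.searchsorted(cutpoints, z, side="left") + 1

rng = np.random.default_rng(0)
n = 1000
x = rng.integers(0, 2, size=n)               # hypothetical binary covariate
beta = 0.8                                   # hypothetical effect on the latent scale
z = beta * x + rng.standard_normal(n)        # latent Z_i = beta * x_i + eps_i
cutpoints = np.array([-1.0, 0.0, 0.5, 1.5])  # hypothetical thresholds g_1 < ... < g_4
y = latent_to_ordinal(z, cutpoints)          # observed ordinal outcome in {1, ..., 5}

# Category counts within each covariate group; the x = 1 group is shifted
# toward higher categories because beta > 0.
for group in (0, 1):
    counts = np.bincount(y[x == group], minlength=6)[1:]
    print(group, counts)
```

Note that `y` records only which interval `z` falls in: shifting or rescaling `z` together with the thresholds leaves `y` unchanged, so the ordinal data carry information about order on the latent scale rather than about its numeric units.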
Using these data, we might be tempted to investigate the relationship between the variables with a linear regression model:

$$\mathrm{DEG}_i = \beta_1 + \beta_2 \times \mathrm{CHILD}_i + \beta_3 \times \mathrm{PDEG}_i + \beta_4 \times \mathrm{CHILD}_i \times \mathrm{PDEG}_i + \epsilon_i,$$

where we assume that $\epsilon_1, \ldots, \epsilon_n \sim$ i.i.d. normal$(0, \sigma^2)$. However, such a model would be inappropriate for a couple of reasons. Empirical distributions of DEG and CHILD for a sample of 1,002 males in the 1994 workforce are shown in Figure 12.1. The value of DEG is recorded as taking a value in $\{1, 2, 3, 4, 5\}$, corresponding to the highest degree of the respondent being no degree, high school degree, associate’s degree, bachelor’s degree, or graduate degree....

P.D. Hoff, A First Course in Bayesian Statistical Methods, Springer Texts in Statistics, DOI 10.1007/978-0-387-92407-6_12, © Springer Science+Business Media, LLC 2009
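The structure of the regression above, and one feature of it that is at odds with the data, can be sketched with ordinary least squares in NumPy. The data here are synthetic, generated from hypothetical values purely for illustration; they are not the GSS sample, the helper `design_matrix` is a hypothetical name, and the least-squares fit is used only to show the model’s structure, not as the book’s estimation method. The design matrix has one column per coefficient $\beta_1, \ldots, \beta_4$, and the fitted values of DEG form a continuum rather than falling on the recorded categories $\{1, 2, 3, 4, 5\}$.

```python
import numpy as np

def design_matrix(child, pdeg):
    """Columns: intercept, CHILD, PDEG, CHILD x PDEG (coefficients beta_1..beta_4)."""
    child = np.asarray(child, dtype=float)
    pdeg = np.asarray(pdeg, dtype=float)
    return np.column_stack([np.ones_like(child), child, pdeg, child * pdeg])

# Synthetic stand-in data (NOT the GSS sample); all generating values are hypothetical.
rng = np.random.default_rng(1)
n = 500
child = rng.poisson(2.0, size=n)                  # counts of children
pdeg = rng.integers(0, 2, size=n)                 # binary parental-degree indicator
latent = 0.5 * pdeg - 0.1 * child + rng.standard_normal(n)
deg = np.searchsorted([-1.0, 0.0, 0.5, 1.5], latent) + 1  # ordinal DEG in {1, ..., 5}

# Ordinary least-squares fit of the linear model with an interaction term.
X = design_matrix(child, pdeg)
beta_hat, *_ = np.linalg.lstsq(X, deg, rcond=None)
fitted = X @ beta_hat

# Fitted values are real numbers, not members of {1, 2, 3, 4, 5}.
print(beta_hat)
print(fitted.min(), fitted.max())
```

The normal error term likewise places positive probability on every real value of DEG, whereas the observed variable can take only five ordered values.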