Optimal scaling methods for multivariate categorical data analysis

By Jacqueline J. Meulman, Ph.D., Data Theory Group, Faculty of Social and Behavioral Sciences, Leiden University

White paper

Over more than 60 years, classical statistical methods have been adapted in various ways to suit the particular characteristics of social and behavioral science research, with important areas of application such as market research. Research in these branches often results in data that are non-numerical, with measurements recorded on scales having an uncertain unit of measurement. Such data typically consist of qualitative or categorical variables that describe the units (objects, subjects) in a limited number of categories. The zero point of these scales is uncertain, the relationships among the different categories are often unknown, and although it can frequently be assumed that the categories are ordered, their mutual distances might still be unknown. The uncertainty in the unit of measurement is not just a matter of measurement error, because its variability may have a systematic component.

An important development in multidimensional data analysis has been the optimal assignment of quantitative values to such qualitative scales. This form of optimal quantification (scaling, scoring) is a general approach to treating multivariate (categorical) data. For example, in the simple linear regression model we wish to predict a response variable $z$ from $m$ predictor variables in $X$. This objective is achieved by finding a particular linear combination $Xb$ that correlates maximally with $z$. Incorporating optimal scaling amounts to the minimization of $\|X^* b - z^*\|^2$ over regression weights $b$ and nonlinear functions $z^* = \theta(z)$ and $X_j^* = \varphi_j(X_j)$, $j = 1, \ldots, m$.
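As a rough illustration of the regression objective described above, the following Python sketch treats the transformations as already fixed, so that only the weights $b$ remain to be estimated. It is not the paper's algorithm: the simulated data, the helper names `standardize` and `squared_loss`, and the use of NumPy's least-squares solver are all assumptions made for this example. For centered variables, the least-squares weights should give the linear combination that correlates most strongly with the response.

```python
import numpy as np


def standardize(a):
    """Center each column (or a 1-D vector) and scale it to unit variance."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0)


def squared_loss(X, b, z):
    """Least-squares loss ||Xb - z||^2 for fixed, already transformed variables."""
    residual = X @ b - z
    return float(residual @ residual)


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, m = 200, 3
X = standardize(rng.normal(size=(n, m)))
z = standardize(X @ np.array([0.5, -0.3, 0.8]) + rng.normal(size=n))

# Weights that minimize ||Xb - z||^2.
b_ols, *_ = np.linalg.lstsq(X, z, rcond=None)
r_ols = np.corrcoef(X @ b_ols, z)[0, 1]

# Best absolute correlation reached by 1000 random weight vectors.
r_other = max(
    abs(np.corrcoef(X @ rng.normal(size=m), z)[0, 1]) for _ in range(1000)
)

print(f"correlation with least-squares weights: {r_ols:.4f}")
print(f"best correlation among random weights:  {r_other:.4f}")
```

In an optimal scaling setting the columns of `X` and the vector `z` would themselves be replaced by transformed versions, and the transformations would be updated alongside `b`; that step is deliberately left out of this sketch.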
Thus, optimal scaling maximizes the correlation between $\theta(z)$ and $\sum_{j=1}^{m} b_j \varphi_j(X_j)$ over feasible nonlinear functions. These functions are called transformations for quantitative variables, and scalings, scorings or quantifications for categorical variables.

Categorical variables are dealt with in this framework in the following way. A categorical variable $h_j$ defines a binary indicator matrix $G_j$ with $n$ rows and $l_j$ columns, where $l_j$ denotes the number of categories. Elements $h_{ij}$ then define elements $g_{ir(j)}$ as follows: $h_{ij} = r \rightarrow g_{ir(j)} = 1$; $h_{ij} \neq r \rightarrow g_{ir(j)} = 0$, where $r = 1, \ldots, l_j$ is the running index indicating a category number in variable $j$. If category quantifications are denoted by $y_j$, then a transformed variable can be written as $G_j y_j$ and, for instance, a weighted sum of predictor variables as $\sum_{j=1}^{m} b_j G_j y_j = X^* b$, which is as in the standard linear model....
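The indicator coding described above can be sketched in a few lines of Python. The function name `indicator_matrix`, the example data, the quantifications `y1` and `y2`, and the weights `b` are hypothetical values chosen for illustration; in practice the quantifications would be estimated, not supplied by hand.

```python
import numpy as np


def indicator_matrix(h, n_categories):
    """Build the n x l binary matrix G with G[i, r-1] = 1 iff h[i] == r.

    Categories are assumed to be coded as integers 1..n_categories.
    """
    h = np.asarray(h)
    G = np.zeros((h.size, n_categories), dtype=int)
    G[np.arange(h.size), h - 1] = 1
    return G


# Two categorical variables observed on n = 5 objects (made-up data).
h1 = np.array([1, 3, 2, 3, 1])  # l_1 = 3 categories
h2 = np.array([2, 1, 1, 2, 2])  # l_2 = 2 categories
G1 = indicator_matrix(h1, 3)
G2 = indicator_matrix(h2, 2)

# Hypothetical category quantifications y_j and regression weights b.
y1 = np.array([-1.0, 0.2, 1.5])
y2 = np.array([0.5, -0.5])
b = np.array([0.7, 0.3])

# Transformed variables G_j y_j: each object receives its category's value.
x1_star = G1 @ y1
x2_star = G2 @ y2

# Weighted sum of predictors, sum_j b_j G_j y_j = X* b.
X_star = np.column_stack([x1_star, x2_star])
weighted_sum = X_star @ b

print(G1)
print(x1_star)
print(weighted_sum)
```

Each row of `G1` contains exactly one 1, so multiplying by `y1` simply looks up the quantification of the category that the object falls in.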
This note was uploaded on 07/21/2011 for the course BUS 10001 taught by Professor All during the Spring '11 term at Shaheed Zulfiqar Ali Bhutto Institute of Science and Technology.