3.9 Categorical explanatory variables

In this section, we indicate how to code categorical explanatory variables into binary dummy variables to use for multiple regression. A categorical variable with $m$ categories ($m \ge 2$) is converted into $m-1$ binary variables, where one category is chosen as a baseline category and the $m-1$ binary variables represent differences of the other categories relative to the baseline. The selection of the baseline category is not unique.

To show the main ideas, first consider a multiple regression with one continuous and one categorical explanatory variable. If the categorical variable has two categories (e.g., female and male), define

$$
z_i =
\begin{cases}
1 & \text{if category 2 for the $i$th case,} \\
0 & \text{if category 1 for the $i$th case.}
\end{cases}
\tag{3.109}
$$

In this case, category 1 is considered as the baseline category. The data are converted to $(y_i, x_i, z_i)$, $i = 1, \ldots, n$. The regression equation becomes $Y_i = \mu_Y(x_i, z_i) + \epsilon_i$, where

$$
\mu_Y(x_i, z_i) = \beta_0 + \beta_1 x_i + \beta_2 z_i =
\begin{cases}
\beta_0 + \beta_1 x_i & \text{if category 1,} \\
\beta_0 + \beta_1 x_i + \beta_2 = (\beta_0 + \beta_2) + \beta_1 x_i & \text{if category 2.}
\end{cases}
\tag{3.110}
$$

This implies a model where the relation of $y$ with $x$ is linear for both categories and there is a common slope. So on a scatterplot, the data for the two categories should lie roughly on parallel lines. $\beta_2$ is interpreted as the vertical separation distance of the two lines.

If the scatterplot shows linear relationships with different slopes for the two categories, then for multiple regression, use the converted data $(y_i, x_i, z_i, x_i z_i)$, $i = 1, \ldots, n$. The regression equation becomes $Y_i = \mu^*_Y(x_i, z_i) + \epsilon_i$, where

\begin{align}
\mu^*_Y(x_i, z_i) &= \beta_0 + \beta_1 x_i + \beta_2 z_i + \beta_3 x_i z_i \tag{3.111} \\
&=
\begin{cases}
\beta_0 + \beta_1 x_i & \text{if category 1,} \\
\beta_0 + \beta_1 x_i + \beta_2 + \beta_3 x_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_i & \text{if category 2.}
\end{cases}
\tag{3.112}
\end{align}

Hence $\beta_3$ is interpreted as the difference in slope for category 2 versus category 1. Is there a simple interpretation for $\beta_2$ in this case?

The product $x_i z_i$ is an example of what is called an interaction term in multiple regression. Interaction terms involving products of other explanatory variables indicate that the two variables do not influence the mean value of the response in an additive manner.
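The interaction model in (3.111)-(3.112) can be checked numerically. The following is a minimal sketch, assuming NumPy is available and using synthetic data; the sample size, seed, and "true" coefficient values are hypothetical choices made only for illustration. It fits the model with columns $1, x_i, z_i, x_i z_i$ by least squares and compares the estimates with straight lines fitted separately within each category.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic data (illustrative only): continuous x and binary dummy z
n = 200
x = rng.uniform(0.0, 10.0, size=n)
z = rng.integers(0, 2, size=n)  # 0 = category 1 (baseline), 1 = category 2
beta_true = np.array([1.0, 2.0, 3.0, -0.5])  # hypothetical (beta0, beta1, beta2, beta3)

# Design matrix with columns 1, x, z, x*z, matching equation (3.111)
X = np.column_stack([np.ones(n), x, z, x * z])
y = X @ beta_true + rng.normal(0.0, 1.0, size=n)

# Least-squares estimates of (beta0, beta1, beta2, beta3)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Separate straight-line fits within each category (polyfit returns slope first)
slope1, intercept1 = np.polyfit(x[z == 0], y[z == 0], deg=1)
slope2, intercept2 = np.polyfit(x[z == 1], y[z == 1], deg=1)

print("beta_hat:", beta_hat)
print("intercept difference (cat 2 - cat 1):", intercept2 - intercept1)
print("slope difference (cat 2 - cat 1):", slope2 - slope1)
```

Under these assumptions, the estimates of $\beta_0$ and $\beta_1$ should agree with the intercept and slope of the category 1 line, and the estimates of $\beta_2$ and $\beta_3$ with the differences in intercept and slope between the two fitted lines, up to floating-point error.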
With two predictors, of which one is continuous and the other is binary, the regression lines can be shown in a plot. See, for example, Figure 3.6.

For a categorical variable with $m$ categories, create $m-1$ binary dummy variables $z_{i2}, \ldots, z_{im}$, where

$$
z_{ij} =
\begin{cases}
1 & \text{if category $j$ for the $i$th case,} \\
0 & \text{otherwise,}
\end{cases}
\qquad \text{for } j = 2, \ldots, m.
\tag{3.113}
$$

Here, category 1 is considered as the baseline category. If the $i$th observation is in category 1, then $(z_{i2}, \ldots, z_{im}) = (0, \ldots, 0)$. If the $i$th observation is in category 2, then $(z_{i2}, \ldots, z_{im}) = (1, 0, \ldots, 0)$. If the $i$th observation is in category 3, then $(z_{i2}, \ldots, z_{im}) = (0, 1, 0, \ldots, 0)$, etc. If the $i$th observation is in category $m$, then $(z_{i2}, \ldots, z_{im}) = (0, \ldots, 0, 1)$. The regression equation becomes $Y_i = \mu_Y(x_i, z_{i2}, \ldots, z_{im}) + \epsilon_i$, where

\begin{align}
\mu_Y(x_i, z_{i2}, \ldots, z_{im}) &= \beta_0 + \beta_1 x_i + \beta_2 z_{i2} + \cdots + \beta_m z_{im} \tag{3.114} \\
&=
\begin{cases}
\beta_0 + \beta_1 x_i & \text{if category 1,} \\
(\beta_0 + \beta_2) + \beta_1 x_i & \text{if category 2,} \\
(\beta_0 + \beta_3) + \beta_1 x_i & \text{if category 3,} \\
\quad \vdots & \text{etc.}
\end{cases}
\notag
\end{align}
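The coding rule (3.113) can be sketched in a few lines of Python. The function name `dummy_code` and its arguments are hypothetical, introduced only for illustration; the first entry of `levels` is treated as the baseline category, so each case is mapped to the vector $(z_{i2}, \ldots, z_{im})$.

```python
def dummy_code(categories, levels):
    """Return one list (z_i2, ..., z_im) per case, following (3.113).

    categories: the observed category of each case.
    levels: all m category labels; levels[0] is taken as the baseline.
    """
    known = set(levels)
    non_baseline = levels[1:]  # categories 2, ..., m
    rows = []
    for c in categories:
        if c not in known:
            raise ValueError(f"unknown category: {c!r}")
        # z_ij = 1 if the case is in category j, 0 otherwise
        rows.append([1 if c == level else 0 for level in non_baseline])
    return rows


# Four cases with m = 3 categories; category 1 is the baseline
print(dummy_code([1, 2, 3, 1], levels=[1, 2, 3]))
# prints [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Reordering `levels` so that a different label comes first changes the baseline, which reflects the remark that the choice of baseline category is not unique.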