orie474sec6 - Fall 2006 ORIE474: Section 6 notes Nikolai...

Fall 2006 ORIE474: Section 6 notes
Nikolai Blizniouk

The goal of these notes is to provide some guidance for the use of the Regression node in SAS. The setup assumes that you have drawn a diagram similar to that from Section 5.

Doing regression with categorical variables using SAS

Suppose we have a population divided into $J$ disjoint strata/categories and we make a measurement on a subject that belongs to one of these strata. Consider a simple regression model of the form

$$Y_i = \beta_0 + \sum_{j=1}^{J} \beta_{1,j} \cdot I(\text{$i$th subject belongs to $j$th category}) + \epsilon_i,$$

where $j = 1, \ldots, J$ and $\epsilon_i$ is a zero-mean noise term. The coefficient $\beta_0$ is treated as the overall population mean (aka "grand mean"), and the $\beta_{1,j}$'s measure the amount by which the mean of the $j$th stratum deviates from the grand mean. Given the data $Y_1, \ldots, Y_n$, the goal is to estimate the parameters $(\beta_0, \beta_{1,1}, \ldots, \beta_{1,J})$ of the model. It is known which subpopulation each $Y_i$ comes from.

Notice that the above model does not determine the parameters uniquely. In statistics, the formal statement is "the parameterization is not identifiable". This means that no matter how much data we have, it will not be possible to say for certain which parameter values were actually used to generate the data. Why? Because, for a subject $i$ in the $j$th category,

$$Y_i = \beta_0 + \beta_{1,j} + \epsilon_i = (\beta_0 - \alpha) + (\beta_{1,j} + \alpha) + \epsilon_i$$

for every value of $\alpha$, so even if you knew the expectation (i.e., true mean) $\mu_i$ of each of the $Y_i$'s, you would still be unable to determine the $\beta$'s uniquely.

For example, suppose we have two subpopulations of ORIE graduates, one consisting of individuals whose highest degree is a Bachelor's and the other of those with a Master's. Let $Y_i$ denote the random variable for the current salary of the $i$th individual. Then the $\beta_{1,j}$'s will capture deviations of the subpopulation means ($\mu_j = \beta_0 + \beta_{1,j}$) from some "base level" mean $\beta_0$.
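The non-identifiability argument can be checked numerically. What follows is a minimal Python sketch, not part of the original notes (which use SAS). The parameter values, the shift `alpha`, the category labels, and the helper names `stratum_means` and `design_row` are all hypothetical choices made for illustration. It shows that two different parameter vectors give the same stratum means, and that the intercept column of the design matrix is the sum of the indicator columns.

```python
# Hypothetical illustration with J = 2 strata (e.g. Bachelor's vs. Master's).
# All numbers below are made up; none of them come from the notes.

def stratum_means(beta0, beta1):
    """Return mu_j = beta0 + beta1[j] for each stratum j."""
    return [beta0 + b for b in beta1]

def design_row(category, num_categories):
    """Intercept entry followed by the J indicator variables for one subject."""
    return [1] + [1 if j == category else 0 for j in range(num_categories)]

# One parameterization, and a second one shifted by an arbitrary alpha.
beta0, beta1 = 60.0, [-5.0, 5.0]
alpha = 12.5
shifted_beta0 = beta0 - alpha
shifted_beta1 = [b + alpha for b in beta1]

means = stratum_means(beta0, beta1)
shifted_means = stratum_means(shifted_beta0, shifted_beta1)

# Both parameter vectors imply identical stratum means,
# so no amount of data can distinguish between them.
same_means = means == shifted_means

# The same fact seen in the design matrix: in every row the intercept
# entry equals the sum of the indicator entries, so the columns are
# linearly dependent.
categories = [0, 1, 1, 0, 1]  # assumed stratum label of each subject
X = [design_row(c, 2) for c in categories]
intercept_is_sum = all(row[0] == sum(row[1:]) for row in X)

print(means, shifted_means, same_means, intercept_is_sum)
```

The values were chosen to be exactly representable in floating point so that the equality comparison is safe; with arbitrary values a tolerance-based comparison such as `math.isclose` would be the more careful choice.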
Obviously, unless $\beta_0$ is fixed, one would be unable to determine the $\beta_{1,j}$'s even if the $Y_i$'s did not include the error term $\epsilon_i$. It is thus desirable to avoid this kind of situation by putting constraints on pa-

This note was uploaded on 02/06/2011 for the course ORIE 474 taught by Professor Apanasovich during the Spring '07 term at Cornell.
