This gives the joint density of all the unknown parameters conditioned on the observed data. Our Bayesian estimators of the parameters will be the posterior means for these (n+1)K + K(K+1)/2 parameters. In principle, this requires integration of (16-23) with respect
when the sample size is small or moderate. The least absolute deviations (LAD) estimator has been
suggested as an alternative that remedies (at least to some degree) the problem. The LAD estimator is
the solution to the optimization problem,
Min_b0  Σ_{i=1}^{n} | y_i − x_i′b0 |.
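LAD estimates are usually computed by linear programming rather than calculus, since the objective is not differentiable. As a rough illustration only, the following sketch approximates the LAD solution by iteratively reweighted least squares; the data, tolerances, and iteration cap are hypothetical choices, not a method from the text.

```python
import numpy as np

def lad(X, y, iters=500, eps=1e-8):
    """Approximate min_b sum_i |y_i - x_i'b| by iteratively
    reweighted least squares with weights 1/|residual_i|."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    for _ in range(iters):
        r = y - X @ b
        w = 1.0 / np.maximum(np.abs(r), eps)   # guard against zero residuals
        XW = X * w[:, None]                     # rows of X scaled by weights
        b_new = np.linalg.solve(X.T @ XW, XW.T @ y)
        if np.max(np.abs(b_new - b)) < 1e-10:
            return b_new
        b = b_new
    return b

# Intercept-only check: LAD "regression" on a constant is the sample median,
# so the large outlier (100) should not drag the estimate upward.
b = lad(np.ones((5, 1)), np.array([1.0, 2.0, 3.0, 10.0, 100.0]))
```

The intercept-only case makes the robustness point directly: the OLS estimate of the same model is the mean, 23.2, while LAD settles near the median, 3.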
1988. Independent variables in the model that we formulated were xit1 = constant, xit2 = log of sales, xit3 = relative size = ratio of employment in business unit to employment in the industry, xit4 = ratio of industry imports to (industry sales + imports), x
differ considerably from the probit model, but in each case, a confidence interval around the posterior mean contains the probit estimator. Finally, the (identical) prior and average of the sample posterior class probabilities are shown at the bottom of the
This set of probabilities, w_i = (w_i1, w_i2, …, w_iJ), gives the posterior density over the distribution of values of β, that is, [β_1, β_2, …, β_J]. The Bayesian estimator of the (individual specific) parameter vector would be the posterior mean

β̂_i = E_j[β_j | observation i] = Σ_{j=1}^{J} w_ij β_j.
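In code, this posterior mean is simply a probability-weighted average of the class-specific parameter vectors. The class parameters and weights below are hypothetical numbers chosen purely for illustration.

```python
import numpy as np

# Hypothetical latent class setup: J = 3 classes, K = 2 parameters each.
beta = np.array([[0.5, 1.0],     # beta_1
                 [1.5, 2.0],     # beta_2
                 [2.5, 3.0]])    # beta_3
w_i = np.array([0.2, 0.5, 0.3])  # posterior class probabilities for observation i

# Posterior mean: beta_hat_i = sum_j w_ij * beta_j
beta_hat_i = w_i @ beta
```

Note that the estimator for each observation lies in the convex hull of the class parameter vectors, pulled toward whichever class has the largest posterior probability.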
h; one popular choice is that used by Stata, h = .9s/n^{1/5}. The kernel function is likewise discretionary, though it rarely matters much which one chooses; the logit kernel (see Table 16.4) is a common choice. The bootstrap method of inferring statistical properties is w
xit6 = productivity = ratio of industry value added to industry employment, xit7 = dummy variable indicating firm is in the raw materials sector, xit8 = dummy variable indicating firm is in the investment goods sector. Discussion of the data set may be found in
where V = (1/n) Σ_{i=1}^{n} (β_i − γ)(β_i − γ)′. Train (2001) suggests the following strategy for sampling a matrix from this distribution: Let M be the lower triangular Cholesky factor of W^{-1}, so MM′ = W^{-1}. Obtain K + n draws of v_k = K standard normal variates. Then, obtain S = M(Σ_{k=1}^{K+n} v_k v_k′)M′
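Assuming W is the relevant scale matrix and K + n the implied degrees of freedom, Train's sampling strategy can be sketched as follows; the particular W, K, and n below are hypothetical values for illustration.

```python
import numpy as np

def draw_inverse_wishart(W_inv, K, n, rng):
    """One draw following Train's strategy: with M the lower triangular
    Cholesky factor of W^{-1} (so M M' = W^{-1}), form
    S = M (sum_{k=1}^{K+n} v_k v_k') M' and return S^{-1}."""
    M = np.linalg.cholesky(W_inv)
    V = rng.standard_normal((K + n, K))   # K+n vectors of K standard normals
    S = M @ (V.T @ V) @ M.T               # V'V equals sum_k v_k v_k'
    return np.linalg.inv(S)

rng = np.random.default_rng(42)
K, n = 3, 50
Sigma_draw = draw_inverse_wishart(np.eye(K), K, n, rng)
```

The outer products of the v_k vectors produce a Wishart draw with scale W^{-1}; inverting it yields the inverted Wishart draw needed for the covariance matrix in the Gibbs sampler.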
Therefore, the result favors the model that provides the better fit using R² as the fit measure. If we stretch Zellner's analysis a bit by interpreting model 1 as "the model" and model 0 as "no model" (i.e., the relevant part of β_0 = 0, so R²_0 = 0), then the ratio si
CHAPTER 16 Estimation Frameworks in Econometrics
The kernel density function is an estimator. For any specific x, f̂(x) is a sample statistic,

f̂(z) = (1/n) Σ_{i=1}^{n} g(x_i | z, h).

Since g(x_i | z, h) is nonlinear, we should expect a bias in a finite sample. It is tempting
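A minimal sketch of this estimator, using Stata's bandwidth rule h = .9s/n^{1/5} and the logit kernel Λ(t)[1 − Λ(t)] discussed elsewhere in the chapter, might look like the following; the sample here is simulated purely for illustration.

```python
import numpy as np

def logit_kernel(t):
    """Logit kernel: Lambda(t) * (1 - Lambda(t)), Lambda = logistic CDF."""
    lam = 1.0 / (1.0 + np.exp(-t))
    return lam * (1.0 - lam)

def kernel_density(x, z_grid, h=None):
    """f_hat(z) = (1/(n h)) * sum_i K((z - x_i)/h); the default bandwidth
    is Stata's rule h = 0.9 * s * n**(-1/5)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if h is None:
        h = 0.9 * x.std(ddof=1) * n ** (-0.2)
    u = (np.asarray(z_grid, dtype=float)[:, None] - x[None, :]) / h
    return logit_kernel(u).sum(axis=1) / (n * h)

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
grid = np.linspace(-5.0, 5.0, 201)
f_hat = kernel_density(x, grid)
```

Because the logit kernel is itself a proper density, f̂(z) is nonnegative and integrates to one, unlike the naive histogram estimator.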
an example. In Example 16.5, we estimated a model that produced a posterior estimator of a slope vector for each of the 1,270 firms in our sample. We might be interested in the distribution of these estimators across firms. In particular, the posterior estimates of the es
ication around a linear function, β′x, which makes them at least semiparametric, but nonetheless still avoid distributional assumptions by using kernel methods. Lewbel's (2000) estimator for the binary choice model is another example.
Example 16.8 Semiparametric
hand, a noisy estimate of f(x_i) could be computed as y_i − z_i′d (the estimate contains the estimation error as well as v_i).24 The problem, of course, is that the enabling assumption is heroic. Data would not b
the sample data and involve, as well, complicated functions of all the model parameters. The estimated numbers of class members are computed by assigning to each firm the predicted
17 The authors used the robust sandwich estimator for the standard errors; see Section 17.9.
where K[β_i | γ, Σ] denotes the K-variate normal prior density for β_i given γ and Σ. Maximum likelihood estimation of this model, which entails estimation of the deep parameters, γ, Σ, then estimation of the individual specific parameters, β_i, using the same method we used f
We will briefly sketch his results. We form informative priors for [β, σ²]_j, j = 0, 1, as specified in (16-12) and (16-13), that is, multivariate normal and inverted gamma, respectively. Zellner then derives t
16.2.4 HIERARCHICAL BAYES ESTIMATION OF A RANDOM PARAMETERS MODEL BY MARKOV CHAIN MONTE CARLO SIMULATION

We now consider a Bayesian approach to estimation of the random parameters model in (16-19). For an i
parametric estimator when the assumption of the distribution is correct. Once again, in the frontier
function setting, least squares may be robust for the slopes, and it is the most efficient estimator that
uses only the orthogonality of the disturbances an
all values of x. Here, we are interested in methods that do not assume any particular functional form. The simplest case to analyze would be one in which several (different) observations on y_i were made with each specific value of x_i. Then, the conditional mean function
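With repeated observations at each distinct x, this simplest estimator is just the within-group sample mean, computed with no functional-form assumption at all; the sample below is hypothetical.

```python
import numpy as np

# Hypothetical sample: three y observations at each distinct value of x.
x = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
y = np.array([2.0, 2.4, 2.2, 3.9, 4.1, 4.0, 6.1, 5.9, 6.0])

# Estimate the conditional mean E[y | x = c] by averaging y at each x value.
cond_mean = {c: y[x == c].mean() for c in np.unique(x)}
```

The resulting point estimates trace out the conditional mean function only at the observed x values; the interesting methodological questions arise when x is continuous and exact repetition never occurs.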
probabilities will arise if an arbitrary vector is added to every θ_j. The resulting log-likelihood is a continuous function of the parameters θ_1, …, θ_J and π_1, …, π_J. For all its apparent complexity, estimation
Conditioned on β and Σ, γ has a K-variate normal distribution with mean β̄ = (1/n) Σ_{i=1}^{n} β_i and covariance matrix (1/n)Σ. To sample from this distribution, we will first obtain the Cholesky factorization of Σ = LL′, where L is a lower triangular matrix. [See Section A.7.11.] Let v be a vector
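A sketch of this sampling step, assuming the conditional distribution N(β̄, Σ/n) stated above; the individual β_i draws fed into it here are hypothetical inputs for illustration.

```python
import numpy as np

def draw_gamma(beta_i, Sigma, rng):
    """Draw gamma from N(beta_bar, Sigma/n): factor Sigma/n = L L' with L
    lower triangular, then return beta_bar + L v for standard normal v."""
    n, K = beta_i.shape
    beta_bar = beta_i.mean(axis=0)
    L = np.linalg.cholesky(Sigma / n)   # lower triangular factor
    v = rng.standard_normal(K)
    return beta_bar + L @ v

rng = np.random.default_rng(1)
beta_i = rng.standard_normal((100, 2))   # hypothetical individual parameter draws
draws = np.array([draw_gamma(beta_i, np.eye(2), rng) for _ in range(2000)])
```

Premultiplying the standard normal vector by L reproduces the required covariance, since Var(Lv) = LL′ = Σ/n; repeated draws therefore scatter around β̄ with that spread.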
normal equations for least squares. Note that the estimator is specified without benefit of any distributional assumption. Method of moments estimation is the subject of Chapter 18, so we will defer further analysis until
A ratio of exponentials that appears in Zellner's result (his equation 10.50) is omitted. To the order of approximation in the result, this ratio vanishes from the final result. (Personal correspondence from A. Zellner to the author.) 14 In principle, the lat
The naive estimator has several shortcomings. It is neither smooth nor continuous. Its shape is partly
determined by where the leftmost and rightmost terminals of the histogram are set. (In constructing a
histogram, one often chooses the bin width to be a
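The naive estimator the passage describes can be written as the fraction of observations falling in a bin around z, divided by the bin width. Both the sample and the bin half-width below are hypothetical choices for illustration.

```python
import numpy as np

def naive_density(x, z, half_width):
    """Histogram-type estimator: share of the sample within half_width of z,
    scaled by the bin width 2*half_width. As a function of z this is a step
    function: neither smooth nor continuous."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - z) <= half_width) / (2.0 * half_width)

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
f_at_2 = naive_density(x, z=2.0, half_width=1.0)
```

Shifting z slightly can move an observation in or out of the bin and cause a discrete jump in the estimate, which is exactly the discontinuity the kernel estimator is designed to smooth away.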
The sample consists of 1,270 German manufacturing firms observed for five years, 1984-1988. Independent variables in the model that we formulated were xit1 = constant, xit2 = log of sales, xit3 = relative size = ratio of employment in business unit to employment
might still be interested in relaxing the assumption of functional form in the model. The partially linear model [analyzed in detail by Yatchew (1998, 2000)] is another approach. Consider a regression model in which one variable, x, is of particular interest, and the functional fo
mation of the model parameters. We will return to this model formulation in Chapter 17. The preceding has assumed β_i has a continuous distribution. Suppose that β_i is generated from a discrete distribution with J values, or classes, so that the distribution of β is over the s
variation comes from the individual heterogeneity, v_i. This random vector is assumed to have mean zero and covariance matrix Γ. The conditional density of the parameters is g(β_i | z_i, θ) = g(v_i + β + Δz_i, θ), where g(·) is the und
For example, one's uncertainty about the sign of a parameter might be summarized in a prior odds over H0: θ ≥ 0 versus H1: θ < 0 of 0.5/0.5 = 1. After the sample evidence is gathered, the prior will be modified, so the posterior is, in general, Odds_posterior = B01 × Odds_prior. The
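The arithmetic of this updating rule is direct; the Bayes factor value below is a hypothetical number chosen for illustration.

```python
# Posterior odds = B01 x prior odds. With even prior odds 0.5/0.5 = 1,
# the posterior odds equal the Bayes factor itself.
prior_odds = 0.5 / 0.5   # prior odds in favor of H0
B01 = 2.5                # hypothetical Bayes factor for H0 versus H1
posterior_odds = B01 * prior_odds

# Converting odds back to a posterior probability for H0.
posterior_prob_H0 = posterior_odds / (1.0 + posterior_odds)
```

With even prior odds, the sample evidence alone determines the posterior: a Bayes factor of 2.5 moves the probability of H0 from 0.5 to about 0.71.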
it with Bickel and Doksum's (2000) version, which observes that the asymptotic sampling distribution of the posterior mean is the same as the asymptotic distribution of the maximum likelihood estimator. The practical implication of this for us is that if t
21 For some applications, see Taylor (1974), Amemiya (1985, pp. 70-80), Andrews (1974), Koenker and Bassett (1978), and a survey written at a very accessible level by Birkes and Dodge (1993). A somewhat more rigorous treatment is given by Hardle (1990). 22 Powell (1984) has extended the L
construction. The least squares coefficient vectors with and without these two observations are (1.844, 0.245, 0.805) and (1.764, 0.209, 0.852), respectively, which bears out the suggestion that these two points do exert considerable influence. Table 16.3 pre
USING BAYES THEOREM IN A CLASSICAL ESTIMATION PROBLEM: THE LATENT CLASS MODEL

Latent class modeling can be viewed as a means of modeling heterogeneity across individuals in a random parameters framework. We first encountered random parameters models in Section 13.8 in c