Stata Technical Bulletin
diagonal matrix with elements

$$
W_{ii} = \begin{cases}
q / f_{\text{resid}}(0) & \text{if } r_i > 0 \\
(1 - q) / f_{\text{resid}}(0) & \text{if } r_i < 0 \\
0 & \text{otherwise}
\end{cases}
$$

and $R_2$ is the design matrix $X'X$. This is derived from formula 3.11 in Koenker and Bassett (1982), although their
notation is much different. $f_{\text{resid}}(\cdot)$ refers to the density of the true residuals. There are many things
that Koenker and Bassett leave unspecified, including how one should obtain a density estimate for the errors in real
data. It is at this point that we offer our contribution.
We first sort the residuals and locate the observation in the residuals corresponding to the quantile in question, taking
into account weights if they are applied. We then calculate $w_n$, the square root of the sum of the weights. Unweighted
data is equivalent to weighted data where each observation has weight 1, resulting in $w_n = \sqrt{n}$. For analytically
weighted data, the weights are rescaled so that the sum of the weights is the number of observations, resulting in
$\sqrt{n}$ again. For frequency weighted data, $w_n$ literally is the square root of the sum of the weights.
We locate the closest observation in each direction such that the sum of weights for all closer observations is $w_n$. If
we run off the end of the dataset, we stop. We calculate $w_s$, the sum of weights for all observations in this middle
space. Typically, $w_s$ is slightly greater than $w_n$.
The residuals obtained after quantile regression have the property that if there are $k$ parameters, then exactly $k$ of
them must be zero. Thus, we calculate an adjusted weight $w_a = w_s - k$. The density estimate is the distance spanned by
these observations divided by $w_a$. Because the distance spanned by this mechanism converges toward zero, this estimate
of the density converges in probability to the true density.
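
The steps above can be sketched in code. The following Python fragment is only an illustration under stated assumptions, not the code Stata uses: the function name `density_window`, its arguments, and the returned dictionary are hypothetical. The text does not say exactly how the target weight is split between the two directions; this sketch assumes that half of the square root of the total weight (`w_n` in the code) is accumulated on each side of the quantile observation, so that the weight in the middle space (`w_s`) comes out slightly greater than `w_n`, as the text says it typically does. The last quantity returned is the ratio the text describes, the distance spanned divided by the adjusted weight (`w_a`).

```python
import math


def density_window(residuals, q, k, weights=None, analytic=False):
    """Illustrative sketch of the window construction (hypothetical helper)."""
    n = len(residuals)
    if weights is None:
        weights = [1.0] * n  # unweighted data: every observation has weight 1
    if analytic:
        # Analytic weights: rescale so the weights sum to the number of observations.
        scale = n / sum(weights)
        weights = [wi * scale for wi in weights]

    # Sort the residuals, carrying each weight along with its residual.
    pairs = sorted(zip(residuals, weights))
    r = [p[0] for p in pairs]
    w = [p[1] for p in pairs]

    total = sum(w)
    w_n = math.sqrt(total)  # square root of the sum of the weights

    # Locate the observation at the q-th weighted quantile of the residuals.
    centre = n - 1
    cum = 0.0
    for i, wi in enumerate(w):
        cum += wi
        if cum >= q * total:
            centre = i
            break

    # ASSUMPTION: half of w_n is accumulated on each side of the centre.
    target = w_n / 2.0

    # Walk upward until the closer observations carry the target weight,
    # stopping if we run off the end of the data.
    hi = min(centre + 1, n - 1)
    closer = 0.0
    while hi < n - 1 and closer < target:
        closer += w[hi]
        hi += 1

    # Walk downward in the same way.
    lo = max(centre - 1, 0)
    closer = 0.0
    while lo > 0 and closer < target:
        closer += w[lo]
        lo -= 1

    w_s = sum(w[lo:hi + 1])  # weight of all observations in the middle space
    w_a = w_s - k            # k residuals are exactly zero after the fit
    if w_a <= 0:
        raise ValueError("adjusted weight is not positive; too few observations")

    span = r[hi] - r[lo]     # distance spanned by the window
    return {"w_n": w_n, "w_s": w_s, "w_a": w_a, "span": span, "ratio": span / w_a}
```

For example, with 100 unweighted, equally spaced residuals, `q = 0.5`, and `k = 2`, this sketch gives `w_n = 10`, `w_s = 13`, and `w_a = 11`. Reading the text as requiring the full `w_n` in each direction would roughly double the window.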
References
Gould, W. 1992. Quantile regression and bootstrapped standard errors. Stata Technical Bulletin 9: 19–21.
Koenker, R. and G. Bassett, Jr. 1982. Robust tests for heteroscedasticity based on regression quantiles. Econometrica 50: 43–61.
Rogers, W. H. 1992. Quantile regression standard errors. Stata Technical Bulletin 9: 16–19.
sg17    Regression standard errors in clustered samples
William Rogers, CRC, FAX 310-393-7551
Stata’s hreg, hlogit, and hprobit commands estimate regression, maximum-likelihood logit, and maximum-likelihood
probit models based on Huber’s (1967) formula for individual-level data, and they produce consistent standard errors even
if there is heteroscedasticity, clustered sampling, or the data are weighted. The description of this in [5s] hreg might
lead one to believe that Huber originally considered clustered data, but that is not true. I developed this approach to
deal with cluster sampling problems in the RAND Health Insurance Experiment in the early 1980s (Rogers 1983; Rogers and
Hanley 1982; Brook et al. 1983). What is true is that with one simple assumption, the framework proposed by Huber can be
applied to produce the answer we propose. That assumption is that the clusters are drawn as a simple random sample from
some population. The observations must be obtained within each cluster by some repeatable procedure.