2. Probability and Statistics

We distinguish between parametric and nonparametric estimators. Parametric estimators are constructed under the assumption that the functional form of the probability density function f(y; θ) is known; the unknown parameter θ is then estimated from the data. In contrast, nonparametric estimators are constructed without any distributional assumptions about the probability density function.

7. Two remarks: (1) The variance measures the degree of dispersion of a random variable around its mean. Therefore, estimators with small variances are more concentrated, i.e. they estimate the parameters more precisely. An unbiased estimator θ̂₁ is called a (finite-sample) efficient estimator if it attains the lower bound in the Cramér–Rao inequality above, i.e. Var(θ̂₁) ≤ Var(θ̂) for all unbiased estimators θ̂. (2) In defining efficiency, we have limited our discussion to unbiased estimators. Clearly, there are biased estimators that have smaller variances than the unbiased ones we have considered. For example, any constant has zero variance, though using a constant as an estimator is not likely to be an effective use of the sample data. Needless to say, in practice we may want to use a biased estimator (as long as the bias is small enough) with a much smaller variance. The mean squared error (MSE), also discussed in this handout, explicitly measures this trade-off.

8. Proof: Define θ* = E(θ̂). Then

   E[(θ̂ − θ)²] = E[(θ̂ − θ* + θ* − θ)²]
               = E[(θ̂ − θ*)²] + (θ* − θ)² + 2(θ* − θ) E(θ̂ − θ*)
               = Var(θ̂) + (θ* − θ)²
               = Var(θ̂) + Bias(θ̂)²,

   where the cross term vanishes because E(θ̂ − θ*) = 0.

9. If you are familiar with probability theory, you have probably already noted that consistency implies that the sequence of estimators converges in probability to the true underlying parameter.

10. Note that this is a random interval containing the population mean, μ_Y, with probability 0.95. It is a random interval, since the endpoints change with different samples.
Probability plays no role once the confidence interval is computed for the particular data at hand. The probabilistic interpretation comes from the fact that for 95% of all random samples, the constructed confidence interval will contain μ_Y.

11. We start from the assumption that the null hypothesis is true. The p-value is then the probability of obtaining a value equal to or more extreme than the sample result. The decision rule becomes: if the p-value is less than 5%, reject the null hypothesis; if the p-value is 5% or more, we cannot reject the null hypothesis.

12. Classical hypothesis testing requires that we initially specify a significance level α for a test. When we specify a value for α, we are essentially quantifying our tolerance for a Type I error. Common values for α are 0.10, 0.05, and 0.01. If α = 0.05, then we are willing to falsely reject H₀ 5% of the time in order to detect deviations from H₀. Once we have chosen the significance level, we would then like to minimize the probability of a Type II error. Equivalently, we would like to maximize the power of the test against all relevant alternatives. The power of a test is one minus the probability of a Type II error. Mathematically,

   π(θ) = P(reject H₀ | θ) = 1 − P(Type II | θ) = 1 − P(fail to reject H₀ | θ),

   where θ denotes the actual value of the parameter. We would like the power to equal unity whenever the null hypothesis is false, but this is impossible to achieve while keeping the significance level small. Instead, we choose our tests to maximize the power for a given significance level.

VER. 9/11/2012. © P. KOLM
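The MSE decomposition proved in footnote 8, and the bias-variance trade-off noted in footnote 7, can be checked numerically. The sketch below is illustrative only: the shrinkage estimator and all parameter values are assumptions, not taken from the handout.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, reps = 2.0, 3.0, 10, 200_000  # hypothetical population values

# Many independent samples of size n from N(theta, sigma^2)
samples = rng.normal(theta, sigma, size=(reps, n))
unbiased = samples.mean(axis=1)   # sample mean: unbiased estimator of theta
shrunk = 0.8 * unbiased           # shrinkage estimator: biased, but lower variance

for name, est in [("sample mean", unbiased), ("0.8 * mean", shrunk)]:
    bias = est.mean() - theta
    var = est.var()
    mse = ((est - theta) ** 2).mean()
    # Footnote 8's identity MSE = Var + Bias^2 holds for the empirical moments
    print(f"{name}: MSE={mse:.4f}  Var+Bias^2={var + bias**2:.4f}")
```

With these assumed values the shrunk estimator trades a small bias for lower variance and ends up with the smaller MSE (analytically, 0.8² · σ²/n + (0.2θ)² ≈ 0.736 versus σ²/n = 0.9), illustrating remark (2) of footnote 7.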
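The frequentist interpretation in footnote 10, that about 95% of random samples produce an interval covering μ_Y, can also be verified by simulation. This is a minimal sketch assuming a normal population with known standard deviation; the population values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_Y, sigma, n, reps = 5.0, 2.0, 30, 100_000  # hypothetical population values
z = 1.96  # approximate 97.5% quantile of the standard normal

# Each row is one random sample; each sample yields its own random interval
samples = rng.normal(mu_Y, sigma, size=(reps, n))
ybar = samples.mean(axis=1)
half = z * sigma / np.sqrt(n)     # interval half-width in the known-sigma case
covered = (ybar - half <= mu_Y) & (mu_Y <= ybar + half)

# The endpoints change from sample to sample, but roughly 95% of the
# intervals contain the fixed population mean mu_Y
print(f"empirical coverage: {covered.mean():.3f}")
```

Any single computed interval either contains μ_Y or it does not; the 95% refers to the coverage rate across repeated samples, which is exactly what the simulation estimates.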
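The power function π(θ) discussed in footnote 12 can be made concrete with a one-sided z-test for a normal mean with known variance. All values here (θ₀ = 0, σ = 1, n = 25) are assumptions chosen for illustration, not from the handout.

```python
from statistics import NormalDist

norm = NormalDist()
alpha = 0.05                      # significance level: tolerated Type I error rate
theta0, sigma, n = 0.0, 1.0, 25   # H0: theta = theta0, known sigma, sample size n
z_crit = norm.inv_cdf(1 - alpha)  # reject H0 when the z-statistic exceeds z_crit

def power(theta):
    """pi(theta) = P(reject H0 | theta) = 1 - P(Type II | theta)."""
    # Under the true value theta, the z-statistic is N(shift, 1) with
    # shift = (theta - theta0) * sqrt(n) / sigma
    shift = (theta - theta0) * n ** 0.5 / sigma
    return 1 - norm.cdf(z_crit - shift)

for theta in [0.0, 0.2, 0.5, 1.0]:
    print(f"theta={theta:.1f}  power={power(theta):.3f}")
```

At θ = θ₀ the power equals the significance level α, and it increases toward one as θ moves away from θ₀. It never reaches unity for finite n, which is why, as the footnote says, we can only maximize power for a given significance level rather than demand power one everywhere.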

This document was uploaded on 02/17/2014 for the course COURANT G63.2751.0 at NYU.
