International Encyclopedia of Statistical Science - M. Lovric (Springer, 2011) BBS.pdf


Absolute Penalty Estimation

Ejaz S. Ahmed, Enayetur Raheem, Shakhawat Hossain
Professor and Department Head of Mathematics and Statistics, University of Windsor, Windsor, ON, Canada
University of Windsor, Windsor, ON, Canada

In statistics, the technique of ▶least squares is used for estimating the unknown parameters in a linear regression model (see ▶Linear Regression Models). This method minimizes the sum of squared distances between the observed responses in a set of data and the fitted responses from the regression model. Suppose we observe a collection of data $\{y_i, x_i\}_{i=1}^{n}$ on $n$ units, where the $y_i$ are responses and $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ is a vector of predictors. It is convenient to write the model in matrix notation as

$$y = X\beta + \varepsilon, \tag{1}$$

where $y$ is an $n \times 1$ vector of responses, $X$ is an $n \times p$ matrix, known as the design matrix, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the unknown parameter vector, and $\varepsilon$ is the vector of random errors. In ordinary least squares (OLS) regression, we estimate $\beta$ by minimizing the residual sum of squares, $\mathrm{RSS} = (y - X\beta)^T (y - X\beta)$, giving $\hat{\beta}^{\mathrm{OLS}} = (X^T X)^{-1} X^T y$. This estimator is simple and has some good statistical properties. However, it is not unique if the design matrix $X$ is less than full rank, and it becomes unstable if the columns of $X$ are (nearly) collinear.

To achieve better prediction and to alleviate the ill-conditioning of $X^T X$, Hoerl and Kennard () introduced ridge regression (see ▶Ridge and Surrogate Ridge Regressions), which minimizes the RSS subject to a constraint, $\sum_j \beta_j^2 \le t$; in other words,

$$\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}, \tag{2}$$

where $\lambda \ge 0$ is known as the complexity parameter that controls the amount of shrinkage. The larger the value of $\lambda$, the greater the amount of shrinkage. The quadratic ridge penalty term makes $\hat{\beta}$ a linear function of $y$. Frank and Friedman () introduced bridge regression, a generalized version of penalty (or absolute penalty type) estimation, which includes ridge regression when $\gamma = 2$. For a given penalty function $\pi(\cdot)$ and regularization parameter $\lambda$, the general form can be written as

$$\phi(\beta) = (y - X\beta)^T (y - X\beta) + \lambda \pi(\beta), \tag{3}$$

where the penalty function is of the form

$$\pi(\beta) = \sum_{j=1}^{p} |\beta_j|^{\gamma}, \quad \gamma > 0. \tag{4}$$

The penalty function in (4) bounds the $L_\gamma$ norm of the parameters in the given model as $\sum_{j=1}^{p} |\beta_j|^{\gamma} \le t$, where $t$ is the tuning parameter that controls the amount of shrinkage. We see that for $\gamma = 2$ we obtain ridge regression. However, if $\gamma \ne 2$, the penalty function will not be rotationally invariant. Interestingly, for $\gamma \le 1$, it shrinks the coefficients toward zero and, depending on the value of $\lambda$, sets some of them to be exactly zero. Thus, the procedure combines variable selection and shrinkage of coefficients of penalized regression.

An important member of the penalized least squares (PLS) family is the $L_1$ penalized least squares estimator or the lasso [least absolute shrinkage and selection operator, Tibshirani ()]. In other words, the absolute penalty estimator (APE) arises when the absolute value of the penalty term is considered, i.e., $\gamma = 1$ in (4). Similar to ridge regression, the lasso estimates are obtained as

$$\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}. \tag{5}$$

The lasso shrinks the OLS estimator toward zero and, depending on the value of $\lambda$, sets some coefficients to exactly zero. Tibshirani () used a quadratic programming method to solve (5) for $\hat{\beta}^{\mathrm{lasso}}$. Later, Efron et al.
() proposed least angle regression (LAR), a type of stepwise regression, with which the lasso estimates can be obtained at the same computational cost as that of an ordinary least squares estimation; see Hastie et al. (). Further, the lasso estimator remains numerically feasible for dimensions $p$ that are much higher than the sample size $n$. Zou and Hastie () introduced a hybrid PLS regression with the so-called elastic net penalty, defined as $\lambda \sum_{j=1}^{p} \big( \alpha \beta_j^2 + (1 - \alpha) |\beta_j| \big)$. Here the penalty function is a linear combination of the ridge regression penalty function and the lasso penalty function. A different type of PLS, called the garotte, is due to Breiman (). Further, PLS estimation provides a generalization of both nonparametric least squares and weighted projection estimators, and a popular version of the PLS is given by Tikhonov regularization (Tikhonov).

Miodrag Lovric (ed.), International Encyclopedia of Statistical Science, DOI ./----, © Springer-Verlag Berlin Heidelberg

Generally speaking, ridge regression is highly efficient and stable when there are many small coefficients. The performance of the lasso is superior when there is a small-to-medium number of moderate-sized coefficients. On the other hand, shrinkage estimators perform well when there are large known zero coefficients.

Ahmed et al. () proposed an APE for partially linear models. Further, they reappraised the properties of shrinkage estimators based on Stein-rule estimation. There exists a whole family of estimators that are better than OLS estimators in regression models when the number of predictors is large. A partially linear regression model is defined as

$$y_i = x_i^T \beta + g(t_i) + \varepsilon_i, \quad i = 1, \ldots, n, \tag{6}$$

where $t_i \in [0, 1]$ are design points, $g(\cdot)$ is an unknown real-valued function defined on $[0, 1]$, and $y_i$, $x_i$, $\beta$, and the $\varepsilon_i$ are as defined in the context of (1).
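As a rough illustration of the ridge and lasso criteria discussed earlier in this entry, the following Python sketch minimizes RSS plus penalty by cyclic coordinate descent. This is not the quadratic programming or LAR approach cited in the text; coordinate descent is swapped in here only because it is short. The function names `soft_threshold` and `penalized_ls` and the toy data are hypothetical, no intercept is fitted (the data are assumed centered), and every column of `X` is assumed to be nonzero.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalized_ls(X, y, lam, penalty="lasso", n_iter=200):
    """Minimize RSS + lam * sum(|b_j|)   (penalty="lasso")
    or       RSS + lam * sum(b_j ** 2)   (penalty="ridge")
    by cyclic coordinate descent. Assumes centered data (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # inner product of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                fit_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_others)
                z += X[i][j] ** 2
            if penalty == "ridge":
                # Setting the derivative -2*rho + 2*z*b + 2*lam*b to zero.
                beta[j] = rho / (z + lam)
            else:
                # Subgradient condition -2*rho + 2*z*b + lam*sign(b) = 0.
                beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta


# Toy orthogonal design (hypothetical data); the OLS solution is (3, 0.5).
X_demo = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y_demo = [3.0, 0.5, -3.0, -0.5]
ridge_demo = penalized_ls(X_demo, y_demo, lam=2.0, penalty="ridge")
lasso_demo = penalized_ls(X_demo, y_demo, lam=2.0, penalty="lasso")
print(ridge_demo)  # [1.5, 0.25]
print(lasso_demo)  # [2.5, 0.0]
```

On this toy design the ridge fit shrinks both coefficients proportionally, whereas the lasso fit subtracts a constant and sets the small coefficient to exactly zero, which is the selection behavior described in the text.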
We consider experiments where the vector of coefficients $\beta$ in the linear part of (6) can be partitioned as $(\beta_1^T, \beta_2^T)^T$, where $\beta_1$ is the coefficient vector of order $p_1 \times 1$ for main effects (e.g., treatment effects, genetic effects) and $\beta_2$ is a vector of order $p_2 \times 1$ for "nuisance" effects (e.g., age, laboratory). Our relevant hypothesis is $H_0 : \beta_2 = 0$. Let $\hat{\beta}_1$ be a semiparametric least squares estimator of $\beta_1$, and let $\tilde{\beta}_1$ denote the restricted semiparametric least squares estimator of $\beta_1$. Then the semiparametric Stein-type estimator (see ▶James-Stein Estimator and ▶Semiparametric Regression Models), $\hat{\beta}_1^{S}$, of $\beta_1$ is

$$\hat{\beta}_1^{S} = \tilde{\beta}_1 + \{1 - (p_2 - 2) T^{-1}\} (\hat{\beta}_1 - \tilde{\beta}_1), \quad p_2 \ge 3, \tag{7}$$

where $T$ is an appropriate test statistic for $H_0$. A positive-rule shrinkage estimator (PSE) $\hat{\beta}_1^{S+}$ is defined as

$$\hat{\beta}_1^{S+} = \tilde{\beta}_1 + \{1 - (p_2 - 2) T^{-1}\}^{+} (\hat{\beta}_1 - \tilde{\beta}_1), \quad p_2 \ge 3, \tag{8}$$

where $z^{+} = \max(0, z)$. The PSE is particularly important for controlling the over-shrinking inherent in $\hat{\beta}_1^{S}$.

The shrinkage estimators can be viewed as competitors to the APE approach. Ahmed et al. () find that, when $p_2$ is relatively small with respect to $p_1$, APE performs better than the shrinkage method. On the other hand, the shrinkage method performs better when $p_2$ is large, which is consistent with the performance of the APE in linear models. Importantly, the shrinkage approach is free from any tuning parameters, is easy to compute, and its calculations are not iterative. The shrinkage estimation strategy can be extended in various directions to more complex problems. It may be worth mentioning that this is one of the two areas Bradley Efron predicted for the early twenty-first century (RSS News, January ). Shrinkage and likelihood-based methods continue to be extremely useful tools for efficient estimation.

About the Author
S. Ejaz Ahmed is Professor and Head of the Department of Mathematics and Statistics. For biography, see the entry ▶Optimal Shrinkage Estimation.
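The Stein-type and positive-rule shrinkage estimators described in this entry amount to a single scalar weight applied to the difference between the unrestricted and restricted estimates. A minimal Python sketch follows, assuming the two estimates and the test statistic have already been computed elsewhere; the function name `stein_shrinkage` and the numbers in the example are hypothetical.

```python
def stein_shrinkage(beta_hat, beta_tilde, T, p2, positive_part=True):
    """Combine an unrestricted estimate (beta_hat) and a restricted estimate
    (beta_tilde) of the main-effect coefficients.

    T  : value of the test statistic for the hypothesis on the nuisance part
    p2 : number of nuisance coefficients (must be at least 3)
    positive_part=True gives the positive-rule estimator; False gives the
    plain Stein-type estimator, which can over-shrink when T is small."""
    if p2 < 3:
        raise ValueError("shrinkage estimators require p2 >= 3")
    if T <= 0:
        raise ValueError("the test statistic T must be positive")
    weight = 1.0 - (p2 - 2) / T
    if positive_part:
        weight = max(0.0, weight)
    return [bt + weight * (bh - bt) for bh, bt in zip(beta_hat, beta_tilde)]


# Hypothetical numbers: a small T makes the plain Stein weight negative.
b_hat, b_tilde = [2.0, -1.0], [1.0, 0.0]
print(stein_shrinkage(b_hat, b_tilde, T=1.5, p2=5, positive_part=False))  # [0.0, 1.0]
print(stein_shrinkage(b_hat, b_tilde, T=1.5, p2=5))                       # [1.0, 0.0]
print(stein_shrinkage(b_hat, b_tilde, T=6.0, p2=5))                       # [1.5, -0.5]
```

With the small test statistic the plain Stein-type rule moves past the restricted estimate (the weight is negative), while the positive-rule version stops at the restricted estimate. No tuning parameter or iteration is involved.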
Cross References
▶Estimation
▶Estimation: An Overview
▶James-Stein Estimator
▶Linear Regression Models
▶Optimal Shrinkage Estimation
▶Residuals
▶Ridge and Surrogate Ridge Regressions
▶Semiparametric Regression Models

References and Further Reading
Ahmed SE, Doksum KA, Hossain S, You J () Shrinkage, pretest and absolute penalty estimators in partially linear models. Aust NZ J Stat ():–
Breiman L () Better subset selection using the non-negative garotte. Technical report, University of California, Berkeley
Efron B, Hastie T, Johnstone I, Tibshirani R () Least angle regression (with discussion). Ann Stat ():–
Frank IE, Friedman JH () A statistical view of some chemometrics regression tools. Technometrics :–
Hastie T, Tibshirani R, Friedman J () The elements of statistical learning: data mining, inference, and prediction, nd edn. Springer, New York
Hoerl AE, Kennard RW () Ridge regression: biased estimation for nonorthogonal problems. Technometrics :–
Tibshirani R () Regression shrinkage and selection via the lasso. J R Stat Soc B :–
Tikhonov AN () Solution of incorrectly formulated problems and the regularization method. Soviet Math Dokl :– , English translation of Dokl Akad Nauk SSSR , , –
Zou H, Hastie T () Regularization and variable selection via the elastic net. J R Stat Soc B ():–

Accelerated Lifetime Testing

Francisco Louzada-Neto
Associate Professor
Universidade Federal de São Carlos, São Paulo, Brazil

Accelerated life tests (ALT) are efficient industrial experiments for obtaining measures of a device's reliability under the usual working conditions.

A practical problem for industries in many areas is to obtain measures of a device's reliability under its usual working conditions. Typically, such experimentation is lengthy and expensive. ALT are efficient for handling this situation, since the information on the device performance under the usual working conditions is obtained from an experimental scheme of reduced time and cost. ALT are performed by testing items at stress covariate levels, such as temperature, pressure and voltage, that are higher than the usual working conditions. There is a large literature on ALT, and interested readers can refer to Mann et al. (), Nelson (), and Meeker and Escobar (), which are excellent sources for ALT. Nelson (a, b) provides a brief background on accelerated testing and test plans and surveys the related literature, pointing out more than related references.

A simple ALT scenario is characterized by putting $k$ groups of $n_i$ items each under constant and fixed stress covariate levels, $X_i$ (hereafter stress level), for $i = 1, \ldots, k$, where $i = 1$ generally denotes the usual stress level, that is, the usual working conditions. The experiment ends after a certain pre-fixed number $r_i < n_i$ of failures, $t_{i1}, t_{i2}, \ldots, t_{i r_i}$, at each stress level, characterizing a type II censoring scheme (Lawless; see also ▶Censoring Methodology). Other stress schemes, such as step (see ▶Step-Stress Accelerated Life Tests) and progressive ones, are also common in practice but will not be considered here. Examples of those more sophisticated stress schemes can be found in Nelson ().

ALT models have two components. One is a probabilistic component, which is represented by a lifetime distribution, such as the exponential, Weibull, log-normal, or log-logistic, among others. The other is a stress-response relationship (SRR), which relates the mean lifetime (or a function of this parameter) to the stress levels. Common SRRs are the power law, Eyring and Arrhenius models (Meeker and Escobar), or even a general log-linear or log-non-linear SRR, which encompasses the former. For the sake of illustration, we shall assume an exponential distribution as the lifetime model and a general log-linear SRR. Here, the mean lifetime under the usual working conditions shall represent our device reliability measure of interest.

Let $T > 0$ be the lifetime random variable with an exponential density

$$f(t, \lambda_i) = \lambda_i \exp\{-\lambda_i t\}, \tag{1}$$

where $\lambda_i > 0$ is an unknown parameter representing the constant failure rate for $i = 1, \ldots, k$ (number of stress levels). The mean lifetime is given by $\theta_i = 1/\lambda_i$.

The likelihood function for $\lambda_i$, under the $i$-th stress level $X_i$, is given by

$$L_i(\lambda_i) = \left( \prod_{j=1}^{r_i} f(t_{ij}, \lambda_i) \right) \big( S(t_{i r_i}, \lambda_i) \big)^{n_i - r_i} = \lambda_i^{r_i} \exp\{-\lambda_i A_i\},$$

where $S(t_{i r_i}, \lambda_i)$ is the survival function at $t_{i r_i}$ and $A_i = \sum_{j=1}^{r_i} t_{ij} + (n_i - r_i) t_{i r_i}$ denotes the total time on test for the $i$-th stress level.

Considering data under the $k$ stress levels, the likelihood function for the parameter vector $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k)$ is given by

$$L(\lambda) = \prod_{i=1}^{k} \lambda_i^{r_i} \exp\{-\lambda_i A_i\}. \tag{2}$$

We consider a general log-linear SRR defined as

$$\lambda_i = \exp(-Z_i - \beta_0 - \beta_1 X_i), \tag{3}$$

where $X$ is the covariate, $Z = g(X)$, and $\beta_0$ and $\beta_1$ are unknown parameters such that $-\infty < \beta_0, \beta_1 < \infty$.

The SRR (3) has several models as particular cases. The Arrhenius model is obtained if $Z_i = 0$, $X_i = 1/V_i$, $\beta_0 = -\alpha_0$ and $\beta_1 = \alpha_1$, where $V_i$ denotes a level of the temperature variable. If $Z_i = 0$, $X_i = -\log(V_i)$, $\beta_0 = \log(\alpha)$ and $\beta_1 = \alpha_1$, where $V_i$ denotes a level of the voltage variable, we obtain the power model. Following Louzada-Neto and Pardo-Fernandéz (), the Eyring model is obtained if $Z_i = -\log V_i$, $X_i = 1/V_i$, $\beta_0 = -\alpha_0$ and $\beta_1 = \alpha_1$, where $V_i$ denotes a level of the temperature variable. Interested readers can refer to Meeker and Escobar () for more information about the physical models considered here.

From (2) and (3), the likelihood function for $\beta_0$ and $\beta_1$ is given by

$$L(\beta_0, \beta_1) = \prod_{i=1}^{k} \left\{ \exp(-Z_i - \beta_0 - \beta_1 X_i)^{r_i} \exp\big( -\exp(-Z_i - \beta_0 - \beta_1 X_i) A_i \big) \right\}. \tag{4}$$

The maximum likelihood estimates (MLEs) of $\beta_0$ and $\beta_1$ can be obtained by direct maximization of (4), or by solving the system of nonlinear equations $\partial \log L / \partial \theta = 0$, where $\theta' = (\beta_0, \beta_1)$. Obtaining the score function is conceptually simple, and the expressions are not given explicitly here. The MLEs of the mean lifetimes $\theta_i$ can then be obtained directly from the invariance property of the MLEs. Large-sample inference for the parameters can be based on the MLEs and their estimated variances, obtained by inverting the expected information matrix (Cox and Hinkley). For small or moderate-sized samples, however, we may consider simulation approaches, such as bootstrap confidence intervals (see ▶Bootstrap Methods), which are based on the empirical evidence and are therefore preferred (Davison and Hinkley). Formal goodness-of-fit tests are also feasible since, from (4), we can use the likelihood ratio statistics (LRS) for testing hypotheses such as $H_0 : \beta_1 = 0$.

Although we considered only an exponential distribution as our lifetime model, more general lifetime distributions, such as the Weibull (see ▶Weibull Distribution and ▶Generalized Weibull Distributions), log-normal, and log-logistic, among others, could be considered in principle. However, the degree of difficulty in the calculations increases considerably. Also, we considered only one stress covariate; this is not critical for the overall approach to hold, and the multiple covariate case can be handled straightforwardly.

A study of the effect of different reparametrizations on the accuracy of inferences for ALT is given in Louzada-Neto and Pardo-Fernandéz (). Modeling ALT with a log-non-linear SRR can be found in Perdoná et al. (). Modeling ALT with a threshold stress, below which the lifetime of a product can be considered to be infinite or much higher than that for which it has been developed, is proposed by Tojeiro et al. ().

Two types of software for ALT are provided by Meeker and Escobar () and ReliaSoft Corporation ().
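The direct maximization of the exponential ALT likelihood under the log-linear SRR can be sketched in a few lines of Python. The sketch below uses Newton-Raphson with step halving on the log-likelihood, which is one possible choice and not necessarily what the cited software does. The function names and the example numbers are hypothetical, and at least two distinct stress levels are assumed so that the two parameters are identifiable.

```python
import math


def total_time_on_test(failure_times, n):
    """A_i: sum of the r_i observed failure times plus (n_i - r_i) times
    the last observed failure time (type II censoring)."""
    r = len(failure_times)
    return sum(failure_times) + (n - r) * max(failure_times)


def log_lik(b0, b1, X, Z, r, A):
    """Log-likelihood with lambda_i = exp(-Z_i - b0 - b1 * X_i)."""
    total = 0.0
    for x, z, ri, ai in zip(X, Z, r, A):
        eta = -z - b0 - b1 * x
        total += ri * eta - math.exp(eta) * ai
    return total


def fit_alt_exponential(X, Z, r, A, tol=1e-10, max_iter=100):
    """Newton-Raphson with step halving for (b0, b1)."""
    b0, b1 = -math.log(sum(r) / sum(A)), 0.0  # start from the pooled rate
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, z, ri, ai in zip(X, Z, r, A):
            w = math.exp(-z - b0 - b1 * x) * ai  # lambda_i * A_i
            g0 += w - ri            # score with respect to b0
            g1 += x * (w - ri)      # score with respect to b1
            h00 += w                # entries of minus the Hessian
            h01 += x * w
            h11 += x * x * w
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton direction
        d1 = (h00 * g1 - h01 * g0) / det
        step, current = 1.0, log_lik(b0, b1, X, Z, r, A)
        while step > 1e-8 and log_lik(b0 + step * d0, b1 + step * d1, X, Z, r, A) < current:
            step /= 2.0  # halve until the log-likelihood does not decrease
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1


def mean_lifetime(b0, b1, x, z=0.0):
    """theta = 1 / lambda = exp(z + b0 + b1 * x), by invariance of the MLE."""
    return math.exp(z + b0 + b1 * x)


# Hypothetical data: two stress levels, five failures each.
X_lv, Z_lv, r_lv, A_lv = [1.0, 2.0], [0.0, 0.0], [5, 5], [50.0, 10.0]
b0_hat, b1_hat = fit_alt_exponential(X_lv, Z_lv, r_lv, A_lv)
theta_usual = mean_lifetime(b0_hat, b1_hat, x=1.0)
```

With only two stress levels the model is saturated, so the fitted rates should equal $r_i / A_i$ at each level; this gives a convenient check on the sketch. With more levels the same function applies unchanged.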
We considered only ALT under constant stress loading; however, non-constant stress loadings, such as step stress and linearly increasing stress, are treated by Miller and Nelson () and Bai, Cha and Chung (), respectively. A comparison between constant and step stress tests is provided by Khamis (). A log-logistic step stress model is provided by Srivastava and Shukla ().

About the Author
Francisco Louzada-Neto is an associate professor of Statistics at Universidade Federal de São Carlos (UFSCar), Brazil. He received his Ph.D. in Statistics from the University of Oxford (England). He is Director of the Centre for Hazard Studies (–, UFSCar, Brazil) and Editor in Chief of the Brazilian Journal of Statistics (–, Brazil). He is a past Director for Undergraduate Studies (–, UFSCar, Brazil) and was Director for Graduate Studies in Statistics (–, UFSCar, Brazil). Louzada-Neto is sole or joint author of more than publications in peer-reviewed statistical journals, books and book chapters. He has supervised more than research assistants, Ph.D., master's and undergraduate students.

Cross References
▶Degradation Models in Reliability and Survival Analysis
▶Modeling Survival Data
▶Step-Stress Accelerated Life Tests
▶Survival Data

References and Further Reading
Bai DS, Cha MS, Chung SW () Optimum simple ramp tests for the Weibull distribution and type-I censoring. IEEE T Reliab :–
Cox DR, Hinkley DV () Theoretical statistics. Chapman and Hall, London
Davison AC, Hinkley DV () Bootstrap methods and their application. Cambridge University Press, Cambridge
Khamis IH () Comparison between constant- and step-stress tests for Weibull models. Int J Qual Reliab Manag :–
Lawless JF () Statistical models and methods for lifetime data, nd edn. Wiley, New York
Louzada-Neto F, Pardo-Fernandéz JC () The effect of reparametrization on the accuracy of inferences for accelerated lifetime tests. J Appl Stat :–
Mann NR, Schaffer RE, Singpurwalla ND () Methods for statistical analysis of reliability and life test data. Wiley, New York
Meeker WQ, Escobar LA () Statistical methods for reliability data. Wiley, New York
Meeker WQ, Escobar LA () SPLIDA (S-PLUS Life Data Analysis) software–graphical user interface. . iastate.edu/~splida
Miller R, Nelson WB () Optimum simple step-stress plans for accelerated life testing. IEEE T Reliab :–
Nelson W () Accelerated testing – statistical models, test plans, and data analyses. Wiley, New York
Nelson W (a) A bibliography of accelerated test plans. IEEE T Reliab :–
Nelson W (b) A bibliography of accelerated test plans part II – references. IEEE T Reliab :–
Perdoná GSC, Louzada Neto F, Tojeiro CAV () Bayesian modelling of log-non-linear stress-response relationships in accelerated lifetime tests. J Stat Theory Appl ():–
Reliasoft Corporation () Optimum allocations of stress levels and test units in accelerated tests. Reliab EDGE :–
Srivastava PW, Shukla R () A log-logistic step-stress model. IEEE T Reliab :–
Tojeiro CAV, Louzada Neto F, Bolfarine H () A Bayesian analysis for accelerated lifetime tests under an exponential power law model with threshold stress. J Appl Stat ():–

Acceptance Sampling

M. Ivette Gomes
Professor
Universidade de Lisboa, DEIO and CEAUL, Lisboa, Portugal

Introduction
Acceptance sampling (AS) is one of the oldest statistical techniques in the area of ▶statistical quality control. It is performed outside the production line, most commonly before it, for deciding on incoming batches, but also after it, for evaluating the final product (see Duncan; Stephens; Pandey; Montgomery; and Schilling and Neubauer, among others). Accepted batches go into the production line or are sold to consumers; the rejected ones are usually submitted to a rectification process.
A sampling plan is defined by the size of the sample (or samples) taken from the batch and by the associated acceptance–rejection criterion. The most widely used plans are given by the Military Standard tables, developed during World War II, and first issued in . We mention MIL STD E () and the civil version ANSI/ASQC Z. () of the American National Standards Institute and the American Society for Quality Control.

At the beginning, all items and products were inspected for the identification of nonconformities. In the late s, Dodge and Romig (see Dodge and Romig), at Bell Laboratories, developed the area of AS as an alternative to 100% inspection. The aim of AS is to lead producers to a decision (acceptance or rejection of a batch) and not to the estimation or improvement of the quality of a batch. Consequently, AS does not provide a direct form of quality control, but its indirect effects on quality are important: if a batch is rejected, either the supplier tries to improve its production methods or the consumer (producer) looks for a better supplier, indirectly increasing quality.

Regarding the decision on the batches, we distinguish three different approaches: (1) acceptance without inspection, applied when the supp...