Week 8 Prep - Engr 9397 – Week 8 Preparation...

Engr 9397 – Week 8 Preparation
Correlation Analysis
Multiple Regression Analysis
Design of Experiments

•  Below are some slides on correlation that we didn't get to last week. I have also included an overview of multiple regression and an overview of Design of Experiments so that you will know more about what Leonard talks about next week. Take a look through all of them, as correlation analysis and multiple regression lead into the Design of Experiments discussion.

Correlation Analysis
•  Correlation analysis determines the degree of linear interrelation or association between two random variables
•  All correlation measures (ρ) are dimensionless and scaled to lie in the range -1 ≤ ρ ≤ 1
•  For uncorrelated data, ρ = 0

Correlation Analysis
•  Correlation does not provide evidence for a causal relationship between two variables
•  Two variables may be correlated because one causes the other (e.g. mixing produces heat) or because they share a common factor that influences both variables (e.g. two mechanical parts are affected by vibrations from a common source)

Difference between Correlation and Regression
•  Correlation of: x vs. y = y vs. x
•  Regression of: x vs. y ≠ y vs.
x
•  In regression, a line is fitted to explain y from x or x from y, and the relationship cannot be reversed unless the fit is perfect
•  A regression line can be used to predict y values

Coefficient of Correlation (r)
•  A coefficient of correlation quantifies and tests the strength of the monotonic relationship between two variables x and y
•  Pearson's correlation coefficient, r, measures the linear correlation/association between x and y
•  r is invariant to changes in scale; it is dimensionless because the differences between the variables and their means are divided by the respective sample standard deviations

Coefficient of correlation
•  This scale-invariant measure of the strength of a linear regression can be constructed by comparing the variability "explained" by the regression to the original variability of y
•  For random variables: $\rho = \dfrac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$
•  For samples: $r = \dfrac{SS_{xy}}{\sqrt{SS_{xx} \, SS_{yy}}}$, where $SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, $SS_{xx} = \sum (x_i - \bar{x})^2$ and $SS_{yy} = \sum (y_i - \bar{y})^2$

Correlation's relation to Least-Squares line
•  The strength of a linear relationship can be measured by the slope of the least-squares regression line, but the slope depends on the scale of measurement for y
•  A scale-invariant measure of linear relationship can be constructed by comparing the variability explained by the regression to the original variability of the y values, resulting in the population correlation coefficient

What does r tell us about the least-squares fit line?
•  If r takes on a positive value close to 1, the data points (xi, yi) are nearly collinear, and the slope of the corresponding least-squares line is positive
•  If r takes on a negative value close to -1, the data points (xi, yi) are nearly collinear, and the slope of the corresponding least-squares line is negative
•  If r takes on a value near zero, there is little or no linear relationship between x and y

Remember our previous example
•  The squared correlation, r², is the proportion of the total SSyy accounted for by the regression
•  It compares the variance of the residuals (the variability left unexplained by the regression) with the original variance of the y values: r² is the reduction in variability associated with the regression, as a proportion of the original variability of y

Example – finding r
•  The following data give the hardness of six specimens of cold-reduced steel having different annealing temperatures:

Temperature   Hardness
950           81.7
1090          77.6
1370          53.8
1180          62.4
900           85.3
1100          71.2

•  Find the correlation coefficient between the annealing temperature and the hardness. What percentage of the variation in hardness values is explained by the linear relationship with annealing temperature?

Coefficient of correlation
•  The correlation coefficient can form the basis of a statistical test of independence.
•  H0: ρ = 0 (the yi are independent, identically distributed normal random variables, not dependent on xi)
•  H1: ρ ≠ 0
•  The test statistic t is defined as: $t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
•  H0 is rejected if |t| > tcrit, at n - 2 degrees of freedom

Significance Tests and Confidence Intervals for r
•  For our test statistic, we use the t-distribution, where t is defined as: $t = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
•  We reject H0 if |t| > tcrit
•  tcrit is the value of the t-distribution with n - 2 d.f.
and a probability of exceedance of α/2
•  In-class example

Application to Quality
•  Quality-related issues are often the result of the combined effects of multiple variables that do not act independently
•  Regression analysis is useful in screening: if a variable is not correlated with another and it causes little change to y, it can probably be excluded from the analysis
•  Problems associated with correlated variables, non-measured variables and equation form are often overlooked – bad idea!

Week 8 Prep
Introduction to Multiple Regression

Multiple Regression
•  In multiple linear regression, the dependent variable y continues to assume random values, but we are dealing with k fixed (non-random) independent variables (regressors)
•  In the case of simple regression, we used the least-squares method to find the line of best fit to the data
•  In the case of multiple regression, where more than two variables are involved, we apply the method of least squares to fit a hyperplane to a set of data points so that y can be predicted for a given set of values of x1, x2, …, xk with minimum error

Multiple Regression continued
•  The fitted surface has the equation $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$, which predicts the value of y when the independent x variables take on a given set of values
•  The bk coefficients can be found by the least-squares method (complicated without the help of software)

Multiple Regression
•  Let's look at a sample Minitab result and go through what it means
•  Example: 5 variables (conveyor angle, temperature, flux concentration, conveyor speed, and preheat temperature) involved in the soldering process of a circuit board set-up are measured over 25 separate runs of 5 boards. Each board contains 460 solder joints.
We are interested in examining the effects of these 5 variables on the number of defective solder joints per 100 joints inspected, which is recorded for each run

Multiple Regression Analysis
•  We get the following results from a linear regression analysis

              COEFFICIENT   STD DEV COEF   T RATIO (COEF / STD DEV)
(constant)    -1.7885       0.9655         -1.85
C1             0.21357      0.0363          5.88
C2            -0.00959      0.001873       -0.51
C3             0.898        1.047           0.86
C4             0.1216       0.2167          0.56
C5             0.0001695    0.0009457       0.18

S = 0.05806     R-squared = 73.6

Row   C1    C6      Pred Y value   Std Dev   Residual   St.Res.
22    6.7   0.287   0.1104         0.022     0.1766     3.29R

Regression analysis result interpretation
•  Regression line equation: a fitted linear least-squares equation; gives coefficient values
•  Table of coefficients: gives coefficient values and the t values needed to perform a significance test on each coefficient
  –  coefficient t values can be compared to tcrit at a given level of significance and d.f.
•  s: the standard deviation of the residuals, estimated as the square root of the mean squared deviation of the y values from the least-squares equation

R-Squared value
•  Similar to r, R is the multiple correlation coefficient
•  R² measures the strength of the linear relationship of the x's with the dependent variable, y, by comparing the variance of the y values with the variance of the residuals
•  R² is expressed as the percentage reduction in the variance of y that is attributable to the relationship between y and the predictor variables (xi)

Correlation among x variables
•  Coefficients are subject to random error, which can be quantified by a confidence interval (assuming that residuals are normally distributed)
•  A more serious source of coefficient error results from correlations among x variables: when x variables are correlated, we cannot separate their individual effects on y
•  When independent variables are correlated, their effects are said to be confounded, and it is not possible to separate their individual effects

Multiple Regression - confounding
•  A correlation matrix (below) is used to investigate the degree of correlation among
independent variables (the x's)
•  x variables are considered confounded if one or more correlation coefficients are near +/-1

[Correlation matrix for C1 to C5; off-diagonal entries in extracted order: -.328  -.39  .281  .251  .174  .03  .402  .215  .117  -.207]

Multiple Regression – correlation matrix example
•  Confounding can also be shown by omitting x variables from the regression equation and performing a new regression analysis
  –  If the x coefficients (or t values) have changed significantly, then the coefficients are confounded
•  Performing another regression analysis with x1 omitted results in the following equation:
•  vs. our original regression analysis results:

A note about non-linear terms
•  Non-linear terms can be introduced to produce an equation that has better predictive power, but doing so often increases the risk of introducing correlations among the x variables
•  An analysis of residuals (using a normal-scores plot, a plot of residuals against ypred, or a plot of residuals against run number) can help to validate our assumptions of normality, linearity and experiment repeatability (randomness)

Multiple Regression Analysis - Application to Quality
•  Quality-related issues are often the result of the combined effects of multiple variables that do not act independently
•  Regression analysis is useful in screening: if a variable is not correlated with another and it causes little change to y, it can probably be excluded from the analysis
•  Problems associated with correlated variables, non-measured variables and equation form are often overlooked – bad idea!

Week 8 Prep
Introduction to Design of Experiments

Design of Experiments
•  How can one use knowledge of product or process parameters and their interactions to optimize a product or process?
•  Experimental Design: a planned program for data collection for the purpose of obtaining answers to specific questions
•  A well-executed experiment can help to:
  –  minimize problems of causality
  –  reduce errors often seen in regression analysis
  –  avoid incorrect conclusions and decisions

Design of Experiments - Introduction
•  DoE: A method for experimentation
  –  An organized investigation into factors that can affect a product or process
  –  Identifies factors and the values at which factors are to be tested (factor levels), as well as which factor combinations (treatments) will be tested during the series of test trials
  –  Studies the effect of factors, or input variables, on a response (the response can be either a process or performance measure)

General DoE Methodology
•  Define the problem
•  Identify experiment components
•  Design the experiment
•  Perform the experiment
•  Select an analysis method
•  Analyze the data
•  Act on results

Design of an Experiment
•  What constitutes a designed experiment?
•  A clear purpose – problem to investigate
•  A description of the data
•  Data required, method of data collection
•  Data acquisition plan, test conditions, other influential factors that may affect the experiment
•  # factors, factor test levels, sample size, response variable, # trials
•  Analysis & results reporting procedure

Characteristics of good experiment design
•  Simple (set-up, carry out)
•  Straightforward to analyze
•  Results can be clearly communicated and appreciated
•  Insignificant factors, interactions and levels can be quickly eliminated
•  Unbiased, free of ambiguity
•  Results point towards a direction of improvement
•  Adequate randomization

Good experimental design - Randomization
•  Noise can affect treatment combination outcomes relative to each other (for example, humidity changes, wind etc.)
•  To minimize this effect, trials are randomized (prioritized) by running treatments out of order, which prevents systematic effects of noise
•  Blocking – a technique used to run series of replicates (either partially or completely) in a randomized fashion; trials within each block are randomized

Introduction of DoE Terminology
•  Interaction: when 2 variables have a joint effect on the response variable
•  Factor: a variable selected for study
•  Levels: selected values for each factor
•  Main effects: overall effect of factors on the response variable
•  Replicate: a repetition of the experiment, used to detect interactions among factors in small factorial experiments
•  Fully balanced experiment: each replicate contains the same number of observations for each combination of values (a factorial experiment requirement)

What should an experiment determine?
•  Which factors significantly affect the system
•  How factor magnitudes affect the system
•  Optimal levels for each factor
•  How to manipulate factors to control the response
•  The experiment should be capable of answering the questions of concern upon which the experiment is based and of making probability statements about the validity of the answers

One-Way Designs
•  One-way design – an experimental design for testing differences among several means
•  In a one-way design, we want to determine whether the means of random samples taken from k populations are significantly different from each other
•  H0: the k population means are equal
•  H1: at least two of the population means are unequal
•  We need to find the right test statistic

One-Way Designs
•  To get an appropriate test statistic, we need to make some assumptions:
  –  Samples are independent
  –  Each sample comes from a normally distributed population
  –  Variances of the k populations are equal
•  A test of whether the population means are different from one another can be based on a comparison of two different methods to calculate variance
•  This comparative test is called an F test
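
A minimal sketch of this comparison, assuming k samples of equal size n. The function name one_way_f and the data are hypothetical and not taken from the slides. The within-sample estimate is the mean of the k sample variances, the between-sample estimate is n times the variance of the k sample means, and F is their ratio:

```python
from statistics import mean, variance


def one_way_f(samples):
    """Return (F, numerator d.f., denominator d.f.) for k equal-size samples."""
    k = len(samples)
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    # Pooled within-sample variance: mean of the k sample variances.
    s_w2 = mean(variance(s) for s in samples)
    # Between-sample variance: variance of the k sample means.
    s_b2 = variance([mean(s) for s in samples])
    # Under H0 both n * s_b2 and s_w2 estimate the same population variance.
    f_ratio = n * s_b2 / s_w2
    return f_ratio, k - 1, k * (n - 1)


# Hypothetical data: k = 3 samples of n = 4 observations each.
samples = [
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0],
    [11.0, 10.0, 12.0, 9.0],
]
f_ratio, df_num, df_den = one_way_f(samples)
print(f"F = {f_ratio:.2f} with ({df_num}, {df_den}) d.f.")
```

The resulting F would then be compared with the tabulated critical value at the same degrees of freedom and the chosen significance level.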

Calculating variance of population means
•  Pooled within-sample variance (sW²):
  –  the mean of the k sample variances (estimates σ² directly from experimental data)
  –  note that sW² is called the MSE (mean square for error) in an ANOVA table
•  Between-sample variance (sB²):
  –  the variance of the k sample means
  –  If the k populations have the same mean, then the between-sample variance will estimate σ²/n
  –  If the k populations do not have the same mean, then the between-sample variance will estimate σ²/n plus a second component of variability associated with the differences among the means
  –  note that n·sB² is called the MSM (mean square for means) in an ANOVA table

F test: differences among population means
•  If H0 is true, the distribution of the random variable F is represented by the F distribution
•  If the k means differ, n·sB² will tend to exceed sW² and F will be greater than 1
•  A test of significance is performed by calculating the two variances, n times sB² and the pooled within-sample variance of the k samples, and then taking the ratio of these variances to find F: $F = n s_B^2 / s_W^2$

F test: differences among population means
•  This ratio (F) will have the F distribution when H0 is true, i.e. when there are no differences among the means
•  F is then compared with F0.05 or F0.01
•  d.f. for the numerator of F are k - 1; d.f. for the denominator are k(n - 1)

Analysis of Variance
•  We can use Analysis of Variance (an algorithm) and an ANOVA (analysis-of-variance) table to help us compute the variances required to analyze an experimental design
•  To perform an analysis of variance:
  –  calculate the sum of squares (the numerator of a given variance)
  –  divide the sum of squares by the appropriate d.f. to form variance estimates and calculate F
  –  Compare the F value with table values at k - 1 and k(n - 1) d.f.
at a significance level of 0.05 or 0.01 (F.05 or F.01) to determine if the k sample means are significantly different
  –  If the F calculated by the analysis of variance is greater than the table value, it can be concluded that the means are significantly different at that level of significance

ANOVA – Analysis of Variance Table

Source of variability   d.f.       Sum of squares   Mean square             F
Means                   k - 1      SSM              MSM = SSM/(k - 1)       MSM/MSE
Error                   k(n - 1)   SSE              MSE = SSE/(k(n - 1))
Total                   nk - 1     SST

•  Mean square terms refer to a sum of squares divided by its corresponding d.f.; mean squares are estimates of variance
•  The mean square for 'Error' estimates the pooled within-sample variance, sW² (it represents the background variability against which differences are compared)
•  The mean square for 'Means' is based on the between-sample variance, sB²
•  F gives the value of the test statistic used for testing the equality of the k sample means

Example ANOVA – one-way designs
•  For tests involving several means, we use the following sums of squares:
•  Correction term: C = (sum of all nk observations)² / nk
•  Total sum of squares: SST = sum of the squares of all nk observations - C
•  Sum of squares for means: SSM = (1/n) times the sum of the squares of the sample totals - C
•  Sum of squares for error: SSE = SST - SSM
An example of this will be done in class

Randomized Block designs
•  Sometimes we are looking to eliminate an extraneous source of variability so that the investigation/attention of the experiment can focus more specifically on one or more aspects of a comparison between samples
•  Extraneous variability can be eliminated by holding its value fixed in an experiment and repeating a one-way experiment, fixing the value of the extraneous variable at a different level in each repetition
•  This randomization assures that biases associated with other causes of variability will not affect the experimental results

Randomized block design – degrees of freedom

Source of variability   d.f.
Means                   a - 1
Blocks                  b - 1
Error                   (a - 1)(b - 1)
Total                   ab - 1

•  The dummy analysis of variance above omits sums of squares, mean squares and F
•  It is useful in designing an experiment as it lists all the factors and their respective d.f.
•  It also allows us to check whether there are sufficient d.f. for error
•  For moderate differences among population means, d.f. for error should be ≥ 30
•  If the d.f. for error is insufficient, the number of blocks can be increased

Randomized block designs – sums of squares
•  C is the square of the grand total of all observations divided by the number of observations, ab
•  SST (the total sum of squares) is the sum of squares of all ab observations minus C (same as in the one-way design)
•  To calculate the sum of squares for means or for blocks:
  –  Totals are found for each mean or block
  –  These totals are squared and summed
  –  The sum is then divided by the number of observations comprising each total
  –  C is then subtracted from the result

Randomized block designs – calculating mean square values
•  To calculate the mean square value for the means, blocks and the error, divide each sum of squares by the corresponding degrees of freedom
•  MSMa = SSMa / (a - 1)
•  MSMb = SSMb / (b - 1)
•  MSE = SSE / ((a - 1)(b - 1))

Randomized block designs – calculating F values
•  Our mean square values MSMa and MSMb are then used to calculate two F statistics, Fa and Fb
•  Fa = MSMa/MSE
•  Fb = MSMb/MSE
•  These F values can then be compared to our critical F values at a given level of significance to determine whether there is a significant difference among the means and whether there is a significant difference among the blocks

Randomized block designs – sums of squares
•  If blocking is effective in eliminating the variability associated with an extraneous variable, then the MSE will be reduced
•  A smaller error term makes the F test more sensitive to small differences among the means

Factorial Experiments
•  Factorial experiment: an experiment in which each level of one factor is combined with each level of
every other factor to obtain all possible treatment combinations at which the trials are to be performed
•  There are many multi-factor experimental designs to suit different situations
•  The 2^k factorial design, where k factors are studied at two different factor levels to observe their effect on a response, is a useful method to help in the selection of product and process parameters

Factorial Designs – Joint Effects
•  When two variables have a joint effect on the response variable, there is an interaction between them
•  Interactions between factors exist if the combined effect of two factors is much more or much less than the sum of the effects caused by the individual factors alone
•  The overall effects of these factors on the response are called main effects
•  When significant interaction exists, main effects calculations have no meaning. If there is no interaction (the joint effect is additive), then the interaction effects = 0

Factorial Designs – Replicates
•  Factorial designs involve using replicates (balanced repetitions of an experiment)
•  Replicates allow us to estimate the interaction between factors (the variables studied)
•  Each replicate must be fully balanced: it must contain the same number of observations for each combination of variables

Two-Factor Designs – ANOVA summary
•  Two-factor designs are like a randomized block design except that the design is replicated (2 or more balanced repetitions)

Source of variability   d.f.
Replicates              r - 1
Factor A                a - 1
Factor B                b - 1
Interaction             (a - 1)(b - 1)
Error                   (ab - 1)(r - 1)
Total                   rab - 1

Interpreting Experiment Results
•  Many experiment results are not clear
•  Replicates may differ substantially, treatment combinations may overlap, and there is uncertainty about whether differences are 'true' or are a result of experimental variability
•  To overcome this uncertainty, we calculate the effect resulting from the individual factors and their interactions with each other to determine if effects are significant ...
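
As a rough sketch of such an effect calculation, consider a replicated 2 x 2 factorial with coded levels (-1 = low, +1 = high). The responses, the dictionary runs and the helper effect are hypothetical illustrations, not taken from the slides. Each effect is the mean response where its contrast is +1 minus the mean response where it is -1, and the interaction contrast is the product of the two coded levels:

```python
from statistics import mean

# Hypothetical replicated 2 x 2 factorial: (level of A, level of B) mapped to
# the responses observed in r = 2 replicates. All values are made up.
runs = {
    (-1, -1): [20.0, 22.0],
    (+1, -1): [30.0, 32.0],
    (-1, +1): [25.0, 27.0],
    (+1, +1): [45.0, 47.0],
}


def effect(contrast):
    """Mean response where the contrast is +1 minus mean where it is -1."""
    high = [y for levels, ys in runs.items() if contrast(levels) > 0 for y in ys]
    low = [y for levels, ys in runs.items() if contrast(levels) < 0 for y in ys]
    return mean(high) - mean(low)


effect_a = effect(lambda lv: lv[0])           # main effect of factor A
effect_b = effect(lambda lv: lv[1])           # main effect of factor B
effect_ab = effect(lambda lv: lv[0] * lv[1])  # A x B interaction effect

print(f"A: {effect_a}, B: {effect_b}, AB: {effect_ab}")
```

A non-zero interaction estimate on its own does not show significance; that judgment would still rest on the ANOVA F tests against the error mean square.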