Engr 9397 – Week 8 Preparation
Correlation Analysis
Multiple Regression Analysis
Design of Experiments
• Below are some slides on correlation that we didn't get to last week. I also included an overview of multiple regression as well as an overview of Design of Experiments, so you will know more about what Leonard talks about next week. Take a look through all of them, as correlation analysis and multiple regression lead into the Design of Experiments discussion.

Correlation Analysis
• Correlation analysis determines the degree of linear interrelation or association between 2 random variables
• All correlation measures (ρ) are dimensionless and scaled to lie in the range $-1 \le \rho \le 1$
• For uncorrelated data, ρ = 0

Correlation Analysis
• Correlation does not provide evidence of a causal relationship between two variables
• Two variables may be correlated because one causes the other (e.g., mixing produces heat) or because they share a common factor that influences both (e.g., two mechanical parts are affected by vibrations from a common source)

Difference between Correlation and Regression
• Correlation of x vs. y = correlation of y vs. x
• Regression of x vs. y ≠ regression of y vs. x
• In regression, a line is fitted to explain y from x (or x from y), and the relationship cannot be reversed unless the fit is perfect
• A regression line can be used to predict y values

Coefficient of Correlation (r)
• A coefficient of correlation quantifies and tests the strength of a monotonic relationship between two variables x and y
• Pearson's correlation coefficient, r, measures the linear correlation (association) between x and y
• r is invariant to changes in scale and is dimensionless: it is obtained by standardizing each variable (subtracting its mean and dividing by its sample standard deviation), summing the products of the standardized values, and dividing by n − 1

Coefficient of correlation
• This scale-invariant measure of the strength of a linear relationship can be constructed by comparing the variability "explained" by the regression to the original variability of y
• For random variables: $\rho = \dfrac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$
• For samples: $r = \dfrac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$, where $SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$, $SS_{xx} = \sum (x_i - \bar{x})^2$ and $SS_{yy} = \sum (y_i - \bar{y})^2$

Correlation's relation to the Least-Squares line
• The strength of a linear relationship can be measured by the slope of the least-squares regression line, but the slope depends on the scale of measurement for y (and x)
• A scale-invariant measure of linear relationship can be constructed by comparing the variability explained by the regression to the original variability of the y values, resulting in the population correlation coefficient
• The two are related by $b_1 = r\sqrt{SS_{yy}/SS_{xx}}$, where $b_1$ is the least-squares slope

What does r tell us about the least-squares fit line?
• If r takes on a positive value close to 1, the data points (xi, yi) are nearly collinear, and the slope of the corresponding least-squares line is positive
• If r takes on a negative value close to −1, the data points (xi, yi) are nearly collinear, and the slope of the corresponding least-squares line is negative
• If r takes on a value of zero, there is little or no linear relationship between x and y

Remember our previous example
• The square of the correlation coefficient, r², is the proportion of the total $SS_{yy}$ accounted for by the regression
• It compares the variability explained by the regression ($SS_{yy}$ minus the residual sum of squares, SSE) with the original variability of the y values
  – i.e., $r^2 = (SS_{yy} - SSE)/SS_{yy}$ is the proportional reduction in variability associated with the regression, relative to the original variability of y

Example – finding r
• The following data give the hardness of six specimens of cold-reduced steel annealed at different temperatures:
    Temperature   950    1090   1370   1180   900    1100
    Hardness      81.7   77.6   53.8   62.4   85.3   71.2
• Find the correlation coefficient between the annealing temperature and the hardness. What percentage of the variation in hardness values is explained by the linear relationship with annealing temperature?

Coefficient of correlation
• The correlation coefficient can form the basis of a statistical test of independence
• H0: ρ = 0 (the yi are independent, identically distributed normal random variables, not dependent on xi)
• H1: ρ ≠ 0
• The test statistic t is defined as: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$
• H0 is rejected if |t| > tcrit, at n − 2 degrees of freedom

Significance Tests and Confidence Intervals for r
• For our test statistic, we use the t-distribution, with t as defined above
• We reject H0 if |t| > tcrit
• tcrit is the value of the t-distribution with n − 2 d.f. and a probability of exceedance of α/2
• In-class example

Application to Quality
• Quality-related issues are often the result of the combined effects of multiple variables that do not act independently
• Regression analysis is useful for screening: if a variable is not correlated with another and causes little change in y, it can probably be excluded from the analysis
• Problems associated with correlated variables, non-measured variables and equation form are often overlooked – bad idea!

Week 8 Prep
Introduction to Multiple Regression

Multiple Regression
• In multiple linear regression, the dependent variable y continues to assume random values, but we are dealing with k fixed independent (non-random) variables (regressors)
• In simple regression, we used the least-squares method to find the line of best fit to the data
• In multiple regression, where more than two variables are involved, we apply the method of least squares to fit a hyperplane to a set of data points so that y can be predicted for a given set of values of x1, x2, …, xk with minimum error

Multiple Regression continued
• The fitted surface has the equation $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$, which predicts the value of y when the independent x variables take on a given set of values
• The b coefficients can be found by the least-squares method (complicated without the help of software)

Multiple Regression
• Let's look at a sample Minitab result and go through what it means
• Example: 5 variables (conveyor angle, temperature, flux concentration, conveyor speed, and preheat temperature) involved in the soldering process of a circuit board are measured over 25 separate runs of 5 boards. Each board contains 460 solder joints. We are interested in examining the effects of these 5 variables on the number of defective solder joints per 100 joints inspected

Multiple Regression Analysis
• We get the following results from a linear regression analysis:
    Column     Coefficient   Std. dev. of coef.   t-ratio = coef/std. dev.
    Constant   -1.7885       0.9655               -1.85
    C1          0.21357      0.0363                5.88
    C2         -0.00959      0.001873             -0.51
    C3          0.898        1.047                 0.86
    C4          0.1216       0.2167                0.56
    C5          0.0001695    0.0009457             0.18

    s = 0.05806     R-squared = 73.6%

    Observation flagged in the output (R = large standardized residual):
    Row   C1    C6 (y)   Pred. y   Std. dev. of pred. y   Residual   St. resid.
    22    6.7   0.287    0.1104    0.022                  0.1766     3.29R

Regression analysis result interpretation
• Regression line equation: the fitted linear least-squares equation, which gives the coefficient values
• Table of coefficients: gives the coefficient values and the t values needed to perform a significance test on each coefficient – each coefficient's t value can be compared to tcrit at a given level of significance and n − k − 1 d.f.
• s: the standard deviation of the residuals, estimated from the mean-squared deviation of the y values from the least-squares equation, $s = \sqrt{SSE/(n-k-1)}$

R-Squared value
• Similar to r, R is the multiple correlation coefficient
• R² measures the strength of the linear relationship of the x's with the dependent variable y, by comparing the variance of the y values with the variance of the residuals: $R^2 = 1 - SSE/SS_{yy}$
• R² is expressed as the percentage reduction in y variance that is attributable to the relationship between y and the predictor variables (xi)

Correlation among x variables
• Coefficients are subject to random error, which can be quantified by a confidence interval (assuming the residuals are normally distributed)
• A more serious source of coefficient error results from correlations among the x variables – when x variables are correlated, we cannot separate their individual effects on y
• When independent variables are correlated, their effects are said to be confounded, and it is not possible to separate their individual effects
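The least-squares hyperplane and the s and R² values in the Minitab output can be reproduced in a minimal way by solving the normal equations $(X^T X)\,b = X^T y$. This sketch is illustrative only. The helper names are hypothetical, and Minitab's own numerics are not reproduced (library routines typically use more stable QR-based methods). The temperature/hardness pairing for the annealing example is an assumption, based on reading the flattened data table in order.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (e.g. perfectly correlated x variables)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_least_squares(xs, y):
    """Return [b0, b1, ..., bk] for y ~ b0 + b1*x1 + ... + bk*xk.

    xs is a list of rows; each row holds the k regressor values for one run.
    """
    design = [[1.0] + list(row) for row in xs]  # leading 1 gives the intercept b0
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


def fit_summary(xs, y, b):
    """Return (s, R^2): residual standard deviation and coefficient of determination."""
    n, k = len(y), len(b) - 1
    y_hat = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in xs]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    y_bar = sum(y) / n
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    s = (sse / (n - k - 1)) ** 0.5  # s^2 = SSE / (n - k - 1)
    return s, 1 - sse / ss_yy       # R^2 = 1 - SSE / SS_yy


# Annealing example with one regressor; the pairing below is an assumption.
temperature = [[950], [1090], [1370], [1180], [900], [1100]]
hardness = [81.7, 77.6, 53.8, 62.4, 85.3, 71.2]
b = fit_least_squares(temperature, hardness)
s, r2 = fit_summary(temperature, hardness, b)
print(f"b0={b[0]:.2f}, b1={b[1]:.5f}, s={s:.3f}, R^2={r2:.3f}")
```

With a single regressor R² is simply r². Under the assumed pairing, this run also answers the annealing exercise: R² comes out near 0.94 and the slope is negative (r ≈ −0.97). The `ValueError` for a singular system is what perfectly correlated x variables look like numerically.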

Multiple Regression – confounding
• A correlation matrix (below) is used to investigate the degree of correlation among the independent x variables
• x variables are considered confounded if one or more correlation coefficients are near ±1
• Correlation matrix for C1–C5 (10 pairwise coefficients): −.328, −.39, .281, .251, .174, .03, .402, .215, .117, −.207

Multiple Regression – correlation matrix example
• Confounding can also be shown by omitting x variables from the regression equation and performing a new regression analysis – if the x coefficients (or t values) change significantly, then the coefficients are confounded
• Performing another regression analysis with x1 omitted results in the following equation:
• vs. our original regression analysis results:

A note about non-linear terms
• Non-linear terms can be introduced to produce an equation that has better predictive power, but doing so often increases the correlations among the x variables
• An analysis of residuals (using a normal-scores plot, a plot of residuals against ypred, or a plot of residuals against run number) can help to validate our assumptions of normality, linearity and experiment repeatability (randomness)
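The omitted-variable check described for the correlation matrix example can be illustrated with a minimal Python sketch. Refit with a variable left out, then see whether the remaining coefficients shift. The data are made up so that x2 closely tracks x1, and the helper names are hypothetical. With two regressors, the least-squares normal equations reduce to a 2×2 system in centered sums of products, solved here by Cramer's rule.

```python
from math import sqrt


def centered_sums(u, v):
    """Return SS_uv = sum((u_i - u_bar) * (v_i - v_bar))."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))


def corr(u, v):
    """Pearson correlation coefficient r = SS_uv / sqrt(SS_uu * SS_vv)."""
    return centered_sums(u, v) / sqrt(centered_sums(u, u) * centered_sums(v, v))


def fit_two(x1, x2, y):
    """Slopes (b1, b2) of y = b0 + b1*x1 + b2*x2 from the centered normal equations."""
    s11, s22, s12 = centered_sums(x1, x1), centered_sums(x2, x2), centered_sums(x1, x2)
    s1y, s2y = centered_sums(x1, y), centered_sums(x2, y)
    det = s11 * s22 - s12 ** 2  # shrinks toward 0 as |r(x1, x2)| approaches 1
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def fit_one(x, y):
    """Slope of the simple regression of y on x alone."""
    return centered_sums(x, y) / centered_sums(x, x)


# Hypothetical data: x2 nearly tracks x1, and y = 2*x1 + 1*x2 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 2, 3, 4, 6]
y = [2 * a + c for a, c in zip(x1, x2)]

r12 = corr(x1, x2)
b1, b2 = fit_two(x1, x2, y)
b2_alone = fit_one(x2, y)
print(f"r(x1, x2) = {r12:.3f}")
print(f"with x1:    b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"x1 omitted: b2 = {b2_alone:.3f}")
```

With these numbers r(x1, x2) ≈ 0.986. The full fit recovers b1 = 2 and b2 = 1, but dropping x1 pushes the x2 coefficient to about 2.62, because x2 now carries part of x1's effect. That shift is the signature of confounded coefficients.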
Multiple Regression Analysis – Application to Quality
• Quality-related issues are often the result of the combined effects of multiple variables that do not act independently
• Regression analysis is useful for screening: if a variable is not correlated with another and causes little change in y, it can probably be excluded from the analysis
• Problems associated with correlated variables, non-measured variables and equation form are often overlooked – bad idea!

Week 8 Prep
Introduction to Design of Experiments

Design of Experiments
• How can one use knowledge of product or process parameters and their interactions to optimize a product or process?
• Experimental design: a planned program of data collection for the purpose of obtaining answers to specific questions
• A well-executed experiment can help to:
  – minimize problems of causality
  – reduce errors often seen in regression analysis
  – avoid incorrect conclusions and decisions

Design of Experiments – Introduction
• DoE: a method for experimentation
  – An organized investigation into factors that can affect a product or process
  – Identifies the factors and the values at which they are to be tested (factor levels), as well as which factor combinations (treatments) will be tested during the series of test trials
  – Studies the effect of factors, or input variables, on a response (the response can be either a process or a performance measure)

General DoE Methodology
• Define the problem
• Identify experiment components
• Design the experiment
• Perform the experiment
• Select an analysis method
• Analyze the data
• Act on results

Design of an Experiment
What constitutes a designed experiment?
• A clear purpose – the problem to investigate
• A description of the data
• The data required, method of data collection, data acquisition plan, test conditions, and other influential factors that may affect the experiment
• Number of factors, factor test levels, sample size, response variable, number of trials
• Analysis and results-reporting procedure

Characteristics of good experiment design
• Simple (to set up and carry out)
• Straightforward to analyze
• Results can be clearly communicated and appreciated
• Insignificant factors, interactions and levels can be quickly eliminated
• Unbiased, free of ambiguity
• Results point towards a direction of improvement
• Adequate randomization

Good experimental design – Randomization
• Noise (e.g., humidity changes, wind) can affect the outcomes of treatment combinations relative to each other
• To minimize this effect, trials are randomized: treatments are run in a random order rather than a systematic one, which prevents noise from producing systematic effects
• Blocking – a technique used to run a series of replicates (either partially or completely) in a randomized fashion; trials within each block are randomized

Introduction to DoE Terminology
• Interaction: when 2 variables have a joint effect on the response variable
• Factor: a variable selected for study
• Levels: selected values for each factor
• Main effects: overall effect of factors on the response variable
• Replicate: a repetition of the experiment, used to detect interactions among factors in small factorial experiments
• Fully balanced experiment: each replicate contains the same number of observations for each combination of values (a factorial experiment requirement)

What should an experiment determine?
• Which factors significantly affect the system
• How factor magnitudes affect the system
• Optimal levels for each factor
• How to manipulate factors to control the response
• The experiment should be capable of answering the questions on which it is based, and of making probability statements about the validity of the answers

One-Way Designs
• One-way design – an experimental design for testing differences among several means
• In a one-way design, we want to determine whether the means of the k populations from which random samples are taken are significantly different from each other
• H0: the population means are all equal
• H1: at least two of the population means are unequal
• We need to find the right test statistic

One-Way Designs
• To get an appropriate test statistic, we need to make some assumptions:
  – Samples are independent
  – Each sample comes from a normally distributed population
  – Variances of the k populations are equal
• A test of whether the population means differ from one another can be based on a comparison of two different ways of estimating the variance
• This comparative test is called an F test

Calculating variance of population means
• Pooled within-sample variance ($s_W^2$):
  – the mean of the k sample variances (estimates $\sigma^2$ directly from the experimental data)
  – note that $s_W^2$ is called the MSE (mean square for error) in an ANOVA table
• Between-sample variance ($s_B^2$):
  – the variance of the k sample means
  – if the k samples have the same mean, then the between-sample variance will estimate $\sigma^2/n$
  – if the k samples do not have the same mean, then the between-sample variance will estimate $\sigma^2/n$ plus a second component of variability associated with the differences among the means
  – note that $n s_B^2$ is called the MSM (mean square for means) in an ANOVA table

F test: differences among population means
• If H0 is true, the random variable $F = n s_B^2 / s_W^2$ has the F distribution
• If the k means differ, $n s_B^2$ will exceed $s_W^2$ and F will tend to be greater than 1
• A significance test is performed by calculating the two variances, n times $s_B^2$ and the pooled within-sample variance of the k samples, and then taking the ratio of these variances to find F

F test: differences among population means
• This ratio (F) has the F distribution when H0 is true – i.e., when there are no differences among the means
• F is then compared with $F_{0.05}$ or $F_{0.01}$
• d.f. for the numerator of F are k − 1; d.f. for the denominator are k(n − 1)

Analysis of Variance
• We can use analysis of variance (an algorithm) and an ANOVA (analysis-of-variance) table to help compute the variances required to analyze an experimental design
• To perform an analysis of variance:
  – calculate the sums of squares (the numerators of the variances)
  – divide each sum of squares by the appropriate d.f. to form variance estimates, and calculate F
  – compare the F value with table values at k − 1 and k(n − 1) d.f. at a significance level of 0.05 or 0.01 ($F_{0.05}$ or $F_{0.01}$) to determine whether the k sample means are significantly different
  – if the F calculated by the analysis of variance is greater than the table value, it can be concluded that the means are significantly different at that level of significance

ANOVA – Analysis of Variance Table
    Source of variability   d.f.       Sum of squares   Mean square              F
    Means                   k − 1      SSM              MSM = SSM/(k − 1)        MSM/MSE
    Error                   k(n − 1)   SSE              MSE = SSE/[k(n − 1)]
    Total                   nk − 1     SST
• Mean square terms are sums of squares divided by their corresponding d.f. – mean squares are estimates of variance
• The mean square for 'Error' estimates the pooled within-sample variance, $s_W^2$ (it represents the background variability against which differences are compared)
• The mean square for 'Means' estimates the between-sample variance scaled by n, $n s_B^2$
• F gives the value of the test statistic used for testing the equality of the k sample means

Example ANOVA – one-way designs
• For tests involving several means, we use the following sums of squares:
  – Correction term: C = (sum of all nk observations)² / nk
  – Total sum of squares: SST = (sum of the squares of all nk observations) − C
  – Sum of squares for means: SSM = (1/n) × (sum of the squares of the sample totals) − C
  – Sum of squares for error: SSE = SST − SSM
• An example of this will be done in class

Randomized Block Designs
• Sometimes we want to eliminate an extraneous source of variability so that the experiment can focus more specifically on one or more aspects of a comparison between samples
• Extraneous variability can be eliminated by holding its value fixed in an experiment and repeating a one-way experiment, fixing the value of the extraneous variable at a different level in each repetition
• Randomizing the trials within each block assures that biases associated with other causes of variability will not affect the experimental results

Randomized block design – degrees of freedom
    Source of variability   d.f.
    Means                   a − 1
    Blocks                  b − 1
    Error                   (a − 1)(b − 1)
    Total                   ab − 1
• This dummy analysis of variance omits the sums of squares, mean squares and F
• It is useful in designing an experiment, as it lists all the factors and their respective d.f.
• It also allows us to check whether there are sufficient d.f. for error
• For moderate differences among population means, d.f. for error should be ≥ 30
• If the d.f. for error is insufficient, the number of blocks can be increased

Randomized block designs – sums of squares
• C is the square of the grand total of all observations divided by the number of observations, ab
• SST is the sum of squares of all ab observations minus C (same as for the one-way design)
• To calculate the sum of squares for means or for blocks:
  – totals are found for each mean or block
  – these totals are squared and summed
  – the sum is then divided by the number of observations comprising each total
  – C is then subtracted from the result
• The error sum of squares is found by subtraction: SSE = SST − SSMa − SSMb

Randomized block designs – calculating mean square values
• To calculate the mean square values for the means, the blocks and the error, divide each sum of squares by its corresponding d.f.
• MSMa = SSMa / (a − 1)
• MSMb = SSMb / (b − 1)
• MSE = SSE / ((a − 1)(b − 1))

Randomized block designs – calculating F values
• Our mean square values MSMa and MSMb are then used to calculate two F statistics, Fa and Fb:
  – Fa = MSMa / MSE
  – Fb = MSMb / MSE
• These F values can then be compared to the critical F values (with a − 1 or b − 1 numerator d.f. and (a − 1)(b − 1) denominator d.f.) at a given level of significance, to determine whether there is a significant difference among the means and whether there is a significant difference among the blocks

Randomized block designs – sums of squares
• If blocking is effective in eliminating the variability associated with an extraneous variable, then the MSE will be reduced
• A smaller error term makes the F test more sensitive to small differences among the means

Factorial Experiments
• Factorial experiment: an experiment in which each level of one factor is combined with each level of every other factor to obtain all possible treatment combinations at which the trials are to be performed
• There are many multi-factor experimental designs to suit different situations
• The $2^k$ factorial design, where k factors are each studied at two levels to observe their effect on a response, is a useful method to help in the selection of product and process parameters

Factorial Designs – Joint Effects
• When two variables have a joint effect on the response variable, there is an interaction between them
• An interaction between factors exists if the combined effect of two factors is greater or less than the sum of the effects caused by the individual factors alone
• The overall effects of these factors on the response are called main effects
• When significant interaction exists, main-effect calculations have little meaning on their own; if there is no interaction (the joint effect is additive), then the interaction effects = 0

Factorial Designs – Replicates
• Factorial designs involve using replicates (balanced repetitions of an experiment)
• Replicates allow us to estimate the interaction between factors (variables studied)
• Each replicate must be fully balanced – each replicate must contain the same number of observations for each combination of variables

Two-Factor Designs – ANOVA summary
• Two-factor designs are like a randomized block design except that the design is replicated (2 or more balanced repetitions)
    Source of variability   d.f.
    Replicates              r − 1
    Factor A                a − 1
    Factor B                b − 1
    Interaction             (a − 1)(b − 1)
    Error                   (ab − 1)(r − 1)
    Total                   rab − 1

Interpreting Experiment Results
• Many experiment results are not clear-cut
• Replicates may differ substantially, treatment combinations may overlap, and there is uncertainty about whether differences are 'true' or are a result of experimental variability
• To overcome this uncertainty, we calculate the effects resulting from the individual factors and their interactions with each other to determine whether the effects are significant ...
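The effect calculation described in the last slide can be sketched for a $2^k$ factorial using coded levels (−1 = low, +1 = high). An effect is the mean response where the factor's coded level (or, for an interaction, the product of the coded levels) is +1, minus the mean response where it is −1. The replicated 2² data and the helper names below are hypothetical. Whether each effect is significant would still have to be judged against experimental variability, for example with an analysis of variance.

```python
from itertools import combinations, product
from math import prod


def treatment_combinations(k):
    """All 2^k treatment combinations as tuples of coded levels (-1 = low, +1 = high)."""
    return list(product((-1, 1), repeat=k))


def effect(runs, factors):
    """Effect of one factor (one index) or an interaction (several indices).

    runs: list of (levels, response) pairs covering every treatment combination,
    each replicated the same number of times (a fully balanced design).
    """
    plus = [y for levels, y in runs if prod(levels[f] for f in factors) == 1]
    minus = [y for levels, y in runs if prod(levels[f] for f in factors) == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)


# Hypothetical 2^2 experiment with two replicates per treatment combination.
responses = {(-1, -1): [10, 12], (1, -1): [20, 22], (-1, 1): [14, 16], (1, 1): [30, 32]}
runs = [(levels, y) for levels in treatment_combinations(2) for y in responses[levels]]

for size in (1, 2):
    for factors in combinations(range(2), size):
        name = "".join("AB"[f] for f in factors)
        print(f"{name}: {effect(runs, factors):+.1f}")
```

For these numbers the effects are A = +13, B = +7 and AB = +3. The effect of A is 10 at the low level of B but 16 at the high level, so the two factors interact. `treatment_combinations(k)` also shows why a full factorial grows quickly: it lists all $2^k$ treatment combinations.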
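Returning to the one-way design, the correction-term recipe for the sums of squares (C, SST, SSM, SSE) and the resulting ANOVA quantities can be sketched as follows. The data are invented for illustration (k = 3 samples of n = 4), and `one_way_anova` is a hypothetical helper. The critical value $F_{0.05} = 4.26$ for (2, 9) d.f. is taken from a standard F table rather than computed.

```python
def one_way_anova(samples):
    """One-way ANOVA using the correction-term formulas.

    samples: list of k lists, each holding n observations (balanced design).
    Returns a dict with the sums of squares, d.f., mean squares and F.
    """
    k, n = len(samples), len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes every sample has the same size n")
    all_obs = [x for s in samples for x in s]
    c = sum(all_obs) ** 2 / (n * k)                   # correction term C
    sst = sum(x * x for x in all_obs) - c             # total sum of squares
    ssm = sum(sum(s) ** 2 for s in samples) / n - c   # sum of squares for means
    sse = sst - ssm                                   # sum of squares for error
    df_m, df_e = k - 1, k * (n - 1)
    msm, mse = ssm / df_m, sse / df_e
    return {"SSM": ssm, "SSE": sse, "SST": sst, "df_means": df_m,
            "df_error": df_e, "MSM": msm, "MSE": mse, "F": msm / mse}


samples = [[4, 5, 6, 5], [7, 8, 6, 7], [3, 4, 5, 4]]  # hypothetical data
table = one_way_anova(samples)
F_CRIT_05 = 4.26  # F table value for (2, 9) d.f. at the 0.05 level

print(f"Means: d.f.={table['df_means']}, SS={table['SSM']:.3f}, MS={table['MSM']:.3f}, F={table['F']:.2f}")
print(f"Error: d.f.={table['df_error']}, SS={table['SSE']:.3f}, MS={table['MSE']:.3f}")
print("Means differ at the 0.05 level" if table["F"] > F_CRIT_05 else "No significant difference")
```

For these numbers SSM ≈ 18.67 and SSE = 6.00, giving F = 14.0 > 4.26, so H0 (equal means) would be rejected at the 0.05 level.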
This note was uploaded on 03/29/2011 for the course ENGR 9397 taught by Professor Susanhunt during the Winter '11 term at Memorial University.