Regression_analysis - Regression Analysis Author: John M....

Regression Analysis
Author: John M. Cimbala, Penn State University
Latest revision: 12 September 2007

Introduction

Consider a set of n measurements of some variable y as a function of another variable x. Typically, y is some measured output as a function of some known input, x. Recall that the linear correlation coefficient is used to determine if there is a trend. If there is a trend, regression analysis is useful. Regression analysis is used to find an equation for y as a function of x that provides the best fit to the data.

Linear regression analysis

Linear regression analysis is also called linear least-squares fit analysis. The goal of linear regression analysis is to find the "best fit" straight line through a set of y vs. x data. The technique for deriving equations for this best-fit or least-squares fit line is as follows:

o An equation for a straight line that attempts to fit the data pairs is chosen as $Y = ax + b$.
o In the above equation, a is the slope ($a = dy/dx$; most of us are more familiar with the symbol m rather than a for the slope of a line), and b is the y-intercept, the y location where the line crosses the y axis (in other words, the value of Y at x = 0).
o An upper case Y is used for the fitted line to distinguish the fitted data from the actual data values, y.
o In linear regression analysis, coefficients a and b are optimized for the best possible fit to the data.
o The optimization process itself is actually very straightforward:
o For each data pair $(x_i, y_i)$, error $e_i$ is defined as the difference between the predicted or fitted value and the actual value: $e_i$ = error at data pair $i$, or $e_i = Y_i - y_i = a x_i + b - y_i$. $e_i$ is also called the residual. Note: Here, what we call the actual value does not necessarily mean the "correct" value, but rather the value of the actual measured data point.
o We define E as the sum of the squared errors of the fit, a global measure of the error associated with all n data points.
  The equation for E is
  $$E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( a x_i + b - y_i \right)^2 .$$
o It is now assumed that the best fit is the one for which E is the smallest.
o In other words, coefficients a and b that minimize E need to be found. These coefficients will be the ones that create the best-fit straight line $Y = ax + b$.
o How can a and b be found such that E is minimized? Well, as any good engineer or mathematician knows, to find a minimum (or maximum) of a quantity, that quantity is differentiated, and the derivative is set to zero.
o Here, two partial derivatives are required, since E is a function of two variables, a and b. Therefore, we set $\partial E / \partial a = 0$ and $\partial E / \partial b = 0$.
o After some algebra, which can be verified, the following equations result for coefficients a and b:
  $$a = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}$$
  and
  $$b = \frac{\left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i \right) - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} x_i y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} .$$
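The notes leave the "some algebra" step to the reader. One common route, sketched here as an illustration and not necessarily the author's own, is to differentiate E with respect to each coefficient, set both derivatives to zero, and solve the resulting pair of linear equations (often called the normal equations) for a and b. All sums run over i = 1 to n.

```latex
\begin{align*}
\frac{\partial E}{\partial a} &= 2 \sum_{i=1}^{n} \left( a x_i + b - y_i \right) x_i = 0
  &&\Rightarrow\quad a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i, \\
\frac{\partial E}{\partial b} &= 2 \sum_{i=1}^{n} \left( a x_i + b - y_i \right) = 0
  &&\Rightarrow\quad a \sum_{i=1}^{n} x_i + b\, n = \sum_{i=1}^{n} y_i .
\end{align*}
% Solving this 2x2 linear system (for example by Cramer's rule) gives
\begin{align*}
a &= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, &
b &= \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2} .
\end{align*}
```

The common denominator is zero only when every x value is identical, in which case no unique best-fit line exists.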
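The closed-form result can be checked numerically. The following is a minimal Python sketch, assuming the standard least-squares expressions for a and b in terms of sums over the data; the function names `linear_fit` and `sum_squared_error` and the sample data are hypothetical and do not come from the notes.

```python
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) for the least-squares line Y = a*x + b.

    Hypothetical helper: evaluates the closed-form sums directly.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")

    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    # Common denominator; zero when all x values are identical (or n < 2)
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")

    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_xx * sum_y - sum_x * sum_xy) / denom
    return a, b


def sum_squared_error(x: Sequence[float], y: Sequence[float],
                      a: float, b: float) -> float:
    """Return E, the sum of squared residuals e_i = a*x_i + b - y_i."""
    return sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, y))


# Made-up data lying exactly on Y = 2x + 1, so the fit should recover it
x_data = [0.0, 1.0, 2.0, 3.0]
y_data = [1.0, 3.0, 5.0, 7.0]
a, b = linear_fit(x_data, y_data)
print(a, b)  # prints "2.0 1.0"
```

For noisy data, evaluating `sum_squared_error` at the fitted (a, b) and again at slightly perturbed coefficients is a quick way to confirm that the fitted pair gives the smaller E.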

This note was uploaded on 04/05/2008 for the course ME 345 taught by Professor Staff during the Spring '08 term at Pennsylvania State University, University Park.
