Regression_analysis

# Regression_analysis - Regression Analysis Author: John M....


Regression Analysis

Author: John M. Cimbala, Penn State University

Latest revision: 12 September 2007

## Introduction

Consider a set of $n$ measurements of some variable $y$ as a function of another variable $x$. Typically, $y$ is some measured output as a function of some known input, $x$. Recall that the linear correlation coefficient is used to determine if there is a trend. If there is a trend, regression analysis is useful. Regression analysis is used to find an equation for $y$ as a function of $x$ that provides the best fit to the data.

## Linear regression analysis

Linear regression analysis is also called linear least-squares fit analysis. The goal of linear regression analysis is to find the "best fit" straight line through a set of $y$ vs. $x$ data. The technique for deriving equations for this best-fit or least-squares fit line is as follows:

- An equation for a straight line that attempts to fit the data pairs is chosen as $Y = ax + b$.
- In the above equation, $a$ is the slope ($a = dy/dx$; most of us are more familiar with the symbol $m$ rather than $a$ for the slope of a line), and $b$ is the y-intercept, the $y$ location where the line crosses the $y$ axis (in other words, the value of $Y$ at $x = 0$).
- An upper case $Y$ is used for the fitted line to distinguish the fitted data from the actual data values, $y$.
- In linear regression analysis, coefficients $a$ and $b$ are optimized for the best possible fit to the data.
- The optimization process itself is actually very straightforward:
- For each data pair $(x_i, y_i)$, error $e_i$ is defined as the difference between the predicted or fitted value and the actual value: $e_i$ = error at data pair $i$, or $e_i = Y_i - y_i = a x_i + b - y_i$. $e_i$ is also called the residual. Note: Here, what we call the actual value does not necessarily mean the "correct" value, but rather the value of the actual measured data point.
- We define $E$ as the sum of the squared errors of the fit, a global measure of the error associated with all $n$ data points. The equation for $E$ is

  $$E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( a x_i + b - y_i \right)^2 .$$

- It is now assumed that the best fit is the one for which $E$ is the smallest.
- In other words, coefficients $a$ and $b$ that minimize $E$ need to be found. These coefficients will be the ones that create the best-fit straight line $Y = ax + b$.
- How can $a$ and $b$ be found such that $E$ is minimized? Well, as any good engineer or mathematician knows, to find a minimum (or maximum) of a quantity, that quantity is differentiated, and the derivative is set to zero.
- Here, two partial derivatives are required, since $E$ is a function of two variables, $a$ and $b$. Therefore, we set $\dfrac{\partial E}{\partial a} = 0$ and $\dfrac{\partial E}{\partial b} = 0$.
- After some algebra, which can be verified, the following equations result for coefficients $a$ and $b$:

  $$a = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}
  \quad \text{and} \quad
  b = \frac{\left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i \right) - \left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} x_i y_i \right)}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} .$$
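
The "some algebra" step can be sketched as follows. This is one possible route, not necessarily the one the author had in mind; it assumes the fitted line $Y = ax + b$, the residual $e_i = a x_i + b - y_i$, and sums running over $i = 1, \dots, n$.

```latex
% Differentiate E = \sum (a x_i + b - y_i)^2 with respect to each coefficient
\begin{align}
\frac{\partial E}{\partial a} &= 2 \sum_{i=1}^{n} \left( a x_i + b - y_i \right) x_i = 0, \\
\frac{\partial E}{\partial b} &= 2 \sum_{i=1}^{n} \left( a x_i + b - y_i \right) = 0.
\end{align}
% Divide by 2 and collect terms: two linear equations in the unknowns a and b
\begin{align}
a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} x_i y_i, \\
a \sum_{i=1}^{n} x_i + n\, b &= \sum_{i=1}^{n} y_i.
\end{align}
```

Solving this pair of equations for $a$ and $b$ (by elimination, for instance) yields the closed-form coefficients, with the common denominator $n \sum x_i^2 - \left( \sum x_i \right)^2$. That denominator is zero only when all the $x_i$ are equal, in which case no unique best-fit line exists.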
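
The closed-form least-squares result can also be checked numerically. The following is a minimal Python sketch, assuming the standard formulas $a = \left( n \sum x_i y_i - \sum x_i \sum y_i \right) / \left( n \sum x_i^2 - (\sum x_i)^2 \right)$ and $b = \left( \sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i \right) / \left( n \sum x_i^2 - (\sum x_i)^2 \right)$. The function names `linear_least_squares` and `sum_squared_error` and the sample data are hypothetical and do not come from the original notes.

```python
from typing import Sequence, Tuple


def linear_least_squares(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (a, b) for the best-fit line Y = a*x + b (hypothetical helper)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("at least two data pairs are needed")

    # The four sums that appear in the closed-form expressions.
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    # Common denominator; it vanishes when every x value is the same.
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_xx * sum_y - sum_x * sum_xy) / denom
    return a, b


def sum_squared_error(x: Sequence[float], y: Sequence[float], a: float, b: float) -> float:
    """Return E, the sum over all data pairs of (a*x_i + b - y_i)**2."""
    return sum((a * xi + b - yi) ** 2 for xi, yi in zip(x, y))


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    x_data = [1.0, 2.0, 3.0]
    y_data = [2.0, 2.0, 4.0]
    a_fit, b_fit = linear_least_squares(x_data, y_data)
    print("slope a =", a_fit, "intercept b =", b_fit)
    print("E at the fit =", sum_squared_error(x_data, y_data, a_fit, b_fit))
```

A quick sanity check is to nudge `a` or `b` slightly away from the returned values and confirm that `sum_squared_error` grows, which is what minimizing $E$ implies.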


## This note was uploaded on 04/05/2008 for the course ME 345 taught by Professor Staff during the Spring '08 term at Pennsylvania State University, University Park.

