# 093010 Linear Regression: Curve Fitting


## Curve Fitting

Sometimes data just cannot be fit to a straight line. However, we can fool the data into thinking it is a line by plotting transformed variables:

- Function: $y = ax^2 + b$. Plot: $y$ vs. $x^2$
- Function: $y^2 = a/x + b$. Plot: $y^2$ vs. $1/x$ (or $y$ vs. $1/x^{1/2}$)

However, there are three plots that should be tried first, before fooling around with more exotic ones:

1) linear: $y = mx + b$
2) exponential (or semi-log): $y = a e^{bx}$
3) logarithmic (power law): $y = a x^b$

One of these three plots should reveal any simple relationship between the dependent and independent variables. All three should be analyzed using the method of least squares.

## Method of Least Squares

Data generally consist of ordered pairs $(x_i, y_i)$ of independent ($x_i$) and dependent ($y_i$) variables which we have determined will be well represented by some line:

$$y_i = a x_i + b$$

If the fit were perfect, we could find $a, b$ such that $y_i - (a x_i + b) = 0$ for all $(x_i, y_i)$, but due to the realities of data, this never happens. Instead, we try to find $a, b$ so that

$$\sum_i d_i^2, \qquad d_i = y_i - (a x_i + b),$$

is minimized; $d_i$ is simply the vertical distance from the line to the $i$th point. Intuitively, we can agree that the "best choice" of $(a, b)$ is the one for which the deviations are minimized. Mathematically, this is easier if we look at $d_i^2$,

$$\sum_i d_i^2 = \sum_i \left[ y_i - (a x_i + b) \right]^2,$$

because we don't care whether $d_i$ is positive or negative; we just want the net distance. We see that

$$\phi(a, b) = \sum_i d_i^2,$$

so the total deviation is a function of $a$ and $b$, i.e., of our choice of parameters for the line. We want to minimize $\phi(a, b)$ to find the best values of $a, b$ for this data set.

How to do it? Let's look at a simpler problem first. Assume for a second that we knew the line should go through the origin, $y_i = a x_i$. Then

$$\phi(a) = \sum_i d_i^2 = \sum_i (y_i - a x_i)^2$$

is a function of one variable. To find an extremum, we just set the derivative to zero:

$$\frac{d\phi}{da} = 0 = \sum_i 2 (y_i - a x_i)(-x_i)$$

$$0 = -2 \left[ \sum_i x_i y_i - a \sum_i x_i^2 \right]$$

$$a = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$$

So the best $a$ is just a ratio of sums of products of the data.
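The through-origin result, $a = \sum_i x_i y_i / \sum_i x_i^2$, can be coded directly. A minimal sketch; the data values below are made up for illustration:

```python
def fit_through_origin(xs, ys):
    """Best slope a for the model y = a*x, from minimizing sum (y_i - a*x_i)^2."""
    sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x_i * y_i
    sxx = sum(x * x for x in xs)              # sum of x_i^2
    return sxy / sxx

# Made-up data lying roughly on y = 2x:
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
a = fit_through_origin(xs, ys)  # 59.7 / 30 = 1.99, close to the underlying slope 2
```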
For the more general case, $\phi(a, b)$, we need $\frac{\partial \phi}{\partial a} = \frac{\partial \phi}{\partial b} = 0$. If we do this, we get:

$$a = \frac{N \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N \sum_i x_i^2 - \left( \sum_i x_i \right)^2}, \qquad b = \frac{\sum_i x_i^2 \sum_i y_i - \sum_i x_i \sum_i x_i y_i}{N \sum_i x_i^2 - \left( \sum_i x_i \right)^2}$$

If we let

$$S_x = \frac{1}{N} \sum_i x_i, \quad S_y = \frac{1}{N} \sum_i y_i, \quad S_{xx} = \frac{1}{N} \sum_i x_i^2, \quad S_{xy} = \frac{1}{N} \sum_i x_i y_i,$$

then we can write these as (Appendix A.1)

$$a = \frac{S_{xy} - S_x S_y}{S_{xx} - S_x^2}, \qquad b = \frac{S_{xx} S_y - S_x S_{xy}}{S_{xx} - S_x^2}$$

The next question should be: how well does the equation actually fit a straight line, and how well do $a, b$ represent that line? We already have a measure of the errors,

$$\phi(a, b) = \sum_i d_i^2 = \sum_i \left[ y_i - (a x_i + b) \right]^2,$$

and the average squared deviation at each point is then

$$\sigma^2 = \frac{\phi(a, b)}{N} = \frac{1}{N} \sum_i \left[ y_i - (a x_i + b) \right]^2$$

The error in each fitted parameter is related to the standard deviation of the fit. If we evaluate these, we get the following simple results:

$$\sigma_a^2 = \frac{\sigma^2}{N \left( S_{xx} - S_x^2 \right)}, \qquad \sigma_b^2 = \frac{S_{xx} \, \sigma^2}{N \left( S_{xx} - S_x^2 \right)}$$

The proper answer is then reported as $a \pm \sigma_a$ and $b \pm \sigma_b$, with overall deviation $\sigma$.

How do we know if a straight line is the best representation of the data? Examine the "correlation" coefficient. We started the process by assuming our dependent variable ($y_i$) was linear with respect to the independent variable ($x_i$):

$$d_i = a x_i + b - y_i \;\longrightarrow\; a x_i + b = y_i$$

We might equally have written

$$d_i' = a' y_i + b' - x_i \;\longrightarrow\; a' y_i + b' = x_i$$

We can show that

$$a' = \frac{S_{xy} - S_x S_y}{S_{yy} - S_y^2}, \qquad b' = \frac{S_{yy} S_x - S_y S_{xy}}{S_{yy} - S_y^2},$$

where

$$S_{yy} = \frac{1}{N} \sum_i y_i^2, \qquad S_y = \frac{1}{N} \sum_i y_i,$$

and the two results are clearly related: $a x_i + b = y_i$ implies $x_i = \frac{y_i}{a} - \frac{b}{a}$, so for a perfect fit $a' = \frac{1}{a}$ and $b' = -\frac{b}{a}$. If the two lines were perfect representations of the data, the two fits would be perfectly "correlated":

$$a a' = 1$$

However, if no correlation exists, $a a' \to 0$. We define a parameter $r$, called the correlation coefficient,

$$r = \sqrt{a a'} = \frac{S_{xy} - S_x S_y}{\left[ \left( S_{xx} - S_x^2 \right) \left( S_{yy} - S_y^2 \right) \right]^{1/2}},$$

and tables have been developed to tell us the likelihood (probability), for a given $r$, that the apparent relationship between the data points is no better than random chance.
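The general-case formulas in the $S$ notation (slope, intercept, and their uncertainties) can be collected into one routine. A minimal sketch; the function name and data values are illustrative, not from the original notes:

```python
def least_squares_line(xs, ys):
    """Fit y = a*x + b; return a, b and their uncertainties sigma_a, sigma_b."""
    n = len(xs)
    sx = sum(xs) / n                              # S_x
    sy = sum(ys) / n                              # S_y
    sxx = sum(x * x for x in xs) / n              # S_xx
    sxy = sum(x * y for x, y in zip(xs, ys)) / n  # S_xy
    denom = sxx - sx * sx                         # S_xx - S_x^2
    a = (sxy - sx * sy) / denom
    b = (sxx * sy - sx * sxy) / denom
    # sigma^2 = phi(a, b) / N, the average squared deviation of the fit
    var = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / n
    sigma_a = (var / (n * denom)) ** 0.5
    sigma_b = (var * sxx / (n * denom)) ** 0.5
    return a, b, sigma_a, sigma_b

# Made-up data scattered around y = 2x + 1:
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
a, b, sigma_a, sigma_b = least_squares_line(xs, ys)  # a = 1.99, b = 1.04
```

The result would then be quoted as $a \pm \sigma_a$ and $b \pm \sigma_b$.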
Probability that an observed $r$ larger than the tabulated value arises from random chance:

| $N$ | $P = 0.1$ | $P = 0.01$ | $P = 0.001$ |
|-----|-----------|------------|-------------|
| 3   | .988      | 1.00       | 1.00        |
| 5   | .805      | .959       | .992        |
| 10  | .549      | .765       | .872        |
| 40  | .264      | .403       | .502        |
| 100 | .168      | .259       | .327        |

If the correlation coefficient is quite low, there exists no dependent-independent relationship between the variables.
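As a quick illustration of how such a table is used (a sketch with made-up data; the cutoff 0.549 is the standard $N = 10$, $P = 0.1$ critical value):

```python
def corr_coeff(xs, ys):
    """r = (S_xy - S_x*S_y) / sqrt((S_xx - S_x^2) * (S_yy - S_y^2))."""
    n = len(xs)
    sx = sum(xs) / n
    sy = sum(ys) / n
    sxx = sum(x * x for x in xs) / n
    syy = sum(y * y for y in ys) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    return (sxy - sx * sy) / ((sxx - sx * sx) * (syy - sy * sy)) ** 0.5

xs = list(range(10))                        # N = 10 points
linear_ys = [2 * x + 1 for x in xs]         # perfectly linear data
noisy_ys = [5, 1, 4, 2, 8, 3, 7, 0, 6, 9]   # made-up values with no trend

r_linear = corr_coeff(xs, linear_ys)  # 1.0: far above every N = 10 cutoff
r_noisy = corr_coeff(xs, noisy_ys)    # about 0.37: below 0.549, so consistent
                                      # with random chance at the 10% level
```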

## This note was uploaded on 12/29/2011 for the course CHE 10, taught by Professor F. Doyle, during the Fall '08 term at UCSB.
