Curve Fitting

Sometimes data just cannot be fit to a straight line. However, we can fool the data into thinking it is a line:

Function: y = a x^2 + b      Plot: y vs x^2
Function: y^2 = a/x + b      Plot: y^2 vs 1/x, or y vs 1/x^(1/2)
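This linearizing trick is easy to check numerically. A minimal sketch in Python, with made-up data assumed to follow y = 3x^2 + 2: after transforming to u = x^2, the point-to-point slopes are constant, i.e. the transformed data lie on a line.

```python
# Hypothetical data assumed to follow y = a*x^2 + b with a = 3, b = 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0 * xi**2 + 2.0 for xi in x]

# "Fool the data": use u = x^2 as the independent variable and plot y vs u.
u = [xi**2 for xi in x]

# Slopes between consecutive transformed points are all equal,
# so y vs x^2 is a straight line with slope a = 3.
slopes = [(y[i+1] - y[i]) / (u[i+1] - u[i]) for i in range(len(u) - 1)]
print(slopes)  # [3.0, 3.0, 3.0]
```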
However, there are 3 plots that should be tried first before fooling around with more exotic plots:

1) linear: y = mx + b
2) exponential (or semilog): y = a exp(bx)
3) logarithmic (power law): y = a x^b

One of these 3 plots should reveal any simple relationship between the dependent and independent variables. All 3 should be analyzed using the method of least squares.

Method of Least Squares

Data generally consist of ordered pairs (x_i, y_i) of independent (x_i) and dependent (y_i) variables which we have determined will be well represented by some line:

    y_i = a x_i + b
If the fit were perfect, then we should find a, b such that

    y_i - (a x_i + b) = 0   for all (x_i, y_i),

but due to the realities of data, this never happens. Instead, we might try to find a, b so that

    Σ_i d_i = Σ_i [y_i - (a x_i + b)]   is minimized.

d_i is simply the vertical distance from the line to the ith point. Intuitively, we can agree that the "best choice" of (a, b) is the one for which the deviations are minimized. Mathematically, this is easier if we look at d_i^2:

    Σ_i d_i^2 = Σ_i [y_i - (a x_i + b)]^2

because we don't care if d_i is positive or negative; we just want to look at the net distance. We see that

    φ(a, b) = Σ_i d_i^2,

or that this deviation is a function of a, b, the choice of parameters for the line. We want to minimize φ(a, b) to find the best values of a, b for this data set.

How to do it? Let's look at a simpler problem. Assume for a second that we knew the line should go through the origin: y_i = a x_i, then
    φ(a) = Σ_i (y_i - a x_i)^2

is a function of one variable. We know that to find an extremum, we just set the derivative to zero:

    dφ/da = 0 = d/da Σ_i (y_i - a x_i)^2 = -2 Σ_i (y_i - a x_i) x_i

    0 = Σ_i x_i y_i - a Σ_i x_i^2

    a = Σ_i x_i y_i / Σ_i x_i^2
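The through-origin result, a = Σ x_i y_i / Σ x_i^2, is one line of code. A minimal sketch with hypothetical data scattered around y = 2x:

```python
# Hypothetical data assumed to follow a line through the origin, y = a*x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

# Best slope through the origin: a = sum(x_i * y_i) / sum(x_i^2)
a = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(a)  # about 1.99, close to the true slope of 2
```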
So, the best a is just a sum of products of the data. For the more general case, φ(a, b), we need

    ∂φ/∂a = ∂φ/∂b = 0.

If we do this, we get:

    a = [N Σ x_i y_i - (Σ x_i)(Σ y_i)] / [N Σ x_i^2 - (Σ x_i)^2]

    b = [(Σ x_i^2)(Σ y_i) - (Σ x_i)(Σ x_i y_i)] / [N Σ x_i^2 - (Σ x_i)^2]

If we let

    S_x = (1/N) Σ x_i,   S_y = (1/N) Σ y_i,   S_xx = (1/N) Σ x_i^2,   S_xy = (1/N) Σ x_i y_i,

then we can write these as (Appendix A.1):

    a = (S_xy - S_x S_y) / (S_xx - (S_x)^2)

    b = (S_xx S_y - S_x S_xy) / (S_xx - (S_x)^2)

The next question should be: how well does the equation actually fit a straight line? How well do
a, b represent that line? We already have a measure of the errors:

    φ(a, b) = N σ^2 = Σ_i [y_i - (a x_i + b)]^2,

and the average deviation at each point is then:

    S.D. = σ,   σ^2 = φ(a, b)/N = (1/N) Σ_i [y_i - (a x_i + b)]^2.

The error in each fitted parameter is related to the S.D. of the fit. If we evaluate these, we get the following simple results:

    σ_a^2 = σ^2 / [N (S_xx - (S_x)^2)]

    σ_b^2 = S_xx σ^2 / [N (S_xx - (S_x)^2)]

and the proper answer is a ± σ_a, b ± σ_b, and deviation = σ.

How do we know if a straight line is the best representation of the data?
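The slope, intercept, and their standard deviations can be collected into one routine using the N-normalized sums defined in the text. A minimal sketch (the data values are made up for illustration, noisy points near y = 2x + 1):

```python
import math

def least_squares(x, y):
    """Fit y = a*x + b; return a, b, sigma_a, sigma_b, and sigma (S.D. of fit)."""
    N = len(x)
    Sx  = sum(x) / N
    Sy  = sum(y) / N
    Sxx = sum(xi * xi for xi in x) / N
    Sxy = sum(xi * yi for xi, yi in zip(x, y)) / N
    D = Sxx - Sx**2
    a = (Sxy - Sx * Sy) / D
    b = (Sxx * Sy - Sx * Sxy) / D
    # S.D. of the fit: sigma^2 = phi(a, b) / N
    sigma2 = sum((yi - (a * xi + b))**2 for xi, yi in zip(x, y)) / N
    sigma_a = math.sqrt(sigma2 / (N * D))
    sigma_b = math.sqrt(sigma2 * Sxx / (N * D))
    return a, b, sigma_a, sigma_b, math.sqrt(sigma2)

# Hypothetical noisy data near y = 2x + 1
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.9, 9.1]
a, b, sa, sb, s = least_squares(x, y)
print(f"a = {a:.3f} +/- {sa:.3f}, b = {b:.3f} +/- {sb:.3f}, sigma = {s:.3f}")
```

The proper answer is then reported as a ± σ_a, b ± σ_b.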
Examine the "correlation" coefficient. We started the process by assuming our dependent variable (y_i) was linear with respect to the independent variable (x_i):

    d_i = a x_i + b - y_i,   or   a x_i + b = y_i.

We might have equally written

    d_i' = a' y_i + b' - x_i,   or   a' y_i + b' = x_i.

We can show that:

    a' = (S_xy - S_x S_y) / (S_yy - (S_y)^2)

    b' = (S_yy S_x - S_y S_xy) / (S_yy - (S_y)^2)

where

    S_yy = (1/N) Σ y_i^2,   S_y = (1/N) Σ y_i,

and the two results are clearly related: if a x_i + b = y_i exactly, then

    x_i = y_i/a - b/a,   so   a' = 1/a   and   b' = -b/a.

If the two lines were perfect representations of the data, the two lines would be "correlated":

    a a' = a (1/a) = 1.

However, if no correlation exists, a a' → 0. We define a parameter r, called the correlation coefficient, by

    r^2 = a a',   so   r = (S_xy - S_x S_y) / [(S_xx - (S_x)^2)(S_yy - (S_y)^2)]^(1/2),

and tables have been developed to tell us the likelihood (probability), for a given r, that the
relationship between the data points is no better than random chance.

Probability that an r larger than the table value arises from random chance:

    N      p = .1    p = .01   p = .001
    3      .988      1.00      1.00
    5      .805      .959      .992
    10     .549      .765      .872
    40     .264      .403      .502
    100    .168      .259      .327

If the correlation coefficient is quite low, there exists no dependent-independent relationship between the variables.
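The correlation coefficient is a one-line extension of the same N-normalized sums. A minimal sketch (the example data are hypothetical; a perfect line gives |r| = 1):

```python
import math

def corr_coeff(x, y):
    """r = (Sxy - Sx*Sy) / sqrt((Sxx - Sx^2) * (Syy - Sy^2))"""
    N = len(x)
    Sx  = sum(x) / N
    Sy  = sum(y) / N
    Sxx = sum(xi * xi for xi in x) / N
    Syy = sum(yi * yi for yi in y) / N
    Sxy = sum(xi * yi for xi, yi in zip(x, y)) / N
    return (Sxy - Sx * Sy) / math.sqrt((Sxx - Sx**2) * (Syy - Sy**2))

# Perfectly linear data: r should be 1 (up to rounding)
print(corr_coeff([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A value near ±1 supports a linear relationship; a value near 0 corresponds to the "no better than random chance" rows of the table above.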
This note was uploaded on 12/29/2011 for the course CHE 10, taught by Professor Doyle, F., during the Fall '08 term at UCSB.