3/8/2010

Problem (text 1329):

The table below gives the population of a small but growing suburb over a twenty-year period.

    Year    Population
    0       100
    5       200
    10      450
    15      950
    20      2000

The growth is assumed to be exponential: population = $\alpha e^{\beta t}$, where t is the time in years.
What values of $\alpha$ and $\beta$ best fit the data?
What can the population be expected to be after 25 years?

Ideally a nonlinear regression technique would be used to find the $\alpha$ and $\beta$ that minimize the sum of the squares of the errors between the data points and the fitted curve. A good (but not perfect) answer can be obtained more simply by transforming the data and using linear regression. It is not perfect because the straight-line fit minimizes the squared errors in ln(y) rather than in y.

The basic procedure:

    $y = \alpha e^{\beta x}$
    $\ln(y) = \ln(\alpha) + \beta x$
    Let $y' = \ln(y)$
    Then $y' = ax + b$, where $a = \beta$ and $b = \ln(\alpha)$

Use linear regression to find the best a and b. Then find $\alpha$ and $\beta$ by applying $\alpha = e^{b}$ and $\beta = a$.

Matlab part 1:

    x = [0 5 10 15 20];
    y = [100 200 450 950 2000];
    yt = log(y);                   % transform the y values
    p = polyfit (x, yt, 1);        % fit a straight line to the transformed data
    fitt = @(x) p(1) * x + p(2);   % function for the fitted line

Matlab part 2:

    figure (1)
    plot (x, yt, 'x', x, fitt(x), 'MarkerSize', 10);
    grid on;
    xlabel ('Years');
    ylabel ('ln(Population)');
    fprintf ('For transformed data a = %f, b = %f, r = %f\n', ...
             p(1), p(2), correlate (x, yt, fitt));

The first data point doesn't appear. This seems to happen a lot (a Matlab bug?).

The best fit line is y = 0.1510 * x + 4.5841
The correlation coefficient for this straight line and the transformed data is 0.999789.
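The fprintf call above relies on correlate, which is not a built-in MATLAB function and whose definition is not shown in these notes. The following is a minimal sketch of what such a helper might compute, assuming it returns r = sqrt(1 - Sr/St), where Sr is the sum of squared residuals about the fitted function and St is the sum of squares about the mean. The name correlate_sketch and the formula itself are assumptions, not the course's actual code.

```matlab
% Hypothetical stand-in for the course's correlate(x, y, f) helper.
% Assumption: r = sqrt(1 - Sr/St), with
%   Sr = sum of squared residuals about the fitted function f
%   St = sum of squared deviations of y about its mean
correlate_sketch = @(x, y, f) ...
    sqrt(1 - sum((y - f(x)) .^ 2) / sum((y - mean(y)) .^ 2));

x = [0 5 10 15 20];
y = [100 200 450 950 2000];
yt = log(y);                      % transformed data
p = polyfit(x, yt, 1);            % straight-line fit to the transformed data
fitt = @(x) p(1) * x + p(2);      % fitted line

r = correlate_sketch(x, yt, fitt);
fprintf('r = %f\n', r);
```

Under this assumption the value for the transformed data should come out close to the 0.999789 reported above. Because the helper takes a function handle, the same call would also work for a fitted curve on the untransformed data.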
Matlab part 3:

    % calculate alpha and beta
    alpha = exp(p(2));
    beta = p(1);
    fit = @(x) alpha * exp(beta * x);   % function for fitted curve
    % need lots of x values to get a smooth plot of the fitted curve
    xplot = linspace (0, 25, 100);      % plot up to 25 years
    yplot = fit(xplot);
    figure (2)
    plot (x, y, 'x', xplot, yplot, 'MarkerSize', 10);
    grid on;
    xlabel ('Years');
    ylabel ('Population');
    fprintf ('For original data alpha = %f, beta = %f, r = %f\n', ...
             alpha, beta, correlate (x, y, fit));
    fprintf ('Predicted population after 25 years = %f\n', fit(25));

This time the first data point does show up.

The best fit curve is y = 97.9148 * exp (0.1510 * x)
The correlation coefficient for this curve and the original data is 0.999957
The predicted population after 25 years is 4268.

The basic idea can be adapted to power equations:

    $y = \alpha x^{\beta}$
    $\log(y) = \log(\alpha) + \beta \log(x)$
    Let $x' = \log(x)$ and $y' = \log(y)$
    Then $y' = ax' + b$, where $a = \beta$ and $b = \log(\alpha)$

Use linear regression to find the best a and b. Then find $\alpha$ and $\beta$ by applying $\alpha = 10^{b}$ and $\beta = a$.

Note: log is used instead of ln only for consistency with the text. ln would work equally well (use $\alpha = e^{b}$).

And to saturation growth rate equations as well:

    $y = \alpha \dfrac{x}{\beta + x}$
    $\dfrac{1}{y} = \dfrac{\beta}{\alpha} \cdot \dfrac{1}{x} + \dfrac{1}{\alpha}$
    Let $x' = 1/x$ and $y' = 1/y$
    Then $y' = ax' + b$, where $a = \beta / \alpha$ and $b = 1 / \alpha$

Use linear regression to find the best a and b. Then find $\alpha$ and $\beta$ by applying $\alpha = 1/b$ and $\beta = a/b$.

The mathematics of linear regression:
Given: $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$
To find: the straight line $y = ax + b$ that best fits the data

We must minimize

$$E = \sum_{i=1}^{n} \left( y_i - (a x_i + b) \right)^2 = \sum_{i=1}^{n} \left( a^2 x_i^2 + b^2 + y_i^2 + 2ab\,x_i - 2a\,x_i y_i - 2b\,y_i \right)$$

At the minimum:

$$\frac{\partial E}{\partial a} = \sum_{i=1}^{n} \left( 2a x_i^2 + 2b x_i - 2 x_i y_i \right) = 0$$

$$\frac{\partial E}{\partial b} = \sum_{i=1}^{n} \left( 2b + 2a x_i - 2 y_i \right) = 0$$

Dividing both equations by 2 and expressing them in matrix form gives:

$$\begin{bmatrix} \sum x_i^2 & \sum x_i \\ \sum x_i & \sum (1) \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \sum x_i y_i \\ \sum y_i \end{bmatrix} \qquad \text{where } \sum (1) = \sum_{i=1}^{n} (1) = n$$

Solving using Cramer's Rule produces the following (here $a_{jk}$ and $b_j$ denote the entries of the matrix $A$ and of the right-hand side, not the fit coefficients):

$$a = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{|A|} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}$$

$$b = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{|A|} = \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}$$

Aside: b is more easily calculated using $b = \bar{y} - a \bar{x}$

Calculating a and b involves first passing through the data points and calculating the following summations:

$$\sum x_i \qquad \sum y_i \qquad \sum x_i^2 \qquad \sum x_i y_i$$

Once this is done the formulas for a and b can be applied.

For linear regression ONLY, the correlation coefficient r can be computed using:

$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left( \sum x_i \right)^2} \, \sqrt{n \sum y_i^2 - \left( \sum y_i \right)^2}}$$

In addition to the summations listed above this requires $\sum y_i^2$.

Linear Regression and the Casio Calculator:

The formula is y = Ax + B.

    Mode Mode 2 (REG)        REG stands for regression
    1 (LIN)                  LIN stands for linear
    SHIFT CLR 1 (Scl) =      clear statistical memory
    x1 , y1 DT               the DT key is the M+ key
    x2 , y2 DT
    .... and so on until all points are entered

    To retrieve the value of A: SHIFT SVAR > > 1 (A) =     (the SVAR key is the 2 key, > is the right arrow)
    To retrieve the value of B: SHIFT SVAR > > 2 (B) =
    To retrieve the correlation coefficient: SHIFT SVAR > > 3 (r) =

Other forms of regression are also supported.

Polynomial regression:

Linear regression involves fitting a first order polynomial (i.e. a polynomial of the form ax + b) to a set of data points. The basic idea is readily extended to higher order polynomials.

Example:
    X     Y
    0     189.4
    3     95.1
    6     34.1
    9     1.8
    12    7.3
    15    46.7
    18    131.9
    21    253.2

We want to fit a quadratic (i.e. a polynomial of the form $y = ax^2 + bx + c$) to the data. This can be done by using polyfit and specifying a second order polynomial.

    >> x = 0:3:21;
    >> y = [189.4 95.1 34.1 1.8 7.3 46.7 131.9 253.2];
    >> p = polyfit (x, y, 2)   % 2 for second order

The result is a 3-element vector containing a, b, and c (in that order).

    >> xplot = linspace (1, 22, 100);
    >> yplot = polyval (p, xplot);
    >> plot (x, y, 'o', xplot, yplot, 'MarkerSize', 10);
    >> fprintf ('The best fit curve is %6.4f * x^2 + %6.4f * x + %6.4f\n',...
                p(1), p(2), p(3));
    The best fit curve is 2.0088 * x^2 + -39.5105 * x + 193.4125
    >> f = @(x) p(1) * x .^ 2 + p(2) * x + p(3);
    >> r = correlate (x, y, f);
    >> fprintf ('The correlation coefficient is %6.4f\n', r);
    The correlation coefficient is 0.9991

The mathematics of quadratic regression:
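Given: $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_n, y_n)$
To find: the quadratic $y = ax^2 + bx + c$ that best fits the data

We must minimize

$$E = \sum_{i=1}^{n} \left( y_i - (a x_i^2 + b x_i + c) \right)^2$$

At the minimum

$$\frac{\partial E}{\partial a} = \frac{\partial E}{\partial b} = \frac{\partial E}{\partial c} = 0$$

Filling in the details gives:

$$\begin{bmatrix} \sum x_i^4 & \sum x_i^3 & \sum x_i^2 \\ \sum x_i^3 & \sum x_i^2 & \sum x_i \\ \sum x_i^2 & \sum x_i & \sum (1) \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} \sum x_i^2 y_i \\ \sum x_i y_i \\ \sum y_i \end{bmatrix} \qquad \text{where } \sum (1) = \sum_{i=1}^{n} (1) = n$$

The lower-right 2 x 2 block of the matrix, together with the last two entries of the right-hand side, is the set of first order (linear regression) equations.

The values of a, b, and c can be found by solving this system of equations.

Equations = first order equations plus an extra row and column. This pattern extends to higher order polynomials.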
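The row-and-column pattern noted above can be sketched in MATLAB by building the matrix of power sums directly and solving it. This is only an illustration under stated assumptions: the variable names (m, A, rhs, coeffs) are not from the original notes, and the X/Y data from the polynomial regression example is re-entered so the block runs on its own. For order m, entry (j, k) of the matrix is the sum of x raised to the power 2m + 2 - j - k, which gives the 2 x 2 linear system for m = 1 and the 3 x 3 quadratic system for m = 2.

```matlab
% Sketch: polynomial least squares by solving the normal equations.
% Names m, A, rhs and coeffs are illustrative, not from the original notes.
x = 0:3:21;
y = [189.4 95.1 34.1 1.8 7.3 46.7 131.9 253.2];
m = 2;                                  % polynomial order (2 = quadratic)

A = zeros(m + 1);                       % matrix of power sums
rhs = zeros(m + 1, 1);                  % right-hand side vector
for j = 1:m+1
    for k = 1:m+1
        % (1,1) is sum(x.^(2m)); the bottom-right entry is sum(x.^0) = n
        A(j, k) = sum(x .^ (2*m + 2 - j - k));
    end
    % first row is sum(x.^m .* y); last row is sum(y)
    rhs(j) = sum(x .^ (m + 1 - j) .* y);
end

coeffs = A \ rhs;                       % highest power first: [a; b; c]
p = polyfit(x, y, m);                   % reference fit for comparison

fprintf('normal equations: %8.4f %8.4f %8.4f\n', coeffs);
fprintf('polyfit:          %8.4f %8.4f %8.4f\n', p);
```

The two printed rows should agree closely. For higher orders or widely spread x values the power-sum matrix becomes poorly conditioned, which is one reason to prefer polyfit in practice.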
 Winter '09