lec12_linear_regression

# lec12_linear_regression -   Finish Numerical...

This preview shows page 1. Sign up to view the full content.

This is the end of the preview. Sign up to access the rest of the document.

Unformatted text preview:   Finish Numerical Integration & Regression   Monte Carlo Integration   Linear regression (fitting data)   Non-linear functions that can be linearized   Polynomial regression [ polyfit() in Pylab ]     HW#5 due Thursday. Midterm handouts: Due date is March 25!         Integration is the inverse operation to differentiation. Integral of a function is a measure of the “area” bounded by the function and the horizontal axis. Many analytic functions can integrated analytically to get a closed form solution. Area above axis is (+), area below axis is (-). ￿ b f (x)dx = area a         Newton-Cotes methods divide area up into panels for which the area can be easily calculated. Geometry of panel cap differentiates the type of method (line: Trapezoid, parabola: Simpson’s 1/3,…) Another idea is to randomly fill a box containing the function area with points. Ratio of points below the function to total points is ~ equal to the ratio of the area below function to total (known) area.     Monte Carlo is generally less efficient that Newton-Cotes methods for 1-D functions (like f(x) ). However, they can be much more efficient for higher dimension ￿￿￿ functions like: f (x, y, x)dxdydz           A common task for scientists is to compare a set of measured data with a mathematical (theoretical) model. Simplest model is a line: f ( x ) = mx + b Problem is to determine the slope and intercept which ‘best fits’ the data. Criteria for ‘best fit’ is that which minimizes the sum of squares of the residuals (difference between data point and model). A type of ‘optimization’ problem. A type of optimization problem: need to determine m and b which minimize the sum of the squares of residuals (R2).   Do this by taking derivative w.r.t. m and b and setting equal to zero. For N data points [(x,y) pairs]: ∂ (R2 ) N =0 ￿ 2 ∂m 2 R= [yi − mxi − b]   i=1 ∂ (R2 ) =0 ∂b   Now solve these equations to find optimal m and b: ￿ ¯ i yi (xi − x) m= ￿ ¯ i xi (xi − x) b = y − mx ¯ ¯     So how do we quantify how well the model fits the data? ￿ One option is the standard R2 deviation (for 2 free parameters): σ=   More commonly see the correlation coefficient (r2) in which ￿ 1.00 is a perfect fit. N −2 Sxy = r2 = 2 Sxy Sxx = Sxx − Syy Syy = i ￿ ￿ i i [xi yi − N xy ] ¯¯ 2 [xi 2 − Nx ] ¯ 2 [yi − N y 2 ] ¯   σ δm = √ Sxx   The quality of fit will determine the confidence (uncertainty) in the values for the fitted parameters. ￿ δb = σ 1 x2 ¯ + N Sxx For data shown, m=0.853+/-0.04 b=0.402+/-0.23 r2=0.993   Plot of the sum of squares of the residuals vs free parameters shows how fit values do in fact find the minimum.     Some non-linear functions can be manipulated to take a linear shape. Power Laws: take log of both sides & exponent becomes slope, log-log plot is a line. Use y = cx → ln(y ) = b ln(x) + ln(c) b   Exponentials: take log of both sides, argument becomes r.h.s, log-linear plot is a line. y = ae mx → ln(y ) = mx + ln(a) y = a + bx + cx2 + dx3 + ...           Can be linearized, but math is trickier. Pylab has function called polyfit(xdata,ydata,order) which returns coefficients (a,b,c,..) which minimize the sum of squares of residuals. order = 1: line, order = 2: quadratic, order = 3: cubic,…. Useful when taking derivatives of actual data which are sparse. For plotting, you can use the polyval() function to generate data for the curve.       Write code to determine slope and intercept from linear data. Use it to coefficient and decay constant for data: Plot the fit and data together on a log-linear plot (use semilogy) X Y ========== 0.0 2.10 0.2 0.92 0.4 0.55 0.6 0.21 0.8 0.15 1.0 0.04 ...
View Full Document

{[ snackBarMessage ]}

Ask a homework question - tutors are online