Nonlinear Models: Nonlinear x ⇒ y

Identifying a nonlinear relationship
- Plot continuous x against continuous y. Is the relationship obviously nonlinear?
- If not, fit a linear model and plot the residuals against predicted y. Are the points randomly scattered about the zero line?

Residual Plots
[Figure: original scatterplot with the fitted line, and the corresponding residual plot]

What to do? Options:
- Transform x or y
- Fit a nonlinear function (which one? power, exponential, sine, …?)
- Polynomial regression
- Piecewise regression
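The residual check above can be sketched in code. The course demos use JMP and R; this Python example with synthetic quadratic data is purely an illustrative assumption. A straight-line fit to curved data leaves systematic structure in the residuals instead of random scatter about zero:

```python
# Sketch: diagnosing nonlinearity from residuals (synthetic data,
# not the course dataset).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 2 + 0.5 * x**2 + rng.normal(0, 1, x.size)   # truly quadratic in x

# Fit a straight line and inspect residuals against predicted y.
slope, intercept = np.polyfit(x, y, 1)
predicted = intercept + slope * x
residuals = y - predicted

# Randomly scattered residuals average near zero everywhere; here the
# middle of the range sits systematically below the ends (a U shape).
middle = residuals[(x > 3) & (x < 7)].mean()
ends = residuals[(x <= 3) | (x >= 7)].mean()
print(middle < 0 < ends)  # True: systematic curvature left in the residuals
```

If the same plot showed residuals scattered randomly about zero, a linear model would be adequate and none of the remedies below would be needed.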
  (predetermined knots or not…)
- Spline fits: make no assumptions about the function (linear vs. cubic splines)

1. Transforming x or y
- Typical functions: logarithm, square root, inverse, power (e.g., squaring)
- Violating homogeneity of variance
- Justifying your choice on theoretical grounds
- Cautions

Shrinking large x values
- Square root, logarithm, and inverse (in order of shrinking power)
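A sketch of comparing the shrinking transforms above by how well a straight line fits afterward. The synthetic logarithmic data are an assumption (not the course dataset); with data of that shape, the log transform should score highest:

```python
# Sketch: comparing x-transformations by the R^2 of a straight-line fit.
# Synthetic data: y grows with log(x), plus noise.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(1, 100, 300)
y = 3 + 2 * np.log(x) + rng.normal(0, 0.3, x.size)

def r_squared(pred, obs):
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1 - ss_res / ss_tot

transforms = {
    "original": x,
    "square root": np.sqrt(x),   # mild shrinking of large x
    "logarithm": np.log(x),      # stronger shrinking
    "inverse": 1 / x,            # strongest shrinking
}
scores = {}
for name, xt in transforms.items():
    b, a = np.polyfit(xt, y, 1)          # slope, intercept
    scores[name] = r_squared(a + b * xt, y)

best = max(scores, key=scores.get)
print(best)  # logarithm: it straightens a logarithmic relationship
```

In practice you would compare adjusted R² values, as the table below does, and prefer a transform you can also justify on theoretical grounds.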
[Plots: original x, and its square root, logarithm, and inverse transformations]

Expanding large x values
- Square, exponential

So, which was the best-fitting model?

Model         Adjusted R²
Original       .6438
Square root    .6100
Log            .5516
Square         .6589
1.3 Power      .5590

2. Fit a nonlinear function

How to identify the right function?
- Experience: you recognize some relationships.
- Theory: for example, exponential decays are prevalent.
- Systematic exploration: e.g., the Shull article can help you identify some.

Fitting a specific function
- A function has free parameters that must be estimated. Typically this is done using gradient descent methods:
  1. Pick initial values for each parameter of the function (you).
  2. Test the resulting function for each x-y pair and compute the aggregate error (software).
  3. Change the parameter values a little (software).
  4. Repeat steps 2 to 3 until the error stops decreasing.

Issues with fitting by gradient descent
- Can be very sensitive to the initial parameter values.
- Some values cause the model to fail to converge (the error doesn't decrease in a systematic way).
- Some values result in "local minima".
- There can be multiple solutions.

Example
- I used the dataset "attenu" in R and tried to model "dist" as a function of "accel".
- The plot revealed a strong nonlinear, negative relationship.
- None of the log transforms from Shull's Fig. 1 created a straight line, but "F" (a hyperbolic) looked promising.
- So, I tried this: f(x) = a / (b + c·x)

JMP Demonstration

Special case: fitting distributions
- A common case of nonlinear curve fitting involves fitting frequency distributions.
- Examples: Normal, LogNormal, Gamma, Poisson, Beta, Weibull, Exponential.
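The four fitting steps above can be sketched with SciPy, whose `curve_fit` runs a gradient-based least-squares loop (steps 2-4) starting from your step-1 initial values. The data below are synthetic stand-ins that merely mimic the shape of the dist-vs-accel example; the constants and the initial guesses are assumptions:

```python
# Hedged sketch: fitting the hyperbolic f(x) = a / (b + c*x).
# Synthetic data, not the real R "attenu" dataset.
import numpy as np
from scipy.optimize import curve_fit

def hyperbolic(x, a, b, c):
    return a / (b + c * x)

rng = np.random.default_rng(2)
x = np.linspace(0.05, 0.8, 120)                    # accel-like predictor
y = hyperbolic(x, 50.0, 0.1, 1.0) + rng.normal(0, 2, x.size)

# Step 1: initial parameter values (our job); poor choices can fail to
# converge or land in a local minimum, as noted above.
p0 = [100.0, 0.5, 1.0]
params, _ = curve_fit(hyperbolic, x, y, p0=p0)     # steps 2-4 (software)

fitted = hyperbolic(x, *params)
r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
print(r2 > 0.99)  # True: the converged fit explains almost all variance
```

With real data you would substitute the actual predictor and response for the synthetic x and y; trying several `p0` values is a cheap guard against the initial-value sensitivity listed above.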
Distribution parameters:
1. Normal: µ (location) and σ (dispersion)
2. Weibull: location (rightward shift), shape (exponential to normal-like), and scale (like a 'mean'); good for fitting RTs
3. Gamma: shape, scale, and threshold (the lognormal, exponential, and Weibull are all special cases)

3. Polynomial regression
- Add polynomials of x as predictors.
  Quadratic: f(x) = β0 + β1·x + β2·x²
  Cubic: f(x) = β0 + β1·x + β2·x² + β3·x³
- Enter as many polynomial terms as you wish and let the computer tell you which ones are significant.
- Must use good model-selection procedures (e.g., stepwise) to avoid an overly complex model.

JMP Example
- Use the "attenu" data again…

4. Piecewise regression
- Fit different regions of the data separately.
- The endpoints of the ranges are called the "knots" of the regression.
- Simple form: piecewise linear. As with polynomial regression, you add functions of x as predictors, but the new functions are (x − εi)+ for each knot location εi. This ensures continuity at the knots.

Piecewise linear (Fig. 5.1 from Hastie et al.)

Spline fits: a type of piecewise regression
- Piecewise cubic regressions whose knot junctions must be continuous in the second derivative are called "spline fits".
- By hand, for two knots the predictors are: 1 (constant), x, x², x³, (x − ε1)³+, and (x − ε2)³+.
- Knots are sometimes chosen based on theory or data, but you can instead use smoothing splines with knots at every point and vary λ to differentially penalize complexity (higher λ values penalize curvature, as computed from the second derivative).
- How many knots? See Figure 5.2 from Hastie et al.

JMP Demonstration of smoothing splines

0. Basis Functions
- A general method of conceptualizing many of the techniques presented.
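The truncated-power idea behind piecewise-linear regression (and, with cubed terms, behind splines) can be sketched as ordinary least squares on an augmented design matrix. The knots, slopes, and data below are synthetic assumptions, not course data:

```python
# Hedged sketch: piecewise-linear regression via truncated power terms.
# Columns 1, x, (x - k)+ keep the fit continuous at each knot k, because
# every (x - k)+ term equals 0 at its own knot.
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 200)
# True function: the slope changes at x = 4 and again at x = 7.
y = (1 + 0.5 * x
     + 2.0 * np.clip(x - 4, 0, None)
     - 3.0 * np.clip(x - 7, 0, None)
     + rng.normal(0, 0.2, x.size))

knots = [4.0, 7.0]
X = np.column_stack(
    [np.ones_like(x), x] + [np.clip(x - k, 0, None) for k in knots]
)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta, [1.0, 0.5, 2.0, -3.0], atol=0.3))  # True
```

For a cubic spline with the same two knots you would instead stack 1, x, x², x³, (x − k1)³+, and (x − k2)³+ columns, matching the two-knot predictor list given above.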
Basis functions are a set of functions that replace x with additional variables that are transformations of x.

Generic basis function model:

    f(x) = Σ_{m=1}^{M} βm·hm(x)

- M is the total number of functions used
- hm represents one of the M functions
- βm is the beta weight for each function

Examples:
- h1(x) = 1, h2(x) = x, h3(x) = x², h4(x) = x³
- h1(x) = 1, h2(x) = log(x)
- h1(x) = 1, h2(x) = x, h3(x) = (x − ε1)+, h4(x) = (x − ε2)+, where t+ denotes the positive part

Basis functions can be anything:
- Fourier-transformed versions of x (various frequencies)
- Wavelet transforms of x
- Gaussian transforms of x (kernel regression)
- Etc.

Basis functions can also be multivariate, i.e., take multiple predictors as arguments: for example, h1(x1, x2, x3) = ….

Preprocessing of inputs, x
- Preprocessing of features is a very general and powerful method for improving the performance of a learning algorithm.
- By using domain knowledge to construct appropriate features, one can usually improve on a learning method that has only the raw features x at its disposal.
- It is much more efficient to build in knowledge than to make a powerful learning algorithm do it all.
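The generic model can be made concrete: choose the hm, stack one column per function, and estimate the βm weights by least squares. This sketch uses the cubic-polynomial basis from the first example above; the data are synthetic assumptions:

```python
# Hedged sketch: f(x) = sum_m beta_m * h_m(x), fit by least squares.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(-2, 2, 150)
y = 1 - x + 0.5 * x**3 + rng.normal(0, 0.1, x.size)

# The basis list is the only thing to change for the log,
# truncated-power, Fourier, or Gaussian (kernel) variants.
basis = [
    lambda t: np.ones_like(t),   # h1(x) = 1
    lambda t: t,                 # h2(x) = x
    lambda t: t**2,              # h3(x) = x^2
    lambda t: t**3,              # h4(x) = x^3
]
H = np.column_stack([h(x) for h in basis])     # one column per h_m
beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # the beta_m weights

print(np.allclose(beta, [1.0, -1.0, 0.0, 0.5], atol=0.2))  # True
```

Seen this way, polynomial regression, piecewise regression, and spline fits are all the same least-squares machinery applied to different basis lists.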
Spring '11