variance--is approximately 0.0242. This value is crucial in analyzing the overall effectiveness and explanatory power of the model.
Source   Degrees of Freedom   Sum of Squares   Mean Square   F-stat    Crit-F (α = 0.10)
Model    40                   22.2694          0.556735      11.5124   1.296
Error    16944                819.4052         0.04836
Total    16984                841.6747

Table 3: ANOVA table for our multiple regression model.

Interestingly, only a single geographic location affects a borrower's expected return rate: an application originating from a western state would seemingly decrease the expected return by 0.0202 percent, all else held constant. The presence of all grades in the model runs counter to our initial assumptions. Prior to this analysis, we presumed that grades “C” through “E” would yield a higher return rate than any other grade, an expectation based on overall grade trends and loan maturity data provided by Lending Club. Lastly, certain interaction terms also have contradictory effects. Namely, the single largest contributor to return rate is $X_4X_9$ (two-year delinquencies multiplied by the number of public derogatory records), with a marginal effect of 0.0590 percent; according to this model, the more delinquencies on a borrower's record, the higher his or her return rate will be. In fact, the overall effect of $X_4$ is theoretically $0.0031 + 0.0297X_3 + 0.0012X_5 + 0.0590X_9$, which is positive for any nonnegative values of these variables. The same contradiction appears in the effect of the square of the logarithm of annual income, which is negative despite the positive effect of the associated linear term.

Linear regression entails certain assumptions that the data may or may not satisfy. To assess whether the profile data meet them, we examine the residuals, or errors in estimation, and compare their distribution with the normal distribution. One central tenet of linear regression holds that the residuals are normally distributed; otherwise, the least-squares model of expected returns described above, and the F-tests used to build it, may not be appropriate.
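The derived entries of Table 3 follow from its sums of squares and degrees of freedom, and the same inputs yield the explained-variation figures. The Python sketch below is a minimal illustration, not the authors' code; the function `anova_summary` and the `AnovaSummary` container are hypothetical names. It recomputes each mean square, the F-statistic, R², and the adjusted R² that penalizes the 40 model degrees of freedom.

```python
from dataclasses import dataclass


@dataclass
class AnovaSummary:
    ms_model: float       # mean square for the regression (model)
    ms_error: float       # mean square error
    f_stat: float         # overall F-statistic, ms_model / ms_error
    r_squared: float      # share of total variation explained
    adj_r_squared: float  # R^2 penalized for the number of model terms


def anova_summary(ss_model, df_model, ss_error, df_error):
    """Derive mean squares, F, R^2 and adjusted R^2 from an ANOVA table."""
    # Total SS and df are recomputed as sums; 22.2694 + 819.4052 = 841.6746,
    # which matches Table 3's 841.6747 up to rounding.
    ss_total = ss_model + ss_error
    df_total = df_model + df_error
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    f_stat = ms_model / ms_error
    r_squared = ss_model / ss_total
    # Adjusted R^2 = 1 - (SSE / df_error) / (SST / df_total)
    adj_r_squared = 1.0 - ms_error / (ss_total / df_total)
    return AnovaSummary(ms_model, ms_error, f_stat, r_squared, adj_r_squared)


if __name__ == "__main__":
    # Inputs copied from Table 3.
    s = anova_summary(ss_model=22.2694, df_model=40, ss_error=819.4052, df_error=16944)
    print(f"MS model = {s.ms_model:.6f}, MS error = {s.ms_error:.5f}")
    print(f"F = {s.f_stat:.4f}, R^2 = {s.r_squared:.4f}, adjusted R^2 = {s.adj_r_squared:.4f}")
```

On the Table 3 inputs this gives a model mean square of 0.556735, consistent with the reported F-statistic of 11.5124, an unadjusted R² of about 0.0265, and an adjusted R² of about 0.0242.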
4.3 Analysis

As mentioned before, the forward-stepwise selection process ensures that the variables chosen for the model are jointly significant according to an appropriate F-test. However, other measures indicate that this model is far from “good.” The ANOVA values indicate that the model has minimal explanatory power in the context of Lending Club. With an error sum of squares of approximately 819.4052 against a total sum of squares of 841.6747, the model explains only 22.2694 / 841.6747 ≈ 2.65 percent of the variation in return rate, or about 2.42 percent after adjusting for its 40 degrees of freedom. Either figure is far less than half, which is often treated as a rough benchmark of model viability. Beyond this, the following figures are also troubling. Figure 6 clearly shows that the residuals do not follow a normal distribution; they are heavily skewed to the right. This skew toward higher residual values is further highlighted in Figure 7, where the variation increases as the residuals increase. Finally, the Q-Q plot of residuals against normal quantiles in Figure 8 gives the clearest indication of this heavy skew. The graph is heavily off-center; this is a result of the inherent skew in the residuals, and overall it illustrates a failure of the primary linear regression assumptions.
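A numerical counterpart to the visual checks in Figures 6 and 8 is to compute the residuals' sample skewness and the coordinates of a normal Q-Q plot. The following standard-library Python sketch assumes the fitted model's residuals are available as a list of floats; the helper names `sample_skewness` and `normal_qq_points` are hypothetical, and the plotting positions (i - 0.5)/n are one common convention among several.

```python
import math
from statistics import NormalDist, fmean


def sample_skewness(residuals):
    """Moment-based (Fisher-Pearson) skewness; > 0 signals a longer right tail."""
    n = len(residuals)
    mean = fmean(residuals)
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    return m3 / m2 ** 1.5


def normal_qq_points(residuals):
    """Return (theoretical normal quantile, standardized residual) pairs.

    Residuals are standardized with the sample mean and standard deviation,
    sorted, and matched to standard normal quantiles at plotting positions
    (k - 0.5) / n for k = 1..n. Points near the line y = x suggest normality.
    """
    n = len(residuals)
    mean = fmean(residuals)
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted((r - mean) / sd for r in residuals)
    # enumerate is 0-based, so (i + 0.5) / n equals (k - 0.5) / n with k = i + 1
    return [(std_normal.inv_cdf((i + 0.5) / n), z) for i, z in enumerate(ordered)]


if __name__ == "__main__":
    # Hypothetical residuals with one large positive value (not data from this study).
    example = [-0.10, -0.08, -0.05, -0.03, 0.00, 0.02, 0.40]
    print(f"skewness = {sample_skewness(example):.3f}")
    for theoretical, observed in normal_qq_points(example):
        print(f"{theoretical:+.3f}  {observed:+.3f}")
```

A clearly positive skewness, together with upper-tail Q-Q points that rise above the line y = x, is the numerical signature of the right skew described for Figures 6 and 8. SciPy's `scipy.stats.skew` (with its default `bias=True`) computes the same moment-based coefficient, and `scipy.stats.probplot` produces Q-Q coordinates using a slightly different plotting-position rule.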