326a2ans - STATS 326 Applied Time Series ASSIGNMENT TWO...


STATS 326 Applied Time Series
ASSIGNMENT TWO Answer Guide

Question One:

> occupancy.ts<-ts(hotel.df$occupancy,start=1980,frequency=12)
> plot(occupancy.ts,main="Hotel Occupancy (1980 - 1992)",xlab="time",
    ylab="occupancy")

[Plot: Hotel Occupancy (1980 - 1992); x-axis: time, y-axis: occupancy]

The plot of the hotel occupancy series shows an increasing, reasonably linear trend with a seasonal component whose variation increases through time. A log transformation is indicated to try to make the seasonal variation more constant over time.

> log.occupancy.ts<-ts(log(hotel.df$occupancy),start=1980,frequency=12)
> plot(log.occupancy.ts,main="(log) Hotel Occupancy",xlab="time",ylab="(log) occupancy")

[Plot: (log) Hotel Occupancy; x-axis: time, y-axis: (log) occupancy]

The plot of the log transformed hotel occupancy series shows an increasing, reasonably linear trend with a reasonably constant seasonal component. The seasonal peak occurs in July or August and the seasonal troughs occur in February and November.

Question Two:

> Time<-1:156
> Month<-factor(rep(1:12,13))
> sf.hotel.fit1<-lm(log.occupancy.ts~Time+Month)
> plot.ts(residuals(sf.hotel.fit1),main="Residual Series")

[Plot: Residual Series; residuals(sf.hotel.fit1) against Time]

> sf.hotel.fit2<-lm(log.occupancy.ts~Time+I(Time^2)+Month)
> plot.ts(residuals(sf.hotel.fit2),main="Residual Series")

[Plot: Residual Series; residuals(sf.hotel.fit2) against Time]

> acf(residuals(sf.hotel.fit2))

[Plot: ACF of residuals(sf.hotel.fit2); lags 0 to 20]

> sf.hotel.fit3<-lm(log.occupancy.ts[-1]~Time[-1]+I(Time[-1]^2)+
    Month[-1]+log.occupancy.ts[-156])
> plot.ts(residuals(sf.hotel.fit3),main="Residual Series")

[Plot: Residual Series; residuals(sf.hotel.fit3) against Time]

> acf(residuals(sf.hotel.fit3))

[Plot: ACF of residuals(sf.hotel.fit3); lags 0 to 20]

> normcheck(residuals(sf.hotel.fit3))

[Plots: Normal Q-Q Plot and Histogram of residuals(sf.hotel.fit3)]
Shapiro-Wilk normality test: W = 0.9941, P-value = 0.785

> summary(sf.hotel.fit3)

Call:
lm(formula = log.occupancy.ts[-1] ~ Time[-1] + I(Time[-1]^2) +
    Month[-1] + log.occupancy.ts[-156])

Residuals:
      Min        1Q    Median        3Q       Max
-0.054309 -0.012768  0.000795  0.011939  0.047473

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)             4.236e+00  5.008e-01   8.457 3.25e-14 ***
Time[-1]                2.209e-03  2.974e-04   7.428 9.88e-12 ***
I(Time[-1]^2)          -2.118e-06  8.950e-07  -2.367 0.019305 *
Month[-1]2             -5.621e-02  8.291e-03  -6.780 3.10e-10 ***
Month[-1]3             -6.063e-03  1.155e-02  -0.525 0.600443
Month[-1]4              1.093e-01  9.951e-03  10.981  < 2e-16 ***
Month[-1]5              4.991e-02  8.414e-03   5.931 2.24e-08 ***
Month[-1]6              1.837e-01  7.878e-03  23.323  < 2e-16 ***
Month[-1]7              2.832e-01  1.436e-02  19.719  < 2e-16 ***
Month[-1]8              2.617e-01  2.457e-02  10.650  < 2e-16 ***
Month[-1]9             -4.353e-04  2.633e-02  -0.017 0.986836
Month[-1]10             6.386e-02  9.034e-03   7.070 6.76e-11 ***
Month[-1]11            -8.471e-02  8.392e-03 -10.094  < 2e-16 ***
Month[-1]12             7.453e-02  1.165e-02   6.397 2.21e-09 ***
log.occupancy.ts[-156]  3.183e-01  7.993e-02   3.983 0.000109 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01904 on 140 degrees of freedom
Multiple R-squared: 0.9904, Adjusted R-squared: 0.9895
F-statistic: 1033 on 14 and 140 DF, p-value: < 2.2e-16

Technical Notes:

See Question One for comments on the plots of the Hotel Occupancy Series.

sf.hotel.fit1: The Residual Series shows some evidence of curvature and some clustering of positive and negative residuals over time. A quadratic in time may be appropriate to take care of the curvature.

sf.hotel.fit2: After fitting a quadratic time trend, the Residual Series appears to be centred around 0, but there is still evidence of clustering of positive and negative residuals over time. The plot of the autocorrelation function shows significant positive autocorrelation for lags 1, 12 and 13, and significant negative autocorrelation for lags 3 to 6 and 18. We need to correct for autocorrelation.

sf.hotel.fit3: A lagged response variable was added as an additional explanatory variable to correct for autocorrelation. At first glance the Residual Series looks like White Noise, although there are a couple of large residuals. However, the plot of the autocorrelation function shows significant positive autocorrelations for lags 2 and 12 and significant negative autocorrelations for lags 3, 5 and 18. Some of these are well outside the significance bounds, indicating that some autocorrelation structure is still present, so the Residual Series is not White Noise.

The Normal Q-Q plot shows the residuals appear to have come from a normal distribution as the points lie close to the reference line. The Shapiro-Wilk test provides no evidence against the hypothesis that the underlying errors are normally distributed (P-value = 0.785).

The model explains 99% of the variation in log hotel occupancy, but this may be inflated due to the autocorrelation in the Residual Series. The model should nonetheless be excellent for prediction.

The F-statistic provides very strong evidence against the hypothesis that none of the variables are related to the log occupancy figures (P-value ≈ 0), but this may be unreliable due to the autocorrelation in the Residual Series.

The quadratic time trend is significant (P-value = 9.88 × 10^-12 for the Time variable and P-value = 0.019305 for the quadratic (Time^2) variable).

For the levels of the seasonal factor:
• we have no evidence that log hotel occupancy in March (P-value = 0.600443) and in September (P-value = 0.986836) is different from that in January, on average
• we have very strong evidence that February (P-value = 3.10 × 10^-10) and November (P-value ≈ 0) have lower log hotel occupancy than January, on average
• we have very strong evidence that April (P-value ≈ 0), May (P-value = 2.24 × 10^-8), June (P-value ≈ 0), July (P-value ≈ 0), August (P-value ≈ 0), October (P-value = 6.76 × 10^-11) and December (P-value = 2.21 × 10^-9) have higher log hotel occupancy than January, on average.

We have very strong evidence against the hypothesis of no autocorrelation (P-value = 0.000109).

The t-tests and P-values for the trend, the levels of the seasonal factor and the autocorrelation term may also be unreliable because of the autocorrelation in the Residual Series.
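The lag-by-lag ACF readings in the notes above rely on the dashed bounds that acf() draws at roughly ±1.96/√n. As a minimal sketch on simulated data (the assignment's hotel.df is not reproduced here, and the names sim.res and flagged.lags are made up for illustration), the lags falling outside those bounds can be listed directly rather than read off the plot:

```r
# Simulated AR(1) series standing in for a residual series; NOT the hotel data.
set.seed(326)
sim.res <- as.numeric(arima.sim(model = list(ar = 0.6), n = 155))

# acf() with plot = FALSE returns the autocorrelations without drawing.
res.acf <- acf(sim.res, lag.max = 20, plot = FALSE)

# Approximate 95% bounds, as used by the default acf() plot.
bound <- qnorm(0.975) / sqrt(length(sim.res))

# Drop lag 0 (always 1) and report the lags outside the bounds.
r <- res.acf$acf[-1]
flagged.lags <- which(abs(r) > bound)
flagged.lags
```

With about 20 lags inspected, one value just outside the bounds can occur by chance alone, so it is the clearly significant lags that matter most.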
> sf.y157<-sf.hotel.fit3$coef[1]+sf.hotel.fit3$coef[2]*157+
    sf.hotel.fit3$coef[3]*(157^2)+
    sf.hotel.fit3$coef[15]*log.occupancy.ts[156]
> sf.y157
(Intercept)
   6.663182
> sf.y158<-sf.hotel.fit3$coef[1]+sf.hotel.fit3$coef[2]*158+
    sf.hotel.fit3$coef[3]*(158^2)+sf.hotel.fit3$coef[4]+
    sf.hotel.fit3$coef[15]*sf.y157
> sf.y158
(Intercept)
   6.596561
> sf.y159<-sf.hotel.fit3$coef[1]+sf.hotel.fit3$coef[2]*159+
    sf.hotel.fit3$coef[3]*(159^2)+sf.hotel.fit3$coef[5]+
    sf.hotel.fit3$coef[15]*sf.y158
> sf.y159
(Intercept)
   6.627039
> sf.y160<-sf.hotel.fit3$coef[1]+sf.hotel.fit3$coef[2]*160+
    sf.hotel.fit3$coef[3]*(160^2)+sf.hotel.fit3$coef[6]+
    sf.hotel.fit3$coef[15]*sf.y159
> sf.y160
(Intercept)
   6.753603
> sf.y161<-sf.hotel.fit3$coef[1]+sf.hotel.fit3$coef[2]*161+
    sf.hotel.fit3$coef[3]*(161^2)+sf.hotel.fit3$coef[7]+
    sf.hotel.fit3$coef[15]*sf.y160
> sf.y161
(Intercept)
   6.736059
> sf.y162<-sf.hotel.fit3$coef[1]+sf.hotel.fit3$coef[2]*162+
    sf.hotel.fit3$coef[3]*(162^2)+sf.hotel.fit3$coef[8]+
    sf.hotel.fit3$coef[15]*sf.y161
> sf.y162
(Intercept)
   6.865822
> sf.pred<-c(sf.y157,sf.y158,sf.y159,sf.y160,sf.y161,sf.y162)
> exp(sf.pred)
(Intercept) (Intercept) (Intercept) (Intercept) (Intercept) (Intercept)
   783.0390    732.5716    755.2425    857.1418    842.2352    958.9339

Question Three:

> decomp.occupancy<-stl(log.occupancy.ts,s.window="periodic")
> decomp.occupancy
 Call:
 stl(x = log.occupancy.ts, s.window = "periodic")

Components
             seasonal    trend     remainder
Jan 1980 -0.091713875 6.337874 -2.955354e-02
Feb 1980 -0.159592920 6.337923  1.198511e-02
Mar 1980 -0.131070705 6.337973  1.567405e-02
Apr 1980 -0.006305409 6.338266  2.761332e-02
May 1980 -0.025719538 6.338559 -1.205367e-02
Jun 1980  0.102381023 6.339230  7.278850e-03
Jul 1980  0.242968476 6.339900  7.432531e-03
Aug 1980  0.266629419 6.340744 -2.120176e-02
Sep 1980  0.012419590 6.341588  1.760430e-02
Oct 1980 -0.004197088 6.343394 -4.393043e-02
Nov 1980 -0.157943993 6.345199 -1.346898e-02
Dec 1980 -0.047854972 6.349132 -2.839960e-02

> sa.log.occupancy.ts<-log.occupancy.ts-decomp.occupancy$time.series[,1]
> plot(sa.log.occupancy.ts,main="(log) De-seasonalised Hotel Occupancy (1980 - 1992)",
    xlab="time",ylab="(log) occupancy")

[Plot: (log) De-seasonalised Hotel Occupancy (1980 - 1992); x-axis: time, y-axis: (log) occupancy]

> sa.hotel.fit1<-lm(sa.log.occupancy.ts~Time)
> plot.ts(residuals(sa.hotel.fit1),main="Residual Series")

[Plot: Residual Series; residuals(sa.hotel.fit1) against Time]

> sa.hotel.fit2<-lm(sa.log.occupancy.ts~Time+I(Time^2))
> plot.ts(residuals(sa.hotel.fit2),main="Residual Series")

[Plot: Residual Series; residuals(sa.hotel.fit2) against Time]

> acf(residuals(sa.hotel.fit2))

[Plot: ACF of residuals(sa.hotel.fit2); lags 0 to 20]

> sa.hotel.fit3<-lm(sa.log.occupancy.ts[-1]~Time[-1]+I(Time[-1]^2)+
    sa.log.occupancy.ts[-156])
> plot.ts(residuals(sa.hotel.fit3),main="Residual Series")

[Plot: Residual Series; residuals(sa.hotel.fit3) against Time]

> acf(residuals(sa.hotel.fit3))

[Plot: ACF of residuals(sa.hotel.fit3); lags 0 to 20]

> summary(sa.hotel.fit3)

Call:
lm(formula = sa.log.occupancy.ts[-1] ~ Time[-1] + I(Time[-1]^2) +
    sa.log.occupancy.ts[-156])

Residuals:
      Min        1Q    Median        3Q       Max
-0.055533 -0.012900  0.001156  0.011377  0.047107

Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
(Intercept)                4.300e+00  4.861e-01   8.846 2.23e-15 ***
Time[-1]                   2.207e-03  2.862e-04   7.711 1.56e-12 ***
I(Time[-1]^2)             -2.138e-06  8.625e-07  -2.479   0.0143 *
sa.log.occupancy.ts[-156]  3.197e-01  7.697e-02   4.154 5.45e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01838 on 151 degrees of freedom
Multiple R-squared: 0.9789, Adjusted R-squared: 0.9784
F-statistic: 2331 on 3 and 151 DF, p-value: < 2.2e-16

> normcheck(residuals(sa.hotel.fit3))

[Plots: Normal Q-Q Plot and Histogram of residuals(sa.hotel.fit3)]
Shapiro-Wilk normality test: W = 0.9952, P-value = 0.894

Technical Notes:

See Question One for comments on the plots of the Hotel Occupancy Series.

decomp.occupancy: The seasonally adjusted series shows an increasing, possibly non-linear trend. The seasonal estimates show that, on average:
• January to May and October to December are lower than the overall trend, with February (−0.15959292) being further from the trend than the other months
• June to September are higher than the overall trend, with August (0.266629419) being further from the trend than the other months.

sa.hotel.fit1: The Residual Series shows some evidence of curvature and some clustering of positive and negative residuals over time. A quadratic in time may be appropriate to take care of the curvature.

sa.hotel.fit2: After fitting a quadratic time trend, the Residual Series appears to be centred around 0, but there is still evidence of clustering of positive and negative residuals over time. The plot of the autocorrelation function shows significant positive autocorrelation for lags 1, 12 and 13, and significant negative autocorrelation for lags 3 to 6 and 18. We need to correct for autocorrelation.

sa.hotel.fit3: A lagged response variable was added as an additional explanatory variable to correct for autocorrelation. At first glance the Residual Series looks like White Noise, although there are a couple of large residuals. However, the plot of the autocorrelation function shows significant positive autocorrelations for lags 2 and 12 and significant negative autocorrelations for lags 3, 5 and 18. Some of these are well outside the significance bounds, indicating that some autocorrelation structure is still present, so the Residual Series is not White Noise.

The Normal Q-Q plot shows the residuals appear to have come from a normal distribution as the points lie close to the reference line. The Shapiro-Wilk test provides no evidence against the hypothesis that the underlying errors are normally distributed (P-value = 0.894).

The model explains 98% of the variation in seasonally adjusted log hotel occupancy, but this may be inflated due to the autocorrelation in the Residual Series. The model should nonetheless be excellent for prediction.

The F-statistic provides very strong evidence against the hypothesis that none of the variables are related to the log occupancy figures (P-value ≈ 0), but this may be unreliable due to the autocorrelation in the Residual Series.

The quadratic time trend is significant (P-value = 1.56 × 10^-12 for the Time variable and P-value = 0.0143 for the quadratic (Time^2) variable).

We have very strong evidence against the hypothesis of no autocorrelation (P-value = 5.45 × 10^-5).

The t-tests and P-values for the trend and the autocorrelation term may also be unreliable because of the autocorrelation in the Residual Series.
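Because each fit includes the previous observation as an explanatory variable, predictions more than one step ahead have to be built recursively, with each prediction fed back in as the lagged value for the next step. A minimal sketch of that recursion as a loop, on simulated data (sim.y, sim.fit, n.ahead and sim.pred are illustrative names, not objects from the answer guide):

```r
set.seed(1)
n <- 156
Time <- 1:n

# Simulated series with a quadratic trend and AR(1) noise; NOT the hotel data.
sim.y <- 6.3 + 0.003 * Time - 2e-06 * Time^2 +
  as.numeric(arima.sim(model = list(ar = 0.3), n = n, sd = 0.02))

# Same structure as the quadratic-trend-plus-lagged-response fits.
sim.fit <- lm(sim.y[-1] ~ Time[-1] + I(Time[-1]^2) + sim.y[-n])
b <- unname(coef(sim.fit))  # intercept, Time, Time^2, lagged response

n.ahead <- 6
sim.pred <- numeric(n.ahead)
prev <- sim.y[n]            # last observed value starts the recursion
for (h in 1:n.ahead) {
  t.new <- n + h
  sim.pred[h] <- b[1] + b[2] * t.new + b[3] * t.new^2 + b[4] * prev
  prev <- sim.pred[h]       # feed the prediction back in as the next lag
}
sim.pred
```

A seasonal factor or harmonic model would add the relevant seasonal term for each t.new inside the loop; the recursion on prev is unchanged.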
> sa.y157<-sa.hotel.fit3$coef[1]+sa.hotel.fit3$coef[2]*157+
    sa.hotel.fit3$coef[3]*(157^2)+
    sa.hotel.fit3$coef[4]*sa.log.occupancy.ts[156]
> sa.y157
(Intercept)
   6.751427
> sa.y158<-sa.hotel.fit3$coef[1]+sa.hotel.fit3$coef[2]*158+
    sa.hotel.fit3$coef[3]*(158^2)+sa.hotel.fit3$coef[4]*sa.y157
> sa.y158
(Intercept)
   6.753869
> sa.y159<-sa.hotel.fit3$coef[1]+sa.hotel.fit3$coef[2]*159+
    sa.hotel.fit3$coef[3]*(159^2)+sa.hotel.fit3$coef[4]*sa.y158
> sa.y159
(Intercept)
   6.756179
> sa.y160<-sa.hotel.fit3$coef[1]+sa.hotel.fit3$coef[2]*160+
    sa.hotel.fit3$coef[3]*(160^2)+sa.hotel.fit3$coef[4]*sa.y159
> sa.y160
(Intercept)
   6.758442
> sa.y161<-sa.hotel.fit3$coef[1]+sa.hotel.fit3$coef[2]*161+
    sa.hotel.fit3$coef[3]*(161^2)+sa.hotel.fit3$coef[4]*sa.y160
> sa.y161
(Intercept)
   6.760687
> sa.y162<-sa.hotel.fit3$coef[1]+sa.hotel.fit3$coef[2]*162+
    sa.hotel.fit3$coef[3]*(162^2)+sa.hotel.fit3$coef[4]*sa.y161
> sa.y162
(Intercept)
    6.76292
> sa.pred<-c(sa.y157,sa.y158,sa.y159,sa.y160,sa.y161,sa.y162)
> sa.pred
(Intercept) (Intercept) (Intercept) (Intercept) (Intercept) (Intercept)
   6.751427    6.753869    6.756179    6.758442    6.760687    6.762921
> seasonal<-decomp.occupancy$time.series[1:6,1]
> seasonal
[1] -0.091713875 -0.159592920 -0.131070705 -0.006305409 -0.025719538
[6]  0.102381023
> reseas.pred<-sa.pred+seasonal
> reseas.pred
(Intercept) (Intercept) (Intercept) (Intercept) (Intercept) (Intercept)
   6.659713    6.594276    6.625108    6.752137    6.734967    6.865302
> exp(reseas.pred)
(Intercept) (Intercept) (Intercept) (Intercept) (Intercept) (Intercept)
   780.3271    730.8995    753.7857    855.8857    841.3158    958.4348

Question Four:

> plot.ts(log.occupancy.ts[1:24],main="Hotel Occupancy (1980 - 1981)",xlab="time",
    ylab="occupancy")

[Plot: Hotel Occupancy (1980 - 1981), first 24 observations of the log series; x-axis: time, y-axis: occupancy]

> c1<-cos(2*pi*Time*(1/12))
> s1<-sin(2*pi*Time*(1/12))
> c2<-cos(2*pi*Time*(2/12))
> s2<-sin(2*pi*Time*(2/12))
> c3<-cos(2*pi*Time*(3/12))
> s3<-sin(2*pi*Time*(3/12))
> c4<-cos(2*pi*Time*(4/12))
> s4<-sin(2*pi*Time*(4/12))
> c5<-cos(2*pi*Time*(5/12))
> s5<-sin(2*pi*Time*(5/12))
> c6<-cos(2*pi*Time*(6/12))
> harm.hotel.fit1<-lm(log.occupancy.ts~Time+c1+s1+c2+s2+c3+s3+c4+s4+c5+
    s5+c6)
> plot.ts(residuals(harm.hotel.fit1),main="Residual Series")

[Plot: Residual Series; residuals(harm.hotel.fit1) against Time]

> harm.hotel.fit2<-lm(log.occupancy.ts~Time+I(Time^2)+c1+s1+c2+
    s2+c3+s3+c4+s4+c5+s5+c6)
> plot.ts(residuals(harm.hotel.fit2),main="Residual Series")

[Plot: Residual Series; residuals(harm.hotel.fit2) against Time]

> acf(residuals(harm.hotel.fit2))

[Plot: ACF of residuals(harm.hotel.fit2); lags 0 to 20]

> harm.hotel.fit3<-lm(log.occupancy.ts[-1]~Time[-1]+I(Time[-1]^2)+
    c1[-1]+s1[-1]+c2[-1]+s2[-1]+c3[-1]+s3[-1]+c4[-1]+s4[-1]+c5[-1]+
    s5[-1]+c6[-1]+log.occupancy.ts[-156])
> plot.ts(residuals(harm.hotel.fit3),main="Residual Series")

[Plot: Residual Series; residuals(harm.hotel.fit3) against Time]

> acf(residuals(harm.hotel.fit3))

[Plot: ACF of residuals(harm.hotel.fit3); lags 0 to 20]

> normcheck(harm.hotel.fit3)

[Plots: Normal Q-Q Plot and Histogram of the residuals of harm.hotel.fit3]
Shapiro-Wilk normality test: W = 0.9941, P-value = 0.785

> summary(harm.hotel.fit3)

Call:
lm(formula = log.occupancy.ts[-1] ~ Time[-1] + I(Time[-1]^2) +
    c1[-1] + s1[-1] + c2[-1] + s2[-1] + c3[-1] + s3[-1] + c4[-1] +
    s4[-1] + c5[-1] + s5[-1] + c6[-1] + log.occupancy.ts[-156])

Residuals:
      Min        1Q    Median        3Q       Max
-0.054309 -0.012768  0.000795  0.011939  0.047473

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)             4.309e+00  5.048e-01   8.536 2.07e-14 ***
Time[-1]                2.209e-03  2.974e-04   7.428 9.88e-12 ***
I(Time[-1]^2)          -2.118e-06  8.950e-07  -2.367 0.019305 *
c1[-1]                 -1.088e-01  5.305e-03 -20.504  < 2e-16 ***
s1[-1]                 -5.265e-02  1.231e-02  -4.277 3.48e-05 ***
c2[-1]                  3.327e-02  4.405e-03   7.553 5.01e-12 ***
s2[-1]                  5.056e-02  4.431e-03  11.410  < 2e-16 ***
c3[-1]                  4.235e-02  2.288e-03  18.509  < 2e-16 ***
s3[-1]                 -2.382e-02  4.228e-03  -5.634 9.32e-08 ***
c4[-1]                 -1.029e-02  2.648e-03  -3.885 0.000157 ***
s4[-1]                  4.123e-02  3.225e-03  12.783  < 2e-16 ***
c5[-1]                  1.183e-02  2.534e-03   4.666 7.09e-06 ***
s5[-1]                  2.601e-02  2.468e-03  10.542  < 2e-16 ***
c6[-1]                  3.292e-02  2.519e-03  13.066  < 2e-16 ***
log.occupancy.ts[-156]  3.183e-01  7.993e-02   3.983 0.000109 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.01904 on 140 degrees of freedom
Multiple R-squared: 0.9904, Adjusted R-squared: 0.9895
F-statistic: 1033 on 14 and 140 DF, p-value: < 2.2e-16

Technical Notes:

See Question One for comments on the plots of the Hotel Occupancy Series.

The plot of the first 24 observations of the log hotel occupancy series shows a seasonal pattern that does not complete one smooth cycle per year, so a single cosine model is unlikely to capture the seasonal pattern very well. It will be better to use a full harmonic model on the log hotel occupancy series.

harm.hotel.fit1: The Residual Series shows some evidence of curvature and some clustering of positive and negative residuals over time. A quadratic in time may be appropriate to take care of the curvature.

harm.hotel.fit2: After fitting a quadratic time trend, the Residual Series appears to be centred around 0, but there is still evidence of clustering of positive and negative residuals over time. The plot of the autocorrelation function shows significant positive autocorrelation for lags 1, 12, 13 and 14, and significant negative autocorrelation for lags 3 to 6 and 18. We need to correct for autocorrelation.

harm.hotel.fit3: A lagged response variable was added as an additional explanatory variable to correct for autocorrelation. At first glance the Residual Series looks like White Noise, although there are a couple of large residuals. However, the plot of the autocorrelation function shows significant positive autocorrelations for lags 2 and 12 and significant negative autocorrelations for lags 3, 5 and 18. Some of these are well outside the significance bounds, indicating that some autocorrelation structure is still present, so the Residual Series is not White Noise. All of the harmonic terms were found to be significant.

The Normal Q-Q plot shows the residuals appear to have come from a normal distribution as the points lie close to the reference line. The Shapiro-Wilk test provides no evidence against the hypothesis that the underlying errors are normally distributed (P-value = 0.785).

The model explains 99% of the variation in log hotel occupancy, but this may be inflated due to the autocorrelation in the Residual Series. The model should nonetheless be excellent for prediction.

The F-statistic provides very strong evidence against the hypothesis that none of the variables are related to the log occupancy figures (P-value ≈ 0), but this may be unreliable due to the autocorrelation in the Residual Series.

The quadratic time trend is significant (P-value = 9.88 × 10^-12 for the Time variable and P-value = 0.019305 for the quadratic (Time^2) variable).

For the seasonal harmonic terms we have very strong evidence against the hypothesis that each harmonic is not needed in the model (P-value = 0.000157 for the cosine harmonic with frequency 4/12, which is the least significant, and P-value ≈ 0 for the cosine harmonics with frequencies 1/12, 3/12 and 6/12 and for the sine harmonics with frequencies 2/12, 4/12 and 5/12, which are the most significant).
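A full set of harmonics for monthly data (cosine and sine pairs at frequencies 1/12 to 5/12 plus the cosine at 6/12) uses 11 seasonal parameters, the same number as the 11 Month contrasts, and describes the same set of 12-month seasonal patterns. That is why the harmonic fit and the seasonal factor fit report identical residual summaries, R-squared values and trend estimates. A minimal sketch on simulated data (sim.y, harm, factor.fit and harm.fit are illustrative names, not objects from the answer guide) checks the equivalence numerically:

```r
set.seed(2)
n <- 156
Time <- 1:n
Month <- factor(rep(1:12, 13))

# Simulated monthly series with an arbitrary seasonal pattern; NOT the hotel data.
sim.y <- 6.3 + 0.003 * Time + rep(rnorm(12, sd = 0.1), 13) + rnorm(n, sd = 0.02)

# The 11 harmonic regressors: cosines at k/12 for k = 1..6, sines for k = 1..5.
# The sine at 6/12 is omitted because sin(pi * Time) is zero at integer Time.
harm <- cbind(
  sapply(1:6, function(k) cos(2 * pi * Time * k / 12)),
  sapply(1:5, function(k) sin(2 * pi * Time * k / 12))
)

factor.fit <- lm(sim.y ~ Time + Month)
harm.fit <- lm(sim.y ~ Time + harm)

# TRUE when the two parameterisations give the same fitted values.
all.equal(unname(fitted(factor.fit)), unname(fitted(harm.fit)))
```

The individual coefficients differ between the two parameterisations, so only the trend, lag and overall fit statistics can be compared directly.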
We have very strong evidence against the hypothesis of no autocorrelation (P-value = 0.000109).

The t-tests and P-values for the trend, the harmonics and the autocorrelation term may also be unreliable because of the autocorrelation in the Residual Series.

> harm.y157<-harm.hotel.fit3$coef[1]+harm.hotel.fit3$coef[2]*157+
+ harm.hotel.fit3$coef[3]*(157^2)+
+ harm.hotel.fit3$coef[4]*cos(2*pi*157*(1/12))+
+ harm.hotel.fit3$coef[5]*sin(2*pi*157*(1/12))+
+ harm.hotel.fit3$coef[6]*cos(2*pi*157*(2/12))+
+ harm.hotel.fit3$coef[7]*sin(2*pi*157*(2/12))+
+ harm.hotel.fit3$coef[8]*cos(2*pi*157*(3/12))+
+ harm.hotel.fit3$coef[9]*sin(2*pi*157*(3/12))+
+ harm.hotel.fit3$coef[10]*cos(2*pi*157*(4/12))+
+ harm.hotel.fit3$coef[11]*sin(2*pi*157*(4/12))+
+ harm.hotel.fit3$coef[12]*cos(2*pi*157*(5/12))+
+ harm.hotel.fit3$coef[13]*sin(2*pi*157*(5/12))+
+ harm.hotel.fit3$coef[14]*cos(2*pi*157*(6/12))+
+ harm.hotel.fit3$coef[15]*log.occupancy.ts[156]
> harm.y157
(Intercept)
   6.663182
> harm.y158<-harm.hotel.fit3$coef[1]+harm.hotel.fit3$coef[2]*158+
+ harm.hotel.fit3$coef[3]*(158^2)+
+ harm.hotel.fit3$coef[4]*cos(2*pi*158*(1/12))+
+ harm.hotel.fit3$coef[5]*sin(2*pi*158*(1/12))+
+ harm.hotel.fit3$coef[6]*cos(2*pi*158*(2/12))+
+ harm.hotel.fit3$coef[7]*sin(2*pi*158*(2/12))+
+ harm.hotel.fit3$coef[8]*cos(2*pi*158*(3/12))+
+ harm.hotel.fit3$coef[9]*sin(2*pi*158*(3/12))+
+ harm.hotel.fit3$coef[10]*cos(2*pi*158*(4/12))+
+ harm.hotel.fit3$coef[11]*sin(2*pi*158*(4/12))+
+ harm.hotel.fit3$coef[12]*cos(2*pi*158*(5/12))+
+ harm.hotel.fit3$coef[13]*sin(2*pi*158*(5/12))+
+ harm.hotel.fit3$coef[14]*cos(2*pi*158*(6/12))+
+ harm.hotel.fit3$coef[15]*harm.y157
> harm.y158
(Intercept)
   6.596561
> harm.y159<-harm.hotel.fit3$coef[1]+harm.hotel.fit3$coef[2]*159+
+ harm.hotel.fit3$coef[3]*(159^2)+
+ harm.hotel.fit3$coef[4]*cos(2*pi*159*(1/12))+
+ harm.hotel.fit3$coef[5]*sin(2*pi*159*(1/12))+
+ harm.hotel.fit3$coef[6]*cos(2*pi*159*(2/12))+
+ harm.hotel.fit3$coef[7]*sin(2*pi*159*(2/12))+
+ harm.hotel.fit3$coef[8]*cos(2*pi*159*(3/12))+
+ harm.hotel.fit3$coef[9]*sin(2*pi*159*(3/12))+
+ harm.hotel.fit3$coef[10]*cos(2*pi*159*(4/12))+
+ harm.hotel.fit3$coef[11]*sin(2*pi*159*(4/12))+
+ harm.hotel.fit3$coef[12]*cos(2*pi*159*(5/12))+
+ harm.hotel.fit3$coef[13]*sin(2*pi*159*(5/12))+
+ harm.hotel.fit3$coef[14]*cos(2*pi*159*(6/12))+
+ harm.hotel.fit3$coef[15]*harm.y158
> harm.y159
(Intercept)
   6.627039
> harm.y160<-harm.hotel.fit3$coef[1]+harm.hotel.fit3$coef[2]*160+
+ harm.hotel.fit3$coef[3]*(160^2)+
+ harm.hotel.fit3$coef[4]*cos(2*pi*160*(1/12))+
+ harm.hotel.fit3$coef[5]*sin(2*pi*160*(1/12))+
+ harm.hotel.fit3$coef[6]*cos(2*pi*160*(2/12))+
+ harm.hotel.fit3$coef[7]*sin(2*pi*160*(2/12))+
+ harm.hotel.fit3$coef[8]*cos(2*pi*160*(3/12))+
+ harm.hotel.fit3$coef[9]*sin(2*pi*160*(3/12))+
+ harm.hotel.fit3$coef[10]*cos(2*pi*160*(4/12))+
+ harm.hotel.fit3$coef[11]*sin(2*pi*160*(4/12))+
+ harm.hotel.fit3$coef[12]*cos(2*pi*160*(5/12))+
+ harm.hotel.fit3$coef[13]*sin(2*pi*160*(5/12))+
+ harm.hotel.fit3$coef[14]*cos(2*pi*160*(6/12))+
+ harm.hotel.fit3$coef[15]*harm.y159
> harm.y160
(Intercept)
   6.753603
> harm.y161<-harm.hotel.fit3$coef[1]+harm.hotel.fit3$coef[2]*161+
+ harm.hotel.fit3$coef[3]*(161^2)+
+ harm.hotel.fit3$coef[4]*cos(2*pi*161*(1/12))+
+ harm.hotel.fit3$coef[5]*sin(2*pi*161*(1/12))+
+ harm.hotel.fit3$coef[6]*cos(2*pi*161*(2/12))+
+ harm.hotel.fit3$coef[7]*sin(2*pi*161*(2/12))+
+ harm.hotel.fit3$coef[8]*cos(2*pi*161*(3/12))+
+ harm.hotel.fit3$coef[9]*sin(2*pi*161*(3/12))+
+ harm.hotel.fit3$coef[10]*cos(2*pi*161*(4/12))+
+ harm.hotel.fit3$coef[11]*sin(2*pi*161*(4/12))+
+ harm.hotel.fit3$coef[12]*cos(2*pi*161*(5/12))+
+ harm.hotel.fit3$coef[13]*sin(2*pi*161*(5/12))+
+ harm.hotel.fit3$coef[14]*cos(2*pi*161*(6/12))+
+ harm.hotel.fit3$coef[15]*harm.y160
> harm.y161
(Intercept)
   6.736059
> harm.y162<-harm.hotel.fit3$coef[1]+harm.hotel.fit3$coef[2]*162+
+ harm.hotel.fit3$coef[3]*(162^2)+
+ harm.hotel.fit3$coef[4]*cos(2*pi*162*(1/12))+
+ harm.hotel.fit3$coef[5]*sin(2*pi*162*(1/12))+
+ harm.hotel.fit3$coef[6]*cos(2*pi*162*(2/12))+
+ harm.hotel.fit3$coef[7]*sin(2*pi*162*(2/12))+
+ harm.hotel.fit3$coef[8]*cos(2*pi*162*(3/12))+
+ harm.hotel.fit3$coef[9]*sin(2*pi*162*(3/12))+
+ harm.hotel.fit3$coef[10]*cos(2*pi*162*(4/12))+
+ harm.hotel.fit3$coef[11]*sin(2*pi*162*(4/12))+
+ harm.hotel.fit3$coef[12]*cos(2*pi*162*(5/12))+
+ harm.hotel.fit3$coef[13]*sin(2*pi*162*(5/12))+
+ harm.hotel.fit3$coef[14]*cos(2*pi*162*(6/12))+
+ harm.hotel.fit3$coef[15]*harm.y161
> harm.y162
(Intercept)
   6.865822
> harm.pred<-c(harm.y157,harm.y158,harm.y159,harm.y160,harm.y161,
    harm.y162)
> exp(harm.pred)
(Intercept) (Intercept) (Intercept) (Intercept) (Intercept) (Intercept)
   783.0390    732.5716    755.2425    857.1418    842.2352    958.9339

Question Five:

> HW.add.fit<-HoltWinters(log.occupancy.ts)
> HW.add.pred<-predict(HW.add.fit,n.ahead=6)
> plot(HW.add.fit,HW.add.pred)

[Plot: Holt-Winters filtering, additive model on the log series; x-axis: Time, y-axis: Observed / Fitted]

The Holt-Winters additive model fitted to the log hotel occupancy data fits the log series very well, almost from the beginning of the series.

> exp(HW.add.pred)
          Jan      Feb      Mar      Apr      May      Jun
1993 781.0748 744.2264 755.8839 841.0109 823.1895 956.1282

> HW.mult.fit<-HoltWinters(occupancy.ts,seasonal="multiplicative")
> HW.mult.pred<-predict(HW.mult.fit,n.ahead=6)
> plot(HW.mult.fit,HW.mult.pred)

[Plot: Holt-Winters filtering, multiplicative model on the original series; x-axis: Time, y-axis: Observed / Fitted]

The Holt-Winters multiplicative model fitted to the hotel occupancy data fits the series very well, almost from the beginning of the series.

> HW.mult.pred
          Jan      Feb      Mar      Apr      May      Jun
1993 780.8115 742.9288 755.2150 841.4635 824.9081 956.4086

Question Six & Bonus Question:

> actual<-c(811,732,745,844,833,935)
> predictions<-data.frame(exp(sf.pred),exp(reseas.pred),exp(harm.pred),
    exp(HW.add.pred),HW.mult.pred,actual)
> predictions
  exp.sf.pred. exp.reseas.pred. exp.harm.pred.      fit    fit.1 actual
1     783.0390         780.3271       783.0390 781.0748 780.8115    811
2     732.5716         730.8995       732.5716 744.2264 742.9288    732
3     755.2425         753.7857       755.2425 755.8839 755.2150    745
4     857.1418         855.8857       857.1418 841.0109 841.4635    844
5     842.2352         841.3158       842.2352 823.1895 824.9081    833
6     958.9339         958.4348       958.9339 956.1282 956.4086    935
> sf.RMSEP<-sqrt(1/6*sum((actual-predictions$exp.sf.pred)^2))
> sf.RMSEP
[1] 16.92079
> sa.RMSEP<-sqrt(1/6*sum((actual-predictions$exp.reseas.pred)^2))
> sa.RMSEP
[1] 17.21840
> harm.RMSEP<-sqrt(1/6*sum((actual-predictions$exp.harm.pred)^2))
> harm.RMSEP
[1] 16.92079
> HW.add.RMSEP<-sqrt(1/6*sum((actual-predictions$fit)^2))
> HW.add.RMSEP
[1] 16.90681
> HW.mult.RMSEP<-sqrt(1/6*sum((actual-predictions$fit.1)^2))
> HW.mult.RMSEP
[1] 16.66018

The model with the smallest RMSEP is the best predicting model. This is the Holt-Winters Multiplicative model (RMSEP = 16.66018) followed by the Holt-Winters Additive model (RMSEP = 16.90681), then the seasonal factor and harmonic models (RMSEPs = 16.92079), and lastly, the seasonally adjusted model is the worst predicting model (RMSEP = 17.2184).

If we use the following statistic:

\[ \sum_{t=1}^{\tau} (y_t - \hat{y}_t)^2 \]

we have the “sum of the squared prediction errors”. The model with the smallest sum would be the best predicting model. We could also take the average:

\[ \frac{1}{\tau} \sum_{t=1}^{\tau} (y_t - \hat{y}_t)^2 \]

giving the “mean squared error of prediction”. The smaller the value of the statistic, the smaller the average squared error in our predictions. If we select the model with the smallest mean squared error of prediction, we should have the best model for prediction. If our average squared error of prediction is large, we could use the “root mean squared error of prediction” instead:

\[ \sqrt{\frac{1}{\tau} \sum_{t=1}^{\tau} (y_t - \hat{y}_t)^2} \]

This last statistic has the advantage of being on the same scale as the prediction errors. All 3 statistics penalise large deviations from the actual observations, making them preferable to the absolute deviation alternatives below:

\[ \sum_{t=1}^{\tau} |y_t - \hat{y}_t| \]

the sum of absolute deviations of our predictions

\[ \frac{1}{\tau} \sum_{t=1}^{\tau} |y_t - \hat{y}_t| \]

the mean absolute deviation of our predictions ...
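The prediction-error statistics discussed above can be wrapped in two small helper functions. This is only a sketch: rmsep and mad.pred are hypothetical helper names, not functions from the answer guide or from any package, and the inputs are the actual values and the Holt-Winters multiplicative predictions quoted in Question Six.

```r
# Hypothetical helpers for the root mean squared error of prediction and
# the mean absolute deviation of the predictions.
rmsep <- function(actual, pred) sqrt(mean((actual - pred)^2))
mad.pred <- function(actual, pred) mean(abs(actual - pred))

# Values copied from the Question Six output.
actual <- c(811, 732, 745, 844, 833, 935)
HW.mult <- c(780.8115, 742.9288, 755.2150, 841.4635, 824.9081, 956.4086)

rmsep(actual, HW.mult)     # about 16.66, agreeing with HW.mult.RMSEP
mad.pred(actual, HW.mult)  # same scale, but large errors are not squared
```

Using mean() rather than 1/6*sum() keeps the helpers valid for any number of prediction periods.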