the data was not shown, as there was no good way to display the data
and separating hyperplane in 3 dimensions).
Thus we focus our analysis on the two-feature model and try to increase the number of training
examples. We expanded our area of interest and instead looked at two locations: the central area of SF
(Japantown + Divisadero) and south SF. We combined these two areas to form another data set with
~1000 training examples, this time using a cutoff of 30 crimes. Training on this data set gave us a
training error of 16%. The plot of this data is shown in Fig. 4.

Fig. 4: SVM trained on larger dataset, including data from Central SF and Southern SF

We see a very similar trend in this data: rental prices appear to have only a marginal ability to
predict crime. Even so, the qualitative trends are consistent with those of the smaller region.
Notably, there always seems to be a small “bump” in the center, where higher crime occurs in areas of
median (mid-range) rental price. This may simply be variance from the fitting. If the effect is real, though, it could
have many causes: for example, fewer crimes may be reported in poorer locations. However, the most
important predictor again seemed to be business density, consistent with our smaller data set.
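The pipeline described above (binarise crime counts with a cutoff of 30, train an SVM on two features, report the training error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the report does not say which kernel or solver was used, so this sketch swaps in a linear SVM trained with Pegasos (stochastic sub-gradient descent on the hinge loss). The two features (standardised rent and business density), the "> cutoff means high crime" rule, and the synthetic data from the hypothetical `make_synthetic` helper are all assumptions for illustration. A linear boundary also cannot reproduce the central "bump" seen in Fig. 4.

```python
import random

CRIME_CUTOFF = 30  # cutoff from the report; treating "> cutoff" as high crime is an assumption


def label(crime_counts, cutoff=CRIME_CUTOFF):
    """Binarise raw crime counts into SVM labels: +1 (high crime) or -1."""
    return [1 if c > cutoff else -1 for c in crime_counts]


def augment(x):
    """Append a constant 1.0 so the last weight acts as a (regularised) bias."""
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    Xa = [augment(x) for x in X]
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xa[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrinkage from the regulariser
            if margin < 1.0:  # hinge loss is active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, augment(x)))
    return 1 if score >= 0 else -1


def error_rate(w, X, y):
    """Fraction of examples whose predicted label differs from the true label."""
    return sum(predict(w, x) != yi for x, yi in zip(X, y)) / len(y)


def make_synthetic(n, seed=1):
    """Hypothetical data: crime driven mostly by business density, weakly by rent."""
    rng = random.Random(seed)
    X, counts = [], []
    for _ in range(n):
        rent = rng.uniform(-1.0, 1.0)      # standardised median rent (hypothetical)
        density = rng.uniform(-1.0, 1.0)   # standardised business density (hypothetical)
        counts.append(30 + 25 * density + 5 * rent + rng.gauss(0.0, 3.0))
        X.append([rent, density])
    return X, counts


X, counts = make_synthetic(200)
y = label(counts)
w = train_linear_svm(X, y)
train_err = error_rate(w, X, y)
print(f"training error: {train_err:.1%}")
print(f"weights (rent, density, bias): {[round(v, 2) for v in w]}")
```

On this synthetic data the density weight dominates the rent weight, mirroring (by construction, not as evidence) the report's finding that business density is the stronger predictor. With real data, `X` and `counts` would come from the neighbourhood data set instead.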
To obtain a more realistic estimate of our model's test error, we took the model
trained on the South San Francisco and Central San Francisco data and tested it
on SoMa data. This gave an error rate of 13/40 = 32.5%. Compared with our training
error of 16%, this is an even greater discrepancy, indicating even higher variance.
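The hold-out arithmetic can be checked with a few lines of Python. The counts (13 of 40 SoMa examples misclassified) and the 16% training error are the report's numbers; the standard-error line is an addition not in the report, using the usual normal approximation for a binomial proportion, and only indicates how noisy an error rate estimated on 40 examples is.

```python
import math

TRAIN_ERROR = 0.16        # reported training error (Central + South SF)
N_WRONG, N_TEST = 13, 40  # reported SoMa hold-out result


def error_rate(n_wrong, n_total):
    """Misclassification rate on a labelled evaluation set."""
    return n_wrong / n_total


def binomial_se(p, n):
    """Normal-approximation standard error of an error rate estimated on n examples."""
    return math.sqrt(p * (1.0 - p) / n)


test_err = error_rate(N_WRONG, N_TEST)
gap = test_err - TRAIN_ERROR  # a large train/test gap is the usual symptom of high variance
se = binomial_se(test_err, N_TEST)
print(f"SoMa test error: {test_err:.1%}")                    # 32.5%
print(f"train/test gap:  {gap:.1%}")                         # 16.5%
print(f"approx. standard error of test error: {se:.1%}")     # 7.4%
```

The gap of 16.5 percentage points is roughly twice the standard error, so the discrepancy is unlikely to be sampling noise alone, though 40 test examples still leave the 32.5% figure fairly uncertain.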
The results of our model demonstrate that there is indeed some predictive power (if...
- Fall '09
- Machine Learning, 2014 Projects