red in the past 12 months in that area. We can already begin to see a correlation between
the business density of an area and the number of crimes that occur there. We will classify areas using a
support vector machine (SVM), so we first choose a cutoff that determines which areas count as high crime
and which count as low crime. We used a cutoff of 60 crimes: areas with 60 or fewer crimes are labeled low crime, which gives an even
split between high and low crime.

Figure 2: Scatter plot of crime rate vs. business density over 200 m x 200 m squares in the SoMa region plotted in Fig. 1.

Training of SVM and analysis
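Before training, the high/low labels described above have to be derived from the raw crime counts. The following is a minimal Python sketch of that step, not the authors' code. The cutoff of 60 comes from the text; the function names, the 0/1 label encoding, and the median-based helper (one plausible way to find a cutoff that splits the areas evenly) are assumptions made for illustration.

```python
from statistics import median
from typing import List, Sequence, Tuple

CUTOFF = 60  # from the text: 60 crimes or fewer counts as low crime


def label_high_crime(crime_counts: Sequence[int], cutoff: float = CUTOFF) -> List[int]:
    """Return 1 (high crime) for counts above the cutoff, else 0 (low crime)."""
    return [1 if count > cutoff else 0 for count in crime_counts]


def class_balance(labels: Sequence[int]) -> Tuple[int, int]:
    """Return (number of low-crime areas, number of high-crime areas)."""
    high = sum(labels)
    return len(labels) - high, high


def median_cutoff(crime_counts: Sequence[int]) -> float:
    """Hypothetical helper: the median splits the areas roughly in half."""
    return median(crime_counts)
```

Calling class_balance on the resulting labels is a quick way to check whether a given cutoff really produces an even split.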
Figure 3 plots the SVM obtained using Matlab's SVM library with a Gaussian kernel. In this
example the number of features was reduced to two: the total business density and the average rental price.
The plot shows the SVM trained on the entire data set.

Figure 3: SVM trained on data from SoMa, using 2 features: business density and average rental price.

We see that, in general, places with fewer businesses had lower crime rates, as expected from Fig. 1.
Interestingly, however, rental rates seem to be only marginally useful in predicting crime rates.
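For reference, the Gaussian kernel mentioned above measures the similarity of two feature vectors as exp(-||x - z||^2 / (2 sigma^2)). The sketch below writes this out in Python for the two-feature case. The kernel width sigma is an assumption, since the text does not state the value used, and the function name is hypothetical.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], z: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: exp(-||x - z||^2 / (2 * sigma^2)).

    x and z are feature vectors, e.g. (business density, average rental price).
    sigma is an assumed kernel width, not a value taken from the text.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))
```

The kernel equals 1 for identical points and decays toward 0 as they move apart, which is what lets the SVM draw a curved decision boundary in the plane of the two features.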
To test the robustness of our SVM we used leave-one-out cross-validation (LOOCV). Our LOOCV
error was 25% using just these two features. Our training error on the entire set was 5/40 = 12.5%. We
see that the two values are somewhat close, and thus we seem to be making a reasonable tradeoff between
bias and variance, although the overall error is quite high.
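The comparison above between training error and LOOCV error can be sketched in Python as follows. This is an illustration of the procedure, not the authors' Matlab code: to keep the example free of external libraries, a simple nearest-centroid classifier stands in for the Gaussian-kernel SVM, and all function names are assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Model = Callable[[Point], int]


def fit_nearest_centroid(X: Sequence[Point], y: Sequence[int]) -> Model:
    """Stand-in classifier (NOT an SVM): predict the class with the closest centroid."""
    centroids: Dict[int, Point] = {}
    for label in set(y):
        members = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = tuple(sum(col) / len(members) for col in zip(*members))

    def predict(x: Point) -> int:
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def training_error(fit: Callable, X: Sequence[Point], y: Sequence[int]) -> float:
    """Fit on all points, then count the mistakes on those same points."""
    model = fit(X, y)
    wrong = sum(1 for x, lab in zip(X, y) if model(x) != lab)
    return wrong / len(X)


def loocv_error(fit: Callable, X: Sequence[Point], y: Sequence[int]) -> float:
    """Hold out each point once, fit on the rest, and test on the held-out point."""
    wrong = 0
    for i in range(len(X)):
        X_train: List[Point] = list(X[:i]) + list(X[i + 1:])
        y_train: List[int] = list(y[:i]) + list(y[i + 1:])
        model = fit(X_train, y_train)
        if model(X[i]) != y[i]:
            wrong += 1
    return wrong / len(X)
```

With 40 areas, LOOCV fits the model 40 times, each time on 39 points. A LOOCV error well above the training error suggests overfitting, while two similar values suggest the model generalizes about as well as it fits.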
We next ran the SVM with an additional feature, the number of restaurants in
each area. However, doing so gave no improvement in the LOOCV error, while it increased both the number of
support vectors and the training error. Hence this feature was not useful in helping to predict
the crime rates. (Visualization of...
This document was uploaded on 02/07/2014.
- Fall '09
- Machine Learning, 2014 Projects