Crime Prediction Using Businesses and Housing Values in San Francisco (Lee, Lee, Kek)

…occurred in the past 12 months in that area. We can already begin to see a correlation between the business density of an area and the number of crimes that occur there.

We classify areas using a support vector machine (SVM). First, we need a cutoff that separates high-crime areas from low-crime areas. We labeled an area as low crime if it had 60 or fewer crimes and as high crime otherwise; this cutoff gives an even split between the two classes.

Figure 2: Scatter plot of crime rate vs. business density over 200 m x 200 m squares in the SoMa region plotted in Fig. 1.

Training of SVM and analysis

Figure 3 plots the SVM obtained using MATLAB's SVM library with a Gaussian kernel. In this example the number of features was reduced to two: total business density and average rental price. The plot shows the SVM trained on the entire data set.

Figure 3: SVM trained on data from SoMa, using 2 features: business density and average rental price.

In general, places with fewer businesses had lower crime rates, as expected from Fig. 1. Interestingly, however, rental rates seem to be only marginally useful in predicting crime rates.

To test the robustness of our SVM we used leave-one-out cross validation (LOOCV). Using just these two features, our LOOCV error was 25%, while our training error on the entire set was 5/40 = 12.5%. The two values are somewhat close, so the model appears to strike an acceptable tradeoff between bias and variance, although the overall error is still quite high.

We next tried adding another feature: the number of restaurants in each area. This gave no improvement in the LOOCV error. Instead, it increased both the number of support vectors and the training error, so this feature did not help predict crime rates. (Visualization of…
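The report says only that the cutoff of 60 crimes gives an even split between high and low crime. Below is a minimal Python sketch of that labeling step. It assumes per-cell crime counts are already available as a list. The function name `label_cells` and the sample counts are hypothetical. Defaulting to the median is one way to get an even split, and passing `cutoff=60` applies the rule described above (60 or fewer crimes means low crime).

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple


def label_cells(
    crime_counts: Sequence[int], cutoff: Optional[float] = None
) -> Tuple[List[int], float]:
    """Label grid cells as low crime (-1) or high crime (+1).

    A cell is low crime if its count is <= cutoff. If no cutoff is given,
    the median count is used, which splits the cells roughly in half.
    """
    if cutoff is None:
        cutoff = median(crime_counts)
    labels = [1 if count > cutoff else -1 for count in crime_counts]
    return labels, cutoff


# Hypothetical counts for six 200 m x 200 m cells (not data from the report).
counts = [12, 38, 60, 61, 95, 140]
labels, used_cutoff = label_cells(counts, cutoff=60)
print(labels, used_cutoff)  # [-1, -1, -1, 1, 1, 1] 60
```

On these toy counts the median is 60.5, so calling `label_cells(counts)` without a cutoff produces the same labels.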
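For reference, a Gaussian-kernel SVM classifies a cell x by the sign of f(x) = sum over support vectors of alpha_i * y_i * K(x_i, x) + b, where K(x, z) = exp(-||x - z||^2 / (2 sigma^2)). The Python sketch below only evaluates that decision rule. It does not train the SVM; in the report, MATLAB's library did the training. The support vectors, weights, bias `b`, and `sigma` are hypothetical placeholders. The sketch also assumes the two features are standardized first, because raw business counts and rental prices sit on very different scales. The report does not say whether this was done.

```python
import math
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale each feature column to zero mean and unit variance (assumed preprocessing)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def gaussian_kernel(x: Sequence[float], z: Sequence[float], sigma: float = 1.0) -> float:
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_predict(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    alpha_y: Sequence[float],
    b: float,
    sigma: float = 1.0,
) -> int:
    """Return +1 (high crime) or -1 (low crime) for feature vector x.

    alpha_y[i] holds alpha_i * y_i for support vector i. Each support vector
    costs one kernel evaluation per prediction.
    """
    score = b + sum(
        ay * gaussian_kernel(sv, x, sigma) for sv, ay in zip(support_vectors, alpha_y)
    )
    return 1 if score >= 0 else -1


# Hypothetical support vectors in standardized (business density, rental price) space.
svs = [[-1.0, 0.0], [1.0, 0.0]]
weights = [-1.0, 1.0]  # first vector is low crime, second is high crime
print(svm_predict([1.2, 0.1], svs, weights, b=0.0))  # 1
```

Every support vector contributes a term to the sum. The larger number of support vectors reported after adding the restaurant feature therefore means a more expensive decision function, with no gain in held-out accuracy.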
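LOOCV with n = 40 cells means training 40 models, each on 39 cells, and testing each model on the one cell it did not see. The LOOCV error is the fraction of those held-out cells that are misclassified. The training error (5/40 = 12.5% above) instead tests a single model on the same cells it was trained on. Below is a minimal Python sketch of both quantities. The classifier is a hypothetical nearest-centroid stand-in, not the report's Gaussian-kernel SVM; any `train(X, y) -> predict` function could be substituted.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], int]
Trainer = Callable[[List[List[float]], List[int]], Predictor]


def error_rate(predict: Predictor, X: List[List[float]], y: List[int]) -> float:
    """Fraction of points whose predicted label differs from the true label."""
    wrong = sum(1 for x, label in zip(X, y) if predict(x) != label)
    return wrong / len(y)


def training_error(train: Trainer, X: List[List[float]], y: List[int]) -> float:
    """Train once on all n points, then test on those same points."""
    return error_rate(train(X, y), X, y)


def loocv_error(train: Trainer, X: List[List[float]], y: List[int]) -> float:
    """Leave-one-out CV: for each i, train on the other n-1 points and test on point i."""
    mistakes = 0
    for i in range(len(y)):
        predict = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(X[i]) != y[i]:
            mistakes += 1
    return mistakes / len(y)


def train_nearest_centroid(X: List[List[float]], y: List[int]) -> Predictor:
    """Hypothetical stand-in for the SVM: predict the class whose mean is nearer.

    Assumes both labels (+1 and -1) are present in the training data.
    """
    def centroid(label: int) -> List[float]:
        pts = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(pts) for col in zip(*pts)]

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    c_high, c_low = centroid(1), centroid(-1)
    return lambda x: 1 if sq_dist(x, c_high) < sq_dist(x, c_low) else -1


# Hypothetical one-feature toy data (business density), not the report's data.
X = [[0.0], [1.0], [2.0], [6.0], [8.0], [9.0], [10.0]]
y = [-1, -1, -1, -1, 1, 1, 1]
print(training_error(train_nearest_centroid, X, y))
print(loocv_error(train_nearest_centroid, X, y))
```

The restaurant-count experiment fits the same frame. You would run `loocv_error` once on the two-feature inputs and once with a third column appended, then compare the held-out errors.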

This document was uploaded on 02/07/2014.

Ask a homework question - tutors are online