... the number of crime incidents in a certain area given certain features specific to
the area. We treat this as a binary classification problem, where for the target variable we define “high
crime rate” as the occurrence of a threshold number of crimes over some period of time (in our data,
over 12 months). The types of crime included in our data are aggravated assault, battery, burglary,
carrying a concealed weapon, child abuse, public nuisance, conspiracy, courtesy report, credit card
theft, suspended or revoked driver's license, theft, rape, forgery of prescription, lost property, inflicting
injury on cohabitee, malicious mischief, vandalism, missing juvenile, parole violation, violation of restraining order, transportation of marijuana, traffic violation, threats against life, tampering with vehicle,
stolen vehicle, suspicious towards child or female, robbery, sales of controlled substance and resisting
arrest. As a starting point, we looked at a region of San Francisco around the SOMA district,
roughly 1.5 km² in area, shown in Fig. 1.

Figure 1: Area of San Francisco used for initial testing of our model.

We divide the region into square grid cells of 200 m x 200 m, or 34 blocks across, each considered
a training example. The features we considered are the number of businesses within the area and the
average rental price. We also look specifically at the number of restaurants in the area.
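
The gridding and labeling described above can be sketched in a few lines of Python. This is a minimal illustration, not our actual pipeline. It assumes that incident and business locations have already been projected to metres measured from the south-west corner of the region, and that "high crime rate" means at least `threshold` crimes in a cell. The function names, the sample coordinates and the threshold value are all hypothetical.

```python
from collections import Counter

CELL_SIZE_M = 200.0  # side length of one square grid cell, as stated in the text


def cell_index(x_m, y_m, cell_size=CELL_SIZE_M):
    """Map a point (metres from the region's south-west corner) to (col, row)."""
    return int(x_m // cell_size), int(y_m // cell_size)


def count_per_cell(points, cell_size=CELL_SIZE_M):
    """Count how many points fall inside each grid cell."""
    return Counter(cell_index(x, y, cell_size) for x, y in points)


def label_cells(crime_counts, cells, threshold):
    """Label a cell 1 ("high crime rate") if it has at least `threshold` crimes, else 0.

    The "at least" comparison is an assumption about how the threshold is applied.
    """
    return {cell: int(crime_counts.get(cell, 0) >= threshold) for cell in cells}


# Made-up coordinates, for illustration only (not real data).
crimes = [(10, 20), (150, 199), (210, 30), (399, 0), (205, 5), (50, 450)]
businesses = [(30, 30), (250, 100), (260, 120)]

crime_counts = count_per_cell(crimes)        # used to build the target variable
business_counts = count_per_cell(businesses)  # one feature per cell

# A toy 2 x 3 grid; each cell is one training example.
cells = [(col, row) for col in range(2) for row in range(3)]
labels = label_cells(crime_counts, cells, threshold=3)
```

Each cell then contributes one training example: its business count (and, in the same way, its restaurant count and average rental price) as features, and the 0/1 label as the target.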
The training data is extracted from the data sets downloaded from the data.gov website, which include
crime information from the San Francisco Police Department’s Central Database Incident System
(CABLE), businesses actively registered with the San Francisco Office of the Treasurer & Tax
Collector, and rental listings on Padmapper.
Figure 2 plots the density of businesses inside a 200 m x 200 m square versus the number of crimes that ...