Lecture6_2005

Green / Statistics

Regression: Tricks of the Trade

Data for this example come from Green, Strolovitch, and Wong (1998): “Defended Neighborhoods, Integration, and Racially Motivated Crime” (American Journal of Sociology). The dependent variable is the number of reported hate crimes directed at African-American targets in each New York City “community district” during 1987-1995, divided by the community district’s population (in 1000s). Thus, a value of .5 means that a given community district (the average population of which is approximately 130,000) experienced 1 anti-black hate crime for every 2000 residents over this period.

The distribution of anti-black hate crime per capita in each district can be expressed both statistically and graphically:

Descriptive Statistics: b_hcpc

Variable   N    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
b_hcpc    51  0.1515   0.0191  0.1361   0.0168  0.0682  0.1043  0.1897   0.6218

[Figure: Histogram of b_hcpc (axes: b_hcpc, Frequency)]
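The scaling of the dependent variable and the reported standard error of the mean can be checked with a little arithmetic. The following is a minimal Python sketch, not part of the original Minitab session; the function names and the 65-crime district are hypothetical and chosen only to reproduce the ".5 means 1 crime per 2000 residents" example.

```python
import math


def rate_per_1000(crimes: float, population: float) -> float:
    """Reported crimes per 1,000 residents (the scale used for b_hcpc)."""
    return crimes / (population / 1000)


def residents_per_crime(rate: float) -> float:
    """Number of residents per reported crime implied by a per-1,000 rate."""
    return 1000 / rate


# Hypothetical district (not from the data): 65 crimes, 130,000 residents.
example_rate = rate_per_1000(65, 130_000)
print(example_rate)                       # 0.5
print(residents_per_crime(example_rate))  # 2000.0

# Standard error of the mean from the reported summary statistics:
# SE Mean = StDev / sqrt(N).
n, stdev = 51, 0.1361
se_mean = stdev / math.sqrt(n)
print(round(se_mean, 4))                  # 0.0191
```

The last line agrees with the SE Mean column of the descriptive statistics, which suggests the table reports the usual standard deviation divided by the square root of N.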
Rates of anti-black hate crime tend to be higher in areas where whites have historically been the predominant group.

[Figure: Scatterplot of Anti-Black Hate Crime by Percentage White in 1980 (axes: pw80, b_hcpc)]

Let’s now estimate a simple regression of Anti-Black Hate Crime on Percentage White in 1980.

Bivariate regression results

The regression equation is
b_hcpc = 0.0341 + 0.237 pw80

Predictor     Coef  SE Coef     T      P
Constant   0.03408  0.02945  1.16  0.253
pw80       0.23660  0.04989  4.74  0.000

S = 0.113836   R-Sq = 31.5%   R-Sq(adj) = 30.1%

The regression confirms the impression conveyed by the scatterplot: there appears to be a strong relationship between the past proportion of whites in a neighborhood and hate crime directed against blacks.
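The numbers in the regression output above are linked by simple formulas: each T value is the coefficient divided by its standard error, and the adjusted R-Sq follows from R-Sq, the sample size, and the number of predictors. The Python sketch below recomputes them from the printed (rounded) output; it is an illustration rather than a re-estimation, the helper names are made up for this example, and the value pw80 = 0.5 is a hypothetical input that assumes pw80 is measured as a proportion.

```python
# Values copied from the printed regression output (N = 51 districts).
n = 51
coef = {"Constant": 0.03408, "pw80": 0.23660}
se = {"Constant": 0.02945, "pw80": 0.04989}
r_sq = 0.315  # rounded, so derived quantities are approximate

# T = Coef / SE Coef for each predictor.
t_ratios = {name: coef[name] / se[name] for name in coef}


def predict(pw80: float) -> float:
    """Fitted b_hcpc for a given value of pw80."""
    return coef["Constant"] + coef["pw80"] * pw80


def adjusted_r_sq(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


for name, t in t_ratios.items():
    print(name, round(t, 2))

print(round(adjusted_r_sq(r_sq, n, 1), 3))

# Hypothetical district in which pw80 = 0.5.
print(round(predict(0.5), 4))
```

Rounded to the precision of the output, the recomputed T values (1.16 and 4.74) and the adjusted R-Sq (about 30.1%) match the table.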
Dummy Variables

Sometimes we are interested in the effects of independent variables that have no natural metric. For example, we might be interested in variations in hate crime rates across the five boroughs of New York City (1=Brooklyn, 2=Bronx, 3=Manhattan, 4=Queens, 5=Staten Island). Clearly, the values 1 to 5 are completely arbitrary. We need some way to summarize statistical information across such artificial categories. One way is graphical. Another approach is to calculate means within each group.

          boro   N    MEAN  MEDIAN  TRMEAN   STDEV  SEMEAN
b_hcpc       1  10  0.1352  0.0795  0.0982  0.1634  0.0517
             2  18  0.1610  0.1072  0.1411  0.1597  0.0376
             3   6  0.0834  0.0965  0.0834  0.0336  0.0137
             4  14  0.1548  0.1098  0.1406  0.1075  0.0287
             5   3  0.2697  0.2830  0.2697  0.1062  0.0613

A more general method, however, is to use dummy variables and multiple regression. A dummy variable is a variable that takes on the value of 1 or 0. Create one dummy variable for each category of a variable like borough, but if you are going to include an intercept (or constant term) in your regression equation, exclude one category. Here, I show it both ways. Notice the warning message that appears when one includes every category and a constant.

MTB > NoConstant
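The dummy-variable coding described above can be sketched in Python (the lecture itself uses Minitab, whose output is not reproduced here). The sketch shows two things: why including every category together with a constant is redundant, and how the coefficients of a regression on borough dummies alone relate to the group means in the table. The list of borough codes is hypothetical, the helper name is invented for this example, and the choice of category 1 as the excluded baseline is an assumption, since the preview does not show which category the lecture omits.

```python
BOROUGHS = {1: "Brooklyn", 2: "Bronx", 3: "Manhattan", 4: "Queens", 5: "Staten Island"}


def make_dummies(codes, categories):
    """Return one 0/1 indicator column per category."""
    return {c: [1 if code == c else 0 for code in codes] for c in categories}


# Hypothetical borough codes for a handful of districts.
codes = [1, 2, 2, 3, 4, 5, 1]
dummies = make_dummies(codes, BOROUGHS)

# Each district falls in exactly one borough, so the five dummies always sum
# to 1, which is the constant column. Including all five plus a constant is
# therefore perfectly collinear, and one of the terms has to be dropped.
row_sums = [sum(dummies[c][i] for c in BOROUGHS) for i in range(len(codes))]
print(row_sums)

# Group means and counts copied from the table of b_hcpc by boro.
group_means = {1: 0.1352, 2: 0.1610, 3: 0.0834, 4: 0.1548, 5: 0.2697}
group_n = {1: 10, 2: 18, 3: 6, 4: 14, 5: 3}

# With no constant and all five dummies, the least-squares coefficients are
# the group means themselves. With a constant and one category excluded, the
# constant is the excluded group's mean and each coefficient is a difference
# from it.
baseline = 1  # assumed baseline category
intercept = group_means[baseline]
contrasts = {c: group_means[c] - intercept for c in group_means if c != baseline}
for c, diff in contrasts.items():
    print(BOROUGHS[c], round(diff, 4))

# The N-weighted average of the group means recovers the overall mean.
overall = sum(group_means[c] * group_n[c] for c in group_means) / sum(group_n.values())
print(round(overall, 4))
```

The weighted average comes out at about 0.1515, matching the overall mean of b_hcpc reported in the descriptive statistics.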

This note was uploaded on 04/07/2008 for the course STAT 102 taught by Professors Jonathan Reuning-Scherer and Donald Green during the Fall '05 term at Yale.
