$b_1$, which is an estimate of $\beta_1$. Continuing with our worked-out example, we can obtain the
standard error of the estimate, $s\{b_1\}$, as follows:

$$s^2\{b_1\} = \frac{MSE}{\sum (x_i - \bar{x})^2} = \frac{1.023}{2074.4} = 4.9315 \times 10^{-4}$$

$$s\{b_1\} = \sqrt{4.9315 \times 10^{-4}} = 2.2207 \times 10^{-2}$$
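As a quick arithmetic check of this worked example, the following Python sketch recomputes the standard error from the two quantities quoted in the text (MSE = 1.023 and a sum of squared deviations of x equal to 2074.4). The variable names are illustrative only, and Python is used here for convenience rather than because the text uses it.

```python
import math

# Values taken from the worked example in the text
mse = 1.023      # mean squared error of the regression
sxx = 2074.4     # sum of squared deviations of x about its mean

var_b1 = mse / sxx          # estimated variance of the slope estimate b1
se_b1 = math.sqrt(var_b1)   # standard error of b1

print(f"s^2{{b1}} = {var_b1:.4e}")
print(f"s{{b1}}   = {se_b1:.4e}")
```

Dividing first and taking the square root last keeps the intermediate variance visible, which makes it easier to compare each step against the hand calculation.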
Statistical theo
Statistical Distributions in Action 201
[Figure: normal distribution density curve. x-axis: teaching evaluation score; y-axis: density. Shaded area: probability of obtaining teaching evaluation <= 4.5; probability = 0.8176]
Figure 6.12 Probability of obtaining a
most transit enthusiasts hate to acknowledge, have largely been ignored by those active in planning
circles and municipal politics. Instead, an ad nauseam campaign against excessive commute
times, wrongly attributed to cars, has ensued.
The dissatisfactio
larger family sizes. In comparison, the native-born population reported an average household
size of 2.6 persons whereas the size of India-born immigrant households was around 3.5
persons. The difference between immigrants from India and other South Asian
challenge is that there are billions of queries so it is hard to determine exactly which queries are
the most predictive for a particular purpose. Google Trends classifies the queries into categories,
which helps a little, but even then we have hundreds o
[Figure: health spending by country, including Italy; axis scale 0 to 20]
Figure 3.24 Health spending levels in developed economies
Source: WHO (data obtained from the Guardian) [43]
A large number of Pakistanis are also included in the 45 million uninsured in the U.S. It is
rather odd to see the 10,000-plus Pakis
on 400-plus courses taught by 98 instructors from the University of Texas. They obtained the
average teaching evaluation scores recorded by students in the teaching evaluation surveys. The
authors also constituted a panel of students who reviewed photogra
Imagine for a second if the reverse were true. That is, the children born to taller parents
ended up being taller than their own tall parents. The average height of the population would
therefore increase in every successive generation. A few generations
Prob>F 0 0
Adj R-sqrd 0.661 0.661
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Multicollinearity
Multicollinearity arises when strong correlation exists between some explanatory variables. This
happens when more than one explanatory variable
density increases, so do the commute times.
It is important to note that any discussion on commuting and commute times in North
America cannot take place without an explicit recognition of the role played by income and race.
It is well known that low-inco
Model 3 in Table 7.21 reports the results for the specification with three regressors, namely
weekly household income, adults, and the children in households. The resulting three coefficients
are all statistically significant and positive. The model can b
analysis because the order of alternatives is rather arbitrary. There are, however, scenarios where
the ordering of outcomes matters. Consider, for instance, Table 8.1, where households' car ownership
is presented.
Table 8.1 Example of an Ordered Categori
Equation 6.9
Get the test statistic via Equation 6.10:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{sdev}$$
Equation 6.10
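A minimal Python sketch of the test statistic in Equation 6.10 follows. Because the definition of sdev is not shown here, the sketch assumes the standard pooled (equal-variance) form, sdev = sqrt(s_p^2 (1/n1 + 1/n2)), with n1 + n2 - 2 degrees of freedom. The function name `pooled_t` and the two small samples are hypothetical; they are not the teaching evaluation data.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t, dof) for a two-sample t-test assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    dof = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / dof
    # Assumed form of sdev: standard error of the difference in means
    sdev = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(sample1) - mean(sample2)) / sdev
    return t, dof

# Hypothetical evaluation scores for two groups of instructors
group_a = [4.1, 3.8, 4.5, 4.0, 3.6]
group_b = [3.9, 3.5, 4.2, 3.7, 3.2]
t_stat, dof = pooled_t(group_a, group_b)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The resulting t value would then be compared with the critical value of the t-distribution for the reported degrees of freedom.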
I use the same example of teaching evaluations to determine the difference between the
evaluation scores of male and female instructors assuming equal varia
groups.
Let us test the average teaching evaluations for a discretized variable for beauty, which in
raw form is a continuous variable. I convert the continuous variable into three categories, namely
low beauty, average looking, and good looking. The R co
outreg2 using tab8-10.doc, cttop(eform with inc_10k) append label eform ///
    addstat(Pseudo R-squared, `e(r2_p)')
Table 8.10 Exponentiated Coefficients for Logit Models
Variables    (1) eform with Income        (2) eform with inc_10k
             1 If Smokes, 0 Otherwise     1 If Smokes
Regression in Action 275
appearance is statistically significant and positive even when I hold gender, minority status, English
proficiency, tenure status, and attributes of courses constant in the model.
I also notice an increase in the model fit when I
342 Chapter 8 To Be or Not to Be
Figure 8.26 Linear regression dialog box in SPSS
Figure 8.27 shows the resulting output. Notice that the output (especially estimated coefficients
and standard errors) is identical to that reported in Table 8.7 (Model 1),
reg hprice bdrms sqrf
outreg2 using 88h_4.doc, label append
reg hprice bdrms lotsize sqrf
outreg2 using 88h_4.doc, label append
reg hprice bdrms lotsize sqrf colonial
outreg2 using 88h_4.doc, label append
Table 7.9 An Incremental Approach to Building a Mo
the probability to smoke if the price of cigarettes is increased to 77 cents? Zelig allows one to
run such simulations with ease. The output of the simulation is presented in Figure 8.16 where I
have highlighted key values of interest. The first set of si
Now returning to the example of the labor force survey of married women, I set y = 1 if the
woman is in the labor force and y = 0 if she is unemployed. The independent variables include
number of children, wife's education, and expected income. Now consider th
households members 2.776 1.268 1 6
no children 0.560 0.497 0 1
The data set contains information on 1,000 households from Canada. The average number
of adults is 1.95 persons per household. The minimum number of adults is 1 and the maximum
is 3. There are
I believe that teaching evaluations measured students' subjective appreciation of the course
and the instructor, and might not necessarily translate into teaching productivity. Furthermore,
I must point out the inherent disconnect in the Hamermesh study be
Source: Canadian Real Estate Association
Another important factor to consider, which was also highlighted by Ms. Perkins in the
Globe, is the record-low mortgage rates that facilitate borrowing larger amounts for real estate
acquisitions. Figure 3.12 con
Teaching Ratings 159
[Figure: box plot. Axes: teaching evaluation score (2.0 to 5.0) and minority status (no, yes)]
Figure 5.19 Box plot for teaching scores, differentiated by instructors' minority status
The box plot presents several statistical measures in a very compact
and computer science to come up with data science. The debate about how different data science
is from traditional statistics will continue for some time.
My definition is radically different from others who view data scientists in the narrow
context of a
[Figure: histogram. x-axis: outcome of a roll of two dice (2 to 12); y-axis: density (probability)]
Figure 6.4 Histogram of the outcomes for rolling two dice
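The probabilities behind a histogram of this kind can be reproduced by enumerating all outcomes. The Python sketch below assumes two fair six-sided dice, so each of the 36 ordered pairs is equally likely; it tabulates the probability of each sum and the running total P(sum <= x). The variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each sum, kept as exact fractions
pmf = {s: Fraction(count, 36) for s, count in sorted(totals.items())}

# Cumulative probability P(sum <= x)
cdf = {}
running = Fraction(0)
for s, p in pmf.items():
    running += p
    cdf[s] = running

for s in pmf:
    print(f"{s:>2}  P(sum = {s}) = {float(pmf[s]):.4f}   P(sum <= {s}) = {float(cdf[s]):.4f}")
```

Exact fractions are used so that the probabilities sum to exactly one; they are converted to decimals only for display.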
[Figure: x-axis: outcome of a roll of two dice (2 to 12); y-axis: density (Prob <= x)]
Figure
unlock the insights of data and tell a fantastic story via the data. [20] Ain't I glad to see storytelling
being mentioned by Dr. Patil as a key characteristic of data science?
I find it quite surprising that even when the world's largest big data firm and th
[Figure: scatter plot panels. x-axis: households with children (%); y-axis: median commute time (min)]
Figure 5.46 Scatter plot of commute times and households with children in a neighborhood
I observe three different relationships for the three types of
edu/zelig .
15. Imai, K., King, G., and Lau, O. (2008). Toward A Common Framework for Statistical Analysis
and Development. Journal of Computational and Graphical Statistics. Vol. 17, No. 4 (December),
pp. 892–913.
16. Pakula, A. J., Barish, K., Gerrity,
one's use of the Internet. I would treat these four responses with suspicion and convert the "don't
know" responses into missing values.
Table 4.2 Tabulation on the Use of the Internet
User status   Internet
yes           358
no            638
don't know    4
refused       0
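The recoding described for this tabulation can be sketched in a few lines of Python. The sketch rebuilds a list of responses from the counts in Table 4.2, then treats "don't know" as missing by replacing it with None. The variable names and the choice of None as the missing-value marker are assumptions made for illustration.

```python
from collections import Counter

# Rebuild the responses from the counts reported in Table 4.2
counts = {"yes": 358, "no": 638, "don't know": 4, "refused": 0}
responses = [label for label, n in counts.items() for _ in range(n)]

# Treat "don't know" as missing (None); keep every other response unchanged
recoded = [None if r == "don't know" else r for r in responses]

valid = [r for r in recoded if r is not None]
n_missing = len(recoded) - len(valid)
tabulation = Counter(valid)

print(tabulation)
print("missing:", n_missing)
print(f"share of users among valid responses: {tabulation['yes'] / len(valid):.3f}")
```

Once the four responses are set to missing, shares are computed over the 996 valid responses rather than all 1,000.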
Let's look at the distri
explanatory variables used in the model. The adjusted R-squared is more conservative than the
R-squared because it penalizes the R-squared for using additional explanatory variables.
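To make the penalty concrete, the sketch below uses the standard textbook formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the number of observations and k the number of explanatory variables excluding the intercept. The function name and the numbers are hypothetical and are not taken from the housing example.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared for a model with an intercept and n_predictors regressors."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than estimated coefficients")
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical illustration: same R-squared, different numbers of regressors
r2 = 0.67
for k in (2, 4, 8):
    print(k, round(adjusted_r_squared(r2, n_obs=88, n_predictors=k), 4))
```

Holding R-squared fixed, the adjusted value falls as regressors are added, which is the sense in which it is the more conservative measure.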
The following Stata code regresses the housing
1.8, we will know that it falls in the rejection region (see Figure 6.18) and we will reject the null
hypothesis that the difference in means is less than zero.
Figure 6.18 One-tailed test (right-tail)
210 Chapter 6 Hypothetically Speaking
t-distribution