Stat 512 – 2
Solutions to Homework #3
Dr. Simonsen
Due Wednesday, September 14, 2005.
For the next three questions, use the grade point average data described in KNNL problem 1.19.
1. Describe the distribution of the explanatory variable. Show the plots and output that were helpful in learning about this variable.
[Note to grader: this is an open-ended question. As long as there is a good description, students do not have to have exactly the same results as in these solutions.]
Using PROC UNIVARIATE, we see that there are 120 observations ranging from 14 to 35, with a mean of 24.725, a median of 25, and a standard deviation of 4.472. The stem-and-leaf plot below (in effect a sideways histogram) shows no extreme observations (i.e., ones far away from the others). An examination of the boxplot, histogram, and Q-Q plot (not all of these are necessary) shows that the distribution of the test scores is reasonably symmetric and approximately normal.
The UNIVARIATE Procedure
Variable:  testscore

                            Moments
N                          120    Sum Weights               120
Mean                    24.725    Sum Observations         2967
Std Deviation       4.47206549    Variance           19.9993697
Skewness            -0.1363553    Kurtosis           -0.5596968
Uncorrected SS           75739    Corrected SS         2379.925
Coeff Variation     18.0872214    Std Error Mean     0.40824186
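The entries of the Moments table are tied together by a few identities, so most of the table can be rebuilt from just N, Sum Observations, and Uncorrected SS. The short Python sketch below does that arithmetic. Python is used here only as a calculator (the solutions themselves were produced in SAS), and the variable names are illustrative. It assumes the usual n - 1 divisor for the variance.

```python
import math

# Three entries copied from the Moments table for testscore.
n = 120
sum_obs = 2967
uncorrected_ss = 75739

# Everything else follows from these three numbers.
mean = sum_obs / n
corrected_ss = uncorrected_ss - n * mean ** 2   # sum of squared deviations from the mean
variance = corrected_ss / (n - 1)               # n - 1 divisor (assumed)
std_dev = math.sqrt(variance)
std_err_mean = std_dev / math.sqrt(n)
coeff_variation = 100 * std_dev / mean          # expressed as a percentage

for label, value in [
    ("Mean", mean),
    ("Corrected SS", corrected_ss),
    ("Variance", variance),
    ("Std Deviation", std_dev),
    ("Std Error Mean", std_err_mean),
    ("Coeff Variation", coeff_variation),
]:
    print(f"{label:<16}{value:>14.7f}")
```

The printed values should agree with the corresponding table entries to the digits shown there, which is a quick way to catch a transcription error in any one of them.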

              Basic Statistical Measures
    Location                    Variability
Mean     24.72500     Std Deviation            4.47207
Median   25.00000     Variance                19.99937
Mode     24.00000     Range                   21.00000
                      Interquartile Range      7.00000

       Extreme Observations
    Lowest              Highest
Value     Obs       Value     Obs
   14       2          32      84
   15      48          32     104
   16     119          33      15
   16      52          34      80
   16      32          35     106

Stem Leaf                     #  Boxplot
  35 0                        1     |
  34 0                        1     |
  33 0                        1     |
  32 0000                     4     |
  31 0000                     4     |
  30 0000000                  7     |
  29 0000000                  7     |
  28 0000000000              10  +-----+
  27 0000000000              10  |     |
  26 0000000000              10  |     |
  25 0000000000              10  *-----*
  24 000000000000            12  |  +  |
  23 00000                    5  |     |
  22 0000                     4  |     |
  21 000000000                9  +-----+
  20 0000000000              10     |
  19 000                      3     |
  18 0000000                  7     |
  17                                |
  16 000                      3     |
  15 0                        1     |
  14 0                        1     |
     ----+----+----+----+
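
Because every test score is a whole number, the counts in the "#" column of the stem-and-leaf plot determine the full data set, and the summary statistics can be recomputed from them. The Python sketch below does this as a cross-check. It is not part of the SAS analysis, and two things in it are assumptions: the quartile rule (empirical distribution function with averaging, which is my understanding of the SAS default) and the bias-adjusted formulas for skewness and excess kurtosis. For these counts both the skewness and the excess kurtosis come out negative, consistent with a roughly symmetric, somewhat flat-topped histogram.

```python
import math
import statistics

# Test score -> frequency, read from the "#" column of the stem-and-leaf plot.
counts = {
    35: 1, 34: 1, 33: 1, 32: 4, 31: 4, 30: 7, 29: 7, 28: 10,
    27: 10, 26: 10, 25: 10, 24: 12, 23: 5, 22: 4, 21: 9, 20: 10,
    19: 3, 18: 7, 17: 0, 16: 3, 15: 1, 14: 1,
}

# Expand the frequency table back into the individual scores.
scores = sorted(score for score, k in counts.items() for _ in range(k))
n = len(scores)
mean = sum(scores) / n
std_dev = statistics.stdev(scores)  # n - 1 divisor


def percentile(sorted_x, p):
    """Percentile by the empirical CDF with averaging (assumed SAS default)."""
    whole, frac = divmod(len(sorted_x) * p, 1)
    j = int(whole)
    if frac == 0:
        # n*p is a whole number: average the j-th and (j+1)-th order statistics.
        return (sorted_x[j - 1] + sorted_x[j]) / 2
    return sorted_x[j]


iqr = percentile(scores, 0.75) - percentile(scores, 0.25)

# Bias-adjusted sample skewness and excess kurtosis (assumed to match SAS).
z3 = sum(((x - mean) / std_dev) ** 3 for x in scores)
z4 = sum(((x - mean) / std_dev) ** 4 for x in scores)
skewness = n / ((n - 1) * (n - 2)) * z3
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print("N:", n, " Sum:", sum(scores), " Uncorrected SS:", sum(x * x for x in scores))
print("Mean:", mean, " Median:", statistics.median(scores),
      " Mode:", statistics.mode(scores))
print("Range:", scores[-1] - scores[0], " IQR:", iqr)
print("Std Deviation:", round(std_dev, 8))
print("Skewness:", round(skewness, 7), " Kurtosis:", round(kurtosis, 7))
```

The median falls on the boxplot's `*-----*` row and the quartiles on its two `+-----+` rows, so the same numbers can also be read directly off the display.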
2. Run the linear regression to predict GPA from the entrance test score and obtain the residuals (do not include a list of the residuals in your solution).
(a) Verify that the sum of the residuals is zero by running PROC UNIVARIATE with the output from the regression.
The UNIVARIATE Procedure
Variable:  resid  (Residual)

                            Moments
N                          120    Sum Weights               120
Mean                         0    Sum Observations            0
Std Deviation       0.62050134    Variance           0.38502191
Skewness             1.0067279    Kurtosis           2.50187662
Uncorrected SS      45.8176078    Corrected SS       45.8176078
Coeff Variation              .    Std Error Mean     0.05664376
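
The zero in the Sum Observations entry is not special to this data set. Whenever a least squares line includes an intercept, the normal equation for the intercept forces the residuals to sum to zero. With a mean of zero, the uncorrected and corrected sums of squares must also coincide, as they do in the table above. The Python sketch below illustrates both facts on a small made-up data set. The (x, y) pairs are hypothetical values chosen for illustration, not the KNNL 1.19 data, and the fit uses the textbook closed-form least squares estimates rather than SAS.

```python
# Hypothetical (test score, GPA) pairs, invented for illustration only.
x = [14, 18, 21, 24, 25, 28, 31, 35]
y = [2.1, 1.9, 2.8, 2.5, 3.1, 3.0, 3.6, 3.4]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least squares estimates for simple linear regression.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residuals: observed minus fitted values.
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_resid = sum(resid)
resid_mean = sum_resid / n
uncorrected_ss = sum(e ** 2 for e in resid)
corrected_ss = sum((e - resid_mean) ** 2 for e in resid)

print("sum of residuals:", sum_resid)    # zero up to floating-point rounding
print("uncorrected SS  :", uncorrected_ss)
print("corrected SS    :", corrected_ss)
```

In floating-point arithmetic the sum prints as a number on the order of 1e-15 rather than an exact 0, which is worth keeping in mind when software reports a tiny nonzero residual sum.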