Stat 102 Project 1
Spring 2012
Due before 5:00 on Friday, Feb 24, in 400 JMHH
You may work individually or collaboratively in groups of up to 3 students.
The group may consist of students from any of the three sections of stat102 classes. Each
“collaborative” of 1 to 3 students should hand in one project writeup.
Groups should work independently of each other. (You may consult instructors or
TAs if you need help.)
NO LATE PROJECTS WILL BE ACCEPTED!
You may use this file as part of your
Data: The data needed is in our webcafe site under PROJECTS. The data sets
available to you give information about the performance in 1986 and the career
performance up through 1986, for all full-season major league baseball players in 1987
(except pitchers) who also played in the major leagues in 1986. This includes information
such as batting average, number of hits, number of runs scored, etc., as well as the
number of years played in the major leagues through 1986. (You don't need to understand
anything about baseball to do a good job, though such knowledge might make the project
more fun.) The first, larger data set also contains a column for the players’ 1986 salaries
(in $1,000). [See the footnote on the last page for a little more information about the data.]
The data are in two files. The larger file (“non_Phil_players_baseball_project_2011”)
has all the data for all of the players except for those playing for Philadelphia. The
smaller file (“Phil_players_baseball_project_2011”) has the data for only the Philadelphia
players, except that the column of data containing salaries has been omitted.
Goal of the Project: The end goal of this project is to use the larger file to find a
single x-variable to use in a linear regression to predict salary (or log salary). You want
to choose this one x-variable so as to produce the best possible regression prediction. You
will then use your prediction equation to predict the 1987 salary of Von Hayes, a
Philadelphia player. You’ll also be asked for confidence intervals for this prediction.
[Note: You might be able to discover his 1987 salary from some other data source.
You’re welcome to look out of curiosity, but that is not the point of this exercise and
you’re not allowed to use that information in your answers on this project.]
You will also use your equation to predict the average salary for non-pitchers on a
team whose roster has players with x-variable values like those on the Philadelphia team.
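
The two kinds of prediction described above (one individual's y versus the average y at a
given x) can be illustrated with a minimal Python sketch, assuming the standard textbook
formulas for simple linear regression. The data values, the function names fit_line and
intervals_at, and the t critical value below are illustrative assumptions; none of them
come from the project files, and this is not a prescribed method for the project.

```python
import math

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (simple linear regression)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard error uses n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return {"slope": slope, "intercept": intercept, "s": s,
            "n": n, "x_bar": x_bar, "sxx": sxx}

def intervals_at(fit, x0, t_crit):
    """Return (y_hat, half-width for the mean of y at x0, half-width for one new y at x0).

    t_crit is the t critical value for n - 2 degrees of freedom, supplied by
    the caller (for example, from a t table).
    """
    y_hat = fit["intercept"] + fit["slope"] * x0
    leverage = 1 / fit["n"] + (x0 - fit["x_bar"]) ** 2 / fit["sxx"]
    mean_half = t_crit * fit["s"] * math.sqrt(leverage)
    pred_half = t_crit * fit["s"] * math.sqrt(1 + leverage)
    return y_hat, mean_half, pred_half

# Made-up numbers for illustration only (NOT the baseball data).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
fit = fit_line(x, y)

# 95% two-sided t critical value for n - 2 = 3 degrees of freedom.
y_hat, mean_half, pred_half = intervals_at(fit, x0=6, t_crit=3.182)
print("fitted value at x0:", y_hat)
print("interval for the average y at x0:", (y_hat - mean_half, y_hat + mean_half))
print("interval for one individual y at x0:", (y_hat - pred_half, y_hat + pred_half))
```

The interval for an individual is always wider than the interval for the average at the
same x, because it also accounts for that individual's own deviation from the line. If the
regression is fit on log salary, the fitted value and interval endpoints are on the log
scale and must be transformed back to be read as salaries.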
The following (fictional) scenario explains why such information might be useful.
