Lecture 3: Linear Regression, Exploratory Data Analysis, and the Bootstrap
STAT GR5206 Statistical Computing & Introduction to Data Science
Cynthia Rush
Columbia University
September 23, 2016

Cynthia Rush Lecture 3: Regression and Graphics September 23, 2016 1 / 104

Course Notes
• Labs meet next week.
• Homework 1 is due Monday at 8 pm. No late homework will be accepted.
• Homework 2 will be assigned on Monday.
• Remember to use Piazza to ask questions.

Last Time
• Filtering: accessing elements of a structure based on some criterion, e.g. v[v>5] or m[m[,1]!=0, ].
• Lists: elements can all be of different types. Create with list(); access elements with l[[ ]] or l$name.
• NA and NULL values: NA denotes missing data, while NULL denotes a value that does not exist.
• Factors and tables: factors are how R represents categorical variables.
• Data frames: used for data organized with rows as cases and columns as variables.
• Importing and exporting data in R: use read.csv() or read.table() depending on the file format, and keep the working directory in mind.
• Control statements: we studied iteration with for and while loops, and conditional execution with if and else statements.
• Vectorized operations: to be used instead of explicit iteration.
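
The two filtering idioms in the recap can be roughly mirrored in Python with list comprehensions. This sketch is for comparison only. The course uses R, the helper names filter_vector and drop_zero_first_column are hypothetical, and unlike R's vectorized subsetting these comprehensions step through the elements one at a time.

```python
# Rough Python analogue of two R filtering idioms from the recap.
# A matrix is represented as a plain list of rows (an assumption for this sketch);
# R indexes from 1, so R's first column m[, 1] corresponds to row[0] here.

def filter_vector(v, threshold):
    """Like R's v[v > threshold]: keep the elements greater than threshold."""
    return [x for x in v if x > threshold]


def drop_zero_first_column(m):
    """Like R's m[m[, 1] != 0, ]: keep the rows whose first entry is nonzero."""
    return [row for row in m if row[0] != 0]


v = [3, 8, 1, 12, 5, 7]
print(filter_vector(v, 5))          # [8, 12, 7]

m = [[0, 1, 2],
     [4, 5, 6],
     [0, 7, 8],
     [9, 1, 0]]
print(drop_zero_first_column(m))    # [[4, 5, 6], [9, 1, 0]]
```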

Section I: Multiple Linear Regression

Multiple Linear Regression Example
A large national grocery retailer closely tracks the productivity and costs of its facilities. Consider a data set obtained from a single distribution center over a one-year period, in which each observation represents one week of activity. The variables are:
• X1: number of cases shipped (in thousands),
• X2: indirect costs of labor as a percentage of total costs,
• X3: holiday, a qualitative predictor coded 1 if the week includes a holiday and 0 otherwise,
• Y: total labor hours (the response).
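
Written out, a first-order multiple linear regression model for these four variables takes the standard form below. This is a sketch of the usual textbook model, not a formula taken from these slides. The coefficient symbols β_j, the error term ε_i, and the usual assumption that the errors have mean zero and constant variance are notation introduced here.

$$
Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i, \qquad i = 1, \dots, n,
$$

Here n is the number of weeks in the data set. Because X3 is a 0/1 indicator, β3 is the expected difference in total labor hours between a holiday week and a non-holiday week with the same values of X1 and X2.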

Multiple Linear Regression
Suppose that, as statisticians, we are asked to build a model that predicts future total labor hours using this data set. What information would be useful in building such a model?
• Is there a relationship between holidays and total labor hours?
• What about the number of cases shipped? Indirect costs?
• How strong are these relationships?
• Is the relationship linear?
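
In practice, these questions are approached by fitting the model by least squares. The estimated coefficients show the direction and size of each relationship, and R² summarizes how much of the variation in Y the fit explains. In R, this is the job of lm(). As a language-neutral illustration of the underlying computation, here is a minimal pure-Python sketch that solves the normal equations (XᵀX)b = Xᵀy. The function names ols_fit and r_squared are hypothetical. The toy data are generated from a known formula so the fit can be checked, and they are not drawn from the grocery data set.

```python
# Minimal ordinary least squares (OLS) sketch: solves the normal equations
# (X^T X) b = X^T y by Gaussian elimination with partial pivoting.
# Pure Python for illustration; ols_fit and r_squared are hypothetical names.

def ols_fit(rows, y):
    """Return [b0, b1, ..., bp] for the model y ~ b0 + b1*x1 + ... + bp*xp.

    rows: one list of predictor values per observation, e.g. [x1, x2, x3].
    y: the responses, in the same order as rows.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]  # prepend intercept column
    n, p = len(X), len(X[0])
    # Augmented system [X^T X | X^T y], one row per coefficient.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))]
         for i in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular: the predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (A[i][p] - sum(A[i][j] * b[j] for j in range(i + 1, p))) / A[i][i]
    return b


def r_squared(rows, y, b):
    """Proportion of the variation in y explained by the fitted coefficients b."""
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy data (NOT the grocery data): y is generated exactly from
# y = 2 + 3*x1 - 1*x2 + 5*x3, with x3 a 0/1 indicator like "holiday".
rows = [[1, 4, 0], [2, 3, 1], [3, 5, 0], [4, 2, 1], [5, 6, 0], [6, 1, 1]]
y = [2 + 3 * x1 - x2 + 5 * x3 for x1, x2, x3 in rows]

b = ols_fit(rows, y)
print([round(c, 6) for c in b])         # [2.0, 3.0, -1.0, 5.0]
print(round(r_squared(rows, y, b), 6))  # 1.0
```

Solving the normal equations directly is fine for a small, well-conditioned example like this one. R's lm() instead fits via a QR decomposition, which is numerically more stable when predictors are nearly collinear.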
Multiple Linear Regression