# Project 22 - Nicholaus Johnson 1 Examine the variables and...

Nicholaus Johnson

1. Examine the variables and their relationships to each other:

a) Profit is roughly normal with no clear skew (its peak is near the middle of the graph). Area is unimodal (has one peak) and right skewed (the peak is toward the left of the graph). Population is unimodal (has one peak) and slightly left skewed (the peak is toward the right of the graph). Outlets is unimodal (has one peak) and roughly normal (its peak is near the middle of the graph). Commission is bimodal (has two peaks, at 0 and 1) because it is an indicator variable, so skewness is not meaningful for it.

b) Summary Statistics:

| Column | Mean | Variance | Std. Dev. | Median | Min | Max |
|---|---|---|---|---|---|---|
| PROFIT | 1120.0392 | 128571.32 | 358.56842 | 1032 | 188 | 1786 |
| AREA | 13.064902 | 49.50234 | 7.03579 | 11.2 | 6.12 | 40.34 |
| POPN | 3.7531374 | 1.1808249 | 1.0866576 | 3.887 | 0.297 | 5.744 |
| OUTLETS | 174.0196 | 933.6596 | 30.555843 | 174 | 85 | 234 |
| COMMIS | 0.627451 | 0.23843138 | 0.48829436 | 1 | 0 | 1 |
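As a quick sanity check on the summary statistics, the Std. Dev. column should equal the square root of the Variance column. The sketch below is only an illustration: it copies the reported values by hand, and the name `max_sd_discrepancy` is a hypothetical helper, not part of any statistics package.

```python
import math

# (variance, standard deviation) pairs copied from the summary statistics table.
reported = {
    "PROFIT": (128571.32, 358.56842),
    "AREA": (49.50234, 7.03579),
    "POPN": (1.1808249, 1.0866576),
    "OUTLETS": (933.6596, 30.555843),
    "COMMIS": (0.23843138, 0.48829436),
}


def max_sd_discrepancy(table):
    """Largest absolute gap between sqrt(variance) and the reported std. dev."""
    return max(abs(math.sqrt(var) - sd) for var, sd in table.values())


discrepancy = max_sd_discrepancy(reported)
# A value near zero means the two columns agree up to rounding.
print(f"largest gap: {discrepancy:.6f}")
```

If the gap were large for any row, that would point to a transcription error in the table rather than a feature of the data.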

The summary statistics table reports, for each variable, the mean (average), the variance (the average squared deviation from the mean), the standard deviation (the square root of the variance, which measures typical deviation from the mean in the variable's own units), the median (the middle value), the minimum, and the maximum. These give a rough picture of what we should see in the data:

Net profit has a mean of about \$1,120,000 (net profit is recorded in 1000's) and a standard deviation of about \$359,000. The minimum value for profit is \$188,000 and the maximum is \$1,786,000. This shows that all salespeople in the data make a net profit; none of them incur a loss.

Area has a mean of 13.065 square miles and a standard deviation of about 7.036 square miles. These representatives cover areas ranging from 6.12 to 40.34 square miles.

Population has a mean of 3.75 million people and a standard deviation of 1.09 million people. The large majority of the districts hold 2-5 million people, with the smallest district holding 297,000 and the largest holding 5.74 million people.

Number of outlets has a mean of 174 outlets, with a standard deviation of about 31 outlets. The districts range from 85 to 234 outlets.

c) Profit and area appear to have a moderately strong negative relationship that is roughly linear with some curvature (the points trend downward and then curve upward). Profit and population appear to have a positive, somewhat linear relationship (the points trend upward along a roughly straight line). Profit and outlets have a weak positive relationship with some curvature (the points trend upward).

d) Correlation matrix:

| | PROFIT | AREA | POPN | OUTLETS |
|---|---|---|---|---|
| AREA | -0.69585794 | | | |
| POPN | 0.6221696 | -0.8389509 | | |
| OUTLETS | 0.45451215 | -0.6405834 | 0.7429463 | |
| COMMIS | 0.26555324 | 0.13560116 | -0.26944116 | -0.30914697 |

This correlation matrix shows the correlation between each pair of variables. Here we see that the correlation between area and population is -0.839. Since a correlation close to 0.9 or -0.9 is considered high and may cause multicollinearity, these two explanatory variables are ones we may want to watch more closely.

2. Perform a Multiple Linear Regression analysis using all the explanatory variables
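Before fitting the model with all explanatory variables, it can help to list which pairs of explanatory variables are highly correlated. The following is a minimal sketch using the correlations reported in part 1(d). The cutoff of 0.8 is an assumption chosen here to stand in for "close to 0.9 in absolute value", and `flag_collinear_pairs` is a hypothetical helper name.

```python
# Pairwise correlations among the explanatory variables, copied from the
# correlation matrix in part 1(d). PROFIT is the response, so it is left out.
correlations = {
    ("AREA", "POPN"): -0.8389509,
    ("AREA", "OUTLETS"): -0.6405834,
    ("AREA", "COMMIS"): 0.13560116,
    ("POPN", "OUTLETS"): 0.7429463,
    ("POPN", "COMMIS"): -0.26944116,
    ("OUTLETS", "COMMIS"): -0.30914697,
}

# Assumed cutoff (not stated in the report) for a "high" correlation.
THRESHOLD = 0.8


def flag_collinear_pairs(corr, threshold):
    """Return the variable pairs whose |r| is at or above the threshold."""
    return sorted(pair for pair, r in corr.items() if abs(r) >= threshold)


flagged = flag_collinear_pairs(correlations, THRESHOLD)
print(flagged)  # [('AREA', 'POPN')]
```

With this cutoff only AREA and POPN are flagged, which matches the pair singled out in part 1(d). A lower cutoff would also pick up POPN and OUTLETS (0.743).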

