Stat 104: Quantitative Methods for Economics
Homework 1: Due Friday, September 18
Melissa Kaplan

1) A company has 30 employees, including a director. The lowest salary among the 30 employees is $22,000. The director's salary is $180,000, which is more than twice as much as anyone else's salary. Decide for each of the following statements about the 30 salaries whether it is true, false, or you cannot tell on the basis of the information at hand.

a) The average salary is below $60,000. (True / Can't Tell / False)
Can't tell.
b) The median salary is below $60,000. (True / Can't Tell / False)
Can't tell.
c) If all salaries are increased by $1,000, that adds $1,000 to the average. (True / Can't Tell / False)
True.
d) If the director's salary is doubled, and all other salaries remain the same, that increases the average salary. (True / Can't Tell / False)
True.
e) If the director's salary is doubled, and all other salaries remain the same, that increases the median salary. (True / Can't Tell / False)
False.
f) The standard deviation of the salaries is larger than $180,000. (True / Can't Tell / False)
False.

2) Data set sexpart.dta is the sexual partner dataset mentioned in class. Load it into Stata using the command
use

a) Compare the standard deviation and IQR as measures of spread on the full data set. Which measure do you think is more appropriate to describe the spread in the data set?

. summarize x, detail
                              x
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            1              0
10%            1              0       Obs                 105
25%            1              0       Sum of Wgt.         105

50%            1                      Mean           64.92381
                        Largest       Std. Dev.      585.1631
75%            6             45
90%           15            150       Variance       342415.9
95%           30            150       Skewness        10.0797
99%          150           6000       Kurtosis       102.7355
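The quartiles, mean, and standard deviation printed above are all that the two outlier rules in part (b) need. The following is a minimal Stata sketch of that arithmetic, with the printed values typed in by hand. The scalar names (q1, lo_fence, z_hi, and so on) are illustrative and not part of the assignment.

```stata
* Values copied from the summarize output above (variable x in sexpart.dta)
scalar q1  = 1
scalar q3  = 6
scalar mx  = 64.92381
scalar sdx = 585.1631

* Boxplot rule: the fences sit 1.5 IQRs beyond the quartiles
scalar iqr      = scalar(q3) - scalar(q1)
scalar lo_fence = scalar(q1) - 1.5*scalar(iqr)
scalar hi_fence = scalar(q3) + 1.5*scalar(iqr)

* Z-score rule with the 1.96 cutoff used in the list command
scalar z_lo = scalar(mx) - 1.96*scalar(sdx)
scalar z_hi = scalar(mx) + 1.96*scalar(sdx)

display "IQR = " scalar(iqr)
display "Boxplot fences: " scalar(lo_fence) " to " scalar(hi_fence)
display "Z-score cutoffs: " %9.2f scalar(z_lo) " to " %9.2f scalar(z_hi)
```

With the data in memory, a command along the lines of list x if (x < scalar(lo_fence) | x > scalar(hi_fence)) & !missing(x) would show the flagged values. The !missing(x) guard matters because Stata treats a missing value as larger than any number.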
SD = 585.1631
IQR = Q3 - Q1 = 6 - 1 = 5
The IQR is more appropriate because the SD is heavily inflated by extreme outliers.

b) Compare which points are flagged as outliers using the two methods discussed in class (Z score and boxplot method).

. list if x<mx-1.96*sdx | x>mx+1.96*sdx

          x         mx        sdx
 70.   6000   64.92381   585.1631

According to the z-score method, anything more than about 2 SDs away from the mean is an outlier, so 6000 is the only outlier.
According to the boxplot method: IQR = Q3 - Q1 = 6 - 1 = 5. The lower fence is Q1 - 1.5(5) = -6.5, and nothing is below -6.5. The upper fence is Q3 + 1.5(5) = 13.5, so (15, 15, 18, 19, 30, 30, 40, 45, 150, 150, 6000) are all outliers.

c) Remove the outliers flagged using the boxplot method. Recalculate the IQR and standard deviation of this smaller dataset. Are the values closer to each other now?
IQR = 3
SD = 3.36
The values are much closer together now.

3) A portfolio that is 30% foreign and 70% American has a mean rate of return of about 15.8%, with a standard deviation of 14.3%.

a) According to Chebyshev's Inequality, at least 75% of returns will be between what values?
Within 2 SDs of the mean: 15.8 ± 2(14.3), so between -12.8% and 44.4%.
b) According to Chebyshev's Inequality, at least 88.9% of returns will be between what two values?
Within 3 SDs of the mean: 15.8 ± 3(14.3), so between -27.1% and 58.7%.
c) Should an investor be surprised if she has a negative rate of return? Why?
An investor should not be surprised by a negative rate of return: 0 is only a little more than one SD below the mean (15.8/14.3 is about 1.1), so negative returns fall within the ordinary spread of the data.
d) If we were going to use the Empirical Rule, what would we need to assume about the returns?
We would need to assume the returns were mound-shaped.

4) A class survey was done where students were asked how many coins they had on them at that moment, how many Facebook friends they had, and how many states in the United States they had visited. Fill in the blanks (show your work).

. summarize coins states friends

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       coins |       130    6.976923    11.53657          0         73
      states |       130    13.92308    6.455111          2         35
     friends |       130    716.0231    628.1866          0       3854

. correlate coins states friends
(obs=130)

             |    coins   states  friends
-------------+---------------------------
       coins |   1.0000
      states |   0.0392   1.0000
     friends |  -0.0766  Blank 1   1.0000

. correlate coins states friends, covariance
(obs=130)

             |    coins   states  friends
-------------+---------------------------
       coins |  Blank 2
      states |  2.92069  41.6685
     friends | -555.325  931.777   394618

Blank 1 = correlation of states and friends = Sxy/(SxSy) = 931.777/(6.455 * 628.1866) = .23
Blank 2 = covariance of coins with coins, which is the variance of coins (the SD squared) = (11.53657)^2 = 133.09

5) This question moves us in the direction of understanding that just because two variables are uncorrelated does not mean they are independent.

a) Explain in words what a correlation of 0 implies.
It means the two variables have no linear association. It does not rule out other kinds of relationship.
b) Load the blas data set into Stata and find the correlation of X and Y
use
. correlate Y X
(obs=20)

             |        Y        X
-------------+------------------
           Y |   1.0000
           X |   0.0000   1.0000

The correlation is 0.

c) Plot the data. Does it agree with your definition?
[Figure: scatter plot of Y (vertical axis) against X (horizontal axis, -1 to 1)]
There is no linear association, but there is a clear parabolic relationship between X and Y.

6) We have state by state data (plus Washington, DC) on percentage of residents over the age of 25 who have at least a bachelor's degree and median salary. Load this data into Stata with the command
use

a) What is the correlation between these two variables?

. correlate bach income
(obs=51)

             |     bach   income
-------------+------------------
        bach |   1.0000
      income |   0.7542   1.0000

The correlation is .7542.

b) Produce a scatter plot of the data with percentage with bachelor's degree on the X axis. Notice the outlier? Who does that point belong to? Can you think of any reasons why this location might have a high percentage of residents with a bachelor's degree but a lower than expected median income?
[Figure: scatter plot of bach against income]
Washington, D.C. is the outlier. One possible reason is that D.C. has many degree holders working in politics and government who are paid relatively little.

c) Remove the outlier point found in (b) and recalculate the correlation. How do the two correlation values compare? What does this illustrate about correlation?

. correlate bach income
(obs=50)

             |     bach   income
-------------+------------------
        bach |   1.0000
      income |   0.8206   1.0000

The correlation is now .8206. It is higher now that the outlier is removed, which illustrates that a single outlier can noticeably change the correlation.

7) The mean rate of return and standard deviation of Stocks 1 and 2 are given below:

                       Stock 1   Stock 2
Mean                      8%       11%
Standard deviation       10%       15%

a) Given that the correlation between stocks is -1.0, find risk (standard deviation) and (mean) return of a portfolio that has 60% in Stock 1 and 40% in Stock 2.
Sxy = rxy(Sx)(Sy) = (-1)(10)(15) = -150
Var = (.6)^2(10)^2 + (.4)^2(15)^2 + 2(.6)(.4)(-150) = 36 + 36 - 72 = 0, so SD = 0
Mean = (.6)(8) + (.4)(11) = 9.2

b) Given that the correlation between stocks is 0, find risk (standard deviation) and (mean) return of a portfolio that has 60% in Stock 1 and 40% in Stock 2.
If the correlation is 0, then the covariance is 0.
Var = (.6)^2(10)^2 + (.4)^2(15)^2 = 36 + 36 = 72, so SD = 8.49
Mean = (.6)(8) + (.4)(11) = 9.2

c) Given that the correlation between stocks is 1, find risk (standard deviation) and (mean) return of a portfolio that has 60% in Stock 1 and 40% in Stock 2.
Sxy = (1)(10)(15) = 150
Var = (.6)^2(10)^2 + (.4)^2(15)^2 + 2(.6)(.4)(150) = 36 + 36 + 72 = 144, so SD = 12
Mean = (.6)(8) + (.4)(11) = 9.2

d) What appears to be the relationship between correlation and risk?
The higher (more positive) the correlation, the higher the risk: the SD rises from 0 at r = -1 to 8.49 at r = 0 and 12 at r = 1, while the mean return stays at 9.2%.

8) One can show mathematically that if two stocks have correlation of -1, then if one puts s2/(s1 + s2) x 100% of their money in stock 1, and the rest in stock 2, the resulting portfolio will have 0 risk. But will the portfolio have a positive return? The file qqq.dta has daily return data for stocks QID and QLD [I for inverse and L for long]. You may read this file into Stata using the command
use

a) Verify that QLD and QID have a correlation that is essentially -1

. correlate qld qid
(obs=2,039)

             |      qld      qid
-------------+------------------
         qld |   1.0000
         qid |  -0.9961   1.0000

b) What weights are required for QLD and QID to have a 0 risk (standard deviation) portfolio?
s1 = .0282283 (QLD), s2 = .0282328 (QID)
s2/(s1 + s2) x 100% = .0282328/(.0282283 + .0282328) = 50% in QLD and 50% in QID

. summarize, detail
                             QLD
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.0818448      -.1929524
 5%    -.0457889        -.18279
10%    -.0304119      -.1484018       Obs               2,039
25%    -.0107411      -.1172538       Sum of Wgt.       2,039

50%     .0021327                      Mean           .0010203
                        Largest       Std. Dev.      .0282283
75%     .0141661        .119349
90%     .0288205       .1221766       Variance       .0007968
95%     .0403503       .2085202       Skewness       .0370598
99%     .0803043       .2457729       Kurtosis       11.13232

                             QID
-------------------------------------------------------------
      Percentiles      Smallest
 1%    -.0787773      -.2298287
 5%    -.0405255      -.2158285
10%    -.0290089      -.1243274       Obs               2,039
25%    -.0143279      -.1230052       Sum of Wgt.       2,039

50%    -.0022036                      Mean          -.0011307
                        Largest       Std. Dev.      .0282328
75%     .0105893       .1201585
90%       .03053       .1517696       Variance       .0007971
95%     .0457752       .1746141       Skewness      -.0068944
99%     .0828717       .1995293       Kurtosis       10.75025

c) Compute the mean and standard deviation of the portfolio from part (b). What do you find?
Mean = Rbar = (.5)(.0010203) + (.5)(-.0011307) = -.0000552

. correlate qld qid, covariance
(obs=2,039)

             |      qld      qid
-------------+------------------
         qld |  .000797
         qid | -.000794  .000797

Variance = (.5)^2(.0282283)^2 + (.5)^2(.0282328)^2 + 2(.5)(.5)(-.000794) = .000199 + .000199 - .000397 = .0000015 (approximately)
SD = square root of variance = .0012 (approximately)
The portfolio has essentially zero risk, but its mean return is also essentially zero (slightly negative).

9) The file myrets2015.dta has 45 rows of monthly returns data for MRK, KORS, LULU and SPY. That is, each row represents the monthly return for each of the four stocks. Load it into Stata using the command
use

a) What company does each symbol represent? Go to finance.yahoo.com to find out.
MRK = Merck & Co.
KORS = Michael Kors Holdings
LULU = Lululemon Athletica Inc.
SPY = SPDR S&P 500

b) What is the average monthly return for each of the stocks? What is the standard deviation for the returns of the stocks?
KORS: mean = .0156299, SD = .1078644
LULU: mean = .0091119, SD = .1132002
SPY: mean = .0115786, SD = .0283794
MRK: mean = .0108976, SD = .0449689

What is the correlation between MRK and KORS, MRK and LULU and KORS and LULU?

. correlate mrk kors
(obs=45)

             |      mrk     kors
-------------+------------------
         mrk |   1.0000
        kors |   0.0708   1.0000

Correlation between MRK and KORS = .0708

. correlate mrk lulu
(obs=45)

             |      mrk     lulu
-------------+------------------
         mrk |   1.0000
        lulu |  -0.1073   1.0000

Correlation between MRK and LULU = -.1073

. correlate kors lulu
(obs=45)

             |     kors     lulu
-------------+------------------
        kors |   1.0000
        lulu |   0.2416   1.0000

Correlation between KORS and LULU = .2416

c) Find the Beta for each stock. That is, run a regression of each stock return as the Y variable and SPY returns as the X variable. Beta is the slope from this regression. Rank the stocks based on their Beta values (smallest to largest). Is the order the same as if you ranked them on their standard deviations from smallest to largest?

. regress mrk spy
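      Source |       SS       df       MS              Number of obs =      45
-------------+------------------------------           F(1, 43)      =    7.67
       Model |  .013469968     1  .013469968           Prob > F      =  0.0083
    Residual |  .075506748    43  .001755971           R-squared     =  0.1514
-------------+------------------------------           Adj R-squared =  0.1317
       Total |  .088976715    44  .002022198           Root MSE      =   .0419

------------------------------------------------------------------------------
         mrk |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         spy |   .6165287   .2226018     2.77   0.008     .1676094    1.065448
       _cons |   .0037591   .0067576     0.56   0.581    -.0098688     .017387
------------------------------------------------------------------------------

Beta for MRK = .6165

. regress kors spy

      Source |       SS       df       MS              Number of obs =      45
-------------+------------------------------           F(1, 43)      =    8.05
       Model |  .080696471     1  .080696471           Prob > F      =  0.0069
    Residual |  .431231354    43  .010028636           R-squared     =  0.1576
-------------+------------------------------           Adj R-squared =  0.1380
       Total |  .511927825    44  .011634723           Root MSE      =  .10014

------------------------------------------------------------------------------
        kors |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         spy |   1.509028   .5319747     2.84   0.007     .4361988    2.581857
       _cons |  -.0018425   .0161492    -0.11   0.910    -.0344106    .0307255
------------------------------------------------------------------------------

Beta for KORS = 1.509

. regress lulu spy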
      Source |       SS       df       MS              Number of obs =      45
-------------+------------------------------           F(1, 43)      =    0.84
       Model |  .010837061     1  .010837061           Prob > F      =  0.3638
    Residual |  .552990996    43  .012860256           R-squared     =  0.0192
-------------+------------------------------           Adj R-squared = -0.0036
       Total |  .563828057    44  .012814274           Root MSE      =   .1134

------------------------------------------------------------------------------
        lulu |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         spy |   .5530008   .6024137     0.92   0.364    -.6618821    1.767884
       _cons |    .002709   .0182876     0.15   0.883    -.0341714    .0395894
------------------------------------------------------------------------------

Beta for LULU = .553

Smallest to largest Beta: LULU (.553), MRK (.6165), KORS (1.509).
Smallest to largest SD: MRK (.045), KORS (.108), LULU (.113).
The order is not the same.

d) Compare the Betas from part (c) to the Betas obtained from finance.yahoo.com. Are they about the same value? [you have up to the minute data and Yahoo doesn't update that frequently]
Yahoo: MRK = 0.580286, KORS = 0.769783, LULU = 0.266956
No, they are not the same. MRK is fairly close (.58 vs. .62), but the Yahoo Betas for KORS and LULU are roughly half of the values estimated in part (c).

e) Create a side by side boxplot for these three stocks. How do they compare? Which looks the riskiest, which the safest?
[Figure: side-by-side boxplots of monthly returns for MRK, LULU, and KORS]
MRK clearly looks the safest, as there are no outliers and the boxplot is symmetric and tightly centered. KORS looks the riskiest, as it has a large spread, multiple outliers, and is very skewed to the right.

f) Give the expected return and standard deviation of all the possible two stock portfolios (MRK,KORS), (MRK,LULU), (KORS,LULU) with equal amounts invested in each stock (weights of .5 for each stock).
Each covariance is the correlation from part (b) times the two SDs.

MRK, KORS
Covariance = (.0708)(.045)(.108) = .000344
Variance = (.5)^2(.045)^2 + (.5)^2(.108)^2 + 2(.5)(.5)(.000344) = .00359, SD = .060
Mean = .5(.011) + .5(.016) = .0135

MRK, LULU
Covariance = (-.1073)(.045)(.113) = -.000546
Variance = (.5)^2(.045)^2 + (.5)^2(.113)^2 + 2(.5)(.5)(-.000546) = .00343, SD = .059
Mean = .5(.011) + .5(.009) = .010

KORS, LULU
Covariance = (.2416)(.108)(.113) = .00295
Variance = (.5)^2(.108)^2 + (.5)^2(.113)^2 + 2(.5)(.5)(.00295) = .00758, SD = .087
Mean = .5(.016) + .5(.009) = .0125

g) Rank the three portfolios based on their standard deviation. How do they compare with holding one of the individual stocks?
Largest to smallest: KORS, LULU (SD = .087); MRK, KORS (SD = .060); MRK, LULU (SD = .059)
Holding one original stock:
LULU: mean = .0091119, SD = .1132002
KORS: mean = .0156299, SD = .1078644
MRK: mean = .0108976, SD = .0449689
Every portfolio has a lower SD than its riskier stock, and the KORS, LULU portfolio is less risky than either KORS or LULU alone. Only MRK held alone (SD = .045) is safer than the portfolios that contain it. This makes sense because combining two weakly correlated stocks balances out some of the risk.

10) In class we showed how one could split a data set into two groups using the median of the X values, then find points (X1, Y1) and (X2, Y2). We then fit a line between these two points using the familiar Y − Y1 = m(X − X1) formula where m = (Y2 − Y1) / (X2 − X1). This can be done in Stata as follows (there are fancier ways to do this in Stata; we're just showing you one way below). For this example we will use the data set onlineedu.dta. One of the biggest changes in higher education in recent years has been the growth of online universities. The Online Education Database is an independent organization whose mission is to build a comprehensive list of the top accredited online colleges.
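The two-point recipe described above can be sketched with plain scalar arithmetic in Stata. This is only an illustration under stated assumptions: the two points are the group means reported in parts (c) and (d) of this problem, rounded to two decimals as in the write-up, and the scalar names (x1, y1, x2, y2, m, b) are hypothetical.

```stata
* Group means from parts (c) and (d), rounded as in the write-up
scalar x1 = 40.6
scalar y1 = 37.67
scalar x2 = 75.43
scalar y2 = 46.14

* Slope between the two points, rounded to two decimals as in the write-up
scalar m = round((scalar(y2) - scalar(y1)) / (scalar(x2) - scalar(x1)), .01)

* Intercept from Y - Y1 = m(X - X1), i.e. Y = (Y1 - m*X1) + m*X
scalar b = scalar(y1) - scalar(m)*scalar(x1)

display "slope = " scalar(m) ", intercept = " %6.2f scalar(b)
```

Rounding the slope before computing the intercept is what produces the 27.93 used later. Carrying the unrounded slope through would give a slightly smaller intercept.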
The data set onlineedu.xls shows the retention rate (%) and the graduation rate (%) for 29 online colleges (Online Education Database website, January 2009). We want to model graduation rate as a function of retention rate.

a) Load the data into Stata (of course!)
use

b) Find the median of the X's (retention rate)

. summarize rr, detail

                             RR
-------------------------------------------------------------
      Percentiles      Smallest
 1%            4              4
 5%            7              7
10%           29             29       Obs                  29
25%           45             33       Sum of Wgt.          29

50%           60                      Mean           57.41379
                        Largest       Std. Dev.      23.24023
75%           69             78
90%           95             95       Variance       540.1084
95%          100            100       Skewness      -.2936542
99%          100            100       Kurtosis       3.185897

So the median of the X's equals 60.

c) Find the means for values below the median

. summarize rr gr if rr<=60, detail

                             RR
-------------------------------------------------------------
      Percentiles      Smallest
 1%            4              4
 5%            4              7
10%            7             29       Obs                  15
25%           33             33       Sum of Wgt.          15

50%           45                      Mean               40.6
                        Largest       Std. Dev.        16.826
75%           51             51
90%           60             54       Variance       283.1143
95%           60             60       Skewness      -1.069821
99%           60             60       Kurtosis       3.279719

                             GR
-------------------------------------------------------------
      Percentiles      Smallest
 1%           25             25
 5%           25             25
10%           25             28       Obs                  15
25%           32             32       Sum of Wgt.          15

50%           36                      Mean           37.66667
                        Largest       Std. Dev.      8.582929
75%           45             45
90%           48             47       Variance       73.66667
95%           53             48       Skewness       .0990315
99%           53             53       Kurtosis        1.98994

So (X1, Y1) = (40.6, 37.67)

d) Find the means for values above the median

. summarize rr gr if rr>60, detail

                             RR
-------------------------------------------------------------
      Percentiles      Smallest
 1%           62             62
 5%           62             63
10%           63             63       Obs                  14
25%           65             65       Sum of Wgt.          14

50%           71                      Mean           75.42857
                        Largest       Std. Dev.      13.51759
75%           78             78
90%          100             95       Variance       182.7253
95%          100            100       Skewness       .9216191
99%          100            100       Kurtosis       2.425894

                             GR
-------------------------------------------------------------
      Percentiles      Smallest
 1%           34             34
 5%           34             36
10%           36             36       Obs                  14
25%           37             37       Sum of Wgt.          14

50%           48                      Mean           46.14286
                        Largest       Std. Dev.       9.50188
75%           55             55
90%           57             56       Variance       90.28571
95%           61             57       Skewness       .0594538
99%           61             61       Kurtosis       1.437486

So (X2, Y2) = (75.43, 46.14)

e) Find the line between the points (X1, Y1) and (X2, Y2).
(X1, Y1) = (40.6, 37.67)
(X2, Y2) = (75.43, 46.14)
m = (Y2 − Y1) / (X2 − X1) = (46.14 − 37.67) / (75.43 − 40.6) = 0.24
Y − Y1 = m(X − X1)
Y = 37.67 + 0.24(X − 40.6) = 27.93 + 0.24X
So the equation of the fitted line is Y = 27.93 + 0.24X

f) Calculate the fitted values

. generate fit1 = 27.93+0.24*rr

g) Plot the data with the fitted line
You can graph your resulting line in Stata on top of the scatter plot as follows (the || command in Stata lets one stack graphs on top of each other):

. scatter gr rr || line fit1 rr

[Figure: scatter plot of GR against RR with the fitted line fit1]

h) Now do this two-point method using the medians in each subgroup instead of the means.
I.
Report the equation of this new line.
(X1, Y1) = (45, 36)
(X2, Y2) = (71, 48)
Slope = m = (48 - 36)/(71 - 45) = .4615
Y = 15.23 + .4615X
II.
Compare this new line to the one previously found using the means in each sub group. Are the lines about the same or different?
Original line: Y = 27.93 + 0.24X
The lines are very different: the median-based line has nearly twice the slope (.46 vs. .24) and a much smaller intercept (15.23 vs. 27.93).
III.
Create a plot that shows the data, and the two lines on it. Make sure it's clear which line is which.
[Figure: scatter plot of GR(%) against RR(%) with the two fitted lines, fit1 and fit2]
fit1 is the original line (from the group means) and fit2 is the line from the two-point method using the medians.

11) The owner of a moving company typically has his most experienced manager predict the total number of labor hours that will be required to complete an upcoming move. This approach has proved useful in the past, but the owner has the business objective of developing a more accurate method of predicting labor hours (Y). In a preliminary effort to provide a more accurate method, the owner has decided to use the number of cubic feet moved as the independent variable (X) and has collected data for 36 moves in which the origin and destination were within the borough of Manhattan in New York City and in which the travel time was an insignificant portion of the hours worked. The data may be loaded into Stata as follows
use
Use Stata to answer the questions below.

a) Create a scatter diagram of the data.
[Figure: scatter plot of Hours (vertical axis, 0 to 80) against Feet (horizontal axis, 0 to 1500)]

b) Fit a least squares regression line to this data and interpret the slope (Stata command reg).
Y = .05X - 2.37
The slope means that each additional cubic foot moved adds about .05 hours (3 minutes) to the labor time of the move.

c) Predict the labor hours for a 500 cubic feet move using the estimated regression equation developed in part (b).
Y = .05(500) - 2.37 = 22.63 hours

12) A stock's (or mutual fund's) β (beta) measures the relationship between the stock's rate of return and the average rate of return for the market as a whole. Now beta is easy to compute. It is the slope from a simple linear regression [what we refer to as ! ], where the dependent variable (Y) is the stock's rate of return and the independent variable is the market rate of return (X) (usually taken as the rate of return of the S&P 500 or the Nasdaq). Stocks with beta values greater than 1 are considered "aggressive" and stocks with beta less than 1 are considered defensive. A stock with a beta value near 1 is called a neutral security. As an...
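Because beta is a regression slope, it can also be recovered from summary statistics as beta = r * s_y / s_x, where r is the correlation between the stock and the market. The following is a hedged sketch using the values printed for Problem 9. It takes r as the positive square root of each regression's R-squared, an assumption that is reasonable here only because all three reported slopes are positive. The scalar names are illustrative, and the aggressive/defensive label simply applies the greater-than-1 rule described above.

```stata
* SD of SPY returns, from Problem 9(b)
scalar s_spy = .0283794

* beta = sqrt(R-squared) * SD(stock) / SD(SPY), using the printed output
scalar b_mrk  = sqrt(.1514) * .0449689 / scalar(s_spy)
scalar b_kors = sqrt(.1576) * .1078644 / scalar(s_spy)
scalar b_lulu = sqrt(.0192) * .1132002 / scalar(s_spy)

* Label each stock with the beta > 1 rule
display "MRK  " %6.3f scalar(b_mrk)  "  " cond(scalar(b_mrk)  > 1, "aggressive", "defensive")
display "KORS " %6.3f scalar(b_kors) "  " cond(scalar(b_kors) > 1, "aggressive", "defensive")
display "LULU " %6.3f scalar(b_lulu) "  " cond(scalar(b_lulu) > 1, "aggressive", "defensive")
```

The three values should land close to the regress coefficients from Problem 9(c), with small differences because R-squared is printed to only four decimals.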


- Spring '15
- Michael Parzen
- Pearson product-moment correlation coefficient, data set, Std. Dev, KORS