36-226 Summer 2010
Homework 2 Solutions

1. The code for this problem is given at the end.

(a) The histogram is shown in Figure 1 below. The summary statistics are

     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
      0.1      6.0     24.2    321.8    168.2  14260.0

(b) This distribution is highly right-skewed. The median and IQR are therefore good measures of its center and spread, and they are 24.2 and 162.2 (= 168.2 - 6.0) respectively. Note that the mean and SD are NOT good estimates of the center and spread here. There are several outliers, the most extreme being the United States at around 15000.

(c) These data are very right-skewed and measure income. For data like this, the usual remedy is to take the log. Figure 2 shows the histogram of the transformed data with the fitted normal density. The right side of the figure shows what is called a Q-Q plot (you were not required to make one). This plot is useful for checking whether the data are well approximated by the normal distribution (or, with the appropriate quantiles, by any other distribution). If the points lie close to the straight line, the fit is good.

(d) The boxplot is shown in Figure 3. It is easy to see that wealth is not evenly distributed. In particular, the median GDP per capita in group 3 is around $43000. That is about 5 times the median of group 1, 10 times the medians of groups 2, 4, and 7, 30 times that of group 5, and 50 times that of group 6. Group 4 appears to have the largest IQR. This is probably to be expected, since it contains a number of highly developed Middle Eastern countries as well as a number of extremely underdeveloped ones. Africa is the most concentrated in poverty, with the lowest median and the smallest IQR. The outliers in group 3 are Luxembourg and Norway on the high side and Poland on the low side. The US? Right in the middle.

Code for Question 1
---------------------------------------------------------------------
load("/path/to/data/set/gdpdata.Rdata")
attach(data)

# (a) histogram and summary statistics of GDP
hist(gdp, breaks = 40)
summary(gdp)

# (c) log-transform, histogram on the density scale, fitted normal curve
z = log(gdp)
z = z[-168]                              # drop observation 168
hist(z, breaks = 20, freq = FALSE)
x = seq(min(z), max(z), length = 100)    # grid of x-values for the curve
y = dnorm(x, mean(z), sd(z))             # normal density with the sample mean and SD
lines(x, y, col = 2)

# (d) side-by-side boxplots and per-group summaries
boxplot(gdppc ~ labs, main = 'Side-by-Side Boxplots',
        ylab = 'GDP per capita', xlab = 'Country group')
by(gdppc, labs, summary)

2. Order Statistics....
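The following is a minimal R sketch that makes the reasoning in Question 1(b) and 1(c) concrete. It does not use the actual gdpdata.Rdata. Instead it assumes a simulated right-skewed sample of 99 hypothetical log-normal draws plus one value of 15000 that mimics the U.S. outlier. The helper functions robust(), non_robust() and skew() are hypothetical names introduced only for this illustration. The sketch does three things:
- checks the IQR implied by the quartiles reported in part (a);
- compares how much a single extreme value moves the median and IQR versus the mean and SD;
- compares a moment-based skewness before and after taking logs.

```r
# IQR implied by the quartiles reported in part (a).
gdp_summary <- c(Min = 0.1, Q1 = 6.0, Median = 24.2, Mean = 321.8,
                 Q3 = 168.2, Max = 14260.0)
iqr_reported <- unname(gdp_summary["Q3"] - gdp_summary["Q1"])
print(iqr_reported)   # 162.2

# Hypothetical right-skewed sample (NOT the course data): 99 log-normal
# draws plus one extreme value standing in for a huge outlier.
set.seed(1)
x <- c(rlnorm(99, meanlog = 3, sdlog = 1.5), 15000)

# Robust (median, IQR) vs non-robust (mean, SD) summaries,
# with and without the extreme value (element 100).
robust     <- function(v) c(median = median(v), IQR = IQR(v))
non_robust <- function(v) c(mean = mean(v), sd = sd(v))
print(rbind(all = robust(x), no_outlier = robust(x[-100])))
print(rbind(all = non_robust(x), no_outlier = non_robust(x[-100])))

# Moment estimator of skewness, m3 / m2^(3/2), before and after logging.
skew <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5
print(c(raw = skew(x), log = skew(log(x))))
```

Adding the one extreme value should shift the mean and SD far more than the median and IQR. The skewness of log(x) should also be much smaller than that of x. The exact numbers depend on the random seed.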
This document was uploaded on 07/14/2011.