If the argument of sum() is a numeric vector, then the result will be the sum of its elements, but if the
argument is a logical vector, then the result will be the number of TRUE elements.
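A minimal sketch of this behavior, using a made-up vector (the names x, big, total and n.big are illustrative and do not come from the notes):

```r
# Toy numeric vector (values chosen only for illustration)
x <- c(3, 8, 1, 12, 7)

# Numeric argument: sum() adds the elements
total <- sum(x)

# Logical argument: x > 5 is a logical vector, and sum() counts its TRUE elements
big <- x > 5
n.big <- sum(big)

print(total)
print(n.big)
```

Counting with sum() on a comparison like this is a common way to find how many observations satisfy a condition.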
Also, the plot function will produce an ordinary scatterplot if its x,y arguments are
1. Find the means and standard deviations for each variable.
CIG:
Mean: 24.91409
S.D.: 5.573286
R Code:
cigarette = smoking[,"CIG"]
mean.cig = mean(cigarette)
sd.cig = sd(cigarette)
mean.cig
sd.cig
BLAD:
Mean:
PART I
Find the means and standard deviations for each variable.
Variable   Mean        Standard Deviation
CIG        24.91409    5.573286
BLAD       4.121136    0.9649249
LUNG       19.653182   4.228122
KID        2.794545    0.5190799
LEUK       6.829773    0.6382589
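Summaries like these could be computed for every variable in one pass with sapply(). This is only a sketch: the small data frame below is a hypothetical stand-in for the smoking data set, with invented values, so its output will not match the numbers reported here.

```r
# Hypothetical stand-in for the smoking data frame (invented values)
smoking.toy <- data.frame(
  CIG  = c(18, 25, 22, 30),
  BLAD = c(3, 4, 5, 4)
)

# Apply mean() and sd() to each column
var.means <- sapply(smoking.toy, mean)
var.sds <- sapply(smoking.toy, sd)

# Combine into one summary table, one row per variable
summary.table <- cbind(Mean = var.means, SD = var.sds)
print(summary.table)
```

With the real data, replacing smoking.toy by the numeric columns of smoking should give one row per variable.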
Example R scripts
This section contains direct links to R scripts discussed in class.
Jan. 23, 2015:
http://www.utdallas.edu/~ammann/stat3355scripts/ex01232015.r
Here is the png graphic generated by that script:
Savings rate data, Jan. 26, 2015:
http://www.
In some applications, the proportions within the subintervals are of greater interest
than the frequencies. In these cases a relative frequency histogram can be used instead. In
this case the vertical axis is rescaled by dividing the frequencies by t
Mazda MPV V6         3735   181   19   5.263158   Van
Mitsubishi Wagon 4   3415   143   20   5.000000   Van
Nissan Axxess 4      3185   146   20   5.000000   Van
Nissan Van 4         3690   146   19   5.263158   Van
The 4 plots given below represent histograms of Weight with the mean of Weight superimposed. T
See sections 4.3, 4.4 in the textbook for details and additional examples.
Measures of Association
The automobile dataset given above includes both Weight and Mileage of 60 automobiles.
In addition to describing location and dispersion for each variable se
Sample size determination. If our estimate must satisfy requirements both for the
level of confidence and for the precision of the estimate, then it is necessary to have some
prior information that gives a bound on or an estimate of . Let 0 denote thi
3. The previous two sets of hypotheses are examples of one-sided tests: we are only interested in detecting the possibility that the population proportion falls on one particular
side of the reference value. In the first case, we only are interested in sho
For example, suppose we randomly select 500 voters and find that 260 of these voters
favor this candidate. Then our estimate of the population proportion is p = 260/500 = 0.520.
We are about 95% certain that the error of this estimate is no more than 2 (
If we use a 5% level of significance for this test, then we would reject the null hypothesis and
conclude that hiring and gender are not independent.
Nonparametric tests
Although the central limit theorem is a powerful tool that allows us to make decisi
PART 1:
These data come from an experiment in which 10 boys wore shoes with different materials, A and B, on the soles of their left and
right shoes, respectively, for one month, after which the amount of wear for each sole was measured.
1. Determine if there is a di
Population mean: $\mu_x = \frac{1}{N}\sum_{i=1}^{N} X_i$
Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (X_i - \mu_x)^2$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
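The difference between dividing by N and dividing by n - 1 can be checked numerically. The sketch below uses a small made-up vector (not data from these notes) and compares the hand-computed sample variance with R's built-in var(), which uses the n - 1 divisor.

```r
# Toy data (values chosen only for illustration)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Mean: sum of the values divided by their count
xbar <- sum(x) / n

# Sum of squared deviations from the mean
ss <- sum((x - xbar)^2)

# Divisor n treats x as a whole population; divisor n - 1 treats it as a sample
pop.var <- ss / n
samp.var <- ss / (n - 1)

print(c(population = pop.var, sample = samp.var, builtin = var(x)))
```

The built-in value agrees with the n - 1 version, so var() and sd() should be read as sample quantities.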
Large Sample Estimation of a Population Proportion
Suppose we randomly select n individuals from the population of voters and let $\hat{p}$ denote the
proportion in the sample who favor a particular candidate. Then
$\hat{p}$ is our estimate of the population proportion $p$.
The value of this estimate d
Homework 2
Due date: October 27, 2015.
1. Use the data in
http://www.utdallas.edu/~ammann/stat3355scripts/BirthwtSmoke.csv
a. Construct a 95% confidence interval for the proportion of mothers who smoke.
b. What sample size would be required to estimate this
Factory   Before   After
Pilot     55       60
Pilot     106      111     S.Pilot
Pilot     64       58      S.Nonpilot
Pilot     66       82
Pilot     62       68      Mean.Pilot
Pilot     87       90      Mean.Nonpilot
Pilot     71       79
Pilot     85       88
Pilot     105      99
Pilot     103      104     v1
Pilot     56       62      v2
Pilot     89       98
Pilot     50       54      df
Pilot     108      119
Pilot
The tools acquired thus far allow us to make even more powerful
inferences about populations. An example of this is making statistical
decisions. This process involves formulating a hypothesis about a given
issue and t
1. Does this plant have a high mean score at the 5% level of significance?
a. The null hypothesis is that μ ≤ 75. The test statistic is 1.389, which is less than
the critical value at the 5% level of significance, 1.645.
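The comparison above can be sketched in R. This assumes an upper-tailed test based on the standard normal distribution, which is what a critical value of 1.645 suggests; the variable names are illustrative.

```r
# Test statistic reported in the answer above
z <- 1.389
alpha <- 0.05

# Upper-tail critical value of the standard normal (assumed reference distribution)
z.crit <- qnorm(1 - alpha)

# Reject the null hypothesis only if the statistic exceeds the critical value
reject <- z > z.crit

# Equivalent decision through the upper-tail p-value
p.value <- 1 - pnorm(z)

print(z.crit)
print(reject)
print(p.value)
```

Under these assumptions the statistic does not exceed the critical value, so the null hypothesis would not be rejected at the 5% level.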
1. Construct a 95% confidence interval for the proportion of mothers who smoke.
Lower bound = 0.3674185
Upper bound = 0.4221411
This means that we are 95% confident that the proportion of mothers who smoke lies within
this interval.
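The counts behind these bounds are not shown here, so the sketch below illustrates the large-sample interval with the 260-out-of-500 voter figures used in the estimation example elsewhere in these notes. It assumes the usual normal-approximation interval; the solution may have been computed by a different method.

```r
# Voter example figures: 260 of 500 favor the candidate
x <- 260
n <- 500

# Sample proportion and its estimated standard error
p.hat <- x / n
se <- sqrt(p.hat * (1 - p.hat) / n)

# Two-sided 95% interval: estimate plus or minus z * standard error
z <- qnorm(0.975)
ci <- p.hat + c(-1, 1) * z * se

print(round(ci, 4))
```

For the mothers data, x would be the number of smokers and n the number of mothers in the file.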
2. What samp
Homework 1, due Jan 27
1. Consider three random variables X, Y and Z with E(X) = E(Y) =
E(Z) = 2 and E(X^2) = E(Y^2) = E(Z^2) = 9. Also Cov(X, Y) = Cov(X, Z) =
Cov(Y, Z) = 2. Find Var(X + 2Y Z).
2. Linear combinations of normal random variables are s
Homework 6, due March 9
1. a) A straight line is fitted by least squares resulting in y = 3 + 2x. The first
observation is x = 3 and y = 10. Find the first residual.
b) Suppose given the same observations as in part a), we fit a regression through
origin
1.
a. 47.12% of the people in the group are male.
b. 10.81% of the people in the group have green eyes.
c. 11.83% of males have green eyes.
d. 51.56% of those with green eyes are male.
e. 55.45% of those with brown eyes are female.
2.
a. Expected Frequencies
Blue
Brown
1. The procedure of using sample data to find the estimated regression equation is better known as _____.
a. point estimation
b. interval estimation
c. the least squares method
d. extrapolation
Answer: C
2. Which of the following inferences can be drawn from
