Applications of Mathematical Statistics - revision

STATISTICAL INFERENCE
Applications of MATHEMATICAL STATISTICS - EXAMPLES

Task 1:
Approximately 1% of women aged 40-50 have breast cancer. A woman with
breast cancer has a 90% chance of a positive test from a mammogram, while
a woman without has a 10% chance of a false positive result. What is the
probability a woman has breast cancer given that she just had a positive test?
Use a tabular approach to Bayes' theorem to find all the posterior
probabilities.
Task 2:
The probability distribution is given: [formula not preserved]
Determine all the numerical parameters of the random variable described
by the above function, for the proper value of its parameter.
Find the cdf (cumulative distribution function).
Task 3:
Two functions are given: the probability density function $f(x)$ and the cumulative
distribution function $F(x)$, which describe the continuous random variable $X$:
[piecewise definitions of $f(x)$ and $F(x)$ not preserved]
Find the probability that the random variable will exceed the value …, and the probability that this random variable will have values in the interval
(0; 0,5]. Determine the value $P(X = …)$.
Task 4:
According to the American Bankers Association, only one in 10 people are
dissatisfied with their local bank. If 7 people are randomly selected, what is
the probability that the number dissatisfied with their local bank is:
a) exactly two, b) at most two, c) at least two, d) between one and three,
inclusive?
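As a rough cross-check of Task 4, each of the four events can be written as a sum of binomial probabilities with n = 7 trials and success probability 0,1. The sketch below uses only Python's standard library; the helper `binom_pmf` and the variable names are illustrative assumptions, not part of the original material.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 7, 0.1  # 7 people sampled, one in 10 dissatisfied

exactly_two = binom_pmf(2, n, p)                             # a) P(X = 2)
at_most_two = sum(binom_pmf(k, n, p) for k in range(3))      # b) P(X <= 2)
at_least_two = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)   # c) P(X >= 2)
one_to_three = sum(binom_pmf(k, n, p) for k in range(1, 4))  # d) P(1 <= X <= 3)

for label, value in [("a", exactly_two), ("b", at_most_two),
                     ("c", at_least_two), ("d", one_to_three)]:
    print(f"{label}) {value:.6f}")
```

Part c) uses the complement so that only two terms have to be evaluated instead of six.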
Task 5:
If calls to your cell phone are a Poisson process with a constant rate $\lambda = 2$ calls
per hour, what’s the probability that, if you forget to turn your phone off in a
1,5-hour movie, your phone rings during that time?
How many phone calls do you expect to get during the movie? Can you
estimate the approximate error of this value?

Task 6:
Determine the area under the standard normal curve that lies:
a) from the left tail to -2,08, b) between -2,06 and 2,02, c) from 0,6 to
the right tail.

Task 7:
The random variable $X$ has a normal distribution $X : N(…, …)$.
Find the probability $P(… < X < …)$.

Task 8:
The annual wages of U.S. farm laborers in 1926 were normally distributed
with a mean of 586 $ and standard deviation of 97 $. What percentage of
U.S. farm laborers in 1926 had an annual wage:
a) between 500 $ and 700 $, b) at least 400 $, c) more than 800 $?

Task 9:
The age of stockbrokers in the USA has a normal distribution with a standard
deviation of 2,5 years. In a sample of 16 stockbrokers the average age was
exactly 30 years. Estimate, by the proper confidence interval, the mean
age of stockbrokers in the whole population at the 95 percent level of
credibility. What is the margin of error of this estimate?

Task 10:
We have a sample of twenty people who went on a cruise to Alaska. Their
two-week weight gain is shown below:
Weight gain (kg) | Frequency
1 | 4
0 | 5
2 | 8
1,5 | 2
3 | 1

Construct a 90% confidence interval for the mean weight gain (kg) for all
tourists who go on a cruise to Alaska, given that the appropriate
random variable has a normal distribution.

Task 11:
A health study involves 1000 randomly selected deaths, with 331 of them
caused by heart disease. Use the sample data as a pilot study and find the
sample size necessary to estimate the proportion of all deaths caused by
heart disease. Assume that you want 98% credibility of the estimate and the
error no more than 0,01.

Task 12:
A survey on smoking among students established that in a
sample of 1000 students, fortunately only 56 were
smokers. Taking the significance level $\alpha = 0,05$, verify the hypothesis that the
percentage of all the students addicted to smoking is over 3%.

Task 13:
The John Brown Health Club claims in advertisements that „you will lose on
average over 5 kilograms weight after only two weeks of the John Brown diet
and exercise programme”.
The Country Bureau of Consumer Affairs conducts a test of that claim by
randomly selecting 100 people who have signed up for the programme. It was
found that the data set was the following:

weight loss (kilograms) | number of people
0 - 2] | 4
2 - 4 | 10
4 - 6 | 55
6 - 8 | 25
8 - 10 | 6

Use the 0,05 significance level to test the advertised claim.

Task 14:
Check whether the two variables, gender and party affiliation, are independent,
given the following data:

 | Democrat | Republican
Male | 20 | 30
Female | 30 | 20

$\alpha = …$

Task 15:
The data set is given:

students’ monthly expenditure for cultural life ($) | number of students
[20 – 30 | 10
30 – 40 | 30
40 – 50 | 40
50 – 80 | 20

1. Build a …% confidence interval for the mean monthly expenditure
for cultural life for the whole student population.
2. a) Estimate, with probability 0,99, a confidence interval for the fraction of all
the students who spend more than $… monthly on cultural life.
b) What is the margin of error of this estimate?
c) Determine the sample size required to estimate this parameter with
an error not bigger than …%.
3. Check the claim that the average monthly expenditure for cultural life for
the whole student population differs from $…, at the significance
level ….
4. At the significance level 0,05, verify the assumption that the average
monthly expenditure for cultural life for the whole student population
exceeds $….
5. Is the claim true that the proportion of all the students who spend
less than $… monthly on cultural life is only …%? Check this assumption at
the significance level ….
6. Verify, with …% credibility, the hypothesis that the random variable (students' monthly
expenditure for cultural life) has a normal distribution.

Answers
Task 1:
Approximately 1% of women aged 40-50 have breast cancer. A woman with breast cancer
has a 90% chance of a positive test from a mammogram, while a woman without has a 10%
chance of a false positive result. What is the probability a woman has breast cancer given
that she just had a positive test?
Use a tabular approach to Bayes' theorem to find all the posterior probabilities.

Solution:
$A$ – a woman has a positive test
$B_1$ – the event that the woman has cancer
$B_2$ – the event that the woman doesn’t have cancer
$P(A/B_1)$ − a woman has a positive test under the condition that she has cancer
$P(A/B_2)$ − a woman has a positive test under the condition that she has no cancer

Prior probabilities:
$P(B_1) = 0,01$ → 1% of women have cancer
$P(B_2) = 0,99$ → 99% of women have no cancer

Conditional probabilities:
$P(A/B_1) = 0,9$ → a woman with breast cancer has a 90% chance of a positive test from a mammogram
$P(A/B_2) = 0,1$ → a woman without cancer has a 10% chance of a false positive test result
$P(B_1/A) = ???$ → the probability that a woman has breast cancer given that she just had a positive test

The probability $P(B_1/A)$ must be determined from Bayes' theorem:
$$P(B_1/A) = \frac{P(B_1)\,P(A/B_1)}{P(B_1)\,P(A/B_1) + P(B_2)\,P(A/B_2)}$$
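The tabular approach amounts to multiplying each prior by its conditional probability, summing the products to get the probability of a positive test, and dividing each product by that sum. A minimal sketch, with dictionary keys and variable names chosen only for illustration:

```python
# Prior probabilities P(state) and conditional probabilities P(positive | state),
# taken from the task statement.
priors = {"cancer": 0.01, "no cancer": 0.99}
positive_given = {"cancer": 0.90, "no cancer": 0.10}

# Joint probabilities P(positive and state) = P(state) * P(positive | state)
joint = {state: priors[state] * positive_given[state] for state in priors}

# Total probability of a positive test
p_positive = sum(joint.values())

# Posterior probabilities P(state | positive)
posterior = {state: joint[state] / p_positive for state in joint}

for state in priors:
    print(f"{state:10s} prior={priors[state]:.2f} "
          f"joint={joint[state]:.3f} posterior={posterior[state]:.4f}")
```

Each printed line corresponds to one row of the table: prior, joint, and posterior probability for that state.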
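$$P(B_1/A) = \frac{0,01 \cdot 0,9}{0,01 \cdot 0,9 + 0,99 \cdot 0,1} = \frac{0,009}{0,009 + 0,099} = \frac{0,009}{0,108} \approx 0,0833$$
$0,0833 \cdot 100\% = 8,33\%$

Tabular approach to Bayes’ Theorem:

(1) Events | (2) Prior probabilities $P(B_i)$ | (3) Conditional probabilities $P(A/B_i)$ | (4) Joint probabilities $P(B_i) \cdot P(A/B_i) = P(A \cap B_i)$ | (5) Posterior probabilities $P(B_i/A) = \frac{P(B_i)\,P(A/B_i)}{P(A)}$
$B_1$ | 0,01 | 0,9 | 0,009 | 0,0833
$B_2$ | 0,99 | 0,1 | 0,099 | 0,9167
Sum | 1 | ------- | $P(A) = 0,108$ | 1

Task 2: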
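The probability distribution is given: [formula not preserved]
Determine all the numerical parameters of the random variable described by the above
function, for the proper value of its parameter. Find the cumulative distribution function
and draw the graph of this function.

Solution:
With the values $x_i$ and probabilities $p_i$ listed in the table at the end of this solution:
$$E(X) = m = \sum x_i p_i = 1 \cdot \frac{1}{3} + 2 \cdot \frac{2}{3} = \frac{1}{3} + \frac{4}{3} = \frac{5}{3} \approx 1,67$$
$$D^2(X) = \sigma^2 = \sum x_i^2 p_i - (E(X))^2 = 1^2 \cdot \frac{1}{3} + 2^2 \cdot \frac{2}{3} - \left(\frac{5}{3}\right)^2 = \frac{9}{3} - \frac{25}{9} = \frac{27}{9} - \frac{25}{9} = \frac{2}{9} \approx 0,22$$
$$\sigma = \sqrt{D^2(X)} = \sqrt{0,22} = 0,47$$
$$X = 1,67 \pm 0,47$$

Cumulative distribution function: $F(x) = P(X \le x)$

$x_i$ | 1 | 2
$p_i = P(X = x_i)$ | 1/3 | 2/3
$F(x_i) = P(X \le x_i)$ | 1/3 | 1

Task 3: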
Two functions are given: the probability density function $f(x)$ and the cumulative distribution function $F(x)$, which describe the continuous random variable $X$:
[piecewise definitions of $f(x)$ and $F(x)$ not preserved]
Find the probability that the random variable will exceed the value …, and the probability
that the random variable will have values in the interval (0; 0,5]. Determine the value
$P(X = …)$.

Solution:
a) $P(X > …) = 1 - P(X < …) = 1 - F(…) = …$
b) $P(0 < X < 0,5) = F(0,5) - F(0) = F(0,5) - 0 = …$
c) $P(X = …) = 0$, because a continuous random variable takes any single value with probability 0.

Task 4:
According to the American Bankers Association, only one in 10 people are dissatisfied with
their local bank. If 7 people are randomly selected, what is the probability that the number
dissatisfied with their local bank is:
a) exactly two, b) at most two, c) at least two, d) between one and three, inclusive?

Solution:
$X$ − number of dissatisfied clients
$p$ − probability of success in one trial, $p = 0,1$ → probability of a dissatisfied client
$q$ − probability of failure in one trial, $q = 0,9$; $p + q = 1 \Rightarrow 0,1 + 0,9 = 1$
$n$ − number of Bernoulli trials, $n = 7$ → 7 people

a) $A$ − exactly two clients are dissatisfied
$k$ − number of successes in Bernoulli trials, $k = 2$ → 2 dissatisfied clients
$$P(A) = P_n(k) = \binom{n}{k} \cdot p^k \cdot q^{n-k}$$
$$P(A) = \binom{7}{2} \cdot (0,1)^2 \cdot (0,9)^5 = 0,124003 \approx 0,124$$
(probability from the table)

b) $B$ − at most two clients are dissatisfied
$$P(B) = P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = P_7(0) + P_7(1) + P_7(2) =$$
$$= \binom{7}{0} (0,1)^0 (0,9)^7 + \binom{7}{1} (0,1)^1 (0,9)^6 + \binom{7}{2} (0,1)^2 (0,9)^5 = 0,478297 + 0,372009 + 0,124003 \approx 0,9743$$
(probabilities from the table)

c) $C$ − at least two clients are dissatisfied
$$P(C) = P(X \ge 2) = 1 - P(X < 2) = 1 - [P(X = 0) + P(X = 1)] = 1 - [P_7(0) + P_7(1)] =$$
$$= 1 - [0,478297 + 0,372009] = 1 - 0,850306 = 0,149694 \approx 0,1497$$
(probabilities from the table)

d) $D$ − the number of dissatisfied clients is between one and three
$$P(D) = P(1 \le X \le 3) = P(X = 1) + P(X = 2) + P(X = 3) = P_7(1) + P_7(2) + P_7(3) =$$
$$= \binom{7}{1} (0,1)^1 (0,9)^6 + \binom{7}{2} (0,1)^2 (0,9)^5 + \binom{7}{3} (0,1)^3 (0,9)^4 = 0,372009 + 0,124003 + 0,022964 \approx 0,519$$
(probabilities from the table)

Binomial probabilities $P_n(k)$ for $n = 7$ (columns: $p$):

n | k | 0.01 | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7
7 | 0 | 0.932065 | 0.698337 | 0.478297 | 0.209715 | 0.082354 | 0.027994 | 0.007813 | 0.001638 | 0.000219
 | 1 | 0.065904 | 0.257282 | 0.372009 | 0.367002 | 0.247063 | 0.130637 | 0.054688 | 0.017203 | 0.003572
 | 2 | 0.001997 | 0.040623 | 0.124003 | 0.275251 | 0.317652 | 0.261274 | 0.164063 | 0.077414 | 0.025005
 | 3 | 3.36E-05 | 0.003563 | 0.022964 | 0.114688 | 0.226895 | 0.290304 | 0.273438 | 0.193536 | 0.097241
 | 4 | 3.4E-07 | 0.000188 | 0.002552 | 0.028672 | 0.09724 | 0.193536 | 0.273438 | 0.290304 | 0.226895
 | 5 | 2.06E-09 | 5.92E-06 | 0.00017 | 0.004301 | 0.025005 | 0.077414 | 0.164063 | 0.261274 | 0.317652
 | 6 | 6.93E-12 | 1.04E-07 | 6.3E-06 | 0.000358 | 0.003572 | 0.017203 | 0.054688 | 0.130637 | 0.247063
 | 7 | 1E-14 | 7.81E-10 | 1E-07 | 1.28E-05 | 0.000219 | 0.001638 | 0.007813 | 0.027994 | 0.082354

Task 5:
If calls to your cell phone are a Poisson process with a constant rate $\lambda = 2$ calls per hour,
what’s the probability that, if you forget to turn your phone off in a 1,5-hour movie, your
phone rings during that time?
How many phone calls do you expect to get during the movie? Can you estimate the
approximate error of this value?

Solution:
$X$ − the number of calls to the cell phone
Theoretically the random variable $X$ can assume any value 0, 1, 2, 3, …
$X$ has a Poisson distribution with mean $\lambda$ and standard deviation $\sqrt{\lambda}$, so we know that:
$$P(X = k) = e^{-\lambda} \cdot \frac{\lambda^k}{k!}$$
$\lambda$ − the average number of calls during the interval of time considered
We look for the probability $P(X \ge 1) = 1 - P(X = 0)$, so we will compute $P(X = 0)$.
In our case:
$k$ − the number of successes, i.e. the number of calls during 1,5 hours; $k = 0$ − no calls
The rate is 2 calls per 1 h, so the average number of calls during one and a half hours is
$$\lambda = 2 \cdot 1,5 = 3$$
so:
$$P(X = 0) = e^{-3} \cdot \frac{3^0}{0!} = e^{-3} \cdot \frac{1}{1} = \frac{1}{e^3} = \frac{1}{(2,718)^3} = \frac{1}{20,08} \approx 0,0498$$
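The Poisson step can be checked numerically. The sketch below scales the hourly rate to the length of the movie and evaluates the probability of no call and of at least one call; the function and variable names are illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


rate_per_hour = 2.0  # calls per hour, from the task
movie_hours = 1.5    # length of the movie in hours
lam = rate_per_hour * movie_hours  # average number of calls during the movie

p_no_call = poisson_pmf(0, lam)
p_at_least_one = 1 - p_no_call
print(f"lambda = {lam}, P(X=0) = {p_no_call:.4f}, P(X>=1) = {p_at_least_one:.4f}")
```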
$P(X = 0) = P(X \le 0) = 0,0498$ in the cumulative Poisson table (row $k = 0$, column 3.00).
So we have: $P(X \ge 1) = 1 - P(X = 0) = 1 - 0,0498 = 0,9502$

Cumulative Poisson Probability Distribution Table
$$P(X \le k) = \sum_{i=0}^{k} e^{-\lambda} \cdot \frac{\lambda^i}{i!}$$

k \ λ | 2.10 | 2.20 | 2.30 | 2.40 | 2.50 | 2.60 | 2.70 | 2.80 | 2.90 | 3.00
0 | 0.1225 | 0.1108 | 0.1003 | 0.0907 | 0.0821 | 0.0743 | 0.0672 | 0.0608 | 0.0550 | 0.0498
1 | 0.3796 | 0.3546 | 0.3309 | 0.3084 | 0.2873 | 0.2674 | 0.2487 | 0.2311 | 0.2146 | 0.1991
2 | 0.6496 | 0.6227 | 0.5960 | 0.5697 | 0.5438 | 0.5184 | 0.4936 | 0.4695 | 0.4460 | 0.4232
3 | 0.8386 | 0.8194 | 0.7993 | 0.7787 | 0.7576 | 0.7360 | 0.7141 | 0.6919 | 0.6696 | 0.6472
4 | 0.9379 | 0.9275 | 0.9162 | 0.9041 | 0.8912 | 0.8774 | 0.8629 | 0.8477 | 0.8318 | 0.8153
5 | 0.9796 | 0.9751 | 0.9700 | 0.9643 | 0.9580 | 0.9510 | 0.9433 | 0.9349 | 0.9258 | 0.9161

How many phone calls do you expect to get during the movie? Can you
estimate the approximate error of this value?
$E(X) = \mu = \lambda = 3$
$\sigma = \sqrt{\lambda} = \sqrt{3} = 1,73$
We can expect on average 3 calls during the movie, with an approximate
error of 1,73 calls: $X = 3 \pm 1,73$

Task 6:
Determine the area under the standard normal curve that lies:
a) from the left tail to -2,08, b) between -2,06 and 2,02, c) from 0,6 to the right tail.

Solution:
$Z : N(0, 1)$
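The three areas can be cross-checked against the table look-ups with the standard normal cumulative distribution function. This sketch relies on `statistics.NormalDist` from Python's standard library (available since Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

left_tail = std_normal.cdf(-2.08)                      # a) area left of -2,08
middle = std_normal.cdf(2.02) - std_normal.cdf(-2.06)  # b) area between -2,06 and 2,02
right_tail = 1 - std_normal.cdf(0.6)                   # c) area right of 0,6

print(f"a) {left_tail:.4f}  b) {middle:.4f}  c) {right_tail:.4f}")
```

Small differences in the last digit compared with a printed table are expected, because table values are rounded to four decimals.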
a) $P(Z < -2,08) = \Phi(-2,08) = 1 - \Phi(2,08) = 1 - 0,9812 = 0,0188$ (value from the table)

b) $P(-2,06 < Z < 2,02) = \Phi(2,02) - \Phi(-2,06) =$
$= \Phi(2,02) - [1 - \Phi(2,06)] =$
$= \Phi(2,02) - 1 + \Phi(2,06) =$
$= \Phi(2,02) + \Phi(2,06) - 1 =$
$= 0,9783 + 0,9803 - 1 = 1,9586 - 1 = 0,9586$ (values from the table)

c) $P(Z > 0,6) = 1 - P(Z < 0,6) = 1 - \Phi(0,6) =$
$= 1 - 0,7257 = 0,2743$ (value from the table)

Task 7:
The random variable $X$ has a normal distribution $X : N(…, …)$.
Find the probability $P(… < X < …)$.

Solution:
$X$ has a normal distribution $X : N(m, \sigma)$, $m = …$, $\sigma = …$
We must apply standardization:
$$Z = \frac{X - m}{\sigma}$$
$Z$ − the variable which has the standard normal distribution $Z : N(0, 1)$
$$P(… < X < …) = P\left(\frac{… - m}{\sigma} < \frac{X - m}{\sigma} < \frac{… - m}{\sigma}\right) = P(… < Z < …) = \Phi(…) - \Phi(…) = … - … = …$$
(values from the table)

Task 8:
The annual wages of U.S. farm laborers in 1926 were normally distributed with a mean of
586 $ and standard deviation of 97 $. What percentage of U.S. farm laborers in 1926 had an
annual wage:
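a) between 500 $ and 700 $, b) at least 400 $, c) more than 800 $?

Solution:
$X$ − annual wages of U.S. farm laborers in 1926
$X$ has a normal distribution $X : N(m, \sigma)$, $m = 586$, $\sigma = 97$

a) $P(500 < X < 700) = P\left(\frac{500 - 586}{97} < \frac{X - 586}{97} < \frac{700 - 586}{97}\right) = P\left(\frac{-86}{97} < Z < \frac{114}{97}\right) = P(-0,89 < Z < 1,18) =$
$= \Phi(1,18) - \Phi(-0,89) = \Phi(1,18) - [1 - \Phi(0,89)] = \Phi(1,18) + \Phi(0,89) - 1 =$
$= 0,8810 + 0,8133 - 1 = 1,6943 - 1 = 0,6943 \approx 0,70$ (values from the table)

b) $P(X > 400) = P\left(\frac{X - 586}{97} > \frac{400 - 586}{97}\right) = P\left(Z > \frac{-186}{97}\right) = P(Z > -1,92) = 1 - P(Z < -1,92) =$
$= 1 - \Phi(-1,92) = 1 - [1 - \Phi(1,92)] = 1 - 1 + \Phi(1,92) = \Phi(1,92) = 0,9726 \approx 0,97$ (value from the table)

c) $P(X > 800) = P\left(\frac{X - 586}{97} > \frac{800 - 586}{97}\right) = P\left(Z > \frac{214}{97}\right) = P(Z > 2,21) = 1 - P(Z < 2,21) = 1 - \Phi(2,21) =$
$= 1 - 0,9864 = 0,0136 \approx 0,014$ (value from the table)

Answers: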
70% of U.S. farm laborers in 1926 had annual wages between 500 $ and
700 $, and 97% of U.S. farm laborers in 1926 had annual wages of at least 400 $.
Only 1,4% of U.S. farm laborers in 1926 had annual wages over 800 $.

Task 9:
Age of stockbrokers in the USA has a normal distribution with the standard deviation 2,5
years. In the sample of 16 stockbrokers the average age was exactly 30 years. Estimate by
the proper confidence interval the mean stockbrokers age for all the population taking into
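account 95 percent of credibility. What is the margin of error of this estimate?

Solution:
$X$ − the age of stockbrokers; $X$ has a normal distribution $X : N(m; 2,5)$, $\sigma = 2,5$ years
$m$ − the mean age
$1 - \alpha = 0,95$ - confidence level
We choose Model 1 ($\sigma$ is given):
$$P\left(\bar{x} - z_\alpha \cdot \frac{\sigma}{\sqrt{n}} < m < \bar{x} + z_\alpha \cdot \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
$n = 16$, $\bar{x} = 30$ years, $\sigma = 2,5$ years
$z_\alpha$ − critical value for the confidence level $1 - \alpha = 0,95 \Rightarrow \alpha = 0,05$
To determine the critical value $z_\alpha$ we use the standard statistical table for $Z : N(0, 1)$, using the formula
$$\Phi(z_\alpha) = 1 - \frac{\alpha}{2}$$
where $\Phi$ − cumulative distribution function for the normal variable $Z : N(0, 1)$:
$$\Phi(z_\alpha) = 1 - \frac{0,05}{2} = 1 - 0,025 = 0,975 \Rightarrow z_\alpha = 1,96$$
$$\bar{x} - z_\alpha \cdot \frac{\sigma}{\sqrt{n}} = 30 - 1,96 \cdot \frac{2,5}{\sqrt{16}} = 30 - 1,225 = 28,775$$
$$\bar{x} + z_\alpha \cdot \frac{\sigma}{\sqrt{n}} = 30 + 1,96 \cdot \frac{2,5}{\sqrt{16}} = 30 + 1,225 = 31,225$$
$m \in (28,775; 31,225)$ years
Margin of error:
$$d = z_\alpha \cdot \frac{\sigma}{\sqrt{n}} = 1,96 \cdot \frac{2,5}{\sqrt{16}} = 1,96 \cdot \frac{2,5}{4} = 1,96 \cdot 0,625 = 1,225 \text{ years}$$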
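The interval for a mean with known standard deviation can be reproduced in a few lines. This sketch assumes the two-sided interval of the form sample mean ± z·σ/√n and obtains the critical value from `statistics.NormalDist.inv_cdf` rather than a printed table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma = 2.5         # known population standard deviation (years)
n = 16              # sample size
sample_mean = 30.0  # sample average age (years)
confidence = 0.95

alpha = 1 - confidence
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
margin = z_crit * sigma / sqrt(n)             # margin of error
lower, upper = sample_mean - margin, sample_mean + margin

print(f"z = {z_crit:.3f}, margin = {margin:.3f}, CI = ({lower:.3f}; {upper:.3f})")
```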
The confidence interval (28,775; 31,225) years covers the mean age of all
the stockbrokers in the USA with probability 0,95 (with 95% credibility,
95% certainty, 5% risk of error), with a maximum error of 1,225 years.

Task 10:
We have a sample of twenty people who went on a cruise to Alaska. Their two-week weight
gain is shown below:
Weight gain (kg) | Frequency
1 | 4
0 | 5
2 | 8
1,5 | 2
3 | 1

Construct a 90% confidence interval for the mean weight gain (kg) for all tourists who go on a
cruise to Alaska, given that the appropriate random variable has a normal distribution.

Solution:
$X$ − the two-week weight gain, a random variable $X : N(m; \sigma)$
$m$ − the mean weight gain
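The interval for this small sample can be sketched numerically from the frequency table. The snippet assumes the interval form sample mean ± t·s/√(n−1), with s computed using the divisor n, and a critical value of 1,729 (Student's t, 19 degrees of freedom, two-sided α = 0,1) taken from a standard table, since Python's standard library has no t quantile function. All names are illustrative.

```python
from math import sqrt

gains = [1, 0, 2, 1.5, 3]  # weight gain (kg)
freqs = [4, 5, 8, 2, 1]    # number of people with that gain

n = sum(freqs)
mean = sum(x * f for x, f in zip(gains, freqs)) / n
# Sample standard deviation with divisor n (assumed convention)
s = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(gains, freqs)) / n)

t_crit = 1.729  # assumed table value: t for alpha = 0.10 (two-sided), df = 19
margin = t_crit * s / sqrt(n - 1)
lower, upper = mean - margin, mean + margin

print(f"n = {n}, mean = {mean:.2f}, s = {s:.3f}, CI = ({lower:.2f}; {upper:.2f})")
```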
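$1 - \alpha = 0,90$ - confidence level, $n = 20$
We choose Model 2:
$$P\left(\bar{x} - t_\alpha \cdot \frac{s}{\sqrt{n-1}} < m < \bar{x} + t_\alpha \cdot \frac{s}{\sqrt{n-1}}\right) = 1 - \alpha = 0,90$$
We will find the sample parameters: $\bar{x} = ?$, $s = ?$

Auxiliary calculations:

$x_i$ | $n_i$ | $x_i n_i$ | $x_i - \bar{x}$ | $(x_i - \bar{x})^2$ | $(x_i - \bar{x})^2 n_i$
1 | 4 | 4 | -0,3 | 0,09 | 0,36
0 | 5 | 0 | -1,3 | 1,69 | 8,45
2 | 8 | 16 | 0,7 | 0,49 | 3,92
1,5 | 2 | 3 | 0,2 | 0,04 | 0,08
3 | 1 | 3 | 1,7 | 2,89 | 2,89
∑ | 20 | 26 | ____ | ____ | 15,7

$$\bar{x} = \frac{\sum x_i n_i}{n} = \frac{26}{20} = 1,3$$
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2 n_i}{n}} = \sqrt{\frac{15,7}{20}} = \sqrt{0,785} \approx 0,89$$
$t_\alpha$ − critical value for the confidence level $1 - \alpha = 0,90 \Rightarrow \alpha = 0,1$
From the Student's $t$-distribution table, for $\alpha = 0,1$ and $n - 1 = 19$ degrees of freedom: $t_\alpha \approx 1,729$
$$\bar{x} - t_\alpha \cdot \frac{s}{\sqrt{n-1}} = 1,3 - 1,729 \cdot \frac{0,89}{\sqrt{19}} = 1,3 - 1,729 \cdot \frac{0,89}{4,36} = 1,3 - 1,729 \cdot 0,204 = 1,3 - 0,35 = 0,95$$
$$\bar{x} + t_\alpha \cdot \frac{s}{\sqrt{n-1}} = 1,3 + 1,729 \cdot \frac{0,89}{\sqrt{19}} = 1,3 + 1,729 \cdot \frac{0,89}{4,36} = 1,3 + 1,729 \cdot 0,204 = 1,3 + 0,35 = 1,65$$
$m \in (0,95; 1,65)$, $P(0,95 < m < 1,65) = 0,90$
The 90% confidence interval for the mean weight gain for all tourists who go on a
cruise to Alaska is between 0,95 and 1,65 kg.

Task 11: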
A health study involves 1000 randomly selected deaths, with 331 of them caused by heart
disease. Use the sample data as a pilot study and find the sample size necessary to estimate
the proportion of all deaths caused by heart disease. Assume that you want 98% credibility
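of the estimate and the error no more than 0,01.

Solution:
$n = ???$ − the sample size necessary to satisfy the margin of error requirement $d = 0,01$
$$d = z_\alpha \cdot \sqrt{\frac{\hat{p}\hat{q}}{n}} \Rightarrow n = \frac{z_\alpha^2 \cdot \hat{p}\hat{q}}{d^2}$$
From the pilot study: $\hat{p} = \frac{331}{1000} = 0,331$, $\hat{q} = 1 - \hat{p} = 0,669$
$1 - \alpha = 0,98 \Rightarrow \alpha = 0,02$
$$\Phi(z_\alpha) = 1 - \frac{\alpha}{2} = 1 - 0,01 = 0,99 \Rightarrow z_\alpha = 2,33$$
$$n = \frac{\hat{p}\hat{q} \cdot z_\alpha^2}{d^2} = \frac{0,331 \cdot 0,669 \cdot (2,33)^2}{(0,01)^2} = \frac{0,221439 \cdot 5,4289}{0,0001} = \frac{1,20217}{0,0001} = 12021,7$$
Rounding up, $n = 12022$.

Task 12: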
A survey on smoking among students established that in a
sample of 1000 students, fortunately only 56 were smokers. Taking the significance level $\alpha = 0,05$, verify the hypothesis that the percentage of all the students addicted to smoking is over
3%.

Solution:
$p$ − proportion (fraction) of all students who smoke (in the whole population)
$n$ − sample size, $n = 1000$
$k$ − number of students who smoke (in the sample), $k = 56$
$\hat{p}$ − proportion of smokers in the sample (sample fraction): $\hat{p} = \frac{k}{n} = \frac{56}{1000} = 0,056$
$p_0$ − hypothetical value of the population fraction of all students who smoke: $3\% \Rightarrow p_0 = 0,03$, $q_0 = 1 - p_0 = 0,97$
$\alpha = 0,05$
$H_0: p = p_0$, $H_1: p > p_0$, that is $H_0: p = 0,03$, $H_1: p > 0,03$ ⇒ one-tailed test with the right critical region

Test statistic:
$$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0}} \cdot \sqrt{n} = \frac{0,056 - 0,03}{\sqrt{0,03 \cdot 0,97}} \cdot \sqrt{1000} = \frac{0,026}{\sqrt{0,0291}} \cdot 31,62 = \frac{0,026}{0,1706} \cdot 31,62 \approx 4,82$$
Critical value: $\Phi(z_\alpha) = 1 - \alpha = 1 - 0,05 = 0,95 \Rightarrow z_\alpha \approx 1,645$
The right critical region is $(1,645; +\infty)$ and the test statistic equals $4,82$.
The value of the test statistic falls into the critical region.
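The right-tailed test for a proportion can be sketched as follows. The snippet assumes the usual large-sample z statistic with the hypothesised proportion in the standard error and takes the critical value from `statistics.NormalDist.inv_cdf`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 1000       # sample size
smokers = 56   # smokers in the sample
p0 = 0.03      # hypothesised population proportion (H0: p = p0, H1: p > p0)
alpha = 0.05   # significance level

p_hat = smokers / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)    # right-tail critical value
reject_h0 = z > z_crit

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```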
So, at the significance level of 0,05, we reject the null hypothesis in favour of the alternative: the data support
the conjecture that the percentage of all the students addicted to smoking is over 3%.

Task 13:
The John Brown Health Club claims in advertisements that „you will lose on average over 5
kilograms weight after only two weeks of the John Brown diet and exercise programme”.
The Country Bureau of Consumer Affairs conducts a test of that claim by randomly selecting
100 people who have signed up for the programme. It was found that the data set was the
following:
weight loss (kilograms) | number of people
0 - 2] | 4
2 - 4 | 10
4 - 6 | 55
6 - 8 | 25
8 - 10 | 6

Use the 0,05 significance level to test the advertised claim.

Solution:
$X$ − weight loss after two weeks of the John Brown diet and exercise programme
$m$ − average weight loss after two weeks of the John Brown diet and exercise programme
$X : ???$ − the distribution is unknown
$\alpha = 0,05$, $n = 100$
$m_0 = 5$ kg - hypothetical value of the population mean
$H_0: m = 5$ kg, $H_1: m > 5$ kg ⇒ the Health Club claims that „you will lose on average
over 5 kilograms weight after only two weeks”
⇒ one-tailed test with the right critical region

Test statistic:
$$Z = \frac{\bar{x} - m_0}{s} \cdot \sqrt{n}$$
$\bar{x}$ – sample mean, $s$ − sample standard deviation

Auxiliary calculations:

weight loss | $n_i$ | $\dot{x}_i$ | $\dot{x}_i n_i$ | $\dot{x}_i - \bar{x}$ | $(\dot{x}_i - \bar{x})^2$ | $(\dot{x}_i - \bar{x})^2 n_i$
0 - 2 | 4 | 1 | 4 | -4,4 | 19,36 | 77,44
2 - 4 | 10 | 3 | 30 | -2,4 | 5,76 | 57,6
4 - 6 | 55 | 5 | 275 | -0,4 | 0,16 | 8,8
6 - 8 | 25 | 7 | 175 | 1,6 | 2,56 | 64
8 - 10 | 6 | 9 | 54 | 3,6 | 12,96 | 77,76
total | 100 | ---- | 538 | ---- | ---- | 285,6

$$\bar{x} = \frac{\sum \dot{x}_i n_i}{n} = \frac{538}{100} = 5,38 \approx 5,4$$
$$s = \sqrt{\frac{\sum (\dot{x}_i - \bar{x})^2 n_i}{n}} = \sqrt{\frac{285,6}{100}} = \sqrt{2,856} = 1,69 \approx 1,7$$
...
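The auxiliary calculations for grouped data can be reproduced by replacing each class with its midpoint. This sketch follows the table's convention of rounding the mean to one decimal before taking deviations and uses the divisor n for the standard deviation; the variable names are illustrative.

```python
from math import sqrt

intervals = [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]  # weight loss classes (kg)
counts = [4, 10, 55, 25, 6]                            # number of people per class

midpoints = [(low + high) / 2 for low, high in intervals]
n = sum(counts)

mean = sum(m * c for m, c in zip(midpoints, counts)) / n
mean_rounded = round(mean, 1)  # the table uses the mean rounded to one decimal

# Sum of squared deviations from the rounded mean, weighted by class counts
sum_sq = sum(c * (m - mean_rounded) ** 2 for m, c in zip(midpoints, counts))
std = sqrt(sum_sq / n)

print(f"n = {n}, mean = {mean:.2f}, sum of squares = {sum_sq:.1f}, s = {std:.2f}")
```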
