week_nine_sessions_one_and_two_vs

Week nine, Sessions one and two
Lecture notes (Chapter 21)
Professor Esfandiari

The objective of this lecture is to:

• Revisit the P-value as a conditional probability
• Introduce the concept of the level of significance and its role in hypothesis testing
• Show how to determine critical values based on the level of significance, and the function of critical values in hypothesis testing
• Define and interpret Type I error, or alpha, and how it relates to hypothesis testing
• Define and interpret Type II error, or beta, and how it relates to hypothesis testing
• Define and interpret the power of the test and how it relates to hypothesis testing
• Define and interpret effect size
• Discuss the difference between statistical and practical significance

Revisiting the P-value as a conditional probability

Remember conditional probability from prior chapters. We defined P(A | B) as the conditional probability of event A given that B has happened. The P-value is a conditional probability, and its condition is that the null hypothesis is true. We define it as:

P-value = Probability(observing a statistic such as $\hat{p}$ or $\bar{X}$ from the sample data, or a value even more extreme, given that the null hypothesis is true)

What do small P-values mean?
They show that the sample data would be a rare, highly unlikely event if the null hypothesis were true. In such cases we use the data from the sample as evidence against the null hypothesis.

What do large P-values mean?
Large P-values show that the sample data are not surprising. They do not provide any evidence against the null hypothesis. The results from the sample are aligned with the null hypothesis, and thus there is no reason to reject it.

What is significance level?
In order to decide what counts as a "rare event", we need to set a threshold for the P-value. That threshold is called the "level of significance", and we denote it by alpha ($\alpha$).
The level of significance is generally set at 0.05, 0.01, or 0.001, depending on the nature of the study. If the study is done under controlled conditions, you go for a lower level of significance. For instance, if you are sending a satellite into space, you want a much lower alpha. However, if you are working on a drug that is being tested for the first time for the treatment of HIV, you go for a higher alpha and accept a higher risk of rejecting a true null hypothesis.

• If the P-value is less than alpha, you have found sufficient evidence in the data to reject the null.
• If the P-value is larger than alpha, you have insufficient evidence in the data to reject the null hypothesis, and we "fail to reject the null".

Another way of looking at alpha and the P-value

• Alpha is the risk you are willing to take of rejecting a true null; it is referred to as the "nominal" Type I error.
• The P-value is the risk you will actually take of rejecting a true null; it is referred to as the "actual" Type I error.
• If the actual risk of rejecting a true null (the P-value) is less than the nominal risk you are willing to take (alpha), you reject the null.
• If the actual risk of rejecting a true null (the P-value) is larger than the nominal risk you are willing to take (alpha), you fail to reject the null.

Testing the null at a fixed level of alpha and with reference to critical values (the traditional way)

In the prior lecture on confidence intervals we talked about 90%, 95%, and 99% confidence intervals. By the CLT, the distribution of $\hat{p}$ for independent random samples of size 30 or more follows the normal distribution, so we can use the Z values corresponding to the 90%, 95%, and 99% intervals to compute the confidence interval for P.
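The Z values that go with each confidence level can be recovered from the standard normal distribution. The sketch below is only an illustration and is not part of the original notes: the helper name `critical_z` is an assumption introduced here, and it relies on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist


def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Return the positive critical Z value for significance level alpha.

    For a two-tailed test, alpha is split equally between the two tails;
    for a one-tailed test, all of alpha sits in a single tail.
    """
    tail_area = alpha / 2 if two_tailed else alpha
    # inv_cdf gives the Z value with the requested area to its left.
    return NormalDist().inv_cdf(1 - tail_area)


if __name__ == "__main__":
    for confidence in (0.90, 0.95, 0.99):
        alpha = 1 - confidence
        z = critical_z(alpha)
        print(f"{confidence:.0%} confidence, alpha = {alpha:.2f}: z = {z:.3f}")
```

Run as a script, this prints critical values of about 1.645, 1.960, and 2.576. Calling `critical_z(0.05, two_tailed=False)` gives the one-tailed critical value at alpha = 0.05, which is also about 1.645.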
In the previous lecture, we defined the confidence interval for the population proportion and showed that the corresponding Z values, or critical values, for the 90%, 95%, and 99% intervals are 1.645, 1.96, and 2.576. The alpha level is the complement of the confidence level, so the alpha levels corresponding to the 90%, 95%, and 99% confidence intervals are 0.10, 0.05, and 0.01 respectively. The areas beyond these critical values are called critical regions. In testing the null hypothesis for a proportion, if the Z value that we compute through $(\hat{p} - P)/SD(\hat{p})$ falls in the critical region, we reject the null hypothesis. If the Z value falls between the critical values (within the confidence area), we fail to reject the null.

Example: Jerry wonders whether a coin is fair (two-tailed). He tosses it one hundred times and gets 65 heads. Given this information, answer the following questions.

a) State the null and the alternative hypothesis in words.
H0: P = 0.50. If a coin is fair, one expects the proportion of heads to be 50%.
Ha: P ≠ 0.50. We do not know whether the coin is fair, so the proportion of heads could be more or less than 50% in the long run.

b) Test the null hypothesis by calculating the P-value.
$Z = (\hat{p} - P)/SD(\hat{p})$
$\hat{p} = 65/100 = 0.65$
$SD(\hat{p}) = \sqrt{0.50 \times 0.50 / 100} = 0.50/10 = 0.05$
$Z = (0.65 - 0.50)/0.05 = +3$
P-value = Probability(Z ≥ +3) + Probability(Z ≤ -3) = 0.0013 + 0.0013 = 0.0026
Thus, the actual risk of rejecting a true null is 0.26%. Since 0.0026 is very low, we reject the null.

c) Interpret the results within context.
Since the P-value is very low, we conclude that the coin is not fair: it is biased so that heads are more likely than tails.

d) Suppose that Jared has a coin, and when he tosses it 100 times he gets 60 tails. He thinks the coin is biased toward giving more tails. Write the null and the alternative hypothesis in symbols and test the null at alpha equal to 0.05.
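The arithmetic in parts a) to c) of Jerry's coin example can be checked numerically. The following is a minimal sketch, not the lecturer's own method: the function name `one_proportion_z_test` and its `alternative` argument are assumptions made for illustration, and the standard error is computed under the null value, as in the notes.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Z test for one proportion; returns (z, p_value).

    alternative is "two-sided", "greater", or "less" and describes Ha.
    """
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)  # standard deviation of p-hat under H0
    z = (p_hat - p0) / sd
    std_normal = NormalDist()
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p_value = std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, p_value


# Jerry's coin: 65 heads in 100 tosses, H0: P = 0.50, two-tailed.
z, p = one_proportion_z_test(65, 100, 0.50)
print(f"z = {z:.2f}, two-tailed P-value = {p:.4f}")
```

The exact two-tailed P-value is about 0.0027; the 0.0026 in the notes comes from rounding each tail to 0.0013 in the Z table. The same function with `alternative="greater"` covers a one-tailed test such as the one asked for in part d).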
H0: P = 0.50, Ha: P > 0.50 (P is the proportion of tails)
$\hat{p} = 60/100 = 0.60$
$Z = (\hat{p} - P)/SD(\hat{p}) = (0.60 - 0.50)/0.05 = 0.10/0.05 = 2$
Since it is a one-tailed test, we need to place all of the alpha on the right side of the curve. The corresponding critical Z value is 1.645. Since the calculated Z value is larger than the critical Z value (2 > 1.645), we reject the null and conclude that the coin is biased toward giving more tails.

e) Now suppose that Jared wanted to test H0: P = 0.50 vs. Ha: P ≠ 0.50 at alpha = 0.01. The results would change as follows. Since the alternative hypothesis is two-tailed, we need to divide alpha into two parts, and the corresponding critical values are ±2.576. Since the calculated Z value lies inside the critical values (2 < 2.576), we fail to reject the null at the 0.01 level: at that level the data do not give sufficient evidence that Jared's coin is biased.

Type I and Type II error, power of the test

As illustrated before, hypothesis testing is a sort of conditional probability. The following tables help to illustrate this.

Table one: Definition of Type I and II error and power of the test, and guidelines for deciding to reject or fail to reject the null

Decision           | The truth: H0 is true                    | The truth: H0 is false
Reject H0          | NOT OK: Type I error ($\alpha$)          | OK: Power ($1 - \beta$)
Fail to reject H0  | OK                                       | NOT OK: Type II error ($\beta$)

Table two: Type I and II error and power of the test applied to medicine

Decision                                                   | The truth: does not have the disease            | The truth: has the disease
Reject H0 (diagnosis: patient has the disease)             | NOT OK: false positive, Type I error ($\alpha$) | OK: Power ($1 - \beta$)
Fail to reject H0 (diagnosis: patient does not have it)    | OK                                              | NOT OK: false negative, Type II error ($\beta$)

Table three: Type I and II error and power of the test applied to nutrition

Decision                                    | The truth: diet program does not work | The truth: diet program works
Reject H0 (join the program)                | NOT OK: Type I error ($\alpha$)       | OK: Power ($1 - \beta$)
Fail to reject H0 (do not join the program) | OK                                    | NOT OK: Type II error ($\beta$)

Table four: Type I and II error and power of the test applied to teaching

Decision                                                          | The truth: online quizzes do not improve learning stats | The truth: online quizzes improve learning stats
Reject H0 (include online quizzes in stats curriculum)            | NOT OK: Type I error ($\alpha$)                         | OK: Power ($1 - \beta$)
Fail to reject H0 (do not include online quizzes in curriculum)   | OK                                                      | NOT OK: Type II error ($\beta$)

Comparison of testing the null hypothesis through calculation of the P-value and testing the null at a fixed level of alpha

In the old days, before computers were around, the null hypothesis was tested at fixed levels of alpha, generally 0.05, 0.01, and 0.001. In today's world, since computer software carries out the statistical analysis, the P-value is readily calculated. Also, the P-value gives us the actual probability of rejecting a true null. For instance, a P-value of 0.35 means we have a 35% risk of rejecting a true null, and since that is too high a risk to take, we do not reject the null. However, if we simply say that we fail to reject the null at alpha equal to 0.05 and do not report the P-value, the risk of rejecting the true null could have been as low as 0.06 or as high as 0.99.

[Figure 1: Illustration of Type I and Type II errors and the power of the test. Panel labels: "Suppose the null hypothesis is true", "Suppose the null hypothesis is not true".]

$P_0$ shows the hypothesized value of the proportion under the null. The curve with mean equal to $P_0$ displays the distribution of $\hat{p}$ given that the null is true. P shows the hypothesized value of the proportion under the alternative.
The curve with mean equal to P displays the distribution of $\hat{p}$ under the alternative hypothesis.

Effect size

Effect size is defined as the difference between the mean under Ha and the mean under H0. Effect size makes more sense when we talk about a treatment. Suppose we say that after participation in a weight loss program, participants are expected to lose 10 pounds. That means that if they do nothing (null) they lose zero pounds: $\mu$ under the null equals zero. If they participate in the program they lose ten pounds: $\mu$ under the alternative equals ten. The difference between the two means, ten pounds, is called the effect size (see Figure two, where effect size = 2).

The area $\alpha$ shows a one-tailed test of the null hypothesis, or the nominal risk of rejecting a true null. If the alternative hypothesis were two-tailed, we would divide this area in half. The area to the left of $\alpha$ is the area where we assume that H0 is true and do not reject H0. $\beta$ displays the Type II error of not rejecting a null hypothesis that is false (NOT OK). Notice that $\beta$ is the area where the null and the alternative curves overlap. If we did not have any Type II error, there would be no overlap. The area that is under the alternative curve and is the complement of $\beta$, or $(1 - \beta)$, displays the power: rejecting a null that is not true (OK).

[Figure two: Displaying Type I and II error, power, and effect size = 2]

The difference between statistical and practical significance

As a consumer of statistics, you should differentiate between statistical and practical significance. As we discussed before, the standard error is inversely related to the size of the sample. As the sample size increases, the standard error decreases, the magnitude of the Z value increases, and the P-value decreases. But a small P-value does not necessarily imply practical significance.

Example: Let us say that the proportion of the American public who endorse spending more money on the mental health of the homeless is 60%.
Based on a study, state X is being criticized for not endorsing support for the mental health of the homeless. The relevant data are given below.

H0: P = 0.60
Ha: P < 0.60
$\hat{p} = 0.57$
N = 2000 (random sample picked from state X)
$SD(\hat{p}) = \sqrt{0.60 \times 0.40 / 2000} \approx 0.011$
$Z = (\hat{p} - P)/SD(\hat{p}) = (0.57 - 0.60)/0.011 = (-0.03)/0.011 \approx -2.73$
P-value = Probability(Z ≤ -2.73) ≈ 0.003 (lower tail only, because Ha is one-sided)

Since the P-value is very low, we reject the null and conclude that the proportion of people who endorse supporting the mental health of the homeless in state X is less than the proportion in the American public. But the question is: do we consider the difference between 57% and 60% to be practically meaningful?

An example from my own research: After the Columbine shooting in the State of Colorado, the US Office of Education provided funding for designing an intervention that would help to improve the attitude of students toward police and authority. This study was conducted in six states on a total of 30,000 sixth grade students. The control group was exposed to the social studies book, and the experimental group was exposed to the social studies book plus training in "law related education". The students in the control and the experimental groups were tested on their attitude toward law and authority before and after the program, which lasted a whole academic year. There were 2000 students in the experimental group and 1000 students in the control group (we did not collect data on all 30,000 students).

H0: ($\mu$ improvement in attitude for experimental) - ($\mu$ improvement in attitude for control) = 0
Ha: ($\mu$ improvement in attitude for experimental) - ($\mu$ improvement in attitude for control) > 0

Findings indicated a statistically significant difference between the attitudes of the experimental and the control groups toward police and authority, with the experimental group improving 3% more than the control group (reported P = 0.000, that is, P < 0.001).
However, when the findings were presented to the Senate, they were not willing to fund the project again because they felt that the 3% difference was not that important or meaningful. This implies that even though the results were statistically significant with a very low P-value, they did not find the difference to be practically significant. This shows that statistical significance does not guarantee practical significance. If we use a large enough sample, we can make any small difference look statistically significant.

Problems solved in lecture from Chapter 21

2. Which alternative? (problem)
In each of the following cases, is the alternative hypothesis one-sided or two-sided? What are the hypotheses?
a) A college dining service wants to know if students prefer plastic to metal cutlery. They conduct a survey to find out.
b) In recent years, 10% of the college juniors have applied for study abroad. The dean's office conducts a survey to see if that changed this year.
c) A pharmaceutical company conducts a clinical trial to see if more patients who take a new drug experience headache relief than the 22% who claimed relief after taking the placebo.
d) At a small computer peripheral company, only 60% of the hard drives produced passed all their performance tests the first time. Management recently invested a lot of resources into the production system and now conducts a test to see if it helped.

2. Which alternative? (answers)
a) Two-sided. Let p be the percentage of students who prefer plastic.
H0: 50% of students prefer plastic. (p = 0.50)
HA: The percentage of students who prefer plastic is not 50%. (p ≠ 0.50)
b) Two-sided. Let p be the percentage of juniors planning to study abroad.
H0: 10% of juniors plan to study abroad. (p = 0.10)
HA: The percentage of juniors who plan to study abroad is not 10%. (p ≠ 0.10)
c) One-sided. Let p be the percentage of people who experience relief.
H0: 22% of people experience headache relief with the drug.
(p = 0.22)
HA: More than 22% of people experience headache relief with the drug. (p > 0.22)
d) One-sided. Let p be the percentage of hard drives that pass all performance tests.
H0: 60% of hard drives pass all performance tests. (p = 0.60)
HA: The percentage of drives that pass all performance tests is greater than 60%. (p > 0.60)

8. Significant again (problem)
A new reading program may reduce the number of elementary school students who read below grade level. The company that developed this program supplied materials and teacher training for a large-scale test involving nearly 8500 children in several school districts. Statistical analysis of the results showed that the percentage of students who did not attain the grade-level standard was reduced from 15.9% to 15.1%. The hypothesis that the new reading program produced no improvement was rejected with a P-value of 0.023.
a) Explain what the P-value means in context.
b) Even though this reading method has been shown to be significantly better, why might you not recommend that your local school adopt it?

8. Significant again (answers)
a) If 15.9% is the true percentage of children who do not attain the grade-level standard, there is only a 2.3% chance of observing 15.1% or fewer of the children (in a sample of 8500) not attaining grade level by natural sampling variation alone.
b) Under the old methods, 1352 students would not be expected to read at grade level. With the new program, 1284 would not be expected to read at grade level. This is a decrease of only 68 students. The costs of switching to the new program might outweigh the potential benefit. It is also important to realize that this is only a potential benefit.

13. Loans (problem)
Before lending someone money, banks must decide whether they believe the applicant will repay the loan. One strategy used is a point system.
Loan officers assess information about the applicant, totaling points they award for the person's income level, credit history, current debt burden, and so on. The higher the point total, the more convinced the bank is that it is safe to make the loan. Any applicant with a lower point total than a certain cutoff score is denied the loan. We can think of this decision as a hypothesis test. Since the bank makes its profit from the interest collected on repaid loans, its null hypothesis is that the applicant will repay the loan and therefore should get the money. Only if the person's score falls below the minimum cutoff will the bank reject the null and deny the loan. This system is reasonably reliable, but, of course, sometimes there are mistakes.
a) When a person defaults on a loan, which type of error did the bank make?
b) Which kind of error is it when the bank misses an opportunity to make a loan to someone who would have repaid it?
c) Suppose the bank decides to lower the cutoff score from 250 points to 200. Is that analogous to choosing a higher or lower value of alpha for a hypothesis test? Explain.
d) What impact does this change in cutoff value have on the chance of each type of error?

13. Loans (answers)
a) The bank has made a Type II error. The person was not a good credit risk, and the bank failed to notice this.
b) The bank has made a Type I error. The person was a good credit risk, and the bank was convinced that he/she was not.
c) By making it easier to get a loan, the bank has reduced the alpha level. It takes less evidence to grant the person the loan.
d) The risk of Type I error is decreased and the risk of Type II error is increased.

17. Homeowners 2005 (problem)
In 2005 the U.S. Census Bureau reported that 68.9% of American families owned their homes. Census data reveal that the ownership rate in one small city is much lower.
The city council is debating a plan to offer tax breaks to first-time home buyers in order to encourage people to become homeowners. They decide to adopt the plan on a 2-year trial basis and use the data they collect to make a decision about continuing the tax breaks. Since this plan costs the city tax revenues, they will continue to use it only if there is strong evidence that the rate of home ownership is increasing.
a) In words, what will their hypotheses be?
b) What would a Type I error be?
c) What would a Type II error be?
d) For each type of error, tell who would be harmed.
e) What would the power of the test represent in this context?

17. Homeowners 2005 (answers)
a) ...