CSc/Emgt 404 -- Data Mining
Name:______________________________
Final Exam
December 11, 2001
Time exam started: ________________________
Time exam completed: ______________________
NOTE: Exam time should NOT exceed 2 hours and 30 minutes.
Fax to: D. St. Clair 573-341-4501
St. Clair's phone number for questions: 573-465-5963 (available 3:45 – 9:00 PM)
You may add extra pages as needed. Indicate total # pages including this page: ___________________

CSc 401 – Data Mining
Name:______________________________
Final Exam
December 14, 2000
Score:_________________/100

Directions: Carefully answer each of the following questions. This is an open-book, open-note exam. You may use calculators but you may NOT use computers. You are NOT to get help from others. Points will be assigned on answer quality as well as answer correctness. CLEARLY show all work. PUT YOUR NAME AT THE TOP OF EACH PAGE.

1. Suppose you are trying to fit the model $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$. You have processed the data and have obtained the following information: [22 pts.]

$$\sum_{i=1}^{20} (y_i - \bar{y})^2 = 6.35 \qquad \sum_{i=1}^{20} (\hat{y}_i - \bar{y})^2 = 5.05$$

a. Compute $R^2$.

$R^2 = 5.05/6.35 \approx 79.53\%$

b. Complete the analysis of variance table. (See attached F table)

ANOVA         df    SS    MS    F    Tabular F (95% level)
Regression
Residual
Total
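The arithmetic behind parts a and b can be checked with a short sketch. It assumes n = 20 observations (read from the summation limits) and a single predictor, so the regression has 1 degree of freedom; the function name `anova_table` and the dictionary layout are illustrative choices, not part of the exam.

```python
# Sums of squares given in Question 1.
# Assumption: n = 20 observations, taken from the summation limits i = 1..20.
n = 20
ss_regression = 5.05  # sum of (y_hat_i - y_bar)^2
ss_total = 6.35       # sum of (y_i - y_bar)^2


def anova_table(n, ss_regression, ss_total, n_predictors=1):
    """Build the ANOVA rows for a linear regression with an intercept."""
    df_reg = n_predictors          # one df per slope coefficient
    df_total = n - 1               # variation about the mean
    df_res = df_total - df_reg
    ss_res = ss_total - ss_regression
    ms_reg = ss_regression / df_reg
    ms_res = ss_res / df_res
    return {
        "Regression": {"df": df_reg, "SS": ss_regression, "MS": ms_reg,
                       "F": ms_reg / ms_res},
        "Residual": {"df": df_res, "SS": ss_res, "MS": ms_res},
        "Total": {"df": df_total, "SS": ss_total},
    }


table = anova_table(n, ss_regression, ss_total)
r_squared = ss_regression / ss_total

for source, row in table.items():
    print(source, {key: round(value, 4) for key, value in row.items()})
print(f"R^2 = {r_squared:.4f}")
```

The Tabular F column is not computed here; it is read from the F table at (1, 18) degrees of freedom.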
Using a and b above, answer the following questions:

c. Is the linear term $\hat{\beta}_1 X$ significant in the model at the 95% level? Clearly explain.

d. What percent of the variation about the mean is explained by the regression line? Clearly explain.

To find the percent that the Regression SSQ contributes to the overall variation, divide the Regression SSQ by the total variation about the mean.

e.
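For parts c and d, the decision rule and the percentage can be sketched as follows. This is a minimal check under stated assumptions: n = 20 observations and one predictor, and a critical value of 4.41 for F(1, 18) at the 5% level taken from a standard F table, since the attached table is not reproduced here. The names `F_CRIT_95` and `f_statistic` are hypothetical.

```python
# Values given in Question 1.
# Assumption: n = 20 observations and one predictor.
n = 20
ss_regression = 5.05  # Regression SSQ
ss_total = 6.35       # total SSQ about the mean

# Assumption: tabulated F(1, 18) at the 5% level from a standard F table.
F_CRIT_95 = 4.41


def f_statistic(ss_regression, ss_total, n, n_predictors=1):
    """F = (Regression MS) / (Residual MS) for a model with an intercept."""
    df_res = n - 1 - n_predictors
    ms_reg = ss_regression / n_predictors
    ms_res = (ss_total - ss_regression) / df_res
    return ms_reg / ms_res


f_value = f_statistic(ss_regression, ss_total, n)

# Part c: the linear term is significant if F exceeds the tabular value.
significant = f_value > F_CRIT_95

# Part d: share of variation about the mean explained by the regression.
percent_explained = 100 * ss_regression / ss_total

print(f"F = {f_value:.2f}, tabular F = {F_CRIT_95}, significant: {significant}")
print(f"Percent of variation explained: {percent_explained:.2f}%")
```

With one predictor, the F test on the regression is equivalent to the two-sided t test on the slope, which is why the ANOVA table alone can answer part c.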

