STA 825 Final Exam, Wednesday, March 18
SHOW ALL WORK
Name:

1. A study was conducted to evaluate the effects of different washing treatments on
the quality of beef after storage. There were three washing treatments: normal,
chlorine, and lactic acid. On each of four consecutive days, three samples of
beef were prepared and each sample was randomly assigned to one of the three
treatments such that each treatment was observed on one sample per day. The
experiment had to be conducted over several days because it was not possible to
prepare and treat more than three samples per day. After treatment, all samples
were stored for 10 days. After storage, each of the samples was evaluated on
six criteria: beefy aroma, bloody/serumy aroma, metallic aroma, grassy/barnyard
aroma, sour aroma, and spoiled aroma.

a. Why is this a multivariate rather than a univariate data situation?

[handwritten answer illegible]

b. What is the design here? (Hint: we only talked about three designs, and
it's one of those three.)

[handwritten answer illegible]

c. Write down an appropriate MANOVA model for analyzing these data.

[handwritten answer illegible]

d. In terms of the parameters of your model, write down the null hypothesis
for testing that there is no difference among the three washing methods.

e. After testing the null hypothesis from (d), we will typically want to obtain
simultaneous confidence intervals for specific comparisons among the
treatment means on particular response variables. Give one reason that the
Bonferroni method may be preferred in some cases over the Roy method
for obtaining simultaneous confidence intervals. Give one reason that the
Roy method may be preferred in some cases over the Bonferroni method.

[handwritten answer illegible]

2. In the early 1900's, several investigators were interested in predicting behavioral
and social outcomes among people based on physical characteristics. Macdonnell
(1902) reports a correlation matrix for the following seven physical variables
measured on 3000 British criminals: (1) head length, (2) head breadth, (3) face
breadth, (4) left finger length, (5) left forearm length, (6) left foot length, and (7)
height. Assume that all original variables were measured in centimeters.

The attached SAS program and output (labelled "Final Exam Problem #2") performs
a principal components analysis based on Macdonnell's correlation matrix.
Answer the following questions:

a. One of the goals of principal components analysis is to reduce the dimension
of the original data. How would you choose the number of principal
components to retain for subsequent analyses? In this example, how many
principal components would you retain?

[handwritten answer illegible]

b. Briefly interpret the first two principal components in this example. That
is, what aspect of the original variables is captured by the first principal
component? The second?

[handwritten answer illegible]

c. Explain why it is not appropriate for this example to perform a principal
components analysis on the covariance matrix rather than the correlation
matrix.

[handwritten answer illegible]

d. Suppose that in addition to the seven variables described above, an eighth
variable, computed as head length minus head breadth, had been included
in the analysis to capture head shape. What would be the variance of the
last principal component in such an analysis and why?

[handwritten answer, partly legible: "... would be 0 because ..."]

3. Annual financial data are available on firms. Four financial variables, namely
x1 = (cash flow)/(total debt), x2 = (net income)/(total assets), x3 = (current
assets)/(current liabilities), and x4 = (current assets)/(net sales), were collected for
21 firms that subsequently went bankrupt and 25 financially sound firms at about
the same point in time. A discriminant analysis for these data is performed in
the attached SAS program and output (labelled "Final Exam Problem #3"). The
discriminant analysis is based only on x2 and x3. Answer the following questions.

a. In the SAS program, a hypothesis test is performed using PROC GLM of
the hypothesis that the mean vectors are the same in the two groups (financially
sound and financially troubled firms). Why is this test performed
prior to performing a discriminant analysis, and what does the result of the
hypothesis test say about how the discriminant analysis will perform?

[handwritten answer illegible]

b. In this example, equal priors and costs of misclassification were assumed.
Explain why it may be more appropriate to use different costs of misclassification
in this analysis. Why might it be more appropriate to use different
prior probabilities?

[handwritten answer illegible]

c. Write down the linear discriminant rule for classifying firms as financially
sound or financially troubled.

[handwritten answer illegible]

d. There are two estimated error rates in the output. What error rate would
you use and why, if you were describing how you expected the linear discriminant
rule from (c) to perform in practice (when actually classifying
firms of unknown status as either financially sound or troubled)?

[handwritten answer illegible]

4. The following table lists measurements on 5 nutritional variables for 12 breakfast
cereals.

TABLE 12.9 BREAKFAST CEREAL DATA

                         x1        x2              x3     x4         x5
                         Protein   Carbohydrates   Fat    Calories   Vitamin A
Cereal                   (gm)      (gm)            (gm)   (per oz)   (% daily allowance) (a)
 1. Life                 6         19              1      110          0
 2. Grape Nuts           3         23              0      100         25
 3. Super Sugar Crisp    2         26              0      110         25
 4. Special K            6         21              0      110         25
 5. Rice Krispies        2         25              0      110         25
 6. Raisin Bran          3         28              1      120         25
 7. Product 19           2         24              0      110        100
 8. Wheaties             3         23              1      110         25
 9. Total                3         23              1      110        100
10. Puffed Rice          1         13              0       50          0
11. Sugar Corn Pops      1         26              0      110         25
12. Sugar Smacks         2         25              0      110         25

(a) 0 indicates less than 2%.

A cluster analysis for these 12 cereal brands is performed in the attached SAS
program and output (labelled "Final Exam Problem #4"). Answer the following questions.

a. Identify the type of clustering algorithm that was used in the SAS program.
(Don't describe the algorithm, just tell me the type of algorithm it is;
more than one adjective is necessary.)

[handwritten answer, partly legible: "Agglomerative, hierarchical ..."]

b. What is the cluster structure for 5 clusters given by the results of PROC
CLUSTER in this example? That is, assuming there are 5 clusters, tell me
which cereals belong together.

[handwritten answer illegible]

c. Based on the pseudo-T² statistics printed in the SAS output, what is the
appropriate number of clusters for this example?

[handwritten answer illegible]

d. Attached to the output is a page of star plots for the 12 cereals. Explain
why just a dot appears for Puffed Rice.

[handwritten answer, partly legible: "Because ... has the smallest value [on] all variables ..."]
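
The reasoning behind part 4(d) can be checked numerically. The sketch below is illustrative only and is not the SAS program attached to the exam. It assumes that each ray of a star plot is drawn with length proportional to the variable's min-max scaled value, which is a common convention but is not confirmed by the source. It also assumes that the calorie entries garbled in the scan of Table 12.9 are 110. The function name `ray_lengths` is hypothetical.

```python
# Cereal data as read from Table 12.9: (protein, carbohydrates, fat,
# calories, vitamin A). Several calorie entries were garbled in the scan
# and are ASSUMED here to be 110.
cereals = {
    "Life": (6, 19, 1, 110, 0),
    "Grape Nuts": (3, 23, 0, 100, 25),
    "Super Sugar Crisp": (2, 26, 0, 110, 25),
    "Special K": (6, 21, 0, 110, 25),
    "Rice Krispies": (2, 25, 0, 110, 25),
    "Raisin Bran": (3, 28, 1, 120, 25),
    "Product 19": (2, 24, 0, 110, 100),
    "Wheaties": (3, 23, 1, 110, 25),
    "Total": (3, 23, 1, 110, 100),
    "Puffed Rice": (1, 13, 0, 50, 0),
    "Sugar Corn Pops": (1, 26, 0, 110, 25),
    "Sugar Smacks": (2, 25, 0, 110, 25),
}


def ray_lengths(data):
    """Min-max scale each variable to [0, 1]; one ray length per variable.

    ASSUMPTION: a ray of length 0 corresponds to the smallest observed
    value of that variable and a ray of length 1 to the largest.
    """
    columns = list(zip(*data.values()))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return {
        name: tuple((v - lo) / (hi - lo) for v, lo, hi in zip(row, lows, highs))
        for name, row in data.items()
    }


rays = ray_lengths(cereals)
for name, lengths in rays.items():
    print(f"{name:18s}", " ".join(f"{x:.2f}" for x in lengths))
```

Under these assumptions, Puffed Rice is the only cereal whose five ray lengths are all zero, since it attains (or ties for) the minimum on every variable, so its star would collapse to a single point.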
 Fall '10
 HALL
