Chapter 28
Shannon Entropy and Kullback-Leibler Divergence
Section 28.1 introduces Shannon entropy and its most basic properties, including the way it measures how close a random variable is
to being uniformly distributed.
Section 28.2 describes relative entropy, or Kullback-Leibler divergence, which measures the discrepancy between two probability
distributions, and from which Shannon entropy can be constructed.
Section 28.2.1 describes some statistical aspects of relative entropy,
especially its relationship to expected log-likelihood and to Fisher
information.
Section 28.3 introduces the idea of the mutual information shared
by two random variables, and shows how to use it as a measure of
serial dependence, like a nonlinear version of autocovariance (Section
28.3.1).
Information theory studies stochastic processes as sources of information,
or as models of communication channels. It appeared in essentially its modern
form with Shannon (1948), and rapidly proved to be an extremely useful mathematical tool, not only for the study of communication and control in the animal
and the machine (Wiener, 1961), but more technically as a vital part of probability theory, with deep connections to statistical inference (Kullback, 1968),
to ergodic theory, and to large deviations theory. In an introduction that's so
limited it's almost a crime, we will do little more than build enough theory to
see how it can fit in with the theory of inference, and then get what we need
to progress to large deviations. If you want to learn more (and you should!),
the deservedly-standard modern textbook is Cover and Thomas (1991), and a
good treatment, at something more like our level of mathematical rigor, is Gray
(1990).[1]
28.1 Shannon Entropy
The most basic concept of information theory is that of the entropy of a random
variable, or its distribution, often called Shannon entropy to distinguish it from
the many other sorts. This is a measure of the uncertainty or variability
associated with the random variable. Let's start with the discrete case, where the
variable takes on only a finite or countable number of values, and everything is
easier.
Definition 356 (Shannon Entropy (Discrete Case)) The Shannon entropy,
or just entropy, of a discrete random variable $X$ is
\[ H[X] \equiv -\sum_x P(X = x) \log P(X = x) = -E[\log P(X)] \tag{28.1} \]
when the sum exists. Entropy has units of bits when the logarithm has base 2,
and nats when it has base e.
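As a minimal sketch of Definition 356, the function below computes the entropy of a finite distribution given as a list of probabilities. The name `entropy` and the example distributions are illustrative assumptions, and zero-probability outcomes are skipped (the usual convention that 0 log 0 = 0).

```python
import math

def entropy(probs, base=2.0):
    """Shannon entropy -sum_x p(x) log p(x) of a finite distribution.

    Outcomes with probability zero contribute nothing (0 log 0 := 0).
    base=2 gives bits, base=e gives nats.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

fair_coin = [0.5, 0.5]        # hypothetical example distributions
biased_coin = [0.9, 0.1]
constant = [1.0]

print(entropy(fair_coin))           # entropy in bits
print(entropy(biased_coin))         # smaller: the outcome is more predictable
print(entropy(constant))            # a degenerate distribution has zero entropy
print(entropy(fair_coin, math.e))   # the same fair coin, measured in nats
```

Switching the base only rescales the result by a constant factor, which is why the choice between bits and nats is a matter of units.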
The joint entropy of two random variables, H [X, Y ], is the entropy of their
joint distribution.
The conditional entropy of $X$ given $Y$, $H[X|Y]$, is
\begin{align}
H[X|Y] &\equiv -\sum_y P(Y = y) \sum_x P(X = x|Y = y) \log P(X = x|Y = y) \tag{28.2} \\
&= -E[\log P(X|Y)] \tag{28.3} \\
&= H[X, Y] - H[Y] \tag{28.4}
\end{align}
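The equality of (28.2) and (28.4) can be checked numerically. The sketch below computes the conditional entropy directly from the double sum and compares it with the difference of joint and marginal entropies; the joint distribution and all function names are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(X = x, Y = y), keyed by (x, y).
joint = {("a", 0): 0.4, ("a", 1): 0.1, ("b", 0): 0.1, ("b", 1): 0.4}

def marginal_y(joint):
    """Marginal distribution of Y, obtained by summing out X."""
    p_y = defaultdict(float)
    for (_, y), p in joint.items():
        p_y[y] += p
    return dict(p_y)

def conditional_entropy_x_given_y(joint):
    """H[X|Y] from the double sum: sum_y P(y) * H[X | Y = y]."""
    total = 0.0
    for y, py in marginal_y(joint).items():
        cond = [p / py for (_, y2), p in joint.items() if y2 == y]
        total += py * entropy(cond)
    return total

h_joint = entropy(joint.values())
h_y = entropy(marginal_y(joint).values())
h_x_given_y = conditional_entropy_x_given_y(joint)
print(h_x_given_y, h_joint - h_y)   # the two numbers should agree
```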
Here are some important properties of the Shannon entropy, presented without proofs (which are not hard).
1. $H[X] \geq 0$
2. $H[X] = 0$ iff $\exists x_0 : X = x_0$ a.s.
3. If $X$ can take on $n < \infty$ different values (with positive probability), then
$H[X] \leq \log n$. $H[X] = \log n$ iff $X$ is uniformly distributed.
4. $H[X] + H[Y] \geq H[X, Y]$, with equality iff $X$ and $Y$ are independent. (This
comes from the logarithm in the definition.)
[1] Remarkably, almost all of the post-1948 development has been either amplifying or refining
themes first sounded by Shannon. For example, one of the fundamental results, which we
will see in the next chapter, is the Shannon-McMillan-Breiman theorem, or asymptotic
equipartition property, which says roughly that the log-likelihood per unit time of a random
sequence converges to a constant, characteristic of the data-generating process. Shannon's
original version was convergence in probability for ergodic Markov chains; the modern form
is almost sure convergence for any stationary and ergodic process. Pessimistically, this says
something about the decadence of modern mathematical science; optimistically, something
about the value of getting it right the first time.
5. $H[X, Y] \geq H[X]$.
6. $H[X|Y] \geq 0$, with equality iff $X$ is a.s. constant given $Y$, for almost all
$Y$.
7. $H[X|Y] \leq H[X]$, with equality iff $X$ is independent of $Y$. (Conditioning
reduces entropy.)
8. $H[f(X)] \leq H[X]$, for any measurable function $f$, with equality iff $f$ is
invertible.
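Properties 3 and 8 are easy to see in a small numerical case. The following sketch, using a made-up four-point distribution and hypothetical helper names, compares its entropy with the uniform bound and with the entropies of two functions of the variable, one invertible and one not.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Entropy in bits of a distribution given as {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def pushforward(dist, f):
    """Distribution of f(X) when X has distribution dist."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[f(x)] += p
    return dict(out)

dist = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}   # hypothetical distribution of X
uniform = {x: 0.25 for x in dist}               # uniform on the same four values

h = entropy(dist)
h_parity = entropy(pushforward(dist, lambda x: x % 2))    # f merges values
h_shift = entropy(pushforward(dist, lambda x: x + 10))    # f is invertible

print(h, entropy(uniform))   # property 3: h is at most log2(4) = 2 bits
print(h_parity, h_shift)     # property 8: merging loses entropy, relabeling does not
```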
The first three properties can be summarized by saying that $H[X]$ is maximized by a uniform distribution, and minimized, to zero, by a degenerate one
which is a.s. constant. We can then think of $H[X]$ as the variability of $X$,
something like the log of the effective number of values it can take on. We can
also think of it as how uncertain we are about $X$'s value.[2] $H[X, Y]$ is then how
much variability or uncertainty is associated with the pair variable $X, Y$, and
$H[Y|X]$ is how much uncertainty remains about $Y$ once $X$ is known, averaging over $X$. Similar interpretations follow for the other properties. The fact
that $H[f(X)] = H[X]$ if $f$ is invertible is nice, because then $f$ just relabels the
possible values, meshing nicely with this interpretation.
A simple consequence of the above results is particularly important for later
use.
Lemma 357 (Chain Rule for Shannon Entropy) Let $X_1, X_2, \ldots X_n$ be discrete-valued random variables on a common probability space. Then
\[ H[X_1, X_2, \ldots X_n] = H[X_1] + \sum_{i=2}^{n} H[X_i | X_1, \ldots X_{i-1}] \tag{28.5} \]
Proof: From the definitions, it is easily seen that $H[X_2|X_1] = H[X_2, X_1] - H[X_1]$. This establishes the chain rule for $n = 2$. A simple argument by
induction does the rest.
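The chain rule can be verified numerically for three variables. In this sketch each conditional entropy is computed from its definition, as minus the expected log conditional probability, rather than as a difference of joint entropies, so the agreement is a genuine check. The joint weights and function names are assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginal(joint, k):
    """Joint distribution of the first k coordinates."""
    out = defaultdict(float)
    for xs, p in joint.items():
        out[xs[:k]] += p
    return dict(out)

def conditional_entropy(joint, k):
    """H[X_k | X_1, ..., X_{k-1}] = -E[log P(X_k | X_1, ..., X_{k-1})]."""
    past = marginal(joint, k - 1)
    upto = marginal(joint, k)
    return -sum(p * math.log2(p / past[xs[:-1]])
                for xs, p in upto.items() if p > 0)

# Hypothetical joint distribution over three binary variables.
weights = [4, 1, 2, 3, 1, 5, 2, 2]
joint = {xs: w / sum(weights)
         for xs, w in zip(product((0, 1), repeat=3), weights)}

lhs = entropy(joint.values())                        # H[X1, X2, X3]
rhs = entropy(marginal(joint, 1).values()) + sum(    # H[X1] + sum of conditionals
    conditional_entropy(joint, k) for k in (2, 3))
print(lhs, rhs)   # the two sides of (28.5) should agree
```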
For non-discrete random variables, it is necessary to introduce a reference
measure, and many of the nice properties go away.
Definition 358 (Shannon Entropy (General Case)) The Shannon entropy
of a random variable $X$ with distribution $\mu$, with respect to a reference measure
$\rho$, is
\[ H_\rho[X] \equiv -E_\mu\left[\log \frac{d\mu}{d\rho}\right] \tag{28.6} \]
[2] This line of reasoning is sometimes supplemented by saying that we are more surprised
to find that $X = x$ the less probable that event is, supposing that surprise should go as the
log of one over that probability, and defining entropy as expected surprise. The choice of
the logarithm, rather than any other increasing function, is of course retroactive, though one
might cobble together some kind of psychophysical justification, since the perceived intensity
of a sensation often grows logarithmically with the physical magnitude of the stimulus. More
dubious, to my mind, is the idea that there is any surprise at all when a fair coin comes up
heads.
when $\mu \ll \rho$. Joint and conditional entropies are defined similarly. We will
also write $H_\rho[\mu]$, with the same meaning. This is sometimes called differential
entropy when $\rho$ is Lebesgue measure on Euclidean space, especially $\mathbb{R}$, and then
is written $h(X)$ or $h[X]$.
It remains true, in the general case, that $H_\rho[X|Y] = H_\rho[X, Y] - H_\rho[Y]$, provided all of the entropies are finite. The chain rule remains valid, conditioning
still reduces entropy, and the joint entropy is still at most the sum of the marginal
entropies, with equality iff the variables are independent. However, depending
on the reference measure, $H_\rho[X]$ can be negative; e.g., if $\rho$ is Lebesgue measure
and $\mathcal{L}(X) = \delta(x)$, then $H_\rho[X] = -\infty$.
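A small worked example shows how negative values arise, assuming $X$ is uniform on an interval $[0, a]$ and the reference measure $\lambda$ is Lebesgue measure. The density is then constant on the interval, and the general definition gives:

```latex
\[
\frac{d\mu}{d\lambda}(x) = \frac{1}{a} \quad \text{for } x \in [0, a],
\qquad
H_\lambda[X] = -E_\mu\left[\log \frac{1}{a}\right] = \log a .
\]
```

This is negative whenever $a < 1$, and it tends to $-\infty$ as $a \to 0$, which is consistent with the point-mass example.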
28.2 Relative Entropy or Kullback-Leibler Divergence
Some of the difficulties associated with Shannon entropy, in the general case,
can be evaded by using relative entropy.
Definition 359 (Relative Entropy, Kullback-Leibler Divergence) Given
two probability distributions, $\nu \ll \mu$, the relative entropy of $\nu$ with respect to
$\mu$, or the Kullback-Leibler divergence of $\nu$ from $\mu$, is
\[ D(\mu\|\nu) = -E_\mu\left[\log \frac{d\nu}{d\mu}\right] \tag{28.7} \]
If $\nu$ is not absolutely continuous with respect to $\mu$, then $D(\mu\|\nu) = \infty$.
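Lemma 360 $D(\mu\|\nu) \geq 0$, with equality iff $\nu = \mu$ almost everywhere ($\mu$).
Proof: From Jensen's inequality, $E_\mu\left[\log \frac{d\nu}{d\mu}\right] \leq \log E_\mu\left[\frac{d\nu}{d\mu}\right] = \log 1 = 0$, so $D(\mu\|\nu) = -E_\mu\left[\log \frac{d\nu}{d\mu}\right] \geq 0$. The
second part follows from the conditions for equality in Jensen's inequality.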
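In the discrete case, with both distributions putting positive mass on every point, (28.7) reduces to the sum of $\mu(x) \log(\mu(x)/\nu(x))$. The sketch below makes that strict-positivity assumption (so no absolute-continuity issues arise) and illustrates Lemma 360, together with the fact that the divergence is not symmetric. The function name and the two distributions are illustrative choices.

```python
import math

def kl_divergence(mu, nu):
    """D(mu || nu) in nats for two finite distributions on the same points.

    Assumes every entry of mu and nu is strictly positive.
    """
    return sum(m * math.log(m / n) for m, n in zip(mu, nu))

mu = [0.5, 0.5]   # hypothetical distributions on two points
nu = [0.9, 0.1]

print(kl_divergence(mu, mu))   # zero when the distributions coincide
print(kl_divergence(mu, nu))   # strictly positive otherwise
print(kl_divergence(nu, mu))   # and different when the arguments are swapped
```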
Lemma 361 (Divergence and Total Variation) For any two distributions,
$D(\mu\|\nu) \geq \frac{1}{2 \ln 2} \|\mu - \nu\|_1^2$ (with the divergence measured in bits).
Proof: Algebra. See, e.g., Cover and Thomas (1991, Lemma 12.6.1, pp. 300-301).
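A quick numerical sanity check of Lemma 361, not a proof: the sketch draws pairs of random distributions on five points and counts how often the bound fails. The sampling scheme, the fixed seed, and the helper names are assumptions made only for this illustration; divergence is computed in bits to match the $1/(2 \ln 2)$ constant.

```python
import math
import random

def kl_bits(mu, nu):
    """D(mu || nu) in bits; assumes strictly positive entries."""
    return sum(m * math.log2(m / n) for m, n in zip(mu, nu))

def l1_distance(mu, nu):
    """L1 distance sum_x |mu(x) - nu(x)| between two finite distributions."""
    return sum(abs(m - n) for m, n in zip(mu, nu))

def random_distribution(rng, k):
    """A random strictly positive probability vector of length k."""
    weights = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)   # fixed seed so the run is reproducible
violations = 0
for _ in range(1000):
    mu = random_distribution(rng, 5)
    nu = random_distribution(rng, 5)
    bound = l1_distance(mu, nu) ** 2 / (2 * math.log(2))
    if kl_bits(mu, nu) < bound - 1e-12:
        violations += 1

print(violations)   # the lemma says this count must be zero
```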
Definition 362 The conditional relative entropy, $D(\mu(Y|X)\|\nu(Y|X))$, is
\[ D(\mu(Y|X)\|\nu(Y|X)) \equiv -E_\mu\left[\log \frac{d\nu(Y|X)}{d\mu(Y|X)}\right] \tag{28.8} \]
Lemma 363 (Chain Rule for Relative Entropy) $D(\mu(X, Y)\|\nu(X, Y)) = D(\mu(X)\|\nu(X)) + D(\mu(Y|X)\|\nu(Y|X))$
Proof: Algebra.
Shannon entropy can be constructed from the relative entropy.
Lemma 364 The Shannon entropy of a discrete-valued random variable $X$,
with distribution $\mu$, is
\[ H[X] = \log n - D(\mu\|\upsilon) \tag{28.9} \]
where $n$ is the number of values $X$ can take on (with positive probability), and
$\upsilon$ is the uniform distribution over those values.
Proof: Algebra.
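The algebra behind Lemma 364 is short enough to write out. Writing $\mu(x)$ for $P(X = x)$ and using $\upsilon(x) = 1/n$ on the $n$ values that $X$ takes with positive probability:

```latex
\begin{align*}
D(\mu\|\upsilon) &= \sum_x \mu(x) \log \frac{\mu(x)}{1/n} \\
                 &= \sum_x \mu(x) \log \mu(x) + \log n \sum_x \mu(x) \\
                 &= -H[X] + \log n .
\end{align*}
```

Rearranging gives $H[X] = \log n - D(\mu\|\upsilon)$, so entropy measures how close $\mu$ is to uniform.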
A similar result holds for the entropy of a variable which takes values in a
subset, of finite volume $V$, of a Euclidean space, i.e., $H_\lambda[X] = \log V - D(\mu\|\upsilon)$,
where $\lambda$ is Lebesgue measure and $\upsilon$ is the uniform probability measure on the
range of $X$.
28.2.1 Statistical Aspects of Relative Entropy
From Lemma 361, convergence in relative entropy, $D(\mu\|\nu_n) \to 0$ as $n \to \infty$,
implies convergence in the total variation ($L_1$) metric. Because of Lemma 360,
we can say that KL divergence has some of the properties of a metric on the
space of probability distributions: it's non-negative, with equality only when the
two distributions are equal (a.e.). Unfortunately, however, it is not symmetric,
and it does not obey the triangle inequality. (This is why it's the KL divergence
rather than the KL distance.) Nonetheless, it's enough like a metric that it can
be used to construct a kind of geometry on the space of probability distributions,
and so of statistical models, which can be extremely useful. While we will not
be able to go very far into this information geometry,[3] it will be important to
indicate a few of the connections between information-theoretic notions, and
the more usual ones of statistical theory.
Definition 365 (Cross-entropy) The cross-entropy of $\nu$ and $\mu$, $Q_\rho(\mu\|\nu)$, is
\[ Q_\rho(\mu\|\nu) \equiv -E_\mu\left[\log \frac{d\nu}{d\rho}\right] \tag{28.10} \]
where $\nu$ is absolutely continuous with respect to the reference measure $\rho$. If the
domain is discrete, we will take the reference measure to be uniform and drop
the subscript, unless otherwise noted.
Lemma 366 Suppose $\nu$ and $\mu$ are the distributions of two probability models,
and $\nu \ll \mu$. Then the cross-entropy is the expected negative log-likelihood of
the model corresponding to $\nu$, when the actual distribution is $\mu$. The actual
or empirical negative log-likelihood of the model corresponding to $\nu$ is $Q_\rho(\eta\|\nu)$,
where $\eta$ is the empirical distribution.
Proof: Obvious from the definitions.
[3] See Kass and Vos (1997) or Amari and Nagaoka (1993/2000). For applications to statistical inference for stochastic processes, see Taniguchi and Kakizawa (2000). For an easier
general introduction, Kulhavý (1996) is hard to beat.
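Lemma 367 If $\nu \ll \mu \ll \rho$, then $Q_\rho(\mu\|\nu) = H_\rho[\mu] + D(\mu\|\nu)$.
Proof: By the chain rule for densities,
\begin{align}
\frac{d\nu}{d\rho} &= \frac{d\mu}{d\rho} \frac{d\nu}{d\mu} \tag{28.11} \\
\log \frac{d\nu}{d\rho} &= \log \frac{d\mu}{d\rho} + \log \frac{d\nu}{d\mu} \tag{28.12} \\
E_\mu\left[\log \frac{d\nu}{d\rho}\right] &= E_\mu\left[\log \frac{d\mu}{d\rho}\right] + E_\mu\left[\log \frac{d\nu}{d\mu}\right] \tag{28.13}
\end{align}
The result follows by applying the definitions.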
Corollary 368 (Gibbs's Inequality) $Q_\rho(\mu\|\nu) \geq H_\rho[\mu]$, with equality iff $\nu = \mu$
a.e.
Proof: Insert the result of Lemma 360 into the preceding proposition.
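The decomposition in Lemma 367 and the inequality in Corollary 368 can be checked on a finite example. This sketch assumes densities are taken with respect to counting measure, so they are just probability mass functions; a uniform reference measure would shift both the cross-entropy and the entropy by the same constant. The distributions and function names are illustrative.

```python
import math

def entropy(mu):
    """H[mu] = -sum_x mu(x) log mu(x), in nats."""
    return -sum(m * math.log(m) for m in mu if m > 0)

def cross_entropy(mu, nu):
    """Q(mu || nu) = -sum_x mu(x) log nu(x): expected negative log-likelihood
    of the model nu when the data actually follow mu."""
    return -sum(m * math.log(n) for m, n in zip(mu, nu) if m > 0)

def kl_divergence(mu, nu):
    """D(mu || nu) = sum_x mu(x) log(mu(x) / nu(x)), in nats."""
    return sum(m * math.log(m / n) for m, n in zip(mu, nu) if m > 0)

mu = [0.5, 0.3, 0.2]   # hypothetical "true" distribution
nu = [0.2, 0.2, 0.6]   # hypothetical model distribution

q = cross_entropy(mu, nu)
print(q, entropy(mu) + kl_divergence(mu, nu))   # Lemma 367: these agree
print(q >= entropy(mu))                          # Gibbs's inequality
```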
The statistical interpretation of the proposition is this: The negative log-likelihood
of a model, leading to distribution $\nu$, can be broken into two parts. One is the
divergence of $\nu$ from $\mu$; the other just the entropy of $\mu$, i.e., it is the same for all
models. If we are considering the expected log-likelihood, then $\mu$ is the actual
data-generating distribution. If we are considering the empirical log-likelihood,
then $\mu$ is the empirical distribution. In either case, to maximize the likelihood
is to minimize the relative entropy, or divergence. What we would like to do, as
statisticians, is minimize the divergence from the data-generating distribution,
since that will let us predict future values. What we can do is minimize divergence
from the empirical distribution. The consistency of maximum likelihood
methods comes down, then, to finding conditions under which a shrinking divergence
from the empirical distribution guarantees a shrinking divergence from
the true distribution.[4]
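To make the link between maximum likelihood and divergence concrete, the sketch below fits a Bernoulli model to a made-up sample of coin flips in two ways: by minimizing the average negative log-likelihood, and by minimizing the divergence from the empirical distribution. The data, the grid of candidate parameters, and the function names are all assumptions for illustration.

```python
import math

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # hypothetical coin flips
n = len(data)
p_hat = sum(data) / n                    # empirical frequency of 1

def avg_neg_log_lik(theta):
    """Average negative log-likelihood of Bernoulli(theta) on the sample."""
    return -sum(math.log(theta if x == 1 else 1 - theta) for x in data) / n

def kl_from_empirical(theta):
    """D(eta || nu_theta), with eta the empirical distribution of the sample."""
    eta = [p_hat, 1 - p_hat]
    nu = [theta, 1 - theta]
    return sum(e * math.log(e / v) for e, v in zip(eta, nu) if e > 0)

grid = [i / 100 for i in range(1, 100)]   # candidate parameter values
best_by_likelihood = min(grid, key=avg_neg_log_lik)
best_by_divergence = min(grid, key=kl_from_empirical)
print(best_by_likelihood, best_by_divergence, p_hat)
```

The two objective functions differ only by the entropy of the empirical distribution, which does not depend on the parameter, so they share the same minimizer.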
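Definition 369 Let $\theta \in \Theta \subseteq \mathbb{R}^k$, $k < \infty$, be the parameter indexing a set $M$ of
statistical models, where for every $\theta$, $\nu_\theta \ll \rho$, with densities $p_\theta$. Then the
Fisher information matrix is
\[ I_{ij}(\theta) \equiv E_{\nu_\theta}\left[\frac{\partial \log p_\theta}{\partial \theta_i} \frac{\partial \log p_\theta}{\partial \theta_j}\right] \tag{28.14} \]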
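Corollary 370 The Fisher information matrix is equal to the Hessian (second
partial derivative) matrix of the relative entropy:
\[ I_{ij}(\theta_0) = \left. \frac{\partial^2}{\partial \theta_i \partial \theta_j} D(\nu_{\theta_0}\|\nu_\theta) \right|_{\theta = \theta_0} \tag{28.15} \]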
[4] If we did have a triangle inequality, then we could say $D(\mu\|\nu) \leq D(\mu\|\eta) + D(\eta\|\nu)$, and
it would be enough to make sure that both the terms on the RHS went to zero, say by some
combination of maximizing the likelihood in-sample, so $D(\eta\|\nu)$ is small, and ergodicity, so
that $D(\mu\|\eta)$ is small. While, as noted, there is no triangle inequality, under some conditions
this idea is roughly right; there are nice diagrams in Kulhavý (1996).
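Proof: It is a classical result (see, e.g., Lehmann and Casella (1998, sec. 2.6.1))
that $I_{ij}(\theta) = -E_{\nu_\theta}\left[\frac{\partial^2}{\partial \theta_i \partial \theta_j} \log p_\theta\right]$. The present result follows from this, Lemma
366, Lemma 367, and the fact that $H_\rho[\nu_{\theta_0}]$ is independent of $\theta$.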
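A one-parameter worked example may help here, assuming a Bernoulli model with success probability $\theta$ (this model is an illustration, not one used in the text). The divergence and its second derivative in $\theta$ are:

```latex
\begin{align*}
D(\nu_{\theta_0}\|\nu_\theta)
  &= \theta_0 \log \frac{\theta_0}{\theta}
   + (1 - \theta_0) \log \frac{1 - \theta_0}{1 - \theta} , \\
\frac{\partial^2}{\partial \theta^2} D(\nu_{\theta_0}\|\nu_\theta)
  &= \frac{\theta_0}{\theta^2} + \frac{1 - \theta_0}{(1 - \theta)^2} , \\
\left. \frac{\partial^2}{\partial \theta^2} D(\nu_{\theta_0}\|\nu_\theta) \right|_{\theta = \theta_0}
  &= \frac{1}{\theta_0} + \frac{1}{1 - \theta_0}
   = \frac{1}{\theta_0 (1 - \theta_0)} .
\end{align*}
```

Computing the Fisher information directly from its definition gives $\theta_0 \cdot \theta_0^{-2} + (1 - \theta_0) \cdot (1 - \theta_0)^{-2} = 1 / (\theta_0 (1 - \theta_0))$, the same value.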
28.3 Mutual Information
Definition 371 (Mutual Information) The mutual information between two
random variables, $X$ and $Y$, is the divergence of the product of their marginal
distributions from their actual joint distribution:
\[ I[X; Y] \equiv D(\mathcal{L}(X, Y) \| \mathcal{L}(X) \times \mathcal{L}(Y)) \tag{28.16} \]
Similarly, the mutual information among $n$ random variables $X_1, X_2, \ldots X_n$ is
\[ I[X_1; X_2; \ldots; X_n] \equiv D\left(\mathcal{L}(X_1, X_2, \ldots X_n) \,\Big\|\, \prod_{i=1}^{n} \mathcal{L}(X_i)\right) \tag{28.17} \]
the divergence of the product distribution from the joint distribution.
Proposition 372 $I[X; Y] \geq 0$, with equality iff $X$ and $Y$ are independent.
Proof: Directly from Lemma 360.
Proposition 373 If all the entropies involved are finite,
\begin{align}
I[X; Y] &= H[X] + H[Y] - H[X, Y] \tag{28.18} \\
&= H[X] - H[X|Y] \tag{28.19} \\
&= H[Y] - H[Y|X] \tag{28.20}
\end{align}
so $I[X; Y] \leq H[X] \wedge H[Y]$.
Proof: Calculation.
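The definition of mutual information as a divergence and the entropy identity (28.18) can be compared on small tables. The sketch below uses two hypothetical joint distributions on binary variables, one dependent and one built as a product of marginals; the helper names are illustrative.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Marginal distributions of X and Y from a joint keyed by (x, y)."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return dict(px), dict(py)

def mutual_information(joint):
    """I[X;Y] as the divergence of the product of marginals from the joint."""
    px, py = marginals(joint)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(x, y): p * q
               for x, p in enumerate([0.6, 0.4])
               for y, q in enumerate([0.3, 0.7])}

px, py = marginals(dependent)
via_entropies = (entropy(px.values()) + entropy(py.values())
                 - entropy(dependent.values()))
print(mutual_information(dependent), via_entropies)   # (28.18): these agree
print(mutual_information(independent))                 # essentially zero
```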
This leads to the interpretation of the mutual information as the reduction
in uncertainty or effective variability of $X$ when $Y$ is known, averaging over their
joint distribution. Notice that in the discrete case, we can say $H[X] = I[X; X]$,
which is why $H[X]$ is sometimes known as the self-information.
28.3.1 Mutual Information Function
Just as with the autocovariance function, we can define a mutual information
function for one-parameter processes, to serve as a measure of serial dependence.
Definition 374 (Mutual Information Function) The mutual information
function of a one-parameter stochastic process $X$ is
\[ \iota(t_1, t_2) \equiv I[X_{t_1}; X_{t_2}] \tag{28.21} \]
which is symmetric in its arguments. If the process is stationary, it is a function
of $|t_1 - t_2|$ alone.
Notice that, unlike the autocovariance function, $\iota$ includes nonlinear dependencies
between $X_{t_1}$ and $X_{t_2}$. Also notice that $\iota(\tau) = 0$ means that the two
variables are strictly independent, not just uncorrelated.
Theorem 375 A stationary process is mixing if $\iota(\tau) \to 0$.
Proof: Because then, by Lemma 361, the total variation distance between the joint distribution,
$\mathcal{L}(X_{t_1} X_{t_2})$, and the product of the marginal distributions, $\mathcal{L}(X_{t_1}) \times \mathcal{L}(X_{t_2})$, is
being forced down towards zero, which implies mixing (Definition 338).
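As an illustration of a mutual information function that decays to zero, the sketch below computes it for a stationary two-state Markov chain. The chain, its switching probabilities `a` and `b`, and the function names are assumptions made for this example; the joint distribution of the state at time 0 and at lag tau is the stationary distribution times the tau-step transition matrix.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mutual_information(joint):
    """I[X;Y] in bits from a 2x2 joint probability table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

a, b = 0.1, 0.3                      # hypothetical switching probabilities
P = [[1 - a, a], [b, 1 - b]]         # one-step transition matrix
pi = [b / (a + b), a / (a + b)]      # stationary distribution of the chain

def info_function(max_lag):
    """Mutual information between X_0 and X_tau for tau = 0, ..., max_lag."""
    values = []
    P_tau = [[1.0, 0.0], [0.0, 1.0]]   # tau-step transitions, starting at tau = 0
    for _ in range(max_lag + 1):
        joint = [[pi[i] * P_tau[i][j] for j in range(2)] for i in range(2)]
        values.append(mutual_information(joint))
        P_tau = mat_mul(P_tau, P)
    return values

iota = info_function(30)
print(iota[0])             # lag 0: equals the entropy of the stationary distribution
print(iota[1], iota[5])    # shrinking with the lag
print(iota[30])            # close to zero at long lags
```

At lag zero the function returns the self-information of the state, and it shrinks as the lag grows, as expected for a chain that forgets its initial state.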