The Statistician (2000) 49, Part 2, pp. 229-240

Symbolic maximum likelihood estimation with Mathematica

Colin Rose
Theoretical Research Institute, Sydney, Australia

and Murray D. Smith
University of Sydney, Australia

[Received May 1998. Revised October 1999]
Summary. Mathematica is a symbolic programming language that empowers the user to undertake complicated algebraic tasks. One such task is the derivation of maximum likelihood estimators, demonstrably an important topic in statistics at both the research and the expository level. In this paper, a Mathematica package is provided that contains a function entitled SuperLog. This function utilizes pattern-matching code that enhances Mathematica's ability to simplify expressions involving the natural logarithm of a product of algebraic terms. This enhancement to Mathematica's functionality can be of particular benefit for maximum likelihood estimation.

Keywords: Computer algebra systems; Estimate; Estimator; Mathematica; Symbolic maximum likelihood; Teaching

1. Introduction
Although statistical software has long been used for maximum likelihood (ML) estimation, the focus of attention has almost always been on obtaining ML estimates (a numerical problem), rather than on deriving ML estimators (a symbolic problem). Even after the introduction of powerful computer algebra systems, in the context of ML estimation, symbolic engines have largely only been used to solve numerical problems (see, for example, Currie (1995) and Cook and Broemeling (1995)) or to derive large sample symbolic approximations to ML estimators (see Stafford and Andrews (1993) and Stafford et al. (1994)). By contrast, we use a computer algebra system for the first time to derive exact symbolic ML estimators from first principles.
This paper shows how Mathematica (see Wolfram (1996)) can be extended to handle symbolic ML estimation, which is demonstrably an important topic in statistics, at both the research and the expository level. The paper has five sections. Section 2 expands Mathematica's programming language to handle symbolic ML estimation. Section 3 illustrates the approach with three simple expository examples and is therefore well suited for teaching purposes. Section 4 extends the analysis to more challenging material. Section 5 concludes. Appendices follow giving a glossary of Mathematica terms and the package code.
Address for correspondence: Murray D. Smith, Econometrics and Business Statistics, University of Sydney, Sydney,
NSW 2006, Australia.
Email: [email protected]
© 2000 Royal Statistical Society 0039-0526/00/49229

2. Extending Mathematica: the package SMLE
Consider the following simple problem. Let (Y_1, ..., Y_n) denote a random sample of size n collected on Y ~ Poisson(θ), where parameter θ > 0 is unknown; show that the mean of the random sample is the ML estimator of θ.
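Before deriving this result symbolically, the claim can be checked numerically outside Mathematica. The following Python sketch (ours, not part of the paper) grid-searches the Poisson log-likelihood for the sample (1, 0, 3, 4) that appears later in Section 3.1, and the maximum indeed sits at the sample mean:

```python
import math

# Numerical sanity check (a sketch): the Poisson log-likelihood
# should peak at the sample mean.
data = [1, 0, 3, 4]          # the sample used in the paper's In[9]
n = len(data)

def logL(theta):
    # log-likelihood of the observed Poisson sample at rate theta
    return sum(-theta + y*math.log(theta) - math.lgamma(y + 1) for y in data)

grid = [j/100 for j in range(1, 1001)]   # crude grid over theta in (0, 10]
theta_hat = max(grid, key=logL)
print(theta_hat, sum(data)/n)   # -> 2.0 2.0
```

The grid is only a blunt stand-in for the symbolic first-order condition that the paper derives; any strictly concave log-likelihood behaves the same way.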
We begin our answer in the usual way by inputting the likelihood function into Mathematica:

In[1]:= L = ∏_{i=1}^{n} e^{-θ} θ^{Y_i} / Y_i!

If we try to evaluate the log-likelihood
In[2]:= Log[L]

Out[2] = Log[∏_{i=1}^{n} e^{-θ} θ^{Y_i} / Y_i!]

nothing happens, and for good reason! Mathematica will not normally `convert' an expression such as Log[a b] into Log[a] + Log[b], because symbols a and b could represent anything. Prima facie, it may appear then that a student armed with a pencil and paper can achieve much more than Mathematica. Of course, when deriving the log-likelihood, the student makes use of information which Mathematica does not `know'; for example, the student knows that Y_i takes non-negative integer values, that n is a positive integer, that θ is positive and real valued, that each term in the product is positive and real valued, etc. Mathematica assumes nothing about the symbols that have been entered, so its inaction is perfectly reasonable.
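Other computer algebra systems are equally cautious. As a point of comparison (our aside, not from the paper), Python's SymPy also refuses to expand log(a b) until positivity assumptions are attached to the symbols:

```python
import sympy as sp

# SymPy, like Mathematica, refuses the rewrite for unrestricted symbols...
a, b = sp.symbols('a b')
print(sp.expand_log(sp.log(a*b)))          # stays log(a*b)

# ...but performs it once positivity is declared.
ap, bp = sp.symbols('a b', positive=True)
print(sp.expand_log(sp.log(ap*bp)))        # log(a) + log(b)
```

SuperLog, described below, in effect builds the analogous positivity assumptions into Mathematica's Log for the patterns that arise in likelihoods.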
Fortunately, ML problems have some common features which we can exploit: the contributions to the likelihood are all positive valued, the sample size is a positive integer, and parameters and variables are real valued. These features can be added to Mathematica, not by providing explicit information about each symbol in turn, but rather by providing a pattern-matching function that assumes these common features. To do so, we load our small package entitled SMLE.m (see Appendix B) into Mathematica in the usual way:

In[3]:= <<SMLE.m

(The code given in the SMLE.m package should be stored as a text file on disc under the name SMLE.m. It should be placed in any one of the directories which Mathematica searches when attempting to find an external file; this list of directories can be determined by using the $Path command.) This package is also available electronically by anonymous file transfer protocol at ftp://ftp.tri.org.au/SMLE.m
We then `activate' our SuperLog function as follows:

In[4]:= SuperLog[On]
SuperLog is now On.

If we now evaluate Log[L] again, we obtain a much more useful result:

In[5]:= logL = Log[L]

Out[5] = -n θ - ∑_{i=1}^{n} Log[Y_i!] + Log[θ] ∑_{i=1}^{n} Y_i

In essence, SuperLog modifies Mathematica's Log function to mimic what our student does: it converts `logarithms of products' into `sums of logarithms'. It simplifies log-likelihood expressions of the form log(∏_{i=1}^{n} f_i), where we know that f_i is defined on the positive real line. To do this, SuperLog exploits the power of the Mathematica programming language by using pattern-matching code (apart from reserved symbols, the function accepts any indexing symbol, any symbol for sample size and any functional form for the likelihood contributions). The SMLE.m package has been tested using both version 3 and version 4 of Mathematica on platforms including Macintosh, Windows 95, Windows 98, Windows NT, SGI Unix and DEC Unix. To return Mathematica's Log function to its default status, simply enter SuperLog[Off].

3. Simple examples
In this section, we consider three expository examples.

3.1. Example 1: Poisson distribution
We have already derived the log-likelihood for the Poisson distribution. The score is

In[6]:= score = ∂_θ logL

Out[6] = -n + (∑_{i=1}^{n} Y_i)/θ

where Mathematica's internal partial differential operator ∂_θ has been used. Setting the score to 0 defines the ML estimator θ̂. The resulting equation can be easily solved by using Mathematica's Solve function. The ML estimator θ̂ is given as a replacement rule → for θ:

In[7]:= θ̂ = Solve[score == 0, θ]

Out[7] = {{θ → (∑_{i=1}^{n} Y_i)/n}}

where the second-order conditions confirm that θ̂, the mean of the random sample, is indeed the ML estimator:

In[8]:= ∂_{θ,2} logL

Out[8] = -(∑_{i=1}^{n} Y_i)/θ²
è Finally, let us suppose that an observed random sample is (1, 0, 3, 4):
In[9]: data f1, 0, 3, 4g;
Then the ML estimate of è is obtained by substitution of these data into the ML estimator è:
In[10]: ^
è aX fn 3 4, Yi :. data[[i]]g Out[10] ffè 3 2gg (There are several ways to enter data into Mathematica objects. The method used here is based on
one of Mathematica's replacement rules :.. The notation data[ ] is Mathematica notation for
[i]
the ith element of the list data.) 232 C. Rose and M. D. Smith 3.2. Example 2: exponential distribution
Suppose that the positive-valued continuous variable Y is such that Y ~ Exp(θ), where parameter θ > 0. For a random sample of size n on Y, we have the log-likelihood function as

In[11]:= logL = Log[∏_{i=1}^{n} e^{-Y_i/θ}/θ] // Apart

Out[11] = -n Log[θ] - (∑_{i=1}^{n} Y_i)/θ

which has been further simplified by using the Apart function. The score is
In[12]:= score = ∂_θ logL

Out[12] = -n/θ + (∑_{i=1}^{n} Y_i)/θ²

The first-order conditions are given by

In[13]:= θ̂ = Solve[score == 0, θ]

Out[13] = {{θ → (∑_{i=1}^{n} Y_i)/n}}

and the second-order conditions evaluated at θ̂ are

In[14]:= ∂_{θ,2} logL /. Flatten[θ̂]

Out[14] = -n³/(∑_{i=1}^{n} Y_i)²

Hence, the Hessian is strictly negative. The ML estimator is thus θ̂ = (1/n) ∑_{i=1}^{n} Y_i.

3.3. Example 3: beta distribution
Let the proportion Y be a random variable defined over the unit interval of the real line, 0 < y < 1. Furthermore, let the probability density function (PDF) of Y be given by

In[15]:= f = θ y^{θ-1}

Thus Y ~ beta(θ, 1), where parameter θ > 0 is unknown. Clearly the distribution of Y is a special case of the standard two-parameter beta distribution. For a random sample of size n on Y, the log-likelihood for θ is given by

In[16]:= logL = Log[∏_{i=1}^{n} (f /. y → Y_i)]

Out[16] = n Log[θ] + (-1 + θ) ∑_{i=1}^{n} Log[Y_i]

The ML estimator of θ is derived as

In[17]:= θ̂ = Solve[∂_θ logL == 0, θ]

Out[17] = {{θ → -n/∑_{i=1}^{n} Log[Y_i]}}

because on inspecting the Hessian

In[18]:= ∂_{θ,2} logL

Out[18] = -n/θ²

we see that it is negative for all θ; thus the log-likelihood is globally concave in θ and the solution to the first-order condition corresponds to the unique maximum.

4. Further illustrations
In this section, we consider three examples in which Mathematica is used to derive the ML estimator. All three are commonly used when teaching ML estimation. In addition, we show how to use Mathematica to tackle other important issues concerning ML estimation, such as deriving the distribution of an ML estimator, proving concavity of a log-likelihood and concentration of a log-likelihood.

4.1. Example 3 (continued): beta distribution
Continuing the beta(θ, 1) model, consider the problem of determining the exact distribution of the ML estimator θ̂ = -n/∑_{i=1}^{n} log(Y_i). This can be attempted with Mathematica by using the moment-generating function (MGF) method (Mittelhammer (1996), section 3.5, provides a good description of this method). First, we derive the MGF of log(Y), i.e. we find E[exp{t log(Y)}] = E(Y^t):
In[19]:= SetOptions[Integrate, GenerateConditions → False];

In[20]:= mgf = ∫_0^1 y^t f dy

Out[20] = θ/(t + θ)

where, for simplicity, we have turned off the Integrate option which yields conditional output.
Next, because (Y_1, ..., Y_n) is a random sample of size n, the MGF of

-log(Y)‾ = -(1/n) ∑_{i=1}^{n} log(Y_i)

is, by the MGF theorem, equal to

E[exp{-(t/n) log(Y)}]^n = E(Y^{-t/n})^n.

Given our previous output, we only need to use a replacement rule to determine the MGF of -log(Y)‾:

In[21]:= mgf /. t → -t/n // Simplify

Out[21] = (n θ/(-t + n θ))^n
This expression can be recognized as the MGF of a random variable W ~ gamma(n, 1/(n θ)), as we verify by comparing it with

In[22]:= g = w^{a-1} e^{-w/b}/(Gamma[a] b^a) /. {a → n, b → 1/(n θ)};

In[23]:= ∫_0^∞ e^{t w} g dw // PowerExpand

Out[23] = n^n θ^n (-t + n θ)^{-n}
Given that -log(Y)‾ is gamma distributed, and that θ̂ = 1/{-log(Y)‾}, it follows that the ML estimator θ̂ has an inverted gamma distribution with parameters n and 1/(n θ). The PDF of θ̂ = q > 0 is easily derived by transformation:

In[24]:= pdf = Abs[∂_q w] g /. w → 1/q

Out[24] = (1/q)^{-1+n} (1/(n θ))^{-n} e^{-n θ/q} / (Abs[q]² Gamma[n])

If desired, this expression can be further simplified by using a replacement rule, since the argument of the absolute value is always positive. The mean of the ML estimator is easily derived:

In[25]:= ∫_0^∞ q pdf dq // FullSimplify

Out[25] = n θ/(-1 + n)

It is easy to see from this output that the ML estimator θ̂ is biased upwards.

4.2. Example 4: normal linear regression model
A statistical model of considerable practical importance is the normal linear regression model. For illustration, we shall consider a simple case of this model, namely a regression model with a constant dummy and one regressor variable X. For a given value of X = x, the conditional distribution of the dependent variable Y is assumed to be

Y | (X = x) ~ N(β1 + β2 x, γ),

where parameter θ = (β1, β2, γ) (we use γ to denote the variance parameter). Denote a random sample of T pairs on (Y, X) by ((Y_1, X_1), ..., (Y_T, X_T)). We assume, conditional on each X_k = x_k, that Y_i is independent of Y_j for all i ≠ j (i, j, k = 1, ..., T). Under these assumptions, the log-likelihood is given by
In[26]:= logL = Log[∏_{k=1}^{T} e^{-(Y_k - μ)²/(2γ)}/√(2πγ)] /. μ → β1 + β2 x_k

Out[26] = -(1/(2γ)) (T γ Log[2] + T γ Log[π] + T γ Log[γ] + T β1² + β2² ∑_{k=1}^{T} x_k² + 2 β1 β2 ∑_{k=1}^{T} x_k - 2 β1 ∑_{k=1}^{T} Y_k - 2 β2 ∑_{k=1}^{T} x_k Y_k + ∑_{k=1}^{T} Y_k²)

The score vector is given by
In[27]:= score = {∂_{β1} logL, ∂_{β2} logL, ∂_γ logL} // Simplify

Out[27] = {(-T β1 - β2 ∑_{k=1}^{T} x_k + ∑_{k=1}^{T} Y_k)/γ,
(-β1 ∑_{k=1}^{T} x_k - β2 ∑_{k=1}^{T} x_k² + ∑_{k=1}^{T} x_k Y_k)/γ,
(1/(2γ²)) (-T γ + T β1² + β2² ∑_{k=1}^{T} x_k² + 2 β1 β2 ∑_{k=1}^{T} x_k - 2 β1 ∑_{k=1}^{T} Y_k - 2 β2 ∑_{k=1}^{T} x_k Y_k + ∑_{k=1}^{T} Y_k²)}

The ML estimators of θ are derived by Mathematica as
In[28]:= θ̂ = Solve[score == {0, 0, 0}, {β1, β2, γ}]

Out[28] = {{γ → (∑_{k=1}^{T} Y_k² (T ∑_{k=1}^{T} x_k² - (∑_{k=1}^{T} x_k)²) - T (∑_{k=1}^{T} x_k Y_k)² + 2 ∑_{k=1}^{T} x_k ∑_{k=1}^{T} Y_k ∑_{k=1}^{T} x_k Y_k - ∑_{k=1}^{T} x_k² (∑_{k=1}^{T} Y_k)²) / (T (T ∑_{k=1}^{T} x_k² - (∑_{k=1}^{T} x_k)²)),
β1 → (∑_{k=1}^{T} Y_k ∑_{k=1}^{T} x_k² - ∑_{k=1}^{T} x_k ∑_{k=1}^{T} x_k Y_k) / (T ∑_{k=1}^{T} x_k² - (∑_{k=1}^{T} x_k)²),
β2 → (T ∑_{k=1}^{T} x_k Y_k - ∑_{k=1}^{T} x_k ∑_{k=1}^{T} Y_k) / (T ∑_{k=1}^{T} x_k² - (∑_{k=1}^{T} x_k)²)}}

The functional form given for the estimator is unfamiliar and imposing. However, if we iteratively
solve the first-order conditions

In[29]:= β̃1 = Solve[score[[1]] == 0, β1] // Simplify

Out[29] = {{β1 → (∑_{k=1}^{T} Y_k - β2 ∑_{k=1}^{T} x_k)/T}}

In[30]:= β̃2 = Solve[(score[[2]] /. β̃1) == 0, β2] // Simplify

Out[30] = {{β2 → (∑_{k=1}^{T} x_k ∑_{k=1}^{T} Y_k - T ∑_{k=1}^{T} x_k Y_k) / ((∑_{k=1}^{T} x_k)² - T ∑_{k=1}^{T} x_k²)}}
k1 k1 it is fairly easy to see that these results are just the wellknown formulae for the estimators that are
presented in most elementary texts:
â1 Y À â2 x
and
( Yk À Y )( xk À x)
â2
T k 1 0 T
k 1 ( xk À x)2 X Further analysis might include the veri®cation of the equivalence between functional forms and a
formal check of the secondorder conditions. Both exercises involve a mixture of work with
Mathematica and with a pencil and paper.
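One such verification can at least be done arithmetically. The following Python sketch (ours, not from the paper) evaluates the textbook closed forms in exact rational arithmetic on the sample used in the paper's numerical illustration, reproducing the estimates γ → 7/10, β1 → -1 and β2 → 6/5 of Out[32]:

```python
from fractions import Fraction as F

# Exact-arithmetic check of the closed-form OLS/ML estimators (a sketch).
xs = [1, 2, 3, 4]
ys = [1, 0, 3, 4]
T = len(xs)
Sx, Sy = sum(xs), sum(ys)
Sxx = sum(x*x for x in xs)
Sxy = sum(x*y for x, y in zip(xs, ys))

b2 = F(T*Sxy - Sx*Sy, T*Sxx - Sx*Sx)                      # slope
b1 = F(Sy, T) - b2*F(Sx, T)                               # intercept: Ybar - b2*xbar
g = sum((y - b1 - b2*x)**2 for x, y in zip(xs, ys)) / T   # ML variance (divide by T)
print(b1, b2, g)   # -> -1 6/5 7/10
```

Note that the ML variance estimator divides by T rather than by the degrees of freedom T - 2, which is why γ̂ differs from the usual unbiased regression variance estimate.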
For data collected on the variables, it is straightforward to compute ML estimates. For example, if the data are

In[31]:= depY = {1, 0, 3, 4}; regX = {1, 2, 3, 4};

the ML estimates are

In[32]:= θ̂ /. {T → 4, Y_k :> depY[[k]], x_k :> regX[[k]]}

Out[32] = {{γ → 7/10, β1 → -1, β2 → 6/5}}

4.3. Example 5: gamma distribution
Let (Y_1, ..., Y_T) denote a random sample of size T collected on a random variable Y, with y > 0, which is gamma distributed; i.e. Y ~ gamma(α, β), where parameter θ = (α, β), with α > 0 and β > 0. We wish to determine the ML estimator of θ. The log-likelihood is

In[33]:= logL = Log[∏_{i=1}^{T} Y_i^{α-1} e^{-Y_i/β}/(Gamma[α] β^α)] // Apart

Out[33] = -T α Log[β] - T Log[Gamma[α]] - (∑_{i=1}^{T} Y_i)/β - ∑_{i=1}^{T} Log[Y_i] + α ∑_{i=1}^{T} Log[Y_i]

For this problem, the ML estimator of θ admits, only in part, a closed-form solution. To see this,
consider the score:

In[34]:= score = {∂_α logL, ∂_β logL}

Out[34] = {-T Log[β] - T PolyGamma[0, α] + ∑_{i=1}^{T} Log[Y_i], -T α/β + (∑_{i=1}^{T} Y_i)/β²}

where the PolyGamma[0, α] term is Mathematica's notation for the digamma function (the first derivative of the natural logarithm of the gamma function). For Mathematica to derive the ML estimator in closed form, it must successfully execute Solve[score == {0, 0}, {α, β}]. Unfortunately, this task cannot be performed in this case because of the presence of the PolyGamma function. Nevertheless, we can make progress by concentrating the log-likelihood in β. This is because

In[35]:= β̂ = Solve[score[[2]] == 0, β]

Out[35] = {{β → (∑_{i=1}^{T} Y_i)/(T α)}}

is in a closed form. Mathematica has given us β̂ as a function of α, i.e. β̂ = β̂(α). Hence, the ML estimator of β is

β̂(α̂) = (1/α̂) (1/T) ∑_{i=1}^{T} Y_i,

where α̂ denotes the ML estimator of α. Replacing β with β̂(α) in the log-likelihood yields the concentrated log-likelihood:

In[36]:= ConlogL = logL /. Flatten[β̂]

Out[36] = -T α - T Log[Gamma[α]] - T α Log[(1/(T α)) ∑_{i=1}^{T} Y_i] - ∑_{i=1}^{T} Log[Y_i] + α ∑_{i=1}^{T} Log[Y_i]

As in the original problem, maximization of the concentrated log-likelihood with respect to α does not yield a closed-form solution. Hence, the ML estimator is defined implicitly by

α̂ = arg max_{α > 0} (ConlogL).

For a specific set of data, numerical techniques are required to determine the ML estimate α̂.
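Such a numerical solution can be sketched directly in Python (our illustration, outside the paper's Mathematica workflow). The sample below is hypothetical, and since the standard library lacks digamma and trigamma functions, they are approximated here by finite differences of math.lgamma:

```python
import math

# Newton-Raphson on the score of the concentrated log-likelihood (a sketch).
data = [1.2, 0.7, 3.1, 2.4, 1.9]   # hypothetical sample, for illustration only
T, Sy = len(data), sum(data)
SlogY = sum(math.log(y) for y in data)

def digamma(a, h=1e-6):
    # crude numerical digamma via central difference of log-gamma
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2*h)

def trigamma(a, h=1e-4):
    # crude numerical trigamma via second central difference of log-gamma
    return (math.lgamma(a + h) - 2*math.lgamma(a) + math.lgamma(a - h)) / h**2

def score(a):
    # d/d(alpha) of ConlogL, with beta concentrated out
    return SlogY - T*math.log(Sy/(T*a)) - T*digamma(a)

alpha = 1.0
for _ in range(50):
    # the derivative of score is T/alpha - T*trigamma(alpha), i.e. the Hessian
    alpha -= score(alpha) / (T/alpha - T*trigamma(alpha))
beta = Sy/(T*alpha)                # beta_hat = beta(alpha_hat)
print(alpha, beta)
```

The concavity of ConlogL in α, established in the paper, is what makes this Newton-Raphson iteration a safe choice here.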
With actual data, the issue of determining a suitable numerical algorithm to maximize the observed concentrated log-likelihood becomes important. Mathematica's inbuilt optimizer FindMinimum has a suite of the more popular gradient method algorithms available to the user, including the Newton-Raphson algorithm. Of course, which of these algorithms, if any, is appropriate depends very much on the particular situation at hand. In the present case, further algebra with Mathematica enables us to uncover concavity in α:

In[37]:= ∂_{α,2} ConlogL // Factor

Out[37] = -T (-1 + α PolyGamma[1, α])/α

The sign of the derivative obviously depends on the quantity within round brackets. A plot against various α > 0 is a simple, informal device for determining its sign; for example,

In[38]:= Plot[-1 + α PolyGamma[1, α], {α, 0.1, 5}]

As the plot is everywhere positive, the Hessian must be negative for all α > 0. Thus the concentrated log-likelihood is concave in α, and we may conclude that the Newton-Raphson algorithm is well suited for estimating α̂.

5. Conclusion
Although computer software has long been used for ML estimation, the focus has almost always been on numerical rather than on symbolic problems. In this paper, we have shown how the Mathematica computer algebra system can be used to derive symbolic ML estimators. This was done by modifying Mathematica's Log command using our SuperLog function. This is especially useful when constructing the log-likelihood for a random sample consisting of independent random variables. We have shown how this new function enhances Mathematica's functionality and have demonstrated its particular benefits for the teaching of ML estimation.

Acknowledgements
Revisions to this paper were undertaken while Smith was visiting the Sonderforschungsbereich 386, Institut für Statistik, Ludwig-Maximilians-Universität, München, Germany. The content of the paper benefited from presentations at the Worldwide Mathematica Conference (Chicago, 1998), the International Association for Mathematics and Computers in Simulation Conference on Applications of Computer Algebra (Prague, 1998) and the Royal Statistical Society international conference (Glasgow, 1998) and from discussions with seminar participants at the University of Sydney. We also thank the Joint Editor and referee for helpful comments. Smith acknowledges gratefully the support of the Alexander von Humboldt-Stiftung.

Appendix A: Glossary

For completeness, we provide a short summary of the use and syntax of some of the Mathematica functions that appear in the paper. Further details of these, and all other functions, can be found in
Wolfram (1996); see also the online help provided with the Mathematica software.

A.1. Simplification rules
Functions Apart, Factor, FullSimplify, PowerExpand and Simplify represent a portion of a suite of commands designed to perform various algebraic manipulations to a specified expression. The syntax of each of these commands is, for example, Simplify[expr] or expr // Simplify. The latter is the form which we generally prefer.

A.2. Brackets
Brackets in Mathematica have distinct interpretations and are not interchangeable. Square brackets [ ] are reserved for use with Mathematica function names; for example, the logarithm function is Log[ ]. A list of elements is collected between braces { }; vectors and matrices are built from lists. Round brackets ( ) are the only parentheses which have the usual term-collective interpretation.

A.3. Partial differentiation
∂_θ is Mathematica's partial differential operator. Its syntax, for the partial derivative of an expression expr with respect to a symbol θ, is ∂_θ expr. The syntax for the second-order partial derivative is ∂_{θ,2} expr.

A.4. Replacement rules
The ability to replace, in a given expression, a symbol with other symbols or numeric quantities is a vital component of the Mathematica programming language. For instance, expr /. θ → τ acts on expr by replacing symbol θ with τ wherever the former occurs in expr. By contrast, expr /. θ :> τ acts on expr by replacing symbol θ with τ, but delaying the evaluation of τ until after the replacement has been performed. This form of transformation is especially useful when expr is a symbolic sum, and we wish to evaluate it for a specific set of data.

A.5. Search functions
Solve searches for solutions to an equation or set of equations. The command Solve[expr == 0, θ] attempts to find the values of θ for which expr == 0. If sol denotes the solution to an equation, the output from Solve is reported in the form of a list of replacement rules, e.g. {θ → sol}. In the case of polynomial equations, Mathematica attempts to output all solutions, including complex-valued solutions. Sets of equations must be enclosed within a list, e.g. {expr1 == 0, expr2 == 0}. Similarly, if the solutions are sought in two or more variables, then those variable symbols must be placed in a list also, e.g. {θ1, θ2}: hence, Solve[{expr1 == 0, expr2 == 0}, {θ1, θ2}].

Appendix B: Code
(* :Name:     SMLE *)
(* :Authors:  Colin Rose and Murray D. Smith *)
(* :Version:  Mathematica v3, or v4, or later required *)
(* :Legal:    Copyright 1999 *)
(* :Summary:  Symbolic Maximum Likelihood Estimation *)

BeginPackage["SMLE`"]

SuperLog::usage =
  "SuperLog[On] activates the enhanced Log operator, so that
   Log[Product[...]] objects get converted into sums of logs.
   SuperLog[Off] switches the enhancement off."

Begin["`Private`"]

SuperLog[Q_] := Module[{erk, iii, nnn},
  Product[iii, {iii, nnn}];          (* preload Product *)
  Which[
    Q === On,
      Unprotect[Log]; Clear[Log];
      Log[Product[x_, {k_, a_, b_}]] :=
        Log[Product[Times[erk, x], {k, a, b}]] /. erk -> 1;
      Log[Product[HoldPattern[Times[x__]], {k_, a_, b_}]] := Simplify[
        Map[Sum[#, {k, a, b}]&,
          Plus @@ Map[Expand[PowerExpand[Log[#]]]&, List[x]]] //.
          Sum[u_. w_, {kk_, aa_, bb_}] :> u Sum[w, {kk, aa, bb}] /; FreeQ[u, kk],
        True];
      Protect[Log]; Print["SuperLog is now On."],
    Q === Off,
      Unprotect[Log]; Clear[Log]; Protect[Log]; Print["SuperLog is now Off."],
    True,
      Print["Carumbah! Please use SuperLog[On] or SuperLog[Off]."]]]

End[]

Protect[SuperLog];

EndPackage[]

References
Cook, P. and Broemeling, L. (1995) Bayesian statistics using Mathematica. Am. Statistn, 49, 70-76.
Currie, I. D. (1995) Maximum likelihood estimation with Mathematica. Appl. Statist., 44, 379-394.
Mittelhammer, R. (1996) Mathematical Statistics for Economics and Business. New York: Springer.
Stafford, J. and Andrews, D. (1993) A symbolic algorithm for studying adjustments to the profile likelihood. Biometrika, 80, 715-730.
Stafford, J., Andrews, D. and Wang, Y. (1994) Symbolic computation: a unified approach to studying likelihood. Statist. Comput., 4, 235-245.
Wolfram, S. (1996) The Mathematica Book, 3rd edn. Cambridge: Cambridge University Press.