13.6 Probabilistic Lexicalized CFGs

Suppose we were to treat a probabilistic lexicalized CFG like a really big CFG that just happened to have lots of very complex non-terminals, and to estimate the probability of each rule by maximum likelihood. Then, according to Eq. 13.18, the maximum likelihood estimate for the rule probability

$$P(\text{VP(dumped,VBD)} \rightarrow \text{VBD(dumped,VBD) NP(sacks,NNS) PP(into,P)})$$

would be

$$\frac{\mathrm{Count}(\text{VP(dumped,VBD)} \rightarrow \text{VBD(dumped,VBD) NP(sacks,NNS) PP(into,P)})}{\mathrm{Count}(\text{VP(dumped,VBD)})} \tag{13.23}$$

But there's no way we can get good estimates of counts like those in (13.23), because they are so specific: we're unlikely to see many (or even any) instances of a sentence with a verb phrase headed by dumped that has one NP argument headed by sacks and a PP argument headed by into. In other words, counts of fully lexicalized PCFG rules like this will be far too sparse, and most rule probabilities will come out 0.
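
The sparsity problem can be made concrete with a small sketch of the relative-frequency estimate. Everything in it is illustrative: the three-rule "treebank", the names `observed_rules` and `mle_rule_prob`, and the choice to return 0 when the left-hand side was never observed (the ratio is strictly undefined in that case) are assumptions made for this example, not data or conventions from the text.

```python
from collections import Counter

# Hypothetical toy "treebank": each entry is one observed lexicalized rule,
# written as (left-hand side, right-hand side).
observed_rules = [
    ("VP(dumped,VBD)", ("VBD(dumped,VBD)", "NP(sacks,NNS)", "PP(into,P)")),
    ("VP(dumped,VBD)", ("VBD(dumped,VBD)", "NP(trash,NN)", "PP(in,P)")),
    ("VP(ate,VBD)", ("VBD(ate,VBD)", "NP(lunch,NN)")),
]

# Count(lhs -> rhs) and Count(lhs) over the toy data.
rule_counts = Counter(observed_rules)
lhs_counts = Counter(lhs for lhs, _ in observed_rules)


def mle_rule_prob(lhs, rhs):
    """Relative-frequency estimate Count(lhs -> rhs) / Count(lhs).

    Assumption: returns 0.0 when lhs itself was never observed.
    """
    if lhs_counts[lhs] == 0:
        return 0.0
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]


# A rule that happens to occur in the toy data.
seen = mle_rule_prob(
    "VP(dumped,VBD)", ("VBD(dumped,VBD)", "NP(sacks,NNS)", "PP(into,P)")
)

# A perfectly plausible rule that differs only in the PP head word.
unseen = mle_rule_prob(
    "VP(dumped,VBD)", ("VBD(dumped,VBD)", "NP(sacks,NNS)", "PP(in,P)")
)

print(seen, unseen)
```

Changing a single head word on the right-hand side is enough to send the estimate to zero, which is the behavior the paragraph above describes for fully lexicalized rules.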
