Learning Belief Networks in the Presence of Missing Values and Hidden Variables

Nir Friedman
Computer Science Division, 387 Soda Hall
University of California, Berkeley, CA 94720
email@example.com

Abstract

In recent years there has been a flurry of work on learning probabilistic belief networks. Current state-of-the-art methods have been shown to be successful for two learning scenarios: learning both network structure and parameters from complete data, and learning parameters for a fixed network from incomplete data, that is, in the presence of missing values or hidden variables. However, no method has yet been demonstrated to effectively learn network structure from incomplete data.

In this paper, we propose a new method for learning network structure from incomplete data. This method is based on an extension of the Expectation-Maximization (EM) algorithm for model selection problems that performs search for the best structure inside the EM procedure. We prove the convergence of this algorithm, and adapt it for learning belief networks. We then describe how to learn networks in two scenarios: when the data contains missing values, and in the presence of hidden variables. We provide experimental results that show the effectiveness of our procedure in both scenarios.

1 INTRODUCTION

Belief networks (BN) (also known as Bayesian networks and directed probabilistic networks) are a graphical representation for probability distributions. They are arguably the representation of choice for uncertainty in artificial intelligence. These networks provide a compact and natural representation, effective inference, and efficient learning. They have been successfully applied in expert systems, diagnostic engines, and optimal decision making systems (e.g., [Heckerman et al. 1995]).

A belief network consists of two components. The first is a directed acyclic graph in which each vertex corresponds to a random variable.
This graph represents a set of conditional independence properties of the represented distribution. This component captures the structure of the probability distribution, and is exploited for efficient inference and decision making. Thus, while belief networks can represent arbitrary probability distributions, they provide computational advantage for those distributions that can be represented with a simple structure. The second component is a collection of local interaction models that describe the conditional probability of each variable given its parents in the graph. Together, these two components represent a unique probability distribution [Pearl 1988].

Eliciting belief networks from experts can be a laborious and expensive process in large applications. Thus, in recent years there has been a growing interest in learning belief networks from data [Cooper and Herskovits 1992; Lam and Bacchus 1994; Heckerman et al. 1995].
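To make the two components concrete, the following is a minimal sketch of how a directed acyclic graph over parent sets, together with local conditional probability tables, determines a unique joint distribution as a product of local terms. The three-variable network, its names, and its probability values are hypothetical, chosen only for illustration.

```python
# Hypothetical toy network: rain -> sprinkler, {rain, sprinkler} -> wet.
# Structure component: a parent set for each (binary) variable.
PARENTS = {"rain": (), "sprinkler": ("rain",), "wet": ("rain", "sprinkler")}

# Local interaction models: P(var = 1 | parent values), keyed by the
# tuple of the parents' values (illustrative numbers only).
CPT = {
    "rain": {(): 0.2},
    "sprinkler": {(0,): 0.4, (1,): 0.01},
    "wet": {(0, 0): 0.0, (0, 1): 0.9, (1, 0): 0.8, (1, 1): 0.99},
}

def joint(assignment):
    """P(assignment) = product over variables of P(var | its parents)."""
    p = 1.0
    for var, parents in PARENTS.items():
        pa_vals = tuple(assignment[pa] for pa in parents)
        p1 = CPT[var][pa_vals]
        p *= p1 if assignment[var] == 1 else 1.0 - p1
    return p
```

Because each local table is a proper conditional distribution, the product over all complete assignments sums to one, which is the sense in which the two components jointly represent a unique distribution.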
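The abstract's key idea, searching for the best structure inside the EM procedure, can be sketched for intuition as follows. This is not the paper's algorithm: the two-variable data set, the candidate structures, and the expected-log-likelihood scoring (with no model-complexity penalty such as BIC) are all hypothetical simplifications. Missing values are completed in expectation under the current parameters, candidate structures are scored against those expected counts, and the best structure's parameters are re-estimated.

```python
import math

# Toy data over two binary variables (X, Y); None marks a missing Y value.
DATA = [(0, 0), (0, 0), (1, 1), (1, None), (0, None), (1, 1)]

def fill_in(record, p_y_given_x):
    """Weighted completions of one record under the current parameters."""
    x, y = record
    if y is not None:
        return [((x, y), 1.0)]
    p1 = p_y_given_x[x]
    return [((x, 1), p1), ((x, 0), 1.0 - p1)]

def expected_counts(data, p_y_given_x):
    counts = {(x, y): 1e-6 for x in (0, 1) for y in (0, 1)}  # tiny prior
    for rec in data:
        for (x, y), w in fill_in(rec, p_y_given_x):
            counts[(x, y)] += w
    return counts

def score(counts, structure):
    """Expected log-likelihood of a candidate structure: 'indep' or 'x->y'."""
    n = sum(counts.values())
    ll = 0.0
    for (x, y), c in counts.items():
        px = (counts[(x, 0)] + counts[(x, 1)]) / n
        if structure == "indep":
            py = (counts[(0, y)] + counts[(1, y)]) / n
            ll += c * math.log(px * py)
        else:  # x -> y
            py_x = counts[(x, y)] / (counts[(x, 0)] + counts[(x, 1)])
            ll += c * math.log(px * py_x)
    return ll

def structural_em(data, iterations=10):
    p_y_given_x = {0: 0.5, 1: 0.5}  # initial parameter guess
    structure = "indep"             # initial structure hypothesis
    for _ in range(iterations):
        counts = expected_counts(data, p_y_given_x)  # E-step
        # Structure search inside EM: pick the best-scoring candidate.
        structure = max(["indep", "x->y"], key=lambda s: score(counts, s))
        # Re-estimate parameters for the chosen structure.
        p_y_given_x = {x: counts[(x, 1)] / (counts[(x, 0)] + counts[(x, 1)])
                       for x in (0, 1)}
    return structure, p_y_given_x
```

On this strongly correlated toy data the loop settles on the dependent structure and pushes P(Y=1 | X=1) well above P(Y=1 | X=0), illustrating how expected counts let structure and parameters be improved jointly.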
This note was uploaded on 02/12/2010 for the course COMPUTER S 10586 taught by Professor Jilinwang during the Fall '09 term at Zhejiang University.