Unformatted Document Excerpt
Coursehero >>
Iowa >>
Iowa State >>
CS 673
Course Hero has millions of student submitted documents similar to the one
below including study guides, practice problems, reference materials, practice exams, textbook help and tutor support.
Course Hero has millions of student submitted documents similar to the one
below including study guides, practice problems, reference materials, practice exams, textbook help and tutor support.
Bayesian Using Networks to Analyze Expression Data
Nir Friedman
School of Computer Science & Engineering Hebrew University Jerusalem, 91904, ISRAEL nir@cs.huji.ac.il
Michal Linial
Institute of Life Sciences Hebrew University Jerusalem, 91904, ISRAEL michall@leonardo.ls.huji.ac.il
Iftach Nachman
Center for Neural Computation and School of Computer Science & Engineering Hebrew University Jerusalem, 91904, ISRAEL iftach@cs.huji.ac.il
Dana Pe'er
School of Computer Science & Engineering Hebrew University Jerusalem, 91904, ISRAEL danab@cs.huji.ac.il
Abstract DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a snapshot of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes, and since they provide clear methodologies for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we apply this method to the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).
A preliminary version of this work appeared in Proceedings of the Fourth Annual International Conference on Computational Molecular Biology, 2000. This work was supported through the generosity of the Michael Sacher Trust and Israeli Science Foundation equipment grant.
0
1
Introduction
A central goal of molecular biology is to understand the regulation of protein synthesis and its reactions to external and internal signals. All the cells in an organism carry the same genomic data, yet their protein makeup can be drastically different both temporally and spatially, due to regulation. Protein synthesis is regulated by many mechanisms at its different stages. These include mechanisms for controlling transcription initiation, RNA splicing, mRNA transport, translation initiation, post-translational modications, and degradation of mRNA/protein. One of the main junctions at which regulation occurs is mRNA transcription. A major role in this machinery is played by proteins themselves, that bind to regulatory regions along the DNA, greatly affecting the transcription of the genes they regulate. In recent years, technical breakthroughs in spotting hybridization probes and advances in genome sequencing efforts lead to development of DNA microarrays, which consist of many species of probes, either oligonucleotides or cDNA, that are immobilized in a predened organization to a solid phase. By using DNA microarrays researchers are now able to measure the abundance of thousands of mRNA targets simultaneously (DeRisi., Iyer & Brown 1997, Lockhart, Dong, Byrne, Follettie, Gallo, Chee, Mittmann, Want, Kobayashi, Horton & Brown 1996, Wen, Furhmann, Micheals, Carr, Smith, Barker & Somogyi 1998). Unlike classical experiments, where the expression levels of only a few genes were reported, DNA microarray experiments can measure all the genes of an organism, providing a genomic viewpoint on gene expression. As a consequence, this technology facilitates new experimental approaches for understanding gene expression and regulation (Iyer, Eisen, Ross, Schuler, Moore, Lee, Trent, Staudt, Hudson, Boguski, Lashkari, Shalon, Botstein & Brown 1999, Spellman, Sherlock, Zhang, Iyer, Anders, Eisen, Brown, Botstein & Futcher 1998). Early microarray experiments examined few samples, and mainly focused on differential display across tissues or conditions of interest. The design of recent experiments focuses on performing a larger number of microarray assays ranging in size from a dozen to a few hundreds of samples. In the near future, data sets containing thousands of samples will become available. Such experiments collect enormous amounts of data, which clearly reect many aspects of the underlying biological processes. An important challenge is to develop methodologies that are both statistically sound and computationally tractable for analyzing such data sets and inferring biological interactions from them. Most of the analysis tools currently used are based on clustering algorithms. These algorithms attempt to locate groups of genes that have similar expression patterns over a set of experiments (Alon, Barkai, Notterman, Gish, Ybarra, Mack & Levine 1999, Ben-Dor, Shamir & Yakhini 1999, Eisen, Spellman, Brown & Botstein 1998, Michaels, Carr, Askenazi, Fuhrman, Wen & Somogyi 1998, Spellman et al. 1998). Such analysis has proven to be useful in discovering genes that are co-regulated. A more ambitious goal for analysis is revealing the structure of the transcriptional regulation process (Akutsu, Kuhara, Maruyama & Minyano 1998, Chen, Filkov & Skiena 1999, Somogyi, Fuhrman, Askenazi & Wuensche 1996, Weaver, Workman & Stormo 1999). This is clearly a hard problem. The current data is extremely noisy. Moreover, mRNA expression data alone only gives a partial picture that does not reect key events such as translation and protein (in)activation. Finally, the amount of samples, even in the largest experiments in the foreseeable future, does not provide enough information to construct a full detailed 1
Figure 1: An example of a simple Bayesian network structure.
B
D
The network structure also implies that the joint distribution has the product form
C
model with high statistical signicance. In this paper, we introduce a new approach for analyzing gene expression patterns, that uncovers properties of the transcriptional program by examining statistical properties of dependence and conditional independence in the data. We base our approach on the well-studied statistical tool of Bayesian networks (Pearl 1988). These networks represent the dependence structure between multiple interacting quantities (e.g., expression levels of different genes). Our approach, probabilistic in nature, is capable of handling noise and estimating the condence in the different features of the network. We are therefore able to focus on interactions whose signal in the data is strong. Bayesian networks are a promising tool for analyzing gene expression patterns. First, they are particularly useful for describing processes composed of locally interacting components; that is, the value of each component directly depends on the values of a relatively small number of components. Second, statistical foundations for learning Bayesian networks from observations, and computational algorithms to do so are well understood and have been used successfully in many applications. Finally, Bayesian networks provide models of causal inuence: Although Bayesian networks are mathematically dened strictly in terms of probabilities and conditional independence statements, a connection can be made between this characterization and the notion of direct causal inuence. (Heckerman, Meek & Cooper 1997, Pearl & Verma 1991, Spirtes, Glymour & Scheines 1993). The remainder of this paper is organized as follows. In Section 2, we review key concepts of Bayesian networks, learning them from observations, and using them to infer causality. In Section 3, we describe how Bayesian networks can be applied to model interactions among genes and discuss the technical issues that are posed by this type of data. In Section 4, we apply our approach to the gene-expression data of Spellman et al. (Spellman et al. 1998), analyzing the statistical signicance of the results and their biological plausibility. Finally, in Section 5, we conclude with a discussion of related approaches and future work.
2
" #" " " # " 2!10& )('&%$ !
# " 7 3 29B 3 )A@7 3 987 3 26 3 5&0 3 4 " " #" "
" %$
E
A
This network structure implies several conditional independence statements: , , , , and .
2
Bayesian Networks
U )F U F 6)W T S F" II I "H F D !)RPQPP@GE4 F e
2.1 Representing Distributions with Bayesian Networks
Consider a nite set of random variables where each variable may take on a value from the domain Val . In this paper, we use capital letters, such as , for variable names and lowercase letters to denote specic values taken by those variables. Sets of variables are denoted by boldface capital letters , and assignments of values to the variables in these sets are denoted by boldface lowercase letters x y z. We denote to mean is independent of conditioned on . A Bayesian network is a representation of a joint probability distribution. This representation consists of two components. The rst component, , is a directed acyclic graph (DAG) whose vertices correspond to the random variables . The second component, describes a conditional distribution for each variable, given its parents in . Together, these two components specify a unique distribution on . The graph represents conditional independence assumptions that allow the joint distribution to be decomposed, economizing on the number of parameters. The graph encodes the Markov Assumption:
By applying the chain rule of probabilities and properties of conditional independencies, any joint distribution that satises (*) can be decomposed into the product form
is the set of parents of in . Figure 1 shows an example of a graph , lists the where Pa Markov independencies it encodes, and the product form they imply. A graph species a product form as in (1). To fully specify a joint distribution, we also need to specify each of the conditional probabilities in the product form. The second part of the Bayesian network describes these conditional distributions, Pa for each variable . We denote the parameters that specify these distributions by . In specifying these conditional distributions we can choose from several representations. In this paper we focus on two of the most commonly used representations. For the following discussion, . The choice of representation depends suppose that the parents of a variable are on the type of variables we are dealing with: Discrete variables. If each of and takes discrete values from a nite set, then we can represent as a table that species the probability of values for for each joint assignment to . Thus if, for example, all the variables are binary valued, the table will specify distributions. This is a general representation which can describe any discrete conditional distribution. Thus, we do not loose expressiveness by using this representation. This exibility comes at a price: The number of free parameters is exponential in the number of parents. 3
U )F
F
i
U F R6)u
t
U F %)u 3
t
Pa
i
(*) Each variable
is independent of its non-descendants, given its parents in
.
`" X" Y%BF g &h
e
p
d
i
"U F %Ru)W
" "
g" e f" i p
H r bsU U F %)W 3 q Y()RQPPQ@W 3 4 S F"I I I "H F
T w v"I I I "H v x5PPPY!D
S F"I I I "H )Ri PPP@F
d
w v"I I I "H 5PPPYv
S
c" a %b%" g i
w w5v"PPQv I I I "H w v"I I I "H v x5QPPP( F
S F"I I I "H )RPPPQ@F
V
U F
F u 3
C
U F
U u F
i
U
i
V
t
d
(1)
y
Continuous variables. Unlike the case of discrete variables, when the variable and its parents are real valued, there is no representation that can represent all possible densities. A natural choice for multivariate continuous distributions is the use of Gaussian distributions. These can be represented in a Bayesian network by using linear Gaussian conditional densities. In this representation the conditional density of given its parents is given by:
That is, is normally distributed around a mean that depends linearly on the values of its parents. The variance of this normal distribution is independent of the parents' values. If all the variables in a network have linear Gaussian conditional distributions, then the joint distribution is a multivariate Gaussian (Lauritzen & Wermuth 1989).
Hybrid networks. When our network contains a mixture of discrete and continuous variables, we need to consider how to represent a conditional distribution for a continuous variable with discrete parents, and for a discrete variable with continuous parents. In this paper, we disallow the latter case. When a continuous variable has discrete parents, we use conditional Gaussian distributions (Lauritzen & Wermuth 1989) in which for each joint assignment to the discrete parents of , we represent a linear Gaussian distribution of given its continuous parents.
Given a Bayesian network, we might want to answer many types of questions that involve the joint probability (e.g., what is the probability of given observation of some of the other variables?) or independencies in the domain (e.g., are and independent once we observe ?). The literature contains a suite of algorithms that can answer such queries (see, e.g. (Jensen 1996, Pearl 1988)), exploiting the explicit representation of structure in order to answer queries efciently.
2.2 Equivalence Classes of Bayesian Networks
A Bayesian network structure implies a set of independence assumptions in addition to (*). Let Ind be the set of independence statements (of the form is independent of given ) that hold in all distributions satisfying these Markov assumptions. These can be derived as consequences of (*) (see (Pearl 1988)). More than one graph can imply exactly the same set of independencies. For example, consider and both imply the same set graphs over two variables and . The graphs of independencies (i.e., Ind . Two graphs and are equivalent if Ind Ind . That is, both graphs are alternative ways of describing the same set of independencies . This notion of equivalence is crucial, since when we examine observations from a distribution, we cannot distinguish between equivalent graphs. Pearl and Verma (1991) show that we can characterize equivalence classes of graphs using a simple representation. In particular, these results establish that equivalent graphs have the same underlying undirected graph but might disagree on the direction of some of the arcs.
4
j li
g
F
F
4 f
X
i
F
I "U U %6Gbh
X
e
F
X
F
F
U
j ki
w " II I "H !6Yxb%PQPP%
FV
X
d
4
i
F
F
F
ighf 4 X
F W 3
i
i
F
w v"I I I "H 5PPPYv F
i
y y `
Theorem 2.1 (Pearl & Verma 1991) Two DAGs are equivalent if and only if they have the same underlying undirected graph and the same v-structures (i.e. converging directed edges into the same node, such as ). Moreover, an equivalence class of network structures can be uniquely represented by a partially directed graph (PDAG), where a directed edge denotes that all members of the equivalence class contain the arc ; an undirected edge denotes that some members of the class contain the arc , while others contain the arc . Given a DAG , the PDAG representation of its equivalence class can be constructed efciently (Chickering 1995).
2.3 Learning Bayesian Networks
The problem of learning a Bayesian network can be stated as follows. Given a training set x x of independent instances of , nd a network that best matches . More precisely, we search for an equivalence class of networks that best matches . The theory of learning networks from data has been examined extensively over the last decade. We only briey describe the high-level details here. We refer the interested reader to (Heckerman 1998) for a recent tutorial on the subject. The common approach to this problem is to introduce a statistically motivated scoring function that evaluates each network with respect to the training data, and to search for the optimal network according to this score. One method for deriving a score is based on Bayesian considerations (see (Cooper & Herskovits 1992, Heckerman, Geiger & Chickering 1995) for complete description). In this score, we evaluate the posterior probability of a graph given the data: S
is the marginal likelihood which averages the probability of the data over all possible parameter assignments to . The particular choice of priors and for each determines the exact Bayesian score. Under mild assumptions on the prior probabilities, this scoring metric is asymptotically consistent: Given a sufciently large number of samples, graph structures that exactly capture all dependencies in the distribution, will receive, with high probability, a higher score than all other graphs (see for example (Friedman & Yakhini 1996)). This means, that given a sufciently large number of instances, learning procedures can pinpoint the exact network structure up to the correct equivalence class. In this work we use the priors described by Heckerman and Geiger (1995) for hybrid networks of multinomial distributions and conditional Gaussian distributions. (This prior combines earlier works on priors for multinomial networks (Buntine 1991, Cooper & Herskovits 1992, Heckerman et al. 1995) and for Gaussian networks (Geiger & Heckerman 1994).) We refer the reader to (Heckerman & Geiger 1995) and (Heckerman 1998) for details on these priors. For this paper, we note several properties of these priors that will be used in the discussion below. These properties 5
i
i
u i
u 3
i
u u 3 w"
i
3
i
E7 3 4 i
where
is a constant independent of
and
4 q
i
v u w"
i i # 3 }~|z 3 } { { i ~|z ! 3 ~|z } {
i
t 4 rp
F
d
X
X
F X d rqF
C
4 x yi 4 p
i
E7 3
o e pln X
d
X
d m F
d
F
i
T
s
"I I I QPP" H D #
are relevant when we learn from complete data, that is, a data set in which each instance contains the values of all the variables in the network. In this case, the following properties hold. First, the priors are structure equivalent, i.e., if and are equivalent graphs they are guaranteed to have the same posterior score. Second, the score is decomposable. That is, the score can be rewritten as the sum
where the contribution of every variable to the total network score depends only on its own value and the values of its parents in . Finally, these local contributions for each variable can be computed using a closed form equation (again, see (Heckerman & Geiger 1995) for details). Once the prior is specied and the data is given, learning amounts to nding the structure that maximizes the score. This problem is known to be NP-hard (Chickering 1996), thus we resort to heuristic search. The decomposition of the score is crucial for this optimization problem. A local search procedure that changes one arc at each move can efciently evaluate the gains made by adding, removing or reversing a single arc. An example of such a procedure is a greedy hillclimbing algorithm that at each step performs the local change that results in the maximal gain, until it reaches a local maximum. Although this procedure does not necessarily nd a global maximum, it does perform well in practice. Examples of other search methods that advance using one-arc changes include beam-search, stochastic hill-climbing, and simulated annealing.
2.4 Learning Causal Patterns
Recall, a Bayesian network is a model of dependencies between multiple measurements. However, we are also interested in modeling the mechanism that generated these dependencies. Thus, we want to model the ow of causality in the system of interest (e.g., gene transcription). A causal network is a model of such causal processes. Having a causal interpretation facilitates predicting the effect of an intervention in the domain: setting the value of a variable in such a way that the manipulation itself does not affect the other variables. While at rst glance there seems to be no direct connection between probability distributions and causality, causal interpretations for Bayesian Networks have been proposed (Pearl & Verma 1991). A causal network is mathematically represented similarly to a Bayesian network, a DAG where each node represents a random variable along with a local probability model for each node. However, causal networks have a stricter interpretation of the meaning of edges: the parents of a variable are its immediate causes. A causal network models not only the distribution of the observations, but also the effects of causes , then manipulating the value of affects the value of . On the interventions. If other hand, if is a cause of , then manipulating will not affect . Thus, although and are equivalent Bayesian networks, they are not equivalent as causal networks. In our biological domain assume is a transcription factor of . If we knockout gene then this will affect the expression of gene , but a knockout of gene has no effect on the expression of gene . A causal network can be interpreted as a Bayesian network when we are willing to make the Causal Markov Assumption: given the values of a variable's immediate causes, it is independent 6
i
X
d
F
X
F
x
X
t
F
X
X
F
U )F
i
F
X
X
x i
S
ScoreContribution
Pa
" %
U F u)W
j i
"U F 7)W
i
U
4
F
F
X
X pe
F
F
of its earlier causes. When the casual Markov assumption holds, the causal network satises the Markov independencies of the corresponding Bayesian network. For example, this assumption is a natural one in models of genetic pedigrees: once we know the genetic makeup of the individual's parents, the genetic makeup of her ancestors is not informative about her own genetic makeup. The central issue is: When can we learn a causal network from observations? This issue received a thorough treatment in the literature (Heckerman et al. 1997, Pearl & Verma 1991, Spirtes et al. 1993). We briey review the relevant results for our needs here. First we make an important distinction between an observation: a passive measurement of our domain (i.e., a sample from ) and an intervention: setting the values of some variables using forces outside the causal model (e.g., gene knockout or over-expression). It is well known that interventions are an important tool for inferring causality. What is surprising is that some causal relations can be inferred from observations alone. From observations alone, we cannot distinguish between causal networks that specify the same independence assumptions, i.e., belong to the same equivalence class (see section 2.2). When learning an equivalence class (PDAG) from the data, we can conclude that the true causal network is possibly any one of the networks in this class. If a directed arc is in the PDAG, then all the networks in the equivalence class agree that is an immediate cause of . In such a situation we infer the causal direction of the interaction between and . Thus, if we are willing to accept the Causal Markov Assumption and we can learn a PDAG from the data, then we can recover some of the causal directions. Moreover, by using Theorem 2.1, we can predict what aspects of a proposed model can be recovered based on observations alone. The situation is more complex when we have a combination of observations and results of different interventions. From such data we might be able to distinguish between equivalent structures. Cooper and Yoo (1999) show how to develop a Bayesian approach for learning from such mixed data.
3
Analyzing Expression Data
In this section we describe our approach to analyzing gene expression data using Bayesian network learning techniques. First we present our modeling assumptions. We consider probability distributions over all possible states of the system in question (a cell or an organism and its environment). We describe the state of the system using random variables. These random variable denote the expression level of individual genes. In addition, we can include random variables that denote other attributes that affect the system, such as experimental conditions, temporal indicators (i.e., the time/stage that the sample was taken from), background variables (e.g., which clinical procedure was used to get a biopsy sample), and exogenous cellular conditions. We thus attempt to build a model which is a joint distribution over a collection of random variables. If we had such a model, we could answer a wide range of queries about the system. For example, does the expression level of a particular gene depend on the experimental condition? Is this dependence direct, or indirect? If it is indirect, which genes mediate the dependency? Not having a model at hand, we want to learn one from the available data and use it to answer questions about the system.
7
X
X d rF
X
F
F
C
In order to learn such a model from expression data, we need to deal with several important issues that arise when learning in this domain. These involve statistical aspects of interpreting the results, algorithmic complexity issues in learning from the data, and the choice of local probability models. Most of the difculties in learning from expression data revolve around the following central point: Contrary to most previous applications of learning Bayesian networks, expression data involves transcript levels of thousands of genes while current data sets contain at most a few dozen samples. This raises problems in computational complexity and the statistical signicance of the resulting networks. On the positive side, genetic regulation networks are sparse, i.e., given a gene, it is assumed that no more than a few dozen genes directly affect its transcription. Bayesian networks are especially suited for learning in such sparse domains.
3.1 Representing Partial Models
When learning models with many variables, small data sets are not sufciently informative to signicantly determine that a single model is the right one. Instead, many different networks should be considered as reasonable explanations of the given data. From a Bayesian perspective, we say that the posterior probability over models is not dominated by a single model (or equivalence class of models). Our approach is to analyze this set of plausible (i.e., high-scoring) networks. Although this set can be very large, we might attempt to characterize features that are common to most of these networks, and focus on learning them. Before we examine the issue of inferring such features, we briey discuss two classes of features involving pairs of variables. While at this point we handle only pairwise features, it is clear that this type of analysis is not restricted to them, and in the future we are planning on examining more complex features. The rst type of feature is Markov relations: Is in the Markov blanket of ? The Markov blanket of is the minimal set of variables that shield from the rest of the variables in the model. More precisely, given its Markov blanket is independent from the remaining variables in the network. It is easy to check that this relation is symmetric: is in 's Markov blanket if and only if there is either an edge between them, or both are parents of another variable (Pearl 1988). In the context of gene expression analysis, a Markov relation indicates that the two genes are related in some joint biological interaction or process. Note that two variables in a Markov relation are directly linked in the sense that no variable in the model mediates the dependence between them. It remains possible that an unobserved variable (e.g., protein activation) is an intermediate in their interaction. The second type of features is order relations: Is an ancestor of in all the networks of a given equivalence class? That is, does the given PDAG contain a path from to in which all the edges are directed? This type of feature does not involve only a close neighborhood, but rather captures a global property. Recall that under the assumptions of Section 2.4, learning that is an ancestor of would imply that is a cause of . However, these assumptions do not necessarily hold in the context of expression data. Thus, we view such a relation as an indication, rather than evidence, that might be a causal ancestor of .
This observation is not unique to Bayesian network models. It equally well applies to other models that are learned from gene expression data, such as clustering models.
8
F
X
F
F
X
F
X
F
F
X
X
X
F
F
F
H
X
F
3.2 Estimating Statistical Condence in Features
We now face the following problem: To what extent does the data support a given feature? More precisely, we want to estimate a measure of condence in the features of the learned networks, where condence approximates the likelihood that a given feature is actually true (i.e. is based on a genuine correlation and causation). An effective, and relatively simple, approach for estimating condence is the bootstrap method (Efron & Tibshirani 1993). The main idea behind the bootstrap is simple. We generate perturbed versions of our original data set, and learn from them. In this way we collect many networks, all of which are fairly reasonable models of the data. These networks show how small perturbations to the data can affect many of the features. In our context, we use the bootstrap as follows:
the resulting dataset. Re-sample with replacement instances from . Denote by Apply the learning procedure on to induce a network structure .
conf
We refer the reader to (Friedman, Goldszmidt & Wyner 1999) for more details, as well as largescale simulation experiments with this method. These simulation experiments show that features induced with high condence are rarely false positives, even in cases where the data sets are small compared to the system being learned. This bootstrap procedure appears especially robust for the Markov and order features described in section 3.1.
3.3 Efcient Learning Algorithms
In section 2.3, we formulated learning Bayesian network structure as an optimization problem in the space of directed acyclic graphs. The number of such graphs is super-exponential in the number of variables. As we consider hundreds of variables, we must deal with an extremely large search space. Therefore, we need to use (and develop) efcient search algorithms. To facilitate efcient learning, we need to be able to focus the attention of the search procedure on relevant regions of the search space, giving rise to the Sparse Candidate algorithm (Friedman, Nachman & Pe'er 1999). The main idea of this technique is that we can identify a relatively small number of candidate parents for each gene based on simple local statistics (such as correlation). We then restrict our search to networks in which only the candidate parents of a variable can be its parents, resulting in a much smaller search space in which we can hope to nd a good structure quickly. A possible pitfall of this approach is that early choices can result in an overly restricted search space. To avoid this problem, we devised an iterative algorithm that adapts the candidate sets 9
i
i
where
is 1 if
is a feature in
, and 0 otherwise.
6U
i sU H r
4 Y
For each feature
of interest calculate
U
i
U
~
U
For
(in our experiments, we set
4
II I %QPy4
y y
).
during search. At each iteration , for each variable , the algorithm chooses the set of variables which are the most promising candidate parents for . We then search for , a high scoring network in which Pa . (Ideally, we would like to nd the highest scoring network given the constraints, but since we are using a heuristic search, we do not have such a guarantee.) The network found is then used to guide the selection of better candidate sets for the next iteration. We ensure that the score of monotonically improves in each iteration by requiring Pa . The algorithm continues until there is no change in the candidate sets. We briey outline our method for choosing . We assign each variable some score of relevance to , choosing variables with the highest score. The question is then how to measure the relevance of potential parent to . Friedman et al. (1999) examine several measures of relevance. Based on their experiments, one of the most successful measures is simply the improvement in the score of if we add as an additional parent. More precisely, we calculate
This quantity measures how much the inclusion of an edge from to can improve the score associated with . We then choose the new candidate set to contain the previous parent set Pa and the variables that seem to be more informative given this set of parents. We refer the reader to (Friedman, Nachman & Pe'er 1999) for more details on the algorithm and its complexity, as well as empirical results comparing its performance to traditional search techniques.
3.4 Local Probability Models
In order to specify a Bayesian network model, we still need to choose the type of the local probability models we learn. In the current work, we consider two approaches:
Multinomial model. In this model we treat each variable as discrete and learn a multinomial distribution that describes the probability of each possible state of the child variable given the state of its parents.
Linear Gaussian model. In this model we learn a linear regression model for the child variable given its parents.
These models were chosen since their posterior can be efciently calculated in closed form. To apply the multinomial model we need to discretize the gene expression values. We choose to discretize these values into three categories: under-expressed ( ), normal ( ), and over-expressed , depending on whether the rate expression is signicantly lower than, similar to, or greater than control, respectively. The control expression level of a gene can be either determined experimentally (as in the methods of (DeRisi. et al. 1997)), or it can be set as the average expression level of the gene across experiments. We discretize by setting a threshold to the ratio between measured expression and control. In our experiments we choose a threshold value of in logarithmic (base ) scale. Thus, values with ratio to control lower than are considered under-expressed, and values higher than are considered over-expressed. Each of these two models has benets and drawbacks. On one hand, it is clear that by discretizing the measured expression levels we are loosing information. The linear-Gaussian model 10
x
t
I
U )F
F
h
x
t
ScoreContribution
Pa
ScoreContribution
Pa
4 SU #
I %
U W F
F
U )F
" U W F
U # U F S p86)W S i U S #
U )F
5
t
T G U W F D F
F
U )F
F
U )F
U # U F S u)W " U W F
%
t
U )F
U )F
i S T w X"I I I "H X b%PPPD U F u)W ~% t y y
does not suffer from the information loss caused by discretization. On the other hand, the linearGaussian model can only detect dependencies that are close to linear. In particular, it is not likely to discover combinatorial effects (e.g., a gene is over expressed only if several genes are jointly over expressed, but not if at least one of them is not over expressed). The multinomial model is more exible and can capture such dependencies.
4
Application to Cell Cycle Expression Patterns
We applied our approach to the data of Spellman et al. (Spellman et al. 1998). This data set contains 76 gene expression measurements of the mRNA levels of 6177 S. cerevisiae ORFs. These experiments measure six time series under different cell cycle synchronization methods. Spellman et al. (1998) identied 800 genes whose expression varied over the different cell-cycle stages. In learning from this data, we treat each measurement as an independent sample from a distribution, and do not take into account the temporal aspect of the measurement. Since it is clear that the cell cycle process is of temporal nature, we compensate by introducing an additional variable denoting the cell cycle phase. This variable is forced to be a root in all the networks learned. Its presence allows to model dependency of expression levels on the current cell cycle phase. We used the Sparse Candidate algorithm with a 200-fold bootstrap in the learning process. We performed two experiments, one with the discrete multinomial distribution, the other with the linear Gaussian distribution. The learned features show that we can recover intricate structure even from such small data sets. It is important to note that our learning algorithm uses no prior biological knowledge nor constraints. All learned networks and relations are based solely on the information conveyed in the measurements themselves. These results are available at our WWW site: http://www.cs.huji.ac.il/labs/compbio/expression. Figure 2 illustrates the graphical display of some results from this analysis.
4.1 Robustness Analysis
We performed a number of tests to analyze the statistical signicance and robustness of our procedure. Some of these tests were carried on the smaller 250 gene data set for computational reasons. To test the credibility of our condence assessment, we created a random data set by randomly permuting the order of the experiments independently for each gene. Thus for each gene the order was random, but the composition of the series remained unchanged. In such a data set, genes are independent of each other, and thus we do not expect to nd real features. As expected, both order and Markov relations in the random data set have signicantly lower condence. We compare the distribution of condence estimates between the original data set and the randomized set in Figure 3. Clearly, the distribution of condence estimates in the original data set have a longer and heavier tail in the high condence region. In the linear-Gaussian model we see that random data does not generate any feature with condence above 0.3. The multinomial model is
We note that we can learn temporal models using a Bayesian network that includes gene expression values in two (or more) consecutive time points (Friedman, Murphy & Russell 1998). This raises the number of variables in the model. We are currently perusing this issue.
11
more expressive, and thus susceptible to over-tting. For this model, we see a smaller gap between the two distributions. Nonetheless, randomized data does not generate any feature with condence above 0.8, which leads us to believe that most features that are learned in the original data set with such condence are not an artifact of the bootstrap estimation. Since the analysis was not performed on the whole S. cerevisiae genome, we also tested the robustness of our analysis to the addition of more genes, comparing the condence of the learned features between the 800 gene dataset and a smaller 250 gene data set that contains genes appearing in eight major clusters described by Spellman et al.Figure 4 compares feature condence in the analysis of the two datasets for the multinomial model. As we can see, there is a strong correlation between condence levels of the features between the two data sets. The comparison for the linearGaussian model gives similar results. A crucial choice for the multinomial experiment is the threshold level used for discretization of the expression levels. It is clear that by setting a different threshold, we would get different discrete expression patterns. Thus, it is important to test the robustness and sensitivity of the high condence features to the choice of this threshold. This was tested by repeating the experiments using different thresholds. Again, the graphs show a denite linear tendency in the condence estimates of features between the different discretization thresholds. Obviously, this linear correlation gets weaker for larger threshold differences. We also note that order relations are much more robust to changes in the threshold than Markov relations. A valid criticism of our discretization method is that it penalizes genes whose natural range of variation is small: since we use a xed threshold, we would not detect changes in such genes. A possible way to avoid this problem is to normalize the expression of genes in the data. That is, we rescale the expression level of each gene, so that the relative expression level has the same mean and variance for all genes. We note that analysis methods that use Pearson correlation to compare genes, such as (Ben-Dor et al. 1999, Eisen et al. 1998), implicitly perform such a normalization. When we discretize a normalized dataset, we are essentially rescaling the discretization factor differently for each gene, depending on its variance in the data. We tried this approach with several discretization levels, and got results comparable to our original discretization method. The 20 top Markov relations highlighted by this method were a bit different, but interesting and biologically sensible in their own right. The order relations were again more robust to the change of methods and discretization thresholds. A possible reason is that order relations depend on the network structure in a global manner, and thus can remain intact even after many local changes to the structure. The Markov relation, being a local one, is more easily disrupted. Since the graphs learned are extremely sparse, each discretization method highlights different signals in the data, which are reected in the Markov relations learned. A similar picture arises when we compare the results of the multinomial experiment to those of the linear-Gaussian experiment (Figure 5). In this case there is virtually no correlation between the Markov relations found by the two methods, while the order relations show some correlation. This
An undesired effect of such a normalization is the amplication of measurement noise. If a gene has xed expression levels across samples, we expect the variance in measured expression levels to be noise either in the experimental conditions or the measurements . When we normalize the expression levels of genes, we loose the distinction between such noise and true (i.e., signicant) changes in expression levels. In our experiments, we can safely assume this effect will not be too grave, since we only focus on genes that display signicant changes across experiments.
12
Table 1: List of dominant genes in the ordering relations. Included are the top 10 dominant genes for each experiments.
Gene/ORF MCD1 MSH6 CSI2 CLN2 YLR183C RFA2 RSR1 CDC45 RAD53 CDC5 POL30 YOX1 SRO4 CLN1 YBR089W Score in Experiment Multinomial Gaussian 550 525 292 508 444 497 497 454 551 448 456 423 352 395 60 209 376 400 463 324 298 394 383 353 321 291 239 Notes Mitotic Chromosome Determinant,null mutant is inviable Required for mismatch repair in mitosis and meiosis cell wall maintenance, chitin synthesis Role in cell cycle START, null mutant exhibits G1 arrest Contains forkheaded associated domain, thus possibly nuclear Involved in nucleotide excision repair, null mutant is inviable GTP-binding protein of the RAS family involved in bud site selection Required for initiation of chromosomal replication, null mutant lethal Cell cycle control, checkpoint function, null mutant lethal Cell cycle control, required for exit from mitosis, null mutant lethal Required for DNA replication and repair, null mutant is inviable Homeodomain protein Involved in cellular polarization during budding Role in cell cycle START, null mutant exhibits G1 arrest
supports our assumption that the two methods highlight different types of connections between genes. In summary, although many of the results we report below (especially order relations) are stable across the different experiments discussed in the previous paragraph, it is clear that our analysis is sensitive to the choice of local model, and in the case of the multinomial model, to the discretization method. In all the methods we tried, our analysis found interesting relationships in the data. Thus, one challenge is to nd alternative methods that can recover all these relationships in one analysis. We are currently working on learning with semi-parametric density models that would circumvent the need for discretization on one hand, and allow nonlinear dependency relations on the other.
4.2 Biological Analysis
We believe that the results of this analysis can be indicative of biological phenomena in the data. This is conrmed by our ability to predict sensible relations between genes of known function. We now examine several consequences that we have learned from the data. We consider, in turn, the order relations and Markov relations found by our analysis.
13
4.2.1 Order Relations The most striking feature of the high condence order relations, is the existence of dominant genes. Out of all 800 genes only few seem to dominate the order (i.e., appear before many genes). The intuition is that these genes are indicative of potential causal sources of the cell-cycle process. denote the condence in being ancestor of . We dene the dominance score Let of as using the constant for rewarding high condence features and the threshold to discard low condence ones. These dominant genes are extremely robust to parameter selection for both , , the discretization cutoff of section 3.4 and the local probability model used. A list of the highest scoring dominating genes for both experiments appears in table 1. Inspection of the list of dominant genes reveals quite a few interesting features. Among them are genes directly involved in initiation of the cell-cycle and its control. For example, CLN1, CLN2, CDC5 and RAD53 whose functional relation has been established (Cvrckova & Nasmyth 1993, Drebot, Johnston, Friesen & Singer 1993). The genes MCD1, RFA2, CDC45, RAD53, CDC5 and POL30 were found to be essential (Guacci, Koshland & Strunnikov 1997). These are clearly key genes in essential cell functions. Some of them are components of pre-replication complexes(CDC45,POL30). Others (like RFA2,POL30 and MSH6) are involved in DNA repair. It is known that DNA repair is associated with transcription initiation, and DNA areas which are more active in transcription, are also repaired more frequently (McGregor 1999, Tornaletti & Hanawalt 1999). Furthermore, a cell cycle control mechanism causes an abort when the DNA has been improperly replicated (Eisen & Lucchesi 1998). Most of the dominant genes encode nuclear proteins, and some of the unknown genes are also potentially nuclear: (e.g., YLR183C contains a forkhead-associated domain which is found almost entirely among nuclear proteins). A few non nuclear dominant genes are localized in the cytoplasm membrane (SRO4 and RSR1). These are involved in the budding and sporulation process which have an important role in the cell-cycle. RSR1 belongs to the RAS family of proteins, which are known as initiators of signal transduction cascades in the cell. 4.2.2 Markov Relations We begin with an analysis of the Markov relations in the multinomial experiment. Inspection of the top Markov relations reveals that most are functionally related. A list of the top scoring relations can be found in table 2. Among these, all involving two known genes make sense biologically. When one of the ORFs is unknown careful searches using Psi-Blast (Altschul, Thomas, Schaffer, Zhang, Miller & Lipman 1997), Pfam (Sonnhammer, Eddy, Birney, Bateman & Durbin 1998) and Protomap (Yona, Linial & Linial 1998) can reveal rm homologies to proteins functionally related to the other gene in the pair. For example YHR143W, which is paired to the endochitinase CTS1, is related to EGT2 - a cell wall maintenance protein. Several of the unknown pairs are physically adjacent on the chromosome, and thus presumably regulated by the same mechanism (see (Blumenthal 1998)), although special care should be taken for pairs whose chromosomal location overlap on complementary strands, since in these cases we might see an artifact resulting from cross-hybridization. Such an analysis raises the number of biologically sensible pairs to nineteen out of the twenty top relations. There are some interesting Markov relations found that are beyond the limitations of clustering techniques. Among the high condence Markov relations, one can nd examples of conditional in14
X
F
" w %BW# X" F
! 7! ~
X" F )%u#
F
Table 2: List of top Markov relations, multinomial experiment.
Condence 1.0 0.985 0.985 0.98 0.975 0.97 0.94 0.94 0.92 0.91 0.9 0.89 0.88 0.86 0.85 0.85 0.85 Gene 1 YKL163W-PIR3 PRY2 MCD1 PHO11 HHT1 HTB2 YNL057W YHR143W YOR263C YGR086 FAR1 CLN2 YDR033W STE2 HHF1 MET10 CDC9 Gene 2 YKL164C-PIR1 YKR012C MSH6 PHO12 HTB1 HTA1 YNL058C CTS1 YOR264W SIC1 ASH1 SVS1 NCE2 MFA2 HHF2 ECM17 RAD27 Notes Close locality on chromosome Close locality on chromosome Both bind to DNA during mitosis Both nearly identical acid phosphatases Both are Histones Both are Histones Close locality on chromosome Homolog to EGT2 cell wall control, both involved in Cytokinesis Close locality on chromosome Homolog to mammalian nuclear ran protein, both involved in nuclear function Both part of a mating type switch, expression uncorrelated Function of SVS1 unknown Homolog to transmembrame proteins suggest both involved in protein secretion A mating factor and receptor Both are Histones Both are sulte reductases Both participate in Okazaki fragment processing
dependence, i.e., a group of highly correlated genes whose correlation can be explained within our network structure. One such example involves the genes CLN2,RNR3,SVS1,SRO4 and RAD51. Their expression is correlated, and in (Spellman et al. 1998) they all appear in the same cluster. In our network CLN2 is with high condence a parent of each of the other 4 genes, while no links are found between them (see gure 2). This suits biological knowledge: CLN2 is a central and early cell cycle control, while there is no clear biological relationship between the others. Some of the other Markov relations are inter-cluster, pairing genes with low correlation in their expression. One such regulatory link is FAR1-ASH1: both proteins are known to participate in a mating type switch. The correlation of their expression patterns is low and (Spellman et al. 1998) cluster them into different clusters. When looking further down the list for pairs whose Markov relation condence is high relative to their correlation, interesting pairs surface. For example SAG1 and MF-ALPHA-1, a match between the factor that induces the mating process and an essential protein that participates in the mating process. Another match is LAC1 and YNL300W. LAC1 is a GPI transport protein and YNL300W is most likely modied by GPI (based on sequence homology). The Markov relations from the Gaussian experiment are summarized in table 3. Since the Gaussian model focuses on highly correlated genes, most of the high scoring genes are tightly correlated. When we checked the DNA sequence of pairs of physically adjacent genes at the top of Table 3, we found that there is signicant overlap. This suggests that these correlations are spurious 15
Table 3: List of top Markov relations, Gaussian experiment. (The table skips over 5 additional pairs with which close locality.)
Condence 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 ... 0.99 0.99 0.99 0.975 Gene 1 YOR263C CDC46 CDC45 SHM2 MET3 YJL194W-CDC6 YGR151C YGR151C STE2 YDL037C YCL040W-GLK1 HTA1 HHF2 YHR143W ARO9 SRO4 Gene 2 YOR264W YOR066W SPH1 GCV2 ECM17 YJL195C YGR152C YGR152C-RSR1 MFA2 YDL039C WCL042C HTA2 HHT2 CTS1 DIP5 YOL007C Notes Close locality on chromosome YOR066W is totally unknown. No suggestion for immediate link. SHM2 interconverts glycine, GCV2 is regulated by glycine MET3 required to convert sulfate to sulde, ECM17 sulte reductase Close locality on chromosome Close locality on chromosome Close locality on chromosome A mating factor and receptor Both homologs to mucin proteins Close locality on chromosome two physically linked histones both histones Homolog to EGT2 cell wall control, both involved in Cytokinesis DIP5 transports glutamate which regulates ARO9 Both proteins are involved in cell wall regulation at the plasma membrane.
and due to cross hybridization. Thus, we ignore the relations with the highest score. However, in can be discarded as spite of this technical problem, few of the pairs with a condence of biologically false. Some of the relations are robust and also appear in the multinomial experiment (e.g. STE2MFA2, CST1-YHR143W). Most interesting are the genes linked through regulation. These include: SHM2 which converts glycine that regulates GCV2 and DIP5 which transports glutamate which regulates ARO9. Some pairs participate in the same metabolic process, such as: CTS1YHR143 and SRO4-YOL007C all which participate in cell wall regulation. Other interesting high condence ( ) examples are: OLE1-FAA4 linked through fatty acid metabolism, STE2-AGA2 linked through the mating process and KIP3-MSB1, both playing a role in polarity establishment.
5
Discussion and Future Work
In this paper we presented a new approach for analyzing gene expression data that builds on the theory and algorithms for learning Bayesian networks. We described how to apply these techniques to gene expression data. The approach builds on two techniques that were motivated by the
16
I
I
challenges posed by this domain: a novel search algorithm (Friedman, Nachman & Pe'er 1999) and an approach for estimating statistical condence (Friedman, Goldszmidt & Wyner 1999). We applied our methods to real expression data of Spellman et al. (1998). Although, we did not use any prior knowledge, we managed to extract many biologically plausible conclusions from this analysis. Our approach is quite different than the clustering approach used by (Alon et al. 1999, Ben-Dor et al. 1999, Eisen et al. 1998, Michaels et al. 1998, Spellman et al. 1998), in that it attempts to learn a much richer structure from the data. Our methods are capable of discovering causal relationships, interactions between genes other than positive correlation, and ner intra-cluster structure. We are currently developing hybrid approaches that combine our methods with clustering algorithms to learn models over clustered genes. The biological motivation of our approach is similar to work on inducing genetic networks from data (Akutsu et al. 1998, Chen et al. 1999, Somogyi et al. 1996, Weaver et al. 1999). There are two key differences: First, the models we learn have probabilistic semantics. This better ts the stochastic nature of both the biological processes and noisy experiments. Second, our focus is on extracting features that are pronounced in the data, in contrast to current genetic network approaches that attempt to nd a single model that explains the data. We are currently working on improving methods for expression analysis by expanding the framework described in this work. Promising directions for such extensions are: (a) Developing the theory for learning local probability models that are suitable for the type of interactions that appear in expression data; (b) Improving the theory and algorithms for estimating condence levels; (c) Incorporating biological knowledge (such as possible regulatory regions) as prior knowledge to the analysis; (d) Improving our search heuristics; (e) Learning temporal models, such as Dynamic Bayesian Networks (Friedman et al. 1998), from temporal expression data (f) Developing methods that discover hidden variables (e.g protein activation). Finally, one of the most exciting longer term prospects of this line of research is discovering causal patterns from gene expression data. We plan to build on and extend the theory for learning causal relations from data and apply it to gene expression. The theory of causal networks allows learning both from observational data and interventional data, where the experiment intervenes with some causal mechanisms of the observed system. In gene expression context, we can model knockout/over-expressed mutants as such interventions. Thus, we can design methods that deal with mixed forms of data in a principled manner (See (Cooper & Yoo 1999) for a recent work in this direction). In addition, this theory can provide tools for experimental design, that is, understanding which interventions are deemed most informative to determining the causal structure in the underlying system. Acknowledgements The authors are grateful to Gill Bejerano, Hadar Benyaminy, David Engelberg, Moises Goldszmidt, Daphne Koller, Matan Ninio, Itzik Pe'er , and Gavin Sherlock for comments on drafts of this paper and useful discussions relating to this work. We also thank Matan Ninio for help in running and analyzing the robustness experiments.
17
References
Akutsu, S., Kuhara, T., Maruyama, O. & Minyano, S. (1998), Identication of gene regulatory networks by strategic gene disruptions and gene over-expressions, in `Proc. Ninth Annual ACM-SIAM Symposium on Discrete Algorithms', ACM-SIAM. Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D. & Levine, A. J. (1999), `Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays', Proc. Nat. Acad. Sci. USA 96, 67456750. Altschul, S., Thomas, L., Schaffer, A., Zhang, J. Zhang, Z., Miller, W. & Lipman, D. (1997), `Gapped blast and psi-blast: a new generation of protein database search programs', Nucleic Acids Research 25. Ben-Dor, A., Shamir, R. & Yakhini, Z. (1999), `Clustering gene expression patterns', Journal of Computational Biology 6, 281297. Blumenthal, T. (1998), `Gene clusters and polycistronic transcription in eukaryotes', Bioessays pp. 480487. Buntine, W. (1991), Theory renement on Bayesian networks, in `Proceedings of the Seventh Annual Conference on Uncertainty in AI (UAI)', pp. 5260. Chen, T., Filkov, V. & Skiena, S. (1999), Identifying gene regulatory networks from experimental data, in `Proc. 3' rd Annual International Conference on Computational Molecular Biology (RECOMB)'. Chickering, D. M. (1995), A transformational characterization of equivalent Bayesian network structures, in `Proc. Eleventh Conference on Uncertainty in Articial Intelligence (UAI '95)', pp. 8798. Chickering, D. M. (1996), Learning Bayesian networks is NP-complete, in D. Fisher & H.-J. Lenz, eds, `Learning from Data: Articial Intelligence and Statistics V', Springer Verlag. Cooper, G. F. & Herskovits, E. (1992), `A Bayesian method for the induction of probabilistic networks from data', Machine Learning 9, 309347. Cooper, G. & Yoo, C. (1999), Causal discovery from a mixture of experimental and observational data, in `Proc. Fifteenth Conference on Uncertainty in Articial Intelligence (UAI '99)', pp. 116125. Cvrckova, F. & Nasmyth, K. (1993), `Yeast G1 cyclins CLN1 and CLN2 and a GAP-like protein have a role in bud formation', EMBO. J 12, 52775286. DeRisi., J., Iyer, V. & Brown, P. (1997), `Exploring the metabolic and genetic control of gene expression on a genomic scale', Science 282, 699705.
18
Drebot, M. A., Johnston, G. C., Friesen, J. D. & Singer, R. A. (1993), `An impaired RNA polymerase II activity in saccharomyces cerevisiae causes cell-cycle inhibition at START', Mol. Gen. Genet. 241, 327334. Efron, B. & Tibshirani, R. J. (1993), An Introduction to the Bootstrap, Chapman & Hall, London. Eisen, A. & Lucchesi, J. (1998), `Unraveling the role of helicases in transcription', Bioessays 20, 634641. Eisen, M., Spellman, P., Brown, P. & Botstein, D. (1998), `Cluster analysis and display of genomewide expression patterns', Proc. Nat. Acad. Sci. USA 95, 1486314868. Friedman, N., Goldszmidt, M. & Wyner, A. (1999), Data analysis with Bayesian networks: A bootstrap approach, in `Proc. Fifteenth Conference on Uncertainty in Articial Intelligence (UAI '99)', pp. 206215. Friedman, N., Murphy, K. & Russell, S. (1998), Learning the structure of dynamic probabilistic networks, in `Proc. Fourteenth Conference on Uncertainty in Articial Intelligence (UAI '98)', pp. 139147. Friedman, N., Nachman, I. & Pe'er , D. (1999), Learning Bayesian network structure from massive datasets: The sparse candidate algorithm, in `Proc. Fifteenth Conference on Uncertainty in Articial Intelligence (UAI '99)', pp. 196205. Friedman, N. & Yakhini, Z. (1996), On the sample complexity of learning Bayesian networks, in `Proc. Twelfth Conference on Uncertainty in Articial Intelligence (UAI '96)', pp. 274282. Geiger, D. & Heckerman, D. (1994), Learning Gaussian networks, in `Proc. Tenth Conference on Uncertainty in Articial Intelligence (UAI '94)', pp. 235243. Guacci, V., Koshland, D. & Strunnikov, A. (1997), `A direct link between sister chromatid cohesion and chromosome condensation revealed through the analysis of MCD1 in s. cerevisiae', Cell 91(1), 4757. Heckerman, D. (1998), A tutorial on learning with Bayesian networks, in M. I. Jordan, ed., `Learning in Graphical Models', Kluwer, Dordrecht, Netherlands. Heckerman, D. & Geiger, D. (1995), Learning Bayesian networks: a unication for discrete and Gaussian domains, in `Proc. Eleventh Conference on Uncertainty in Articial Intelligence (UAI '95)', pp. 274284. Heckerman, D., Geiger, D. & Chickering, D. M. (1995), `Learning Bayesian networks: The combination of knowledge and statistical data', Machine Learning 20, 197243. Heckerman, D., Meek, C. & Cooper, G. (1997), A Bayesian approach to causal discovery, Technical report. Technical Report MSR-TR-97-05, Microsoft Research.
19
Iyer, V., Eisen, M., Ross, D., Schuler, G., Moore, T., Lee, J., Trent, J., Staudt, L., Hudson, J., Boguski, M., Lashkari, D., Shalon, D., Botstein, D. & Brown, P. (1999), `The transcriptional program in the response of human broblasts to serum', Science 283, 8387. Jensen, F. V. (1996), An introduction to Bayesian Networks, University College London Press, London. Lauritzen, S. L. & Wermuth, N. (1989), `Graphical models for associations between variables, some of which are qualitative and some quantitative', Annals of Statistics 17, 3157. Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Want, C., Kobayashi, M., Horton, H. & Brown, E. L. (1996), `DNA expression monitoring by hybridization of high density oligonucleotide arrays', Nature Biotechnology 14, 16751680. McGregor, W. G. (1999), `DNA repair, DNA replication, and UV mutagenesis', J. Investig. Dermatol. Symp. Proc. 4, 15. Michaels, G., Carr, D., Askenazi, M., Fuhrman, S., Wen, X. & Somogyi, R. (1998), Cluster analysis and data visualization for large scale gene expression data, in `Pac. Symp. Biocomputing', pp. 4253. Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Francisco, Calif. Pearl, J. & Verma, T. S. (1991), A theory of inferred causation, in `Principles of Knowledge Representation and Reasoning: Proc. Second International Conference (KR '91)', pp. 441452. Somogyi, R., Fuhrman, S., Askenazi, M. & Wuensche, A. (1996), The gene expression matrix: Towards the extraction of genetic network architectures, in `The Second World Congress of Nonlinear Analysts (WCNA)'. Sonnhammer, E. L., Eddy, S., Birney, E., Bateman, A. & Durbin, R. (1998), `Pfam: multiple sequence alignments and hmm-proles of protein domains', Nucleic Acids Research 26, 320 322. http://pfam.wustl.edu/. Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D. & Futcher, B. (1998), `Comprehensive identication of cell cycle-regulated genes of the yeast sacccharomyces cerevisiae by microarray hybridization', Molecular Biology of the Cell 9, 32733297. Spirtes, P., Glymour, C. & Scheines, R. (1993), Causation, prediction, and search, SpringerVerlag. Tornaletti, S. & Hanawalt, P. C. (1999), `Effect of DNA lesions on transcription elongation', Biochimie 81, 139146. Weaver, D., Workman, C. & Stormo, G. (1999), Modeling regulatory networks with weight matrices, in `Pac. Symp. Biocomputing', pp. 112123. 20
Wen, X., Furhmann, S., Micheals, G. S., Carr, D. B., Smith, S., Barker, J. L. & Somogyi, R. (1998), `Large-scale temporal gene expression mapping of central nervous system development', Proc. Nat. Acad. Sci. USA 95, 334339. Yona, G., Linial, N. & Linial, M. (1998), `Protomap - automated classication of all protein sequences: a hierarchy of protein families, and local maps of the protein space', Proteins: Structure, Function, and Genetics 37, 360378.
21
Figure 2: An example of the graphical display of Markov features. This graph shows a local map for the gene SVS1. The width (and color) of edges corresponds to the computed condence level. An edge is directed if there is a sufciently high condence in the order between the genes connected by the edge. This local map shows that CLN2 separates SVS1 from several other genes. Although there is a strong connection between CLN2 to all these genes, there are no other edges connecting them. This indicates that, with high condence, these genes are conditionally independent given the expression level of CLN2.
22
Markov Multinomial
250
Order
1400
Features with Confidence above
200 150
1000
100
50
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
400 200 0
Linear-Gaussian
4000
500 400 300 200 100 0
1000
3000 2500 2000 1500 1000 500 0
Features with Confidence above
Features with Confidence above
3500
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Figure 3: Plots of abundance of features with different condence levels for the cell cycle data set (solid line), and the randomized data set (dotted line). The -axis denotes the condence threshold, and the -axis denotes the number of features with condence equal or higher than the corresponding -value. The graphs on the left column show Markov features, and the ones on the right column show Order features. The top row describes features found using the multinomial model, and the bottom row describes features found by the linear-Gaussian model. Inset in each graph is plot of the tail of the distribution.
23
V
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Random Real
14000
800 600 400
12000 10000 8000
200
6000
0 0.2 0.4 0.6 0.8 1
4000 2000 0 0.2 0.4 0.6 0.8 1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Random Real
u u u
u u u
u u u
u u u
u u u
0
600
800
1200
A A A999 A A A AA AA AA A A A A
| | |
a
V
Order relations
1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.2 0.4 0.6 0.8 1 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.2
Markov relations
0.4
0.6
0.8
1
Figure 4: Comparison of condence levels obtained in two datasets differing in the number of genes, on the multinomial experiment. Each relation is shown as a point, with the -coordinate being its condence in the the 250 genes data set and the -coordinate the condence in the 800 genes data set. The left gure shows order relation features, and the right gure shows Markov relation features.
Order relations
1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.2 0.4 0.6 0.8 1
1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.2
Markov relations
0.4
0.6
0.8
Figure 5: Comparison of of condence levels between the multinomial experiment and the linearGaussian experiment. Each relation is shown as a point, with the -coordinate being its condence in the multinomial experiment, and the -coordinate its condence in the linear-Gaussian experiment. The left gure shows order relation features, and the right gure shows Markov relation features. 24
V
V
a
1
a
Find millions of documents on Course Hero - Study Guides, Lecture Notes, Reference Materials, Practice Exams and more.
Course Hero has millions of course specific materials providing students with the best way to expand
their education.
Below is a small sample set of documents:
Iowa State - CS - 573
0Maximum-Likelihood Parameter EstimationIntroductionWe could design an optimal classifier if we knew:P(i) (priors) p(x | i) (class-conditional densities)1Unfortunately, we rarely have this complete information! We have some knowledge and tr
Iowa State - CS - 573
Decision Tree Learning1Outline Decision tree representation Decision tree learning (ID3) Information gain Overfitting Extensions2Example problemProblem: decide whether to wait for a table at a restaurant, based on the following attribut
Iowa State - CS - 573
Evaluation of Classifiers p. 1Evaluating classifiersHow well can a classifier be expected to perform on novel data? Choice of performance measure How close is the estimated performance to the true performance? Comparing Classifiers p. 2Measu
Iowa State - CS - 573
Instance-Based Learning p.1Outlinek-Nearest NeighborLocally weighted regression Lazy and eager learning p.2Instance-Based LearningLearning methods so far: construct a general, explicit hypothesis of the target function over the entire inst
Iowa State - CS - 573
Bayesian networksModeling and Inference p.1A Bayesian NetworkVisit to Asia? (A) Smoker? (S)Tuberculosis? (T)Lung Cancer? (C)Bronchitis? (B) Tuberculosis or Cancer? (P)Positive X-Ray? (X)Dyspnoea? (D) p.2Reasoning with BNsWhat type
Iowa State - CS - 573
MACHINE LEARNING1Why study machine learning AI is the enterprise of design and analysis of intelligent agents. Intelligent behavior requires knowledge (e.g., model of the environment) Explicitly specifying the knowledge needed for specific tas
Iowa State - CS - 573
Bayesian Decision TheoryLearning as Bayesian InferenceFormulate the learning task as a process of probabilistic inference Inference step: determine P(x,t) from data Decision step: for given x, determine optimal t. Bayesian Decision Theory A funda
Iowa State - CS - 573
Ensemble Classifiers p.1Ensemble LearningAn ensemble of classifiers is a set of classifiers whose individual decisions are combined in some way (typically by weighted or unweighted voting) to classify new examples, aka committees Dietterich, Ens
Iowa State - CS - 573
Naive Bayes Classifier1Bayesian recipe for classification The Bayesian recipe is simple, optimal, and in principle, straightforward to apply We could design an optimal classifier if we knew:P(i) (priors) p(x | i) (class-conditional densities)
Iowa State - CS - 573
Information Theory p.1InformationA discrete random variable X , that can take on possible valuesx1 , . . . , xn with distribution P(xi )How much information is conveyed when you are told a specic value? Information content of a message is rel
Iowa State - CS - 573
Review of Probability Theory1Outline Uncertainty Probability basics Random variables Probability Distributions Probabilistic Inference Continuous variables Expected Values Independence Bayes' Rule2Representing and Reasoning under Unc
Iowa State - CS - 573
Bayesian networksSyntax and Semantics p.1Bayesian ClassifiersGenerative models: model P(x1 , . . . , xd , ) E.g., Naive Bayes classifier Naive Bayes assumption Use Bayesian networks to model P(x1 , . . . , xd , ) Systematically use domain knowl
Iowa State - DEC - 0308
Adaptive Rename Space Size in Superscalar Processors Final ReportDec03-08Advisor and ClientDr. Akhilesh TyagiTeam MembersKevin Brandt Anna Hentzel Mark Taylor Ben ThompsonDecember 17, 2003Table of Contents1.INTRODUCTORY MATERIAL..1 1.1.Ex
Iowa State - ENG - 461
SUBROUTINE CMINEX (ANALYZ, X, VLB, VUB, ISC, NDV, NCON, NSIDE, & IPRINT, ITMAX, IER) IMPLICIT REAL*8 (A-H,O-Z)CC*C**C* *C* Murphy Laboratory - Ap
Iowa State - ENG - 461
Iowa State University Department of Aerospace Engineering AerE 461Necessary Conditions for Best Solutions Design Optimimization Techniques Formulation: 1. Unconstrained Problem Given the constants vector, c = (c1 , c2 , c3 , c4 ) = (1,3,2,1) Minimiz
Iowa State - ENG - 524
The general 2D variational grid generation systemThe general 2D variational grid generation system is given by: Ar cc ) Br ch ) Cr hh ) Dr c ) Er h + 0 where A=As+Ao+Aa, B=Bs+Bo+Ba, C=Cs+Co+Ca. These matrices in addition to D and E are given by: *
Iowa State - ENG - 461
Aero 361 Problem 6We will do a trajectory optimization for the orbit insertion of a multistage, rocket powered vehicle. This problem focuses on the optimization process and the coupling of the trajectory analyzer developed in problem 3 with the opti
Iowa State - ENG - 361
Wing Aerodynamics AnalysisWing DefinitionWe will develop a model for a 3D wing geometry with sweep, taper, and twist. The airfoil section used to construct the wing is arbitrary. Consider the wing shown below:Figure 1z y ctip rootr V cho
Iowa State - ENG - 361
Aero 361 Problem 2A spherical mass of .25 slugs is hanging still at sea level 3 feet below a frictionless and rigidly supported bearing. The mass is perturbed from this position by ro tating it through an angle, qo, and then releasing it. We wish to
Iowa State - ENG - 461
Formulation We seek to optimize the functionalJ = J( c , x , u )(1)subject to equality constraintshk ( c , x, u ) = 0and inequality constraintsk = 1,2,3, . . lj = 1,2,3, . . m(2)g j ( c , x, u ) 0where( 3)represents an n-compone
Iowa State - ENG - 461
Preliminary DesignAerospace Engineering FlavorTopics Covered Blooms Cognitive Taxonomy Analysis, Synthesis, Design Conceptual, Preliminary, Detailed Design Aerospace Industry Design Loop Role of Analysis in Design Endeavorsl A A ATaxonomy: Stu
Iowa State - ENG - 461
199419951996D J F M A M J J A S O N D J F MA M J J A S O N D J F M2-D Airfoil Design 2-D Transonic WT S & C Studies Analytical 3-D Aero 3-D Low Speed WT 3-D Transonic WT Simulator Program Performance Preliminary Airloads6 53 1.5 2.5 4.5 1.
Iowa State - ENG - 643
Consider first the mass, M, as the extensive property of some control mass of a substance. There is no dissipation of this property by friction and we will assume no depletion by chemical reactions. Thus, the conservation of mass for the stationary c
Iowa State - ENG - 461
Transcendental Constraint SystemsA transcendental constraint system is one in which an explicit solution for the objective function cannot be obtained. Constraint systems of this type require iterative procedures for solution access. Example #1: Ex
Iowa State - ENG - 461
Iowa State University Department of Aerospace Engineering AerE 461Necessary Conditions for Best Solutions Design Optimimization Techniques Formulation: 1. Unconstrained Problem Given the constants vector, c = (c1 , c2 , c3 , c4 ) = (1,3,2,1) Minimiz
Iowa State - ENG - 461
Aerospace Engineering 461 PresentationTopic: _ Date: _ Group Presenting: _ Group Judging: _ Grading: A score of 1-10 is assessed for each of the following criteria: Clearness of Presentation/Organization - Was listener able to follow along with ease
Iowa State - ENG - 446
The Search for Optimal PreconditioningThis document contains information about accelerating convergence of iterative schemes for solving the compressible fluid equations for cases near the incompressible limit of low Mach numbers. The governing equa
Iowa State - ENG - 241
1AerE 241 Final Exam (Closed Book)Part I, MatchingInsert the letter from the item in the right column into the blank provided in the left column for that item which provides the best match. Mark the letters clearly._1. _2. _3. _4. _5. _6. _7. _
Iowa State - ENG - 461
Formulation We seek to optimize the functionalJ = J( c , x, u )(1 )subject to equality constraintshk ( c , x , u ) = 0and inequality constraints wherek = 1,2,3, . . j = 1,2,3, . . m(2)g j ( c , x, u ) 0x = { xi } u = { uk }( 3)i
Iowa State - ENG - 241
1AerE 241 Final Exam (Open Book)Multiple ChoiceCircle the letter of the best single answer to each question.1.At a point in air, the pressure and temperature are measured to be 1.01(10)5 N/m2 and 300oK respectively. What is the density of the
Iowa State - ENG - 461
Formulation We seek to optimize the functionalJ = J( c , x, u )(1 )subject to equality constraintshk ( c , x , u ) = 0and inequality constraints wherek = 1,2,3, . . j = 1,2,3, . . m(2)g j ( c , x, u ) 0x = { xi } u = { uk }( 3)i
Iowa State - ENG - 461
Lift Distribution & Moment at Root for Elliptic Planform WingA typical elliptic planform wing ideally experiences an elliptic load distribution across its span. The lift force per unit span at a particular spanwise station, y, is given by: = V( y )
Iowa State - ENG - 643
t dV + VV V dA = 0 tV T VdV + VV dA = - p dA + dA + gdVV V V V t edV + VV eV dA = - pV dA + V T dA + V gdV V V VdA gT V U = V e U = (U , U ) tU = UdVVU = tt s-V) s dA
Iowa State - ENG - 643
<html><head><title>Routine Name</title></head><body background="http:/www.eng.iastate.edu/~hindman/gif-files/Backgrounds/tanmarble.gif"><hr><h3 align=center><= Routine Name =></H3><hr><pre>.Put code here.</pre></body></html>
Iowa State - ENG - 643
1618006224006064422460652.542446212.541500000000000000000114641412215885104763975151310512101116461412141174137133313151345121110412558151111
Iowa State - ENG - 643
9248010-1010-100010-1010-101111-1-1-1-11621231671731272732671341781383843781481451891494594891591521961925922961 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Iowa State - ENG - 643
> syms r m Ev> syms 'Ev' 'm' 'r' > U=[r,m,Ev];> U=[r,m,Ev] U = [ r, m, Ev] > syms g > p=(g-1)*(Ev-(1/2)*m^2/r) p = (g-1)*(Ev-1/2*m^2/r) > F=[m;p+m^2/r;(p+Ev)*m/r] F = [
Iowa State - ENG - 461
Barrel Bursting: The maximum combustion chamber pressure must not exceed the critical pressure at which the barrel will burst. The barrel thickness must be defined to assure safety in this area. A factor of safety of 2 is used. The stresses in the ba
Iowa State - ENG - 461
Sphere Buckling: The maximum combustion chamber pressure must not exceed the critical pressure at which the spherical shell will burst. The spherical shell thickness must be defined to assure safety in this area. A factor of safety of 10 is used. The
Iowa State - ENG - 461
theta L thick d2s xdist g(1) g(2) g(3)= 30 400 0.3 1.8 4646.96 -0.008 -2.392 -0.056=Vmuzzle=1772.1 f/stib=0.0202 seconds (time in barrel)NetF=-971.5 lbs (net force at end of barrel)theta=30.000000L=400.00000
Iowa State - ENG - 461
subroutine RK4(zn,t,z,dt)real*8 zn(4),z(4),t,dtreal*8 f1(4),f2(4),f3(4),f4(4)real*8 z1(4),z2(4),z3(4)call RHS(zn,t,f1);z1=zn+0.5*dt*f1;call RHS(z1,t+0.5*dt,f2);z2=zn+0.5*dt*f2;call RHS(z2,t+0.5*dt,f3);z3=zn+dt*f3;call RHS(z3,t+dt,f4);z=zn
Iowa State - ENG - 461
clear all;% unconstrained functionxr=0:.2:3;xa=[xr;xr;xr;xr;xr;xr;xr;xr;xr;xr;xr;xr;xr;xr;xr;xr];ya=xa';J=1+3*(xa-2).^2+(ya-1).^2;f1=figure;view(10,20);surfc(xa,ya,J);f2=figure;contour(xa,ya,J,50);yc=xr.^2;hold on;plot(xr,yc);[f
Iowa State - ENG - 524
program driver implicit real*8 (a-h,o-z) dimension ff(1000),fa(1000),xa(1000) dimension fd(100),x(100),a(100),b(100),c(100),cp(102) pi=3.141592654c-> how many intervals ? NOTE: #points is n+1 n=9c-> what kind o
Iowa State - ENG - 361
Subject:My rocket code.Students wishing to use my rocket code need the following information:Test it with:-program driverreal*8 obj,rocket,xv(12)integer nn=12xv(1)=.xv(2)=...xv(12)=.obj=rocket(xv,n)print*,objend-To test, cr
Iowa State - ENG - 524
subroutine invert(l3,l2,l1,d,u1,u2,u3,r,nl,nu,idim) implicit real*8 (a-h,o-z) real*8 l1,l2,l3integer nl,nu,idim dimension l3(*),l2(*),l1(*),d(*),u1(*),u2(*),u3(*),r(idim,*)c-c-this solver was written by R.G. Hindma
Iowa State - ENG - 461
AIRCRAFT SIZING DESIGN PROBLEM Aer E 461: Design Problem #1, John Doe, 09/23/88 CONSTRAINT BOUNDARIES: (1) W/S = 30 LB/(SQ FT) (2) T/S=8 LB/(SQ FT) (3)
Iowa State - ENG - 361
program mydriverimplicit noneinteger npts,ireal*8 x(100),y(100)character foiltype*(20)print*,'enter foiltype,npts:'read(*,'(a4,i4)')foiltype,nptscall makeafoil(foiltype,npts,x,y)do i=1,npts print*,x(i),y(i)enddoend
Iowa State - ENG - 361
SUBROUTINE DGEFS (A, LDA, N, V, ITASK, IND, WORK, IWORK, RCOND)CC*C*C* *C* Murphy Laboratory - Apollo Network - Math Library *C*
Iowa State - ENG - 361
program analysisimplicit noneinteger .real*8 .*-* stuff for DGEFS(Amat,LMAX,neqns,RHS,1,IND,WORK,IWORK,CN)*-* LMAX >= neqns LMAX is array dimension Amat(LMAX,LMAX)* IND a return error code* WORK a work array used in DGEFS* IWORK a
Iowa State - ENG - 361
program wingimplicit noneinteger .real*8 .*-* construct the wing geometry and MCL geometry and write data* to files in fast format** Input is croot,span,taper,twist,sweep,Nc,Nb*-* get the parameterscall input(.)* get the airfo
Iowa State - ENG - 461
program driverimplicit noneinclude './inc/global.h'real*8 uu(4), uulb(4), uuub(4), ui, uj, nsainteger i, j, k, nvar, ncon, isc(4), ier, w1, w2external analyz! read input datacall input() if (whichway .EQ. 0) then call ou
Iowa State - ENG - 461
c *c *c Sample Main Program to show implementation of the secant subroutinec Main Programc Declare the transcendental function routine (fcn) as externalimplicit real*8 (a-h,o-z) external fcnc Establish the initial marching location (x
Iowa State - ENG - 341
% Semi-colon prevents output of command.% Define a collection of x's & y's.x=0:.05:1;y=0:.05:1;% Define a mesh of points.[xx,yy]=meshgrid(x,y);% Compute the T distribution at each pointzz=some function of xx and yy% Plot surfacesurf(zz);
Iowa State - ENG - 341
Real*4 To,PI,x(21,21),y(21,21),fx(21,21),fy(21,21)Real*4 xx,yy,f(21,21),zInteger j,kTo=0.PI=3.141592654Do k=1,21Do j=1,21 x(j,k)=(j-1)*1.0/(20) xx=x(j,k) y(j,k)=(k-1)*1.0/(20) yy=y(j,k) f(j,k)= your function
Iowa State - ENG - 461
subroutine analyz(uu,ndvv,obj,gcon,nconn)include "./inc/global.h"integer ndvv,nconn,ireal*8 uu(ndvv),gcon(nconn),obj,nsado i=1,ndvvu(i)=ulb(i)+uu(i)*(uub(i)-ulb(i)enddoobj=nsa()do i=1,nconngcon(i)=g(i)enddoreturnend
Iowa State - ENG - 461
function driver()clear all;global u x c ukmin ukmax outputglobal hglobal PI% set the constantsmyinit;% input which control variables & rangesprompt = {'First u Index:','# u values:',. 'Second u Index:','# u values:',. '
Iowa State - ENG - 461
function z = RK4(zn,t)global dtf1=RHS(zn,t);z1=zn+0.5*dt*f1;f2=RHS(z1,t+0.5*dt);z2=zn+0.5*dt*f2;f3=RHS(z2,t+0.5*dt);z3=zn+dt*f3;f4=RHS(z3,t+dt);z=zn+dt*(f1+f4+2*(f2+f3)/6.0;
Iowa State - ENG - 215
Hindman, RichFrom:Longacre, Kenneth D [Kenneth.D.Longacre@USAHQ.UnitedSpaceAlliance.com]Sent:Thursday, September 02, 1999 7:07 PMTo:'Vance, Judy'; 'hindman@iastate.edu'Cc:Brockway, Daniel J; Andrew, Robert L; Longacre, Kenneth DSubject:Recr
Iowa State - ENG - 461
function g=getcons() global u c gg(1)=c(2)+c(1)+c(3)-u(1)-u(2);%slope -1 through center +c(3) in y. gg(2)=u(1)-u(2)+c(4);%slope 1 through y=c(4). g=gg';return;
Iowa State - ENG - 461
function mydriver()clear all;clear global all;global u x c ukmin ukmax% set the constantsmyconstants;% input prompt = {'Input 0|1|2 -> 0=one Nom Soln,1=opt,2=plots:'};title = 'Input For Sample Design Problem';lines= 1;def = {'0'};cell
Iowa State - ENG - 461
function [g geq]=confun(uu) global u c ukmin ukmax u=ukmin+uu.*(ukmax-ukmin); g=getcons; geq=[];return;