Education

A Primer on Learning in Bayesian Networks for Computational Biology

Chris J. Needham*, James R. Bradford, Andrew J. Bulpitt, David R. Westhead

Introduction

Bayesian networks (BNs) provide a neat and compact representation for expressing joint probability distributions (JPDs) and for inference, as illustrated in the short sketch at the end of this introduction. They are becoming increasingly important in the biological sciences for tasks such as inferring cellular networks [1], modelling protein signalling pathways [2], systems biology, data integration [3], classification [4], and genetic data analysis [5]. The representation and use of probability theory makes BNs suitable for combining domain knowledge and data, expressing causal relationships, avoiding overfitting a model to training data, and learning from incomplete datasets. The probabilistic formalism provides a natural treatment for the stochastic nature of biological systems and measurements. This primer aims to introduce BNs to the computational biologist, focusing on the concepts behind methods for learning the parameters and structure of models, at a time when they are becoming the machine learning method of choice.

There are many applications in biology where we wish to classify data; gene function prediction is one example. To solve such problems, a set of rules is required that can be used for prediction, but often such knowledge is unavailable, or in practice there turn out to be so many exceptions to the rules, or so many rules, that this approach produces poor results. Machine learning approaches often do better: a large number of examples (the training set) is used to adapt the parameters of a model, which can then be used to perform predictions or classifications on new data. There are many different types of models and many different approaches to training them, each with its pros and cons. Excellent overviews of the topic can be found in [6] and [7]. Neural networks, for example, are often able to learn a model from training data, but it is often difficult to extract information about the learned model, which with other methods can provide valuable insights into the data or the problem being solved.

A common problem in machine learning is overfitting, where the learned model is too complex and generalises poorly to unseen data. Increasing the size of the training dataset may reduce this; however, doing so assumes that more training data is readily available, which is often not the case. In addition, it is often important to determine the uncertainty in the learned model parameters, or even in the choice of model. This primer focuses on the use of BNs, which offer a solution to these issues. Bayesian probability theory provides mechanisms for describing uncertainty and for adapting the number of parameters to the size of the data, and the graphical representation provides a simple way to visualise the structure of a model. Inspection of models can provide valuable insights into the properties of the data and allow new models to be produced.
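To make the compact-representation idea concrete, the following minimal Python sketch encodes a hypothetical three-variable network (transcription factor activity T, gene expression G, protein detection D; all probability values are invented for illustration, not taken from the article). The chain of conditional probability tables needs only five numbers, rather than the seven required to tabulate the full joint distribution over three binary variables, and inference by enumeration follows directly from the factorisation P(T, G, D) = P(T) P(G | T) P(D | G).

# Minimal sketch of a Bayesian network as a factored joint distribution.
# The three binary variables (TF activity -> gene expression -> protein
# detection) and all probability values are hypothetical, chosen only to
# illustrate the factorisation P(T, G, D) = P(T) P(G | T) P(D | G).

from itertools import product

p_T = {1: 0.3, 0: 0.7}                  # prior P(T): transcription factor active?
p_G_given_T = {1: {1: 0.8, 0: 0.2},     # P(G | T): expression given TF state
               0: {1: 0.1, 0: 0.9}}
p_D_given_G = {1: {1: 0.9, 0: 0.1},     # P(D | G): detection given expression
               0: {1: 0.05, 0: 0.95}}

def joint(t, g, d):
    """Joint probability from the network's factorisation."""
    return p_T[t] * p_G_given_T[t][g] * p_D_given_G[g][d]

# Inference by enumeration: P(T = 1 | D = 1), summing out the hidden G.
numerator = sum(joint(1, g, 1) for g in (0, 1))
evidence = sum(joint(t, g, 1) for t, g in product((0, 1), repeat=2))
print(f"P(TF active | protein detected) = {numerator / evidence:.3f}")

Running the sketch prints P(TF active | protein detected) = 0.699, showing how evidence at the bottom of the chain updates belief about the unobserved variable at the top; exact enumeration like this is feasible only for small networks, which is why the learning and inference methods discussed in this primer matter.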