Introduction to Bioinformatics
Christopher Lee September 24, 2009
This course is for people who may want to invent new kinds of bioinformatics
Bioinformatics is the study of the inherent structure of biological information.
There is inherent structure:
Phylogeny Analysis
Christopher Lee
Phylogeny: Reconstructing Evolutionary History
Goal: infer past history that produced a set of modern characters (sequences, typically).
Ingredients:
Characters: e.g. sequence differences
Evolutionary model
Distance m
Ancestral Reconstruction & Selection Pressure
Christopher Lee December 3, 2009
Evolutionary Trees as Markov Chains
Assume we're given a binary tree as a directed graph G with nodes u, and branch lengths t_uv for each edge u → v. Assume leaf-node u emits observa
HMM Training: Baum-Welch Algorithm
Christopher Lee December 1, 2009
How to model gene evolution?
atggggctcagcgacggggagtggcagcaggtgctgaacgtctgggggaa
atggggctcagtgatggggagtggcagatggtgctgaacatctgggggaa
atggctgatcatgatctggttctgaagtgctggggagccgtggaggccga
atggc
Chapter 3
A Recipe for Inference
3.0.3 Pure Inference
The projection operation is very useful for Bayesian inference when expressed in the following form:
$\sum_{i=0}^{n} \Pr(A|B_i) \Pr(B_i) = \Pr(A)$
This enables us to rewrite Bayes' Law in the form: Pr(H|O) = Pr(O|H)
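The projection form and Bayes' Law can be checked numerically. The following is a minimal sketch, assuming a finite set of mutually exclusive hypotheses B_0..B_n; the function names and the example priors and likelihoods are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def total_probability(likelihoods, priors):
    """Projection: Pr(A) = sum_i Pr(A|B_i) Pr(B_i) over a partition B_0..B_n."""
    return sum(l * p for l, p in zip(likelihoods, priors))

def posterior(likelihoods, priors):
    """Pr(B_i|A) for each i, using Pr(A) from the projection as the normalizer."""
    evidence = total_probability(likelihoods, priors)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Hypothetical numbers: three hypotheses with equal priors.
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

print(total_probability(likelihoods, priors))
print(posterior(likelihoods, priors))
```

Exact fractions are used so that the posteriors can be seen to sum to one without floating-point rounding.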
Chapter 1
What is Inference?
I think the most interesting question in the world is how we think. This is one of the basic questions of life, but how well do we understand it? Our difficulty is not a shortage of ideas, but rather that different fields give disc
C260A Lecture 6: Measuring Evidence for Single Nucleotide Polymorphism
Christopher Lee October 15, 2009
Single Nucleotide Polymorphisms
Every person's genome is unique; on average there is one letter difference per 1000 letters in the DNA four-base code. T
C260A Lecture 5: Probabilistic Modeling
Christopher Lee October 7, 2009
Defining Events vs. Variables
event: a subset of our total probability space S. p(e) = a number.
variable: some slicing of S into disjoint subsets, each labeled with a distinct value. N
C260A Lecture 4: Probabilistic Modeling
Christopher Lee October 6, 2009
Conditional Probability
$p(S|C) = \frac{p(S \cap C)}{p(C)}$
Call S the subject and C the condition variable.
Draw a Venn diagram of the Monty Hall hidden ( ) vs. observed joint probability, in whi
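The conditional probability definition can be applied directly to a Monty Hall joint distribution. This is a sketch under assumptions not stated in the excerpt: the hidden variable is the car's location, the observed variable is the door the host opens, the player has picked door 0, and the host chooses uniformly when two doors are available. All names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def monty_hall_joint(pick=0, doors=(0, 1, 2)):
    """Joint probability p(car, opened) when the player has picked `pick`."""
    joint = defaultdict(Fraction)
    for car in doors:
        # The host may open any door that is neither the pick nor the car.
        options = [d for d in doors if d != pick and d != car]
        for opened in options:
            joint[(car, opened)] += Fraction(1, len(doors)) / len(options)
    return joint

def conditional(joint, car, opened):
    """p(car | opened) = p(car, opened) / p(opened)."""
    p_opened = sum(p for (c, o), p in joint.items() if o == opened)
    return joint[(car, opened)] / p_opened

joint = monty_hall_joint()
print(conditional(joint, car=0, opened=1))  # stay with the original pick
print(conditional(joint, car=2, opened=1))  # switch to the remaining door
```

Each cell of `joint` corresponds to one region of the Venn diagram; conditioning on the opened door just renormalizes the cells in that column.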
C260A Lecture 3: A Recipe for Inference
Christopher Lee October 1, 2009
What's the probability the sun will rise tomorrow?
Pierre-Simon Laplace worked out a clever solution to this problem.
The Binomial Likelihood
Two outcomes: success vs. failure ns ns
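As a rough illustration of the binomial likelihood and of Laplace's answer to the sunrise question, the sketch below assumes the standard binomial form and a uniform prior on the success probability, which gives the result commonly called the rule of succession. Whether the lecture derives it the same way is not shown in this excerpt; the function and parameter names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_likelihood(theta, successes, n):
    """p(successes | n, theta) = C(n, s) * theta^s * (1 - theta)^(n - s)."""
    return comb(n, successes) * theta**successes * (1 - theta) ** (n - successes)

def rule_of_succession(successes, n):
    """Probability that the next trial succeeds, assuming a uniform prior on theta."""
    return Fraction(successes + 1, n + 2)

# One success in two trials with theta = 1/2.
print(binomial_likelihood(Fraction(1, 2), 1, 2))

# The sun has risen on each of 10 observed days (hypothetical count).
print(rule_of_succession(10, 10))
```

Note that the rule never returns exactly 1, however many successes have been observed.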
C260A Lecture 2: Intro to Inference
Christopher Lee September 29, 2009
What is the fundamental difference between math and science?
A Diagnostic Test (T) for a Disease (D)
        T-    T+    total
D+       1     9      10
D-     960    30     990
total  961    39    1000
p(T+|D+) = 9/10; p(T-|D-) =
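The table's counts can be turned into conditional probabilities mechanically. This sketch assumes the table is read with the D+ row holding 1 negative and 9 positive tests (consistent with p(T+|D+) = 9/10) and the D- row holding 960 negative and 30 positive tests; the function names are hypothetical.

```python
from fractions import Fraction

# counts[disease status][test result], read from the table
counts = {
    "D+": {"T-": 1, "T+": 9},
    "D-": {"T-": 960, "T+": 30},
}
total = sum(sum(row.values()) for row in counts.values())

def p_test_given_disease(t, d):
    """p(T|D): fraction of the D row that has test result t."""
    return Fraction(counts[d][t], sum(counts[d].values()))

def p_disease(d):
    """p(D): fraction of all people in the D row."""
    return Fraction(sum(counts[d].values()), total)

def p_disease_given_test(d, t):
    """Bayes' Law: p(D|T) = p(T|D) p(D) / sum over d' of p(T|d') p(d')."""
    evidence = sum(p_test_given_disease(t, dd) * p_disease(dd) for dd in counts)
    return p_test_given_disease(t, d) * p_disease(d) / evidence

print(p_test_given_disease("T+", "D+"))
print(p_disease_given_test("D+", "T+"))
```

The second value is the same as reading 9 out of the 39 positive tests straight from the T+ column, which is a useful cross-check on the Bayes calculation.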
Distance Metrics
Cartesian Distance
$D(x, y) = \sum_i |x_i - y_i|^2$
Manhattan Distance
$D(x, y) = \sum_i |x_i - y_i|$
Triangle inequality: $D_{ab} + D_{bc} \ge D_{ac}$
Additive distances:
$D(x, y) = \sum_{x \to y} d_{ij}$
Clock-like: D(x,A)=D(y,A) for all x,y descended from common ancestor A
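The metrics above can be sketched in a few lines, along with a brute-force check of the triangle inequality. One assumption to flag: the Cartesian distance here takes the square root of the summed squared differences (the usual Euclidean form), whereas the formula as extracted shows only the sum of squares. The function names, sample points, and tolerance are hypothetical.

```python
import math
from itertools import permutations

def cartesian(x, y):
    """Euclidean distance; the square root is an assumption (see note above)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def satisfies_triangle_inequality(dist, points, tol=1e-12):
    """Check D_ab + D_bc >= D_ac for every ordered triple of distinct points."""
    return all(
        dist(a, b) + dist(b, c) + tol >= dist(a, c)
        for a, b, c in permutations(points, 3)
    )

points = [(0, 0), (3, 4), (6, 0)]
print(cartesian(points[0], points[1]))
print(manhattan(points[0], points[1]))
print(satisfies_triangle_inequality(cartesian, points))
print(satisfies_triangle_inequality(manhattan, points))
```

The small tolerance only guards against floating-point rounding in the square root; it is not part of the definition.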