# MicroRNA: The Computational Challenge

MicroRNA: The Computational Challenge. Bioinformatics Seminar, March 9, 2005. By Yaron Levy.

Tree of RNA Types
miRNA Biological Process

MicroRNA: Computational Approach

- Problem 1: Finding putative microRNA from a sequence. Horesh et al., using the suffix tree data structure.
- Problem 2: Computing the secondary structure of a given sequence. Zuker & Stiegler, minimum free energy, using dynamic programming.
- Problem 3: miRNA prediction algorithms. Lim et al., MiRscan.
- Problem 4: Predicting miRNA target genes. Lewis et al., TargetScan.
Problem 1: Find these [figure not preserved]

Problem 1: Finding putative microRNA from a sequence

- A naïve idea: slide a "window" of size L over the sequence of size N, looking for stems of size S.
- Computational cost: O(NL + NS), which is too much.
- A better approach: use a suffix tree.
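The naïve window scan can be sketched as follows. This is a minimal illustration, not the method from the slides: it assumes an RNA alphabet (A, C, G, U), strict Watson-Crick pairing, and that a "stem" means a substring whose exact reverse complement appears later inside the same window. The function name `naive_stem_candidates` and its parameters are hypothetical.

```python
# Complement table for strict Watson-Crick pairs (assumption: no G-U wobble pairs).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def naive_stem_candidates(seq: str, window: int, stem: int) -> list:
    """Return sorted (i, j) pairs of 0-based start positions such that
    seq[i:i+stem] pairs perfectly with seq[j:j+stem], the two arms do not
    overlap, and both arms lie inside one window of the given size."""
    hits = set()
    n = len(seq)
    for start in range(0, max(1, n - window + 1)):
        end = min(n, start + window)
        # The 5' arm must leave room for a non-overlapping 3' arm.
        for i in range(start, end - 2 * stem + 1):
            partner = reverse_complement(seq[i:i + stem])
            j = seq.find(partner, i + stem, end)
            while j != -1:
                hits.add((i, j))
                j = seq.find(partner, j + 1, end)
    return sorted(hits)


# A toy hairpin: 7-nt arm, 4-nt loop, 7-nt complementary arm.
print(naive_stem_candidates("GGGAAACUUUUGUUUCCC", window=18, stem=7))  # [(0, 11)]
```

Every window is rescanned from scratch, so the same arm is compared many times over. That repeated work is what motivates indexing the sequence once with a suffix tree.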
What is a suffix tree?

Example: S = MALAYALAM\$, with positions 1 to 10.

[Figure: suffix tree of S. Each leaf is labelled with the start position of the suffix it represents.]

Suffix tree properties

- For a string S of length n, there are n leaves and at most n internal nodes, so the tree requires only linear space.
- Each leaf represents a unique suffix.
- The concatenation of edge labels from the root to a leaf spells out that suffix.
- Each internal node represents a distinct common prefix of at least two suffixes.
Finding a (short) Pattern in a (long) String

1. Build a suffix tree of the string.
2. Starting from the root, traverse a path matching the characters of the pattern.
3. If stuck, the pattern is not present in the string. Otherwise, each leaf below gives a position of the pattern in the string.

Finding a Pattern in a String

Find "ALA" in S = MALAYALAM\$.

[Figure: suffix tree of S, with leaves labelled by suffix start position.]

Two matches, at positions 6 and 2.
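The search procedure can be tried out with a small sketch. For brevity it builds an uncompressed suffix trie (one character per edge, quadratic space) rather than a true linear-space suffix tree; the traversal logic is the same. The function names are hypothetical, and positions are 1-based to match the slides.

```python
def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of text into a nested-dict trie.
    The key None marks a leaf and stores the suffix's 1-based start position.
    The text should end with a unique terminator such as '$'."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        node[None] = start + 1
    return root


def leaves_below(node: dict) -> list:
    """Collect the start positions stored at all leaves under node."""
    positions = []
    stack = [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is None:
                positions.append(child)
            else:
                stack.append(child)
    return positions


def find_pattern(trie: dict, pattern: str) -> list:
    """Walk down from the root along the pattern; if stuck, there is no match.
    Otherwise every leaf below the reached node is an occurrence."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    return sorted(leaves_below(node))


trie = build_suffix_trie("MALAYALAM$")
print(find_pattern(trie, "ALA"))  # [2, 6]
```

Once the structure is built, the walk itself takes time proportional to the pattern length plus the number of leaves visited, independent of where in the string the matches lie.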
Generalized Suffix Tree

Strings: WINDOW\$ (string 1) and INDIGO\$ (string 2), each with positions 1 to 7.

[Figure: generalized suffix tree of the two strings. Each leaf is labelled (i, j), meaning the suffix of string i that starts at position j.]
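To illustrate what a generalized suffix structure offers, the sketch below indexes both example strings in one uncompressed trie (again a simplification of a real suffix tree) and records which strings pass through each node. The deepest node shared by both strings spells out their longest common substring. All names here are hypothetical, and leaves use the same (string, position) labelling as the figure.

```python
def new_node() -> dict:
    return {"children": {}, "owners": set(), "leaves": []}


def build_generalized_trie(strings: list) -> dict:
    """Insert all suffixes of all strings into one trie.
    owners: which strings (numbered from 1) reach this node.
    leaves: (string number, 1-based start position) of suffixes ending here."""
    root = new_node()
    for idx, s in enumerate(strings, start=1):
        for start in range(len(s)):
            node = root
            for ch in s[start:]:
                node = node["children"].setdefault(ch, new_node())
                node["owners"].add(idx)
            node["leaves"].append((idx, start + 1))
    return root


def longest_common_substring(root: dict, n_strings: int) -> str:
    """Return the label of the deepest node reached by every string,
    ignoring the '$' terminator."""
    best = ""
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node["children"].items():
            if ch != "$" and len(child["owners"]) == n_strings:
                label = prefix + ch
                if len(label) > len(best):
                    best = label
                stack.append((child, label))
    return best


tree = build_generalized_trie(["WINDOW$", "INDIGO$"])
print(longest_common_substring(tree, 2))   # IND
print(tree["children"]["$"]["leaves"])     # [(1, 7), (2, 7)]
```

The same idea, a node annotated with which input sequences share it, is what makes a generalized suffix tree suitable for finding repeats between a sequence and a transformed copy of itself.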

Horesh et al.: using a generalized suffix tree for finding putative microRNAs

Assumptions:

- At least a triple repeat is necessary:
  - Two repeats form the stems of the hairpin. They are close to each other in the sequence and are inverted repeats of each other.
  - The rest are target genes, which can be anywhere.
- The repeats must be fully matched; no mismatches are allowed. This is more of a constraint than an assumption.
- Construct a generalized suffix tree of the original sequence and the inverted repeat sequence.
- Preprocess the suffix tree to calculate, for each node:
  - Length of suffixes
  - Number of repeats
  - Index of suffix in the sequence
- With these attributes for each node, along with the indices of the suffixes in the sequence, it is possible to find the requested triple (or more) repeats.
- Computationally efficient: O(N).
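The triple-repeat criterion can be made concrete with a rough sketch. This is not the algorithm of Horesh et al.: it swaps the generalized suffix tree for a plain hash index of fixed-length words, so it is neither linear-time nor able to handle variable repeat lengths. It also assumes that "inverted repeat" means exact reverse complement under Watson-Crick pairing. The function names and the parameters `k` (repeat length) and `max_gap` (largest allowed loop) are hypothetical.

```python
from collections import defaultdict

# Strict Watson-Crick complement (assumption: no G-U wobble pairs).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    return rna.translate(COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int) -> dict:
    """Map every length-k word to the list of its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def triple_repeats(seq: str, k: int, max_gap: int) -> list:
    """Return sorted (arm5, arm3, others) tuples where seq[arm5:arm5+k] and
    seq[arm3:arm3+k] are exact inverted repeats at most max_gap apart
    (a hairpin candidate) and others lists further non-overlapping
    occurrences of either arm (candidate targets)."""
    index = kmer_index(seq, k)
    results = []
    for word, starts in index.items():
        # .get avoids adding keys to the defaultdict while iterating over it.
        partner_starts = index.get(reverse_complement(word), [])
        for i in starts:
            for j in partner_starts:
                if i + k <= j <= i + k + max_gap:
                    others = sorted(
                        p for p in set(starts) | set(partner_starts)
                        if abs(p - i) >= k and abs(p - j) >= k
                    )
                    if others:  # at least a third repeat is required
                        results.append((i, j, others))
    return sorted(results)


hairpin = "GGGAAAC" + "UUUU" + "GUUUCCC"
print(triple_repeats(hairpin + "AAAAA" + "GGGAAAC", k=7, max_gap=4))
# [(0, 11, [23])]
```

A hairpin with no third occurrence of either arm is rejected, which mirrors the "at least a triple repeat" assumption. The exact-match requirement is also visible here: a single mismatch in either arm changes the word and the lookup fails.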
