cancerdetection - Data Mining Algorithms for Cancer...

1 Data Mining Algorithms for Cancer Detection
Nirmalya Bandhopadhay, Jun Liu, Sanjay Ranka, Tamer Kahveci
http://www.cise.ufl.edu
2 Outline
• Cancer datasets are growing: CGH, microarray, microarray time course
• Datasets are high dimensional: 1,000 to 20,000 dimensions
• Maximum Influence Feature Selection
• Biological Pathway Feature Selection
• Cancer Progression Modeling
3 Gene copy number
• The number of copies of genes can vary from person to person.
– ~0.4% of the gene copy numbers are different for pairs of people.
• Variations in copy numbers can alter resistance to disease.
– EGFR copy number can be higher than normal in non-small cell lung cancer.
[Figure: healthy and cancer lung images (ALA)]
4 Raw and smoothed CGH data
5 Example CGH dataset
• 862 genomic intervals in the Progenetix database
6 Problem description
• Given a new sample, which class does this sample belong to?
• Which features should we use to make this decision?
7 Classification with SVM
• Consider a two-class, linearly separable classification problem
• Many decision boundaries are possible!
• The decision boundary should be as far away from the data of both classes as possible
– We should maximize the margin, m
[Figure: Class 1 and Class 2 separated by a decision boundary with margin m]
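
The idea of preferring the boundary that stays farthest from both classes can be sketched in a few lines of Python. This is only an illustration: the four points, the two candidate boundaries, and the helper name `geometric_margin` are assumptions and do not come from the slides. The function returns the distance from the boundary to the closest point; the slide's m may denote the full width between the classes instead.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane w.x + b = 0.

    A positive result means every point lies on the correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical, linearly separable toy data (not from the slides).
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate boundaries that both separate the classes.
margin_a = geometric_margin((1.0, 1.0), -2.0, points, labels)  # x1 + x2 = 2
margin_b = geometric_margin((1.0, 0.0), -1.0, points, labels)  # x1 = 1

print(margin_a, margin_b)
```

Both boundaries classify every point correctly, but the first leaves more room to the nearest point, so a maximum-margin method would prefer it.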
8 SVM Formulation
• Let {x_1, ..., x_n} be our data set and let y_i ∈ {1, -1} be the class label of x_i
• Maximize J over α_i, where the inner product x_i^T x_j measures the similarity between x_i and x_j
• The decision boundary can be constructed from the resulting α_i
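
The formulas on this slide did not survive extraction. For orientation, the standard dual form of the linearly separable SVM is written out here; this is the textbook formulation and is assumed, not confirmed, to match what the slide showed. The intercept b and the index k of a support vector are notation introduced for this sketch.

```latex
% Dual objective: maximize J over the multipliers \alpha_i
\[
J(\alpha) = \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, x_i^{T} x_j
\]
% Constraints on the multipliers
\[
\alpha_i \ge 0 \quad (i = 1, \dots, n),
\qquad
\sum_{i=1}^{n} \alpha_i y_i = 0
\]
% Decision boundary w^T x + b = 0, with w built from the multipliers
\[
w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
b = y_k - w^{T} x_k \ \text{ for any } k \text{ with } \alpha_k > 0
\]
```

The data enter J only through the inner products x_i^T x_j, which is why a different similarity measure can later be swapped in.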
9 Pairwise similarity measures
• Raw measure
– Count the number of genomic intervals at which both samples have a gain (or both have a loss).
[Figure: two example samples with Raw = 3]
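
A minimal sketch of the Raw measure, assuming each sample is a list of per-interval statuses coded as 1 (gain), 0 (no change), and -1 (loss). That coding appears later in the slides; the function name `raw_similarity` and the two samples are hypothetical and are not the ones in the slide's figure.

```python
def raw_similarity(x, y):
    """Count intervals where both samples show a gain or both show a loss.

    Intervals where both samples are unchanged (0) do not count.
    """
    if len(x) != len(y):
        raise ValueError("samples must cover the same genomic intervals")
    return sum(1 for a, b in zip(x, y) if a == b and a != 0)

# Hypothetical samples over six genomic intervals.
sample_1 = [1, 1, 0, -1, 0, 1]
sample_2 = [1, 0, 0, -1, -1, 1]

score = raw_similarity(sample_1, sample_2)
print(score)
```

The samples agree on a gain at intervals 1 and 6 and on a loss at interval 4, so the score is 3. The shared no-change status at interval 3 contributes nothing.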
10 SVM based on Raw kernel
• Using SVM with the Raw kernel amounts to solving the following quadratic program: maximize J over α_i, using the Raw kernel in place of the inner product x_i^T x_j
• The resulting decision function likewise uses the Raw kernel in place of the inner product
• Is this cool?
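
The quadratic program and decision function were images and are missing from this transcript. Under the assumption that the slide applies the usual kernel substitution to the standard SVM dual, they would read as follows. The test sample z and the intercept b are notation introduced for this sketch.

```latex
% Dual objective with the inner product replaced by the Raw kernel
\[
J(\alpha) = \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, \mathrm{Raw}(x_i, x_j),
\qquad
\alpha_i \ge 0, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
\]
% Decision function for a new sample z
\[
f(z) = \mathrm{sign}\!\left( \sum_{i=1}^{n} \alpha_i y_i \, \mathrm{Raw}(x_i, z) + b \right)
\]
```

Whether this substitution is legitimate depends on Raw being a valid kernel, which is the question the following slides address.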
11 Is Raw kernel valid?
• Not all similarity functions can serve as kernels.
• A valid kernel requires the underlying kernel matrix M to be positive semi-definite.
• M is positive semi-definite if for all vectors v, v^T M v ≥ 0
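
To see why the condition matters, here is a small sketch that evaluates the quadratic form v^T M v for a symmetric similarity matrix that is not positive semi-definite. The matrix, the vector, and the helper name `quadratic_form` are hypothetical examples, not taken from the slides.

```python
def quadratic_form(matrix, v):
    """Compute v^T M v for a square matrix given as a list of rows."""
    n = len(v)
    return sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))

# A symmetric "similarity" matrix where cross-similarity exceeds
# self-similarity (hypothetical values).
similarity = [[1, 2],
              [2, 1]]

# One vector with a negative quadratic form is enough to rule out
# positive semi-definiteness.
witness = [1, -1]
value = quadratic_form(similarity, witness)
print(value)
```

Here the quadratic form evaluates to -2, so this matrix could not be the kernel matrix of any valid kernel. Note that checking a handful of vectors can only disprove the property; proving it requires an argument that covers all v.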
12 Is Raw kernel valid?
• Proof: define a function Φ : a ∈ {1, 0, -1}^m → b ∈ {1, 0}^{2m}, where
Φ(gain) = Φ(1) = 01
Φ(no-change) = Φ(0) = 00
Φ(loss) = Φ(-1) = 10
– Raw(X, Y) = Φ(X)^T Φ(Y)
• Example (X and Y agree on a gain at position 2 and on a loss at position 6):
X = 0 1 1 0 1 -1
Y = 0 1 0 -1 -1 -1
Raw(X, Y) = 2
Φ(X) = 0 0 0 1 0 1 0 0 0 1 1 0
Φ(Y) = 0 0 0 1 0 0 1 0 1 0 1 0
Φ(X)^T Φ(Y) = 2
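
The mapping and the worked example above can be checked with a short Python sketch. The encoding table and the vectors X and Y are the ones on the slide; the function names `phi`, `raw`, and `dot` are chosen here for illustration. The last check runs over all length-3 status vectors, which is a sanity test and not a replacement for the proof.

```python
from itertools import product

# Two-bit encoding from the slide: gain -> 01, no change -> 00, loss -> 10.
ENCODING = {1: (0, 1), 0: (0, 0), -1: (1, 0)}

def phi(sample):
    """Map a length-m status vector to its length-2m binary encoding."""
    return [bit for status in sample for bit in ENCODING[status]]

def raw(x, y):
    """Count positions where both samples gain or both lose."""
    return sum(1 for a, b in zip(x, y) if a == b and a != 0)

def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

X = [0, 1, 1, 0, 1, -1]
Y = [0, 1, 0, -1, -1, -1]

print(phi(X))
print(phi(Y))
print(raw(X, Y), dot(phi(X), phi(Y)))

# Sanity check over every pair of length-3 status vectors.
statuses = (-1, 0, 1)
all_agree = all(
    raw(a, b) == dot(phi(a), phi(b))
    for a in product(statuses, repeat=3)
    for b in product(statuses, repeat=3)
)
print(all_agree)
```

The identity holds position by position: the two-bit codes of two statuses have inner product 1 only when both are gains or both are losses, and 0 otherwise.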
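13 Raw Kernel is valid!
• Raw kernel can be written as Raw(X, Y) = Φ(X)^T Φ(Y)
• Let M denote the kernel matrix of Raw
• Define a 2m by n matrix whose columns are Φ(X_1), ..., Φ(X_n)
• Therefore, M is the product of that matrix's transpose with the matrix itself, so v^T M v ≥ 0 for all vectors v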
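
The slide's own formulas are missing from this transcript, so the argument is written out here as a short derivation. The matrix name A and the sample names X_1, ..., X_n are notation assumed for this sketch and may differ from the symbols used on the slide.

```latex
% 2m-by-n matrix whose columns are the encoded samples
\[
A = \bigl[\, \Phi(X_1) \;\; \Phi(X_2) \;\; \cdots \;\; \Phi(X_n) \,\bigr]
\]
% Each kernel-matrix entry is an inner product of two columns of A
\[
M_{ij} = \mathrm{Raw}(X_i, X_j) = \Phi(X_i)^{T} \Phi(X_j)
\quad\Longrightarrow\quad
M = A^{T} A
\]
% The quadratic form is a squared norm, hence non-negative
\[
v^{T} M v = v^{T} A^{T} A \, v = (A v)^{T} (A v) = \lVert A v \rVert^{2} \ge 0
\quad \text{for all } v
\]
```

Since the quadratic form is a squared length, it can never be negative, which is exactly the positive semi-definite condition.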
14 MIFS for multi-class data
• One-versus-all SVM
• Ranks of features: [2, 31, 1], [12, 4, 3], [5, 15, 8]
• Sort ranks of features: [1, 2, 31], [3, 4, 12], [5, 8, 15]
[Figure labels: Feature 2, Feature 3, Feature 4]

This note was uploaded on 11/13/2011 for the course CIS 4930 taught by Professor Staff during the Spring '08 term at University of Florida.
