Lecture-biosurveillance by Andrew Moore and colleagues


Rapid Detection of Significant Spatial Clusters
Daniel B. Neill, Andrew W. Moore
The Auton Lab, Carnegie Mellon University School of Computer Science
E-mail: {neill, awm}@cs.cmu.edu

Introduction

Goals of data mining:
- Discover patterns in data.
- Distinguish patterns that are significant from those that are likely to have occurred by chance.

For example:
- In epidemiology, a rise in the number of disease cases in a region may or may not be indicative of an emerging epidemic.
- In brain imaging, an increase in measured fMRI activation may or may not represent a real increase in brain activity.

This is why significance testing is important!

Problem overview

Assume data has been aggregated to an N x N grid. Each grid cell s_ij has a count c_ij and a population p_ij. Our goal is to find overdensities: spatial regions where the counts are significantly higher than expected, given the underlying population.

Example grid (P = underlying population of cell, C = count of cell; cells in the order extracted from the slide, five per line):

P=5000 C=27 | P=3500 C=14 | P=4500 C=22 | P=3000 C=15 | P=1000 C=5
P=5000 C=26 | P=4000 C=17 | P=3000 C=12 | P=2000 C=12 | P=1000 C=4
P=5000 C=19 | P=5008 C=25 | P=4000 C=43 | P=3000 C=37 | P=4000 C=20
P=4800 C=18 | P=4800 C=20 | P=4000 C=40 | P=3000 C=22 | P=4000 C=16
P=4700 C=20 | P=3000 C=13 | P=3000 C=18 | P=2000 C=20 | P=1000 C=4

Slide annotation: "This region has an overdensity of counts."

Application domains

In epidemiology:
- Counts c_ij represent number of disease cases in a region, or some related observable quantity (Emergency Department visits, sales of OTC medications).
- Populations p_ij can be obtained from census data or historical counts (e.g. past OTC sales).

In brain imaging:
- Counts c_ij represent fMRI activation in a given voxel.
- Populations p_ij represent baseline activation under null condition.

Also applicable to other domains, e.g. astrophysics, surveillance.
In epidemiology, the goal is to find clusters of disease cases, allowing early detection of epidemics.
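
As a rough illustration of the overdensity idea from the problem overview, the sketch below compares a region's observed count with the count expected if cases were spread in proportion to population. It is only a sketch under stated assumptions: the 5 x 5 row-major layout of the example grid is assumed from the order in which the cells appear in the extracted text, the names `GRID`, `region_totals`, and `relative_rate` are hypothetical, and the plain count-to-expected ratio is not the significance test the lecture goes on to develop.

```python
# Example cells from the slide as (population, count) pairs.
# ASSUMPTION: the 25 cells form a 5 x 5 grid in row-major order.
GRID = [
    [(5000, 27), (3500, 14), (4500, 22), (3000, 15), (1000, 5)],
    [(5000, 26), (4000, 17), (3000, 12), (2000, 12), (1000, 4)],
    [(5000, 19), (5008, 25), (4000, 43), (3000, 37), (4000, 20)],
    [(4800, 18), (4800, 20), (4000, 40), (3000, 22), (4000, 16)],
    [(4700, 20), (3000, 13), (3000, 18), (2000, 20), (1000, 4)],
]


def region_totals(grid, r0, r1, c0, c1):
    """Sum population and count over rows r0..r1 and columns c0..c1 (inclusive)."""
    pop = 0
    cnt = 0
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            p, k = grid[r][c]
            pop += p
            cnt += k
    return pop, cnt


def relative_rate(grid, r0, r1, c0, c1):
    """Observed count in the region divided by the count expected
    if the overall rate (total count / total population) applied everywhere."""
    total_pop, total_cnt = region_totals(grid, 0, len(grid) - 1, 0, len(grid[0]) - 1)
    pop, cnt = region_totals(grid, r0, r1, c0, c1)
    expected = total_cnt * pop / total_pop
    return cnt / expected


if __name__ == "__main__":
    # 2 x 2 block containing the cells with counts 43, 37, 40 and 22.
    print(region_totals(GRID, 2, 3, 2, 3))
    print(round(relative_rate(GRID, 2, 3, 2, 3), 2))
```

Under the assumed layout, the 2 x 2 block has 142 cases against roughly 78 expected, a ratio of about 1.8. A high ratio alone does not show the cluster is significant, since small regions can reach high ratios by chance, which is the motivation for significance testing given in the introduction.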

This note was uploaded on 10/06/2011 for the course CS 2434 taught by Professor Shasha during the Spring '11 term at NYU.
