lect11

# lect11 - Lecture outline Nearest-neighbor search in low...


Lecture outline

- Nearest-neighbor search in low dimensions
  - kd-trees
- Nearest-neighbor search in high dimensions
  - LSH
- Applications to data mining

Definition

- Given: a set $X$ of $n$ points in $\mathbb{R}^d$
- Nearest neighbor: for any query point $q \in \mathbb{R}^d$, return the point $x \in X$ minimizing $D(x, q)$
- Intuition: find the point in $X$ that is closest to $q$

Motivation

- Learning: nearest-neighbor rule
- Databases: retrieval
- Data mining: clustering
- Donald Knuth, in vol. 3 of The Art of Computer Programming, called it the post-office problem, referring to the application of assigning a resident to the nearest post office

Nearest-neighbor rule

[Figure: MNIST dataset, digit “2”]

Methods for computing NN

- Linear scan: $O(nd)$ time
- This is pretty much all that is known for exact algorithms with theoretical guarantees
- In practice: kd-trees work “well” in “low-medium” dimensions
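
The linear scan can be sketched in a few lines. This is a minimal illustration, not the lecture's own code: it assumes points are tuples of floats and that $D$ is the Euclidean distance (the slides leave $D$ unspecified). The names `euclidean` and `nearest_neighbor` are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(x: Point, q: Point) -> float:
    """Euclidean distance; an assumed choice for D(x, q)."""
    return math.sqrt(sum((xi - qi) ** 2 for xi, qi in zip(x, q)))


def nearest_neighbor(
    X: Sequence[Point],
    q: Point,
    dist: Callable[[Point, Point], float] = euclidean,
) -> Point:
    """Return the point of X closest to q by scanning all n points.

    Each distance costs O(d), so the scan costs O(n d) overall.
    """
    if not X:
        raise ValueError("X must contain at least one point")
    best, best_d = X[0], dist(X[0], q)
    for x in X[1:]:
        d = dist(x, q)
        if d < best_d:
            best, best_d = x, d
    return best


points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(nearest_neighbor(points, (0.9, 1.2)))  # (1.0, 1.0)
```

Passing `dist` as a parameter keeps the sketch usable with any other distance function.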
2-dimensional kd-trees

- A data structure to support range queries in $\mathbb{R}^2$
  - Not the most efficient solution in theory
  - Everyone uses it in practice
- Preprocessing time: $O(n \log n)$
- Space complexity: $O(n)$
- Query time: $O(\sqrt{n} + k)$, where $k$ is the number of reported points

2-dimensional kd-trees

- Algorithm:
  - Choose the x or y coordinate (alternate)
  - Choose the median of that coordinate; this defines a horizontal or vertical line
  - Recurse on both sides
- We get a binary tree:
  - Size: $O(n)$
  - Depth: $O(\log n)$
  - Construction time: $O(n \log n)$
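
The construction steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the lecture's implementation: points are stored only at the leaves, the lower median is used as the split value, and the `Node` class and helper names are hypothetical. For brevity it re-sorts at every level, which costs $O(n \log^2 n)$; reaching $O(n \log n)$ requires presorting or linear-time median selection.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    point: Optional[Point] = None   # set only at leaves
    axis: int = 0                   # 0: split on x, 1: split on y
    split: float = 0.0              # coordinate of the splitting line
    left: Optional["Node"] = None   # points with coordinate <= split
    right: Optional["Node"] = None  # the remaining points


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a 2-d kd-tree, alternating the split axis by depth."""
    if not points:
        return None
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2       # index of the lower median
    return Node(
        axis=axis,
        split=pts[mid][axis],
        left=build(pts[: mid + 1], depth + 1),
        right=build(pts[mid + 1:], depth + 1),
    )


def height(v: Optional[Node]) -> int:
    """Number of edges on the longest root-to-leaf path."""
    if v is None or v.point is not None:
        return 0
    return 1 + max(height(v.left), height(v.right))


def leaves(v: Optional[Node]) -> List[Point]:
    """All points stored in the subtree rooted at v."""
    if v is None:
        return []
    if v.point is not None:
        return [v.point]
    return leaves(v.left) + leaves(v.right)


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2), (1, 9), (6, 8)]
tree = build(pts)
print(height(tree))  # 3, i.e. log2(8)
```

Splitting at the median halves the point set at each level, which is what gives the $O(\log n)$ depth.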
Construction of kd-trees

The complete kd-tree
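Region of node v

- Region(v): the region of the plane associated with node v; the subtree rooted at v stores exactly the points that lie in Region(v) (shown as black dots in the figure)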

Searching in kd-trees

- Range searching in 2-d
- Given a set of $n$ points, build a data structure that, for any query rectangle $R$, reports all points in $R$

kd-tree: range queries

- Recursive procedure, starting from v = root
- Search(v, R):
  - If v is a leaf, then report the point stored in v if it lies in R
  - Otherwise, if Reg(v) is contained in R, report all points in subtree(v)
  - Otherwise:
    - If Reg(left(v)) intersects R, then Search(left(v), R)
    - If Reg(right(v)) intersects R, then Search(right(v), R)
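
The Search procedure above can be sketched in Python. This is a minimal version under assumptions not stated in the slides: rectangles are closed and written as `(xmin, ymin, xmax, ymax)`, Reg(v) is computed on the way down rather than stored in the node, and points live only at the leaves. All names (`Node`, `build`, `search`, and the helpers) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), closed
PLANE: Rect = (-math.inf, -math.inf, math.inf, math.inf)


@dataclass
class Node:
    point: Optional[Point] = None   # set only at leaves
    axis: int = 0                   # 0: split on x, 1: split on y
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    if not points:
        return None
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2
    return Node(axis=axis, split=pts[mid][axis],
                left=build(pts[: mid + 1], depth + 1),
                right=build(pts[mid + 1:], depth + 1))


def in_rect(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def report_subtree(v: Optional[Node], out: List[Point]) -> None:
    if v is None:
        return
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def child_regions(v: Node, reg: Rect) -> Tuple[Rect, Rect]:
    """Cut reg along v's splitting line into Reg(left(v)), Reg(right(v))."""
    lo, hi = list(reg), list(reg)
    lo[v.axis + 2] = v.split   # left child: coordinate <= split
    hi[v.axis] = v.split       # right child: coordinate >= split
    return tuple(lo), tuple(hi)


def search(v: Optional[Node], R: Rect, reg: Rect = PLANE,
           out: Optional[List[Point]] = None) -> List[Point]:
    if out is None:
        out = []
    if v is None:
        return out
    if v.point is not None:          # leaf: test the single point
        if in_rect(v.point, R):
            out.append(v.point)
    elif contains(R, reg):           # Reg(v) inside R: report everything
        report_subtree(v, out)
    else:                            # recurse only where Reg(child) meets R
        left_reg, right_reg = child_regions(v, reg)
        if intersects(left_reg, R):
            search(v.left, R, left_reg, out)
        if intersects(right_reg, R):
            search(v.right, R, right_reg, out)
    return out


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
print(sorted(search(build(pts), (3, 1, 8, 5))))  # [(5, 4), (7, 2), (8, 1)]
```

Treating both child regions as closed at the splitting line is a conservative choice: a point on the line may cause one extra visit, but no point is missed.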

- We will show that Search takes at most $O(\sqrt{n} + P)$ time, where $P$ is the number of reported points
  - The total time needed to report all points in all subtrees is $O(P)$
  - We just need to bound the number of nodes v such that region(v) intersects R but is not contained in R (i.e., the boundary of R intersects the boundary of region(v), a gross overestimation)
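
One common way to obtain this bound is sketched below. It is offered as the standard textbook counting argument, on the assumption that the lecture follows it, not as the lecture's own proof. The symbol $Q(n)$ is introduced here for illustration: it denotes the largest number of regions a single vertical line can intersect in a kd-tree on $n$ points whose root splits on x. Two levels down, the line meets at most 2 of the 4 grandchild regions, each holding about $n/4$ points.

```latex
\begin{align*}
Q(n) &=
  \begin{cases}
    O(1)            & \text{if } n = 1,\\
    2 + 2\,Q(n/4)   & \text{if } n > 1,
  \end{cases}\\[4pt]
Q(n) &= \sum_{i=1}^{\log_4 n} 2^{i} + 2^{\log_4 n}\,Q(1)
      = O\!\left(2^{\log_4 n}\right)
      = O\!\left(\sqrt{n}\right).
\end{align*}
```

The same bound holds for a horizontal line by symmetry. The boundary of R consists of four segments, each lying on one such line, so the number of nodes whose region meets the boundary of R is at most $4\,Q(n) = O(\sqrt{n})$. Adding the $O(P)$ reporting cost gives $O(\sqrt{n} + P)$.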

## This document was uploaded on 10/05/2010.

