lect11

Lecture outline
Nearest-neighbor search in low dimensions
  kd-trees
Nearest-neighbor search in high dimensions
  LSH
Applications to data mining

Definition
Given: a set X of n points in $\mathbb{R}^d$
Nearest neighbor: for any query point $q \in \mathbb{R}^d$, return the point $x \in X$ minimizing $D(x, q)$
Intuition: find the point in X that is closest to q
Motivation
Learning: nearest-neighbor rule
Databases: retrieval
Data mining: clustering
Donald Knuth, in vol. 3 of The Art of Computer Programming, called it the post-office problem, referring to the application of assigning a resident to the nearest post office

Nearest-neighbor rule
MNIST dataset “2”

Methods for computing NN
Linear scan: O(nd) time
This is essentially all that is known for exact algorithms with theoretical guarantees
In practice: kd-trees work “well” in “low-medium” dimensions
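The linear scan can be sketched in a few lines. This is a minimal illustration, not code from the lecture: the names `euclidean` and `nearest_neighbor` are hypothetical, and Euclidean distance is assumed as one possible choice for the generic distance D(x, q).

```python
import math
from typing import Sequence

Point = Sequence[float]


def euclidean(x: Point, q: Point) -> float:
    """Euclidean distance; an assumed stand-in for the generic D(x, q)."""
    return math.sqrt(sum((xi - qi) ** 2 for xi, qi in zip(x, q)))


def nearest_neighbor(points: Sequence[Point], q: Point) -> Point:
    """Linear scan: n distance evaluations costing O(d) each, so O(nd) overall."""
    if not points:
        raise ValueError("points must be non-empty")
    return min(points, key=lambda x: euclidean(x, q))


# Small illustrative data set (made up for this sketch).
X = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (5.0, 2.0)]
print(nearest_neighbor(X, (1.2, 0.7)))
```

No preprocessing is needed, which is why the scan remains the baseline against which index structures such as kd-trees are compared.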
2-dimensional kd-trees
A data structure to support range queries in $\mathbb{R}^2$
Not the most efficient solution in theory
Widely used in practice
Preprocessing time: $O(n \log n)$
Space complexity: $O(n)$
Query time: $O(\sqrt{n} + k)$, where k is the number of points reported

2-dimensional kd-trees
Algorithm:
  Choose the x or y coordinate (alternating between levels)
  Choose the median of that coordinate; this defines a vertical or horizontal line
  Recurse on both sides
We get a binary tree:
  Size: $O(n)$
  Depth: $O(\log n)$
  Construction time: $O(n \log n)$
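The construction steps above can be sketched as follows. This is an illustrative version under stated assumptions, not the lecture's own code: the `Node` class, `build`, and `height` are hypothetical names, and points are assumed to be stored only at the leaves. For brevity the sketch re-sorts at every level, which costs O(n log² n); reaching the O(n log n) bound requires presorting the points once or using linear-time median selection.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    point: Optional[Point] = None   # set only at leaves
    axis: int = 0                   # 0: split on x (vertical line), 1: split on y
    split: float = 0.0              # coordinate of the splitting line
    left: Optional["Node"] = None   # points with coordinate <= split
    right: Optional["Node"] = None  # points with coordinate >= split


def build(points: List[Point], depth: int = 0) -> Node:
    """Build a 2-d kd-tree by alternating median splits on x and y."""
    if not points:
        raise ValueError("points must be non-empty")
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % 2                            # alternate x, y, x, y, ...
    pts = sorted(points, key=lambda p: p[axis])  # simple, but not optimal
    mid = (len(pts) + 1) // 2                   # left half gets ceil(n/2) points
    return Node(
        axis=axis,
        split=pts[mid - 1][axis],               # median coordinate
        left=build(pts[:mid], depth + 1),
        right=build(pts[mid:], depth + 1),
    )


def height(node: Node) -> int:
    """Number of edges on the longest root-to-leaf path."""
    if node.point is not None:
        return 0
    return 1 + max(height(node.left), height(node.right))


# Eight made-up points; median splits halve the set at every level.
pts = [(1, 9), (2, 3), (4, 1), (3, 7), (5, 4), (6, 8), (7, 2), (8, 8)]
tree = build(pts)
print(height(tree))
```

Because each split sends at most half of the points (rounded up) to either side, the depth stays logarithmic in n regardless of how the points are distributed.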
Construction of kd-trees

The complete kd-tree
Region of node v
Region(v): the part of the plane associated with node v; the subtree rooted at v stores exactly the points that lie in Region(v) (the black dots in the figure)

Searching in kd-trees
Range searching in 2-d
Given a set of n points, build a data structure that, for any query rectangle R, reports all points in R
kd-tree: range queries
Recursive procedure starting from v = root
Search(v, R)
  If v is a leaf, then report the point stored in v if it lies in R
  Otherwise, if Region(v) is contained in R, report all points in subtree(v)
  Otherwise:
    If Region(left(v)) intersects R, then Search(left(v), R)
    If Region(right(v)) intersects R, then Search(right(v), R)
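The recursive procedure above could be implemented roughly as follows. This is a sketch under assumptions that are not in the slides: rectangles are closed and written as (xmin, ymin, xmax, ymax), regions are computed on the fly during the descent rather than stored in the nodes, and all names (`Node`, `build`, `search`, and the helpers) are hypothetical. A compact tree builder is included only so that the example runs on its own.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), closed
PLANE: Rect = (-math.inf, -math.inf, math.inf, math.inf)


@dataclass
class Node:
    point: Optional[Point] = None   # set only at leaves
    axis: int = 0                   # 0: split on x, 1: split on y
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], depth: int = 0) -> Node:
    """Median-split construction (assumes a non-empty point list)."""
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2
    return Node(axis=axis, split=pts[mid - 1][axis],
                left=build(pts[:mid], depth + 1),
                right=build(pts[mid:], depth + 1))


def contains(outer: Rect, inner: Rect) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def in_rect(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def child_regions(v: Node, region: Rect) -> Tuple[Rect, Rect]:
    """Cut Region(v) along v's splitting line into the two child regions."""
    xmin, ymin, xmax, ymax = region
    if v.axis == 0:
        return (xmin, ymin, v.split, ymax), (v.split, ymin, xmax, ymax)
    return (xmin, ymin, xmax, v.split), (xmin, v.split, xmax, ymax)


def report_subtree(v: Node, out: List[Point]) -> None:
    """Report every point stored below v, in time linear in the subtree size."""
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)


def search(v: Node, R: Rect, region: Rect = PLANE,
           out: Optional[List[Point]] = None) -> List[Point]:
    """Search(v, R): collect all points of subtree(v) that lie in R."""
    if out is None:
        out = []
    if v.point is not None:             # leaf: test the single stored point
        if in_rect(v.point, R):
            out.append(v.point)
    elif contains(R, region):           # Region(v) inside R: report everything
        report_subtree(v, out)
    else:                               # otherwise descend where R overlaps
        left_region, right_region = child_regions(v, region)
        if intersects(left_region, R):
            search(v.left, R, left_region, out)
        if intersects(right_region, R):
            search(v.right, R, right_region, out)
    return out


# Made-up points and query rectangle for illustration.
pts = [(1, 9), (2, 3), (4, 1), (3, 7), (5, 4),
       (6, 8), (7, 2), (8, 8), (7, 9), (9, 6)]
tree = build(pts)
R = (2, 2, 7, 8)
print(sorted(search(tree, R)))
```

Computing regions during the descent keeps the nodes small; storing Region(v) in each node is an equally valid design that trades memory for slightly simpler query code.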

We will show that Search takes at most $O(\sqrt{n} + P)$ time, where P is the number of reported points
The total time needed to report all points in all subtrees is $O(P)$
We just need to bound the number of nodes v such that Region(v) intersects R but is not contained in R (i.e., the boundary of R intersects Region(v)); this is a gross overestimation
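One standard way to obtain the $O(\sqrt{n})$ term is sketched below; the preview ends before the proof, so whether the lecture follows exactly this argument is an assumption. The symbol $Q(n)$ is introduced here for illustration: it denotes the maximum number of regions in a kd-tree on n points that a single vertical line can cross. Such a line crosses the root's region, one of its two children (split by x), and at most both children of that child (split by y), each holding about n/4 points.

```latex
Q(n) \le
\begin{cases}
  O(1)          & \text{if } n = 1, \\
  2 + 2\,Q(n/4) & \text{if } n > 1.
\end{cases}

% Unrolling i = \log_4 n times:
Q(n) \le 2 + 4 + \dots + 2^{i} + 2^{i}\,Q(1)
     = O\!\left(2^{\log_4 n}\right)
     = O(\sqrt{n}).
```

The same bound holds for a horizontal line by symmetry. The boundary of R lies on four such lines, so the number of nodes whose region is crossed by the boundary is $O(\sqrt{n})$, which together with the $O(P)$ reporting cost gives $O(\sqrt{n} + P)$.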

This document was uploaded on 10/05/2010.
