Hint: Set a = (0, 0), b = (1, 1), c = (0, 1), and p = ½. Does it still satisfy the triangle inequality?
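A minimal sketch of working through the hint, assuming it refers to the Minkowski distance $d_p(x, y) = \left(\sum_i |x_i - y_i|^p\right)^{1/p}$:

```python
# Check the hint: Minkowski distance with p = 1/2 on the suggested points.
def minkowski(x, y, p):
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

a, b, c = (0, 0), (1, 1), (0, 1)
p = 0.5

d_ab = minkowski(a, b, p)  # (1 + 1)**2 = 4.0
d_ac = minkowski(a, c, p)  # (0 + 1)**2 = 1.0
d_cb = minkowski(c, b, p)  # (1 + 0)**2 = 1.0

# The triangle inequality requires d(a, b) <= d(a, c) + d(c, b),
# but 4.0 > 1.0 + 1.0, so p = 1/2 does not give a proper metric.
print(d_ab, d_ac + d_cb)  # 4.0 2.0
```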
A word of caution: For these distance metrics to work nicely, the attributes must be scaled before using them. We have done this earlier in neural net training. In many cases, you might want to weight each attribute differently. This is called weighted distance. Example of weighted Euclidean distance:

$$d\left(x^{(i)}, x^{(j)}\right) = \sqrt{\sum_{a=1}^{n} w_a \left(x_a^{(i)} - x_a^{(j)}\right)^2}$$

where $w_a$ is the weight for the $a$-th attribute.
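As a quick sketch, the formula translates directly into code; the vectors and weights below are invented for illustration:

```python
import math

# Weighted Euclidean distance: attribute a contributes with weight w_a.
def weighted_euclidean(x, y, w):
    return math.sqrt(sum(w_a * (x_a - y_a) ** 2
                         for w_a, x_a, y_a in zip(w, x, y)))

# Hypothetical values: the first attribute counts twice as much.
x, y = [1.0, 2.0], [3.0, 5.0]
w = [2.0, 1.0]
print(weighted_euclidean(x, y, w))  # sqrt(2*4 + 1*9) = sqrt(17) ~ 4.123
```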
Example: Data for UTD students' GPA (attribute) and getting an internship (class) is presented below:

GPA:        2.6  2.8  2.85  3.1  3.2  3.3  3.4  3.55  3.6  3.7  3.75  4.0  4.0
Internship:  0    1    1     0    1    1    0    1     0    1    0     0    1

Using Manhattan distance, what will be the prediction of k-NN for a student with a GPA of 3.5 in the following cases?
1. k = 1: Nearest neighbor: {(3.55, 1)} => Majority class = 1
2. k = 3: Nearest neighbors: {(3.55, 1), (3.4, 0), (3.6, 0)} => Majority class = 0
3. k = 5: Nearest neighbors: {(3.55, 1), (3.4, 0), (3.6, 0), (3.3, 1), (3.7, 1)} => Majority class = 1
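To make the bookkeeping concrete, here is a short Python sketch that reproduces the three answers above (ties in distance are broken by dataset order, which happens to match the neighbors chosen in the example):

```python
from collections import Counter

# Training data from the example: GPA (attribute) and internship (class).
gpa        = [2.6, 2.8, 2.85, 3.1, 3.2, 3.3, 3.4, 3.55, 3.6, 3.7, 3.75, 4.0, 4.0]
internship = [0,   1,   1,    0,   1,   1,   0,   1,    0,   1,   0,    0,   1]

def knn_predict(query, k):
    # Sort training points by Manhattan distance |gpa - query|.
    neighbors = sorted(zip(gpa, internship), key=lambda p: abs(p[0] - query))
    labels = [label for _, label in neighbors[:k]]
    return Counter(labels).most_common(1)[0][0]  # majority class

for k in (1, 3, 5):
    print(k, knn_predict(3.5, k))  # k=1 -> 1, k=3 -> 0, k=5 -> 1
```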
Handout: Let's practice some questions from the handout.

Realization:
- The calculations become more and more tedious as k increases, and even more so as the number of dimensions grows.
- More work is done during the testing phase than during the training phase.
Advantages and Disadvantages of k-NN

Advantages:
- Training is very fast
- Can learn complex target functions easily
- Easy to program

Disadvantages:
- Slow at query time
- Requires a lot of storage and in-memory processing
- Doesn't scale well to higher dimensions
- Easily tricked by noisy data items and irrelevant attributes
- The value of k can change the results significantly
- Suffers from the curse of dimensionality
k-NN for Continuous Output: We presented k-NN for classification, but it can easily be used to approximate functions where the output is continuous: instead of taking a majority vote over class labels, we predict the average of the k nearest neighbors' outputs.
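A minimal sketch of this idea, with invented one-dimensional training data:

```python
# k-NN regression: return the mean of the k nearest neighbors' outputs
# instead of a majority vote over class labels.
def knn_regress(x_train, y_train, query, k):
    neighbors = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - query))
    return sum(y for _, y in neighbors[:k]) / k

# Hypothetical data, roughly following y = x.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [1.2, 1.9, 3.1, 4.0, 5.2]
print(knn_regress(x_train, y_train, 3.4, k=3))  # mean of {3.1, 4.0, 1.9} = 3.0
```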