# We need to consider the time and space requirements


## Speed and Scalability I

We need to consider the time and space requirements for the two distinct phases of classification:

• Time to construct the classifier: for the simple linear classifier, this is the time taken to fit the line, which is linear in the number of instances.
• Time to use the model: for the simple linear classifier, this is the time taken to test which side of the line the unlabeled instance falls on, which can be done in constant time.

[Figure: plot with both axes scaled from 1 to 10]

As we shall see, some classification algorithms are very efficient in one aspect and very poor in the other.
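The two phases can be illustrated with a minimal sketch. The slides do not say how the line is fit, so this example assumes a hypothetical rule: the line is the perpendicular bisector of the two class centroids. The names `fit_linear_classifier` and `classify` and the sample points are illustrative only.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Model = Tuple[Tuple[float, float], float]


def fit_linear_classifier(instances: List[Point], labels: List[int]) -> Model:
    """Construction phase: a single pass over the data, so O(n) time.

    Assumption: the line is the perpendicular bisector of the two class
    centroids, and labels are 0 or 1. The source does not specify this.
    """
    sums = [[0.0, 0.0], [0.0, 0.0]]
    counts = [0, 0]
    for (x, y), label in zip(instances, labels):
        sums[label][0] += x
        sums[label][1] += y
        counts[label] += 1
    c0 = (sums[0][0] / counts[0], sums[0][1] / counts[0])
    c1 = (sums[1][0] / counts[1], sums[1][1] / counts[1])
    # The normal vector points from the class 0 centroid to the class 1 centroid.
    w = (c1[0] - c0[0], c1[1] - c0[1])
    mid = ((c0[0] + c1[0]) / 2, (c0[1] + c1[1]) / 2)
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b


def classify(model: Model, point: Point) -> int:
    """Usage phase: one dot product and one comparison, so O(1) time."""
    (w0, w1), b = model
    return 1 if w0 * point[0] + w1 * point[1] + b > 0 else 0


instances = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
labels = [0, 0, 1, 1]
model = fit_linear_classifier(instances, labels)
```

Fitting touches every instance once, while `classify` does the same fixed amount of work no matter how many instances were used for training.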
## Speed and Scalability II

For learning with small datasets, these two costs (time to construct the classifier and time to use it) are the whole picture.

However, for data mining with massive datasets, it is not so much the (main memory) time complexity that matters; rather, it is how many times we have to scan the database. This is because for most data mining operations, disk access times completely dominate CPU times. For this reason, data mining researchers often report the number of database scans an algorithm requires.
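To make scan counting concrete, here is a small sketch that computes the same statistic (population variance) with two scans and with one. `ScannedTable` is a hypothetical in-memory stand-in for a disk-resident table, and the one-scan version uses Welford's online algorithm, a technique chosen for this example rather than one named in the slides.

```python
import math
from typing import Iterator, List


class ScannedTable:
    """Hypothetical stand-in for a disk-resident table that counts full scans."""

    def __init__(self, rows: List[float]) -> None:
        self._rows = rows
        self.scans = 0

    def scan(self) -> Iterator[float]:
        self.scans += 1
        return iter(self._rows)


def variance_two_scans(table: ScannedTable) -> float:
    """First scan computes the mean, second scan the squared deviations."""
    n = 0
    total = 0.0
    for value in table.scan():
        n += 1
        total += value
    mean = total / n
    squared = sum((value - mean) ** 2 for value in table.scan())
    return squared / n


def variance_one_scan(table: ScannedTable) -> float:
    """Welford's online algorithm: mean and variance in a single scan."""
    n = 0
    mean = 0.0
    m2 = 0.0
    for value in table.scan():
        n += 1
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
    return m2 / n


rows = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
table_a = ScannedTable(rows)
table_b = ScannedTable(rows)
result_two = variance_two_scans(table_a)
result_one = variance_one_scan(table_b)
```

Both functions do a similar amount of arithmetic, but when each scan means reading the whole table from disk, the one-scan version would be expected to cost roughly half as much.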
## Robustness I

We need to consider what happens when we have:

• Noise: for example, a person's age could have been mistyped as 650 instead of 65. How does this affect our classifier? (This is important only for building the classifier; if the instance to be classified is noisy, we can do nothing.)
• Missing values: for example, suppose we want to classify an insect, but we only know the abdomen length (X-axis) and not the antennae length (Y-axis). Can we still classify the instance?

[Figure: plot of abdomen length (X-axis) against antennae length (Y-axis), both axes scaled from 1 to 10]
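Both questions can be explored with a small sketch. The ages (apart from the 65 and 650 in the example above), the class names, and the centroid values are invented for illustration, and classifying by distance over the known features only is just one possible way to handle a missing value, not a method the slides prescribe.

```python
from statistics import mean, median
from typing import Dict, Optional, Tuple

# Noise: one mistyped value shifts the mean sharply; the median does not move.
ages_clean = [23, 31, 44, 52, 65]
ages_noisy = [23, 31, 44, 52, 650]  # 65 mistyped as 650

mean_shift = mean(ages_noisy) - mean(ages_clean)
median_shift = median(ages_noisy) - median(ages_clean)


def classify_with_missing(
    instance: Tuple[Optional[float], ...],
    centroids: Dict[str, Tuple[float, ...]],
) -> str:
    """Nearest centroid, measuring distance over the known features only.

    Assumption: a missing feature is encoded as None and simply skipped.
    """

    def partial_distance(centroid: Tuple[float, ...]) -> float:
        return sum(
            (value - center) ** 2
            for value, center in zip(instance, centroid)
            if value is not None
        )

    return min(centroids, key=lambda name: partial_distance(centroids[name]))


# Hypothetical centroids as (abdomen length, antennae length).
centroids = {"class_A": (3.0, 8.0), "class_B": (7.0, 3.0)}

# Abdomen length is known; antennae length is missing.
predicted = classify_with_missing((6.5, None), centroids)
```

A classifier built from class means would inherit the shift caused by the mistyped age, whereas one built from medians would not. The missing-value rule still returns an answer, but it relies entirely on the features that happen to be present.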
## Robustness II

We need to consider what happens when we have:

• Irrelevant features: for example, suppose we want to classify people as either Suitable_Grad_Student or Unsuitable_Grad_Student, and it happens that scoring more than 5 on a particular test is a perfect indicator for this problem.

[Figure: plots with axes scaled from 1 to 10]

If we also use “hair_length” as a feature, how will this affect our classifier?
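One way to see the effect is with a distance-based classifier. The sketch below assumes a 1-nearest-neighbor classifier with Euclidean distance, which the slides do not name at this point, and uses invented records in which the test score alone separates the classes perfectly.

```python
import math
from typing import List, Sequence, Tuple

Record = Tuple[Sequence[float], str]


def nearest_neighbor(train: List[Record], query: Sequence[float]) -> str:
    """Return the label of the training record closest to the query."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]


# Invented records: (test_score, hair_length) -> label.
# A score above 5 always means Suitable_Grad_Student.
records = [
    ((8.0, 2.0), "Suitable_Grad_Student"),
    ((9.0, 5.0), "Suitable_Grad_Student"),
    ((3.0, 30.0), "Unsuitable_Grad_Student"),
    ((2.0, 28.0), "Unsuitable_Grad_Student"),
]

# The query has a score of 7, so the correct label is Suitable_Grad_Student.
query = (7.0, 29.0)

# Using the relevant feature only.
score_only = [((features[0],), label) for features, label in records]
prediction_score_only = nearest_neighbor(score_only, (query[0],))

# Using both features: hair_length dominates the distance.
prediction_both = nearest_neighbor(records, query)
```

With the score alone the query is classified correctly. Once `hair_length` is included, its larger numeric range dominates the distance and the prediction flips, even though the feature carries no information about the class.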
## Robustness III

We need to consider what happens when we have:

• Streaming data: for many real-world problems, we don’t have a single fixed dataset. Instead, the data arrives continuously, potentially forever (stock market, weather data, sensor data, etc.).

[Figure: plot with both axes scaled from 1 to 10]

Can our classifier handle streaming data?
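As one illustration of a classifier that can handle streaming data, the sketch below keeps a running mean per class and updates it one instance at a time, so no past data needs to be stored. The nearest-centroid rule, the class `StreamingCentroidClassifier`, and the sample stream are assumptions made for this example, not something taken from the slides.

```python
from typing import Dict, List, Sequence


class StreamingCentroidClassifier:
    """Nearest-centroid classifier updated incrementally (a hypothetical sketch).

    Memory use depends on the number of classes and features,
    not on how many instances have arrived.
    """

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.means: Dict[str, List[float]] = {}

    def update(self, features: Sequence[float], label: str) -> None:
        """Fold one labeled instance into the running mean of its class."""
        n = self.counts.get(label, 0) + 1
        self.counts[label] = n
        mean = self.means.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            mean[i] += (value - mean[i]) / n

    def predict(self, features: Sequence[float]) -> str:
        """Return the label whose running centroid is closest."""
        return min(
            self.means,
            key=lambda label: sum(
                (value - m) ** 2 for value, m in zip(features, self.means[label])
            ),
        )


# A short invented stream; in practice this could be an endless generator.
stream = [
    ((1.0, 1.0), "a"),
    ((9.0, 9.0), "b"),
    ((2.0, 2.0), "a"),
    ((8.0, 8.0), "b"),
]

clf = StreamingCentroidClassifier()
for features, label in stream:
    clf.update(features, label)
```

Each call to `update` does a constant amount of work per feature, so the model can keep pace with an unbounded stream as long as the underlying class distributions stay roughly stable.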
## Interpretability

Some classifiers offer a bonus feature: the structure of the learned classifier tells us something about the domain.

[Figure: Height]