Data with a small number of dimensions tends to be qualitatively different from moderate- or high-dimensional data. To understand the Curse of Dimensionality, we first need to understand two other characteristics of data: sparsity and resolution. More data, in the sense of more features or dimensions, is always a good thing, right? In fact, it is both a blessing and a curse.

Data Mining Data
Data Sparsity
For some datasets, most features have a value of 0; in many cases, fewer than 1% of the entries are non-zero. Such data is called sparse data, or the data set is said to exhibit sparsity. Sparsity can be a problem for many methods, particularly statistical ones, because it can introduce statistical bias due to small samples. It can also be an advantage, because less storage may be needed: only the non-zero entries have to be stored.
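As a rough illustration of both sides of sparsity, the sketch below measures the fraction of non-zero entries in a small matrix and keeps only those entries in a dictionary keyed by (row, column), a simple "dictionary of keys" layout. The document-term counts and the function name `sparsity_stats` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def sparsity_stats(matrix: List[List[float]]) -> Tuple[float, Dict[Tuple[int, int], float]]:
    """Return the fraction of non-zero entries and a dictionary-of-keys
    representation that stores only the non-zero values."""
    total = 0
    nonzero: Dict[Tuple[int, int], float] = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            total += 1
            if value != 0:
                # Only non-zero entries are kept; zeros are implicit.
                nonzero[(i, j)] = value
    density = len(nonzero) / total if total else 0.0
    return density, nonzero


# Hypothetical document-term counts: 4 documents x 10 terms, mostly zeros.
docs = [
    [0, 0, 3, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
]
density, stored = sparsity_stats(docs)
print(f"non-zero fraction: {density:.2f}")
print(f"values stored: {len(stored)} instead of {sum(len(r) for r in docs)}")
```

Here only 4 of the 40 entries are non-zero (a density of 0.10), so the dictionary holds 4 values instead of 40. At the sub-1% densities mentioned above, the savings are far larger.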
Data Resolution
Different resolutions reveal different patterns. If the resolution is too fine, a pattern may be buried in noise; if it is too coarse, the pattern may disappear. For example, variations in atmospheric pressure on a scale of hours reflect the movement of storms and other weather systems, but on a scale of months such phenomena are not detectable.
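The effect of resolution can be sketched by averaging the same series over coarser and coarser windows. The pressure values below are synthetic (a flat 1013 hPa baseline with one hypothetical 12-hour storm dip of 20 hPa), and `aggregate` is a hypothetical helper, not a library function.

```python
from statistics import mean
from typing import List


def aggregate(series: List[float], window: int) -> List[float]:
    """Average consecutive, non-overlapping windows of `window` samples
    (i.e. view the series at a coarser resolution). Any incomplete
    final window is dropped."""
    return [mean(series[i:i + window])
            for i in range(0, len(series) - window + 1, window)]


# Synthetic hourly pressure (hPa) for 30 days: a flat baseline with a
# hypothetical 12-hour storm dip of 20 hPa starting on day 10.
BASELINE = 1013.0
HOURS = 30 * 24
pressure = [BASELINE] * HOURS
storm_start = 10 * 24
for h in range(storm_start, storm_start + 12):
    pressure[h] = BASELINE - 20.0

# Largest drop below the baseline that is still visible at each resolution.
for label, window in [("hourly", 1), ("daily", 24), ("30-day", HOURS)]:
    values = aggregate(pressure, window)
    largest_drop = BASELINE - min(values)
    print(f"{label:>7}: {len(values):4d} values, largest drop {largest_drop:.2f} hPa")
```

With these assumed numbers, the storm shows up as a 20 hPa drop in the hourly data, 10 hPa in the daily means, and only about 0.33 hPa in the single 30-day mean, where it is practically undetectable.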

Curse of Dimensionality
Many types of data analysis become difficult as the dimensionality of the data set (the number of attributes) increases. Specifically, as dimensionality increases, the data becomes increasingly sparse in the space that it occupies. For instance, splitting each attribute into 10 intervals gives 10 cells for one attribute but 10^10 cells for ten attributes, so a fixed number of data objects leaves almost every cell empty. For classification, this can mean that there are not enough data objects to build a model that reliably assigns a class to all possible objects. For clustering, the definitions of density and of the distance between points, both critical for clustering, become less meaningful.
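One way to see why distance becomes less meaningful is to measure the relative contrast between a query point's farthest and nearest neighbors, (d_max - d_min) / d_min, as the number of attributes grows. The sketch below assumes points drawn uniformly at random from the unit hypercube; the function name `relative_contrast` and the chosen sizes are illustrative assumptions.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """(max_dist - min_dist) / min_dist from one random query point to
    n_points points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  relative contrast={relative_contrast(200, dim):.3f}")
```

For uniformly random data like this, the contrast typically drops sharply as the dimension grows: in 2 dimensions the nearest neighbor is many times closer than the farthest, while in 1000 dimensions all points lie at almost the same distance, which undermines distance- and density-based clustering.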
