

2. How would you deal with data that has a target class imbalance problem?
If you take the example of rare diseases, a machine learning model may suffer from the accuracy paradox: it can reach high accuracy simply by predicting the majority class, which makes it difficult to control false positives (Type I errors) and false negatives (Type II errors). A patient may have a rare disease, but the model will not predict it because the majority of the data comes from patients without the disease. In the example of fraud detection, the goal is to identify whether a transaction is fraudulent or not. Because most transactions are not fraudulent, the model tends to predict fraudulent transactions as valid.

To overcome these challenges, several approaches have been developed that can be applied during the pre-processing stage. One commonly used strategy is resampling, which includes under sampling and oversampling techniques. Under sampling balances the dataset by removing instances from the overrepresented class. Oversampling balances the skewed class ratio by adding similar instances of the underrepresented class. Resampling can be done with or without replacement. The first two approaches are depicted in the image below and are explained in detail in the following sections.

Re-sampling: Resampling is the process of reconstructing a data sample from the actual data set by either non-statistical or statistical estimation. In non-statistical estimation, we randomly draw samples from the actual population, hoping that the sample has a distribution similar to that of the population. Statistical estimation, however, involves estimating the parameters of the actual population and then drawing the subsamples. In this way, we extract data samples that carry most of the information from the actual population. These resampling techniques help us draw samples when the data is highly imbalanced.
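To make the accuracy paradox and random oversampling concrete, here is a minimal sketch in plain Python. The toy data (990 healthy patients, 10 with the disease) is made up for illustration, and the helper names `accuracy`, `recall`, and `random_oversample` are assumptions of this sketch, not functions from any library.

```python
import random
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows (with replacement) of each smaller
    class until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        extra = rng.choices(idx, k=target - n)  # sampling with replacement
        X_out.extend(X[i] for i in extra)
        y_out.extend(y[i] for i in extra)
    return X_out, y_out

# Hypothetical toy data: label 0 = healthy, label 1 = rare disease.
y_true = [0] * 990 + [1] * 10
X = [[float(i)] for i in range(len(y_true))]

# A "model" that always predicts the majority class.
y_pred = [0] * len(y_true)
print(accuracy(y_true, y_pred))  # 0.99, looks excellent
print(recall(y_true, y_pred))    # 0.0, every diseased patient is missed

X_bal, y_bal = random_oversample(X, y_true)
print(Counter(y_bal))            # Counter({0: 990, 1: 990})
```

The majority-class predictor scores 99% accuracy while detecting none of the rare cases, which is why accuracy alone is misleading here. In practice, resampling would normally be applied to the training split only, so that evaluation still reflects the real class ratio.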
Under sampling: Random under sampling is a classical method in which we randomly select samples from the majority class and discard the rest, with the goal of balancing the class distribution. It is a naïve approach because it assumes that any random sample accurately reflects the distribution of the data. This leads to discarding potentially useful data.
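Random under sampling as described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `random_undersample` and the toy fraud data (950 valid transactions, 50 fraudulent) are hypothetical and do not come from the original notes or from any library.

```python
import random
from collections import Counter

def random_undersample(X, y, seed=0):
    """Keep a random subset (without replacement) of each class so that
    every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for label in counts:
        idx = [i for i, lab in enumerate(y) if lab == label]
        keep.extend(rng.sample(idx, target))  # sampling without replacement
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical toy data: label 0 = valid transaction, label 1 = fraud.
y_fraud = [0] * 950 + [1] * 50
X_fraud = [[float(i)] for i in range(len(y_fraud))]

X_small, y_small = random_undersample(X_fraud, y_fraud)
print(Counter(y_small))             # Counter({0: 50, 1: 50})
print(len(y_fraud) - len(y_small))  # 900 majority rows were discarded
```

The second printed number shows the cost of the method: 900 of the 950 majority-class rows are thrown away, which is the loss of potentially useful data mentioned above.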


• Winter '17
