How would you deal with data that has a target class imbalance problem?

2. How would you deal with data that has a target class imbalance problem?
Take rare diseases as an example: a machine learning model may suffer from the accuracy paradox, where overall accuracy looks high even though the model rarely identifies the minority class. This makes it difficult to control false positives (Type I errors) and false negatives (Type II errors). A patient may have a rare disease, but the model will not predict it, since the majority of the data comes from patients without the disease. In fraud detection, the goal is to identify whether a transaction is fraudulent or not. Because most transactions are not fraudulent, the model tends to predict fraudulent transactions as valid.

To overcome these challenges, several approaches have been developed that can be applied during the pre-processing stage. One commonly used strategy is resampling, which includes under sampling and oversampling techniques. Balancing the dataset by removing instances from the overrepresented class is called under sampling. Oversampling balances the skewed class ratio by adding similar instances of the underrepresented class. Resampling can be done with or without replacement. The first two approaches are depicted in the image below and are explained in detail in the following sections.

Re-sampling: Resampling is the process of reconstructing the data sample from the actual data set by either non-statistical or statistical estimation. In non-statistical estimation, we randomly draw samples from the actual population, hoping that the sample has a distribution similar to that of the population. Statistical estimation, however, involves estimating the parameters of the actual population and then drawing the subsamples. In this way, we extract data samples that carry most of the information from the actual population. These resampling techniques help us draw samples when the data is highly imbalanced.
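The accuracy paradox and random oversampling can be illustrated with a small sketch. It is not taken from the original notes: the toy data (990 healthy patients, 10 with the disease) and the helper names `accuracy`, `recall`, and `random_oversample` are hypothetical, and only the Python standard library is used.

```python
import random
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class (sampling with
    replacement) until every class is as large as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        extra = rng.choices(pool, k=target - n)  # empty for the largest class
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels

# Hypothetical toy data: label 0 = no disease, label 1 = rare disease.
labels = [0] * 990 + [1] * 10
rows = [[i] for i in range(len(labels))]

# A "model" that always predicts the majority class.
always_negative = [0] * len(labels)
acc = accuracy(labels, always_negative)  # high, despite missing every patient
rec = recall(labels, always_negative)    # no diseased patient is found

balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)

print(acc, rec)         # 0.99 0.0
print(balanced_counts)  # both classes now have 990 rows
```

The always-negative predictor reaches 99% accuracy while detecting none of the diseased patients, which is why accuracy alone is a misleading metric here. Note that oversampling only repeats existing minority rows; it adds no new information.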
Under sampling: Random under sampling is a method in which we randomly select samples from the majority class and discard the rest. It is a naïve approach because it assumes that any random sample accurately reflects the distribution of the data. It is a classical method whose goal is to balance class distributions by randomly eliminating majority class examples. This leads to discarding potentially useful data that could be
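A minimal sketch of random under sampling, assuming a hypothetical toy dataset and a hypothetical helper named `random_undersample` (neither comes from the original notes), might look like this:

```python
import random
from collections import Counter

def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class (sampling without replacement)
    so that every class ends up the size of the smallest class."""
    rng = random.Random(seed)
    target = min(Counter(labels).values())
    kept = []
    for cls in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == cls]
        kept.extend(rng.sample(idx, target))
    kept.sort()  # preserve the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Hypothetical toy data: 990 majority rows (0) and 10 minority rows (1).
labels = [0] * 990 + [1] * 10
rows = [[i] for i in range(len(labels))]

under_rows, under_labels = random_undersample(rows, labels)
under_counts = Counter(under_labels)
discarded = len(rows) - len(under_rows)

print(under_counts)  # 10 rows per class
print(discarded)     # 980 majority rows thrown away
```

The number of discarded rows makes the cost of this method concrete: in this toy case, 980 of the 990 majority examples never reach the model.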