THE ROLE OF RANDOM FOREST IN CREDIT CARD FRAUD ANALYSIS
Sudeep Dogga
Data Science and Artificial Intelligence
BOURNEMOUTH UNIVERSITY
Bournemouth, England
[email protected]

Abstract—This paper focuses on the analysis of credit card fraud. The tremendous increase in credit card transactions in recent years has led to a significant increase in fraud. A major drawback of credit card usage is that it does not always require the cardholder to authorize the transaction, so it is hard to determine whether a transaction is genuine. Many machine learning algorithms can be used to analyze credit card fraud. This paper focuses mainly on the Random Forest algorithm because of its advantages, such as its ability to handle high-dimensional data and its accuracy. It can solve both classification and regression problems.

Keywords—Random Forest, Decision Tree, Credit card fraud analysis.

I. INTRODUCTION

Random Forest is one of many machine learning algorithms used for credit card fraud analysis. Many other methods based on AI and data mining have been used for years. Credit card fraud can happen in different ways: a lost card is misused by a stranger, card details are observed by a bystander in a public place, individuals are persuaded through fake calls to disclose their confidential card details, or bank accounts are hacked using evolving technology. Credit card fraud is a commonly practiced fraud that affects the financial sector, with billions in losses globally. One in every thousand credit card transactions is declared fraudulent. The two main challenges in credit card fraud analysis are handling the huge amount of imbalanced data that is continuously generated by everyday transactions, and the limited availability of data because of banks' privacy policies for their customers.
Whenever an online transaction takes place, there is a chance that a hacker is looking to steal the card details and take advantage of that data. To understand which attributes indicate a fraudulent transaction, we need to collect and analyze transaction data. Credit card fraud is hard to detect because fraudsters come up with new methods of fraud each time. According to [ CITATION Lyl20 \l en-US ], the Federal Trade Commission (FTC) in the USA identified more than 3.2 million cases of fraud in 2019, with identity theft the most common type, occurring in 20.33% of cases. According to [ CITATION DCl20 \l en-US ], UK statistics show that, of the total annual fraud losses on UK-issued debit and credit cards, 76 percent come from card-not-present fraud and 15 percent from card fraud. Machine learning is considered a productive approach for identifying the few fraudulent transactions among millions of genuine ones by analyzing data from previous transactions. This historical dataset has to be divided into two parts: one part to train the models and another to test the trained models.
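
As an illustration of dividing a historical dataset into a training part and a testing part, the following is a minimal sketch, not the method used in this paper. It assumes synthetic labels generated at the one-in-a-thousand fraud rate mentioned in this introduction, an assumed 70/30 split ratio, and a hypothetical helper named stratified_split. The split is stratified, so the rare fraud class keeps the same proportion in both parts, which matters when the data is heavily imbalanced.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts.

    Hypothetical helper for illustration; test_fraction=0.3 is an assumption.
    """
    rng = random.Random(seed)

    # Group the row indices by class label (0 = genuine, 1 = fraud).
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within each class, then cut off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels (assumption): one fraud case per thousand transactions.
labels = [1 if i % 1000 == 0 else 0 for i in range(100_000)]

train_idx, test_idx = stratified_split(labels, test_fraction=0.3, seed=42)

# Count the fraud cases that ended up in each part.
train_fraud = sum(labels[i] for i in train_idx)
test_fraud = sum(labels[i] for i in test_idx)

print("train size:", len(train_idx), "fraud cases:", train_fraud)
print("test size:", len(test_idx), "fraud cases:", test_fraud)
```

With a purely random split, a small test set could by chance contain very few fraud cases, which would make the evaluation unreliable. In practice, a library routine such as scikit-learn's train_test_split with its stratify argument can serve the same purpose.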