# Hw1 - Problem 1: A database of 10 million credit card transactions had 1% fraud cases and the remaining 99% of the transactions were legitimate...

Problem 1. A database of 10 million credit card transactions had 1% fraud cases, and the remaining 99% of the transactions were legitimate. A data miner studying fraud detection is using a random sample of 20,000 transactions. For this purpose, would a simple random sample or a stratified random sample be best? Explain why, and discuss how the sample you think is best would be chosen.

Problem 2. Data Preparation and Exploration

The Excel file Baseball.xls contains data on baseball salaries based on performance. The text file Baseball.txt describes the raw data. Use the SAS EM/Insight software as the exploratory DM platform.

(a) List each variable together with its model role, measurement scale, and type. Scan the data for missing values. Are there any?

(b) Plot a histogram of player salaries. Does the salary distribution appear to be skewed? Discuss your answer. Repeat the same exercise for the variable RBI. Plot four histograms for player statistics of your choice in one plot and discuss your observations.
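For Problem 1, a stratified design treats fraud and legitimate transactions as two strata and draws a simple random sample without replacement inside each one. The sketch below assumes proportional allocation (1% of 20,000 = 200 fraud cases and 19,800 legitimate ones) and a hypothetical layout in which record ids 0 to 99,999 are the fraud cases. `stratified_sample` and `proportional_allocation` are illustrative helpers written for this example, not part of SAS EM or the course materials.

```python
import random


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size.

    Note: rounding can make the counts differ from n by a few units when the
    shares are not whole numbers; here (1% and 99% of 20,000) they are exact.
    """
    total = sum(stratum_sizes.values())
    return {name: round(n * size / total) for name, size in stratum_sizes.items()}


def stratified_sample(strata, n, allocation=None, seed=None):
    """Draw a simple random sample without replacement inside each stratum.

    strata: dict mapping stratum name -> sequence of record ids (hypothetical).
    allocation: optional dict of per-stratum counts; defaults to proportional,
    in which case n is the total sample size.
    """
    rng = random.Random(seed)
    if allocation is None:
        allocation = proportional_allocation(
            {name: len(ids) for name, ids in strata.items()}, n
        )
    return {name: rng.sample(strata[name], allocation[name]) for name in strata}


# Hypothetical layout: ids 0..99,999 are fraud (1%), the rest legitimate (99%).
strata = {
    "fraud": range(0, 100_000),
    "legit": range(100_000, 10_000_000),
}
sample = stratified_sample(strata, 20_000, seed=474)
print({name: len(ids) for name, ids in sample.items()})  # {'fraud': 200, 'legit': 19800}
```

Passing an explicit `allocation` (for example, a hypothetical 10,000 per stratum) gives a disproportionate sample that oversamples fraud; estimates for the full database would then need to weight each stratum by its population share.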
ORIE 474

