The holiday season between US Thanksgiving and Christmas is an important season for US retailers. The file
dillards2004.txt contains a sample of transactions at various Dillard's department stores during the
peak of the holiday shopping season in 2004.
As often happens, the publisher has named the file with the general .txt extension. However, the file is
obviously stored in one of the formats we have learned about in class. How will you determine what format
the file is in and how to read it into R?
The MSA, Store, and Dept columns are all unique ID numbers of various metropolitan statistical areas,
stores, and departments. The STYPE column indicates whether the transaction is a Purchase or Return.
The ORIG column represents the original price of all items in the transaction (price * quantity). ACTUAL
represents the actual price paid by the customer. MARKDOWN should be ORIG - ACTUAL, but this is
real data and it has some anomalies. How will you account for them? The example console session shows
results for data cleaned using one approach. There are many others!
Complete the following tasks:
a. Read the file into a data.frame variable named dillards_df. Add a column to the data.frame
named MARKDOWN_PCT that indicates the percentage markdown of each transaction. MARKDOWN_PCT
should be a decimal number, rounded to 3 places.
b. Given the variable store, calculate the range (defined as max - min) of ORIG transaction
amounts for the store and save the result of your analysis in a variable named range_for_store.
c. Given the variable dept, calculate the minimum markdown percentage for the given department
and store your analysis in a variable named min_pct_for_dept.
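For reference, tasks a to c could be sketched in base R roughly as follows. This is only one possible reading, not the assignment's official solution: the toy data frame and the values of store and dept are made up for illustration, MARKDOWN_PCT is assumed to be a fraction (0.25 rather than 25), and the markdown is recomputed from ORIG and ACTUAL instead of trusting the MARKDOWN column, which is just one way to handle the anomalies.

```r
# Hypothetical toy data standing in for the real dillards_df.
dillards_df <- data.frame(
  Store  = c(5002, 5002, 5002, 5103),
  Dept   = c(1100, 1100, 2200, 1100),
  ORIG   = c(6039, 378, 2000, 1000),
  ACTUAL = c(6039, 378, 1500, 900)
)

# a. Markdown as a fraction of the original price, rounded to 3 places.
#    Assumption: recompute from ORIG and ACTUAL rather than use MARKDOWN.
dillards_df$MARKDOWN_PCT <- round(
  (dillards_df$ORIG - dillards_df$ACTUAL) / dillards_df$ORIG, 3
)
# Rows with ORIG == 0 would give NaN or Inf; mark them as missing instead.
dillards_df$MARKDOWN_PCT[!is.finite(dillards_df$MARKDOWN_PCT)] <- NA

# b. Range (max - min) of ORIG for one store. The store value is made up.
store <- 5002
store_rows <- dillards_df[dillards_df$Store == store, ]
range_for_store <- max(store_rows$ORIG) - min(store_rows$ORIG)

# c. Minimum markdown fraction for one department. The dept value is made up.
dept <- 1100
dept_rows <- dillards_df[dillards_df$Dept == dept, ]
min_pct_for_dept <- min(dept_rows$MARKDOWN_PCT, na.rm = TRUE)
```

Whether to drop, fix, or keep anomalous rows is a judgment call, so results on the real file may differ from the example console session depending on the cleaning approach.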
I have no idea what I need to convert it to or how. Based on what we're learning, I'm assuming it's JSON, but I still don't know how I'm supposed to split the one variable into multiple without installing packages we haven't used yet.
Here's what the txt looks like:
2680,5002,1100,GARY F ,P,6039,6039,0
2680,5002,1100,GARY F ,R,378,378,0
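
Judging only from these two lines, the file looks like comma-separated values rather than JSON, so base R's read.csv may be enough with no extra packages. The sketch below reads the two sample lines directly. It assumes the file has no header row, and it uses BRAND as a made-up name for the fourth column, which the assignment does not name. With the real file you would pass "dillards2004.txt" as the first argument instead of text =, and peeking at it first with readLines("dillards2004.txt", n = 5) would show whether a header row exists.

```r
# The two sample lines from the post, used in place of the real file.
sample_lines <- c(
  "2680,5002,1100,GARY F ,P,6039,6039,0",
  "2680,5002,1100,GARY F ,R,378,378,0"
)

# Assumption: no header row. BRAND is a hypothetical name for the
# fourth column; the other names come from the assignment text.
col_names <- c("MSA", "Store", "Dept", "BRAND",
               "STYPE", "ORIG", "ACTUAL", "MARKDOWN")

dillards_df <- read.csv(
  text = sample_lines,       # for the real file: file = "dillards2004.txt"
  header = FALSE,            # assumption: first line is data, not names
  col.names = col_names,
  strip.white = TRUE,        # trims the trailing space in "GARY F "
  stringsAsFactors = FALSE
)

# Each comma-separated field is now its own column.
str(dillards_df)
```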