# SDMD project week 3 - Megha Sharma.docx - Problem 1 A...


Problem 1

A wholesale distributor operating in different regions of Portugal has information on the annual spending on several items in its stores across different regions and channels. The data (Wholesale Customer.csv) consists of 440 large retailers’ annual spending on 6 different varieties of products in 3 different regions (Lisbon, Oporto, Other) and across different sales channels (Hotel/Restaurant/Café, i.e. HoReCa, and Retail).

1.1. Use methods of descriptive statistics to summarize data. Which Region and which Channel seems to spend more?

Answer 1.1 To analyze total spending across the item variables, a Total column has been added to the data set. Based on the total spending by customers in each Channel and Region, Channel “Hotel” has spent more and Channel “Retail” has spent less. Likewise, Region “Other” has spent the most and Region “Oporto” has spent the least.

1.2. There are 6 different varieties of items considered. Do all varieties show similar behavior across Region and Channel?

Answer 1.2 Not all varieties of items show similar behavior across Region and Channel. The data was split into subsets by Channel and by Region, and the descriptive statistics of each subset were compared. In the Hotel and Retail subsets the item variables show different variation, i.e. different standard deviations from the sample mean, as shown below in the description of the data.
(Data description tables: Channel ‘Hotel’ and Channel ‘Retail’)

From these observations, the total number of buyers/spenders is higher in Channel ‘Hotel’ than in Channel ‘Retail’. Fresh items are purchased more in Hotel than in Retail. The standard deviation of each variety of item also differs between the Hotel and Retail subsets of the data, which indicates that the varieties do not behave the same way in the two channels.
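The grouped figures behind Answers 1.1 and 1.2 (buyer counts, total spend, and the mean and standard deviation of each item per group) could be computed along the following lines. This is only a minimal sketch using the Python standard library, not the method actually used in the report. The function name `summarize_by`, the column names in `ITEMS` (the text names only Fresh, Milk, Grocery, Frozen and Delicatessen, so `Detergents_Paper` is a guess) and the sample rows are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

# Assumed item column names; "Detergents_Paper" is a guess, since the text
# names only Fresh, Milk, Grocery, Frozen and Delicatessen.
ITEMS = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicatessen"]


def summarize_by(rows, key, items=ITEMS):
    """Group rows by `key`; return buyer count, total spend and per-item (mean, std)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    summary = {}
    for name, members in groups.items():
        per_item = {}
        for item in items:
            values = [float(m[item]) for m in members]
            # The sample standard deviation needs at least two observations.
            spread = stdev(values) if len(values) > 1 else float("nan")
            per_item[item] = (mean(values), spread)
        # Total spend of the group over all item columns (the "Total" idea).
        total = sum(float(m[item]) for m in members for item in items)
        summary[name] = {"buyers": len(members), "total": total, "items": per_item}
    return summary


# Made-up rows for illustration only; they are not taken from the real file.
FIELDS = ["Channel", "Region"] + ITEMS
sample = [dict(zip(FIELDS, r)) for r in [
    ("Hotel", "Other", 120, 30, 40, 25, 5, 10),
    ("Hotel", "Lisbon", 80, 20, 30, 15, 5, 10),
    ("Retail", "Other", 30, 60, 90, 5, 40, 5),
]]

by_channel = summarize_by(sample, "Channel")
by_region = summarize_by(sample, "Region")
for name, stats in by_channel.items():
    print(name, stats["buyers"], stats["total"], stats["items"]["Fresh"])
```

With the real file, the rows could come from `csv.DictReader` applied to Wholesale Customer.csv, provided its column headers match the assumed names. Passing `"Region"` instead of `"Channel"` as the key gives the regional breakdown from the same function.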
(Data description tables: Region ‘Lisbon’, Region ‘Oporto’ and Region ‘Other’)

From analyzing the variables for the various regions, it can be concluded that Region Other has 316 buyers, followed by Region Lisbon with 77 buyers and Region Oporto with only 47 buyers. This suggests that more of the Hotels are in Region Other, since the maximum spend comes from the Hotel Channel and the largest number of buyers is in Region Other. Turning to spending on the individual items, the maximum spend in Region ‘Lisbon’ is on Fresh items, compared with Grocery in Region ‘Oporto’ and Fresh items in Region ‘Other’.
This is well substantiated by the standard deviation, which is inconsistent across the varieties in the different Region subsets. Thus we can conclude that the varieties do not all show similar behavior across Region and Channel. Also, the overall behavior of the items, without splitting the data by any particular Channel or Region, shows that they are not correlated with one another. Fresh, Milk, Frozen and Delicatessen do not have either a positive or a negative relationship with any other item.
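A statement about correlation between items can be checked by computing the Pearson correlation coefficient for every pair of item columns. The sketch below is one possible way to do that with the standard library. The helper names `pearson` and `correlation_pairs` and the sample values are hypothetical and chosen only to illustrate the calculation; they are not figures from the data set.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_pairs(rows, items):
    """Return {(item_a, item_b): r} for every unordered pair of item columns."""
    columns = {item: [float(row[item]) for row in rows] for item in items}
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(items, 2)
    }


# Made-up values for illustration only; they are not taken from the real file.
sample = [
    {"Fresh": 4, "Milk": 1, "Grocery": 2},
    {"Fresh": 1, "Milk": 2, "Grocery": 4},
    {"Fresh": 3, "Milk": 3, "Grocery": 6},
    {"Fresh": 2, "Milk": 4, "Grocery": 8},
]

pairs = correlation_pairs(sample, ["Fresh", "Milk", "Grocery"])
for (a, b), r in pairs.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

Values of r near 0 support a claim of no linear relationship, while values near +1 or -1 indicate a strong positive or negative one. Running this over all six item columns of the full data gives the figures needed to back the statement for each pair.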