We have obtained a data set from a restaurant containing the customer ID, the time when the customer came (either for lunch or for dinner) and the cost of the first, second and third course. Every course consists of a meal and some drinks. The food has a fixed cost (see the menu below), while the cost of the drinks is a random amount added on top (let's just say the restaurant has a very wide range of drinks).
The restaurant has different types of clients. There are business clients, who are more likely to order the more expensive dishes. The restaurant is located next to a fitness center and attracts some of its health-conscious members, who usually go for soups or salads and hardly ever take dessert. Across the street there is a retirement center; its residents usually take a three-course menu, often ending with a nice piece of pie. The remaining customers are mostly one-time customers who are passing by for a quick main dish.
Our task is first to check the data and look for signs of these customer types. Can we find which customers are likely to be which type? Once we have this, we can determine their likelihood of visiting, what dishes they usually order, etc. The restaurant has asked us to use this data to create some simulations.
Starters: Soup $3, Tomato-Mozzarella $15, Oysters $20
Mains: Salad $9, Spaghetti $20, Steak $25, Lobster $40
Desserts: Ice cream $15, Pie $10
Using the Python programming language:
• Read in the data from data.csv
• Make a plot of the distribution of the cost in each course
• Make a barplot of the cost per course
• Every course total is the cost of the food plus the cost of the drinks, so you can round each total down to the nearest dish price in that course to identify the dish. Determine the cost of the drinks per course. For example, if the cost of the first course is 4.7, you know it is Soup ($3) plus $1.70 of drinks. Similarly, 19.1 is Tomato-Mozzarella ($15) plus $4.10 of drinks. You can assume the cost of the drinks is never bigger than the difference in price between two dishes in one course.
• Create 6 additional columns from this: 3 with the actual food that was bought in each course and 3 with the cost of the drinks (the cost that was given minus the cost of the actual dish).
• Cluster the data on cost per course using k-means (the original columns, not the split-up ones).
You can assume there are 4 clusters. Don't use the time or any other information; just cluster on the three cost columns.
(Look at the unsupervised clustering example on the iris set. You can use the .values attribute to immediately get the right format out of a dataframe to use in the model.)
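The clustering step could look roughly like the sketch below. It assumes scikit-learn's KMeans is the intended model (the text only says "kmeans"), and it uses made-up stand-in data, because the column names in data.csv are not given here. With the real data, X would be the three cost columns taken from the dataframe through .values.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up stand-in for the three cost columns (first, second, third course).
# With the real data this would be something like
#   X = df[["FIRST", "SECOND", "THIRD"]].values
# where the column names are hypothetical.
rng = np.random.default_rng(0)
fake_centers = np.array(
    [[0, 22, 0], [4, 10, 0], [18, 33, 14], [5, 22, 11]], dtype=float
)
X = np.vstack([c + rng.normal(0.0, 1.0, size=(50, 3)) for c in fake_centers])

# Four clusters, as the assignment tells us to assume.
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = model.labels_  # one cluster label per row, ready to add to the data

# The cluster centres are the average cost per course in each cluster,
# which is a good starting point for deciding which group is which.
print(model.cluster_centers_.round(1))
```

The cluster numbers themselves are arbitrary, so which number means "business" or "healthy" has to be read off the centres.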
• Add the labels to your data
• Can you figure out which cluster is the healthy group, which is the retirement group, which is the business group and which is the normal (one-time) customers?
• Can you find out any specific characteristics per group?
• Can you determine the distribution of clients (how many of each group do I expect)?
So answering the question "What type of client is this?"
• Can you determine the likelihood for each of these clients to get a certain course?
So answering the question "How likely is this type of client to get a starter, main and dessert?"
• Can you figure out the probability of a certain type of customer ordering a certain dish?
For example: a one-time customer buys ice cream in 30% of cases and pie in 70% of cases (supposing they have dessert).
• Determine the distribution of dishes, per course, per customer type.
• Can you determine the distribution of the cost of the drinks per course? You should have that in a column of your data (remember, the cost of the drinks is the difference between the price paid and the actual cost of the dish).
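The dish/drinks split and the per-type dish shares asked for above can be sketched with the standard library alone. The function names, the course keys ("starter", "main", "dessert") and the convention that a skipped course has a total of 0 are assumptions made for this illustration; the example rows at the bottom are made up.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Menu prices taken from the assignment text.
MENU = {
    "starter": {"Soup": 3, "Tomato-Mozzarella": 15, "Oysters": 20},
    "main": {"Salad": 9, "Spaghetti": 20, "Steak": 25, "Lobster": 40},
    "dessert": {"Pie": 10, "Ice cream": 15},
}


def split_course(total, course):
    """Split one course total into (dish, drinks).

    The dish is the most expensive one whose price does not exceed the total.
    Assumption: a total below the cheapest dish means the course was skipped.
    """
    dishes = sorted(MENU[course].items(), key=lambda item: item[1])
    prices = [price for _, price in dishes]
    idx = bisect_right(prices, total) - 1
    if idx < 0:
        return None, 0.0
    name, price = dishes[idx]
    return name, round(total - price, 2)


def dish_shares(rows, course):
    """Share of each dish per customer type, among those who ordered the course.

    rows is an iterable of (customer_type, course_total) pairs.
    """
    counts = defaultdict(Counter)
    for customer_type, total in rows:
        dish, _ = split_course(total, course)
        if dish is not None:
            counts[customer_type][dish] += 1
    return {
        ctype: {dish: n / sum(c.values()) for dish, n in c.items()}
        for ctype, c in counts.items()
    }


# The two examples from the text.
print(split_course(4.7, "starter"))   # ('Soup', 1.7)
print(split_course(19.1, "starter"))  # ('Tomato-Mozzarella', 4.1)

# Made-up dessert totals for one customer type.
example = [("Onetime", 11.0), ("Onetime", 12.0), ("Onetime", 16.0), ("Onetime", 0.0)]
print(dish_shares(example, "dessert"))
```

On a dataframe, split_course can be applied per row to each of the three cost columns to fill the 6 additional columns.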
Now that we know these distributions, let's start making our own simulation (this means you will need random number generation).
• Create a programming structure that allows you to make these four types of clients.
• All clients get a certain ID. It seems our original data set didn't capture the same customers returning and gave everyone a different ID. For the simulation we are going to assume that our normal (one-time) customers only come to the restaurant once (so they get a unique ID). The other ones will be returning customers (so an ID that comes back).
For business customers, each time one arrives, in about 50% of the cases it is a customer who has visited before. For healthy customers this is about 70%. For retirement customers we assume 90% are returning. Every customer should have a function goToRestaurant that creates the first, second and third course (in words, not in price).
• Now, using this information and the distributions you have determined above (you now have the distribution of WHAT type of client comes, WHEN they come, IF they are returning or new customers, and what the probability is per course per dish):
• Create a file that contains 5 years of exactly 365 days and 20 courses per day, where you output:
# TIME, CUSTOMERID, CUSTOMERTYPE, COURSE1, COURSE2, COURSE3, DRINKS1, DRINKS2, DRINKS3, TOTAL1, TOTAL2, TOTAL3
# Lunch, ID0344566, "Business", "Oysters", "Lobster", "Ice Cream", 1.4, 2.3, 1.6, 21.4, 42.3, 16.6
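One possible structure for the simulation is sketched below. Only the menu prices, the returning shares (50%, 70%, 90%) and the function name goToRestaurant come from the text. Everything marked PLACEHOLDER (the mix of customer types, the dish probabilities, the lunch/dinner split and the drinks distribution) is made up and should be replaced by the values estimated from data.csv. The sketch also reads "20 courses per day" as 20 customer visits per day, which is an assumption.

```python
import csv
import random

PRICES = {"Soup": 3, "Tomato-Mozzarella": 15, "Oysters": 20,
          "Salad": 9, "Spaghetti": 20, "Steak": 25, "Lobster": 40,
          "Ice cream": 15, "Pie": 10}

# Returning shares from the assignment text; one-time customers never return.
RETURNING = {"Business": 0.5, "Healthy": 0.7, "Retirement": 0.9, "Onetime": 0.0}

# PLACEHOLDER: share of each customer type among all visits.
TYPE_SHARE = {"Business": 0.2, "Healthy": 0.2, "Retirement": 0.2, "Onetime": 0.4}

# PLACEHOLDER: dish probabilities per type and course (None = course skipped).
DISH_PROBS = {
    "Business": [{"Oysters": 0.6, "Tomato-Mozzarella": 0.3, None: 0.1},
                 {"Lobster": 0.6, "Steak": 0.4},
                 {"Ice cream": 0.5, "Pie": 0.3, None: 0.2}],
    "Healthy": [{"Soup": 0.8, None: 0.2},
                {"Salad": 0.9, "Spaghetti": 0.1},
                {None: 0.95, "Pie": 0.05}],
    "Retirement": [{"Soup": 0.6, "Tomato-Mozzarella": 0.4},
                   {"Spaghetti": 0.5, "Steak": 0.5},
                   {"Pie": 0.8, "Ice cream": 0.2}],
    "Onetime": [{None: 1.0},
                {"Spaghetti": 0.5, "Steak": 0.3, "Salad": 0.2},
                {None: 1.0}],
}

HEADER = ["TIME", "CUSTOMERID", "CUSTOMERTYPE", "COURSE1", "COURSE2", "COURSE3",
          "DRINKS1", "DRINKS2", "DRINKS3", "TOTAL1", "TOTAL2", "TOTAL3"]


class Customer:
    def __init__(self, customer_type, customer_id):
        self.customer_type = customer_type
        self.customer_id = customer_id

    def goToRestaurant(self, rng):
        """Return the three courses as dish names (None = course skipped)."""
        return [rng.choices(list(probs), weights=list(probs.values()))[0]
                for probs in DISH_PROBS[self.customer_type]]


def simulate(days, visits_per_day, seed=0):
    rng = random.Random(seed)
    known = {ctype: [] for ctype in RETURNING}  # earlier customers per type
    next_id = 0
    rows = []
    for _ in range(days * visits_per_day):
        ctype = rng.choices(list(TYPE_SHARE), weights=list(TYPE_SHARE.values()))[0]
        if known[ctype] and rng.random() < RETURNING[ctype]:
            customer = rng.choice(known[ctype])  # an ID that comes back
        else:
            next_id += 1
            customer = Customer(ctype, f"ID{next_id:07d}")
            known[ctype].append(customer)
        time = rng.choice(["Lunch", "Dinner"])  # PLACEHOLDER: 50/50 split
        dishes = customer.goToRestaurant(rng)
        # PLACEHOLDER: drinks uniform between 0 and 3, none if course skipped.
        drinks = [round(rng.uniform(0, 3), 2) if d else 0.0 for d in dishes]
        totals = [round(PRICES.get(d, 0) + x, 2) for d, x in zip(dishes, drinks)]
        rows.append([time, customer.customer_id, ctype, *dishes, *drinks, *totals])
    return rows


def write_csv(rows, path):
    """Write the simulated visits with the header from the assignment."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)


rows = simulate(days=5 * 365, visits_per_day=20)
print(len(rows), rows[0])
```

Calling write_csv(rows, "simulation.csv") (a hypothetical file name) then produces the output file. Because all distributions sit in a few dictionaries, a research question such as doubling the healthy customers or raising the price of the Spaghetti comes down to changing one entry and comparing the summed totals.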
• Compare your data to the input data. What is the same? What is different?
• How would the revenue of the restaurant change if they suddenly had twice as many healthy customers?
What if they increased the price of the Spaghetti?
Add one "research question" of your own like these and answer it.