Coffee or Tea Specialty Shops
Jim Thorton’s Executive Team
Georgia Tselikis
March 28, 2020

Overview
The following analysis explores data collected on the market for coffee and tea over the past 23 years. Measures of central tendency, measures of spread, the distributions shown in histograms and box plots, linear correlation, and correlation type are all analyzed to provide a basis for recommending whether coffee or tea is the better option for investment in a specialty shop.

Measures of Central Tendency
The mean is the sum of all values in a data set divided by the number of values; it is also referred to as the average. For Tea (L per person), the mean is 70.94478261. For Coffee (L per person), the mean is 100.1286957. The Coffee mean is greater than the Tea mean, which indicates that, on average, more coffee than tea was consumed per person over the period. The mean describes the center of the data, not how spread out it is.

The median is the data point that lies in the middle of a data set when ordered, and is also referred to as the 50th percentile. For Tea (L per person), the median is 68.31. The mean for this data set, 70.94478261, is greater than the median, which suggests the data is positively skewed. For Coffee (L per person), the median is 101.31, which is greater than the mean of 100.1286957, which suggests the data is negatively skewed.

The modal interval is the interval that contains the most data points. For Tea (L per person), the modal interval is 55 - 70. This indicates that more years in the data set fall in the 55 - 70 litres of tea per person interval than in any other. For Coffee (L per person), the modal interval is 100 - 105.
This indicates that more years in the data set fall in the 100 - 105 litres of coffee per person interval than in any other.

Measures of Spread
Measures of spread describe the variability in a data set by summarizing the extent to which the data is clustered around the center. Three measures of spread are considered here: range, interquartile range (IQR), and standard deviation.
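As a rough illustration of the three measures named above, the following Python sketch computes each one with the standard library. The function name `spread_summary` and the sample values are hypothetical and are not taken from the report's data. The sketch also assumes an inclusive quartile method and the sample (n - 1) standard deviation; the report does not state which conventions were used, and other conventions give slightly different numbers.

```python
import statistics


def spread_summary(values):
    """Return the range, interquartile range, and sample standard deviation."""
    ordered = sorted(values)

    # Range: maximum data point minus minimum data point.
    data_range = ordered[-1] - ordered[0]

    # IQR: third quartile minus first quartile.
    # Assumption: "inclusive" quartiles; the report's method is not stated.
    q1, _median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1

    # Assumption: sample standard deviation (divides by n - 1).
    std_dev = statistics.stdev(ordered)

    return {"range": data_range, "iqr": iqr, "std_dev": std_dev}


# Hypothetical litres-per-person values, NOT the report's data.
sample = [2, 4, 4, 5, 7, 8]
print(spread_summary(sample))
```

Running the same function on the Tea and Coffee columns would allow a side-by-side comparison of all three measures, provided the same conventions are applied to both.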
The range is the difference between the maximum data point and the minimum data point. The range for Tea (L per person) is 69.2. This is neither extremely small nor extremely large, which indicates a moderate amount of variability. The range for Coffee (L per person) is 18.68. This is a small range, which indicates low variability. Comparing the two, the Tea data set has a higher level of variability than the Coffee data set: its range is roughly 3.7 times as large.