I need help with these two questions, please. They are from my data mining class.

4. Suppose that a hospital tested the age and body fat data for 18 randomly selected adults with the
following results:

age: 23 23 27 27 39 41 47 49 50 52 54 54 56 57 58 58 60 61
fat: 9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7

(a) Calculate the 5 number summary for the age attribute, and use this to draw a boxplot for age. (2)
(b) List a possible source of noise or errors in this dataset. (1)
(c) Suppose that the last two values for the body fat attribute were missing. Choose a data cleaning
method for handling the missing data problem in this scenario. Justify your choice. (1)

5. Suppose that a data warehouse consists of the four dimensions, date, spectator, location, and game,
with the data cube measure charge, where charge is the fare that a spectator pays when watching a
baseball game on a given date. Spectators may be students, adults, or seniors, with each category
having its own charge rate.
(a) Design concept hierarchies for each of these dimensions that could be used to encode the levels of
aggregation needed for typical analyses of this collection of data. (2)
(b) How many cuboids are there in the data cube with the concept hierarchies that you’ve specified
(including the base and apex cuboids)? (2)
(c) Starting with the base cuboid [date, spectator, location, game], what specific OLAP operations
should you perform in order to list the total charge paid by student spectators at Camden Yards
in 2015? (2)
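
For 4(a), here is a minimal sketch of one way the five-number summary could be computed. It assumes the garbled 16th age value reads 58 (the summary comes out the same for any value between 58 and 60), and it uses the "median of each half" convention for the quartiles. Other quartile conventions, such as interpolation, can give slightly different Q1 and Q3, so check which one the class uses. The function name `five_number_summary` is my own, not from the course material.

```python
from statistics import median

# Age values from question 4 (16th value assumed to be 58).
ages = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Skip the middle element when the number of values is odd.
    upper = data[half + (len(data) % 2):]
    return (data[0], median(lower), median(data), median(upper), data[-1])

summary = five_number_summary(ages)
print(summary)  # (23, 39, 51.0, 57, 61)
```

Under this convention the box of the boxplot would span 39 to 57 with the median line at 51 and whiskers reaching 23 and 61.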

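For 5(b), the count depends on the hierarchies chosen in 5(a). A rough sketch, assuming the usual rule that a cube whose dimension i has L_i hierarchy levels (not counting the virtual top level "all") contains the product of (L_i + 1) cuboids. The level counts below are hypothetical placeholders, not the expected answer; substitute the depths of your own hierarchies.

```python
from math import prod

# Hypothetical number of levels per dimension, excluding the virtual "all" level.
# These depths are assumptions for illustration only.
levels = {
    "date": 4,       # e.g. day < month < quarter < year
    "spectator": 2,  # e.g. individual spectator < category (student/adult/senior)
    "location": 3,   # e.g. stadium < city < state
    "game": 2,       # e.g. single game < season
}

def count_cuboids(levels_per_dimension):
    """Total cuboids = product over dimensions of (levels + 1)."""
    return prod(n + 1 for n in levels_per_dimension.values())

print(count_cuboids(levels))  # 180 for the hypothetical depths above
```

With one level per dimension (no hierarchy), the same formula reduces to 2^4 = 16 cuboids for four dimensions, which is a quick sanity check.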