Tasks

1A. Initial data exploration

1. Identify the type of each attribute {row id, age, job, marital, ......., y} (nominal, ordinal, interval or ratio). If the type is not clear, you may need to justify why you chose it.

2. Identify the values of the summarising properties for the attributes including frequency, location and spread (e.g. value ranges of the attributes, frequency of values, distributions, medians, means, variances, percentiles, etc. - the statistics that have been covered in the lectures and materials given). Note that not all of these summary statistics will make sense for all the attribute types, so use your judgement! Where necessary, use proper visualisations for the corresponding statistics.
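As an illustration of the summary statistics listed in item 2, here is a minimal Python sketch using only the standard library. The sample values and the helper names `summarise_numeric` and `summarise_nominal` are assumptions made for this example; the actual data set is not shown here. Location and spread statistics are computed only for the numeric attribute, while the nominal attribute gets frequencies and a mode.

```python
from collections import Counter
import statistics

# Hypothetical sample values, for illustration only.
ages = [30, 33, 35, 30, 59, 35, 36, 39, 41, 43]
jobs = ["admin", "services", "admin", "technician", "admin"]


def summarise_numeric(values):
    """Location and spread statistics for an interval or ratio attribute."""
    # Quartiles using the "inclusive" method (data treated as the full population).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "q1": q1,
        "q3": q3,
    }


def summarise_nominal(values):
    """Frequency table and mode for a nominal attribute."""
    counts = Counter(values)
    return {"frequency": dict(counts), "mode": counts.most_common(1)[0][0]}


numeric_summary = summarise_numeric(ages)
nominal_summary = summarise_nominal(jobs)
```

A mean or variance of a nominal attribute such as "job" would be meaningless, which is why the two helpers are kept separate.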

3. Using KNIME or other tools, explore your data set and identify any outliers, clusters of similar instances, "interesting" attributes and specific values of those attributes. Note that you may need to 'temporarily' recode attributes to numeric or from numeric to nominal. In the report include the corresponding snapshots from the tools and explanation of what has been identified there. Present your findings in the assignment report.
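The assignment does not prescribe a particular outlier test. One common choice, assumed here purely for illustration, is the interquartile-range rule: flag any value more than 1.5 × IQR below the first quartile or above the third. The function name `iqr_outliers` and the sample durations are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical call durations in seconds; one value is far from the rest.
durations = [100, 120, 130, 150, 160, 180, 200, 2000]
flagged = iqr_outliers(durations)
```

The same check can be reproduced in KNIME or Excel; the sketch only shows the arithmetic behind the rule.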

1B. Data preprocessing

Perform each of the following data preparation tasks (each task applies to the original data) using your choice of tool:

a. Use the following binning techniques to smooth the values of the "campaign" attribute:
• equi-width binning
• equi-depth binning
In the assignment report, illustrate your steps for each of these techniques. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet. Use your judgement in choosing the appropriate number of bins, and justify this in the report.

b. Use the following techniques to normalise the attribute "duration":
• min-max normalization to transform the values onto the range [0.0-1.0]
• z-score normalization to transform the values
In your Excel workbook file place the results in separate columns in the corresponding spreadsheet.

c. Discretise the "age" attribute into the following categories: Adult, Mid-age and Old-age. Provide the frequency of each category in your data set. In the assignment report provide explanation about each of the applied techniques. In your Excel workbook file place the results in a separate column in the corresponding spreadsheet.

d. Binarise the "marital" variable [with values "0" or "1"]. In the assignment report provide explanation about the applied binarisation technique. In your Excel workbook file place the results in separate columns in the corresponding spreadsheet.
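The four preprocessing tasks in 1B can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the required solution. All function names are hypothetical. The age cut-offs (35 and 60) are arbitrary placeholders that you would need to choose and justify yourself. Smoothing is done by bin means, which is one of several options. Binarisation is interpreted as one 0/1 column per category.

```python
import statistics


def equi_width_bins(values, k):
    """Bin index (0..k-1) per value, using k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would fall on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equi_depth_bins(values, k):
    """Bin index per value so that each bin holds about len(values)/k items.
    Note: tied values can end up in neighbouring bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def smooth_by_bin_mean(values, bins):
    """Replace each value with the mean of its bin."""
    groups = {}
    for v, b in zip(values, bins):
        groups.setdefault(b, []).append(v)
    means = {b: statistics.mean(g) for b, g in groups.items()}
    return [means[b] for b in bins]


def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalisation onto [new_min, new_max]; assumes max > min."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def z_score(values):
    """Z-score normalisation using the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def discretise_age(age, mid_from=35, old_from=60):
    """Map an age to a category; the thresholds are assumed, not given."""
    if age >= old_from:
        return "Old-age"
    if age >= mid_from:
        return "Mid-age"
    return "Adult"


def binarise(values):
    """One 0/1 indicator column per distinct category."""
    return {c: [1 if v == c else 0 for v in values] for c in sorted(set(values))}


# Hypothetical "campaign" values, smoothed with 4 bins.
campaign = [1, 1, 2, 2, 3, 4, 5, 9]
width_bins = equi_width_bins(campaign, 4)
depth_bins = equi_depth_bins(campaign, 4)
depth_smoothed = smooth_by_bin_mean(campaign, depth_bins)
```

On skewed data such as the sample above, equi-width binning puts most values into the first bin, while equi-depth binning keeps the bin sizes balanced. That difference is worth discussing when you justify the number of bins.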
