Part 1: Planning
1. Plumbing Inc. has been selling plumbing supplies for the last 20 years. The owner, Joe,
Part 1: Text
1. What are two problems with using text in machine learning? Why are these not issues when using
numeric features as we have been using throughout class so far?
One major problem with using text is that text is un
Data mining: an application of data science; the extraction of useful/generalizable
info from the data (semi-automated)
-patterns
-generalizable=applicable to the population from which the data is dra
Part 1: Proposal Critique
Read the following proposal and provide 5 different critiques. Reference the ap
UN Global Pulse - Big Data For Development
Talk with Gijs Brouwer
8/2/2016
Part 1: Data Mining
1. For each of the following examples, tell me whether they are describing "Data Mining"
or "Data Mining in Use". For "Data Mining" tasks, please replace ANS with DM. For "Data
Susan Dos Santos
Data Distribution
There are many distributional forms that our marketing data can take on.
Most data follows a specific form of one type or another.
Because of this fact, we can make
Susan Dos Santos
Correlation Analysis
The strength of the linear relationship between two variables is assessed by using
correlation analysis. For example, are age and income somehow related? In other
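
As a rough illustration of the age and income question, the sketch below computes the Pearson correlation coefficient by hand. The function name pearson_r and the five data points are made up for this example and do not come from the original notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical sample: ages and annual incomes of five customers
ages = [25, 32, 41, 50, 58]
incomes = [30000, 42000, 55000, 61000, 72000]

r = pearson_r(ages, incomes)
print(f"r = {r:.3f}")  # values near +1 suggest a strong positive linear relationship
```

A value of r near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship.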
Susan Dos Santos
Chi Square and ANOVA Tests
Sometimes we want to evaluate the equality of more than 2 categories or groups.
For example, is the average weight loss among three diet plans the same or a
Susan Dos Santos
Data Organization
Qualitative data we display with
Frequency distributions
Bar charts
Pie charts
Quantitative data we display with
Frequency distributions or relative frequency distri
Susan Dos Santos
Measures of Central Tendency
Did you know that the average or mean is quite sensitive to extreme data values
(outliers) which could cause you to make incorrect conclusions based on th
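
A small sketch of that sensitivity, using made-up order values (the numbers are assumptions for illustration only): one extreme value moves the mean a long way, while the median barely changes.

```python
import statistics

# Hypothetical order values in dollars
orders = [48, 52, 50, 47, 53]
with_outlier = orders + [500]  # add one extreme value

mean_before = statistics.mean(orders)
median_before = statistics.median(orders)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```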
Susan Dos Santos
Sampling and Data Types
Every day marketers make decisions based on sample data
Based on the results of samples, inferences are drawn about the population as a whole
Statistics: A bra
Susan Dos Santos
The Central Limit Theorem
The assessment of sample means (averages and percentages) is the basis of many everyday
business decisions.
Therefore understanding exactly how an average i
Susan Dos Santos
Measures of Dispersion
The two main measures of dispersion of concern are:
The Range
The Variance and Standard Deviation
The range is the largest observation minus the smallest obse
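
A minimal sketch of these measures on a made-up data set. It assumes the sample (n - 1) versions of the variance and standard deviation, which the notes above may or may not intend.

```python
import statistics

# Hypothetical observations
data = [4, 8, 6, 5, 7]

# Range: largest observation minus smallest observation
data_range = max(data) - min(data)

# Sample variance and standard deviation (divide by n - 1)
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print("range:", data_range)
print("variance:", sample_variance)
print("standard deviation:", round(sample_sd, 3))
```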
Susan Dos Santos
Ensure Valid Test and Survey Results Through Proper
Sample Size Estimation
The more accuracy required in our estimates, the more we will need to sample.
The key here is to determine ho
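
One common way to make this trade-off concrete is the textbook formula for estimating a mean, n = (z * sigma / E)^2, sketched below. The formula choice, the function name sample_size_for_mean, and the example numbers are assumptions for illustration and are not taken from the notes above.

```python
import math

def sample_size_for_mean(sigma, margin, z=1.96):
    """Sample size needed to estimate a mean within +/- margin.

    sigma  : assumed population standard deviation
    margin : desired margin of error, in the same units as sigma
    z      : z-score for the confidence level (1.96 for about 95%)
    """
    # Round up, since a fractional respondent is not possible
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical survey: standard deviation of 15, two target margins of error
n_wide = sample_size_for_mean(sigma=15, margin=3)
n_tight = sample_size_for_mean(sigma=15, margin=1.5)

print(n_wide, n_tight)
```

Halving the margin of error roughly quadruples the required sample size, which is why tighter estimates become expensive quickly.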
Susan Dos Santos
Analyzing our Marketing Test, Survey Results and Other
Metrics Using Confidence Intervals
When we estimate population averages or percentages based on samples, a certain
amount of err