I400 Data Mining
Lecture notes for January 31, 2005
Data Transformation
In data transformation, the data are modified into forms more appropriate for mining. Several
techniques may be viewed as data transformation techniques:
1. Smoothing these techniques
Yeung Wing Yan
53085940
MS4224 Enterprise Data Mining
Assignment 1
Q3
The weight of evidence (WOE) is calculated in the following table, ordered by WOE ascending.
[Table columns: STATUS; COUNT; BUY; NOT BUY; P(STATUS | BUY); P(STATUS | NOT BUY); WOE. The rows are truncated in the source.]
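The WOE formula itself is cut off in the answer above. As a rough sketch only, the code below assumes the common convention WOE = ln(P(STATUS | BUY) / P(STATUS | NOT BUY)); the status labels and counts are hypothetical and are not taken from the assignment data.

```python
import math

def weight_of_evidence(p_status_given_buy, p_status_given_not_buy):
    """WOE for one STATUS level, ASSUMING WOE = ln(P(STATUS|BUY) / P(STATUS|NOT BUY))."""
    return math.log(p_status_given_buy / p_status_given_not_buy)

# Hypothetical counts (not from the assignment): status -> (buyers, non-buyers)
counts = {"single": (30, 10), "married": (50, 50), "other": (20, 40)}
total_buy = sum(buy for buy, _ in counts.values())
total_not_buy = sum(not_buy for _, not_buy in counts.values())

# Conditional probabilities are the column shares of each status level
woe = {
    status: weight_of_evidence(buy / total_buy, not_buy / total_not_buy)
    for status, (buy, not_buy) in counts.items()
}

# List the status levels ordered by WOE ascending, as the table does
ordered = sorted(woe.items(), key=lambda item: item[1])
for status, value in ordered:
    print(f"{status:8s} {value:+.4f}")
```

Under this convention, a positive WOE marks a status level that is relatively more common among buyers than among non-buyers.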
Question 1
(a) Fill in the missing value of OPENPRICE by the median method.
The 7th and the 8th values of OPENPRICE are 0.51 and 0.7, respectively.
The missing value = (0.51 + 0.7)/2 = 0.605
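The median fill can be sketched in a few lines. Only the two middle values, 0.51 and 0.7, come from the answer above; every other price in the list is made up for illustration, and the helper name `fill_missing_with_median` is hypothetical.

```python
from statistics import median

def fill_missing_with_median(values):
    """Replace each None with the median of the known values."""
    known = [v for v in values if v is not None]
    m = median(known)  # with an even count, the mean of the two middle values
    return [m if v is None else v for v in values]

# Hypothetical OPENPRICE column: 14 known values plus one missing entry.
# Only the 7th and 8th sorted values (0.51 and 0.7) are from the assignment.
open_price = [0.10, 0.20, 0.25, 0.30, 0.40, 0.45, 0.51,
              0.70, 0.80, 0.90, 1.00, 1.10, 1.20, 1.30, None]

filled = fill_missing_with_median(open_price)
print(round(filled[-1], 3))
```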
(b)
For each project:
[Figure: cumulative probability (axis ticks 1.00, 0.98, 0.975) against Loss ($million) (axis ticks 1, 10), with a point marked X]
Assume you have 1000 trial numbers. Sort the loss amount from the smallest to the largest.
[Table: trial # (1, 2, 3, 4, 5, ..., 969 to 981) against loss amount; the loss amounts are not shown]
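One way to read a loss off the sorted trials is sketched below. It assumes that the loss at cumulative probability p is the ceil(p * n)-th smallest value (the 975th of 1000 trials for p = 0.975); the answer above does not state which convention it uses, and the simulated losses are hypothetical.

```python
import math
import random

def loss_at_cumulative_probability(losses, p):
    """Return the ceil(p * n)-th smallest loss (an assumed convention)."""
    ordered = sorted(losses)  # smallest to largest
    # round() guards against floating-point noise in p * n before taking ceil
    k = math.ceil(round(p * len(ordered), 9))
    return ordered[max(k, 1) - 1]

# Hypothetical simulation: 1000 trial losses in $million (not assignment data)
random.seed(0)
losses = [random.expovariate(1.0) for _ in range(1000)]

x_975 = loss_at_cumulative_probability(losses, 0.975)
x_980 = loss_at_cumulative_probability(losses, 0.98)
print(f"loss at 0.975: {x_975:.3f}, loss at 0.98: {x_980:.3f}")
```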
Mini Case Study: Crank Ltd
Crank has been in business since the 1920s and has three locations in the UK. Their
Head Office and main manufacturing site is in Leicester. This site makes complex tubular
assemblies for defence organisations, oil and gas and tr
Topic 6 Tutorial Questions
Q1
Do students at your school study more, less, or about the same as at other business schools?
Business Week reported that at the top 50 business schools, students stu
Sampling Distribution Question
Q1
Given a normal distribution with μ = 100 and σ = 12, if you select a sample of n = 36, what
is the probability that X̄ is
a) Less than 95?
b) Between 95 and 97.5?
c) Above 102.2?
d) There is a 65% chance that X̄ is above what v
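A possible numerical check for this question is sketched below. It assumes the symbols missing from the question are the population mean (100) and standard deviation (12), so that the sample mean is normal with standard error 12/sqrt(36) = 2, and it reads part d as asking for the value that the sample mean exceeds with probability 0.65.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 12, 36          # assumed reading of the question
standard_error = sigma / sqrt(n)    # 12 / 6 = 2
xbar = NormalDist(mu, standard_error)

p_a = xbar.cdf(95)                   # a) P(sample mean < 95)
p_b = xbar.cdf(97.5) - xbar.cdf(95)  # b) P(95 < sample mean < 97.5)
p_c = 1 - xbar.cdf(102.2)            # c) P(sample mean > 102.2)
x_d = xbar.inv_cdf(1 - 0.65)         # d) value exceeded with probability 0.65

print(f"a) {p_a:.4f}  b) {p_b:.4f}  c) {p_c:.4f}  d) {x_d:.2f}")
```

Unrounded results like these can differ in the last digit from answers built from a printed Z table.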
Q1
X ~ N(100, 10^2)
a)
P(X > 85) = P(Z > (85 - 100)/10)
= P(Z > -1.5) = 1 - P(Z <= -1.5)
= 1 - 0.0668 = 0.9332
b)
P(X < 80) = P (Z < -2) = 0.0228
c)
P(X < 80 or X > 110) = P (Z < -2) + P (Z > 1) = 0.1815
d) P (Xlower < X < Xupper) = 0.8
P(Z < -1.28) = 0.10
X
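The answers above can be cross-checked without a printed table, as in the sketch below. It takes X ~ N(100, 10^2) from the first line of the answer and, for part d, assumes a symmetric central interval with 10% in each tail, which is what P(Z < -1.28) = 0.10 suggests.

```python
from statistics import NormalDist

X = NormalDist(100, 10)  # mean 100, standard deviation 10

p_a = 1 - X.cdf(85)                  # a) P(X > 85)
p_b = X.cdf(80)                      # b) P(X < 80)
p_c = X.cdf(80) + (1 - X.cdf(110))   # c) P(X < 80 or X > 110)

# d) assumed symmetric interval holding the central 80% of the distribution
x_lower = X.inv_cdf(0.10)
x_upper = X.inv_cdf(0.90)

print(f"a) {p_a:.4f}  b) {p_b:.4f}  c) {p_c:.4f}")
print(f"d) {x_lower:.2f} to {x_upper:.2f}")
```

Part c comes out near 0.1814 when computed without rounding; the 0.1815 in the answer is the sum of the table values 0.0228 and 0.1587.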
Q1
The 3-D display makes the bars difficult to read. Focusing on the front of each bar,
the side of each bar, or the back of each bar gives different impressions.
The x-axis goes from right to left, instead of the usual direction left to right, the
Topic 1 Tutorial Questions
Q1
The following graph shows the U.S. household income data in 1985.
(Source: The U.S. Department of Labor)
Critique the graph in terms of its layout, content and clarity
Q2
Figure 1 below shows the profits of ABC company from 2
Confidence Interval
Q1
If X̄ = 120, σ = 24 and n = 36, construct a 99% confidence interval estimate of the population mean, μ.
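For Q1, a Z-based interval can be sketched as follows, assuming the given 24 is the known population standard deviation and 120 is the sample mean. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, sigma, n, confidence = 120, 24, 36, 0.99

# Two-sided critical value: 0.5% in each tail for a 99% interval
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sigma / sqrt(n)  # critical value times the standard error (24 / 6 = 4)

lower, upper = sample_mean - margin, sample_mean + margin
print(f"z = {z:.4f}, interval = ({lower:.2f}, {upper:.2f})")
```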
Q2
A stationery store wants to estimate the mean retail value of greeting cards that it has in its inventory.
A random sample of 100 greet
Probability
Q1
Each year, ratings are compiled concerning the performance of new cars during the first 90
days of use. Suppose that the cars have been categorized according to whether the car needs
warranty-related repair (yes or no) and the country in wh
Basic Probability Answers
Q1
Needs warranty-related repair    U.S.     Non-U.S.   Total
Yes                              0.025    0.015      0.04
No                               0.575    0.385      0.96
Total                            0.600    0.400      1.00
a. P(needs warranty repair) = 0.04
b. P(needs warranty repair and manufacturer based in U.S.) = 0.025
c. P(ne
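The marginal totals in the table follow from summing the joint probabilities, which the sketch below illustrates. The four joint values are taken from the table; the dictionary layout and key names are only an illustration.

```python
# Joint probabilities from the table: (needs repair, manufacturer base) -> probability
joint = {
    ("yes", "US"): 0.025,
    ("yes", "non-US"): 0.015,
    ("no", "US"): 0.575,
    ("no", "non-US"): 0.385,
}

# Marginal probabilities: sum the joint cells over the other variable
p_repair = sum(p for (repair, _), p in joint.items() if repair == "yes")
p_us = sum(p for (_, base), p in joint.items() if base == "US")

# Joint probability is read directly from one cell
p_repair_and_us = joint[("yes", "US")]

print(f"P(repair) = {p_repair:.3f}")
print(f"P(U.S.) = {p_us:.3f}")
print(f"P(repair and U.S.) = {p_repair_and_us:.3f}")
```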
MS2200 Business Statistics
Topic 8
Simple Linear Regression
Reference
Levine, D.M., Krehbiel, T.C. and Berenson, M.L., Business Statistics: A First Course, 6/e
(International Edition), 2013, Pearson Education Ltd, Chapter 3 & 12
Outline
Linear Associati
MS2200 Business Statistics
Topic 7
Inference for the Proportion
Reference
Levine, D.M., Krehbiel, T.C. and Berenson, M.L., Business Statistics: A First Course, 6/e
(International Edition), 2013, Pearson Education Ltd, Chapter 7 & 8 & 9
Outline
Sampling
MS2200 Business Statistics (Week 6)
Review Questions
Z-table look up and rounding
Z      0.00     0.01     0.02
0.0    0.5000   0.5040   0.5080
0.1    0.5398   0.5438   0.5478
0.2    0.5793   0.5832   0.5871
0.3    0.6179   0.6217   0.6255
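The table entries are cumulative standard normal probabilities rounded to four decimal places. A small sketch that regenerates the same rows is given below; the helper name `z_table_entry` is hypothetical.

```python
from statistics import NormalDist

def z_table_entry(z):
    """Cumulative standard normal probability P(Z <= z), rounded to 4 decimals."""
    return round(NormalDist().cdf(z), 4)

# Row label gives the first decimal of z, column label gives the second decimal
for row in (0.0, 0.1, 0.2, 0.3):
    cells = [z_table_entry(round(row + col, 2)) for col in (0.00, 0.01, 0.02)]
    print(f"{row:.1f}    " + "   ".join(f"{c:.4f}" for c in cells))
```

For example, z = 0.12 is found in row 0.1 and column 0.02.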
Example
1. Weights of a certain population can be as
Dr Susanna TAM
Office: AC1-P7615
Tel: 3442-7483
Email: susannat@cityu.edu.hk
What is Statistics?
Statistics: the branch of mathematics that transforms data into useful information for decision makers.
Principles of
Probability
Descriptive Statistics
Coll
Dr. Eman Leung
Office: AC1-G7508
Tel: 3442-8374
Email: emaleung@cityu.edu.hk
Review Question #1
You are trying to develop a strategy for investing in two different
stocks. The anticipated annual return for a $1,000 investment in
each stock has the fol
MS2200 Business Statistics
Topic 4
Sampling Distributions
Reference
Levine, D.M., Krehbiel, T.C. and Berenson, M.L., Business Statistics: A First Course, 6/e
(International Edition), 2013, Pearson Education Ltd, Chapter 7
Outline
Sampling Distribution o
MS2200 Business Statistics
Week 2 Review:
1. Joint, Compound Events
2. Conditional Probability and Statistical Independence
3. Sample Statistics vs. Population Parameters (will
CITY UNIVERSITY OF HONG KONG
Course code & title: MS4102 Business Forecasting Methods
Session: Semester B, 2009/2010
Time allowed: Two hours
This paper has 17 pages (including this page)
Materials, aids and instruments permitted to be used during exa
Chapter Topics
Multiple regression
Autocorrelation
Regression Method
Regression Methods
Simple Linear Regression
To forecast an outcome (response variable,
dependent variable) of a study based on a
certain number of factors (explanatory varia
MS6215: Forecasting Methods for Business
Mid-session test
1) What is the value of h in expression (1.1)?
2) How many unknown parameters are there in (1.1) and what are t