Data Analysis in Finance and Risk Management Science
RMSC 4002

Fall 2015
Tutorial 1 Introduction to R
1. Development environment
1.1 Download R
Download URL:
https://cran.r-project.org/bin/linux/ (For GNU/Linux, choose your own distribution)
https://cran.r-project.org/bin/macosx/ (For OS X)
https://cran.r-project.org/bin/windows/
Attribute VB_Name = "Multivariate"
Option Explicit
Option Base 1
Function mcov(data As Range) As Variant
Dim i As Integer
Dim j As Integer
Dim nc As Integer
nc = data.Columns.Count
Dim cov() As Double
ReDim cov(nc, nc)
For i = 1 To nc
For j = 1
# improved kmeans()
# Try kmeans(x,k) several times and output the best (largest ratio) trial
# x is the matrix of input variable, k is the no. of clusters
# try is no. of trials
# display cluster size and stat, output cluster label
kmstat <- function(x, k)
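The comments above describe repeating kmeans() and keeping the best trial. A minimal sketch of that idea follows. It is not the original kmstat body: the name kmeans_best, the trials argument and its default, and the reading of "ratio" as between-cluster sum of squares over total within-cluster sum of squares are all assumptions made for illustration.

```r
# Hypothetical helper (not the original kmstat): run kmeans() several
# times and keep the trial with the largest ratio.
# Assumption: "ratio" = between-cluster SS / total within-cluster SS.
kmeans_best <- function(x, k, trials = 5) {
  best <- NULL
  best_ratio <- -Inf
  for (i in seq_len(trials)) {
    km <- kmeans(x, k)                        # random starting centers
    ratio <- km$betweenss / km$tot.withinss
    if (ratio > best_ratio) {
      best <- km
      best_ratio <- ratio
    }
  }
  cat("cluster sizes:", best$size, "\n")      # display cluster size
  cat("ratio:", best_ratio, "\n")             # display the statistic
  invisible(best$cluster)                     # output cluster labels
}

set.seed(1)
labels <- kmeans_best(iris[, 1:4], 3)         # built-in iris data
```

Because kmeans() starts from random centers, repeated trials can converge to different partitions, which is why keeping the best of several runs may help.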
Instructions on Using the tool
(Building a Classification Model)
Step 1: Enter Your Data
(A) Enter your data in the Data worksheet, starting from cell AC105
(B) The observations should be in rows and the variables should be in columns.
(C) Above each
# K-means clustering
d <- read.csv("iris.csv") # read in data
d1 <- d[, 1:4] # save the first 4 columns to d1
km <- kmeans(d1, 3) # K-means with k=3
print(km) # print result
readline("Hit <Return> to continue:")
plot(d1,col=km$cluster) # plot d1 with color
readlin
Tutorial 3
1. Interaction Terms
It is easy to include interaction terms in a linear model using the lm() function. The syntax
lstat:black tells R to include an interaction term between lstat and black. The syntax
lstat*age simultaneously includes lstat, a
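The two formula operators can be compared directly. The sketch below assumes the Boston data set from the MASS package and medv as the response; neither is stated in the text above. In R's formula syntax, a*b is shorthand for a + b + a:b, which the last line checks.

```r
library(MASS)  # assumption: Boston data set with medv, lstat, age, black

# ':' adds only the interaction term
fit_colon <- lm(medv ~ lstat:black, data = Boston)

# '*' adds both main effects and their interaction
fit_star <- lm(medv ~ lstat * age, data = Boston)

# The explicit long form of the '*' shorthand
fit_long <- lm(medv ~ lstat + age + lstat:age, data = Boston)

names(coef(fit_colon))
names(coef(fit_star))
all.equal(coef(fit_star), coef(fit_long))
```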
Tutorial 2 Simple and Multiple Linear Regression
1 Libraries
The library() function is used to load libraries, or groups of functions and data sets that are
not included in the base R distribution.
library(MASS)
library(ISLR)
If you receive an error mess
Bayesian Statistics
Exercises 3.
1. Jim Berger introduced the following medical diagnosis problem. Let
D denote the event that a patient has the disease. Consider a patient
sampled from a population where the disease prevalence is p0, so p0 =
Pr(D). Cons
Advanced Statistical Methods for Research
Solution to exam questions on logistic regression
Suggested solutions appear in bold
INSTRUCTIONS: Plan to spend about 1 hour on the problems on logistic regression.
Answer each question and show work in the space
Tutorial 5 KNN & Application
1. K-Nearest Neighbors
The knn() function requires four inputs.
A. A matrix containing the predictors associated with the training data,
labeled train.X below.
B. A matrix containing the predictors associated with the data for which
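A minimal sketch of a knn() call, using the function from the class package. The built-in iris data, the 100/50 split, k = 3, and the names test.X and train.cl are assumptions made for illustration; only train.X is named in the text above.

```r
library(class)  # knn() is provided by the 'class' package

set.seed(1)
idx <- sample(nrow(iris), 100)           # hypothetical training rows

train.X  <- as.matrix(iris[idx, 1:4])    # predictors, training data
test.X   <- as.matrix(iris[-idx, 1:4])   # predictors, data to classify
train.cl <- iris$Species[idx]            # class labels, training data

# Classify each test row by majority vote among its 3 nearest neighbors
pred <- knn(train.X, test.X, train.cl, k = 3)

# Confusion table of predicted vs. actual species
table(pred, iris$Species[-idx])
```

knn() breaks ties at random, so setting the seed first keeps the result reproducible.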
Homework Set 2
September 23, 2016
NOTICE: The homework is due on Oct. 8 (Saturday) before the class.
Please provide the R code and steps that you use to get your
solutions. You are allowed, and even encouraged, to discuss the homework
with your classmat
Group 6 Consulting
Memo
To: Professor Ye
From: Group 6, Section 2 (Ken Klein, Lev Lavrichtchev, Meridith Nelson, Don Schmidt, Saurabh Swaroop)
Re: Red Brand Canners
Executive Summary
To determine what amount of tomato products to pack in the coming year,
Instructions on Using the tool
(Building a Tree-based Classification Model)
Step 1: Enter Your Data
(A) Enter your data in the Data worksheet, starting from cell L24
Number of rows in your data should be between 10 and 10,000
Application won't build
d <- read.csv("stock.csv")
t1 <- as.ts(d$HSBC) # save as time series
t2 <- as.ts(d$CLP)
t3 <- as.ts(d$CK)
u1 <- (lag(t1) - t1) / t1 # compute u
u2 <- (lag(t2) - t2) / t2
u3 <- (lag(t3) - t3) / t3
msd <- function(t, w) { # function to compute moving s.d.
n <- length(t) - w + 1
out <- c()
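The return calculation and the moving standard deviation can be tried without stock.csv. The sketch below is one possible completion, not the original msd body: the synthetic price series, the name moving_sd, and the loop over windows are assumptions made for illustration.

```r
# Assumption: synthetic prices stand in for one column of stock.csv.
p <- as.ts(c(100, 102, 101, 105, 107, 106, 110))

# lag() shifts the series one step back in time, so at time t this is
# (p[t+1] - p[t]) / p[t]; the result has one fewer observation than p.
u <- (lag(p) - p) / p

# Hypothetical moving s.d.: standard deviation over each window of
# w consecutive observations.
moving_sd <- function(t, w) {
  stopifnot(w >= 2, w <= length(t))
  n <- length(t) - w + 1            # number of complete windows
  out <- numeric(n)
  for (i in seq_len(n)) {
    out[i] <- sd(t[i:(i + w - 1)])  # s.d. of the i-th window
  }
  out
}

s <- moving_sd(u, 3)
```

With 6 returns and a window of 3, the result has 6 - 3 + 1 = 4 values.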
Chapter 1
Multivariate Normal Distribution
The normal distribution is the most important distribution in statistics. The multivariate
normal distribution is the natural extension of the univariate normal distribution. In
this chapter, the multivariate normal dist
Chapter 2
Volatilities and Correlations
Volatilities and correlations are important concepts in quantitative finance, statistics
and risk management. They are important parameters in forecasting stock prices,
simulating stochastic models, pricing options a
Chapter 8
Cluster Analysis
So far, the methods we mentioned such as Binary Logistic regression, Classification
Tree and Neural Network can derive rules from a training data set. That is, the data
set has known group members. For example, stocks with known