Problem Set 9
Topic: Categorical Explanatory Variables
1. If you believe that a statement is false, briefly say why you think
it is false.
a. The purpose of an interaction variable is to force fit the
1. (6 points) Two dice are rolled and the two resulting values are multiplied together to form the
quantity z. What are the expected value and variance of the random variable z?
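One way to check an answer to the dice question is to enumerate all 36 equally likely outcomes. The sketch below assumes two fair, independent six-sided dice; the names `faces`, `expected_z` and `variance_z` are illustrative and not part of the problem set.

```r
# Assumption: two fair, independent six-sided dice.
faces <- 1:6

# 6 x 6 table of products; each cell has probability 1/36.
z <- outer(faces, faces)

# Expected value: average over the 36 equally likely outcomes.
expected_z <- mean(z)                    # 12.25

# Population variance E[z^2] - (E[z])^2.
# var() is avoided here because it computes the sample variance.
variance_z <- mean(z^2) - expected_z^2   # 11515/144, about 79.97

print(c(expected = expected_z, variance = variance_z))
```

Because the dice are independent, the same values follow from E[z] = 3.5^2 and E[z^2] = (91/6)^2.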
2. (8 points) Two stoc
IMPORTING DATA INTO R
Introduction
Flat Files
5 Types
Flat Files
Excel Files
Statistical Software
Databases
Data from the Web
Imp
R Data Import/Export
Version 3.3.0 (2016-05-03)
R Core Team
This manual is for R, version 3.3.0 (2016-05-03).
Copyright © 2000–2016 R Core Team
Permission is granted to make and distribute verbatim co
Cleaning Data in R
Why care about cleaning data?
Collect
Clean
Analyze
Report
Cleaning data
Everything else
What we'll cover in this course
1. Exploring raw data
2. Tidying data
3.
Market Fundamentals
The basic questions on markets:
Q: What is a market?
A market is nothing more than a system of shared rules which can be laws
or collective understandings held in place by custom
INTRODUCTION TO R
Vector Arithmetic
> my_apples <- 5
> my_oranges <- 6
> my_apples + my_oranges
[1] 11
my_apples is a vector!
my_oranges is a vector!
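The claim that a single number is already a vector can be checked directly. The following is a minimal sketch using base R only; the name `fruit` is introduced here for illustration.

```r
my_apples <- 5
my_oranges <- 6

# A single number is a vector of length one.
is.vector(my_apples)   # TRUE
length(my_apples)      # 1

# Combine both values into one vector (the name `fruit` is illustrative).
fruit <- c(my_apples, my_oranges)

# Arithmetic is applied element-wise.
doubled <- fruit * 2   # 10 12
total <- sum(fruit)    # 11
```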
Computations a
What is MLE (maximum likelihood estimator)?
a. Used in estimating statistical parameters. It assumes a (NO) distribution of the parameter and
maximizes its joint probability distribution, estimate is obtained at the poin
Q: How would you calculate the variance of the columns of a matrix (called mat) in R without using
for loops?
A: This question establishes familiarity with R by indirectly asking about one of the bigg
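For the column-variance question above, a loop-free computation might look like the sketch below. The contents of `mat` are made up for illustration, and this is one possible approach rather than the answer given in the source.

```r
# Hypothetical example matrix; the values are illustrative only.
set.seed(1)
mat <- matrix(rnorm(20), nrow = 5, ncol = 4)

# apply() over margin 2 (columns) calls var() once per column.
col_vars <- apply(mat, 2, var)

# A fully vectorised alternative: centre each column, then sum the squares.
centered <- sweep(mat, 2, colMeans(mat))
col_vars_alt <- colSums(centered^2) / (nrow(mat) - 1)

print(col_vars)
```

Both versions return one sample variance per column, so the two results should agree up to floating-point error.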
Q: Suppose you have the option to go into one of two bank branches. Branch one has 10 tellers,
each with a separate queue of 10 customers, and branch two has 10 tellers, sharing one queue of
100 custo
15. What is the difference b/w R² and Adjusted R²?
R² is a statistic that will give some information about the goodness of fit of a model. In
regression, the R² coefficient of determination is a statistical measure of ho
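As an illustration of how the two statistics relate, the sketch below fits a linear model and recomputes adjusted R² from R² using the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1). The built-in `mtcars` data set and the choice of predictors are assumptions made only for this example.

```r
# Assumption: the built-in mtcars data set and these two predictors
# are used purely for illustration.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

n <- nrow(mtcars)   # number of observations
p <- 2              # number of predictors (wt and hp)

r2 <- s$r.squared

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r2_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, adj_manual = adj_r2_manual, adj_reported = s$adj.r.squared))
```

Adjusted R² is never larger than R², and the gap widens as predictors are added without improving the fit.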
27. How do you find the goodness of your model in GLM?
a. It's not R-square; here it is Chi-square.
b. Percent Correct Predictions
c. Hosmer and Lemeshow Goodness of Fit Test
d. ROC curves
e. Somers' D
f. Gamma
g. Tau-a
h. C
i. More than a d
22. What are the different types of rotation in Factor loading?
Varimax rotation is an orthogonal rotation of the factor axes to maximize the variance of
the squared loadings of a factor (column) on all the variables (row
DATA VISUALIZATION WITH GGPLOT2
Statistics with Geoms
ggplot2, course 2
Statistics
Coordinates
Facets
Themes
Data Visualization Best Practices
Case Study: California He
DATA MANIPULATION WITH DPLYR
Introduction
Group  dose 1  dose 2  Sum
A      3       3       6
A      4       5       9
B      3       1       4
B      1       3       4
C      1       3       4
C      2       2       4

n  min  mean  max
6  4    5.2   9

Group  Total
A      15
B      8
C      8
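The three tables above can be reproduced with a few dplyr verbs. This is a minimal sketch that assumes the dplyr package is installed; the object names `doses`, `overall` and `totals` and the column names `dose1` and `dose2` are illustrative stand-ins for the slide's "dose 1" and "dose 2".

```r
library(dplyr)

# Data copied from the first table; dose1/dose2 stand in for "dose 1"/"dose 2".
doses <- data.frame(
  Group = c("A", "A", "B", "B", "C", "C"),
  dose1 = c(3, 4, 3, 1, 1, 2),
  dose2 = c(3, 5, 1, 3, 3, 2)
)

# Row-wise sum of the two doses (the "Sum" column).
doses <- doses %>% mutate(Sum = dose1 + dose2)

# One-row summary over all observations (the n / min / mean / max table).
overall <- doses %>%
  summarise(n = n(), min = min(Sum), mean = mean(Sum), max = max(Sum))

# Per-group totals (the Group / Total table).
totals <- doses %>%
  group_by(Group) %>%
  summarise(Total = sum(Sum))

print(overall)
print(totals)
```

The mean is 31/6, which rounds to the 5.2 shown in the summary table.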
Data Ma
Once the data was downloaded, it had to be transformed to make sense of it.
Although this transformation was run across different parameters, only the
select parameters we
Based on the time series plot, we find a decline in trips per day, and hence in
revenue, for yellow cabs in NYC after May 2015. This also coincides with the medallion
amendmen
Methodology in Practice
1. Data Collection:
Data was scraped using a custom web scraper built in R. The scraper
used the tickers for 42 stocks that were part of
the portfolio
Backtesting Results
a) Maximising risk-adjusted return: The strategy finds the maximum
level of return at the minimum level of risk.
b) Minimising Expected Shortfall: This strategy minimises the expect
Risk Budgets Optimization
Portfolios are weighted and optimization is based on the constraints
specified by the risk budgets. A minimum Expected Shortfall Portfolio
(minES) forms the basis on which c
Objective of the Project
This project aims to build an efficient trading model incorporating
sound risk management. Implementing integrated risk management
is one of the basic building blocks of a sound
Final Model Performance:
Output & Test for
Interpretations:
1. A linear fit is adequate, with significant variables
LONGLOSS, SHORTLOSS, GPWPERSONAL, GPWCOMM and
LIQUIDRATIO.
2. Adjusted R-square
Running the 2nd regression model (using the significant
variables only): Once the significant independent variables
were identified, we proceeded to create a second regression
model, but this time only u
Multiple Linear Regression
Once the outliers were identified and removed, we proceeded to
carry out the regression on the data set. To do this, we
followed these steps:
Partition the dat
Insurance Company Expenses
Overview:
Like every other business, insurance companies seek to
minimize expenses associated with doing business in order to
enhance profitability. To study expenses, we ex
28 Graphical Representation of Data
28.1 INTRODUCTION
Whenever verbal problems involving a certain situation are presented visually to the
learners, it makes ea
DATA ANALYSIS - THE DATA TABLE WAY
INTRODUCTION
What is data.table?
Think of a data.frame as a set of columns
Every column is the same length but can be a different type
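The "set of columns" view can be checked in base R. The sketch below uses a plain data.frame with made-up column names and values (`id`, `name`, `score`); it does not use the data.table package itself.

```r
# Illustrative data only; column names and values are assumptions.
df <- data.frame(
  id = 1:3,
  name = c("a", "b", "c"),
  score = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

# A data.frame is a list of columns.
is.list(df)                       # TRUE

# Every column has the same length...
col_lengths <- lengths(df)        # 3 for each of id, name, score

# ...but each column can have its own type.
col_types <- sapply(df, class)    # "integer", "character", "numeric"

print(col_lengths)
print(col_types)
```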
Goal 1: Reduce programming time
(fewer functio