Problem Set 9
Topic: Categorical Explanatory Variables
1. If you believe that a statement is false, briefly say why you think
it is false.
a. The purpose of an interaction variable is to force fit the two
groups to be parallel.
b. To check the similar var
1. (6 points) Two dice are rolled and the two resulting values are multiplied together to form the
quantity z. What are the expected value and variance of the random variable z?
2. (8 points) Two stocks are available. The corresponding expected rates of r
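For question 1 above (the product of two dice), one way to check a hand calculation is to enumerate all 36 equally likely outcomes. This is only a sketch, assuming two fair six-sided dice rolled independently; the variable names are illustrative.

```r
# Enumerate every outcome of two fair dice and form the product z.
faces <- 1:6
z <- as.vector(outer(faces, faces))   # all 36 products, each with probability 1/36

# Expected value and population variance of z.
# var() is avoided here because it divides by n - 1 rather than n.
expected_z <- mean(z)
variance_z <- mean(z^2) - expected_z^2

print(expected_z)
print(variance_z)
```

Because the dice are independent, the same expected value also follows from E[z] = E[x] * E[y] = 3.5 * 3.5.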
R Data Import/Export
Version 3.3.0 (2016-05-03)
R Core Team
This manual is for R, version 3.3.0 (2016-05-03).
Copyright © 2000-2016 R Core Team
Permission is granted to make and distribute verbatim copies of this manual provided
the copyright notice and t
Cleaning Data in R
Why care about cleaning data?
[Figure: Collect, Clean, Analyze, Report; labels "Cleaning data" and "Everything else"]
What we'll cover in this course
1. Exploring raw data
2. Tidying data
3. Preparing data for analysis
4. Putting it all together
Market Fundamentals
The basic questions on markets:
Q: What is a market?
A market is nothing more than a system of shared rules which can be laws
or collective understandings held in place by custom or explicit
agreement.
Collective
Understandings
CULTU
INTRODUCTION TO R
Vector Arithmetic
> my_apples <- 5
> my_oranges <- 6
> my_apples + my_oranges
[1] 11
my_apples is a vector!
my_oranges is a vector!
Computations are performed element-wise
> earnings <- c(50, 100, 30)
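A short sketch of what element-wise computation looks like for the earnings vector above. The expenses vector is a hypothetical addition for illustration and is not taken from the slides.

```r
earnings <- c(50, 100, 30)
expenses <- c(30, 40, 80)    # hypothetical values, assumed for this example

# Arithmetic on two vectors of equal length works position by position.
profit <- earnings - expenses

# A single number is recycled across every element of the vector.
tripled <- earnings * 3

print(profit)
print(tripled)
```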
What is MLE (maximum likelihood estimator)?
a. Used in estimating statistical parameters. It assumes a (NO) distribution of the parameter and
maximizes its joint probability distribution; the estimate is obtained at the point where the probability
distribution of the parameter is maximum.
b
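As an illustration of the idea of maximizing a likelihood, the sketch below estimates the rate of an exponential distribution numerically and compares it with the known closed-form estimate, 1 / mean(x). The simulated data, the exponential model, and the search interval are all assumptions made for the example.

```r
set.seed(42)
x <- rexp(200, rate = 2)    # simulated sample; the true rate of 2 is an assumption

# Log-likelihood of the sample as a function of the unknown rate.
log_lik <- function(rate) sum(dexp(x, rate = rate, log = TRUE))

# Numerical maximization over an assumed plausible interval.
fit <- optimize(log_lik, interval = c(0.01, 20), maximum = TRUE)
mle_numeric <- fit$maximum

# Closed-form MLE for the exponential rate.
mle_closed <- 1 / mean(x)

print(c(numeric = mle_numeric, closed_form = mle_closed))
```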
Q: How would you calculate the variance of the columns of a matrix (called mat) in R without using
for loops?
A: This question establishes familiarity with R by indirectly asking about one of the biggest flaws of
the language. If the candidate has used it
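One possible loop-free answer is sketched below. The matrix contents are made up for the example; apply() and var() are base R functions.

```r
set.seed(1)
mat <- matrix(rnorm(20), nrow = 5, ncol = 4)   # example data, assumed for illustration

# apply() with MARGIN = 2 calls var() once per column.
col_vars <- apply(mat, 2, var)

# Alternative: var() on a matrix returns the covariance matrix,
# whose diagonal holds the column variances.
col_vars_alt <- diag(var(mat))

print(col_vars)
```

Both forms return one sample variance per column, so the result has length ncol(mat).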
Q: Suppose you have the option to go into one of two bank branches. Branch one has 10 tellers,
each with a separate queue of 10 customers, and branch two has 10 tellers, sharing one queue of
100 customers. Which do you choose?
A: This question establishes
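A rough simulation can make the comparison concrete. This sketch assumes exponential service times with mean 1, every teller starting a new customer at time 0, and first-come-first-served order; none of these assumptions come from the original answer.

```r
set.seed(1)
n_rep <- 2000

# Branch one: you join one queue, so you wait for 10 customers at a single teller.
wait_separate <- replicate(n_rep, sum(rexp(10, rate = 1)))

# Branch two: 100 customers ahead of you share 10 tellers.
wait_shared <- replicate(n_rep, {
  free_at <- rep(0, 10)                 # time at which each teller next becomes free
  for (s in rexp(100, rate = 1)) {
    i <- which.min(free_at)             # next customer goes to the first free teller
    free_at[i] <- free_at[i] + s
  }
  min(free_at)                          # the moment a teller is free for you
})

print(c(mean_separate = mean(wait_separate), mean_shared = mean(wait_shared)))
print(c(var_separate = var(wait_separate), var_shared = var(wait_shared)))
```

Under these assumptions the average waits are of similar size, while the shared queue shows a much smaller spread in waiting time.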
15. What is the difference between R2 and Adjusted R2?
R2 is a statistic that will give some information about the goodness of fit of a model. In
regression, the R2 coefficient of determination is a statistical measure of how well the regression
line approximates the real data points. An
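The relationship between the two statistics can be checked in R. The sketch below uses the built-in mtcars data and an arbitrary two-predictor model, both chosen only for illustration, and applies the standard adjustment 1 - (1 - R2)(n - 1)/(n - p - 1).

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # example model, assumed for illustration

r2 <- summary(fit)$r.squared
n  <- nrow(mtcars)    # number of observations
p  <- 2               # number of predictors, excluding the intercept

# Adjusted R2 penalizes R2 for the number of predictors in the model.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, adj_r2 = adj_r2, from_summary = summary(fit)$adj.r.squared))
```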
27. How do you find the goodness of your model in GLM?
a. It's not R-square; here it is Chi-square.
b. Percent Correct Predictions
c. Hosmer and Lemeshow Goodness of Fit Test
d. ROC curves
e. Somers' D
f. Gamma
g. Tau-a
h. C
i. More than a dozen R2-type summaries
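The first two checks in the list above can be sketched with base R. The logistic model on mtcars and the 0.5 classification cutoff are assumptions made for the example.

```r
# Example logistic regression, assumed for illustration.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Chi-square (likelihood ratio) test: drop in deviance relative to the null model.
lr_stat <- fit$null.deviance - fit$deviance
lr_df   <- fit$df.null - fit$df.residual
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Percent correct predictions using an assumed cutoff of 0.5.
predicted   <- as.integer(fitted(fit) > 0.5)
pct_correct <- mean(predicted == mtcars$am)

print(c(lr_stat = lr_stat, p_value = p_value, pct_correct = pct_correct))
```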
28. What criteria do you use to find paramete
22. What are the different types of rotation in Factor loading?
Varimax rotation is an orthogonal rotation of the factor axes to maximize the variance of
the squared loadings of a factor (column) on all the variables (rows) in a factor matrix, which has
the effect of differentiating
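Base R provides varimax() in the stats package. The sketch below applies it to loadings built from a principal component analysis of a few mtcars columns; the choice of data, of two components, and of PCA loadings as the starting point are all assumptions made for illustration.

```r
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")
pca  <- prcomp(mtcars[, vars], scale. = TRUE)

# Unrotated loadings: eigenvectors scaled by the component standard deviations.
raw_loadings <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])

# Varimax returns the rotated loadings and the rotation matrix.
rotated <- varimax(raw_loadings)

print(rotated$loadings)
print(rotated$rotmat)
```

Because the rotation is orthogonal, t(rotmat) %*% rotmat is the identity matrix.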
DATA VISUALIZATION WITH GGPLOT2
Statistics with Geoms
ggplot2, course 2
Statistics
Coordinates
Facets
Themes
Data Visualization Best Practices
Case Study: California Health Information Survey
Once the data was downloaded, it had to be transformed before any sense could be made of it.
Although the transformation was run across different parameters, only selected parameters
were considered for the final analysis. Following are som
The time series plot shows a decline in trips per day, and hence in revenue, for yellow cabs
in NYC after May 2015. This coincides with the medallion
amendments by the NYC Taxi and Limousine Commission and the adve
Methodology in Practice
1. Data Collection:
Data was scraped using a custom web scraper built in R. The scraper
used the tickers for the 42 stocks which were part of
the portfolio to extract their daily data starting
March 2012 till M
Backtesting Results
a) Maximize Risk-Adjusted Return: This strategy finds the maximum
level of return at the minimum level of risk.
b) Minimize Expected Shortfall: This strategy minimizes the expected
loss at the 95% confidence level.
c) Minimize Expect
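To make the Expected Shortfall criterion in item b concrete, here is a minimal historical estimate at the 95% level. The returns are simulated and purely hypothetical, and this is not the project's own optimization code.

```r
set.seed(7)
returns <- rnorm(1000, mean = 0.0005, sd = 0.01)   # hypothetical daily returns

losses <- -returns

# 95% Value at Risk: the loss exceeded on only 5% of days.
var_95 <- quantile(losses, probs = 0.95, names = FALSE)

# 95% Expected Shortfall: the average loss on days at or beyond the VaR.
es_95 <- mean(losses[losses >= var_95])

print(c(VaR_95 = var_95, ES_95 = es_95))
```

Expected Shortfall is never smaller than the Value at Risk at the same level, since it averages only the losses in the tail.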
Risk Budgets Optimization
Portfolios are weighted and optimization is based on the constraints
specified by the risk budgets. A minimum Expected Shortfall Portfolio
(minES) forms the basis on which constraints can be applied based on
the risk tolerance o
Objective of the Project
This project aims to build an efficient trading model incorporating
sound risk management. Implementing integrated risk management
is one of the basic building blocks of a sound trading strategy. This
approach makes it possible to li
Final Model Performance:
Output & Test for
Interpretations:
1. A linear fit is adequate, with significant variables
LONGLOSS, SHORTLOSS, GPWPERSONAL, GPWCOMM and
LIQUIDRATIO.
2. Adjusted R-squared is 99.51%.
3. Residuals are homoscedastic but not nor
Running the 2nd Regression model (using the significant
variables only): Once the significant independent variables
were identified, we proceeded to create a second regression
model, this time using only the significant variables. The
output for which is as
Multiple Linear Regression
Once the outliers were identified and removed, we proceeded to
carry out the regression on the data set. To do this, we
followed these steps:
Partition the data: The data was partitioned into Train and Test
data se
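A partition of this kind can be sketched in base R as follows. The 70/30 split, the seed, and the use of the built-in mtcars data as a stand-in for the insurance data are all assumptions; the source does not state the proportions used.

```r
set.seed(123)
dat <- mtcars                   # stand-in data set, assumed for illustration
n   <- nrow(dat)

# Draw 70% of the row indices at random for training (assumed proportion).
train_idx <- sample(n, size = floor(0.7 * n))

train <- dat[train_idx, ]
test  <- dat[-train_idx, ]      # the remaining rows form the test set

print(c(train_rows = nrow(train), test_rows = nrow(test)))
```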
Insurance Company Expenses
Overview:
Like every other business, insurance companies seek to
minimize expenses associated with doing business in order to
enhance profitability. To study expenses, we examine a random
sample of 384 insurance companies from t
28 Graphical Representation of Data
28.1 INTRODUCTION
Whenever a verbal problem involving a certain situation is presented visually to the
learners, it becomes easier for them to understand the problem and atte
DATA ANALYSIS - THE DATA TABLE WAY
INTRODUCTION
What is data.table?
Think of a data.frame as a set of columns
Every column is the same length but can have a different type
Goal 1: Reduce programming time
(fewer function calls, less variable name repetition)
Goal 2: Reduce