Cheat Sheet for R and RStudio
L. Jason Anastasopoulos
April 29, 2013
1 Downloading and Installation
First download R for your OS: R
Next download RStudio for your OS: RStudio
2 Uploading Data into RStudio
RStudio makes uploading CSV files into R extreme

Big Data - Hadoop/MapReduce
Sambit Sahu
Credit to: Rohit Wagle and Juan Rodriguez
Agenda
- Why Big Data?
- Apache Hadoop
  - Introduction
  - Architecture
  - Programming
Hypothetical Job
- You just got an awesome job at a data-mining start-up.
Congratulations!
Free S

Building a Scalable Application
A brief overview of your datacenter kernel, synchronous communication, and distributed messaging
Motivation
- What is a scalable application?
- How does cloud computing help?
- Cloud = Scalable?
- Are business applications and

Cloud and Big Data
Sambit Sahu
IBM Research
Course Objective
- Graduate-level course on cloud computing
  - Focus is on learning about and building extremely large-scale systems and applications that leverage the cloud.
  - Learn concepts as well as gain hands-on experience by u

Lecture 2: IaaS Cloud and Amazon EC2
Sambit Sahu, IBM Research
Recap from Lecture 1
Different Cloud Offerings: A Layered Perspective
- The higher in the stack, the less control but the more automation for the user
- The lower in the stack, the more control but the more responsibility fo

This is a summary of the paper "Bigtable: A Distributed Storage System for Structured Data." References
are abbreviated as (x.y), where x is the page number and y is the paragraph on that page.
Background
Google's Bigtable is a data structure similar to, but n

Lecture 3: Understanding On-demand Infrastructure
Sambit Sahu, IBM Research
Last week: IaaS Cloud and Amazon EC2
- We learned how to request a resource using the AWS programming APIs
  - Amazon EC2 SDK for Java on Eclipse
  - http://aws.amazon.com/eclipse/
A simple

To put it another way, the random variable $X$ in a binomial distribution can be defined as follows: let $X_i = 1$
if the $i$th Bernoulli trial is successful and $X_i = 0$ otherwise. Then $X = \sum_{i=1}^{n} X_i$, where the $X_i$ are
independent and identically distributed (iid). That is, X
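One way to check this definition numerically is to build $X$ literally as a sum of $n$ Bernoulli indicators and compare the simulated frequencies with the closed-form pmf $\binom{n}{k} p^k (1-p)^{n-k}$. This is a minimal sketch. The values $n = 10$ and $p = 0.3$, the seed, and the function names are illustrative assumptions, not taken from the text.

```python
import math
import random


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Closed-form P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def sample_binomial_via_bernoulli(n: int, p: float, rng: random.Random) -> int:
    """Draw X as the sum of n iid Bernoulli(p) indicators X_i (1 = success, 0 = failure)."""
    return sum(1 if rng.random() < p else 0 for _ in range(n))


def empirical_pmf(n: int, p: float, trials: int, seed: int = 0) -> list[float]:
    """Estimate P(X = k) for k = 0..n by drawing X many times (hypothetical helper)."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        counts[sample_binomial_via_bernoulli(n, p, rng)] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    n, p = 10, 0.3  # illustrative parameters
    est = empirical_pmf(n, p, trials=100_000)
    for k in range(n + 1):
        print(f"k={k:2d}  exact={binomial_pmf(k, n, p):.4f}  simulated={est[k]:.4f}")
```

With enough trials the two columns agree closely, which is what the iid-sum definition predicts.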

134 CHAPTER 4. CONDITIONAL PROBABILITY
Example 4.3 Consider our voting example from Section 1.2: three candidates A,
B, and C are running for office. We decided that A and B have an equal chance of
winning and C is only 1/2 as likely to win as A. Let A be
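The probabilities implied by these constraints can be worked out directly. Writing $P(A) = P(B) = 2P(C)$ and requiring the three to sum to 1 gives $5P(C) = 1$. The sketch below does this arithmetic with exact fractions. The extra step that conditions on the event "A does not win" is an illustrative addition for this chapter's topic, not part of the excerpt.

```python
from fractions import Fraction

# Constraints from the example: P(A) = P(B) and P(C) = P(A) / 2,
# so the relative likelihoods are A : B : C = 2 : 2 : 1.
weights = {"A": 2, "B": 2, "C": 1}
total = sum(weights.values())  # 2 + 2 + 1 = 5

# Normalize so the three probabilities sum to 1.
prob = {name: Fraction(w, total) for name, w in weights.items()}

# Illustrative (not in the excerpt): condition on E = "A does not win".
# B and C each imply E, so P(X | E) = P(X) / P(E) with P(E) = P(B) + P(C).
p_e = prob["B"] + prob["C"]
cond = {name: prob[name] / p_e for name in ("B", "C")}

if __name__ == "__main__":
    for name, p in prob.items():
        print(f"P({name}) = {p}")  # P(A) = 2/5, P(B) = 2/5, P(C) = 1/5
    for name, p in cond.items():
        print(f"P({name} | A does not win) = {p}")  # 2/3 for B, 1/3 for C
```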

Statistical Excel Functions:
Functions
Mean =AVERAGE(data range of cells in a column)
Median =MEDIAN(data range)
Mode =MODE.SNGL(data range)
Range =MAX(data range) - MIN(data range)
Standard deviation =STDEV.S(data range)
Variance =VAR.S(data range)
Cor
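For readers working outside Excel, Python's standard-library `statistics` module can compute the same summaries. This is a sketch of the mapping, not an exact equivalence. `STDEV.S` and `VAR.S` are the sample (n - 1) versions, and `statistics.stdev` and `statistics.variance` compute those as well. The data column is hypothetical. Edge cases such as ties, or a column with no repeated value, may be handled differently by `MODE.SNGL` and `statistics.mode`.

```python
import statistics


def summarize(column: list[float]) -> dict[str, float]:
    """Python counterparts of the Excel formulas listed above."""
    return {
        "mean": statistics.mean(column),          # =AVERAGE(data range)
        "median": statistics.median(column),      # =MEDIAN(data range)
        "mode": statistics.mode(column),          # =MODE.SNGL(data range)
        "range": max(column) - min(column),       # =MAX(data range) - MIN(data range)
        "stdev": statistics.stdev(column),        # =STDEV.S(data range), sample (n - 1)
        "variance": statistics.variance(column),  # =VAR.S(data range), sample (n - 1)
    }


if __name__ == "__main__":
    data = [4, 8, 6, 5, 3, 8, 9, 5, 8]  # hypothetical column of values
    for name, value in summarize(data).items():
        print(f"{name:>8}: {value:.3f}")
```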

Achieving Information Dominance: Unleashing the
Ozone Widget Framework
By Ms. Patricia Diercks, Captain George Galdorisi (U.S. Navy Retired), Ms. Amanda George, Mr. Brent
Brockman, Ms. Wanda Lam, Ms. Analiza Lozano, Ms. Rita Painter, and Mr. Glenn Tolenti

PRICING POLICY
Gari Jenkins
VALUE
Value = Benefits - Cost (do not just focus on price)
Attributes: Generic (must have) ~ order qualifying / non-compensatory
Discriminatory (win market share) ~ order winning / compensatory
. . . . how they combine to provide

Gareth James Daniela Witten Trevor Hastie Robert Tibshirani
An Introduction to Statistical Learning
with Applications in R
An Introduction to Statistical Learning provides an accessible overview of the field
of statistical learning, an essential toolset for

Foundations of Data Science
Lecture 3
Rumi Chunara, PhD
CS3943/9223
So Far
What is Data Science?
Data Handling
Doing Data Science
Intro to R
Types of Data
Data cleaning, sampling, processing
Today
Getting Data + AP

Date,United States,Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,Delaware,District of Columbia,Florida,Georgia,Hawaii,Idaho,Illinois,Indiana,Iowa,Kansas,Kentucky,Louisiana,Maine,Maryland,Massachusetts,Michigan,Minnesota,Mississippi,Misso

Assignment #1: Data Exploration
Spring 2016
CS3943/9223
Prof. Rumi Chunara
Total: 30 points
All questions must be completed in R. Implement and comment your code so that
anyone reading the file can easily reproduce it (e.g., set the file path once at

