36-350 Homework 2 Due Thursday, September 11,
2014, 11:59 PM on Blackboard
36-350 Homework 1 Due Thursday, September 4, 2014,
Homework Assignment # 8
36-350, Data Mining
SOLUTIONS
1. The Prototype Method Really Is a Linear Classifier
(a) Show that the prototype method is really a linear classifier of the form
$\mathrm{sgn}(b + x \cdot w)$, and find $b$ and $w$ in terms of $c_+$ and $c_-$.
Answer: x is closer to
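
As a numerical sanity check on part (a), here is a minimal sketch in R. It assumes the prototype method assigns x to the class whose centre ($c_+$ or $c_-$) is nearer in Euclidean distance; expanding the two squared distances then suggests $w = c_+ - c_-$ and $b = (\|c_-\|^2 - \|c_+\|^2)/2$. The centres and test points are made up for illustration, and the names `c_plus`, `c_minus`, `prototype_class` and `linear_class` are hypothetical.

```r
# Hypothetical class centres (prototypes); any two distinct points would do.
c_plus  <- c(1, 2)
c_minus <- c(-1, 0)

# Prototype rule: +1 if x is nearer to c_plus, -1 otherwise.
prototype_class <- function(x) {
  d_plus  <- sum((x - c_plus)^2)
  d_minus <- sum((x - c_minus)^2)
  if (d_plus < d_minus) 1 else -1
}

# Candidate linear form sgn(b + x . w), from expanding the squared distances.
w <- c_plus - c_minus
b <- (sum(c_minus^2) - sum(c_plus^2)) / 2
linear_class <- function(x) sign(b + sum(x * w))

# Compare the two rules on 100 random test points (one point per row).
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)
agree <- all(apply(X, 1, prototype_class) == apply(X, 1, linear_class))
print(agree)
```

The two rules can differ only for points exactly equidistant from both centres, which random continuous draws will almost never produce.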
Homework 1
36-350: Data Mining
SOLUTIONS
1. (a) What is the bag-of-words representation of the sentence "to be or not
to be"?
Answer: A vector with one component for each word in our dictionary, all of them zero except for the following:
be  not  or  to
2   1    1
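
The counts can be reproduced with a short R sketch. It assumes words are split on whitespace after lower-casing, and it only tabulates words that actually occur in the sentence, so the zero entries for the rest of the dictionary are left implicit. The variable names are illustrative.

```r
# Split the sentence into lower-case words and count each distinct word.
sentence <- "to be or not to be"
words <- strsplit(tolower(sentence), "\\s+")[[1]]
bow <- table(words)   # named counts, sorted alphabetically by word
print(bow)
```

A full bag-of-words vector would place these counts in the positions of a fixed dictionary and fill every other position with zero.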
Homework 3: Super Scalper Scrape
36-350, Fall 2014
Lab 7: How the Tetracycline Came to Peoria
36-350
10 October 2014
Agenda: Transforming data; combining information from multiple objects; practice with selective access;
practice applying functions.
Now-common ideas like early adopters and viral marketing
Lab 12: International Chess: Hot Or Not?
One of the earliest examples of a convergent, adaptive Markov process was the rating system devised by
Arpad Elo to rank chess players. It has endured for so long as a simple system that it is used as a
Lab 8: How the Doctors Got Their Scrips
36-350
24 October 2014
Agenda: Writing functions to automate jobs; applying functions; creating new variables.
In the last lab, we worked with one of the classic data sets on the diffusion of innovations, on the sprea
Lab 10: Debt, the Last Seventy Years
36-350
7 November 2014
Agenda: Practicing split-apply-combine.
Gross domestic product (GDP) is a measure of the total market value of all goods and services produced in a
given country in a given year. The percentage g
36-350: Lab 1, August 29 2014
Today's agenda: Manipulating data objects; using the built-in functions, doing numerical calculations, and
basic plots; reinforcing core probabilistic ideas.
General instructions for labs: Upload an R Markdown file, named with y
Lab 11: Money, It's A Hit
This lab shows how R interfaces with other programs, particularly SQL database management. We will use
the package RSQLite, which not only provides an R interface but also installs a minimal library for database
access.
Unless you
Homework 10: The Students Break the Bank at Monte
Carlo
36-350
Due at 11:59 pm on Monday, 24 November 2014
This is the final homework of the semester. Celebrate on your own time.
Now that we are discussing simulation, for loops are back on the table. Le
36-350 Lab 6 October Surprise
The distribution of talent is a major question of statistical research and a foundation of many young statistical
careers (including 75% of your instructors), and there is no shortage of the distribution of observable talent
Homework 5: Pareto and Kuznets on the Grand Tour
36-350
Due at 11:59 pm on Thursday, 2 October 2014
We continue working with the World Top Incomes Database
[http://topincomes.g-mond.parisschoolofeconomics.eu], and the Pareto distribution, as in the lab. W
Homework 6: Bug Hunt
36-350
Due at 11:59 pm on Thursday, 23 October 2014
In this assignment you will debug each of the functions included in the accompanying R script so that they
produce the correct results.
source("hw-06-supplement.R")
You will replace
Homework 4: The Death and Life of Great American
City Scaling Laws
36-350, Fall 2014
Due at 11:59 pm on Thursday, 25 September 2014
Homework 9: Growth, Debt, and Time
36-350
Due at 11:59 pm on Monday, 17 November 2014
We continue to work with the data set on economic growth and government debt from lab 10.
1. Load the data and make a scatter-plot of GDP growth (vertical axis) against
Homework Assignment 7
36-350, Data Mining
Solutions
1. Base rates (10 points)
(a) What fraction of the e-mails are actually spam?
Answer: 39%.
> sum(spam$spam=="spam")
[1] 1813
> 1813/nrow(spam)
[1] 0.3940448
(b) What should the constant classifier predict?
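
The base-rate calculation in (a), and the reasoning a constant classifier relies on, can be sketched in R on a toy label vector. The data frame `toy` and its labels are made up for illustration and are not the real e-mail data. The sketch assumes misclassification rate (0-1 loss) is the criterion, under which a constant classifier does best by always predicting the most common label.

```r
# Toy stand-in for the e-mail data (made up; not the real data set).
toy <- data.frame(spam = c("spam", "email", "email", "spam", "email"))

# Base rate: fraction of messages labelled "spam".
base_rate <- mean(toy$spam == "spam")

# Under 0-1 loss, the best constant prediction is the most common label.
majority <- names(which.max(table(toy$spam)))

# Its error rate is the fraction of messages carrying any other label.
error_rate <- mean(toy$spam != majority)

print(c(base_rate = base_rate, error_rate = error_rate))
print(majority)
```

Taking `mean()` of a logical vector gives the same fraction as dividing `sum()` by the number of rows, in one step.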
---
title: "Final Exam"
author: "Abby Smith, Kenny Hu"
date: "Saturday, November 21, 2015"
output: html_document
---
Part 0: Instructions
===
Make sure you read these.
- Unless you have opted out of the pairing process, you will have been assigned
a random partne
Midterm Exam
36-350, Data Mining
In class, 15 October 2008
When doing calculations, show your work. No calculators are needed, or
allowed. Ask me if you need extra paper. There are three numbered questions,
all equally weighted. You are not expected to do
---
title: "Lab 12"
author: "Abby Smith (als1)"
date: "Friday, December 04, 2015"
output: html_document
---
Today's agenda: practicing simulation tasks, in the context of fun and games:
darts!
*Background.* Darts is a game where you throw metal darts at a dartb
---
title: "Lab5"
author: "Abby Smith"
date: "Friday, October 02, 2015"
output: html_document
---
#Part 1: Feline Hearts
```{r}
library(MASS)
data(cats)
summary(cats)
```
a. This shows the distribution of cats by sex and then the summary statistics
(mean, median,
Homework 1
Abby Smith
Thursday, September 03, 2015
Part 1
strikedat <- read.table("http://www.stat.cmu.edu/~ryantibs/statcomp/homework/strike.txt",
                        header=TRUE)
dim(strikedat)
# [1] 625   7
There are 625 rows and 7 columns.
colnames(strikedat)
# [1] "X1" "X2" "
Homework 8
Abby Smith
Thursday, November 05, 2015
library("numDeriv")
1.
n = 100; p = 10; s = 3
set.seed(0)
x = matrix(rnorm(n*p),n,p)
b = c(-0.7,0.7,1,rep(0,p-s))
y = x %*% b + rt(n,df=2)
b
# [1] -0.7  0.7  1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
No, you would not