Stat133Lecture1

Statistics 133: Concepts in Computing with Data
Instructor: Dr. Cari Kaufman, [email protected]
GSI: Daisy Huang, [email protected]
Thursday, August 28, 2008

What Are Data?

Numbers
Example: Traffic on I-80

Text
Example: SPAM or HAM?

Images, video, or audio
Example: Mary Jane ski area and Rifle Sight trail. Height taken from a digital elevation model, with overlaid high-resolution photograph.

"Plan your descent through the bumps and go for it. Bump skiing does not get much harder than this. This pitch is a long one and typically does not have much loose snow so technique is important even if you decide to traverse across to the left to lose some speed. Bear to skier's left at the bottom of this pitch and finish out the run on Feebleminded. Look for good snow on the sides. However you get down this run you should feel like you skied something hard and a bit wild -- and done it in view of all the folks comfortably sitting on the SuperGauge chairs. You will not find many other black runs that will stretch you like Riflesight Notch."
(From the Mary Jane Project)

Meta-data
Example: Shelters along the Appalachian Trail

Course Expectations

Getting Started with R

Why use R?
Some of you may have used statistical software with a GUI, like Minitab. You may also be familiar with other programming languages, like C, Java, or Python. In this class, we’ll use the R programming language and environment as our “home base” for performing many data analytic tasks.

Some benefits of R:
• Allows custom analyses and easy replicability
• High-level language designed for statistics
• Active user community, lots of add-ons
• It’s free!
[A screenshot from http://www.R-project.org/]

R can be run in interactive or batch mode. Interactive mode is useful for trying out new analyses and making sure your code is doing what you think it is. Batch mode is useful for carrying out predefined analyses in the background. For now, we’ll focus on interactive mode.

When you fire up R, you’ll see a prompt (>) where you can type commands.

At the prompt, you can type an expression. An expression is a combination of letters, numbers, and symbols that is interpreted by a particular programming language according to its rules. R then returns a value; we also say the expression evaluates to that value.

    > 3 + 5
    [1] 8
    > 1:20
     [1]  1  2  3  4  5  6  7  8  9 10 11 12 13
    [14] 14 15 16 17 18 19 20
    >
    > # This is a comment
    >
    > 30 + 10 /  # I'm not done typing
    + 2
    [1] 35

Because the line ended with /, R showed the continuation prompt + and waited for the rest of the expression. Division takes precedence over addition, so the result is 30 + 10/2 = 35.

To store a value, we can assign it to a variable. Below, %% returns the remainder after division and %/% returns the integer quotient (32 = 6 * 5 + 2).

    > x1 <- 32 %% 5
    > print(x1)
    [1] 2
    > x2 <- 32 %/% 5
    > x2   # In interactive mode, this prints the object
    [1] 6
    > ls()   # List all my variables
    [1] "x1" "x2"
    > rm(x2)   # Remove a variable
    > ls()
    [1] "x1"

Variable names must follow some rules:
• May not start with a digit or an underscore (_)
• May contain letters, numbers, and some punctuation: the period and underscore are OK, but most other symbols are not
• Case-sensitive, so x and X are different

Advice on variable names:
• Use meaningful names
• Avoid names that already have a meaning in R. If in doubt, check:

    > exists("pi")
    [1] TRUE

There are several ways to save your objects for later. You can use the save and load functions to save specific variables.

    > save(x1, file = "x1.RData")
    > rm(x1)
    > ls()
    character(0)
    > load(file = "x1.RData")
    > ls()
    [1] "x1"

When you quit R, you’ll be asked whether you want to save ALL the contents of your current workspace.

    > q()
    Save workspace image? [y/n/c]:

A function is a portion of code that performs a specific task. Usually it takes some inputs, performs some computations, and returns a value. The inputs are called arguments to the function. When you use a function with a particular set of arguments, you are said to be calling the function. The computer evaluates the function call and returns the output.

For now, we’ll work with R’s built-in functions, and the most important things to know are how to call a function and how to get help when you need it.

First, determine the arguments. In the output of args(rnorm), mean = 0 and sd = 1 are default values.

    > args(rnorm)
    function (n, mean = 0, sd = 1)
    NULL
    > args(plot)
    function (x, y, ...)
    NULL

The “...” argument is special, and we’ll talk about it later. When you call a function, you can specify the arguments by position, by name, or a combination of the two.

    > x <- 1:100
    > y <- rnorm(100, sd = x)   # Combination
    > plot(x, y)                # By position

[Figure: scatterplot of y (vertical axis, -150 to 150) against x (horizontal axis, 0 to 100)]

To read the documentation for a function, use help:

    > help(rnorm)   # A shortened version of the real page:

    Normal                 package:stats                 R Documentation

    The Normal Distribution

    Description:
         Random generation for the normal distribution with mean equal
         to 'mean' and standard deviation equal to 'sd'.

    Usage:
         rnorm(n, mean = 0, sd = 1)

    Arguments:
            n: number of observations.
         mean: vector of means.
           sd: vector of standard deviations.

    Details:
         If 'mean' or 'sd' are not specified they assume the default
         values of 0 and 1, respectively.

    Value:
         'rnorm' generates random deviates.

    Source:
         See RNG for how to select the algorithm and for references to
         the supplied methods.

    References:
         Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) _The New
         S Language_. Wadsworth & Brooks/Cole.

    See Also:
         'runif' and '.Random.seed'

    Examples:
         ...

R has a number of built-in data types. The three most basic types are numeric, character, and logical.
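Besides mode, each basic type can be tested one value at a time. A minimal sketch, assuming the base R predicates is.numeric, is.character, and is.logical (not used on the slides, chosen here only for illustration); each returns a single TRUE or FALSE:

```r
# One literal of each basic type (variable names are illustrative).
num <- 3.5
chr <- "Hello"
lgl <- 2 < 3   # a comparison produces a logical value

is.numeric(num)     # TRUE
is.character(chr)   # TRUE
is.logical(lgl)     # TRUE

# The predicates are strict about type:
is.numeric("3.5")   # FALSE: a number in quotes is character data
is.numeric(TRUE)    # FALSE: logical values are not numeric
```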
You can check the type using the mode function.

    > mode(3.5)
    [1] "numeric"
    > mode("Hello")
    [1] "character"
    > mode(2 < 3)
    [1] "logical"

Actually, the three types are numeric, character, and logical vectors. There’s no such thing as a scalar in R, just a vector of length one.

A vector in R is a collection of values of the same type. You can join vectors together using the c (for “concatenate”) function.

    > c(1.3, 2, 8/3)
    [1] 1.300000 2.000000 2.666667
    > c("a", "l", "q")
    [1] "a" "l" "q"
    > c(TRUE, FALSE, FALSE)
    [1]  TRUE FALSE FALSE
    >
    > c(1, 2, FALSE)
    [1] 1 2 0
    > c(1, 2, "c")
    [1] "1" "2" "c"

The last two expressions illustrate implicit coercion: because every element of a vector must have the same type, R converts FALSE to 0 in the first case and the numbers to character strings in the second. You should try to avoid this in most situations.

The elements of a vector can have names.

    > unfair.coin <- c("heads" = 0.55, "tails" = 0.45)
    > unfair.coin
    heads tails
     0.55  0.45
    > names(unfair.coin)
    [1] "heads" "tails"
    >
    > # Another way to do it
    > fair.coin <- c(0.5, 0.5)
    > names(fair.coin) <- names(unfair.coin)
    > fair.coin
    heads tails
      0.5   0.5

There are five ways to extract elements of a vector.

    > unfair.coin[1]                   # 1) Inclusion by position
    heads
     0.55
    > unfair.coin[-1]                  # 2) Exclusion by position
    tails
     0.45
    > unfair.coin["heads"]             # 3) By name
    heads
     0.55
    > unfair.coin[unfair.coin > 0.5]   # 4) By logical index
    heads
     0.55
    > unfair.coin                      # 5) No index (include everything)
    heads tails
     0.55  0.45

...
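Since implicit coercion is easy to trigger by accident, one way to avoid it is to convert types explicitly. A minimal sketch using base R's as.numeric and as.character (these functions are not shown on the slides; the variable names are illustrative):

```r
# Implicit coercion: c() silently converts everything to a common type.
mixed_num <- c(1, 2, FALSE)   # FALSE becomes 0
mixed_chr <- c(1, 2, "c")     # the numbers become strings
mode(mixed_num)   # "numeric"
mode(mixed_chr)   # "character"

# Explicit conversion states the intent directly.
as.numeric(c("1", "2", "3"))   # 1 2 3
as.numeric(TRUE)               # 1
as.character(c(1.5, 2))        # "1.5" "2"

# Strings that do not look like numbers become NA (R normally warns here).
suppressWarnings(as.numeric(c("1", "2", "c")))   # 1 2 NA
```

Writing the conversion out makes it visible in the code, and an NA in the result flags values that could not be converted instead of silently changing the type of the whole vector.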
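The five extraction methods above can be checked against each other in a short script. This sketch rebuilds unfair.coin from the slides; the result names (by_pos, by_excl, and so on) are hypothetical, and method 5 is written with an empty index, x[], which returns the whole vector just like the bare name does:

```r
unfair.coin <- c("heads" = 0.55, "tails" = 0.45)

by_pos   <- unfair.coin[1]                   # 1) inclusion by position
by_excl  <- unfair.coin[-1]                  # 2) exclusion by position
by_name  <- unfair.coin["heads"]             # 3) by name
by_logic <- unfair.coin[unfair.coin > 0.5]   # 4) by logical index
all_of   <- unfair.coin[]                    # 5) empty index keeps everything

# Methods 1, 3, and 4 all select the same named element.
identical(by_pos, by_name)      # TRUE
identical(by_pos, by_logic)     # TRUE
identical(all_of, unfair.coin)  # TRUE
names(by_excl)                  # "tails"
```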
This note was uploaded on 10/08/2010 for the course ENGIN 120 taught by Professor Ilan during the Spring '08 term at Berkeley.