COMM 205: Introduction to Management Information Systems
Instructor: Adam Saunders
Lecture 20: Data Wrangling Part II
March 18, 2019
LOAD THE TIDYVERSE LIBRARY
• Before you start, make sure you have tidyverse loaded.
• To do so, first type library(tidyverse)
• Then, load the data frame companies (the one we used in the previous lecture).
DATA WRANGLING
• Data wrangling is the process of transforming and mapping data from one raw form into another, with the intent of making the data more useful (e.g., for analytics).
• In R, we will use the dplyr and magrittr packages shipped with tidyverse.
• Since you have already installed tidyverse, go ahead and load it by entering library(tidyverse) in RStudio.
• magrittr offers a set of operators that make your code more readable.
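To illustrate the readability point, here is a minimal sketch of the magrittr pipe operator %>%, which passes the result on its left as the first argument of the function on its right. The vector x is made-up illustration data, not part of the course material.

```r
library(magrittr)  # provides %>%; the pipe is also available after library(tidyverse)

# Made-up numbers, used only for illustration
x <- c(1, 4, 9, 16)

# Nested calls: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls: read from left to right, one step at a time
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Both forms compute the same value
same_result <- identical(nested, piped)
print(same_result)
```

Both versions do the same work; the piped form simply lists the steps in the order they happen.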
DATA WRANGLING
• As indicated last lecture, dplyr provides functions (in the form of verbs) that solve the most common data manipulation challenges.
• Below are some of the main dplyr functions:

Function      How it works
select()      picks variables based on their names
filter()      picks observations based on their values
mutate()      adds new variables that are functions of existing variables
summarise()   reduces multiple values to a single summary statistic
arrange()     changes the order of rows
group_by()    allows a user to perform the above functions on a subset of the data
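The six verbs can be chained together in a single pipeline. The sketch below uses a small made-up data frame called firms; its column names (name, country, sales, costs) are hypothetical and are not taken from the course dataset.

```r
library(dplyr)  # attached automatically by library(tidyverse)

# Hypothetical toy data, not the course dataset
firms <- tibble(
  name    = c("A", "B", "C", "D"),
  country = c("CAN", "USA", "USA", "CAN"),
  sales   = c(100, 250, 50, 300),
  costs   = c(80, 200, 60, 210)
)

result <- firms %>%
  select(name, country, sales, costs) %>%   # keep variables by name
  filter(sales > 60) %>%                    # keep rows that meet a condition
  mutate(profit = sales - costs) %>%        # add a variable computed from others
  group_by(country) %>%                     # later verbs work within each country
  summarise(avg_profit = mean(profit)) %>%  # one summary row per group
  arrange(desc(avg_profit))                 # sort rows, largest first

print(result)
```

Each line applies one verb to the output of the line before it, so the pipeline can be read top to bottom as a list of steps.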
USING YOUR FIRST R DATASET
• In this course, we will frequently use a dataset stored in the file North_American_Stock_Market_1994-2013.rds
• The dataset contains information on virtually all publicly traded firms in Canada and the United States from 1994 to 2013 (obtained from Compustat).
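Files with the .rds extension hold a single R object and can be read with base R's readRDS(). The sketch below shows the round trip on a tiny made-up data frame written to a temporary file, so it runs without the course file. The commented last line shows how the course file might be loaded; it assumes the file sits in your working directory and that the data frame is named companies, as on the earlier slide.

```r
# Made-up stand-in data; the real dataset's columns are not shown here
demo <- data.frame(firm = c("A", "B"), year = c(1994L, 2013L))

# Write the object to a temporary .rds file, then read it back
path <- tempfile(fileext = ".rds")
saveRDS(demo, path)
loaded <- readRDS(path)

# The object read back matches the one that was saved
print(identical(demo, loaded))

# Assumption: the course file is in the working directory
# companies <- readRDS("North_American_Stock_Market_1994-2013.rds")
```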