# STAT 200, Lang Wu. Chapter 1. Distributions of Data

Statistics is about learning from data, i.e., getting useful information from data. Statistics is becoming increasingly important in the modern world, since many important decisions are based on the analysis of data, and data of many kinds are becoming widely available (e.g., data from the internet). Statistics plays an important role in almost every area, such as medicine, finance, and the IT industry. As the New York Times article (August 6, 2009) “For Today’s Graduate, Just One Word: Statistics” says: “For many different jobs in today’s world, mostly what you do is data analysis (statistics), even for jobs which seem unrelated to statistics. ...... Many today’s decisions in industry and government are based on data analysis results. ...... Statisticians are thus in high demand.”

Statistics consists of three parts: i) getting data; ii) analyzing data; and iii) drawing conclusions. Methods of getting data will be discussed in Chapter 3. In this chapter, we assume that data are already available for analysis.

Once you have a dataset in hand, how do you analyze the data to extract useful information? What information will be useful? How do you find it? Such a task is not trivial, especially when the dataset is large. As a first step, you can i) display the data using graphs, and then check important features of the graphs, e.g., how the data are distributed and whether the data are symmetric or skewed; and ii) summarize the data using a few important numbers, such as averages and measures of variation.

In the following sections, we describe various graphical tools and numerical methods more closely.

## 1.1. Graphical Display of Data Distributions

Once we have collected data, the first thing we often do is check how the data are distributed, i.e., how the data spread out. A simple way is to display the data distribution using graphs. A picture is worth a thousand words.

As an illustration, we consider a small dataset which consists of quiz scores from 10 randomly selected students in a large class:

| quiz score | 22 | 65 | 78 | 55 | 86 | 77 | 63 | 88 | 73 | 91 | 77 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| gender | M | M | F | F | F | F | M | F | M | F | M |

We want to see how the data are distributed: are they symmetric? Skewed? Widely spread out? Note that even though this may not be important for a small dataset, it can be very important for large datasets. Here we use a small dataset for illustration.

In the above dataset, quiz score and gender are two variables. Thus, data are values of variables of interest in a particular application. The distribution of a variable shows what values the variable takes and how often it takes these values. The data for a variable are the values that the variable takes in a particular study (these values are usually a small subset of all possible values the variable may take). In a different study, the same variable may take different values, e.g., the quiz scores for 10 different students will be different.
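
To make the idea of a distribution concrete, here is a minimal Python sketch that tabulates how often each variable in the quiz example takes its values and prints a rough text histogram. It is an illustration only, not a method from these notes. The bin width of 10, the helper name `binned_counts`, and the use of the standard-library `statistics` module are assumptions made here. The listing above contains eleven values per variable, and the sketch uses all of them as printed.

```python
from collections import Counter
from statistics import mean, median, stdev

# Values copied from the quiz example as listed (eleven entries per variable).
scores = [22, 65, 78, 55, 86, 77, 63, 88, 73, 91, 77]
genders = ["M", "M", "F", "F", "F", "F", "M", "F", "M", "F", "M"]

BIN_WIDTH = 10  # assumed bin width, chosen only for illustration


def binned_counts(values, width=BIN_WIDTH):
    """Count the values in each interval [lower, lower + width).

    Empty intervals between the lowest and highest occupied bins are
    kept with a count of 0 so that gaps in the data stay visible.
    """
    counts = Counter((v // width) * width for v in values)
    lowest, highest = min(counts), max(counts)
    return {lower: counts[lower] for lower in range(lowest, highest + width, width)}


# Distribution of each variable: which values occur, and how often.
score_dist = binned_counts(scores)
gender_dist = Counter(genders)

# Crude text histogram: one asterisk per score in the bin.
for lower, count in score_dist.items():
    print(f"{lower:3d}-{lower + BIN_WIDTH - 1:<3d} {'*' * count}")

# A few summary numbers: two averages and one measure of variation.
print("gender counts:", dict(gender_dist))
print("mean:", round(mean(scores), 2))
print("median:", median(scores))
print("sample standard deviation:", round(stdev(scores), 2))
```

For these values the mean (775/11, about 70.5) falls below the median (77), which is consistent with the single low score of 22 pulling the average down, so the scores look skewed toward low values rather than symmetric.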


## This note was uploaded on 10/24/2011 for the course CHEMISTRY chem taught by Professor David during the Spring '11 term at The University of British Columbia.
