Chapter 1: Examining Distributions

Statistics is the science of data. We therefore begin our study of statistics by mastering the art of examining data. Any set of data contains information about some group of individuals.

Data Sets: The data set contains information about some group of individuals.
• Individuals are the objects described by a set of data. Individuals may be people, but they may also be business firms, common stocks, or other objects.
The information is organized in variables.
• A variable is any characteristic of an individual. A variable can take different values for different individuals.

Example of a data set:
Individual   Sex   Age   Weight
1            F     21    120
2            M     20    175
3            M     24    190
4            F     19    140
5            F     20    110

Individual: row or column
Variable: row or column

Two types of Variables: Categorical and Quantitative Variables:
• A categorical variable places an individual into one of several groups or categories.
  o Sex (male or female)
  o Occupation (teacher, construction worker, lawyer, doctor, others)
  o Field of study (Arts and humanities, Biological science, Business, Education, others)
• A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense.

The distribution of a variable tells us what values it takes and how often it takes these values.

1.1 Displaying distributions with Graphs:
Statistical tools and ideas help us examine data to describe their main features. This examination is called exploratory data analysis.
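As a rough illustration of the example data set given earlier in this chapter, the sketch below stores each individual as a row and each variable as a key. The helper `variable_kind` is a hypothetical name, and its rule (all-numeric values mean quantitative) is only an assumption for this small table: numeric codes such as zip codes would still be categorical.

```python
# Each dictionary is one individual (a row); each key is one variable (a column).
data_set = [
    {"Individual": 1, "Sex": "F", "Age": 21, "Weight": 120},
    {"Individual": 2, "Sex": "M", "Age": 20, "Weight": 175},
    {"Individual": 3, "Sex": "M", "Age": 24, "Weight": 190},
    {"Individual": 4, "Sex": "F", "Age": 19, "Weight": 140},
    {"Individual": 5, "Sex": "F", "Age": 20, "Weight": 110},
]


def variable_kind(values):
    """Assumed rule of thumb: all-numeric values -> quantitative, else categorical."""
    numeric = all(isinstance(v, (int, float)) for v in values)
    return "quantitative" if numeric else "categorical"


variables = ["Sex", "Age", "Weight"]
kinds = {name: variable_kind([row[name] for row in data_set]) for name in variables}

# Averaging makes sense for a quantitative variable such as Age,
# but not for a categorical variable such as Sex.
mean_age = sum(row["Age"] for row in data_set) / len(data_set)

print(kinds)
print(mean_age)
```

Here Sex comes out as categorical, while Age and Weight come out as quantitative.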
Exploring Data:
• Begin by examining each variable itself. Then move on to study the relationships among the variables.
• Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.

Example: Let's measure the variable primary major in this class:
___________________________ ________________________________
___________________________ ________________________________
___________________________ ________________________________
___________________________ ________________________________

Categorical variables: bar graphs and pie charts:
The values of a categorical variable are labels for the categories, such as “male” and “female”. The distribution of a categorical variable lists the categories and gives either the count or the percent of individuals who fall in each category.

Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories.
• A pie chart must include all the categories that make up the whole (100%).
• Use a pie chart when you want to emphasize each category’s relation to the whole.
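To make the idea of a categorical distribution concrete, here is a minimal sketch that tabulates counts and percents for the Sex column of the example data set from the start of the chapter. The variable names are illustrative only. The final check mirrors the pie chart requirement that the categories make up the whole (100%).

```python
from collections import Counter

# Sex column of the five individuals in the example data set.
sex = ["F", "M", "M", "F", "F"]

# The distribution lists each category with its count ...
counts = Counter(sex)

# ... or with its percent of all individuals.
total = sum(counts.values())
percents = {category: 100 * n / total for category, n in counts.items()}

for category in counts:
    print(f"{category}: count={counts[category]}, percent={percents[category]:.0f}%")

# A pie chart only makes sense if the slices add up to the whole.
covers_whole = abs(sum(percents.values()) - 100) < 1e-9
print("Slices cover 100%:", covers_whole)
```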
• Bar graphs represent each category of the variable as a bar. The bar heights show the category counts or percents.
• Bar graphs are easier to make than pie charts and also easier to read.
• The bars can be arranged in alphabetical order.
• It is often better to arrange the bars in order of height. This helps us see immediately which majors appear most often.
  o A bar graph whose categories are ordered from most frequent to least frequent is called a Pareto chart.
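The two bar orderings described above can be sketched as follows. The counts in `major_counts` are hypothetical, made up purely for illustration (the class survey results are not recorded in these notes), and `text_bar_graph` is an assumed helper name that draws a rough text version of a bar graph.

```python
# HYPOTHETICAL counts of primary majors; these numbers are invented for illustration.
major_counts = {
    "Business": 9,
    "Arts and humanities": 4,
    "Education": 6,
    "Biological science": 2,
}

# Option 1: bars in alphabetical order of category name.
alphabetical = sorted(major_counts.items())

# Option 2: bars in order of height, most frequent first (a Pareto chart).
pareto = sorted(major_counts.items(), key=lambda item: item[1], reverse=True)


def text_bar_graph(items):
    """Return one line per category; bar length is the category count."""
    width = max(len(name) for name, _ in items)
    return [f"{name.ljust(width)} | {'#' * count} {count}" for name, count in items]


for line in text_bar_graph(pareto):
    print(line)
```

With the Pareto ordering, the most common category is always the first bar, so it can be read off without comparing heights.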
Bar graphs and pie charts are mainly tools for presenting data: they help you grasp data quickly.

Quantitative variables: Histograms, Stem Plots and Time Plots
Quantitative variables often take many values. The distribution tells us what values the variable takes and how often it takes these values.
• The most common graph of the distribution of one quantitative variable is a histogram.

Histograms look similar to bar graphs, but are fundamentally different from them. The length and the width of the bars have specific meanings.
• Length is proportional to count.
• Width is determined by data ranges.
The bars touch, indicating that all values of the variable are covered.

To make a histogram of the distribution of a variable, proceed as follows:
1. Sort the data in ascending order.
2. Choose the classes. Divide the range of the data into classes of equal width. Range = maximum value - minimum value
3. Count the individuals in each class. The sum of the class counts is equal to the number of individuals in the study.
4. Draw the histogram. Make a scale for the variable whose distribution you are displaying on the horizontal axis. The vertical axis contains the scale of counts. Each bar represents a class. The base of the bar covers the class, and the bar height is the class count. Draw the bars with no horizontal space between them unless a class is empty, so that its bar has height zero.

Example: Age 18, 15, 20, 20, 21, 23, 20, 22, 20, 20, 25, 19, 19, 19, 35, 32
1. Sort the data.
Age 15, 18, 19, 19, 19, 20, 20, 20, 20, 20, 21, 22, 23, 25, 32, 35
Range = 35 - 15 = 20
2. Choose the classes.
3. Count the individuals in each class.
Classes      Frequency
15-19.9      5
20-24.9      8
25-29.9      1
30-34.9      1
35-39.9      1
Note: Just make sure to specify the classes precisely so that each individual falls into exactly one class.
4. Draw the histogram.

Stem Plots (also called stem-and-leaf plots):
Much like a histogram turned sideways. Instead of bars, we list the individual values. Only used for small data sets.
To make a stem plot:
1. Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column. Be sure to include all the stems needed to span the data, even when some will have no leaves.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
Let's use the previous example to do a stem plot.
NOTE: If we have more than 2 digits we do the same, except now our stem will have more than one digit, but our leaf will ALWAYS have only one digit.

Interpreting Histograms and stem plots:
1. In any graph of data, look for the overall pattern and for striking deviations from that pattern.
2. You can describe the overall pattern of a histogram by its shape, center, and spread.
3. An important kind of deviation is an outlier, an individual value that falls outside the overall pattern.

Symmetry and Skewed Distribution:
1. Symmetric distribution: the left and right sides of the distribution are approximately mirror images of each other.
2. Right skewed: the right side extends much farther out than the left.
3. Left skewed: the left side extends much farther out than the right.

Time Plots:
1. Plot each observation against the time at which it was measured.
2. The horizontal axis represents time.
3. The vertical axis represents the variable being measured.
4. Connecting data points by lines emphasizes changes over time.

Interpreting Time Plots:
1. A trend in a time series is a persistent, long-term rise or fall.
2. A pattern in a time series that repeats itself at known regular intervals of time is called seasonal variation.
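Returning to the age example from the histogram and stem plot sections, the counting steps can be sketched in a few lines. This is only an illustration under stated assumptions: the classes are treated as half-open intervals (for example 15 up to but not including 20, matching "15-19.9"), and the names `class_counts` and `stem_plot` are hypothetical. The ages are the sorted values from the example.

```python
# Sorted ages from the example.
ages = [15, 18, 19, 19, 19, 20, 20, 20, 20, 20, 21, 22, 23, 25, 32, 35]

# Range = maximum value - minimum value.
data_range = max(ages) - min(ages)

# Classes of equal width 5 starting at 15, as in the example table.
# Assumption: each class includes its lower edge and excludes its upper edge,
# so every individual falls into exactly one class.
class_width = 5
class_counts = {}
for low in range(15, 40, class_width):
    high = low + class_width
    class_counts[(low, high)] = sum(low <= age < high for age in ages)

for (low, high), count in class_counts.items():
    print(f"{low}-{high - 0.1:.1f}: {count}")

# Stem plot: stem = all but the final digit, leaf = the final digit.
# Every stem between the smallest and largest is listed, even if it has no leaves.
stems = {stem: [] for stem in range(min(ages) // 10, max(ages) // 10 + 1)}
for age in sorted(ages):
    stems[age // 10].append(age % 10)

stem_plot = [
    f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
    for stem, leaves in stems.items()
]
for line in stem_plot:
    print(line)
```

The class counts add up to the 16 individuals in the example, which is a quick check that no value was skipped or counted twice.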
 Spring '08
 BATEH
 Statistics
