Chapter 2: Frequency Distributions and Graphs

Introduction
2-1 Organizing Data
2-2 Histograms, Frequency Polygons, and Ogives
2-3 Other Types of Graphs

2-1 Organizing Data
Data in their original form are called raw data. Example (Wealthy People):

49 57 38 73 81 74 59 76 65 69
54 56 69 68 78 65 85 49 69 61
48 81 68 37 43 78 82 43 64 67
52 56 81 77 79 85 40 85 59 80
60 71 57 61 69 61 83 90 87 74

Little information can be obtained from raw data until they are organized into classes and frequencies.
The frequency of a class is the number of data values contained in that class.
A frequency distribution is the organization of raw data in table form, using classes and frequencies.
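As a rough sketch of that organization, the snippet below counts how many of the 50 values above fall into each class. The eight classes of width 7 starting at 35 are an assumption chosen for illustration (the text above does not specify them), and the function name frequency_distribution is hypothetical.

```python
# Raw data from the example above (50 values).
raw_data = [
    49, 57, 38, 73, 81, 74, 59, 76, 65, 69,
    54, 56, 69, 68, 78, 65, 85, 49, 69, 61,
    48, 81, 68, 37, 43, 78, 82, 43, 64, 67,
    52, 56, 81, 77, 79, 85, 40, 85, 59, 80,
    60, 71, 57, 61, 69, 61, 83, 90, 87, 74,
]

# Assumed class limits (lower, upper), both inclusive: width 7, starting at 35.
class_limits = [(lower, lower + 6) for lower in range(35, 85, 7)]


def frequency_distribution(values, limits):
    """Return a list of (lower, upper, frequency) rows, one per class."""
    rows = []
    for lower, upper in limits:
        frequency = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, frequency))
    return rows


table = frequency_distribution(raw_data, class_limits)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
print("Total:", sum(frequency for _, _, frequency in table))
```

Because the classes do not overlap and together cover every value, the frequencies add up to the number of raw values.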
Types of frequency distributions:
Categorical frequency distribution
Grouped frequency distribution

Categorical Frequency Distributions

Used for data that can be placed in specific categories, such as nominal- or ordinal-level data (for example, political affiliation or religious affiliation).

Example (Blood Types): Twenty-five army inductees were given a blood test. The data set is

A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A

Construct a frequency distribution.

Step 1 Make a table.
Step 2 Tally the data.
Step 3 Count the tallies.
Step 4 Find the totals.
Step 5 Find the percentage of values in each class: % = 100*f/n, where f is the frequency of the class and n is the total number of values. These percentages are called relative frequencies.

Grouped Frequency Distributions

When the range of the data is large, the data must be grouped into classes; for example, the lifetimes in hours of boat batteries.

Class limits: each class has a lower limit and an upper limit.
Class boundaries are used to separate the classes so that there are no gaps in the frequency distribution.
For whole numbers:
  lower limit - 0.5 = 31 - 0.5 = 30.5 = lower boundary
  upper limit + 0.5 = 37 + 0.5 = 37.5 = upper boundary
If the limits are in tenths, such as 7.8–8.8, the boundaries for that class would be 7.75–8.85.
Class width = lower limit of a class - lower limit of the previous class. For example, 31 - 24 = 7.
The class midpoint is the sum of the lower and upper boundaries (or limits) divided by 2. For example, for the class above, (30.5 + 37.5)/2 = 34.

Notes:
The classes must be mutually exclusive: class limits must not overlap, so that no data value can fall into two classes.
The classes must be continuous, even if there are no values in a class.
The classes must be exhaustive: there should be enough classes to accommodate all the data.
The classes must be equal in width. One exception occurs when a distribution has a class that is open-ended (an open-ended distribution).

Step 2 Tally the data.
Step 3 Find the numerical frequencies from the tallies.
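Returning to the blood-type example earlier in this section, its five steps can be sketched in a few lines. This is only an illustration of the tally and of the formula % = 100*f/n; the function name categorical_distribution is hypothetical.

```python
from collections import Counter

# Blood-type data from the example above (25 inductees).
blood_types = "A B B AB O O O B AB B B B O A O A O O O AB AB A O B A".split()


def categorical_distribution(values):
    """Return {category: (frequency, percent)} using percent = 100 * f / n."""
    n = len(values)
    counts = Counter(values)  # tally and count in one step
    return {category: (f, 100 * f / n) for category, f in counts.items()}


distribution = categorical_distribution(blood_types)
for category in ("A", "B", "O", "AB"):
    f, percent = distribution[category]
    print(f"{category:>2}  f = {f}  {percent:.0f}%")
print("Total:", sum(f for f, _ in distribution.values()))
```

The frequencies should total n and the percentages should total 100, which is a quick check that no value was missed in the tally.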
A cumulative frequency distribution is a distribution that shows the number of data values less than or equal to a specific value (usually an upper boundary).

Example: The data shown here represent the number of miles per gallon (mpg) that 30 selected four-wheel-drive sports utility vehicles obtained in city driving. Construct a frequency distribution, and analyze the distribution.

12 17 12 14 16 18 16 18 12 16
17 15 15 16 12 15 16 16 12 14
15 12 15 15 19 13 16 18 16 14

Solution:
Step 1 Determine the classes. Since the range of the data set is small (19 - 12 = 7), classes consisting of a single data value can be used.
Step 2 Tally the data.
Step 3 Find the frequencies and the cumulative frequencies.

Tallying the 30 values gives:

Class (mpg)   Frequency   Cumulative frequency
12            6           6
13            1           7
14            3           10
15            6           16
16            8           24
17            2           26
18            3           29
19            1           30

Constructing a Grouped Frequency Distribution

Step 1 Determine the classes.
  Find the highest and lowest values.
  Find the range.
  Select the number of classes desired.
  Find the width by dividing the range by the number of classes and rounding up.
  Select a starting point (usually the lowest value or any convenient number less than the lowest value); add the width to get the lower limits.
  Find the upper class limits.
  Find the boundaries.
Step 2 Tally the data.
Step 3 Find the numerical frequencies from the tallies, and find the cumulative frequencies.

2-2 Histograms, Frequency Polygons, and Ogives

1. Histogram
The histogram is a graph that displays the data by using contiguous vertical bars of various heights to represent the frequencies of the classes.
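To make "contiguous bars" concrete, here is a minimal sketch that turns the mpg data above into bar specifications: each single-value class gets boundaries at the value minus 0.5 and plus 0.5, and a height equal to its frequency. The helper names histogram_bars and render are hypothetical, and the text rendering is only a stand-in for a drawn graph.

```python
from collections import Counter

# City mpg for the 30 vehicles in the example above.
mpg = [12, 17, 12, 14, 16, 18, 16, 18, 12, 16,
       17, 15, 15, 16, 12, 15, 16, 16, 12, 14,
       15, 12, 15, 15, 19, 13, 16, 18, 16, 14]


def histogram_bars(values):
    """Return (lower_boundary, upper_boundary, frequency) for each
    single-value class from the lowest to the highest value."""
    counts = Counter(values)
    return [(v - 0.5, v + 0.5, counts[v])
            for v in range(min(values), max(values) + 1)]


def render(bars):
    """Draw the bars as text: one column per class, tallest row first."""
    tallest = max(height for _, _, height in bars)
    lines = []
    for level in range(tallest, 0, -1):
        cells = ["##" if height >= level else "  " for _, _, height in bars]
        lines.append(f"{level:>2} | " + " ".join(cells))
    lines.append("   +-" + "-" * (3 * len(bars) - 1))
    # Label each column with its class value (the bar spans value +/- 0.5).
    labels = [f"{int(lower + 0.5):>2}" for lower, _, _ in bars]
    lines.append("     " + " ".join(labels))
    return "\n".join(lines)


bars = histogram_bars(mpg)
print(render(bars))
```

Using boundaries rather than limits is what makes the bars touch: the upper boundary of one class is the lower boundary of the next.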
Example: Construct a histogram to represent the data shown for the record high temperatures for each of the 50 states.

Solution:
Step 1 Draw and label the x and y axes.
Step 2 Represent the frequency on the y axis and the class boundaries on the x axis.
Step 3 Using the frequencies as the heights, draw vertical bars for each class.

The class with the greatest number of data values (18) is 109.5–114.5, followed by 13 for 114.5–119.5. The graph also has one peak with the data clustering around it.

When you are analyzing histograms and frequency polygons, look at the shape of the curve. For example:
Does it have one peak or two peaks?
Is it relatively flat, or is it U-shaped?
Are the data values spread out on the graph, or are they clustered around the center?
Are there data values in the extreme ends? These may be outliers.
Are there any gaps in the histogram?
Are the data clustered at one end or the other, indicating a skewed distribution?

2. The Frequency Polygon
The frequency polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points.

Example: Construct a frequency polygon for the record high temperature data.

Step 1 Find the midpoints of each class. For example, (99.5 + 104.5)/2 = 102.
Step 2 Draw the x and y axes. Label the x axis with the midpoint of each class, and use the y axis for the frequencies.
Step 3 Using the midpoints for the x values and the frequencies as the y values, plot the points.
Step 4 Connect adjacent points with line segments. Draw a line back to the x axis at the beginning and end of the graph, at the same distance that the previous and next midpoints would be located.

3. The cumulative frequency graph (Ogive)
The ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.

Example: Construct an ogive for the record high temperature data.

Step 1 Find the cumulative frequency for each class.
Step 2 Draw the x and y axes. Label the x axis with the class boundaries, and use the y axis to represent the cumulative frequencies.
Step 3 Plot the cumulative frequency at each upper class boundary.
Step 4 Starting with the first upper class boundary, 104.5, connect adjacent points with line segments. Then extend the graph to the first lower class boundary, 99.5, on the x axis.

Cumulative frequency graphs are used to visually represent how many values are below a certain upper class boundary. For example, to find out how many record high temperatures are less than 114.5°F, locate 114.5°F on the x axis, draw a vertical line up until it intersects the graph, and then draw a horizontal line at that point to the y axis. The y axis value is 28.

Relative Frequency Graphs
A relative frequency graph uses proportions instead of frequencies.

Example: Construct a histogram, frequency polygon, and ogive using relative frequencies for the following distribution.

Solution:
Step 1 Convert each frequency to a proportion or relative frequency.
Step 2 Find the cumulative relative frequencies.
Step 3 Draw each graph, using proportions on the y axis.

Common Distribution Shapes

A bell-shaped distribution has a single peak and tapers off at either end. It is approximately symmetric.
A uniform distribution is basically flat or rectangular.
A J-shaped distribution has a few data values on the left side and increases as one moves to the right.
A reverse J-shaped distribution is the opposite of the J-shaped distribution.
When the peak of a distribution is to the left and the data values taper off to the right, a distribution is said to be positively or right-skewed.
When the data values are clustered to the right and taper off to the left, a distribution is said to be negatively or left-skewed.
A bimodal distribution has two peaks.
A U-shaped distribution is high at both ends and low in the middle.

2-3 Other Types of Graphs

A bar graph represents the data by using vertical or horizontal bars whose heights or lengths represent the frequencies of the data.

Example: The following table shows the average money spent by first-year college students. Draw a horizontal and vertical bar graph for the data.

Solution:
1. Draw and label the x and y axes.
2. Draw the bars corresponding to the frequencies.

A Pareto chart is used to represent a frequency distribution for a categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
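The defining step of a Pareto chart is the ordering of the bars. As a minimal sketch, the snippet below reuses the blood-type data from the categorical example earlier in the chapter as stand-in data and sorts the categories from highest to lowest frequency. The function name pareto_order and the tie-breaking rule are assumptions made for illustration.

```python
from collections import Counter

# Stand-in categorical data: the blood types from the earlier example.
blood_types = "A B B AB O O O B AB B B B O A O A O O O AB AB A O B A".split()


def pareto_order(values):
    """Return (category, frequency) pairs sorted from highest to lowest frequency."""
    counts = Counter(values)
    # Descending frequency; ties fall back to the category name so the
    # result does not depend on the order of the input.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for category, frequency in pareto_order(blood_types):
    print(f"{category:>2} | {'#' * frequency} {frequency}")
```

Each printed row corresponds to one bar, so reading from top to bottom gives the left-to-right order of the bars in the chart.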
Example: The table shown here is the average cost per mile for passenger vehicles on state turnpikes.

Solution:
Step 1 Arrange the data from the largest to smallest according to frequency.
Step 2 Draw and label the x and y axes.
Step 3 Draw the bars corresponding to the frequencies.

When you analyze a Pareto chart, make comparisons by looking at the heights of the bars. The Pareto chart shows that Florida has the highest cost per mile. The cost is more than twice as high as the cost for Indiana.

A time series graph represents data that occur over a specific period of time.

Example: Plot a time series graph for the tabled data.

Step 1 Draw and label the x and y axes.
Step 2 Label the x axis for years and the y axis for the damage.
Step 3 Plot each point according to the table.
Step 4 Draw line segments connecting adjacent points.

The graph shows a steady increase over the 5-year period.

When you analyze a time series graph, look for a trend or pattern that occurs over the time period and the slope, or steepness, of the line. A line that is steep over a specific time period indicates a rapid increase or decrease over that period.

A compound time series graph compares two or more data sets on the same graph.

A pie graph is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution.

Example: This frequency distribution shows the number of pounds of each snack food eaten during the Super Bowl. Construct a pie graph for the data.
Solution:
Step 1 Convert the frequencies into a proportional part of the circle using the formula Degrees = (f/n)*360.
Step 2 Convert each frequency to a percentage using the formula % = (f/n)*100%.
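Both conversions are simple proportions, so they can be checked with a short script. The snack-food table is not reproduced in this text, so the sketch below uses frequencies counted from the blood-type data earlier in the chapter (A 5, B 7, O 9, AB 4) as stand-in data; pie_sections is a hypothetical helper name.

```python
# Stand-in frequencies counted from the blood-type data earlier in the chapter.
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}


def pie_sections(freqs):
    """Return {category: (degrees, percent)} with
    degrees = (f / n) * 360 and percent = (f / n) * 100."""
    n = sum(freqs.values())
    return {category: (f * 360 / n, f * 100 / n) for category, f in freqs.items()}


sections = pie_sections(frequencies)
for category, (degrees, percent) in sections.items():
    print(f"{category:>2}: {degrees:6.1f} degrees  {percent:5.1f}%")

# The wedges should close the circle and the percentages should total 100.
total_degrees = sum(degrees for degrees, _ in sections.values())
total_percent = sum(percent for _, percent in sections.values())
print(f"total: {total_degrees:.1f} degrees, {total_percent:.1f}%")
```

If the degree measures do not total 360 (apart from rounding), a frequency was probably mistyped or n was miscounted.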
Step 3 Using a protractor and a compass, draw the graph using the appropriate degree measures found in step 1, and label each section with the name and percentages.

To analyze the nature of the data shown in the pie graph, look at the size of the sections in the pie graph.

Misleading Graphs
Inappropriately drawn graphs can misrepresent the data and lead the reader to false conclusions.

Example: Changing the units at the starting point on the y axis can convey a very different visual representation of the data.

It is not wrong to truncate an axis of the graph; many times it is necessary to do so. However, the reader should be aware of this fact and interpret the graph accordingly.

Another way to misrepresent data on a graph is by omitting labels or units on the axes of the graph. The graph shown compares the cost of living, economic growth, population growth, etc., of four main geographic areas in the United States. However, since there are no numbers on the y axis, very little information can be gained.

Summary of the types of graphs

A stem and leaf plot is a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes.

Example: Construct a stem and leaf plot for the data

25 31 20 32 13 14 43 02 57 23
36 32 33 32 44 32 52 44 51 45

Solution:
Step 1 Arrange the data in order:
02, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44, 44, 45, 51, 52, 57
Step 2 Separate the data according to the first digit:
02
13, 14
20, 23, 25
31, 32, 32, 32, 32, 33, 36
43, 44, 44, 45
51, 52, 57
Step 3 A display can be made by using the leading digit as the stem and the trailing digit as the leaf.

Notes:
1. If there are no data values in a class, you should write the stem number and leave the leaf row blank. Do not put a zero in the leaf row.
2. When the data values are in the hundreds, such as 325, the stem is 32 and the leaf is 5.
3. When you analyze a stem and leaf plot, look for peaks and gaps in the distribution. See if the distribution is symmetric or skewed. Check the variability of the data by looking at the spread.
4. The back-to-back stem and leaf plot. ...
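Steps 1 to 3 of the stem and leaf example, together with notes 1 and 2, can be sketched as follows. The function name stem_and_leaf is hypothetical, and the sketch assumes non-negative whole-number data with the last digit used as the leaf.

```python
from collections import defaultdict

# Data from the stem and leaf example above ("02" is the value 2).
data = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
        36, 32, 33, 32, 44, 32, 52, 44, 51, 45]


def stem_and_leaf(values):
    """Return {stem: [sorted leaves]} with stem = leading digits and
    leaf = trailing digit. Every stem between the smallest and largest
    is included, so a class with no data shows up as an empty row."""
    leaves = defaultdict(list)
    for value in sorted(values):           # Step 1: arrange the data in order
        leaves[value // 10].append(value % 10)  # Step 2: split stem and leaf
    stems = range(min(leaves), max(leaves) + 1)
    return {stem: leaves.get(stem, []) for stem in stems}


plot = stem_and_leaf(data)
for stem, row in plot.items():             # Step 3: display stem | leaves
    print(f"{stem} | {' '.join(str(leaf) for leaf in row)}")
```

Integer division by 10 also covers note 2: a value such as 325 gets stem 32 and leaf 5. An empty class keeps its stem with a blank leaf row rather than a zero, as note 1 requires.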
This note was uploaded on 03/10/2012 for the course STAT 101 taught by Professor Johnanderson during the Spring '12 term at Amity University.
