Chapter+1and2


Chapter 1
Data and Statistics

• Applications in Business and Economics
• Data
• Data Sources
• Descriptive Statistics
• Statistical Inference
• Statistical Analysis Using Microsoft Excel

Applications in Business and Economics

• Accounting: Public accounting firms use statistical sampling procedures when conducting audits for their clients.
• Economics: Economists use statistical information in making forecasts about the future of the economy or some aspect of it.
• Marketing: Electronic point-of-sale scanners at retail checkout counters are used to collect data for a variety of marketing research applications.
• Production: A variety of statistical quality control charts are used to monitor the output of a production process.
• Finance: Financial advisors use price-earnings ratios and dividend yields to guide their investment recommendations.

Data and Data Sets

• Data are the facts and figures collected, summarized, analyzed, and interpreted.
• The data collected in a particular study are referred to as the data set.

Elements, Variables, and Observations

• The elements are the entities on which data are collected.
• A variable is a characteristic of interest for the elements.
• The set of measurements collected for a particular element is called an observation.
• The total number of data values in a data set is the number of elements multiplied by the number of variables.

Data, Data Sets, Elements, Variables, and Observations

Company        Stock Exchange   Annual Sales ($M)   Earn/Share ($)
Dataram        NQ                73.10              0.86
EnergySouth    N                 74.00              1.67
Keystone       N                365.70              0.86
LandCare       NQ               111.40              0.33
Psychemedics   N                 17.60              0.13

The company names are the element names, the three column headings are the variables, each row is an observation, and the table as a whole is the data set.

Scales of Measurement

Scales of measurement include:
• Nominal
• Ordinal
• Interval
• Ratio

The scale determines the amount of information contained in the data.
The scale indicates the data summarization and statistical analyses that are most appropriate.

Scales of Measurement: Nominal

Data are labels or names used to identify an attribute of the element. A nonnumeric label or numeric code may be used.

Example: Students of a university are classified by the school in which they are enrolled using a nonnumeric label such as Business, Humanities, Education, and so on. Alternatively, a numeric code could be used for the school variable (e.g., 1 denotes Business, 2 denotes Humanities, 3 denotes Education, and so on).

Scales of Measurement: Ordinal

The data have the properties of nominal data, and the order or rank of the data is meaningful. A nonnumeric label or numeric code may be used.

Example: Students of a university are classified by their class standing using a nonnumeric label such as Freshman, Sophomore, Junior, or Senior. Alternatively, a numeric code could be used for the class standing variable (e.g., 1 denotes Freshman, 2 denotes Sophomore, and so on).

Scales of Measurement: Interval

The data have the properties of ordinal data, and the interval between observations is expressed in terms of a fixed unit of measure. Interval data are always numeric.

Example: Melissa has an SAT score of 1205, while Kevin has an SAT score of 1090. Melissa scored 115 points more than Kevin.

Scales of Measurement: Ratio

The data have all the properties of interval data, and the ratio of two values is meaningful. Variables such as distance, height, weight, and time use the ratio scale. This scale must contain a zero value that indicates that nothing exists for the variable at the zero point.

Example: Melissa's college record shows 36 credit hours earned, while Kevin's record shows 72 credit hours earned. Kevin has twice as many credit hours earned as Melissa.
Qualitative and Quantitative Data

Data can be further classified as being qualitative or quantitative. The statistical analysis that is appropriate depends on whether the data for the variable are qualitative or quantitative. In general, there are more alternatives for statistical analysis when the data are quantitative.

Qualitative Data

• Labels or names used to identify an attribute of each element
• Often referred to as categorical data
• Use either the nominal or ordinal scale of measurement
• Can be either numeric or nonnumeric
• Appropriate statistical analyses are rather limited

Quantitative Data

• Quantitative data indicate how many or how much: discrete, if measuring how many; continuous, if measuring how much.
• Quantitative data are always numeric.
• Ordinary arithmetic operations are meaningful for quantitative data.

Scales of Measurement (summary)

Data
  Qualitative
    Numerical: Nominal, Ordinal
    Non-numerical: Nominal, Ordinal
  Quantitative
    Numerical: Interval, Ratio

Descriptive Statistics

• Descriptive statistics are the tabular, graphical, and numerical methods used to summarize data.

Statistical Inference

• Population: the set of all elements of interest in a particular study
• Sample: a subset of the population
• Statistical inference: the process of using data obtained from a sample to make estimates and test hypotheses about the characteristics of a population
• Census: collecting data for a population
• Sample survey: collecting data for a sample

Process of Statistical Inference

1. Population consists of all tune-ups. Average cost of parts is unknown.
2. A sample of 50 engine tune-ups is examined.
3. The sample data provide a sample average parts cost of $79 per tune-up.
4. The sample average is used to estimate the population average.
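Steps 2 to 4 of the inference process can be sketched in a few lines of Python. This is only an illustration: it assumes the sample of 50 tune-ups is the Hudson Auto Repair parts-cost data listed in Chapter 2 of these notes (the notes do not say so explicitly), and the variable names are made up for the example.

```python
from statistics import mean

# Parts costs ($) for 50 tune-ups, copied from the Hudson Auto Repair
# example in Chapter 2. Treating this as "the sample of 50" from the
# inference slide is an assumption.
parts_cost = [
    91, 71, 104, 85, 62, 78, 69, 74, 97, 82,
    93, 72, 62, 88, 98, 57, 89, 68, 68, 101,
    75, 66, 97, 83, 79, 52, 75, 105, 68, 105,
    99, 79, 77, 71, 79, 80, 75, 65, 69, 69,
    97, 72, 80, 67, 62, 62, 76, 109, 74, 73,
]

sample_size = len(parts_cost)
# The sample average is the point estimate of the unknown population average.
sample_mean = mean(parts_cost)

print(f"n = {sample_size}, sample mean = ${sample_mean:.2f}")
```

Under that assumption the sample mean comes out at $78.98, which rounds to the $79 quoted in step 3.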
Chapter 2
Descriptive Statistics: Tabular and Graphical Presentations
Part A

• Summarizing Qualitative Data
• Summarizing Quantitative Data

Summarizing Qualitative Data

• Frequency Distribution
• Relative Frequency Distribution
• Percent Frequency Distribution
• Bar Graph
• Pie Chart

Frequency Distribution

A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes. The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.

Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:

Below Average Average Above Average Above Average Above Average Average Above Average Average Above Average Above Average Below Average Poor Excellent Above Average Average Above Average Below Average Poor Above Average Average Average

Frequency Distribution

Rating          Frequency
Poor             2
Below Average    3
Average          5
Above Average    9
Excellent        1
Total           20

Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class. A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.

Percent Frequency Distribution

The percent frequency of a class is the relative frequency multiplied by 100. A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.

Relative Frequency and Percent Frequency Distributions

Rating          Relative Frequency   Percent Frequency
Poor             .10                  10
Below Average    .15                  15
Average          .25                  25
Above Average    .45                  45
Excellent        .05                   5
Total           1.00                 100

For example, the relative frequency for Excellent is 1/20 = .05, and the percent frequency for Poor is .10(100) = 10.

Bar Graph

A bar graph is a graphical device for depicting qualitative data.
On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes. A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis). Using a bar of fixed width drawn above each class label, we extend the height appropriately. The bars are separated to emphasize the fact that each class is a separate category.

[Figure: bar graph, Marada Inn Quality Ratings; horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent); vertical axis: Frequency]

Pie Chart

The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.

• First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class.
• Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.

[Figure: pie chart, Marada Inn Quality Ratings; Poor 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%]

Example: Marada Inn

Insights Gained from the Preceding Pie Chart

• One-half of the customers surveyed gave Marada a quality rating of "above average" or "excellent" (looking at the left side of the pie). This might please the manager.
• For each customer who gave an "excellent" rating, there were two customers who gave a "poor" rating (looking at the top of the pie). This should displease the manager.

Summarizing Quantitative Data

• Frequency Distribution
• Relative Frequency and Percent Frequency Distributions
• Histogram
• Cumulative Distributions
• Ogive

Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.
Example: Hudson Auto Repair

Sample of Parts Cost ($) for 50 Tune-ups

 91   71  104   85   62   78   69   74   97   82
 93   72   62   88   98   57   89   68   68  101
 75   66   97   83   79   52   75  105   68  105
 99   79   77   71   79   80   75   65   69   69
 97   72   80   67   62   62   76  109   74   73

Frequency Distribution

Guidelines for Selecting Number of Classes

• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes.

Guidelines for Selecting Width of Classes

• Use classes of equal width.
• Approximate Class Width = (Largest Data Value − Smallest Data Value) / Number of Classes

For Hudson Auto Repair, if we choose six classes:

Approximate Class Width = (109 − 52)/6 = 9.5 ≅ 10

Parts Cost ($)   Frequency
50-59             2
60-69            13
70-79            16
80-89             7
90-99             7
100-109           5
Total            50

Relative Frequency and Percent Frequency Distributions

Parts Cost ($)   Relative Frequency   Percent Frequency
50-59             .04                   4
60-69             .26                  26
70-79             .32                  32
80-89             .14                  14
90-99             .14                  14
100-109           .10                  10
Total            1.00                 100

For example, the relative frequency for the 50-59 class is 2/50 = .04, and its percent frequency is .04(100) = 4.

Insights Gained from the Percent Frequency Distribution

• Only 4% of the parts costs are in the $50-59 class.
• 30% of the parts costs are under $70.
• The greatest percentage (32% or almost one-third) of the parts costs are in the $70-79 class.
• 10% of the parts costs are $100 or more.

Histogram

Another common graphical presentation of quantitative data is a histogram. The variable of interest is placed on the horizontal axis. A rectangle is drawn above each class interval with its height corresponding to the interval's frequency, relative frequency, or percent frequency. Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
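The frequency, relative frequency, and percent frequency distributions for the Hudson Auto Repair data can be reproduced with a short Python sketch. It assumes six classes of width 10 with the first class starting at 50, as in the tables above; the variable names are illustrative and not part of the original notes.

```python
from collections import Counter

# Parts costs ($) for the 50 tune-ups, as listed in the notes.
parts_cost = [
    91, 71, 104, 85, 62, 78, 69, 74, 97, 82,
    93, 72, 62, 88, 98, 57, 89, 68, 68, 101,
    75, 66, 97, 83, 79, 52, 75, 105, 68, 105,
    99, 79, 77, 71, 79, 80, 75, 65, 69, 69,
    97, 72, 80, 67, 62, 62, 76, 109, 74, 73,
]

num_classes = 6
# Approximate class width = (largest - smallest) / number of classes.
approx_width = (max(parts_cost) - min(parts_cost)) / num_classes
class_width = 10   # 9.5 rounded up to a convenient width, as in the notes
lower_limit = 50   # assumed lower limit of the first class

# Map each value to its class index (0 for 50-59, 1 for 60-69, ...).
counts = Counter((x - lower_limit) // class_width for x in parts_cost)

n = len(parts_cost)
rows = []
for k in range(num_classes):
    lo = lower_limit + k * class_width
    hi = lo + class_width - 1
    freq = counts[k]
    # (class label, frequency, relative frequency, percent frequency)
    rows.append((f"{lo}-{hi}", freq, freq / n, 100 * freq / n))

print(f"Approximate class width: {approx_width}")
for label, freq, rel, pct in rows:
    print(f"{label:>7}  {freq:>2}  {rel:.2f}  {pct:>3.0f}")
```

Integer division by the class width works here only because the classes are equal in width and the costs are whole dollars; unequal classes would need explicit class limits.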
[Figure: histogram, Tune-up Parts Cost; horizontal axis: Parts Cost ($), classes 50-59 through 100-109; vertical axis: Frequency]

Histogram Shapes

• Symmetric: the left tail is the mirror image of the right tail. Examples: heights and weights of people.
• Moderately Skewed Left: a longer tail to the left. Example: exam scores.
• Moderately Skewed Right: a longer tail to the right. Example: housing values.
• Highly Skewed Right: a very long tail to the right. Example: executive salaries.

[Figures: four histograms, one per shape; vertical axis: Relative Frequency]

Cumulative Distributions

• Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
• Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class.
• Cumulative percent frequency distribution: shows the percentage of items with values less than or equal to the upper limit of each class.

Cumulative Distributions: Hudson Auto Repair

Cost ($)   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59        2                      .04                              4
≤ 69       15                      .30                             30
≤ 79       31                      .62                             62
≤ 89       38                      .76                             76
≤ 99       45                      .90                             90
≤ 109      50                     1.00                            100

For example, for the ≤ 69 row the cumulative frequency is 2 + 13 = 15, the cumulative relative frequency is 15/50 = .30, and the cumulative percent frequency is .30(100) = 30.

Ogive

• An ogive is a graph of a cumulative distribution.
• The data values are shown on the horizontal axis.
• Shown on the vertical axis are the cumulative frequencies, the cumulative relative frequencies, or the cumulative percent frequencies.
• The frequency (one of the above) of each class is plotted as a point.
• The plotted points are connected by straight lines.

Ogive: Hudson Auto Repair

• Because the class limits for the parts-cost data are 50-59, 60-69, and so on, there appear to be one-unit gaps from 59 to 60, 69 to 70, and so on.
• These gaps are eliminated by plotting points halfway between the class limits.
• Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69 class, and so on.

[Figure: ogive with cumulative percent frequencies, Tune-up Parts Cost; horizontal axis: Parts Cost ($), 50 to 110; vertical axis: Cumulative Percent Frequency, 0 to 100; the point (89.5, 76) is labeled]

Chapter 2
Descriptive Statistics: Tabular and Graphical Presentations
Part B

• Exploratory Data Analysis
• Crosstabulations and Scatter Diagrams

Exploratory Data Analysis

The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly. One such technique is the stem-and-leaf display.

Stem-and-Leaf Display

• A stem-and-leaf display shows both the rank order and shape of the distribution of the data.
• It is similar to a histogram on its side, but it has the advantage of showing the actual data values.
• The first digits of each data item are arranged to the left of a vertical line.
• To the right of the vertical line we record the last digit for each item in rank order.
• Each line in the display is referred to as a stem.
• Each digit on a stem is a leaf.

Example: Hudson Auto Repair

The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed below.
Example: Hudson Auto Repair

Sample of Parts Cost ($) for 50 Tune-ups

 91   71  104   85   62   78   69   74   97   82
 93   72   62   88   98   57   89   68   68  101
 75   66   97   83   79   52   75  105   68  105
 99   79   77   71   79   80   75   65   69   69
 97   72   80   67   62   62   76  109   74   73

Stem-and-Leaf Display

 5 | 2 7
 6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3 5 8 9
 9 | 1 3 7 7 7 8 9
10 | 1 4 5 5 9

Each row is a stem; each digit to the right of the vertical line is a leaf.

Stretched Stem-and-Leaf Display

If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s). Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.

 5 | 2
 5 | 7
 6 | 2 2 2 2
 6 | 5 6 7 8 8 8 9 9 9
 7 | 1 1 2 2 3 4 4
 7 | 5 5 5 6 7 8 9 9 9
 8 | 0 0 2 3
 8 | 5 8 9
 9 | 1 3
 9 | 7 7 7 8 9
10 | 1 4
10 | 5 5 9

Stem-and-Leaf Display: Leaf Units

• A single digit is used to define each leaf.
• In the preceding example, the leaf unit was 1.
• Leaf units may be 100, 10, 1, 0.1, and so on.
• Where the leaf unit is not shown, it is assumed to equal 1.

Example: Leaf Unit = 0.1

If we have data with values such as

8.6  11.7  9.4  9.1  10.2  11.0  8.8

a stem-and-leaf display of these data will be

Leaf Unit = 0.1
 8 | 6 8
 9 | 1 4
10 | 2
11 | 0 7

Example: Leaf Unit = 10

If we have data with values such as

1806  1717  1974  1791  1682  1910  1838

a stem-and-leaf display of these data will be

Leaf Unit = 10
16 | 8
17 | 1 9
18 | 0 3
19 | 1 7

The 82 in 1682 is rounded down to 80 and is represented as an 8.

Crosstabulations and Scatter Diagrams

Thus far we have focused on methods that are used to summarize the data for one variable at a time. Often a manager is interested in tabular and graphical methods that will help understand the relationship between two variables. Crosstabulation and a scatter diagram are two methods for summarizing the data for two variables simultaneously.

Crosstabulation

A crosstabulation is a tabular summary of data for two variables.
Crosstabulation can be used when:
• one variable is qualitative and the other is quantitative,
• both variables are qualitative, or
• both variables are quantitative.

The left and top margin labels define the classes for the two variables.

Crosstabulation Example: Finger Lakes Homes

The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Price range is the quantitative variable and home style is the qualitative variable.

                        Home Style
Price Range    Colonial   Log   Split   A-Frame   Total
≤ $99,000         18        6     19      12        55
> $99,000         12       14     16       3        45
Total             30       20     35      15       100

Insights Gained from Preceding Crosstabulation

• The greatest number of homes (19) in the sample are a split-level style and priced at less than or equal to $99,000.
• Only three homes in the sample are an A-Frame style and priced at more than $99,000.

The Total column on the right is the frequency distribution for the price variable, and the Total row at the bottom is the frequency distribution for the home style variable.

Crosstabulation: Row or Column Percentages

• Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

Crosstabulation: Row Percentages

                        Home Style
Price Range    Colonial    Log     Split   A-Frame   Total
≤ $99,000       32.73     10.91    34.55    21.82     100
> $99,000       26.67     31.11    35.56     6.67     100

Note: row totals are actually 100.01 due to rounding.
Example entry: (Colonial and > $99K)/(All > $99K) x 100 = (12/45) x 100 = 26.67

Crosstabulation: Column Percentages

                        Home Style
Price Range    Colonial    Log     Split   A-Frame
≤ $99,000       60.00     30.00    54.29    80.00
> $99,000       40.00     70.00    45.71    20.00
Total             100       100      100      100

Example entry: (Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100 = 40.00

Crosstabulation: Simpson's Paradox

Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.

Simpson's paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.

Scatter Diagram and Trendline

• A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
• One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
• The general pattern of the plotted points suggests the overall relationship between the variables.
• A trendline is an approximation of the relationship.

[Figures: three scatter diagrams of y against x, showing a positive relationship, a negative relationship, and no apparent relationship]

Example: Panthers Football Team

The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of Interceptions   y = Number of Points Scored
1                             14
3                             24
2                             18
1                             17
3                             30

[Figure: scatter diagram; horizontal axis: Number of Interceptions (x), 0 to 4; vertical axis: Number of Points Scored (y), 0 to 35]

Insights Gained from the Preceding Scatter Diagram

• The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
• Higher points scored are associated with a higher number of interceptions.
• The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.

Tabular and Graphical Procedures

Qualitative Data
  Tabular Methods
    • Frequency Distribution
    • Rel. Freq. Dist.
    • Percent Freq. Distribution
    • Crosstabulation
  Graphical Methods
    • Bar Graph
    • Pie Chart

Quantitative Data
  Tabular Methods
    • Frequency Distribution
    • Rel. Freq. Dist.
    • Cum. Freq. Dist.
    • Cum. Rel. Freq. Distribution
    • Stem-and-Leaf Display
    • Crosstabulation
  Graphical Methods
    • Histogram
    • Ogive
    • Scatter Diagram

...
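As a worked check on the Finger Lakes Homes crosstabulation, the row and column percentages can be recomputed from the cell counts. The following Python sketch is only an illustration; the dictionary layout and names are assumptions made for the example, and the "<=" label follows the insight text, which describes the first price class as "less than or equal to $99,000".

```python
# Cell counts from the Finger Lakes Homes crosstabulation.
styles = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Marginal totals: one per price range (rows) and one per home style (columns).
row_totals = {price: sum(row) for price, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]

# Row percentages: each cell divided by its row total.
row_pct = {
    price: [round(100 * c / row_totals[price], 2) for c in row]
    for price, row in counts.items()
}

# Column percentages: each cell divided by its column total.
col_pct = {
    price: [round(100 * c / t, 2) for c, t in zip(row, col_totals)]
    for price, row in counts.items()
}

for price in counts:
    print(price, "row %:", row_pct[price], "column %:", col_pct[price])
```

Summing the rounded row percentages gives 100.01 for each row, which is the rounding effect mentioned in the note under the row-percentage table.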

This note was uploaded on 08/28/2011 for the course BUS 300 taught by Professor White during the Spring '09 term at Rutgers.
