Data Description Lecture Number 3 October 19, 2016
Outline of Lecture 3
1 Introduction
2 Measures of Central Tendency
3 Measures of Variability
4 Chebyshev's Theorem and The Empirical Rule
5 Skewness
6 Distribution Shapes
7 Measures of Position
8 Exploratory Data Analysis (EDA)
Introduction
There are three major characteristics of a single variable that we tend to look at: (1) central tendency; (2) variability or dispersion; and (3) distribution. Together, these let us make statistical summary statements about a large and complex set of individual values for a variable.
Graphs can help you describe the basic shape of a data distribution; "a picture is worth a thousand words." There are limitations, however, to the use of graphs: for example, graphs are somewhat imprecise for use in statistical inference.
One way to overcome the limitations of graphs is to use numerical measures, which can be calculated for either a sample or a population of measurements. You can use the data to calculate a set of numbers that will convey a good mental picture of the frequency distribution.
Measures of Central Tendency
Definition of measures of central tendency
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central position within that set of data. As such, measures of central tendency are sometimes called measures of central location. Their purpose is to identify the location of the centre of a given distribution.
There are generally three measures of central tendency: the mean, the median, and the mode.
Note that measures found by using all the data values in the population are called parameters, while those calculated by using the data values from samples are called statistics. It is also important to know how to calculate these measures for both ungrouped and grouped data.
Measures of Central Tendency - The Mean
Definition of Mean
The mean is defined as the arithmetic average of a set of data values (measurements or observations), that is, the sum of the values divided by the total number of values. It is a very common and useful measure of center and is often referred to as the arithmetic mean, or simply the average, of a set of measurements.
To distinguish between the mean for the sample and the mean for the population, we will use the symbol x̄ (read as "x-bar") for a sample mean and the symbol μ (Greek lowercase mu) for the mean of a population.
Since most statistical formulas (including the one for the mean) involve adding or "summing" numbers, we use a shorthand symbol, Σ (Greek capital sigma), to indicate the process of summing.
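The definition of the mean can be sketched in a few lines of Python. This is only an illustration: the function name sample_mean and the five data values are made up for this example and do not come from the lecture. The built-in sum plays the role of Σ, and len gives the total number of values.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        # The mean is undefined when there are no values to average.
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical sample of n = 5 measurements (made up for illustration).
sample = [2, 9, 11, 5, 6]

x_bar = sample_mean(sample)  # (2 + 9 + 11 + 5 + 6) / 5 = 33 / 5
print(x_bar)
```

If the list held every value in a population rather than a sample, the same calculation would give the population mean μ instead of the sample mean x̄; only the interpretation changes.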
