Descriptive statistics describe a data set. The mean (average), median (middle score), and mode (most common score) are measures of central tendency. Range and standard deviation are measures of variability.
Scientists aim to accomplish two goals with their data. First, they want to organize and describe the data in meaningful ways. Second, they want to use the data to make inferences, or predictions, about a population of interest. To achieve the first goal, scientists rely on descriptive statistics. Descriptive statistics summarize a data set.
Descriptive statistics most often involve a measure of central tendency, one value representing the entire set of values. The most commonly used measure of central tendency is the mean, commonly referred to as the average. The mean is calculated by adding all values in a data set and dividing the sum by the total number of values. With a set of five exam scores (67, 72, 72, 91, 77), the mean would be (67 + 72 + 72 + 91 + 77) ÷ 5 = 379 ÷ 5 = 75.8.
Another measure of central tendency is the median, the value in the data set where half the values fall below it and half fall above it. To calculate the median, the values are put in order (here, from highest to lowest): 91, 77, 72, 72, 67. In this data set, the median is 72 because it is the middle value: two scores come before it in the ordered list and two come after it. In a data set with an even number of values (for example, 92, 91, 77, 72, 72, 67), the median is the mean of the two middle values: here, the mean of 77 and 72, or 74.5. The last measure of central tendency is the mode, the most frequently occurring value in the data set. In this case, the mode is 72 because it occurs twice. If two or more values occur the same highest number of times, the data set has more than one mode. If no value repeats, the data set has no mode.
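All three measures of central tendency can be checked with Python's standard-library statistics module, using the exam scores from the examples above:

```python
# Central tendency for the exam scores in the text.
import statistics

scores = [67, 72, 72, 91, 77]

mean = statistics.mean(scores)      # (67 + 72 + 72 + 91 + 77) / 5 = 75.8
median = statistics.median(scores)  # middle value of the sorted scores: 72
mode = statistics.mode(scores)      # most frequent value: 72

# With an even number of values, median() averages the two middle values.
median_even = statistics.median([92, 91, 77, 72, 72, 67])  # (77 + 72) / 2 = 74.5

print(mean, median, mode, median_even)  # 75.8 72 72 74.5
```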
It is also useful to examine the variability among values in a data set. Variability refers to how similar or different the values are to one another. The simplest measure of variability is the range, the numerical difference between the highest and lowest values in a data set. A more commonly used measure is the standard deviation, a measure of how much values differ from the mean. Most naturally occurring phenomena have normally distributed values. A normal distribution is a symmetrical, bell-shaped distribution in which most values fall near the mean (about 68% are within one standard deviation of the mean). For example, grades generally follow a bell-shaped distribution: some grades are very high and others very low, but most cluster around the middle of the range (i.e., the mean). Graphical displays of data help researchers visualize the entire data set at once, including variability in scores, relationships between variables, and potential differences between groups.
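Both measures of variability can be computed for the same five exam scores; the standard deviation again comes from the standard-library statistics module:

```python
# Range and sample standard deviation for the exam scores used earlier.
import statistics

scores = [67, 72, 72, 91, 77]

value_range = max(scores) - min(scores)  # 91 - 67 = 24
sample_sd = statistics.stdev(scores)     # typical distance of scores from the mean

print(value_range, round(sample_sd, 2))  # 24 9.2
```

The large standard deviation relative to the mean (75.8) reflects the one outlying score of 91 in this small data set.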
Figure: Normal and Skewed Distributions
Inferential statistics help determine whether it is possible to make predictions about the general population based on research results. In large samples, findings can be statistically significant without representing a large enough effect to have practical significance.
To make predictions about a population of interest, scientists use inferential statistics. Inferential statistics generalize conclusions from the sample to a larger population. Different types of inferential statistics serve different purposes. The correlation coefficient measures the strength and direction of the relationship between two variables. A t-test measures the difference between two groups. An analysis of variance (ANOVA) measures differences among three or more groups and can test interactions between two or more variables.
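As a sketch of how the correlation coefficient works, the function below computes Pearson's r directly from its definition (the covariance of the two variables divided by the product of their standard deviations). The hours-studied and exam-score data are hypothetical numbers chosen only for illustration:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient: covariance of x and y
    divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
exam_scores = [60, 65, 70, 78, 85]

r = pearson_r(hours, exam_scores)
print(round(r, 3))  # close to +1: a strong positive relationship
```

Values of r range from -1 (perfect negative relationship) through 0 (no relationship) to +1 (perfect positive relationship).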
Confidence in making inferences is determined by statistical significance, which indicates the probability that an observed result occurred by chance. As a general rule, a p-value (i.e., calculated probability) of 0.05 or less marks a statistically significant result: it means there is no more than a 5% probability that a result this extreme would occur by chance alone. However, just because a result is statistically significant does not mean it is practically significant. Practical significance indicates whether the result is useful in the real world. One measurement that helps determine practical significance is effect size, a measure of the magnitude of a finding. For example, in a study where researchers gave caffeinated coffee to one group before an exam and decaffeinated coffee to another group, the caffeinated group scored higher on the exam than the decaffeinated group with a p-value of 0.01. However, the effect size revealed that the coffee group scored only 3 points higher on a 100-point test. Three additional points will rarely change a student's final letter grade. The effect of caffeine on test scores was statistically significant but small.
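A common effect-size measure for a two-group comparison like the coffee example is Cohen's d: the difference between the group means divided by the pooled standard deviation. The group scores below are hypothetical numbers chosen so the means differ by 3 points, as in the example; the study described in the text did not report these values:

```python
import statistics

def cohens_d(group1, group2):
    """Cohen's d: difference between group means divided by the pooled
    standard deviation. By convention, d near 0.2 is a small effect,
    0.5 medium, and 0.8 large."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (m1 - m2) / pooled_sd

# Hypothetical exam scores (out of 100); the means differ by only 3 points.
caffeinated = [65, 92, 71, 88, 74, 95, 68, 83]  # mean 79.5
decaf       = [62, 89, 68, 85, 71, 92, 65, 80]  # mean 76.5

print(round(cohens_d(caffeinated, decaf), 2))  # 0.26: a small effect
```

Here the 3-point difference amounts to roughly a quarter of a standard deviation, which matches the text's conclusion: the effect may be statistically reliable, yet it is too small to matter for a letter grade.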