Probability and Statistical Analysis in Sociology

Probability and Statistical Analysis

Statistical analysis aims to understand the mathematical likelihood that a behavior or event will occur.

Statistics and statistical methods are used in sociology to describe populations and draw inferences about them. Statistical analysis can give researchers various types of information: it can help to support or refute a hypothesis, summarize information, and show probabilities. A probability is the likelihood that a specific behavior or event will occur.

Inferential statistics is an approach to analyzing data that begins with a hypothesis and explores whether the data are consistent with that hypothesis. It is used to make inferences about the larger population from which the sample (the group studied) was drawn. The goal is to draw a conclusion from a sample and generalize it to a larger population, that is, to draw general conclusions about a whole group, a whole community, or a whole society based on information obtained from a smaller group. To do this as accurately as possible, the researcher must have confidence that the sample reflects the population. Random sampling gives the most confidence that the sample will represent the population.

Descriptive statistics is an approach to analyzing data that describes and summarizes the sample itself. Charts and graphs that compare data are a form of descriptive statistics. For example, a chart might show the average age of marriage for women at different time periods.
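The link between a random sample and its population can be illustrated with a short simulation. The following is only a minimal sketch: the population of 10,000 marriage ages is simulated (it is not real survey data), and the population size, sample size, and variable names are all assumptions chosen for illustration.

```python
import random
import statistics

# Hypothetical population: age at first marriage for 10,000 people.
# These values are simulated for illustration, not taken from a real study.
rng = random.Random(42)
population = [rng.gauss(27, 4) for _ in range(10_000)]

# Descriptive statistic: summarize the whole (simulated) population.
population_mean = statistics.mean(population)

# Inferential step: draw a simple random sample of 500 people and
# use its mean as an estimate of the population mean.
sample = rng.sample(population, 500)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

Because every member of the simulated population has the same chance of being selected, the sample mean should land close to the population mean. A sample drawn in a biased way (for example, only the youngest people) would not offer that assurance.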

Statistical analysis is useful in many situations. Many decisions in people's lives cannot reliably be based on instinct or trial and error, and decisions based on data can provide better results. This is true in the business world, in the medical field, and in studying the social world. Consider how statistical analysis can impact education. For a society to make decisions about how to educate children, it is useful to have correctly interpreted data about a range of issues, such as when most children learn to read or which teaching approaches are most successful. In sociology, researchers use statistical analysis to make connections that might otherwise be only vaguely recognized or understood. Sociologists use statistical analysis to study social and cultural issues and changes in society. Statistical analysis is helpful when researchers need to measure things, examine relationships, make predictions, test hypotheses, develop theories, make comparisons, describe phenomena, and explore issues.

An example of how researchers can use statistics is the study of poverty and obesity. In the United States both poverty and obesity are issues of concern. Statistical analysis shows that there is a correlation between poverty and obesity in the United States: people living in poverty have higher rates of obesity. This correlation gives researchers information about both poverty and obesity and suggests potential areas of research. It does not show that poverty is the only factor that leads to high rates of obesity, but it does show that there is a connection between these two problems. It is important that the data be correctly analyzed and interpreted. Analysis of data can reveal questions that require additional study. Further research can help researchers, policy makers, and medical professionals better understand how best to address poverty and obesity.

Another example is data about rampage shooters, people who perpetrate mass shootings in which many victims are targeted. Analysis of data shows that the vast majority of rampage shooters are relatively young, white, and male. This analysis can help researchers and society at large try to make sense of this type of violence by focusing on those specific demographic groups (young, white, and male). It can also help combat misunderstandings that target certain groups unfairly and hinder progress toward solving a problem. In the case of rampage shooters, perpetrators are often assumed to be mentally ill. Statistical analysis of data about rampage shooters can help to push back against the incorrect idea that mentally ill people are often dangerous.

Central Tendency, Mean, Median, and Mode

Statistical analysis provides measures of central tendency, including the mean, median, and mode.

One use of statistical analysis is to provide meaningful comparisons. To do so, it can be useful to identify the central tendency, one value in a data set that is used to describe the center, or middle, point in the set. Descriptive statistics is an approach used to explain data and provide a summary of the information collected by a study. Measures of central tendency (mean, median, and mode) are descriptive statistics.

A mean is an average. It is calculated by adding all values in a data set and dividing the sum by the total number of values. For example, adding a group of 10 numbers and then dividing by 10 gives the mean for the group.

The median is the value in a data set where half the values fall below it and half are above. In other words, the median is the midpoint in a series of values arranged in numerical order. In a group of 13 numbers arranged in numerical order, the seventh number is the median. In a group of 14 numbers arranged in numerical order, the median is the halfway point between the seventh and eighth numbers. Often, the median is a better measure than the mean because the mean can be skewed (impacted) by extremes at either end of the data set. Consider pay rates at a large company. Employees who want to know what a typical salary in the company is would need to know the median pay, not the mean (average) pay. The mean salary would be misleading, because a CEO at a large company may make around 380 times as much as the average employee. The CEO's pay is an extreme outlier that skews the mean.

Finally, the mode is the value or score that appears most frequently in a set of data. To understand test scores, educators might look at the mode in a set of test scores. In a class of 20 students, if 7 students score an 80 on the test and no other score occurs as often, 80 is the mode. This means that 80 is the most frequent score for that test, even though the majority of the students did not score an 80.
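The three measures can be computed with Python's standard `statistics` module. The sketch below uses made-up numbers throughout: the salaries, the CEO's pay (set to 380 times the employee mean to echo the example above), and the test scores are all hypothetical.

```python
import statistics

# Median position: with 13 ordered values the median is the 7th value;
# with 14 it is halfway between the 7th and 8th.
thirteen = list(range(1, 14))   # 1, 2, ..., 13
fourteen = list(range(1, 15))   # 1, 2, ..., 14
median_13 = statistics.median(thirteen)   # 7
median_14 = statistics.median(fourteen)   # 7.5

# Hypothetical salaries for nine employees (mean and median are both 50,000).
employees = [40_000, 42_000, 45_000, 48_000, 50_000,
             52_000, 55_000, 58_000, 60_000]

# Hypothetical CEO pay: 380 times the employee mean.
ceo_pay = 380 * 50_000
with_ceo = employees + [ceo_pay]

mean_with_ceo = statistics.mean(with_ceo)      # 1,945,000: pulled up by one outlier
median_with_ceo = statistics.median(with_ceo)  # 51,000: still a typical salary

# Hypothetical test scores for a class of 20; seven students scored 80.
scores = [80] * 7 + [55, 60, 65, 70, 72, 75, 78, 85, 88, 90, 92, 95, 100]
mode_score = statistics.mode(scores)           # 80

print(median_13, median_14)
print(mean_with_ceo, median_with_ceo)
print(mode_score)
```

Adding a single extreme salary moves the mean from 50,000 to nearly 2 million while the median barely changes, which is why the median is usually the better summary of a typical salary.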

Correlation and Causation

Researchers consider whether data show correlation or causation, but sociologists generally look for meaningful correlation between variables.

Sociologists can look for correlation, a relationship between variables. Correlation can show whether factors such as income, education, gender, or race have a strong or weak relationship with one another. The strength of the relationship between two variables can help to indicate how likely it is that the correlation is meaningful. A positive correlation means that two variables move in the same direction: when one increases, the other also increases. For example, if a study shows that people with more education earn more income, a positive correlation between education and income is established. This positive correlation might be weak or strong, depending on how closely levels of income and education align. A negative correlation means that when one variable increases, a second variable decreases. A study might show that when prenatal care for pregnant women increases, complications during childbirth decrease. This negative correlation might also be weak or strong.
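One common way to put a number on the direction and strength of a correlation is the Pearson correlation coefficient, which ranges from -1 (perfect negative) to +1 (perfect positive). The text above does not name a specific measure, so the choice of Pearson's r here is an assumption, and the data below are invented purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical data: years of education and annual income (in thousands).
years_of_education = [10, 12, 12, 14, 16, 16, 18, 20]
income_thousands = [28, 35, 32, 45, 52, 60, 70, 85]

# Hypothetical data: number of prenatal visits and complication rate (percent).
prenatal_visits = [0, 2, 4, 6, 8, 10, 12]
complication_rate = [30, 26, 24, 18, 15, 11, 9]

r_positive = pearson_r(years_of_education, income_thousands)
r_negative = pearson_r(prenatal_visits, complication_rate)

print(f"education vs. income:        r = {r_positive:.2f}")
print(f"prenatal care vs. complications: r = {r_negative:.2f}")
```

A value close to +1 or -1 indicates a strong relationship, while a value near 0 indicates a weak one. The sign shows the direction: positive for the education and income data, negative for the prenatal care and complications data.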

Some studies can also show causation, a clear relationship of cause and effect. Proving that one thing causes another is more difficult than showing that two things are correlated. To establish a causal relationship, researchers must show that the cause occurs before the effect, that the factors under consideration are consistently related in this way, and that other explanations can be ruled out. Demonstrating a cause-and-effect relationship is challenging in any field. It is particularly challenging in sociology, where data are tied to the complex, lived experiences of individuals.
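The need to rule out other explanations can be illustrated with a small simulation. In this sketch, two factors are strongly correlated only because both depend on a hidden third factor; neither causes the other. Everything here is hypothetical: the data are simulated, the names `factor_a`, `factor_b`, and `hidden_factor` are placeholders, and subtracting the hidden factor directly is a simplification that works only because the simulation knows exactly how the data were generated.

```python
import math
import random

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

rng = random.Random(7)
n = 2000

# A hidden third factor that influences both observed factors.
hidden_factor = [rng.gauss(0, 1) for _ in range(n)]

# Each observed factor is the hidden factor plus its own independent noise.
# Neither observed factor has any effect on the other.
factor_a = [z + rng.gauss(0, 0.5) for z in hidden_factor]
factor_b = [z + rng.gauss(0, 0.5) for z in hidden_factor]

# The raw correlation is strong even though there is no causal link.
r_raw = pearson_r(factor_a, factor_b)

# Once the hidden factor's contribution is removed, little correlation remains.
rest_a = [a - z for a, z in zip(factor_a, hidden_factor)]
rest_b = [b - z for b, z in zip(factor_b, hidden_factor)]
r_adjusted = pearson_r(rest_a, rest_b)

print(f"raw correlation:      {r_raw:.2f}")
print(f"adjusted correlation: {r_adjusted:.2f}")
```

In real research the hidden factor is rarely known so precisely, which is part of why establishing causation is harder than finding a correlation.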