1 Introduction

This book is about data science. This term has no precise definition. Data science involves some statistics, some probability, some computing, and above all, some knowledge of your data set (the “science” part).

The goal of data science is to help us understand patterns of variation in data: economic growth rates, dinosaur skull volumes, student SAT scores, genes in a population, Congressional party affiliations, drug dosage levels, your choice of toothpaste versus mine . . . really any variable that can be measured.

To do that, we often use models. A model is a metaphor, a description of a system that helps us to reason more clearly. Like all metaphors, models are approximations, and will never account for every last detail. A useful mantra here is: all models are wrong, but some models are useful.¹ Aerospace engineers work with physical models (blueprints, simulations, mock-ups, wind-tunnel prototypes) to help them understand a proposed airplane design. Geneticists work with animal models (fruit flies, mice, zebrafish) to help them understand heredity. In data science, we work with statistical models to help us understand variation.

¹ Attributed to George Box.

Like the weather, most variation in the world exhibits some features that are predictable, and some that are unpredictable. Will it snow on Christmas day? It’s more likely in Boston than Austin, and more likely still at the North Pole; that’s predictable variation. But even as late as Christmas eve, and even at the North Pole, nobody knows for sure; that’s unpredictable variation.

Statistical models describe both the predictable and the unpredictable variation in some system. More than that, they allow us to partition observed variation into its predictable and unpredictable components. This focus on the structured quantification of uncertainty is what distinguishes data science from ordinary evidence-based reasoning. It’s important to know what the evidence says, goes this line of thinking. But it’s also important to
know what it doesn’t say. Sometimes that’s the tricky part.
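
The idea of partitioning observed variation into predictable and unpredictable components can be made concrete with a small sketch. It is not a method taken from this chapter: it assumes the simplest possible "model", one that predicts each observation by its group average, and it measures variation as a sum of squared deviations. The function name `partition_variation`, the "snow index" values, and the city averages are all hypothetical, simulated purely for illustration.

```python
import random
from statistics import mean


def partition_variation(groups):
    """Split total variation into predictable and unpredictable parts.

    `groups` maps a group label to a list of numeric observations.
    The assumed model predicts every observation by its group mean, so:
      - predictable variation   = spread of group means around the grand mean
      - unpredictable variation = spread of observations around their group mean
    Returns (total, predictable, unpredictable) sums of squares.
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)

    total = sum((y - grand_mean) ** 2 for y in all_values)
    predictable = sum(
        len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    unpredictable = sum(
        (y - mean(ys)) ** 2 for ys in groups.values() for y in ys
    )
    return total, predictable, unpredictable


# Simulated, made-up "snow index" readings: each location has its own
# average (the predictable part) plus random noise (the unpredictable part).
rng = random.Random(0)
snow_index = {
    "Austin": [rng.gauss(1.0, 2.0) for _ in range(30)],
    "Boston": [rng.gauss(5.0, 2.0) for _ in range(30)],
    "North Pole": [rng.gauss(9.0, 2.0) for _ in range(30)],
}

total, predictable, unpredictable = partition_variation(snow_index)
print(f"total variation:         {total:.1f}")
print(f"predictable (location):  {predictable:.1f}")
print(f"unpredictable (residual): {unpredictable:.1f}")
print(f"share predictable:       {predictable / total:.2f}")
```

Under this assumed model the two components add up to the total exactly (up to floating-point rounding), which is what makes the partition a clean accounting of the variation. The more the group averages differ relative to the noise within each group, the larger the predictable share becomes.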