The Visual Display of Quantitative Information - Tufte.pdf

"A landmark book, a wonderful book." Frederick Mosteller, Harvard University
"A tour de force." John W. Tukey, Bell Laboratories and Princeton University
"The century's best book on statistical graphics." Computing Reviews
"Best 100 non-fiction books of the 20th century." amazon.com
"A classic reference. The overall intention and power of the book is stunning." Optical Engineering
"A visual Strunk and White." Boston Globe

The cover graphic, drawn by Minoru Niijima, is based on E. J. Marey's train schedule from Paris to Lyon in La méthode graphique (Paris, 1885). Author photography by Robert Del Tredici. Design & Typography by Howard I. Gralla.

Edward R. Tufte
The Visual Display of Quantitative Information
SECOND EDITION
Graphics Press · Cheshire, Connecticut

Copyright © 2001 by Edward Rolf Tufte
PUBLISHED BY GRAPHICS PRESS LLC
POST OFFICE BOX 430, CHESHIRE, CONNECTICUT 06410

All rights to illustrations and text reserved by Edward Rolf Tufte. This work may not be copied, reproduced, or translated in whole or in part without written permission of the publisher, except for brief excerpts in connection with reviews or scholarly analysis. Use with any form of information storage and retrieval, electronic adaptation or whatever, computer software, or by similar or dissimilar methods now known or developed in the future is also strictly forbidden without written permission of the publisher. A number of illustrations are reproduced by permission; those copyright-holders are credited on page 197.

Printed in the United States of America
Second edition, fifth printing, August 2007

Contents

PART I
GRAPHICAL PRACTICE
1 Graphical Excellence
2 Graphical Integrity
3 Sources of Graphical Integrity and Sophistication

PART II
THEORY OF DATA GRAPHICS
4 Data-Ink and Graphical Redesign
5 Chartjunk: Vibrations, Grids, and Ducks
6 Data-Ink Maximization and Graphical Design
7 Multifunctioning Graphical Elements
8 Data Density and Small Multiples
9 Aesthetics and Technique in Data Graphical Design
Epilogue: Designs for the Display of Information

For my parents
Edward E. Tufte and Virginia James Tufte

To the memory of John W. Tukey (1915-2000)

Introduction to the Second Edition

This new edition provides high-resolution color reproductions of the many graphics of William Playfair, adds color to other images where appropriate, and includes all the changes and corrections accumulated during the 17 printings of the first edition.

This book began in 1975 when Dean Donald Stokes of Princeton's Woodrow Wilson School asked me to teach statistics to a dozen journalists who were visiting that year to learn some economics. I annotated a collection of readings, with a long section on statistical graphics. The literature here was thin, too often grimly devoted to explaining use of the ruling pen and to promulgating "graphic standards" indifferent to the nature of visual evidence and quantitative reasoning. Soon I wrote up some ideas. Then John Tukey, the phenomenal Princeton statistician, suggested that we give a series of joint seminars. Since the mid-1960s, Tukey had opened up the field, as his brilliant technical contributions made it clear that the study of statistical graphics was intellectually respectable and not just about pie charts and ruling pens.

After moving to Yale University, I finished the manuscript in 1982. A publisher was interested but planned to print only 2,000 copies and to charge a very high price, contrary to my hopes for a wide readership.
I also sought to design the book so as to make it self-exemplifying; that is, the physical object itself would reflect the intellectual principles advanced in the book. Publishers seemed appalled at the prospect that an author might govern design.

Consequently I investigated self-publishing. This required a first-rate book designer, a lot of money (at least for a young professor), and a large garage. I found Howard Gralla, who had designed many museum catalogs with great care and craft. He was willing to work closely with this difficult author who was filled with all sorts of opinions about design and typography. We spent the summer in his studio laying out the book, page by page. We were able to integrate graphics right into the text, sometimes into the middle of a sentence, eliminating the usual separation of text and image, one of the ideas Visual Display advocated. To finance the book I took out another mortgage on my home. The bank officer said this was the second most unusual loan that she had ever made; first place belonged to a loan to a circus to buy an elephant!

My view on self-publishing was to go all out, to make the best and most elegant and wonderful book possible, without compromise. Otherwise, why do it?

Most of all, the book, as a thing in itself, gave to me fresh new eyes for the intellectual and aesthetic joy of visual evidence, visual reasoning, and visual understanding.

January 2001
Cheshire, Connecticut

Introduction

Data graphics visually display measured quantities by means of the combined use of points, lines, a coordinate system, numbers, symbols, words, shading, and color.

The use of abstract, non-representational pictures to show numbers is a surprisingly recent invention, perhaps because of the diversity of skills required: the visual-artistic, empirical-statistical, and mathematical.
It was not until 1750-1800 that statistical graphics (length and area to show quantity, time-series, scatterplots, and multivariate displays) were invented, long after such triumphs of mathematical ingenuity as logarithms, Cartesian coordinates, the calculus, and the basics of probability theory. The remarkable William Playfair (1759-1823) developed or improved upon nearly all the fundamental graphical designs, seeking to replace conventional tables of numbers with the systematic visual representations of his "linear arithmetic."

Modern data graphics can do much more than simply substitute for small statistical tables. At their best, graphics are instruments for reasoning about quantitative information. Often the most effective way to describe, explore, and summarize a set of numbers, even a very large set, is to look at pictures of those numbers. Furthermore, of all methods for analyzing and communicating statistical information, well-designed data graphics are usually the simplest and at the same time the most powerful.

The first part of this book reviews the graphical practice of the two centuries since Playfair. The reader will, I hope, rejoice in the graphical glories shown in Chapter 1 and then condemn the lapses and lost opportunities exhibited in Chapter 2. Chapter 3, on graphical integrity and sophistication, seeks to account for these differences in quality of graphical design.

The second part of the book provides a language for discussing graphics and a practical theory of data graphics. Applying to most visual displays of quantitative information, the theory leads to changes and improvements in design, suggests why some graphics might be better than others, and generates new types of graphics. The emphasis is on maximizing principles, empirical measures of graphical performance, and the sequential improvement of graphics through revision and editing.
Insights into graphical design are to be gained, I believe, from theories of what makes for excellence in art, architecture, and prose.

This is a book about the design of statistical graphics and, as such, it is concerned both with design and with statistics. But it is also about how to communicate information through the simultaneous presentation of words, numbers, and pictures. The design of statistical graphics is a universal matter, like mathematics, and is not tied to the unique features of a particular language. The descriptive concepts (a vocabulary for graphics) and the principles advanced apply to most designs. I have at times provided evidence about the scope of these ideas, by showing how frequently a principle applies to (a random sample of) news and scientific graphics.

Each year, the world over, somewhere between 900 billion (9 × 10^11) and 2 trillion (2 × 10^12) images of statistical graphics are printed. The principles of this book apply to most of those graphics. Some of the suggested changes are small, but others are substantial, with consequences for hundreds of billions of printed pages.

But I hope also that the book has consequences for the viewers and makers of those images: that they will never view or create statistical graphics the same way again. That is in part because we are about to see, collected here, so many wonderful drawings, those of Playfair, of Minard, of Marey, and, nowadays, of the computer.

Most of all, then, this book is a celebration of data graphics.

PART I
GRAPHICAL PRACTICE

1 Graphical Excellence

Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency.
Graphical displays should

  • show the data
  • induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
  • avoid distorting what the data have to say
  • present many numbers in a small space
  • make large data sets coherent
  • encourage the eye to compare different pieces of data
  • reveal the data at several levels of detail, from a broad overview to the fine structure
  • serve a reasonably clear purpose: description, exploration, tabulation, or decoration
  • be closely integrated with the statistical and verbal descriptions of a data set.

Graphics reveal data. Indeed graphics can be more precise and revealing than conventional statistical computations. Consider Anscombe's quartet: all four of these data sets are described by exactly the same linear model (at least until the residuals are examined).

        I               II              III              IV
    X      Y        X      Y        X      Y        X      Y
  10.0   8.04     10.0   9.14     10.0   7.46      8.0   6.58
   8.0   6.95      8.0   8.14      8.0   6.77      8.0   5.76
  13.0   7.58     13.0   8.74     13.0  12.74      8.0   7.71
   9.0   8.81      9.0   8.77      9.0   7.11      8.0   8.84
  11.0   8.33     11.0   9.26     11.0   7.81      8.0   8.47
  14.0   9.96     14.0   8.10     14.0   8.84      8.0   7.04
   6.0   7.24      6.0   6.13      6.0   6.08      8.0   5.25
   4.0   4.26      4.0   3.10      4.0   5.39     19.0  12.50
  12.0  10.84     12.0   9.13     12.0   8.15      8.0   5.56
   7.0   4.82      7.0   7.26      7.0   6.42      8.0   7.91
   5.0   5.68      5.0   4.74      5.0   5.73      8.0   6.89

  N = 11
  mean of X's = 9.0
  mean of Y's = 7.5
  equation of regression line: Y = 3 + 0.5X
  standard error of estimate of slope = 0.118
  t = 4.24
  sum of squares of X about its mean = 110.0
  regression sum of squares = 27.50
  residual sum of squares of Y = 13.75
  correlation coefficient = .82
  r^2 = .67

And yet how they differ, as the graphical display of the data makes vividly clear:

[Figure: scatterplots of the four data sets, panels I to IV.]

F. J. Anscombe, "Graphs in Statistical Analysis," American Statistician, 27 (February 1973), 17-21.
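The shared summary statistics quoted for Anscombe's quartet can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python, assuming the standard published values of the quartet as tabulated above; the names QUARTET, summarize, and SUMMARIES are illustrative and do not come from the book.

```python
from math import sqrt

# Anscombe's quartet as tabulated above. Sets I-III share the same X values.
# All names in this sketch are illustrative assumptions, not the book's.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_SHARED, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sum of squares of X, least-squares line, and correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)  # sum of squares of X about its mean
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return {"n": n, "mean_x": mean_x, "mean_y": mean_y, "sxx": sxx,
            "slope": slope, "intercept": intercept, "r": r}


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in SUMMARIES.items():
    print(f"{name:>3}: mean_x={s['mean_x']:.1f} mean_y={s['mean_y']:.2f} "
          f"Y = {s['intercept']:.2f} + {s['slope']:.3f}X  r={s['r']:.2f}")
```

The four printed rows agree to the precision quoted in the text, which is the point of the example: the numerical summaries cannot tell the four data sets apart, while a scatterplot of each can.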
And likewise a graphic easily reveals point A, a wildshot observation that will dominate standard statistical calculations. Note that point A hides in the marginal distribution but shows up as clearly exceptional in the bivariate scatter.

[Figure: bivariate scatterplot with outlying point A.]

Stephen S. Brier and Stephen E. Fienberg, "Recent Econometric Modelling of Crime and Punishment: Support for the Deterrence Hypothesis?" in Stephen E. Fienberg and Albert J. Reiss, Jr., eds., Indicators of Crime and Criminal Justice: Quantitative Studies (Washington, D.C., 1980), p. 89.

Of course, statistical graphics, just like statistical calculations, are only as good as what goes into them. An ill-specified or preposterous model or a puny data set cannot be rescued by a graphic (or by calculation), no matter how clever or fancy. A silly theory means a silly graphic:

[Figure: Solar Radiation and Stock Prices. A. New York stock prices (Barron's average). B. Solar radiation, inverted, and C. London stock prices, all by months, 1929 (after Garcia-Mata and Shaffner).]

Edward R. Dewey and Edwin F. Dakin, Cycles: The Science of Prediction (New York, 1947), p. 144.

Let us turn to the practice of graphical excellence, the efficient communication of complex quantitative ideas. Excellence, nearly always of a multivariate sort, is illustrated here for fundamental graphical designs: data maps, time-series, space-time narrative designs, and relational graphics. These examples serve several purposes, providing a set of high-quality graphics that can be discussed (and sometimes even redrawn) in constructing a theory of data graphics, helping to demonstrate a descriptive terminology, and telling in brief about the history of graphical development. Most of all, we will be able to see just how good statistical graphics can be.

Data Maps

These six maps report the age-adjusted death rate from various types of cancer for the 3,056 counties of the United States. Each map portrays some 21,000 numbers.1 Only a picture can carry such a volume of data in such a small space. Furthermore, all that data, thanks to the graphic, can be thought about in many different ways at many different levels of analysis, ranging from the contemplation of general overall patterns to the detection of very fine county-by-county detail. To take just a few examples, look at the

  • high death rates from cancer in the northeast part of the country and around the Great Lakes
  • low rates in an east-west band across the middle of the country
  • higher rates for men than for women in the south, particularly Louisiana (cancers probably caused by occupational exposure, from working with asbestos in shipyards)
  • unusual hot spots, including northern Minnesota and a few counties in Iowa and Nebraska along the Missouri River
  • differences in types of cancer by region (for example, the high rates of stomach cancer in the north-central part of the country, probably the result of the consumption of smoked fish by Scandinavians)
  • rates in areas where you have lived.

1 Each county's rate is located in two dimensions and, further, at least four numbers would be necessary to reconstruct the size and shape of each county. This yields 7 × 3,056 entries in a data matrix sufficient to reproduce a map.

[Map legend: In highest decile, statistically significant; Significantly high, but not in highest decile; In highest decile, but not statistically significant; Not significantly different from U.S. as a whole; Significantly lower than U.S. as a whole.]

The maps provide many leads into the causes, and avoidance, of cancer. For example, the authors report:

In certain situations . . . the unusual experience of a county warrants further investigation.
For example, Salem County, New Jersey, leads the nation in bladder cancer mortality among white men. We attribute this excess risk to occupational exposures, since about 25 percent of the employed persons in this county work in the chemical industry, particularly the manufacturing of organic chemicals, which may cause bladder tumors. After the finding was communicated to New Jersey health officials, a company in the area reported that at least 330 workers in a single plant had developed bladder cancer during the last 50 years. It is urgent that surveys of cancer risk and programs in cancer control be initiated among workers and former workers in this area.2

2 Robert Hoover, Thomas J. Mason, Frank W. McKay, and Joseph F. Fraumeni, Jr., "Cancer by County: New Resource for Etiologic Clues," Science, 189 (September 19, 1975), 1006.

Maps from Atlas of Cancer Mortality for U.S. Counties: 1950-1969, by Thomas J. Mason, Frank W. McKay, Robert Hoover, William J. Blot, and Joseph F. Fraumeni, Jr. (Washington, D.C.: Public Health Service, National Institutes of Health, 1975). The six maps shown here were redesigned and redrawn by Lawrence Fahey and Edward Tufte.

[Map: All types of cancer, white females; age-adjusted rate by county, 1950-1969]
[Map: All types of cancer, white males; age-adjusted rate by county, 1950-1969]
[Map: Trachea, bronchus, and lung cancer; white females; age-adjusted rate by county, 1950-1969]
[Map: Trachea, bronchus, and lung cancer; white males; age-adjusted rate by county, 1950-1969]
[Map: Stomach cancer, white females; age-adjusted rate by county, 1950-1969]
[Map: Stomach cancer, white males; age-adjusted rate by county, 1950-1969]

The maps repay careful study. Notice how quickly and naturally our attention has been directed toward exploring the substantive content of the data rather than toward questions of methodology and technique. Nonetheless the maps do have their flaws.
They wrongly equate the visual importance of each county with its geographic area rather than with the number of people living in the county (or the number of cancer deaths). Our visual impression of the data is entangled with the circumstance of geographic boundaries, shapes, and areas, the chronic problem afflicting shaded-in-area designs of such "blot maps" or "patch maps."

A further shortcoming, a defect of data rather than graphical composition, is that the maps are founded on a suspect data source, death certificate reports on the cause of death. These reports fall under the influence of diagnostic fashions prevailing among doctors and coroners in particular places and times, a troublesome adulterant of the evidence purporting to describe the already sometimes ambiguous matter of the exact bodily site of the primary cancer. Thus part of the regional clustering seen on the maps, as well as some of the hot spots, may reflect varying diagnostic customs and fads along with the actual differences in cancer rates between areas.

Data maps have a curious history. It was not until the seventeenth century that the combination of cartographic and statistical skills required to construct the data map came together, fully 5,000 years after the first geographic maps were drawn on clay tablets. And many highly sophisticated geographic maps were produced centuries before the first map containing any statistical material was drawn.3 For example, a detailed map with a full grid was engraved during the eleventh century A.D. in China. The Yü Chi Thu (Map of the Tracks of Yü the Great) shown here is described by Joseph Needham as the

. . . most remarkable cartographic work of its age in any culture, carved in stone in +1137 but probably dating from before +1100. The scale of the grid is 100 li to the division. The coastal outline is relatively firm and the precision of the network of river systems extraordinary.
The size of the original, which is now in the Pei Lin Museum at Sian, is about 3 feet square. The name of the geographer is not known. . . . Anyone who compares this map with the contemporary productions of European religious cosmography cannot but be amazed at the extent to which Chinese geography was at that time ahead of the West. . . . There was nothing like it in Europe till the Escorial MS. map of about +1550 . . . .4

3 Data maps are usually described as "thematic maps" in cartography. For a thorough account, see Arthur H. Robinson, Early Thematic Mapping in the History of Cartography (Chicago, 1982). On the history of statistical graphics, see H. Gray Funkhouser, "Historical Development of the Graphical Representation of Statistical Data," Osiris, 3 (November 1937), 269-404; and James R. Beniger and Dorothy L. Robyn, "Quantitative Graphics in Statistics: A Brief History," American Statistician, 32 (February 1978), 1-11.

4 Joseph Needham, Science and Civilisation in China (Cambridge, 1959), vol. 3, 546-547.

[Figure: the Yü Chi Thu map.]

E. Chavannes, "Les Deux Plus Anciens Specimens de...

  • Fall '18
  • baiocchi
  • The American, The Land, Edward Tufte, Information graphics, William Playfair, Charles Joseph Minard, statistical graphics, Edward Rolf Tufte
