tufte - I want to reach that state of condensation...

I want to reach that state of condensation of sensations which constitutes a picture.
Henri Matisse

8 Data Density and Small Multiples

Our eyes can make a remarkable number of distinctions within a small area. With the use of very light grid lines, it is easy to locate 625 points in one square inch or, equivalently, 100 points in one square centimeter. Or consider how an 80 by 80 grid over a square inch (about 30 by 30 over a square centimeter) divides the space:1

With the help of considerable redundancy and context, our eyes make fine distinctions of this sort all the time. Measurement instruments used in engineering, architectural, and machine work are engraved with scales of 20 increments to the centimeter and 50 to the inch. Or consider the reading of fine print. The type in the U.S. Statistical Abstract is set at 12 lines per vertical inch, with each line running at about 23 characters per inch for a maximum density of 276 characters per square inch. The actual density, given the white space, is in this case 185 characters per square inch or 28 per square centimeter.

[Table excerpt: No. 1450. Steel Products, Net Shipments, by Market Class: 1960 to 1978. In thousands of short tons. Comprises carbon, alloy, and stainless steel. "N.e.c." means not elsewhere classified. Rows by market class include: total; steel for converting and processing; independent forgers, n.e.c.; industrial fasteners; steel service centers, distributors; construction, incl. maintenance; contractors' products; automotive; rail transportation (freight cars, passenger cars, locomotives; rails and all other); shipbuilding and marine; aircraft and aerospace; oil and gas industries; mining, quarrying, and lumbering; agricultural, incl. machinery; machinery, industrial equipment; electrical equipment; appliances, utensils; other domestic commercial equipment; containers, packaging, shipping (cans and closures). The numeric columns did not survive extraction.]
[Table excerpt, continued. Remaining rows: ordnance and other military; exports (reporting companies only). Table notes: Total includes nonclassified shipments and, beginning 1970, data include estimates for a relatively small number of companies which report raw steel production but not shipments. Bolts, nuts, rivets, and screws. Includes railways, rapid transit systems, railroad rails, trackwork, and equipment.]
U.S. Bureau of the Census, Statistical Abstract of the United States: 1979 (Washington, D.C., 1979), p. 822.

25,281 distinctions

1 A square grid formed on each side by n parallel black and n-1 parallel white lines contains n² intersections of two black lines (corners of squares), (n-1)² intersections of two white lines (white squares), and 2n(n-1) intersections of a black and white line (sides of squares), for a total of (2n-1)² line intersections or distinct locations.

162 THEORY OF DATA GRAPHICS

Maps routinely present even finer detail. A cartographer writes that "the resolving power of the eye enables it to differentiate to 0.1 mm where provoked to do so. Clearly, therefore, conciseness is of the essence and high resolution graphics are a common denominator of cartography."2 Distinctions at 0.1 mm mean 254 per inch.

How many statistical graphics take advantage of the ability of the eye to detect large amounts of information in small spaces? And how much information should graphics show? Let us begin by considering an empirical measure of graphical performance, the data density.

Data Density in Graphical Practice

The numbers that go into a graphic can be organized into a data matrix of observations by variables. Taking into account the size of the graphic in relation to the amount of data displayed yields the data density:

data density of a graphic = (number of entries in data matrix) / (area of data graphic)

Data matrices and data densities vary enormously in practice.
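As a rough illustration of the definition above, the short Python sketch below computes the data density of three graphics from figures quoted in this chapter. The function and variable names are illustrative assumptions, not part of the original text, and the eight numbers per commune follow the chapter's own estimate (a latitude, a longitude, and perhaps six shape numbers).

```python
# Data density = entries in the data matrix / area of the graphic.
# All names here are hypothetical; the inputs are figures quoted in the chapter.

SQ_CM_PER_SQ_INCH = 2.54 ** 2  # one square inch is about 6.45 square centimeters


def data_density(entries, area_sq_in):
    """Return the numbers shown per square inch of graphic."""
    if area_sq_in <= 0:
        raise ValueError("area must be positive")
    return entries / area_sq_in


def per_sq_cm(density_per_sq_in):
    """Convert a density per square inch to a density per square centimeter."""
    return density_per_sq_in / SQ_CM_PER_SQ_INCH


# Bar chart: 4 entries on 26.5 square inches.
bar_chart = data_density(4, 26.5)

# Map of French communes: 30,000 communes, assumed 8 numbers each, on 27 square inches.
communes_map = data_density(30_000 * 8, 27)

# Map of the galaxies: 2,275,328 rectangles, 3 numbers each, on 61 square inches.
galaxy_map = data_density(2_275_328 * 3, 61)

for name, density in [("bar chart", bar_chart),
                      ("communes map", communes_map),
                      ("galaxy map", galaxy_map)]:
    print(f"{name}: {density:,.2f} per sq in, {per_sq_cm(density):,.2f} per sq cm")
```

The results (about 0.15, 8,900, and 112,000 numbers per square inch) are consistent with the rounded values the text reports for these three graphics. The centimeter figures here use an exact unit conversion, so they can differ slightly from values computed from the rounded areas given in the text.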
At one extreme, this overwrought display (originally printed in five colors) presents a data matrix of four entries, the names and the numbers for the two bars on the right. The left bar is merely the total of the other two. The graph covers 26.5 square inches (171 square centimeters), resulting in a data density of .15 numbers per square inch (.02 numbers per square centimeter), which is thin indeed.

[Figure: bar chart of percent participation, with bars labeled "Total," "In college or university," and "In adult education."]
Executive Office of the President, Office of Management and Budget, Social Indicators, 1973 (Washington, D.C., 1973), p. 86.

2 D. P. Bickmore, "The Relevance of Cartography," in John C. Davis and Michael McCullagh, eds., Display and Analysis of Spatial Data (London, 1975), p. 331.

The exemplar from the JASA style sheet comes in at a lightweight 3.8 numbers per square inch (0.6 numbers per square centimeter) and a small data matrix of 32 entries:

[Figure: exemplar graphic from the JASA style sheet; axis labels illegible in extraction.]

In contrast, the New York weather history, in this reduced version, does very well at 181 numbers per square inch (28 per square centimeter):

[Figure: "New York City's Weather for 1980," including a panel for precipitation in inches.]

An annual sunshine record reports about 1,000 numbers per square inch (160 per square centimeter):

[Figure: annual sunshine record.]
F. J. Monkhouse and H. R. Wilkinson, Maps and Diagrams (London, third edition, 1971), pp. 247–248.

The visual metaphor corresponds appropriately to the data if the image is reversed, so that the light areas are the times when the sun shines:

[Figure: the sunshine record with the image reversed.]

This map (27 square inches, 175 square centimeters) shows the location and boundaries of 30,000 communes of France.
It would require at least 240,000 numbers to recreate the data of the map (30,000 latitudes, 30,000 longitudes, and perhaps six numbers describing the shape of each commune). Thus the data density is nearly 9,000 numbers per square inch, or 1,400 numbers per square centimeter.

[Figure: map of the communes of France.]
Jacques Bertin, Sémiologie Graphique (Paris, second edition, 1973), p. 152.

The new map of the galaxies locates 2,275,328 encoded rectangles on a two-dimensional surface of 61 square inches (390 square centimeters). Each rectangle represents three numbers (two by its location, one by its shading), yielding a data density of 110,000 numbers per square inch or 17,000 numbers per square centimeter. That is the current record.

Data Density and the Size of the Data Matrix: Publication Practices

The table shows the data density and the size of the data matrix for graphics sampled from scientific and news publications. At least 20 graphics from each publication were examined. The table records an enormous diversity of graphical performances both within and between publications. A few data-rich designs appear in nearly every publication.
The opportunity is there but it is rarely exploited: the average published graphic is rather thin, based on about 50 numbers shown at the rate of 10 per square inch.

Data Density and Size of Data Matrix, Statistical Graphics in Selected Publications, Circa 1979–1980

                                                  Data density (numbers       Size of data matrix
                                                  per square inch)
Publication                                        median    min    max       median    min    max
Nature                                                 48      3    362          177     15   3780
Journal of the Royal Statistical Society, B            27      4    115          200     10   1460
Science                                                21      5     44          109     26    316
Wall Street Journal                                    19      3    154          135     28    788
Fortune                                                18      5     31           96     42    156
The Times (London)                                     18      2    122           50     14    440
Journal of the American Statistical Association        17      4    167          150     46   1600
Asahi                                                  13      2    113           29     15    472
New England Journal of Medicine                        12      3    913           84      8   3600
The Economist                                           9      1     51           36      3    192
Le Monde                                                8      1     17           66     11    312
Psychological Bulletin                                  8      1     74           46      8    420
Journal of the American Medical Association             7      1     39           53     14    735
New York Times                                          7      1     13           35      6    580
Business Week                                           6      2     12           32     14     96
Newsweek                                                6      1     13           23      2     96
Annuaire Statistique de la France                       6      1     25           96     12    540
Scientific American                                     5  [gap]     69           46     14    652
Statistical Abstract of the United States               5      2     23           38      8    164
American Political Science Review                       2      1     10           16      9     40
Pravda                                                0.2    0.1      1            5      4     20

(One value in the Scientific American row, apparently the minimum data density, is missing from the extracted text.)

Among the world's newspapers, the Wall Street Journal, The Times (London), and Asahi publish data-rich graphics, with data densities equal to those of the Journal of the American Statistical Association. Most of the American papers and magazines, along with Pravda, publish less data per graphic than the major papers of other industrialized countries.

Very few statistical graphics achieve the information display rates found in maps. Highly detailed maps portray 100,000 to 150,000 bits per square inch. For example, the average U.S.
Geological Survey topographic quadrangle (measuring 17 by 23 inches) is estimated to contain over 100 million bits of information, or about 250,000 per square inch (40,000 per square centimeter).3 Perhaps some day statistical graphics will perform as successfully as maps in carrying information.

High-Information Graphics

Data graphics should often be based on large rather than small data matrices and have a high rather than low data density. More information is better than less information, especially when the marginal costs of handling and interpreting additional information are low, as they are for most graphics. The simple things belong in tables or in the text; graphics can give a sense of large and complex data sets that cannot be managed in any other way. If the graphic becomes overcrowded (although several thousand numbers represented may be just fine), a variety of data-reduction techniques (averaging, clustering, smoothing) can thin the numbers out before plotting.4 Summary graphics can emerge from high-information displays, but there is nowhere to go if we begin with a low-information design.

Data-rich designs give a context and credibility to statistical evidence. Low-information designs are suspect: what is left out, what is hidden, why are we shown so little? High-density graphics help us to compare parts of the data by displaying much information within the view of the eye: we look at one page at a time and the more on the page, the more effective and comparative our eye can be.5 The principle, then, is:

Maximize data density and the size of the data matrix, within reason.

High-information graphics must be designed with special care. As the volume of data increases, data measures must shrink (smaller dots for scatters, thinner lines for busy time-series). The clutter of chartjunk, non-data-ink, and redundant data-ink is even more costly than usual in data-rich designs.

The way to increase data density other than by enlarging the data matrix is to reduce the area of a graphic. The Shrink Principle has wide application:

Graphics can be shrunk way down.

Many data graphics can be reduced in area to half their currently published size with virtually no loss in legibility and information. For example, Bertin's crisp and elegant line allows the display of 17 small-scale graphics on a single page along with extensive text. Repeated application of the Shrink Principle leads to a powerful and effective graphical design, the small multiple.

3 Morris M. Thompson, Maps for America (Washington, D.C., 1979), p. 187.

4 Paul A. Tukey and John W. Tukey, "Summarization: Smoothing; Supplemented Views," in Vic Barnett, ed., Interpreting Multivariate Data (Chichester, England, 1982), ch. 12; and William S. Cleveland, "Robust Locally Weighted Regression and Smoothing Scatterplots," Journal of the American Statistical Association, 74 (1979), 829–836.

5 It is suggested in the analysis of x-ray films to "search a reduced image so that the whole display can be perceived on at least one occasion without large eye movement." Edward Llewellyn Thomas, "Advice to the Searcher or What Do We Tell Them?" in Richard A. Monty and John W. Senders, eds., Eye Movements and Psychological Processes (Hillsdale, N.J., 1976), p. 349.

[Figure: a page from Bertin headed "Problèmes graphiques posés par les chroniques" (graphical problems posed by time series), showing 17 small-scale graphics alongside extensive French text. The French text did not survive extraction.]
Jacques Bertin, Sémiologie Graphique (Paris, second edition, 1973), p. 214.

Small Multiples

Small multiples resemble the frames of a movie: a series of graphics, showing the same combination of variables, indexed by changes in another variable. Twenty-three hours of Los Angeles air pollution are organized into this display, based on a computer-generated video tape. Shown is the hourly average distribution of reactive hydrocarbon emissions. The design remains constant through all the frames, so that attention is devoted entirely to shifts in the data:

[Figure: small multiples of hourly Los Angeles air pollution.]
From video tape by Gregory J. McRae [text lost in extraction]; G. J. McRae, W. R. Goodin, and J. H. Seinfeld, "Development of a Second-Generation Mathematical Model for Urban Air Pollution. I. Model Formulation," Atmospheric Environment, 16 (1982), 679–696.

These grim small multiples show the distribution of occurrence of the cancer melanoma. The sites of 269 primary melanomas are recorded, along with the distribution between men and women. Note the data graphical arithmetic, similar to that of the multiwindow plot.

[Figure captions, translated from German: Fig. 1. Distribution of 269 primary melanomas on the head and neck. Figs. 2 and 3. Breakdown of the melanoma distribution by sex (Fig. 2, women).]
Arthur Wiskemann, "Zur Melanomentstehung durch chronische Lichteinwirkung," Der Hautarzt, 25 (1974), 21.

The effects of sampling errors are shown in these 12 distributions, each based on a sample of 50 random normal deviates:

[Figure: 12 small sample distributions.]
Edmond A. Murphy, "One Cause? Many Causes? The Argument from the Bimodal Distribution," Journal of Chronic Diseases, 17 (1964), 309.

These six distributions show the age composition of herring catches each year from 1908 to 1913. A tremendous number of herring were spawned in 1904, and that class began to dominate the 1908 catch as four-year-olds, then the 1909 catch as five-year-olds, and so on:

[Figure: six age distributions of herring catches, 1908 to 1913.]
Johan Hjort, "Fluctuations in the Great Fisheries of Northern Europe," Rapports et Procès-Verbaux, 20 (1914), in Susan Schlee, The Edge of an Unfamiliar World (New York, 1973), p. 226.

This next design compares a complex set of data: shown are the chromosomes of (from left to right) man, chimpanzee, gorilla, and orangutan. The similarities between humans and the great apes are to be noted.

[Figure: chromosome comparison.]
Jorge J. Yunis and Om Prakash, "The Origin of Man: A Chromosomal Pictorial Legacy," Science, 215 (March 19, 1982), 1527.

And, finally, a visually similar small multiple, the Consumer Reports frequency-of-repair records for automobiles built from 1976 to 1981. This is a particularly ingenious mix of table and graphic, portraying a complex set of comparisons between manufacturers, types of cars, year, and trouble spots.

[Figure: Consumer Reports frequency-of-repair chart. Columns are car models by model year, 76 through 81; rows are trouble spots such as air-conditioning, body, brakes, clutch, driveline, electrical system, engine cooling, engine mechanical, exhaust system, fuel system, ignition system, suspension, and transmission; each cell is a symbol on a five-step scale from much better than average to much worse than average. The individual symbols did not survive extraction.]
Consumer Reports, 47 (April 1982), 199–2[illegible]. Redrawn.

Conclusion

Well-designed small multiples are
- inevitably comparative
- deftly multivariate
- shrunken, high-density graphics
- usually based on a large data matrix
- drawn almost entirely with data-ink
- efficient in interpretation
- often narrative in content, showing shifts in the relationship between variables as the index variable changes (thereby revealing interaction or multiplicative effects).

Small multiples reflect much of the theory of data graphics:

For non-data-ink, less is more.
For data-ink, less is a bore.6

6 The two aphorisms on the meaning of "less" [remainder of footnote lost in extraction].

Corruption in Evidence Presentations: Effects Without Causes, Cherry-Picking, Overreaching, Chartjunk, and the Rage to Conclude

It is a principle that shines impartially on the just and unjust that once you have a point of view all history will back you up.
Van Wyck Brooks, America's Coming-of-Age (New York, 1915), 20.
The rage for wanting to conclude is one of the most deadly and most fruitless manias to befall humanity. Each religion and each philosophy has pretended to have God to itself, to measure the infinite, and to know the recipe for happiness. What arrogance and what nonsense! I see, to the contrary, that the greatest geniuses and the greatest works have never concluded.
Gustave Flaubert, Correspondance (Paris, 1929), vol. V, [page illegible].

[The anthropological idea of "culture" is] so diffuse and all-embracing as to seem like an all-seasons explanation for anything human beings might contrive to do, imagine, say, be, or believe . . . We were condemned, it seemed, to working with a logic and a language in which concept, cause, form, and outcome had the same name.
Clifford Geertz, Available Light (Princeton, 2000), 12–13.

Words strain,
Crack and sometimes break, under the burden,
Under the tension, slip, slide, perish,
Decay with imprecision, will not stay in place,
Will not stay still.
T. S. Eliot, "Burnt Norton," Four Quartets (London, 1943).

First you establish the traditional "two views" of the question. You then put forward a common-sensical justification of the one, only to refute it by the other. Finally, you send them both packing by the use of a third interpretation, in which both the others are shown to be equally unsatisfactory. Certain verbal maneuvers enable you to line up the traditional "antitheses" as complementary aspects of a single reality: form and substance, content and container, appearance and reality, essence and existence, continuity and discontinuity, and so on. Before long the exercise becomes the merest verbalizing, reflection gives place to a kind of superior punning, and the "accomplished philosopher" may be recognized by the ingenuity with which he makes ever-bolder play with assonance, ambiguity, and the use of those words which sound alike and yet bear quite different meanings.
Claude Lévi-Strauss, Tristes Tropiques (Paris, 1955; London, 1961), 54.

MAKING a presentation is a moral act as well as an intellectual activity. The use of corrupt manipulations and blatant rhetorical ploys in a report or presentation (outright lying, flagwaving, personal attacks, setting up phony alternatives, misdirection, jargon-mongering, evading key issues, feigning disinterested objectivity, willful misunderstanding of other points of view) suggests that the presenter lacks both credibility and evidence. To maintain standards of quality, relevance, and integrity for evidence, consumers of presentations should insist that presenters be held intellectually and ethically responsible for what they show and tell. Thus consuming a presentation is also an intellectual and a moral activity.

THESE 2 chapters describe widely used presentation methods that are enemies of the truth, that corrupt reasoning, that produce unbeautiful anti-evidence. Rather than blatant, the methods described are subtle, the more dangerous for being so. These maneuvers often distort evidence, deceive the audience, and exploit the bond of trust necessary for reliable and intelligent communication.

CORRUPT maneuvers, blatant or subtle, are epidemic in political speeches, marketing, internet rants, and PowerPoint pitches. Less often, perhaps, corruptions of reasoning and evidence show up in serious presentations: our examples in these 2 chapters include strategic plans in business, a report of a presidential commission, economics, engineering analysis during a crisis, certain data analysis techniques, public health, medical research, and social science.

DESPITE the threat of corruption, a consumer of presentations should try to be hopeful and curious, avoid premature skepticism, and maintain an open mind but not an empty head. After all, many presentations are not corrupt. Furthermore, a presenter engaging in corrupt maneuvers might be reporting what eventually turn out to be accurate and truthful conclusions. A particular danger, then, of corrupt maneuvers is not only that they enable lying but also that they place the truth in disrepute. From scientific reports to political speeches, few things are more appalling than listening to inept and specious arguments made by one's allies.

142 BEAUTIFUL EVIDENCE

Effects Without Causes and The Evasion of Responsibility

The Report of the 9/11 Commission describes major lapses in security and missed opportunities that may have allowed the attacks to take place:

[Despite the CIA's numerous warnings, America's] domestic agencies never mobilized in response to the threat. They did not have direction, and did not have a plan to institute. The borders were not hardened. Transportation systems were not fortified. Electronic surveillance was not targeted against a domestic threat. State and local law enforcement were not marshaled to augment the FBI's efforts. The public was not warned.1

Here and elsewhere in the Report, conspicuously absent is the agent of inaction. Above, 5 verbs are passive. The 3 active verbs take the utterly vague subject "domestic agencies." Exactly who did not make a plan, who did not follow up, who failed to warn the public? "These things were not done by somebody, and the somebody [text lost in extraction]."2 By means of the passive voice, the 9/11 Commission avoids attributing responsibility for security lapses. Of course, if the passive didn't exist, they would have done it some other way.

Although often a useful writing technique, passive verbs also advance effects without causes, an immaculate conception. To speak of ends without means, agency without agents, actions without actors is contrary to clear thinking. When the issues at hand involve responsibilities or decisions or plans, causal reasoning is necessary.
The logic of decisions is "If we do such-and-such [cause], then we hope this-and-that will happen [desired effect]." And the logic of responsibility is the logic of the active voice: someone did or did not do something. Alert audiences should watch out for causality from nowhere and its sometime assistant, the passive voice.

The technique of evasion by passive voice is well-known and widely noted (for example, in Strunk and White, Elements of Style), yet some reviewers of The 9/11 Commission Report failed to detect its overt evasive deflections of responsibility. What is obvious, and perhaps even tiresome to the sophisticated, in Strunk and White's textbook is not so obvious in serious action. Why should we fail to be vigilant and rigorous about the quality of evidence and its presentation just because a report is part of a public dialogue, or is meant for the news media, or is from the government, or concerns an important matter?

WHAT the passive voice is for verbal reasoning, certain statistical methods are for data reasoning: anti-causal, a jumble of effects without causes. In particular, the techniques of data mining, factor analysis, and multidimensional scaling crunch and grind vast data matrices down into small lumps but don't test causal models. These techniques are perhaps useful for those who have lots of data but no ideas. To be relevant for decisions and actions, whatever emerges from data crunching must somehow turn into evidence about causal processes.

1 The 9/11 Commission Report: Final Report of the National Commission on Terrorist Attacks Upon the United States (New York, 2004), 265.

2 Thomas Powers, "How Bush Got It Wrong," The New York Review of Books, 60 (September 23, 2004), 87.

CORRUPT TECHNIQUES IN EVIDENCE PRESENTATIONS

LIKE the passive voice, the bullet-list format collaborates with evasive [text lost in extraction] without causes, as in the fragmented generic [text lost in extraction] plans and the dreaded mission statement:

* Accelerate The Introduction Of New Products!!!
* Accelerate Revenue Recognition!!!

Better to say who will accelerate, and what, how, when, and where they will accelerate. An effective methodology for making such statements is the sentence, with subjects and predicates, nouns and verbs, agents and their effects. Identifying specific agents of action may also eventually help forensic accountants and prosecutors target those responsible for excessively accelerated recognition of revenue.

In presentations of plans, schemes, and strategies, bullet outlines get all confused even about simple, one-way causal models. Here, from the Harvard Business Review, a deep analysis of bullets for business plans:

Bullets leave critical assumptions about how the business works unstated. Consider these major objectives from a standard five-year strategic plan:

* Increase market share by 25%.
* Increase profits by 30%.
* Increase new-product introductions to ten a year.

Implicit in this plan is a complex but unexplained vision of the organization, the market, and the customer. However, we cannot extrapolate that vision from the bullet list. The plan does not tell us how these objectives tie together and, in fact, many radically different strategies could be represented by these three simple points. Does improved marketing increase market share, which results in increased profits (perhaps from economies of scale), thus providing funds for increased new-product development?

Market share → Profits → New-product development

Or maybe new-product development will result in both increased profits and market share at once:

New-product development → Market share
New-product development → Profits

Alternatively, perhaps windfall profits will let us just buy market share by stepping up advertising and new-product development:

Profits → New-product development → Market share.3

It follows that more complex and realistic multivariate causal models are way over the head of the simplistic bullet-list format.
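The three readings of the bullet list quoted above can be written down as small directed graphs, which makes their differences explicit. The Python sketch below is an illustration only: the model labels and the helper `root_causes` are hypothetical names introduced here, not taken from the Harvard Business Review article.

```python
# Each candidate causal model is a set of directed edges (cause, effect).
# The three variable names come from the quoted bullet list; the model
# labels and the helper function are hypothetical, added for illustration.

MARKET = "market share"
PROFITS = "profits"
PRODUCTS = "new-product development"

MODELS = {
    "marketing first": {(MARKET, PROFITS), (PROFITS, PRODUCTS)},
    "products first": {(PRODUCTS, PROFITS), (PRODUCTS, MARKET)},
    "windfall profits first": {(PROFITS, PRODUCTS), (PRODUCTS, MARKET)},
}


def root_causes(edges):
    """Return the variables that cause something but are caused by nothing."""
    causes = {cause for cause, _ in edges}
    effects = {effect for _, effect in edges}
    return causes - effects


for label, edges in MODELS.items():
    print(f"{label}: starts from {sorted(root_causes(edges))}")
```

Each model starts from a different variable, yet all three are compatible with the same three bullets, which is the point of the quoted passage.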
Consumers of presentations should sketch out the causal models lurking within the analysis, assess how the evidence links up to the theories, draw diagrams with arrows, label alleged causes and effects, contemplate the meaning of arrows, and do what the presenter should have done in the first place.

3 Gordon Shaw, Robert Brown, Philip Bromiley, "Strategic Stories: How 3M is Rewriting Business Planning," Harvard Business Review, 76 (May–June, 1998), 44.

Cherry-Picking, Evidence Selection, Culled Data

THE most widespread and serious obstacle to learning the truth from an evidence-based report is cherry-picking, as presenters pick and choose, select and reveal only the evidence that advances their point of view. "It is a principle that shines impartially on the just and unjust that once you have a point of view all history will back you up."4

Few presenters are saintly enough to provide their audience with competing explanations, contrary evidence, or an accounting of the full pool of evidence tapped to construct the presentation. But thoughtful presenters might at least privately test their analysis: Has evidence been filtered or culled in a biased manner? Do the findings grow from evidence or from evidence selection? Would the findings survive the scrutiny of a skeptic or investigator of research fraud? What would Richard Feynman think? Such questions may help presenters get it right before enthusiasm for their own work takes over. To avoid being fooled, consumers of presentations must ask these questions as well.

It is idle, however, for skeptics to claim that the evidence presented in a report has been selected. Of course there is more evidence than that published. The key issue is whether evidence selection has compromised the true account (if we only knew what it was) of the underlying data.
A CLEAR sign of cherry-picking is that a report appears too good to be true, provoking consumers of the report to mutter “It’s more complicated than that.” But unless the back-office operations in preparing a report are known, it is difficult to identify the specific effects of evidence selection for any single report. A series of reports, however, can decisively reveal corrupt practices. It is perhaps merely a happy coincidence when the reported quarterly earnings for one corporation exceed its forecasted earnings by exactly one penny; but if this miracle occurs in 20% of all corporations, it is a sign of systematic manipulation of financial data.5

In medical research, too often the first published study testing a new treatment provides the strongest evidence that will ever be found for that treatment. As better controlled studies—less vulnerable to the enthusiasms of researchers and their sponsors—are then conducted, the treatment’s reported efficacy declines. Years after the initial study, as the Evidence Decay Cycle plays out, sometimes the only remaining issue is whether the treatment is in fact harmful. Lack of disciplined comparisons, which gives free play to cherry-picking, erodes the credibility of early reports of enthusiasts, no matter how authoritative their sales pitch:

One day when I was a junior medical student, a very important Boston surgeon visited the school and delivered a great treatise on a large number of patients who had undergone successful operations for vascular reconstruction. At the end of the lecture, a young student at the back of the room timidly asked, “Do you have any controls?” Well, the great surgeon drew himself up to his full height, hit the desk, and said, “Do you mean did I not operate on half of the patients?” The hall grew very quiet then. The voice at the back of the room very hesitantly replied, “Yes, that’s what I had in mind.” Then the visitor’s fist really came down as he thundered, “Of course not. That would have doomed half of them to their death.” God, it was quiet then, and one could scarcely hear the small voice ask, “Which half?”6

4 Van Wyck Brooks, America’s Coming-of-Age (New York, 1915), 20. Or, as Nero Wolfe remarks, “Once the fabric is woven it may be embellished at will.” Rex Stout, The Golden Spiders (New York, 1955), 85.

5 The distribution of published test statistics over a series of papers may reveal culling of evidence. Consider a research issue between those who believe there is no relationship between 2 variables (call this hypothesis H0) and those who believe there is a relationship (HA). Advocates of H0 and HA gather data, run many multiple regressions, and examine the t-statistics on the relevant regression coefficients. Both schools of advocates seek to publish decisive results; and both seek to avoid ambiguous results. Such advocacy may lead to evidence selection, resulting in a peculiar distribution of all published test statistics: a heaping of results that strongly favor either H0 or HA, and fewer results in the dreaded Zone of Boredom, Ambiguity, and Unpublishability, the zone of non-results. For t-tests on regression coefficients, the ZBAU values fall between 1.6 and 2.0, where the choice between H0 and HA is a close call. For my book Political Control of the Economy (Princeton, 1978), I compiled the distribution of the 248 published t-statistics from all 17 of the then-published studies of election year macroeconomic conditions and the U.S. national vote for the political party of the incumbent president. Both the H0 and the HA heapings, as well as the Zone of Boredom, Ambiguity, and Unpublishability, are seen below in the distribution of published t-statistics:

[Figure: distribution of the values of published t-statistics for 248 regression coefficients in all 17 published studies of economic conditions and election outcomes]

CORRUPT TECHNIQUES IN EVIDENCE PRESENTATIONS

THOMAS CHALMERS, a founder of evidence-based medicine, repeatedly demonstrated that the more susceptible a research design is to evidence selection and bias, the more enthusiastic the evidence becomes for favored treatments. For example, Chalmers and colleagues examined 53 published reports evaluating a surgical procedure, a portcaval shunt for esophageal bleeding.7 All studies were rated on (1) enthusiasm of the findings for the surgery, (2) quality of the research design (good design = random assignment of patients to treatment or control groups; bad = treatment group not compared with any proper control). The gold standard of research designs is the randomized controlled trial (RCT), which assigns patients randomly to the treatment or the control group (assuring within chance limits that both groups are identical in all respects, known and unknown, thereby avoiding, for example, selection of more promising patients to favored treatments). Of the 53 published studies, only 6 were well-designed (RCT). And their findings were clear: none of the 6 well-designed studies were markedly enthusiastic about the operation:

Quality of research design versus degree of investigator enthusiasm for the portcaval shunt surgical procedure, 53 published studies

                                            MARKED       MODERATE     NO
                                            ENTHUSIASM   ENTHUSIASM   ENTHUSIASM
RESULTS OF 6 WELL-DESIGNED (RCT) STUDIES:    0            3            3
RESULTS OF 47 POORLY DESIGNED STUDIES:       34           10           3

In contrast, for 47 studies lacking valid controls, 34 expressed marked enthusiasm for the surgery. Thus 72% (34 of 47) of the poorly controlled studies got it wrong, endorsing a surgical procedure unwarranted by the RCT gold standard. This link between lousy research designs and wrongly enthusiastic reports has been replicated again and again for all too many drugs and surgical procedures (some eventually abandoned thanks to the meta-analysis of evidence similar to the table above). Loosely designed studies allow the underlying medical reality to be filtered and cherry-picked so as to reliably produce unreliable evidence for favored treatments. Discovery of this persistent bias led to regulatory and scientific standards requiring that research on medical treatments use randomized controlled trials. These requirements, alas, have in turn fostered imaginative new methods for cherry-picking the data of medical research in favor of good news.8

6 Dr. E. E. Peacock, Jr., University of Arizona College of Medicine; quoted in Medical World News (September 1, 1972), 45.

7 N. D. Grace, H. Muench, T. C. Chalmers, “The Present Status of Shunts for Portal Hypertension in Cirrhosis,” Gastroenterology 50 (1966), 684-691; table shown here as updated by Chalmers in John P. Gilbert, Richard J. Light, and Frederick Mosteller, “How Well Do Social Innovations Work?” in Judith Tanur, et al., Statistics: A Guide to the Unknown (San Francisco, 1978), 135.

8 An-Wen Chan, Asbjørn Hróbjartsson, Mette T. Haahr, Peter C. Gøtzsche, Douglas G. Altman, “Empirical Evidence for Selective Reporting of Outcomes in Randomized Trials: Comparison of Protocols to Published Articles,” Journal of the American Medical Association, 291 (May 26, 2004), 2457-2465.

Controlled trials are prospective: a possible cause-effect relationship is identified, an intervention made, future outcomes observed. In contrast, evidence for many phenomena—weather, politics, geology, economics, art history, business—largely comes from retrospective, nonexperimental observation.
In such after-the-fact studies, researchers and presenters have a good many opportunities to decide what counts as relevant evidence, which are also terrific opportunities for cherry-picking. Consider a standard methodology in economics, finance, and political economy: explanations are empirically generated or perhaps evaluated by means of fitted models. Such multiple regression analyses of historical data seek to account for the past and to predict the future. In theory, perfectly reasonable; in practice, these efforts may be compromised by a certain slackness of theory and method:

(1) Imprecise theories. Some theories suggest what variables might be related. But the theories tend to be vague and broad, hinting at perhaps 5 to 10 relevant effects, and 5 to 100 candidate causal variables.

(2) Many “notions.” Researchers hold a variety of sub-theoretical ideas, or notions, that are employed in the course of data analysis: trying out variables not explicitly justified by the theory; excluding part of the data or mixing in ad hoc dummy variables to re-estimate the model; taking logarithms and other transforms; fitting innumerable lag structures to time-series. Researchers may have 5 to 100 notion options available.

(3) Many different operational measures for the same concept. Consider the variety of plausible empirical measures of economic growth, social status, cultural norms, educational achievement, political competition.

(4) Data slack. In conducting data analysis, researchers decide many details: treatment of missing data, reconciliation of discrepant sources, construction of classifications, choice of the beginning/ending points in time-series (a notorious cheat in financial data), choice of category cut-points, and so on. Not all these decisions are necessarily made innocent of the favored result, and a few small tilts in the same direction soon add up to a finding.9

All told, many plausible models result.
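The scale of this slack can be put in rough numbers. The sketch below assumes the simplest case, in which each of k candidate explanatory variables is either included in or left out of a fitted model, so that there are 2^k - 1 non-empty variable subsets. The function names, the toy variable list, and the multipliers for notions, measures, and data-handling choices are hypothetical, chosen only for illustration and not taken from any study.

```python
from itertools import combinations


def count_variable_subsets(k: int) -> int:
    """Number of non-empty subsets of k candidate explanatory variables."""
    return 2 ** k - 1


def enumerate_variable_subsets(variables):
    """Yield every non-empty subset; practical only for small k."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)


def count_candidate_models(k, notion_options=1, measure_options=1, data_options=1):
    """Multiply the subset count by assumed (hypothetical) numbers of other choices."""
    return count_variable_subsets(k) * notion_options * measure_options * data_options


if __name__ == "__main__":
    # Brute-force check of the formula on a tiny, made-up variable list.
    toy_variables = ["income", "inflation", "unemployment"]
    assert len(list(enumerate_variable_subsets(toy_variables))) == count_variable_subsets(3)

    # Illustrative assumptions only: 10 variables, 10 notions, 3 measures, 4 data choices.
    print(count_variable_subsets(10))            # 1023
    print(count_candidate_models(10, 10, 3, 4))  # 122760
```

Even these modest assumed counts put more than a hundred thousand fitted models within reach, which is why a single reported model says little about how it was chosen.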
For $k$ explanatory variables, there are $2^k - 1$ possible fitted models, then multiplied by notions and on through the rest of the slack. Routinely $10^4$ to $10^7$ fitted models are available; all can be quickly computed and sorted over. These models are not independent and many look pretty much alike, but which few to publish from millions of possibilities? This latitude for evidence selection makes it difficult to distinguish between reliable findings and cherry-pickings. Now and then it may matter.

Such model-searching might find something new and true. Or merely something brittle and over-fitted, a model that collapses when used for prediction.10 Found models (possibly from among millions searched) must be replicated afresh on innocent data—for how can models produced by massive searches then be confirmed by reference to the same material? This circularity of stale testing of cherry-picked data-generated models leaves researchers and their findings no more secure than Ignorance’s justification for hope in The Pilgrim’s Progress:

Ignorance: But my heart and life agree together, and therefore my hope is well grounded.
Christian: Who told thee that thy heart and life agree together?
Ignorance: My heart tells me so.11

Credible explanations grow from the combined testimony of 3 more or less independent, mutually reinforcing sources—explanatory theory, empirical evidence, and rejection of competing alternative explanations. Cherry-picking dilutes and confounds these 3 sources into the wishful circular thinking and just-so stories of Ignorance.

9 “Exercising the right of occasional suppression and slight modification, it is truly absurd to see how plastic a limited number of observations become in the hands of men with preconceived ideas.” Francis Galton, Meteorographica (London, 1863), 5.

10 C. M. Bishop’s diagram mocks a fussy over-fitted model (the red line) wandering around the data space picking up every little piece of stray variation. Built on the quicksand of idiosyncratic and random variation rather than the rock of predictive causal theory, over-fitted models shrink when applied to new data. Indeed, the decline in explained variance for a found model compared with that model applied to new data is appropriately called, in the jargon of model-building, “shrinkage.”

11 John Bunyan, The Pilgrim’s Progress (1678), chapter 15.

BETWEEN the initial data collection and the final published report falls the shadow of the evidence reduction, construction, and representation process: data are selected, sorted, edited, summarized, massaged, and arranged into published graphs, diagrams, images, charts, tables, numbers, words. In this sequence of representation, a report represents some data which represents the physical world:

raw data: observations, measurements → evidence reduction, construction, and representation → the report or presentation: findings represented by graphs, tables, diagrams, images, numbers, words

This process of evidence construction and representation, though not a black box but certainly a gray area, consists of all the decisions that cause the published findings of a report. These decisions are made, to varying degrees, both in the spirit of doing analytical detective work to discover what is going on and in the spirit of advancing a favored point of view. Thus the integrity of a report depends in part on the integrity of the process of evidence construction; alert consumers of a report must seek some kind of assurance that the process was sensible and honest.12

Given the persistent threat of cherry-picking and aggressive advocacy, consumers of reports and presentations might well ask:
Do the report’s findings grow from the evidence or from the process of evidence construction?
Would that process survive the scrutiny of a research audit?
Does the presenter have a reputation for cherry-picking?
Is the particular field of inquiry notorious for advocacy and evidence corruption (investment analysis, land development, new drug research, sales reports)?
Are the findings too good to be true?
Have the report’s findings been independently replicated?
How much does the decision to be made depend on the evidence in the report at hand?
Who paid for the work?

12 John Ioannidis, “Why Most Published Research Findings Are False,” Public Library of Science, 2 (August 2005), e124, a paper with a distinctly provocative title, suggests that in medical research: “A research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance.” Ioannidis’s collection of references provides a summary of the considerable evidence on evidence corruption.

Panning, Overreaching, and Economisting

The book Painting Outside the Lines: Patterns of Creativity in Modern Art by David Galenson begins with an intellectual history of the work, disarmingly written in the first-person singular with 35 self-references in the first 2 pages. At right, the second page of the introduction. In the first paragraph here, value clearly refers not to merits but to prices of paintings, as art dealers, market transactions, and results of art auctions are mentioned. Described is a 2-variable study: the price of a painting in relation to the age of the artist when the painting was made. Fair enough, but why should anyone care about this correlation? Thus the key problem of the preface: can what might appear to be dustbowl empiricism be parlayed into an interesting book? And, if interesting, true?
Subtle shifts in language broaden the study’s conceptual scope, although the underlying evidence remains unchanged. The word value migrates into the comparative, as high-priced paintings are described as the “most valuable work” of an artist. This comes close to a pun, since the dictionary meanings of valuable are (1) monetary worth and (2) meritorious, admirable, esteemed qualities. The paragraph’s final sentence completes the punning equation, as a rhetorical maneuver turns auction prices into “importance of artistic work” via the intermediate term of “most valuable work.” The intermediate link “most valuable work” now blithely converts auction prices into a measure of “best work” of an artist.

Views on this matter differ. “The present commercialization of the art world, at its top end, is a cultural obscenity,” said Robert Hughes, an expert on art and art markets. “When you have the super-rich paying for an immature Rose Period Picasso $104 million [in 2004], close to the GNP of some Caribbean or African states, something is rotten. Such gestures do no honor to art; they debase it making the desire for it pathological.”

Can rarely-auctioned paintings be understood as commodities? Perhaps the auction price for a painting also reflects previous prices (statistically adjusted for inflation), trends in economic conditions, varying fashions for artists, dealer and auction-house marketing, commission price-fixing, changes in the number of customers for pricey art, provenance of paintings, and gullibility of art-collecting nouveau riche adjusted for trendy. Variations in auction prices result in part from distinctive characteristics of art markets (see, for example, 2 recent books: Meryle Secrest, Duveen: A Life in Art and Christopher Mason, The Art of the Steal: Inside the Sotheby’s-Christie’s Auction House Scandal).

But wait: the book is also about the life cycle of artistic productivity, measured in part by . . . auction prices.
There’s more: auction prices help explain how artists think and work. Just imagine that. How does today’s price at Christie’s or Sotheby’s explain how Cézanne “planned and executed his work” 120 years ago? Backward run the inferences until reels the mind.

Since my college course I had [remained] interested in modern art. In the spring of 1997, in talking with [several art] dealers, I was intrigued to learn that the value of a particular contemporary artist’s work had declined over the course of his career. Wondering how common this was, I realized that I could find out systematically, in much the way I had measured age effects in my earlier work: I could use market transactions—the results of recent auctions—to estimate the relationship between the value of an artist’s paintings and the artist’s age at the time of [their] execution. During the following summer I collected the appropriate data and made these estimates for the most prominent American painters of [two] generations, the Abstract Expressionists and their successors.

The results were startling. [The most] valuable work of Jackson Pollock, Mark Rothko, and the other Abstract Expressionists was almost invariably done late in their careers, but just the opposite was true for Jasper Johns, Robert Rauschenberg, and the other [leading] painters of the next generation. The emphasis of my college course on the importance of the early work of Johns and Stella had therefore not been [an] accident, for the leading artists of their generation almost all produced their most valuable work at early ages.

Curious whether these results [were] unique to New York in the 1950s and 1960s, I then made a similar study [of the] careers of the great French painters who dominated the first century of [modern] art. To my surprise, I again found evidence of a shift over time. Modern painters in France born before 1850, including Manet and Cézanne, normally produced their most valuable work late in their careers, but the leading artists [of] the generations that followed, from Gauguin and van Gogh through Picasso and Braque, typically did their best work when they were much younger.

This book presents the results [of these] two studies of the relationship between age and productivity, together with my interpretation of the causes of those results and consideration of some [of] their consequences. In doing this research, I found the same excitement [in] identifying and explaining systematic patterns in the history of art that I have always found in doing similar work on problems in economic and social [history]. I was amazed how often the results of quantitative analysis, whether [of] auction prices or textbook illustrations, could lead to accurate predictions about how individual painters conceived of their enterprise as artists, and even about how they went about planning and executing their work. But I was also disappointed to discover how completely art historians have neglected quantitative approaches to their discipline. Such work can offer new insights into the history of modern art, and in so doing adds another dimension to existing work based on traditional approaches.

Above, David Galenson, Painting Outside the Lines: Patterns of Creativity in Modern Art (Cambridge, Massachusetts, 2001), xiv. Quotation at left: Robert Hughes, “A Bastion Against Cultural Obscenity,” The Guardian, June 3, 2004. Quotations at right: David Hackett Fischer, Historians’ Fallacies (New York, 1970), 274; Clifford Geertz, Available Light: Anthropological Reflections on Philosophical Topics (Princeton, 2000), 12-13.

Fischer’s Historians’ Fallacies describes the punning multiplicity of meaning as “the fallacy of equivocation . . .
whenever a term is used in two or more senses within a single argument, so that a conclusion appears to follow when in fact it does not.” Auction prices allegedly carry enormous information, relevant one way or another to all the following: (1) the most valuable, meaning both price and merit, work of an artist, (2) historical importance, (3) best artistic work, (4) artistic productivity, (5) creativity, (6) how artists conceive their works, (7) how artists paint their works. Consider a thought experiment: Do our theories about (1), (2), (3), (4), (5), (6), and (7) change when new data (say, in 2010-2020) for auction prices become available? If yes, how and why should art history be rewritten? If no, what exactly is the relevance of auction prices for understanding “patterns of creativity in modern art”?

Concepts that explain everything explain nothing. Such concepts deny distinctions between cause and effect, theory and evidence, explanation and correlation. Here the explanatory meaning of “price” in economic history is as mushy as the meaning of “culture” in anthropology, which Clifford Geertz devastatingly describes as “so diffuse and all-embracing as to seem like an all-seasons explanation for anything human beings might contrive to do, imagine, say, be, or believe. . . . We were condemned, it seemed, to working with a logic and a language in which concept, cause, form, and outcome had the same name.”

The artist’s age at the time a painting was painted, the other key variable, also has a multiplicity of meanings. When Picasso was 26, it was also 1907, a magical year, the beginning of Cubism. Thus an artist’s age is confounded with the year the painting was created and everything going on during that moment in art. Thus for some artists at some times, age is a proxy (statistical jargon for “pun”) for art history.

At least Professor Galenson is satisfied.
“Systematic” analysis has produced “startling” findings, which create “surprise,” “excitement,” “new insights.” He is “amazed how often [his] results lead to accurate predictions” although “also disappointed how completely art historians have neglected” his approach. The problem here is not the self-congratulation, but rather that self-reported self-astonishment is presented as evidence for the credibility of one’s own research.

Slippery language, stupendous conclusions. This syndrome of overreaching is economically described by a new word: economisting, with accents on con and mist:

economisting (e kon'o mist' ing) 1. The act or process of converting limited evidence into grand claims by means of punning, multiplicity of meaning, and overreaching. 2. The belief or practice that empirical evidence can only confirm and never disconfirm a favored theory. 3. Conclusions that are theory-driven, not evidence-based. See also confirmation bias, painting with a broad brush, Iraqi weapons of mass destruction, marketing, post-modern critical theory, German meaning of “mist”.

[Figures: estimated age-price profile for Paul Cézanne (1839-1906) and estimated age-price profile for Pablo Picasso (1881-1973); vertical axis Ln(Price), horizontal axis Age]

The puns in Painting Outside the Lines nearly make some empirically testable claims about the extraordinary explanatory powers of auction prices. What then is provided in the way of evidence about artists, paintings, and prices? For one thing, the book presents 15 data tables with 2,029 entries (artists’ birthdates, deathdates, ages, and frequency of appearances in exhibitions and art history textbooks)—but not a single auction price or price index or anything else that might measure an economic transaction for a particular painting. Likewise for the text in the 265-page book: no prices.
The book’s 2 graphs show a vertical axis, Ln(Price) = the natural logarithm of prices, along with age/price curves for Cézanne and Picasso. These 2 tidy curves, without any actual data points, are the only quantitative evidence about auction prices in the book. Readers are not even told the number of paintings plotted graphically. The Cézanne age/price curve shown above is consistent with all sorts of underlying data:

[Figure: examples of underlying data consistent with the curve]

What information is necessary to assess the credibility of the book’s 2 graphs? Here is a checklist, familiar to students of research methods:
Number (n) of paintings in the analysis.
The data matrix (auction price and age of artist, for the n paintings).
Equations of the fitted models, and plotted curves accompanied by the data.
Quality of fit of the estimated models.
The substantive meaning, quantitatively expressed, of the estimated models.13

Thoroughly dequantified, Painting Outside the Lines provides none of this information. Elementary standards of statistical evidence are not met by the book’s notable publisher (Harvard University Press) or the notable author (Professor of Economics, University of Chicago). The economisting puns are unsupported by the economisting evidence.

David Galenson, Painting Outside the Lines: Patterns of Creativity in Modern Art (Cambridge, Massachusetts, 2001), 15, figures 2.1 and 2.2.

13 The 2-dimensional space for these graphs can yield intriguing quantitative interpretations, a point missed by the fitted polynomial curves at left (which use $t^0, t^1, t^2, t^3, t^4, t^5$ terms to fit auction prices). Instead, consider an exponential model, where $p$ = auction price and $t$ = age of the artist:
$$p = a e^{bt}$$
Taking natural logarithms and then letting $c = \log_e a$ allows the model to be estimated by ordinary least squares (OLS). Conveniently, this is also the 2-space of the graphs above:
$$\log_e p = c + bt$$
The regression coefficient, $b$, has a useful substantive interpretation: in the model $p = a e^{bt}$, $b \times 100$ is equal to the percent increase in $p$ per unit increase in $t$, if $b$ is small (say, less than 0.25). The proportional increase in $p$, per unit increase in $t$, is
$$\frac{\Delta p / \Delta t}{p_1} = \frac{p_2 - p_1}{p_1} \quad (\text{since } \Delta t = t_2 - t_1 = 1)$$
$$= \frac{a e^{b t_2} - a e^{b t_1}}{a e^{b t_1}} = e^{b} - 1$$
$$= \left(1 + b + \tfrac{1}{2!} b^2 + \tfrac{1}{3!} b^3 + \cdots\right) - 1 \quad \text{by the series expansion of } e^{b}.$$
If $b$ is small, then higher-order terms can be dropped:
$$\approx (1 + b) - 1 = b,$$
the OLS slope in the log price by age space. Another possibility is to estimate elasticities by taking the logarithm of both variables. As insiders know, however, the resulting $R^2$ will be much smaller compared to letting 6 polynomial age terms bend and wander around to fit auction prices. What theory of markets—other than barefoot empiricism—specifies such a model? Higher powers of an artist’s age make no substantive sense; at a vigorous 80, Picasso’s age to the fifth is a stupefying 3,276,800,000. What then can the regression coefficient on $t^5$ mean: “A 100,000,000 quintic-year change in artist age explains a $\beta_5$ change in $\log_e$ prices”?!

IN REPORTS on quantitative work, frequent puns involve the language of mathematical statistics: significance, confidence, maximum likelihood, bias, standard errors, optimal. (Recall the “strict consensus of parsimonious trees with optimized activity patterns” in our earlier example of fitted cladograms.) In statistics, these words have clearly defined technical meanings. The everyday meanings of these words, however, resonate far beyond their narrow technical usage. Statistical tests against the null hypothesis allow some researchers to make punning claims about the significance (everyday meaning) of their findings.
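The footnote's rule of thumb, that for the exponential model p = a·e^(bt) the quantity 100·b approximates the percent increase in p per unit increase in t, can be checked numerically. The sketch below compares b with the exact proportional increase e^b - 1 and reproduces the age-to-the-fifth arithmetic. The function names and the sample values of b are illustrative assumptions, not values estimated from any auction data.

```python
import math


def exact_proportional_increase(b: float) -> float:
    """Exact proportional increase in p per unit increase in t when p = a * exp(b * t)."""
    return math.expm1(b)  # e**b - 1, computed accurately for small b


def approximation_error(b: float) -> float:
    """Gap between the exact increase and the first-order approximation b."""
    return exact_proportional_increase(b) - b


def age_to_the_fifth(age: int) -> int:
    """The quintic age term used by a fifth-degree polynomial in age."""
    return age ** 5


if __name__ == "__main__":
    for b in (0.01, 0.05, 0.10, 0.25, 0.50):  # illustrative slopes only
        exact = exact_proportional_increase(b)
        print(f"b = {b:.2f}: approx {100 * b:.1f}%, exact {100 * exact:.1f}%")

    print(age_to_the_fifth(80))  # 3276800000
```

The gap widens as b grows: at b = 0.25 the exact increase is about 28.4% against an approximate 25%, which is consistent with the footnote's restriction to small b.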
Statistical significance (technical meaning) derives from the ridiculousness of the null hypothesis, sample size, fateful and usually false assumptions about independence of observations, assumptions about the sampling distribution under the null hypothesis, and, yes, size of the effect. Thoughtful researchers will report content-relevant measures of effects and not confound those measures with tests against the null hypothesis. Insiders already know this, but the punning problems persist in workaday research practice. Such puns identify mediocrities.

Puns from microeconomics and mathematical statistics—and, for that matter, quantum mechanics, evolutionary theory, fractals, chaos theory—claim unmerited credibility by trading on the authority and sometimes the jargon of the original narrow technical achievement. Puns enable overreaching, as previously bright ideas sprawl, grow mushy, and collapse into vague metaphors when extended outside their original domain. Steven Weinberg describes the breakdown in logic when once-precise concepts overreach:

Quantum mechanics has been variously cited as giving support to mysticism, or free will, or the decline of quantitative rationality. Now, I would agree that anyone is entitled to draw any inspiration he or she can from quantum mechanics, or from anything else. This is what I meant when I wrote that I had nothing to say against the use of science as metaphor. But there is a difference between inspiration and implication, and in talking of the “telling cultural implications” of quantum mechanics, Professor Levine may be confusing the two. There is simply no way that any cultural consequences can be implied by quantum mechanics.
It is true that quantum mechanics does “apply always and everywhere,” but what applies is not a proverb about diverse points of view but a precise mathematical formalism, which among other things tells us that the difference between the predictions of quantum mechanics and pre-quantum classical mechanics, which is so important for the behavior of atoms, becomes negligible at the scale of human affairs.14

WHEN a precise, narrowly focused technical idea becomes metaphor and sprawls globally, its credibility must be earned afresh locally by means of specific evidence demonstrating the relevance and explanatory power of the idea in its new application. It is not enough for presenters to make ever-bolder puns, as meaning drifts into duplicity. Something has to be explained.

14 Steven Weinberg, Facing Up: Science and Its Cultural Adversaries (Cambridge, Massachusetts, 2001), 156-157.

Chartjunk: Content-Free Stuff Replaces Evidence

For consumers of presentations, gratuitous and cartoonish decoration of statistical graphics provides evidence about the presenter’s integrity and statistical skills: little integrity, no statistical skills. Imagine the quality of analysis behind this chart: a fanciful 3-fold change in revenue growth is depicted by a 7-fold change in bar area and an immense change in the apparent volume of the guy in the track suit. And who would believe in revenue growth projections for some 4 years?

For cynical or malicious presenters, chartjunk decoration reflects their contempt for evidence and for their audience. Chartjunk flows from the premise that audiences can be charmed, distracted, or fooled by means of content-free misdirection: garish colors, designer colors, corny clip-art, generic decoration, phony dimensionality. For decoration, it sure is ugly.
Audience members at a presentation featuring chartjunk rather than evidence should ask themselves “Is this the quality of analysis that we are relying on to understand a problem or to make a decision? Why should we trust this presenter? Just how high can the presenter count? Does the presenter think we’re fools? Why are we having this meeting?”

[Figure: bar chart titled “Revenue Growth Forecast!”, with bars for Year 1, Year 2, and later years]

Microsoft Excel and PowerPoint produce, ineptly, many of the data graphics and tables used in presentations today. Filled with chartjunk, the default graph templates in these programs are useful for constructing deceptive investment and weight-loss pitches. Excel chartjunk can sometimes be finessed by skilled users; PowerPoint graph templates are broken beyond repair. For preparing data presentations other than ads in tabloid newspapers, a professional statistical graphics program is essential.

ALONG with the chartjunk of garish decoration, there is also the chartjunk of graph bureaucracy: useless or optically active grids, boxes and frames around graphs, redundant representations of data, cross-hatched bars. In this spreadsheet assessing the danger to the Columbia space shuttle, the most prominent visual activities are the vast empty framing areas and the grid prisons that surround the unexplained and unreadable numbers. Very little chartjunk appears in the sports, weather, and financial tables in newspapers, or in the tables and graphs published in major scientific journals—since the content is too important and too complex for fooling around with chartjunk.15

[Figure: spreadsheet slide, “Results of Impact Analysis for particle size = 20" x 10" x 6"”, STS-107 debris impacting Orbiter wing; the tabulated numbers are not legible]

15 On chartjunk, see Edward Tufte, The Visual Display of Quantitative Information (Cheshire, Connecticut, 1983, 2001), chapter 5.

When Evidence is Mediated and Marketed: Pitching Out Corrupts Within

EVIDENCE-BASED reports are repackaged and marketed by bureaucracies of secondary presentations: public relations, advertising, programs for public outreach, schoolbook publishing, journalism, and the vast government Ministries of Propaganda. Soon enough, tertiary presentations pitch recaps of opinions about a summary of some evidence somewhere. In repackagings, a persistent rage to conclude denies the implications, complexities, and uncertainties of primary evidence.16 A strong selection bias typically operates: news wins out over olds, recency rather than the quality of evidence decides the relevance of evidence.

Secondary re-presentations have much larger audiences than primary reports. Rarely read, the original report blurs and fades away. A handful of technical reviewers might examine the complete set of evidence for a new drug; possibly 500 people read medical journal articles about the drug and 5,000 peruse medical abstracts; 500,000 might see news reports and millions advertisements for the drug, as the original evidence has ultimately passed through 3 or 4 repackagings on the way to market. Or consider textbooks: a successful college text is assigned to 200,000 students, some of whom read it; the primary works summarized by the textbook are read by a few researchers. For reports of government commissions, for each reader of the original, there are perhaps 100,000 readers of mediated secondary versions. Sometimes, actually reading a government report constitutes original research.
Substantial resources are devoted to repackaging: pharmaceutical companies spend more on marketing than on drug research, schoolbook publishers more on lobbying textbook selection committees than on writing books, financial firms more on promoting investment products than on discovering them. In repackaging and commodifying evidence-based material for wider distribution, the bureaucracies of secondary presentation get their whack at the primary report: they edit, clarify, interpret, summarize, simplify, over-simplify, spin, tart up, mess up. And they make errors. If you worry about evidence corruption in primary reports, secondary presentations will give you a lot more to worry about.

Repackaging adds its own special interpretative filter to the critical process of learning from evidence:

raw data: observations, measurements
-> evidence reduction, construction, and representation
-> the primary report: findings represented by graphs, tables, diagrams, images, numbers, words
-> reports produced by the bureaucracies of secondary and tertiary presentations
<- corrupting feedback, as bureaucracies of presentation undermine the integrity of evidence and analysis

Secondary bureaucracies of presentation may lack the technical skills and substantive knowledge to detect their mistakes. In the sausage-making, chop-shop production of many secondary and tertiary presentations, absent are methods that routinely help enforce the intellectual quality and integrity in primary work: external review and final approval by content experts, professional standards of evidence, skeptical intelligence. And then, to impede direct communication between originators and audiences, bureaucracies of secondary presentation may limit access to the primary report through copyrights, inconvenient or costly subscriptions, and overreaching claims of corporate privilege or government secrecy.

16 Flaubert’s phrase “la rage de vouloir conclure” occurs in his letter to Mlle Leroyer de Chantepie, October 23, 1863, here translated: “The rage for wanting to conclude is one of the most deadly and most fruitless manias to befall humanity. Each religion and each philosophy has pretended to have God to itself, to measure the infinite, and to know the recipe for happiness. What arrogance and what nonsense! I see, to the contrary, that the greatest geniuses and the greatest works have never concluded.” Gustave Flaubert, Correspondance (Paris, 1929), vol. v, 111.

PRODUCERS of primary reports may console themselves about distortions in mediated versions of their work by recalling the words of the noted marketeer P. T. Barnum: “Without publicity a terrible thing happens: nothing.” Evidence cannot become relevant if no one knows about it.

How can creators of evidence-based reports defend the integrity of their work against repackaging mischief? To start with, primary reports should be inexpensively and directly available (internet, self-publishing, leaks to journalists), short-circuiting the bureaucracies of secondary presentation. Creators of original material should never surrender rights to their work. They should prepare their own secondary reports to replace repackagings. They should police secondary mediated versions of their work, and turn the mistakes of the easily mocked pitch culture into notorious examples.

And consumers of evidence should stay reasonably close to primary sources and to evidence-interpreters who provide honest unbiased readings. There’s almost always something to learn from a first-rate primary source (for example, that secondary versions incorrectly interpret the primary).
For consumers, an indicator of an untrustworthy presentation bureaucracy is its denial of access to primary evidence (for example, by requiring that all publications be pre-approved by the PR department). Another sign is that repackagings always manage, somehow, to support a predetermined line. Wise consumers should keep in mind that reports from an unbiased interpreter will not always agree with their own prior views. And that one’s allies are not all that less likely to corrupt evidence than one’s opponents.17

BY generating corrupt repackagings, an organization’s bureaucracy of secondary presentations may come to corrupt the integrity of work within the organization. Compromised external communications promote compromised internal communications, as pitching out corrupts within. If a corporation distorts evidence presented to consumers, stockholders, and journalists, then it may soon lie to itself. Or, similarly, the chronic problem of government intelligence agencies: once the collection and selection of evidence starts to become fixed around a pre-determined policy line, intelligence agencies may become perpetually unintelligent, confused about the differences between detective work and marketing. Or, even in writing novels: “Cliché spreads inwards from the language of the book to its heart. Cliché always does.”18

Nowadays the common tool for pitching out, and corrupting evidence within, is PowerPoint.

17 A good guide for assessing the quality and the credibility of nonfiction reports is Sarah Harrison Smith, The Fact Checker’s Bible: A Guide to Getting It Right (New York, 2004).

18 Martin Amis, The War Against Cliché (New York, 2001), p. 137.

...

This note was uploaded on 01/12/2010 for the course CS 147 at Stanford.
