![rw-book-cover](https://rss.onlinelibrary.wiley.com/cms/asset/9a5beb2a-dde9-47d6-96f2-d6240f8a4c04/sign.2018.15.issue-5.cover.jpg?trick=1724249871376)

## Highlights

Ethics in statistics is about more than good practice. It extends to the communication of uncertainty and variation. **Andrew Gelman** presents five recommendations for dealing with fundamental dilemmas ([View Highlight](https://read.readwise.io/read/01j5tmtwf1vf59x2wpfyp6m52w))

---

**S**tatistics and ethics are intertwined, at least in the negative sense, given the famous saying about lies, damn lies, and statistics, and the well-known book, *How to Lie with Statistics* (which, ironically, was written by a journalist with little knowledge of statistics who later accepted thousands of dollars from cigarette companies and told a congressional hearing in 1965 that inferences in the Surgeon General's report on the dangers of smoking were fallacious). ([View Highlight](https://read.readwise.io/read/01j5tmv3sqxr4dkq0fn2rxgky7))

---

The principle that one should present data as honestly as possible is a fine starting point but does not capture the dynamic nature of science communication: audiences interpret the statistics (and the paragraphs) they read in the context of their understanding of the world and their expectations of the author, who in turn has various goals of exposition and persuasion – and all of this is happening within a competitive publishing environment, in which authors of scientific papers and policy reports have incentives to make dramatic claims. ([View Highlight](https://read.readwise.io/read/01j5tmva2yj9ppgb4hs2t97cva))

---

Just as you write in part in order to figure out what you are trying to say, so you do statistics not just to learn from data but also to learn what you can learn from data, and to decide how to gather future data to help resolve key uncertainties. ([View Highlight](https://read.readwise.io/read/01j5tmvm72wv87rvfjmmxad1jt))

---

Statistical conclusions are data-based and they can also be, notoriously, dependent on the methods used to analyse the data. An extreme example is the influential paper of Reinhart and Rogoff on the effects of deficit spending, which was used to justify budget-cutting policies.[2](https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01193.x#sign1193-bib-0002) In an infamous mistake, the authors had misaligned columns in an Excel spreadsheet so their results did not actually follow from their data. This highly consequential error was not detected until years after the article was published and later researchers went to the trouble of replicating the analysis,[3](https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01193.x#sign1193-bib-0003) illustrating how important it is to make data and data-analysis scripts available to others – providing more “eyes on the street”, as it were. ([View Highlight](https://read.readwise.io/read/01j5tmvv7tgrx4wd16vvmn7wxx))

---

Open data and open methods imply a replicable “paper trail” leading from raw data, through processing and statistical analysis, to published conclusions. ([View Highlight](https://read.readwise.io/read/01j5tmw267ff18r1rjww1v4wsx))

---
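The two highlights above argue for a scripted route from raw data to published numbers rather than hand-edited spreadsheets. A minimal sketch of such a paper trail, using a hypothetical file `raw_data.csv` and made-up column names that loosely echo the debt-and-growth setting, might look like the following; the point is only that every processing step is recorded and re-runnable, so an error such as a misaligned column has somewhere to be caught:

```python
# analysis.py -- a minimal, re-runnable "paper trail" (file and column names are hypothetical).
# Anyone with the raw data can reproduce the reported table by running this one script.
import pandas as pd

raw = pd.read_csv("raw_data.csv")                      # raw data, never edited by hand

# Processing: drop incomplete rows, and say so explicitly rather than silently.
clean = raw.dropna(subset=["debt_ratio", "gdp_growth"]).copy()
print(f"Dropped {len(raw) - len(clean)} incomplete rows out of {len(raw)}")

# Analysis: mean growth by debt category, with a sanity check that no rows were lost.
clean["high_debt"] = clean["debt_ratio"] > 0.9
summary = clean.groupby("high_debt")["gdp_growth"].agg(["mean", "count"])
assert summary["count"].sum() == len(clean)

summary.to_csv("published_table.csv")                  # exactly the numbers that get reported
print(summary)
```

A reader who suspects a spreadsheet-style error can rerun the script against the posted raw file and see exactly where the published numbers come from.

---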
Statistics professors promote quantitative measurement, controlled experimentation, careful adjustment in observational studies, and data-based decision-making. But in teaching their own classes, they (we) tend to make decisions and inferences based on non-quantitative recall of uncontrolled interventions, just trying things out and seeing what we see – behaviour that we would consider laughable and borderline unethical in social or health research. ([View Highlight](https://read.readwise.io/read/01j5tmw6wpb8613gks89b2rx39))

---

The point here is not that Bayes is better (or worse) but that, under any inferential philosophy, we should be able to identify what information is being used in methods. ([View Highlight](https://read.readwise.io/read/01j5tmxxg15gfshthrmwyx4tx7))

---

In some settings, prior information is as strong as or stronger than the data from any given study. For example, Gertler *et al*. reported on an early-childhood intervention performed in an experiment in Jamaica that increased adult earnings (when the children grew up) by an estimated 42%, and the result was statistically significant, thus the data were consistent with effects between roughly 0% and an 80% increase in earnings.[7](https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01193.x#sign1193-bib-0007) But prior knowledge of previous early-childhood interventions suggests that effects of 80%, or even 40%, are implausible. It is fine to present the results from this particular study without reference to any prior information, or to include such information in a non-Bayesian way, as is done in power calculations. But it is not appropriate to offer policy recommendations from this one estimate in isolation. Rather, it is important to understand the implications of the method being used. ([View Highlight](https://read.readwise.io/read/01j5tmy1wn8em1easegspqhjht))

---
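The interval quoted in the Gertler *et al*. highlight, and the role of prior information raised just before it, can be reconstructed with a short back-of-the-envelope calculation. The sketch below assumes the 42% estimate was roughly two standard errors from zero (which is what "just statistically significant" at the 5% level would imply); the sceptical prior scale of 10 percentage points is purely an illustrative assumption, not a number taken from the study:

```python
# Back-of-the-envelope reconstruction of the Gertler et al. interval, plus a
# normal-normal shrinkage calculation. The standard error is inferred by assuming
# the estimate was roughly "just significant"; the prior scale is an assumption.
import math

estimate = 42.0                      # reported effect on adult earnings, in percent
se = estimate / 2.0                  # ~21: implied if the estimate is ~2 SEs from zero

lo, hi = estimate - 1.96 * se, estimate + 1.96 * se
print(f"Approximate 95% interval: {lo:.0f}% to {hi:.0f}%")     # roughly 1% to 83%

# Illustrative prior: earlier early-childhood interventions suggest effects far
# smaller than 40%, summarised here (by assumption) as Normal(0, 10^2) in percent.
prior_mean, prior_sd = 0.0, 10.0

post_precision = 1 / prior_sd**2 + 1 / se**2
post_mean = (prior_mean / prior_sd**2 + estimate / se**2) / post_precision
post_sd = math.sqrt(1 / post_precision)
print(f"Shrunken estimate: {post_mean:.0f}% (sd {post_sd:.0f}%)")  # about 8% (sd 9%)
```

The point, as in the highlight, is not that the shrunken number is the correct one, but that the policy conclusion changes a great deal depending on whether such prior information enters the analysis at all.

---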
When it comes to data collection, institutional review boards can make it difficult to share one's own data or access others', and when it comes to reporting results, journals favour brevity over completeness. Even in this online age, top journals often aspire to the *Science/Nature* format of three-page articles. Details can appear in online appendices, but these usually focus not on the specifics of a study but rather on supplementary analyses to buttress the main paper's claims. Published articles typically focus on building a convincing case and giving a sense of certainty, not on making available all the information that would allow outsiders to check and replicate the research. ([View Highlight](https://read.readwise.io/read/01j5tmzej2ge01c0c5zf2ga6rd))

---

For a study to be ethical it should be informative, which implies serious attention to measurement, design, and data collection. ([View Highlight](https://read.readwise.io/read/01j5tn0qycrcps5r0a9vvw8cet))

---

A system of marginalising criticism creates an incentive for authors to promote dramatic claims, with an upside when published in top journals and little downside if errors are later found. ([View Highlight](https://read.readwise.io/read/01j5tn0y4rbeb148cdavaa07bb))

---

Researchers can do even better by criticising their own work, as done by Nosek, Spies, and Motyl, who performed an experiment to study “embodiment of political extremism”.[9](https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01193.x#sign1193-bib-0009) Their initial finding: “Participants from the political left, right and center (*N* = 1979) completed a perceptual judgment task in which words were presented in different shades of gray. … The results were stunning. Moderates perceived the shades of gray more accurately than extremists on the left and right (*p* = .01). Our conclusion: political extremists perceive the world in black-and-white, figuratively and literally.” ([View Highlight](https://read.readwise.io/read/01j5tn1aph2tpt6f4f2n6edynd))

---

Before publishing this result, though, the authors decided to collect new data and replicate their study: “We ran 1300 participants, giving us .995 power to detect an effect of the original effect size at α = .05.” And then the punch line: “The effect vanished (*p* = .59).” How did this happen? The original statistically significant result was obtained via a data-dependent analysis procedure. The researchers compared accuracy of perception, but there are many other outcomes they could have looked at: for example, there could have been a correlation with average perceived shade, or an interaction with age, sex, or various other logical moderators, or an effect just for Democrats or just for Republicans, and so forth. The replication, with its pre-chosen comparison, was not subject to this selection effect. ([View Highlight](https://read.readwise.io/read/01j5tn1dwdwdvzykv40qvbc39z))

---

Many fields of empirical research have become notorious for claims published in serious journals which make little sense (for example, the claim that people react differently to hurricanes with male and female names,[10](https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01193.x#sign1193-bib-0010) or the claim that women have dramatically different political preferences at different times of the month, or the claim that the subliminal image of a smiley face has large effects on attitudes on immigration policy[11](https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01193.x#sign1193-bib-0011)) but which are easily understood as the inevitable product of explicit or implicit searches for statistical significance with flexible hypotheses that are rich in researcher degrees of freedom.[4](https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01193.x#sign1193-bib-0004) ([View Highlight](https://read.readwise.io/read/01j5tn24tgay09g9kwa74649f8))

---

In statistics, we use mathematical analysis and stochastic simulation to evaluate the properties of proposed designs and data analyses. Recommendations for ethics are qualitative and cannot be evaluated in such formal ways. ([View Highlight](https://read.readwise.io/read/01j5tn2h5p4zeansrs3z25bjw9))

---

So far, this is just a story of statistical confusion perhaps abetted by incentives towards reporting dramatic claims on weak evidence. The ethics comes in if we think of this entire journal publication system as a sort of machine for laundering uncertainty: researchers start with junk data (for example, poorly-thought-out experiments on college students, or surveys of online Mechanical Turk participants) and then work with the data, straining out the null results and reporting what is statistically significant, in a process analogous to the notorious mortgage lenders of the mid-2000s, who created high-value “tranches” out of subprime loans. ([View Highlight](https://read.readwise.io/read/01j5tn2qtjvhmkr41xv0fg0nse))

---
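The selection effect described in the Nosek, Spies, and Motyl highlights, and the "straining out" of null results described just above, can be checked with exactly the kind of stochastic simulation an earlier highlight mentions. A minimal sketch, using simulated data with no true effects and an assumed menu of 20 candidate outcomes per study, compares "report whichever comparison came out significant" with a single pre-chosen comparison:

```python
# Simulation of the selection effect: with no true effect on any outcome, scanning
# many candidate outcomes and reporting the best one is "significant" far more often
# than the nominal 5%, while a single pre-chosen comparison is not.
# The number of studies, group size, and 20-outcome menu are all assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_per_group, n_outcomes = 2000, 50, 20

flexible_hits = 0
prereg_hits = 0
for _ in range(n_studies):
    # Two groups measured on 20 outcomes, with no true group difference on any of them.
    a = rng.normal(size=(n_per_group, n_outcomes))
    b = rng.normal(size=(n_per_group, n_outcomes))
    pvals = stats.ttest_ind(a, b).pvalue       # one p-value per candidate outcome
    flexible_hits += pvals.min() < 0.05         # report whichever outcome "worked"
    prereg_hits += pvals[0] < 0.05              # the single pre-chosen outcome

print(f"Flexible analysis: {flexible_hits / n_studies:.0%} of null studies significant")
print(f"Pre-chosen test:   {prereg_hits / n_studies:.0%} of null studies significant")
# Typical output under these assumptions: roughly 64% versus 5%.
```

Under these assumptions the flexible procedure declares a "finding" in roughly two-thirds of pure-noise studies, while the pre-chosen comparison stays near the nominal 5%, which is the sense in which the replication's pre-registered test was protected from the selection effect.

---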
Ethics is, in this way, central to statistics and public policy. We use statistics to measure uncertainty and variation, but all too often we sell our methods as a sort of alchemy that will transform these into certainty. The first step to not fooling others is to not fool ourselves. ([View Highlight](https://read.readwise.io/read/01j5tn3022gs3yatz49w9625cs))

---