Emerson Statistics | The Scientist Game

Scott S. Emerson, M.D., Ph.D.
Professor Emeritus of Biostatistics
University of Washington
Seattle, Washington
semerson@RCT-design.com

When the Statistical Crimes Against Humanity Trials are held at some point in the (hopefully near) future, I believe the top three charges against teachers of introductory applied statistics courses and authors of introductory texts will be:

1. Ever making mention of an assumption of data having a normal distribution or any other specific distribution, for that matter (instead, we should in the most typical case talk about whether our sample size is sufficiently large to justify an approximate normal distribution for our estimates of effect),

2. Any emphasis on the P value as the primary result of a statistical analysis (instead, we should focus on the precision of our estimates as measured on the scientific scale of a frequentist confidence interval or Bayesian credible interval), and

3. Promotion of experimental design based on 80% power (instead, we should discuss the design of experiments to discriminate between hypotheses).
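As a rough illustration of the first charge (a sketch added here, not part of the original presentation), the simulation below draws samples from a deliberately non-normal distribution and checks how often a normal-theory 95% confidence interval for the mean covers the true value. The exponential distribution, the sample sizes, and the helper name `coverage` are all assumptions chosen for illustration. The point is that the data are never normal, yet the interval for the estimate behaves better as the sample size grows.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def coverage(n, reps=2000, seed=0):
    """Fraction of normal-theory 95% CIs for the mean that cover the truth.

    Data are Exponential(rate=1), a skewed distribution with true mean 1.0
    (an illustrative choice, not taken from the source).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)   # about 1.96
    true_mean = 1.0
    hits = 0
    for _ in range(reps):
        xs = [rng.expovariate(1.0) for _ in range(n)]
        m = mean(xs)
        half_width = z * stdev(xs) / math.sqrt(n)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / reps

# Coverage at a few hypothetical sample sizes; nominal coverage is 0.95.
results = {n: coverage(n) for n in (5, 30, 200)}
for n, cov in results.items():
    print(f"n = {n:3d}: empirical coverage = {cov:.3f}")
```

With very small samples the empirical coverage should fall noticeably short of 95%, and it should approach the nominal level as n increases, even though no individual observation is normally distributed.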

The common thread in the definition of these crimes is my belief that an introductory applied statistics course should be titled "The Use of Statistics to Answer Scientific Questions". That is, for the students in a nonmajors applied statistics class, Statistics is merely a tool to be used in Science (in the broadest sense of the word), and Science is about proving things to people. I find, however, that introductory statistics texts completely lose sight of the need to emphasize the scientific method, which is ultimately based on an adversarial view of proof: Scientific studies should be directed toward discriminating between competing (sets of) hypotheses. Hence, goals of experiments should not be stated as a desire to "prove that" a certain hypothesis is true (it might not be), but instead to "decide which" of two hypotheses might be true.

In teaching experimental design, then, we must first discuss the ways in which the applied scientist should define the most plausible competing hypotheses and then design an experiment that would tend to result in different outcomes under those competing hypotheses. In order to address the nondeterministic nature of the outcomes, we must describe a framework for defining what we mean by statistically discriminating between hypotheses.
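One way to make "statistically discriminating between hypotheses" concrete is sketched below, under assumptions that are not spelled out in the text above: a two-arm comparison of means with known standard deviation `sigma`, a null hypothesis of no difference, and an alternative difference `delta`. The interpretation used here is that a 95% confidence interval discriminates when it is narrower than the distance between the two hypotheses, so it can never contain both. The function names and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.025, power=0.80):
    """Per-arm sample size for a two-arm comparison of means (known sigma),
    using the standard normal-approximation formula."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)   # one-sided type I error
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * sigma**2 * ((z_alpha + z_beta) / delta) ** 2)

def ci_discriminates(n, delta, sigma, alpha=0.025):
    """True if the two-sided CI is narrower than delta, so that it can
    never contain both the null and the alternative value."""
    se = sigma * math.sqrt(2 / n)    # SE of the difference in means
    width = 2 * NormalDist().inv_cdf(1 - alpha) * se
    return width < delta

delta, sigma = 0.5, 1.0              # illustrative values only
n80 = n_per_arm(delta, sigma, power=0.80)
n975 = n_per_arm(delta, sigma, power=0.975)

print(n80, ci_discriminates(n80, delta, sigma))     # 63 False
print(n975, ci_discriminates(n975, delta, sigma))   # 123 True
```

Under these assumptions, the conventional 80% power design leaves room for a confidence interval that includes both hypotheses, while choosing power equal to one minus the one-sided type I error (97.5% here) guarantees that the interval rules out at least one of them.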

Drawing on the game of Eleusis (invented by Robert Abbott and described by Martin Gardner in Scientific American in October 1977), I developed an exercise ("The Scientist Game") to illustrate some common foibles of scientists and statisticians when they first approach the statistical design of an experiment. I originally used the "game" aspect to teach elementary school students (grades 2 through 5) about the scientific method. But I later started presenting the game to my colleagues at the Arizona Cancer Center, audiences in statistical seminars, and students in my statistics classes. In a revelation that perhaps says more about me than the true recreational appeal of the game, I have also presented it to fellow attendees at a wine and cheese party and fellow passengers on commercial airlines.

In this game, the Scientist is charged with discovering the rule that dictates patterns of objects appearing over time in a greatly simplified universe. Starting with observational data, the Scientist is to first identify the "low hanging fruit" and then devise an experimental strategy to test the scientific hypotheses formulated from those observations. In this exercise, I have found that the vast majority of participants, no matter the level of prior scientific experience, make choices that would correspond to very poor experiments. In my presentation I then demonstrate what I would regard as a more appropriate scientific approach and make brief comments on the role of multiple comparisons (data dredging or data mining), binary searches, and incorporation of Bayesian priors into the optimal design strategy. Lastly, I draw a parallel to the choice of statistical power when designing a study in the real world: My claim is that the only thing that should differ is the use of confidence intervals or credible intervals to "discriminate" between hypotheses.

An audio/video recording of The Scientist Game is available here. The PowerPoint slides are available as a PDF.
