Spencer SCD @ Vandy pt. 3: Schools of analysis for SCDs and relevant design choices

Schools of Analysis and Research Design Choices

When I was developing my own review of literature for the analysis of SCDs for my qualifying process and later dissertation, I sorted the analysis of SCDs into roughly four buckets.

  1. Visual analysis
  2. Non-overlap indices
  3. Model-based approaches (including simple models for effect sizes)
  4. Randomization tests

John Ferron gave an opening talk on the analysis of SCDs, which helped situate many of the discussions throughout the day. His framing interested me in part because it was similar to, but also different from, the way I have mentally divided up the analysis of SCDs in the past. He proposed three general classes of analysis:

  1. Visual analysis
  2. Effect size estimation
  3. Null hypothesis testing

I think John’s organization is better than my own scheme above for a way of looking at SCDs that I’ve been contemplating. These three forms of analysis are appropriate for related, but not entirely equivalent, questions.

Visual analysis reflects a desire to know whether some outcome (usually behavior) has changed enough in a way that both matters and replicates. I have some thoughts about what constitutes replication (I think I mentioned this in the last post?) that I should write about later. Suffice it to say that I think the idea of replication in SCDs is tricky and interesting in a way that is somewhat specific to the visual analysis tradition, but that might also relate to issues in many other research design and analysis contexts.

Effect size estimation asks, “how big is the effect of the intervention?” Coupled with meta-analysis, we can ask further questions about treatment variability, heterogeneity, and structural factors that influence how big the intervention effect is. John grouped non-overlap indices in this bucket, but I think I would put them closer to the visual analysis bucket. Non-overlap indices are insensitive to magnitude. Once there’s no overlap between phases, the difference between a small effect and a large effect cannot be observed based on non-overlap indices alone. This is somewhat parallel to an idea that I think I see in visual analysis (although I am by no means an expert), which cares less about the precise magnitude of the effect than about whether it is big enough.
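To make the insensitivity to magnitude concrete, here is a minimal sketch using one common non-overlap index, Nonoverlap of All Pairs (NAP). The data are invented for illustration, the function names are my own, and I am assuming that higher scores mean improvement. Both intervention phases sit entirely above the baseline, so NAP hits its ceiling in both cases even though the mean differences are very different.

```python
from itertools import product


def nap(baseline, treatment):
    """Nonoverlap of All Pairs: share of (baseline, treatment) pairs in which
    the treatment observation is higher, counting ties as half.
    Assumes higher scores indicate improvement."""
    pairs = list(product(baseline, treatment))
    score = sum(1.0 if t > b else 0.5 if t == b else 0.0 for b, t in pairs)
    return score / len(pairs)


def mean_difference(baseline, treatment):
    """Simple raw mean difference between phases (treatment minus baseline)."""
    return sum(treatment) / len(treatment) - sum(baseline) / len(baseline)


# Hypothetical data, made up purely for illustration.
baseline = [2, 3, 2, 4, 3]
small_effect = [5, 6, 5, 6, 5]        # just clears the baseline range
large_effect = [15, 16, 15, 16, 15]   # far above the baseline range

nap_small = nap(baseline, small_effect)
nap_large = nap(baseline, large_effect)
diff_small = mean_difference(baseline, small_effect)
diff_large = mean_difference(baseline, large_effect)

print("NAP:", nap_small, nap_large)
print("Mean difference:", round(diff_small, 2), round(diff_large, 2))
```

The two NAP values are identical because every treatment observation exceeds every baseline observation in both series, while the mean difference separates the two cases clearly.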

Null hypothesis testing asks the question “are the observed effects unlikely to have occurred if the true effect is zero?” or more colloquially (if maybe not quite correctly) “are the effects different from zero?” Null hypothesis testing is perhaps less concerned with the exact value of the outcome and more concerned that an effect is “real” in some probability-related sense. The effect size estimation and null hypothesis testing approaches are often used in tandem, although not always. Many of the randomization approaches that have been proposed for SCDs do not directly include a component of effect size estimation.
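As a rough illustration of what a randomization test looks like in this setting, here is a sketch for a simple AB design. It assumes the intervention start point really was chosen at random from a set of admissible start points, which is what justifies the test. The data, the function name, and the minimum phase length of three are all assumptions of mine, not anything presented at the conference.

```python
def mean(xs):
    return sum(xs) / len(xs)


def ab_randomization_test(series, actual_start, min_phase_len=3):
    """Randomization test for an AB design with a randomly chosen start point.

    `actual_start` is the index of the first intervention observation.
    The statistic is the treatment-phase mean minus the baseline-phase mean.
    The p-value is the share of admissible start points whose statistic is
    at least as large as the one actually observed (one-sided)."""
    n = len(series)
    admissible = range(min_phase_len, n - min_phase_len + 1)
    if actual_start not in admissible:
        raise ValueError("actual_start is not an admissible start point")

    def stat(start):
        return mean(series[start:]) - mean(series[:start])

    observed = stat(actual_start)
    reference = [stat(s) for s in admissible]
    p_value = sum(s >= observed for s in reference) / len(reference)
    return observed, p_value


# Hypothetical series: 5 baseline sessions followed by 7 intervention sessions.
series = [2, 3, 2, 4, 3, 7, 8, 7, 9, 8, 8, 9]
observed, p_value = ab_randomization_test(series, actual_start=5)
print(round(observed, 2), round(p_value, 3))
```

With only seven admissible start points in this example, the smallest p-value the test can return is 1/7, so the design itself limits what the test can show. Note also that the p-value says how unusual the observed statistic is under the randomization scheme, not how large the effect is.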

As we consider these different types of analysis, I think it’s helpful to bring up the inductive/deductive continuum offered by Brian Cook and Austin Johnson. To paraphrase the description, rather than a single unified approach, SCDs encompass a set of research practices that answer a variety of questions. Some SCDs are highly inductive, engaging in an open-ended and flexible investigation of how the researcher or clinician might be able to move the behavior of someone experiencing distress or exhibiting behaviors that are causing some sort of social difficulty. In contrast, some SCDs are highly deductive, trying to better understand the consequences of some well-defined intervention. Many (perhaps most?) SCDs fall somewhere in the middle of this continuum of research questions. But if you look at the variety of analysis approaches and this continuum of investigation style, you may begin to see (or at least, I have recently begun to believe) that SCDs are asking a variety of questions, and the kinds of inferences they are making are not all the same. Consequently, an effect size from a highly inductive study may not be all that interpretable, because the data from the intervention phase are not really all from the same “intervention.” It also seems to me that research practices that are helpful for certain kinds of inductive studies are not entirely helpful for deductive studies. Research practices that support clarity in visual analysis may represent a problem for effect size estimation or null hypothesis tests (see, for instance, my work on simulated response-guided baselines).

Some practices are taken as a given, I think. For researchers who use response-guided baselines, the question of whether or not to be response-guided isn’t a question: of course you are always gathering data in a response-guided fashion! But I’m not sure this is the right response. This is a research design question that you should consider carefully. It might be that there are ethical reasons that require you be responsive, but you should recognize that as a design choice it isn’t always optimal. Given the other talks there that thoughtfully challenged conventional wisdom about what is or is not the best design choice in SCDs (such as Tim Slocum and Jennifer Ledford’s dual talk on nonconcurrent multiple baseline designs) I hope we are entering a time where we can be more thoughtful about the kind of inference researchers are interested in making in their SCD, the kind of analysis that can answer their question, and the sorts of research practices that can support (or harm!) understanding the evidence from their research.

Daniel M. Swan
Research Associate, Applied Research Methods and Statistics Lab

I am a methodologist whose work has primarily focused on single-case designs.