Lesson 3:

Negative controls

Summary

This lesson dives deep into negative controls, classic comparisons that define what an outcome looks like when there is “no effect.” Negative controls can take the standard form of withholding an intervention in its entirety from an otherwise matched group, or can be targeted to isolate an independent variable more specifically from other components of intervention in an experiment. A negative controls planner will help to brainstorm and compare targeted controls, offering a reusable framework for planning out negative control groups in an experiment.

Goal

Learn how to use negative controls to contextualize outcomes and rule out alternative explanations for results.

3.1 Why test?

In the last lesson, we outlined a plan to find out if Rareberries cause mice to develop exceptional coordination. We had an independent variable and a dependent variable. We worried about other kinds of variables that could interfere with our results, and we looked at strategies to control them. “Test” was only one of those strategies! So, is this optional?

Let’s take a closer look at our research question. We want to know whether Rareberries affect coordination development in mice.

What makes that question interesting is whether our Rareberry-fed mice develop “exceptional” coordination. A researcher conducting an experiment on the Rareberry is likely interested in discovering a significant result. Significant results like “exceptional” are relative. Exceptional compared to what?

This calls for a negative control!

A negative control provides a baseline for comparison, defining “0” or “no effect.”

But, until researchers bring Rareberries to mice, all mice everywhere could conceivably have “no effect” from this intervention. Does every negative control need to be a negative control group?

3.2 The problem with data sourced from references

Sometimes, successful comparison is possible without a test group that does not receive the intervention. What if:

We measure outcomes with a standard test. Is there an extant, validated, score guide?
We are familiar with this subject and experiment. We know ‘exceptional’ when we see it.
We could determine average scores by consulting previous data, publicly available repositories, or conducting a meta-analysis.

If we are just exploring and want to get an idea about the situation before we do a more rigorous experiment, these kinds of comparisons may be appropriate. Cheap exploratory work, when it is reported as such, can be very useful. But, even with exploratory work, we need to know the risks. The only thing worse than wasting valuable research time is unwittingly drawing the wrong conclusions.

In this case, we aren’t going to let our Rareberries go to waste. This superfood is rare; it says so in the name. We want to maximize the information we can obtain from our resources, given our operational constraints.

All three examples listed above involve using formal or informal reference data in order to establish a baseline for comparison. But, every experiment is unique. Things happen at different times, with different arrangements, in different places, with the researcher wearing different lucky shirts. Most unique features of an experiment will not seriously impact its results. But, we cannot identify all the differences between two experiments.

Wherever those reference data came from, their experimental conditions were their own. Every single difference is now a potential variable that could explain or obscure an intervention effect in the study of Rareberry-eating mice.

When an experiment includes a direct comparison group, we don’t have to worry so much about these kinds of concerns. We can compare groups with and without our intervention knowing that all subjects shared whatever unexpected conditions might have made our particular experiment unique. This is the most important reason to include a control group.

3.3 What about our own past data?

Some pre-existing data might be better than others. If a lab has run rotarod tests with juvenile mice multiple times previously, the research team might want to use previously collected data as a negative control.

This is an option, but it carries risks. A historical dataset from the same laboratory will have many more things in common with a new set of experimental conditions than reference values or data shared from another lab. Given a choice between historical data from the laboratory conducting the experiment and general reference data, the historical data is preferable.

But, historical data has limitations. There may be unknown variables unique to either current or past experimental conditions. A particular employee may or may not be present. Changes in weather may cause even temperature-controlled environments to experience variability. Only a concurrent negative control shares the full set of conditions under which the experiment is run. As a result, a concurrent negative control is the best way to be sure that we have a well-matched group for comparison.

3.4 The “no intervention” negative control

A concurrent negative control gives an experiment a well-matched point of comparison to test for a difference between “effect” and “no effect.” But, what does that mean? What kind of negative control group do we want?

What most people would consider a “default” negative control group simply withholds the intervention. Half of the mice receive Rareberries, and half of the mice receive a typical diet. All other factors are matched between both groups.

“No intervention” is a good negative control. But, how good a negative control actually is depends on what the research team is interested in and how many other variables might be at work within an intervention.

It can be tempting to view interventions as a unified whole. Either a mouse receives a Rareberry, or it does not. However, there are two potential problems to consider. First of all, we need to consider what is involved in our Rareberries treatment. Is the addition of Rareberries truly the only difference between our intervention and control groups? Furthermore, we want to scrutinize our research question to ask what it is we are truly interested in. Is it only the Rareberries themselves, or do we actually have a hypothesis that some key ingredient therein might be responsible? If we are interested in an independent variable that is in fact only one component of a more complex intervention, we may want to use a control group that is more precisely designed to isolate our component of interest.

Consider the variables that might be at play in assessing the effects of Rareberries on juvenile mouse coordination. Mice that receive Rareberries might receive:

A delicious treat that inspires acrobatic performances.
Extra moisture or nutrients.
A lower percentage of daily calories from mouse pellets.
An extra food delivery visit after the mouse pellets have been dispensed.

Most interventions consist of multiple components, and not all of those components are actually the thing we are trying to test. It might be interesting if an extra food delivery visit affected motor coordination, but it isn’t exactly the question we are interested in. In fact, these look a lot more like the “variables of concern” we talked about in our last lesson.

3.5 The targeted negative control

Component variables that “hide” within an intervention require a more narrowly targeted negative control. Precisely targeting a negative control is a chance to employ creative solutions to design control groups that, alone or in combination, have every single thing in common with an intervention group except the precise variable the team is interested in testing.

This requires refining the exact object of interest in an experiment. In a classic example, an intervention in which a drug is suspended or dissolved in a vehicle solution might be compared against a “vehicle control,” narrowing the object of interest to the drug itself rather than the complete intervention of “drug + vehicle.” The research team investigating Rareberries wants to know if this superfood affects coordination development in mice.

To begin with, we may want to identify some features of Rareberries that could be responsible for improvements in juvenile mouse coordination. Rareberries contain things like water, sugar, and antioxidants common among berries.

One targeted negative control could attempt to isolate the more unique components of Rareberries by comparison with a negative control that is better matched for water, sugar, and antioxidants than a standard rodent diet. To do this, the research team could identify another berry in the same family (heath) with a similar size and makeup to a Rareberry, like a blueberry.

At this point, our team could thus have an intervention group, a “no intervention” negative control, and a more precisely targeted negative control.

This targeted negative control is just one possibility. Can you think of any other precisely targeted negative controls?

3.6 How many?

Now that we have an idea for a negative control that is more precisely targeted, does that remove the need for a “no intervention” negative control? Are there other alternatives worth exploring?

It depends. Which controls are appropriate for an experiment depends on the nature of the control options and the value of revealing component variables that may affect experiment results.

Consider the use of blueberries as a targeted negative control in the Rareberries experiment. Blueberries are a different berry, rather than being the same berry without a particular compound of interest.

This does not mean that using blueberries is a problem, or that blueberries are an unsuitable targeted negative control. Blueberries capture several of the concerning components identified above. They are a delicious treat, they contain extra moisture and nutrients, they lower the percentage of daily calories from pellets, and they can be delivered via an extra food delivery. Because the research team does not yet know whether Rareberries improve motor coordination, it is very difficult to speculate in an informed way about the specific mechanism by which a given compound within a Rareberry might cause such an improvement.
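One way to make this comparison concrete is to list the components bundled into the intervention and check which of them each candidate control shares. The sketch below is only an illustration: the component labels are hypothetical names for the items in the list from section 3.4, and "rareberry_specific_compounds" is an assumed placeholder for whatever is unique to Rareberries.

```python
# Hypothetical labels for the components bundled into the Rareberry intervention.
INTERVENTION = {
    "palatable_treat",
    "extra_moisture_and_nutrients",
    "fewer_calories_from_pellets",
    "extra_food_delivery_visit",
    "rareberry_specific_compounds",  # assumed placeholder for what is unique to Rareberries
}

# Components that each candidate negative control shares with the intervention group.
CONTROLS = {
    "no_intervention": set(),
    "blueberry": {
        "palatable_treat",
        "extra_moisture_and_nutrients",
        "fewer_calories_from_pellets",
        "extra_food_delivery_visit",
    },
}


def unmatched_components(control_name):
    """Return the intervention components that the named control does NOT share."""
    return INTERVENTION - CONTROLS[control_name]


for name in CONTROLS:
    # Anything listed here is bundled into the comparison against this control.
    print(name, "->", sorted(unmatched_components(name)))
```

A difference against the “no intervention” group could reflect any of the five components, while a difference against the blueberry group narrows the explanation to what the two berries do not share.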

However, like many targeted negative controls, the introduction of blueberries also invites more questions. What if there’s something new and unique about blueberries - maybe something that inhibits motor coordination development? If the characteristics shared by blueberries and Rareberries are in fact responsible for an effect, do we want to be able to test that as a hypothesis? When we include multiple negative controls, these more targeted comparison groups can alternatively be thought of as their own interventions, testing alternative hypotheses about components of treatment that compete with our original hypothesis.

The more creative a researcher gets with targeted negative controls, the more helpful it can be to have multiple negative control groups. Having both a blueberries and a “no intervention” negative control would allow the research team to be more confident that the features causing improvements in juvenile mouse coordination are specific to Rareberries. But, each negative control group added to a study requires making practical changes to the makeup of that study. Dividing samples into three separate groups could threaten the statistical power of proposed tests. Procuring and testing more mice to resolve that problem could expand the budget and schedule of the study dramatically. Lesson 8 of this unit gives additional guidance and practice for thinking through the trade-offs that come from attempting to implement controls in an imperfect world.
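To get a rough feel for the power cost of a third group, the sketch below approximates the power of a single two-group comparison when a fixed number of mice is split into two or three equal groups. The total of 60 mice and the standardized effect size of 0.8 are assumptions chosen purely for illustration, and the normal approximation used here is slightly optimistic for small samples.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two group means.

    Uses the normal approximation and ignores the negligible chance of
    rejecting in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size_d * sqrt(n_per_group / 2) - z_crit)


total_mice = 60   # assumed budget, for illustration only
assumed_d = 0.8   # assumed standardized effect size, for illustration only

for n_groups in (2, 3):
    n = total_mice // n_groups
    power = approx_power(assumed_d, n)
    print(f"{n_groups} groups, {n} mice each: approximate power {power:.2f}")
```

With the same total number of animals, each pairwise comparison has fewer mice behind it when a third group is added, so its power falls unless the sample grows.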

3.7 Activity: Negative controls along the way

Let’s practice designing targeted negative controls!

In this activity, you will receive a short description of an experiment and a list of features that are shared (or not) with a “no intervention” negative control. Your job is to propose additional negative control groups that could be used for the experiment. Use the dropdown cells in the table to create more specifically targeted negative controls, and try to isolate additional components of interest.

After you have shared your ideas, you will have a chance to review others’ answers and reflect on the design challenges at play.

Post-activity questions:

How many negative control groups were you able to dream up? How many of them do you think would truly be realistic or wise to apply?
What criteria are you considering when you think about what is realistic or wise? How might this apply to your own experimental work?

In this lesson, we have learned about negative control groups for our experiment. These define what we mean by “no effect” and will play an important role in the statistical analysis we choose to evaluate the answer to our question.

It is also worth noting that many experiments apply techniques at various points that require their very own “baseline” or “0” value to be defined in order to calibrate a measurement. When a spectrophotometer calls for a “blank” sample, or a qPCR assay calls for a “no template control”, these are also a kind of negative control - and they are just as important!

After selecting the ideal negative control groups for your study, it’s a good idea to have a look through your technical plans, and take note of the presence and quality of negative controls as needed at each step.

Takeaways:

A negative control provides a baseline for comparison by defining what a “no effect” result looks like.
Using reference values as a baseline for comparison can be instructive, and becomes less risky the more similar the conditions for the reference value are to the current experiment. Only a concurrent negative control ensures that we have the closest possible match for comparison.
Targeted negative controls are most necessary when constituent components of an intervention are identified as variables of concern, and can help ensure that experimental outcomes are specific to the presumed causal mechanisms at play in the intervention.
Conducting an experiment with multiple negative controls can allow researchers to investigate the effects of multiple components of an intervention on their results, but must be weighed against practical constraints.

Reflection:

Have you ever used reference data or a historical control as a baseline? Are there any pros and cons specific to your research context that relate to using these kinds of data for comparison?
What targeted negative controls have you employed in your own research?
Consider your previous and planned experiments. Could you modify a previous or future experiment to include a targeted negative control?
When you read a paper, do you take note of the kinds of negative controls that they employ? If so, do you have any favorites? If not, give this a try!

Lesson 4:

"Before" as a negative control

Summary

This lesson addresses the use of baseline measurements made before and/or after treatment as a potential form of negative control. A repeated measures design allows for researchers to sample two or more times from a uniquely matched subject, which can eliminate a number of problems posed by differences between intervention and control groups. But, time controls expose all subjects to the complete intervention, which means that problems specific to the intervention that could have been identified by targeted negative controls may go undiscovered. Study designs that combine both pre-post measures and targeted negative controls can achieve the best of both worlds.

Goal

Explore reference points beyond control groups as negative controls to understand the costs and benefits of testing repeated measures as a form of control.

4.1 The problem with time

The earliest studies of lobotomy, conducted by António Egas Moniz, did not include any control group. Instead of comparing results against an equivalent population who had not received the intervention, a population given a sham intervention, or a population given another intervention, this research compared each patient with themselves. This, combined with metrics that defined patient improvement on the basis of compliance, made for a very convincing story.

Using “before” as a control is a classic strategy and is very common in experiments with a repeated measures design. This design has many unique advantages, including reducing the number of subjects in an experiment, lowering the operational costs of conducting the experiment, and allowing study of changes over time. If the disease treated by lobotomy had been different in nature, this experimental design would have posed far less of a risk of getting it wrong.

Time-based controls can lead researchers astray. To see how, consider the Rareberries experiment from the previous lessons.

What if we had, instead of creating a control group, simply tested our mice when they first arrived, then again after several weeks of a Rareberries-rich diet? Would improvement indicate an effect of Rareberries?

Most people would not assess an effect that occurs during development by comparing measurements of a single group taken before and after the experiment period. If the mice arrived as pups and left as adults, their performance and coordination would inevitably improve. Physical and cognitive development coincides with the period of eating the Rareberry diet, which makes it impossible to attribute changes observed at the end of the study to the diet alone. We selected juvenile mice precisely because we want to see the impact of a Rareberry diet on their development, but because of their young age, their coordination will improve drastically by the end of the study regardless. A study assessing coordination development over time without a control group therefore risks confusing an effect of maturation with an effect of the intervention.

The alternative explanation for an improved outcome, in this case, is obvious. The danger comes with those that aren’t. When “before” is used as a control group, time - and anything that occurs while it passes - can be an alternative explanation for an outcome.
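A small numerical sketch can show how much of a before/after change might belong to time rather than to the intervention. The rotarod latencies below are invented for illustration, and the standard-diet group is a hypothetical concurrent control included only to show what the before/after comparison alone cannot reveal.

```python
from statistics import mean

# Hypothetical rotarod latencies in seconds, as (before, after) per mouse.
# All values are invented for illustration.
rareberry = [(20, 52), (25, 58), (22, 50), (28, 61), (24, 55)]
standard_diet = [(21, 50), (26, 55), (23, 49), (27, 58), (25, 53)]


def mean_change(pairs):
    """Average within-mouse change from the first to the second measurement."""
    return mean(after - before for before, after in pairs)


# Using "before" as the only control attributes the whole change to the diet.
naive_estimate = mean_change(rareberry)

# The concurrent control shows how much change happens with time alone.
change_with_time_alone = mean_change(standard_diet)

# Subtracting the control's change removes what both groups share (maturation).
adjusted_estimate = naive_estimate - change_with_time_alone

print(f"Before/after change, Rareberry group: {naive_estimate:.1f} s")
print(f"Before/after change, standard diet:   {change_with_time_alone:.1f} s")
print(f"Change beyond maturation:             {adjusted_estimate:.1f} s")
```

In this made-up dataset nearly all of the improvement appears in both groups, which is exactly the pattern a before/after comparison on its own would mistake for an effect of the diet.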

If you know exactly what to expect from an outcome over time, the risk posed by time can be lower. Diseases with stable, definitive symptoms are less likely to have outcomes that are affected by time in unexpected ways. Consider disorders of metabolism, where certain functional proteins may be missing, for example. Outcomes can still be complex, but certain underlying features can’t simply change over time.

Acute mental illness is a different story. Anything that has a reasonable likelihood of changing “spontaneously” (a handy word for any cause we aren’t aware of that occurs over time) is vulnerable to time as an alternative explanation when outcomes are compared only with an earlier baseline.

4.2 Recovery and other complex time points

Time proceeds in one direction only. This means that if you expect your intervention to create a pattern over time, differences between that pattern and one-directional change may help you to distinguish an effect of your intervention from an effect related to time.

Recovery is a particularly useful example of an experimental design using time points, because that pattern involves an expected return to baseline which can be used as an additional negative control.

Pre measurement (negative control) -> intervention -> recovery (negative control)

Recovery may not be a perfect negative control, such as in cases where a system is enduringly altered by an intervention. As a reference point to document the condition of subjects after the intervention has elapsed, however, it is a valuable control for potential confounds with time.

The greatest strength of a repeated-measures design is the ability to compare subjects to themselves. There is nothing more well-matched to an individual than itself. Important variables will be equally represented in both pre- and post-intervention groups. This conveniently avoids a pernicious difficulty in studies with multiple groups, where constraining some variables and randomizing to distribute others helps to minimize the impact of key covariates, but intervention and control groups will never be identical. Not so for within-subject comparisons!

4.3 Remaining variables of concern

Using repeated measures as controls may allow for subject characteristics to be ideally matched, but subject characteristics are not the only sources of concern in a study. In Lesson 2 we explored how variables in the environment, intervention, and measurement process can all be sources of concern. In Lesson 3 we unpacked how it is often necessary to establish means to assess the impacts of multiple components of an intervention, in addition to the presence or absence of the intervention. Concerns associated with the intervention itself (vehicle, handling procedures for intervention-receiving subjects, unanticipated interactions with environment, and more) are still bundled with our independent variable, making it more difficult to make strong, specific claims on the basis of the work.

This is similar to the problem with “no intervention” control groups. Because our only negative controls are measurements taken before or after the intervention, our only negative control conditions are effectively “no intervention.” Narrowing our question from “everything about the intervention” to one specific hypothesized component requires one or more targeted negative control groups to creatively isolate the component in question.

4.4 Both is best

There are clear advantages to repeated measures designs and targeted negative control groups. This makes experimental designs that employ both a particularly strong choice.

This choice can also add complexity to a study. When researchers apply both a repeated measures design, such as “before intervention” and “after intervention” measurements, and a control group that does not receive the intervention (with its own “before” and “after”), there are new opportunities for insight and new opportunities for error.

Activity:

In our next activity, you will be given results from an experiment. Your job is to read and digest the results, then share your thoughts on what claims you think the research team can make on the basis of the data they have gathered.

Post-activity questions:

What information did you use to interpret results in this example? Did you identify a statistical error immediately?
What other observations did you see in the collected data? Do you agree with others’ evaluations?

The term “control” can mean many different things. This is not bad. But, cultivating an awareness of different applications of the term control and understanding why each of these roles matters is key to doing more rigorous science.

4.5 The DINS error

In the activity, you reviewed a dataset in which researchers had both time controls and a negative control group, but only explicitly tested one of them to evaluate the significance. It might be convincing to look at a study with two groups, one that achieved a significant result compared to its baseline and one that did not, and conclude that there is a significant difference between the groups. But, when there is no formal test to compare intervention and control groups, this opens the door to subjective interpretation of data, and that is particularly hazardous when we look to a difference in “before/after” significance to inform that judgement.

Remember:

A difference in nominal significance is not the same as a significant difference.

The fact that “before” and “after” are significantly different in the intervention group but not the control group does not tell us whether the intervention and control group were significantly different. When we assume that it does, this is called a “difference in nominal significance” or DINS error. Quite often, a statistical test under these circumstances will demonstrate that the intervention group is not significantly different - either in mean or in the before/after difference - from a negative control! In fact, the data can be strikingly similar between groups even when one before/after comparison just crosses the threshold to “significance.” This is, fundamentally, why we use statistical tests in science - you can’t just look at the data and know. It is also a reason we need to be more cautious about how we use “p-values” more broadly. Significant differences for a comparison cannot be evaluated in the absence of a test that is specific to that comparison.
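The sketch below works through a made-up dataset in which the pattern described above appears. The change scores (after minus before) are invented for illustration, and the critical values are the standard two-sided t-table values at alpha = 0.05 for 9 and 18 degrees of freedom. A paired before/after test is computed here as a one-sample t-test on the change scores, and the between-group test is a pooled-variance t-test on those same change scores; both are common choices, not the only valid ones.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical change scores (after minus before), invented for illustration.
intervention_change = [5, -1, 3, 4, 0, 2, -2, 5, 1, 3]
control_change = [4, -2, 2, 3, -1, 1, -3, 4, 0, 4]

# Two-sided critical t values at alpha = 0.05, taken from standard t tables.
T_CRIT_DF9 = 2.262   # within-group tests: df = n - 1 = 9
T_CRIT_DF18 = 2.101  # between-group test: df = n1 + n2 - 2 = 18


def paired_t(changes):
    """Paired t statistic, i.e. a one-sample t-test on before/after differences."""
    return mean(changes) / (stdev(changes) / sqrt(len(changes)))


def between_groups_t(a, b):
    """Pooled-variance t statistic comparing the change scores of two groups."""
    n_a, n_b = len(a), len(b)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


t_intervention = paired_t(intervention_change)
t_control = paired_t(control_change)
t_between = between_groups_t(intervention_change, control_change)

print("Intervention before/after significant:", abs(t_intervention) > T_CRIT_DF9)
print("Control before/after significant:     ", abs(t_control) > T_CRIT_DF9)
print("Intervention vs. control significant: ", abs(t_between) > T_CRIT_DF18)
```

Here one before/after comparison crosses the threshold and the other does not, yet the test that actually compares the two groups finds no significant difference between them. Concluding otherwise from the first two results alone would be a DINS error.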

It’s worth noting that, while the potential for this error is a characteristic of designs that combine time controls with other control groups, it is also possible to make this mistake in many other multi-factorial design settings. For example, if two different interventions are independently compared with control, but not tested for difference from each other or an interaction, the same error can (and frequently does) occur.

A difference in nominal significance is not the same as a significant difference.
George et al., 2016.

4.6 Planning and reporting

Rigorous study design requires planning. As a design becomes more complex, it becomes particularly important to plan in advance for statistical analysis, gathering as much information as you can to make sure that you have the sample size to power your comparisons of interest.
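As a starting point for that planning, the sketch below estimates how many subjects per group a single two-group comparison would need to reach a target power. It uses a common normal-approximation formula, and the effect sizes, the 80% power target, and alpha = 0.05 are assumptions chosen for illustration; a statistician or dedicated power-analysis software should inform the real plan.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, power=0.8, alpha=0.05):
    """Approximate subjects needed per group for a two-sided, two-group comparison.

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2, rounded up.
    Exact t-based calculations give slightly larger numbers.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Assumed standardized effect sizes, for illustration only.
for d in (0.5, 0.8):
    print(f"Effect size {d}: about {n_per_group(d)} subjects per group")
```

Each comparison of interest needs its own calculation like this one, so adding groups or test points multiplies the planning work as well as the sample.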

Regardless of the comparisons you choose, but especially if you choose not to explicitly compare test points or groups, be sure to report fully and accurately on your results! DINS errors are not only committed by authors - readers can make this mistake themselves when they look at the data (as perhaps you did in the activity!). Adding straightforward statements that acknowledge what was and was not compared can support readers in accurately understanding your work.

Takeaways:

Repeated measurements can be used as a form of negative control, allowing subjects to be compared against themselves before and/or after an intervention.
Outcomes characterized by a pattern, such as those with an expected recovery back to baseline following the intervention, have a lower risk of confusion between outcomes caused by an intervention and those caused by time or by influential variables that coincide with its passage.
Repeated measures resemble “no intervention” controls and share the same risk of confusing components of intervention with a hypothesized specific cause. Targeted negative controls can only be accomplished with a separate control group.
Designing studies to combine both repeated measures and negative control groups can allow for multiple points of valuable comparison, so long as the appropriate statistical analysis is selected for each and every claim.

Reflection:

Are there ways that repeated measurements could address limitations in work you have done previously or read about?
Have you conducted or learned about an experiment that exclusively used repeated measurements without a negative control group? What considerations have you applied in interpreting these data?
Have you encountered a DINS error in the wild? What could you do in reading future research to check whether the authors of a paper may have accidentally committed this error?