Lesson 5:
Positive controls
Summary
This lesson introduces positive controls. Positive controls are groups and validation procedures used to verify that an experiment actually worked. Positive controls can check against false negatives, can validate measures as they are taken, and can provide vital context for interpreting results.
Goal
Develop an understanding of the role of positive controls in experimental design, in order to plan when and how to implement them in an experiment.
5.1 What more comparisons could we possibly need?
In the past two lessons we have learned about using negative controls to help isolate the specific intervention we are looking for, test for potential alternative explanations, and match as many concerning variables as possible between groups or timepoints.
Negative controls are most useful in eliminating alternative explanations for an outcome that we believe to be a consequence of our intervention - a “positive” result. They can also help us recognize interference from variables that obscure a true cause by preventing or hiding an outcome, but only if their effects are explicitly tested or are manifest, e.g. when control values are suspiciously variable or differ among negative controls.
In other words, negative controls are most useful for avoiding false positive results, though they can help to diagnose false negatives that result from certain kinds of influential variables. Furthermore, negative controls define a baseline, an essential feature for interpreting an effect’s size.
In this lesson, we focus on a type of control that targets a particular class of concerns: methodological failures during an experiment. These failures are more typically associated with false negative results - failing to detect a true relationship. And they aren’t typically caught with negative controls, because the outcome of a method failure tends to resemble the expected outcome of a negative control.
Positive controls are a mirror image of negative controls in certain ways. In other ways they are their own unique design feature.
5.2 When “no” means “inconclusive”
Let’s return to our Rareberries example. Imagine we performed our experiment as planned and found no effect. The mice fed Rareberries performed no better than the mice fed blueberries or our “no intervention” control. We might wish our results were different, but we are fully prepared to accept what the data tell us. There are always more options, but based on everything we’ve learned so far, this was a well-designed experiment with proper negative controls.
Pause. How do we know our negative result is valid? What mechanisms do we have in place to know if our tests are functioning properly?
This study measures coordination development by evaluating subject time on a rotarod. Being confident in the results of this study requires being confident that each mouse had an equal opportunity to stay on the rotarod, that all times were calculated identically, and that there were no factors in place creating an artificial ceiling for how long a mouse could stay on the rotarod. If we cannot say confidently that our means of evaluating subject time on a rotarod is capable of returning a positive result - a significantly longer time to fall - we have an inconclusive result, not a definitive null result.
This means our experiment was missing a positive control! We have no way to know if it was actually able to detect superior performance during our experiment.
But it hasn’t been long, so maybe it’s not too late to catch a problem. How would we go about doing this?
5.3 Positive control groups show what was possible
For the sake of our example, let’s introduce a positive control group. The research team finds a mouse strain that is known to perform exceptionally well on coordination tests. When the team tests this strain of supremely coordinated mice, they also show no difference from the negative control and the Rareberry-fed mice.
Consider the role this positive control group could play in each of these four scenarios:
- Experimental failure. If the fit mice do not outperform our negative controls, we know that something is wrong with the experiment and it is not able to accurately measure results.
- Rareberries are ineffective. If the fit mice outperform negative controls and the mice with Rareberry-supplemented diets do not, we can more definitively state that Rareberries do not improve coordination in juvenile mice.
- Rareberries are okay. If the fit mice outperform the Rareberry-fed mice, but the Rareberry-fed mice do outperform the negative controls, Rareberries have a significant but not necessarily exciting effect on juvenile mouse coordination.
- Rareberries are incredible. If the Rareberry diet mice significantly outperform the fit mice, and the fit mice outperform the negative controls, we may have a miracle supplement on our hands.
This is how positive controls work. Testing with a strain of dexterous mice known to perform well on tests of coordination could allow the team to immediately identify that something made it impossible for the system to detect higher levels of performance on the rotarod, and to investigate the cause of this problem. Perhaps a technician whistles when bored by long mouse tests, spooking the Rareberry-fed mice off of the rotarod prematurely. Perhaps the rotarod has a lubricant leak, or produces a strange vibration after a certain period of sustained operation that dissuades mice from continuing the test.
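The four scenarios above can be condensed into a simple decision rule. Below is a minimal sketch in Python; the group means, the `sig_gt` threshold helper, and the `interpret` function are hypothetical stand-ins for illustration (a real analysis would use proper statistical tests, not a fixed margin):

```python
# Toy stand-in for "group a significantly outperforms group b".
# A real study would use a statistical test, not a fixed margin.
def sig_gt(a, b, margin=5.0):
    return a - b > margin

def interpret(neg_ctrl, rareberry, fit_mice):
    """Map mean time-to-fall (seconds) for each group to one of the
    four scenarios described above."""
    if not sig_gt(fit_mice, neg_ctrl):
        # Positive control failed: the method could not detect improvement.
        return "experimental failure"
    if not sig_gt(rareberry, neg_ctrl):
        return "Rareberries are ineffective"
    if sig_gt(rareberry, fit_mice):
        return "Rareberries are incredible"
    return "Rareberries are okay"

print(interpret(neg_ctrl=60, rareberry=62, fit_mice=63))   # experimental failure
print(interpret(neg_ctrl=60, rareberry=61, fit_mice=90))   # Rareberries are ineffective
print(interpret(neg_ctrl=60, rareberry=75, fit_mice=90))   # Rareberries are okay
print(interpret(neg_ctrl=60, rareberry=120, fit_mice=90))  # Rareberries are incredible
```

The key point the sketch makes: every interpretation of the Rareberry group is gated behind the positive control check, because without it a null result is inconclusive rather than informative.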
Not every experiment has an obvious choice for a test group that can demonstrate what a positive result or maximum effect could look like. But, with a little creativity, it is often possible to conceive of a control group that demonstrates that our experiment was capable of finding an effect. This kind of creativity can be one of the most fun and gratifying parts of science!
5.4 Validation & positive control
The fit mice in our example served as an additional group that tested a key component of our experimental method because they are known to perform well. They were a positive control group.
Using a positive control group is one form of validation, a broader category of processes intended to verify function and accuracy for any part of an experimental procedure. Like other kinds of validation, positive control groups allow us to demonstrate that a component of an experiment works as expected. However, it is a very specific form.
Some validations can be performed separately from an experiment. Validating components of an experiment can demonstrate that the effect we are looking for can be caused or measured - generally - and that our equipment is capable of measuring it accurately - at the time of validation. They can help us be confident that everything is working as expected before we begin. That’s extremely useful! However, we often need additional confidence that everything is working as expected during our experiment.
A positive control is a kind of validation that is specifically associated with an experiment. It describes any test we perform that verifies that our method worked as intended during our study.
5.5 Control groups, control tests
A positive control group like the fit mice serves to validate a measure. But this group does not offer validation for all steps in the experiment. Failures in Rareberry delivery might leave behind obvious evidence (uneaten berries), but if steps are not taken to verify that the mice are eating their Rareberries, it might not be possible to spot that a crucial step in the procedure is failing. Additionally, positive control groups are typically very specific. Fit mice might not be very useful if we were assessing a suite of outcomes, with other expected effects on problem-solving or sensitivity to smell.
A positive control group can be useful to demonstrate what a positive outcome can look like. But there are many ways that individual methods can be verified during an experiment, and any of these tests can play a role as one of many “positive controls.”
Consider a self-administered test for an illness or condition. Most at-home tests undergo some kind of visual change once used, even if the result is negative. The C line on a COVID-19 test contains antibodies that bind to components of the test solution, independent of the presence of antigens associated with COVID-19. Most commercial assays build in a positive control like this that automatically verifies the functionality of the assay each time it is performed.
Positive controls may indicate:
- Maximum binding or staining.
- Accurate measurement of a known standard.
- The ability to detect “presence.”
So, positive controls can be thought of as a multi-pronged strategy for validation during your experiment. By combining creative control groups with smaller technical validations, we assemble a solid defense against hidden experimental failures.
5.6 Context and reality checks for positive results
Positive controls are essential to eliminating the possibility that a negative result might be explained by a methodological failure. However, like negative controls, they also offer the benefit of contextualizing our effect size when we find one. Above, we described this as scenario 3, “Rareberries are okay,” and scenario 4, “Rareberries are incredible.”
Now that the team has addressed any faults that produced a false negative result, they re-run their tests and find that mice raised on Rareberries do, in fact, perform better than the mice raised on a standard diet or with a blueberry-supplemented diet.
What kind of context can the fit mice provide?
- If Rareberry mice perform better than negative controls but not as well as fit mice, we might be reasonably impressed, but not necessarily calling the newspapers.
- If the intervention-fed mice perform better than our positive controls? Now that’s interesting.
In the case of a positive result, a group that defines what a “maximum” or “positive” result looks like offers a valuable point of comparison for interpreting effect size. Here, it’s conceivable that we could create mice that are even more coordinated than our positive controls. In other cases, however, if a positive control reflects a reliable maximum, exceeding it could be an implausible result - itself a sign of potential error in a method. This is another important role that positive controls can play: letting us know when our values or effect sizes are reasonable and when they strain credulity.
Activity
Try it out! Suppose that you are interested in studying the effect of swim training on Amyotrophic Lateral Sclerosis (ALS) in mice, using three measures of disease severity:
- Grip strength: How long mice are able to hold on to a moving handle.
- Mitochondrial efficiency: How efficiently cells are able to turn fuel into usable energy.
- Oxidative stress: A biochemical imbalance that risks muscular “wear-and-tear”.
Post-activity questions:
- What positive controls did you add to the study? Did you see controls that you thought were creative or interesting?
- Did you find you tended to focus more on adding positive control groups, or on adding steps to validate specific components of the experiment? Why do you think that is?
- How would you prioritize the positive controls you or others considered for this experiment? Are some more important or practical than others?
Takeaways:
- An experiment with proper negative controls can still return an invalid result if it does not also contain steps to validate its negative results.
- Positive controls allow researchers to identify critical failures in experimental methods. If a control group with a reliable maximal outcome fails to show expected results, this is a sign that there is a problem with the experimental method.
- Positive controls are one type of validation. Positive controls validate an experiment as it is happening, ensuring that data collected is not limited by hidden errors.
- Positive controls contextualize results, allowing researchers to see when a significant result is impressive - or unrealistic - relative to known maximum values, and giving a scale for the kinds of results that bear serious consideration.
Reflection:
- When you see a negative result from an experiment, how do you react? If you question the validity of that negative result, what information informs that questioning?
- What positive controls have you used in your own research, or seen used in other studies? Are there experiments you have conducted or seen without a positive control that you could try adding one to?
- Try to think of one possible positive control that you haven’t yet tried or seen.
- How do you think positive controls should be prioritized? Are they more important for some kinds of experiments than others? Why or why not?
Lesson 6:
Controlling for bias with placebos
Summary
This lesson explains how researcher and subject biases can undermine the rigor of a study. Expectations from researchers or research participants can influence outcomes, detracting from the goals of obtaining a rigorous measure of how changes in an independent variable produce changes in a dependent variable.
Goal
- Recognize that the biases of researchers and subjects can diminish the effectiveness of well-designed experimental controls.
- Evaluate study designs and identify the need to use masking and/or placebo controls.
6.1 Does brain training actually work?
The commercial brain training industry boomed between 2005 and 2015. Products such as the Nintendo DS video game Brain Age: Train Your Brain in Minutes a Day! and the online program Lumosity achieved tens of millions in sales and investor funding, respectively. This industry was founded on a simple premise: fifteen minutes a day of games can produce large improvements in cognitive skill. But is this claim supported by rigorous scientific evidence? (Spoiler alert: it is not. In 2016, Lumos Labs agreed to a $50 million settlement with the FTC over false advertising for Lumosity.)
Let's imagine what a study that investigates this claim might look like:
- independent variable: brain training
- dependent variable: improvement in cognitive ability (change in some standard measure of cognitive skill)
Then, for a simple negative control design, our two groups of subjects might be:
- intervention group: participants who receive brain training
- control group: participants who receive no brain training
Pause. Are there any obvious gaps in this simplified experimental setup?
6.2 Expectations are another type of concern
Let's start by reviewing our four categories of variables of concern:
Subject-specific variables
We could expect that some subjects are more or less sensitive to brain training, but because our pilot study will recruit from a convenient undergraduate population, subjects will be closely matched on attributes such as age, educational attainment, etc. We are also using a repeated measures design, so variability between subjects will be accounted for.
Environment-specific variables
There could be large variability in the home environment of subjects, but it does not seem likely that there will be a systematic difference between the intervention and control group.
Measurement-specific variables
There could be large variability in how subjects perform on the cognitive skills measure(s), but it does not seem likely that there will be a systematic difference between the intervention and control group.
Intervention-specific variables
In addition to the active intervention component of brain training, there are some other differences between the intervention and control groups that could be a factor. These include: mental and/or physical stimulation (a necessary component of brain training), a subject's awareness of whether they are receiving brain training or not, and a subject's expectations of how brain training will affect their cognitive skills.
In particular, it is well-known that performance in a variety of settings can be boosted by confidence or other motivational factors (for example, whether a cognitive test is labeled as a set of puzzles or a measure of IQ [Brown & Day 2006]). Thus, we might expect that the expectation of improvement due to brain training could influence performance on cognitive tests.
6.3 Measuring the strength of expectation
Foroughi et al. (2016) designed a study to measure the strength of this expectation effect. Whereas past studies compared a brain training group to a control group that did not receive brain training, Foroughi's team gave both groups of subjects in the study the same brain training intervention, choosing to manipulate the expectation of the groups. One group was recruited using a flyer that advertised "Brain Training & Cognitive Enhancement," while the other flyer read "Email Today & Participate in a Study".
adapted from Figure 1 of Foroughi et al. 2016
Foroughi et al. sought to demonstrate that studies comparing an intervention group to a "no intervention" negative control group will have differences in both the active intervention component (brain training) and the expectations of subjects. They did so by devising an expectation control group, which would receive the same intervention but NOT the expectation. The outcomes from this group can then be compared to those of the intervention group (which receives both the brain training and the expectations), in order to measure the magnitude of the effect that expectations around brain training have on changes in cognitive performance.
This is a creative example of a targeted negative control. The table below, modeled off of the negative control planner used in lesson 3, demonstrates how this control differs from a placebo control and a naive (“no-intervention”) negative control.
| | intervention group | placebo control group | naive negative control (no intervention) | expectation control group (intervention w/o expectation) |
|---|---|---|---|---|
| active intervention component | YES | NO | NO | YES |
| subject expectations of intervention | YES | YES | NO | NO |
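The contrasts this design supports can be written out as simple differences of group means. Here is a hypothetical sketch; the variable names and numbers are made up for illustration, not taken from Foroughi et al.:

```python
# Hypothetical group means: improvement on a cognitive test, in points.
group_means = {
    "intervention": 8.0,          # brain training + expectation
    "expectation_control": 5.0,   # brain training, no expectation
    "naive_control": 1.0,         # no training, no expectation
}

# Expectation effect: same training, different expectations.
expectation_effect = group_means["intervention"] - group_means["expectation_control"]

# Active-component effect: training without expectation vs. no intervention.
training_effect = group_means["expectation_control"] - group_means["naive_control"]

print(f"expectation effect: {expectation_effect:+.1f}")  # +3.0
print(f"training effect:    {training_effect:+.1f}")     # +4.0
```

A study with only the intervention and naive control groups would see a single difference of +7.0 and could not tell how much of it was expectation; the expectation control group is what makes the decomposition possible.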
6.4 The Placebo Effect, and other Biases
More broadly, the phenomenon that Foroughi et al. sought to characterize is known as the placebo effect, which occurs when a subject's expectations affect outcomes. Although commonly thought of as occurring in human subjects, there is evidence that the placebo effect also occurs in non-human animals that can "learn" to expect that an intervention will have some therapeutic effect (e.g. Muñana et al. [2010] found that some dogs show a substantial decrease in epileptic seizures after receiving a placebo).
However, the placebo effect is not the only kind of bias to watch out for!
Researchers may experience a similar effect, called expectation bias, where their expectations can result in differences in performance between an intervention group and a control group. In a brain training study, this could occur if a researcher is aware of whether a subject is in the intervention or control group and inadvertently influences those subjects in ways that change their test performance.
Researcher expectations can also impact data collection through a phenomenon called observer bias. When data collection involves human observation or annotation, these expectations can creep in to influence judgments and actions.
These biases can influence a variety of experiments and phenomena, including:
- drug trials
- intervention and training of animals
- surgical outcomes
- expectations of negative side effects overpowering the normal effect of an intervention ("nocebo")
6.5 Averting bias using proper masking and placebos
What can be done about these challenges to rigor and internal validity?
One of the main tools is masking, which is “the process by which information that has the potential to influence study results is withheld from one or more parties involved in a research study” (Monaghan et al. 2021). Masking can involve a variety of methodological steps, such as withholding from both subjects and researchers the knowledge of whether a subject is in the intervention or control group. For more on proper masking, see our unit on Confirmation Bias.
A key tool to support masking is the use of a carefully designed placebo. A successful placebo ensures that subjects cannot tell if they have been assigned to an intervention or control group. These key criteria are important to ensure that a placebo is effective:
- The placebo is identical in every way to the intervention but lacks the causal agent of interest (e.g. it includes the delivery of the drug but not the active ingredient that is hypothesized to have an effect).
- The placebo is delivered using standardized procedures, so that there is no perceivable difference between the administration of the intervention and the administration of the placebo.
- The placebo produces similar side effects to the intervention, so that the side effects (or lack thereof) do not clue subjects in on whether they are receiving the intervention or the placebo.
- Some studies use a "sham", which is a negative control that mimics an intervention. Shams can be quite extensive, up to and including surgeries where a deactivated device is implanted in subjects, performed by surgeons unaware of whether the device is a sham or not.
Masking also extends beyond study participants or research subjects. If a researcher is aware of which group a sample belongs to, they may still suffer from expectation or observer bias. Where possible, taking steps to separate knowledge of subject allocation from important tasks like monitoring subjects or analyzing data can help avert these biases. Above all else, good masking procedures are important to ensure that neither subjects nor researchers can distinguish between a placebo and the intervention.
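As a rough illustration of separating allocation knowledge from the study team, here is a minimal Python sketch of masked assignment. The `masked_allocation` function and its coding scheme are hypothetical, not taken from any real trial software:

```python
# Hypothetical sketch of masked allocation: subjects are randomized to arms,
# but researchers and subjects see only opaque codes. The code->arm key is
# held by an unblinded third party and consulted only at analysis time.
import random

def masked_allocation(subject_ids, seed=42):
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Balanced assignment: alternate the two arms, then shuffle the order.
    arms = (["intervention", "placebo"] * len(subject_ids))[:len(subject_ids)]
    rng.shuffle(arms)
    key = {}     # opaque code -> arm; held separately from the study team
    labels = {}  # subject id -> opaque code; all the study team sees
    for i, (sid, arm) in enumerate(zip(subject_ids, arms)):
        code = f"S{i:03d}"
        key[code] = arm
        labels[sid] = code
    return labels, key

labels, key = masked_allocation(["mouse-01", "mouse-02", "mouse-03", "mouse-04"])
# `labels` carries no group information; unblinding happens via `key` later.
```

Because the code-to-arm key never reaches the people running tests or annotating data, neither expectation bias nor observer bias can act on group membership they do not know.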
Activity: Identify problems and solutions related to biases.
It can be tricky to know that we need to be concerned about a placebo or masking in our study. And once we know that there is an issue, it is not always obvious what the best solution is.
In our next activity, you will be given a description of an experiment with its methods broken out into distinct steps. Evaluate whether each description has a methodological issue, identify which portion of the methods best isolates the issue, and choose an appropriate fix.
Post-activity questions:
- Which methodological fixes would be challenging to implement?
- How would you plan to implement these methodological fixes?
Takeaways:
- Relevant controls can be unreliable if we don’t account for bias (in the study subjects or researchers).
- Masking, placebos, and randomization are all key rigor tools to address bias.
- Well-designed placebos (including shams) must be indistinguishable from interventions.
- Be sure to document all procedures to report accurately later!
Reflection:
- Think back to a recent study you've read or conducted—are there any ways in which researcher or subject expectations could have influenced the results?
- Foroughi et al. used a recruitment mechanism and masking to test the strength of the placebo effect for brain training interventions; what are other ways to isolate the influence of brain training from expectations?
- Designing a good placebo depends on isolating the hypothesized causal agent of the intervention—what are some options if you are unsure which component of the intervention IS the relevant causal agent?