Controls
Clarity, not coincidence!
Lesson 1:
The dangers of incomplete experimental design
Summary
This lesson introduces experimental controls as a vital check on compelling scientific breakthroughs that seem too good to be true. A brief review of lobotomy as a treatment for mental illness illustrates some of the problems caused by conducting research without proper controls. An interactive review of a retracted paper testing the effect of the serine protease inhibitor neuroserpin on hippocampal neuron length introduces some ways to carefully assess published work for issues with experimental controls.
Goal
Understand the importance of experimental controls and begin building an intuition for how to spot problems with controls in published work.
1.1 The hot new treatment
Imagine you are a clinician in 1935, working with patients who suffer from a terrible, incurable illness. A new treatment has been developed! Published studies emerge with promising results. They report a majority of patients as cured or improved. And, patients who did not experience positive outcomes are reported as unchanged in condition.
This seems extremely positive. Let’s hear from one research team directly:
“...I do not wish to make any comment since the facts speak for themselves. These were hospital patients who were well studied and well followed. The recoveries have been maintained. I cannot believe that the recoveries can be explained by simple coincidence.”
- The second of two small studies into the treatment, n=38 total.
One pilot study for the procedure reports that the treatment alleviates patients’ “insomnia, nervous tension, apprehension and anxiety.” Patients are “more comfortable” after the intervention. The authors acknowledge the limitations of this approach. They note that “every patient loses something by this operation . . . Some spontaneity, some sparkle.” But, the results are incontrovertible. The symptoms identified as the most severe hallmarks of illness appear, in a great many cases, to be cured.
A small but mounting body of evidence about the treatment appears to confirm these initial findings. Each study carefully profiles treated individuals in detail. This treatment seems to produce incredible results when comparing patients’ condition pre- and post-treatment.
Pause:
- Would you use this treatment?
- Is it ethical to delay treatment for a terrible disease?
- What information do you want to know from the research teams reporting their success with this treatment?
50,000 lobotomies were performed in the United States between 1949 and 1952.
In 1949, the Nobel Prize was awarded to António Egas Moniz for pioneering the technique. Champions of lobotomy were respected, scientifically trained, and sincere in their efforts to improve the lives of their patients.
There is more to the story, however. A comprehensive survey of US psychiatric facilities from 1949 to 1951 found that most lobotomy patients in the United States were women. A more recent methodological review suggests that markers of successful lobotomy outcomes were influenced by the presumptions of practitioners, and included unvalidated measures such as praise from patients’ husbands (Tone and Koziol, 2018).
Lobotomy is no longer believed to be an effective treatment for mental illness. But, what are we to make of all those patients that appeared to be cured in early studies? How did things get so askew as to convince reputable, well-trained scientists that they had discovered a miracle cure?
Part of this problem has to do with intake practices for lobotomy patients. Patients selected for treatment tended to be newer patients, often undergoing the procedure during or immediately following their first hospitalization. Mental illnesses are often imprecisely defined and can appear to resolve spontaneously, so for someone recently hospitalized with a chronic and severe mental illness, the experience of hospitalization alone could coincide with a significant change in the severity of their illness. In contrast, patients with a long history of hospitalization are precisely those who have not recovered on their own. Younger patients were also disproportionately selected for lobotomies under the belief that early intervention was important for success, and younger patients hospitalized for the first time may be the patients most likely to recover on their own. In this context, there is no way to know whether the observed benefits were entirely attributable to spontaneous recovery without comparison to a similar group of patients who did NOT receive the intervention.
1.2 A more accurate measure
Hindsight makes it easy to identify problems in research. Unfortunately, research success requires that we design for problems in advance! Let’s explore study designs that more accurately estimated the effects of lobotomy on severe mental illness.
McKenzie and Kaczanowski (1964) conducted a five-year study on 366 patients with severe mental illness. Half of the study participants received a frontal lobotomy (the treatment group). The other half of patients received standard psychiatric care and did not receive a frontal lobotomy (the control group). Patients between groups were intentionally matched on age, sex, type of mental illness, and severity of mental illness (such as chronic illness vs. recent internment). Over a five-year span, 43.17% (79/183) of patients in the treatment group were discharged from treatment facilities, while 37.16% (68/183) of patients in the control group were discharged.
True, a difference of 6 percentage points could be a meaningful effect size, but let’s see how the team reported these findings:
“One hundred and eighty-three patients who had received prefrontal leukotomy over a period of several years were compared with a closely similar group who had not undergone the operation. No significant differences in rate of hospital discharge after a five-year follow-up period were found between the two groups. The groups showed no consistent difference in outcome in relation to diagnosis, chronicity vs. intermittency of illness, heredity, education, family attitude, premorbid adjustment, or degree of insight on the part of the patients. It is concluded that prefrontal leukotomy does not produce any rate of remission significantly beyond that to be expected without the operation.”
When they tested the difference between the treatment and control groups, the research team found that it was not distinguishable from random chance. By conducting a rigorous analysis of their findings, and by using a control group to establish a baseline for no effect, the research team was able to demonstrate that lobotomy was not an effective treatment.
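To see how such a comparison plays out numerically, here is a minimal sketch of a chi-square test of independence on the discharge counts reported above. This is an illustration, not a reconstruction of the original team’s analysis: it uses only the Python standard library, applies no continuity correction, and compares the statistic against 3.841, the standard critical value for one degree of freedom at a 0.05 significance level.

```python
# A sketch of a chi-square test of independence on the reported 2x2
# counts (discharged vs. not discharged, treatment vs. control).
# Standard library only; no Yates continuity correction is applied.

def chi_square_2x2(a, b, c, d):
    """a, b = group 1 outcome yes/no; c, d = group 2 outcome yes/no."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,  # group 1, outcome yes
        (a + b) * (b + d) / n,  # group 1, outcome no
        (c + d) * (a + c) / n,  # group 2, outcome yes
        (c + d) * (b + d) / n,  # group 2, outcome no
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Treatment group: 79 of 183 discharged; control group: 68 of 183.
stat = chi_square_2x2(79, 183 - 79, 68, 183 - 68)
print(f"chi-square statistic: {stat:.2f}")  # ~1.38
# Critical value for df = 1 at alpha = 0.05 is 3.841.
print("significant at alpha = 0.05:", stat > 3.841)  # False
```

A statistic well below the critical value means the observed 6-point gap is comfortably within the range expected from chance allocation alone, which matches the authors’ stated conclusion.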
Unfortunately, lobotomy was not promptly abandoned in response to compelling findings like those of McKenzie and Kaczanowski. Alternative treatments were limited, and common practices can be slow to change. However, the popularity of the procedure gradually declined, and with the advent of new medications, it finally became obsolete.
Most researchers are not in a position to conduct research studies on irreversible procedures at such a large scale. But, non-significant findings happen all the time. Thinking about how things went so awry with lobotomy can help us to consider how we consume information about other flash-in-the-pan research findings.
Many studies preceding McKenzie and Kaczanowski did not include a dedicated control group. This includes the paper by Freeman and Watts (1937) quoted without attribution above, which found patients to be more comfortable after receiving a lobotomy even if they lost a certain “spark.”
Absent proper experimental controls, it is very difficult to evaluate whether an intervention has done anything at all. A well-designed control group makes it possible to: 1) compare outcomes between the control group and the treatment group, and 2) attribute those differences to the effects of the intervention received by the treatment group.
Later studies did include control groups. However, improperly designed controls may not help to isolate the effects of the intervention. For example, if a researcher has just spent an extraordinary amount of time and effort learning how to perform a lobotomy and then performed several lobotomies, they might be especially attentive to patients recovering from the procedure. Worse, as we cover in lesson 6 of this unit, researchers’ biases may accidentally influence their assessment of outcomes. And, without identifying and attempting to match the conditions experienced by both groups, there may be additional experimental factors or other systematic differences that could offer alternative explanations for any observed effects.
Research cannot distinguish false findings from true findings without methods to isolate the actual effects of the interventions under consideration and demonstrate that any results - positive or negative, large or small - cannot be attributed to any reasonable alternative. This is where controls come in.
1.3 Finding concerns in a study design
Consider the following experiment. A research team is attempting to identify causes of neuronal growth and connectivity in the developing and adult nervous system. They hypothesize that a brain-specific serine protease inhibitor called neuroserpin plays a key role in regulating neuronal development. We'll look at a simplified description of the study design and look for areas of potential concern.
The research team obtains a uniform group of 18-day-old rat embryos. They dissociate hippocampal cultures from the embryos by treatment with papain and trituration with a fire-polished Pasteur pipette. Neuron cultures in the control group are placed into a sterile neurobasal medium that is replaced every 7 days. Cultures in the intervention group are treated with filter-sterilized recombinant neuroserpin. Immunohistochemistry is used to stain axons, and algorithmic image analysis is used on randomly selected grids to measure axon length as a metric of neuronal growth.
Activity: Identify study concerns
In this activity, you will consider several elements of this simplified study design, which is based on a real experiment, and look for areas of potential concern. Then, you’ll review your concerns and those of others in a process diagram to visualize how they manifest throughout the research workflow.
Can you guess the concern that caused the paper upon which this example is based to be retracted? (Don’t read ahead yet if you don’t want spoilers!) You may not be familiar with this experiment, or even this kind of science, but you can do it! Identify variables in the study that you think could impact the authors’ ability to confidently attribute outcomes to their intervention, then share your thoughts on their potential implications.
Post-activity questions:
- What variables did you find concerning?
- Did you notice that filter sterilization was a component of treatment that was not shared with the control group? Why do you think this was done?
- What could the research team have done differently?
1.4 Science is hard!
The study that served as the basis of this activity was later retracted because the syringe filter the research team used in the intervention group had contaminants that were responsible for the effect. It’s very easy to say that errors occur because people are foolish, or that other researchers are careless. But, most people don’t want to make errors, and it’s hard to know that a syringe filter is responsible for your findings until someone points that out to you. Most problems are unforeseen.
Interventions are often complex, with many elements that may not be of interest to the scientific research question, but which could be responsible for certain outcomes. If one of these factors does influence the outcome measure (e.g. neuron growth), that systematic difference between groups will confuse the interpretation of results. Good experimental control is fundamentally about isolating variables so that we can confidently attribute outcomes - effect or no-effect - with accuracy.
We want to learn how to anticipate and avoid these unforeseen problems, too.
Thinking about controls can help us to become better at predicting concerns in our own work. But, attending to concerning components of a research study extends well beyond the conduct of the study itself. Transparent reporting of methods is the only way for others to review and identify issues with an experiment after the fact. Careful study design with well-controlled concerns, combined with clear and complete reporting, is the recipe for rigorous science.
Controls work to address concerns, isolating the relationships we are truly interested in understanding. There are many concerns that could impact the validity of a scientific claim. For many (different) concerns, many (different) controls may be needed. Across the remaining lessons in this unit we will explore how to properly establish experimental controls, avoid common issues with controls, and practice doing our best science in an imperfect world.
Takeaways:
- Research lacking proper controls might seem obviously flawed in hindsight, but in reality many concerns are frighteningly easy to miss. Learning to methodically anticipate potential concerns and implement careful controls is the best defense against a dangerous false result.
- Opportunities for control can be identified throughout the research workflow. A process diagram can help visualize that workflow and think through concerns at each step.
- Controls work to address concerns. Concerns are any factor that can impact the strength of a scientific claim.
Reflection:
- Can you recall a time you read research findings that were later walked back or retracted?
- When was the last time you performed an experiment and discovered an unforeseen variable impacted your results?
- What steps do you take to map out your research? Where do you incorporate and plan controls in that work?
Lesson 2:
Control, Controls, Controlled
Summary
This lesson disambiguates the word “control” in science. Is a “control group” the same as “controlling for” a variable? Not always. In this lesson, we’ll unpack these meanings and introduce a simple framework to help identify potential concerns and explore opportunities to manage variables that could mislead.
Goal
- Identify potential sources of concern and differentiate among strategies available to apply experimental control — including but not limited to control groups.
- Explore prior knowledge and consider the reasoning YOU use when selecting an approach to “control.”
2.1 The many faces of “control”
In everyday language, “control” means to take charge or hold something steady. In scientific research, though, the word wears many hats. Sometimes it refers to a comparison group without an intervention. Sometimes it means managing variability. Sometimes it’s a strategy for protecting results from bias.
We think we know what control means, but it can be surprisingly hard to nail down in an experimental setting.
How many of these uses of the word “control” look familiar to you?
Example 1. “We used untreated fruit flies, fed the same food and raised in the same environment, as a control group.” This is an example of control meaning a comparison group. In this example, control refers to another study population subjected to the same conditions as the intervention group for the duration of the experiment, but not given the intervention.
Example 2. “We controlled for the influence of circadian rhythms by testing all participants 2 hours after waking.” This is an example of control meaning removing the influence of a variable. In this example, controlling for a variable refers to keeping that variable constant within the study in order to insulate an experiment from the effects of that variable.
Example 3. “We used stratified randomization to distribute mice across treatment arms, controlling for sex and age.” This is an example of control meaning distribution to create comparable groups. This example describes a treatment allocation practice to achieve the same insulating effect while still allowing the full range of a variable to be included in the study. It does so by taking steps to avoid the potential for uneven distribution of variability across the study groups.
Example 4. “We included a cell lysate known to express Q in the western blot as a positive control.” This is an example of control meaning a test of a known sample for technical validation. Control in this example refers to a sample with a known, expected outcome that is run alongside the experimental samples. If the known sample fails to produce its expected result, it signals a problem with the technique rather than with the hypothesis under test.
Example 5. “Controlling for levels of daily exercise, drug ABC reduced the severity of neuropathic pain by an average of 62%.” This is an example of control meaning a statistical correction for the effect of a variable. Control in this example refers to the calculations performed to identify the impact of a variable on the outcome of an experiment in quantitative terms, then adjust all results in a way that removes that impact prior to hypothesis testing.
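To make Example 3 concrete, here is a minimal Python sketch of stratified randomization. The mouse roster, the strata, and the random seed are all invented for illustration: mice are grouped by sex and age, shuffled within each stratum, and then dealt alternately into the two arms so that each arm receives a balanced mix.

```python
# Sketch of stratified randomization, as in Example 3. The roster,
# strata, and seed are invented for illustration. Mice are grouped by
# sex and age, shuffled within each stratum, then dealt alternately
# into the two arms so each arm receives a balanced mix.
import random
from collections import defaultdict

mice = [
    {"id": i, "sex": sex, "age_weeks": age}
    for i, (sex, age) in enumerate(
        [("F", 6), ("F", 6), ("M", 6), ("M", 6),
         ("F", 10), ("F", 10), ("M", 10), ("M", 10)]
    )
]

# Group mice into strata keyed by (sex, age).
strata = defaultdict(list)
for mouse in mice:
    strata[(mouse["sex"], mouse["age_weeks"])].append(mouse)

# Shuffle within each stratum, then alternate assignments between arms.
arms = {"treatment": [], "control": []}
rng = random.Random(42)  # fixed seed so the allocation is reproducible
for stratum in strata.values():
    rng.shuffle(stratum)
    for i, mouse in enumerate(stratum):
        arm = "treatment" if i % 2 == 0 else "control"
        arms[arm].append(mouse["id"])

# Each arm ends up with one mouse from every sex-by-age stratum.
print(arms)
```

Contrast this with simple randomization of all eight mice at once, which could, by chance, place every young female in one arm; stratifying first makes that impossible while preserving randomness within each stratum.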
These examples are all very different uses of the word “control”! As it turns out, this is one of those words everyone thinks they understand, but may struggle to define in a way that applies to all uses.
Clearly, there is more to “control” than just control groups.
Before we set about breaking these different uses down, let’s take a moment to introduce a related term that has a somewhat more straightforward definition. Internal validity describes the extent to which a scientific claim is supported by data collected using the specific methods and subjects of a study. This pairs with external validity - the extent to which a claim can be generalized to contexts beyond the specific methods or subjects used - as a means to evaluate the usefulness of a project overall.
Well-crafted controls, in all their many forms, combine with other elements of rigorous study design to enhance internal validity and enable more confidence in scientific claims. They narrow the pool of potential alternative explanations, isolating variables so that effects, relationships, observations, or lack thereof can be attributed with accuracy. As we will see later in the unit, effects of controls on external validity can sometimes be more complex, such that studies may intentionally (and carefully) relax certain kinds of control in order to preserve external validity.
But how can we better understand what exactly we mean when we use this word “control,” and how each application affects internal and external validity? Let’s explore this further with some help from an example.
2.2 A new superfood?
Let’s imagine an example experiment. A research team is exploring the effects of a new superfood called the Rareberry. The Rareberry is reported to cause exceptional motor skill development in juvenile mammals. Does it? Let’s design an experiment to find out.
Our question: Do Rareberries affect coordination in mice? We hypothesize that some unknown phytonutrient they contain will improve development of motor coordination. The intervention, or independent variable, is the consumption of Rareberries. There are a lot of ways to demonstrate that a mouse has superior motor skills. The outcome, or dependent variable, we have selected to assess for this study is coordination, which serves as a proxy for their overall motor skills. We predict that juvenile mice fed Rareberries will develop into highly coordinated adults.
Our plan: supplement one group of lab mice with Rareberries, then test their motor coordination on a rotarod - a standard assay where mice must maintain balance on a horizontal rotating rod, just high enough above the floor that mice will try to stay on the rod rather than falling or jumping off. The rotation speed is increased steadily until the mouse falls. More coordinated mice are able to stay on the rod longer, so time to fall off is recorded as the outcome of this test.
If the Rareberry group mice perform well, will that provide convincing evidence that the berries are effective? What if they don’t perform well? Will we be prepared to walk away confident that we have busted this myth? Rigorous controls ensure that alternative explanations are limited.
2.3 Concerns about validity
In any experiment, we will have an independent variable - our intervention - and a dependent variable - our outcome. Inevitably, though, the world is chock full of other variables. Many of these are sources of concern for our experiment.
Our independent variable: Presence or absence of Rareberries in the diet
Our dependent variable: time on a rotarod
As a general rule, variables of concern will arise from one of four sources. Here are those sources, with some example variables to consider:
- Subject: age, sex, genetic strain, history
- Environment: lighting, noise, time of testing
- Intervention: vehicle solution, delivery method, handling, time
- Measurement & analysis: who does the scoring, what tools are used, how data are processed, bias in analysis
Variables of concern may vary from study to study. A study exploring effects of an intervention on participants’ smoking habits may be concerned with participants’ existing smoking habits. A study exploring the effects of an intervention on social dynamics within a group could be concerned about potential sources of environmental agitation or aggression. A study exploring the effects of an intervention on political views might be concerned about raters’ biases in favor of a specific result. And a team exploring the effects of the serine protease inhibitor neuroserpin on neuronal growth would now know to be concerned about the delivery and sterilization methods used to introduce neuroserpin into the experimental environment.
A Closer Look: The Rareberry and cage height
Consider this scenario for our Rareberry study: mice housed on higher shelves in the rack have a better view of the room. They can watch other mice playing below them. They receive more light exposure and tend to be active for longer periods of time. Over time, these environmental differences enhance their motor coordination.
If our Rareberry-fed mice happen to be more often found living on the upper shelves, how might this affect our results?
Pause to consider:
How might each of these categories (subject, environment, intervention, measurement & analysis) apply in our Rareberries experiment? Can you think of one variable that is likely to influence our outcome in each category?
Variables are concerning when we either know or suspect that they might affect our outcome. We have to be vigilant because - as we saw in the neuroserpin study - just because we don’t suspect a variable, that doesn’t mean we shouldn’t be concerned! Depending on how they act and how they are distributed, impactful variables might exaggerate, diminish, change, or obscure our outcome of interest. If this happens, it reduces the accuracy of our experiment: we will be unable to trust that any result - or lack thereof - is specifically attributable to our intervention. In other words, it harms the internal validity of our experiment.
At its heart, “control” in experimental design means managing variables that could distort our understanding of those that truly interest us. Apart from statistical errors, failed controls of one kind or another may be the single most common source of wrong answers in science.
2.4 Strategies for control
It can be helpful to organize our options for control into categories. Consider how you might apply each of these options for managing a variable:
1. Constrain
Sometimes we manage variables of concern by keeping them constant or narrowing the range of variability allowed in our study. For example, using only male mice of a specific age limits the impact that effects of sex or age could have on our outcomes.
2. Distribute
When a variable isn’t constrained in our study design, we often take steps to prevent it from affecting any one group disproportionately. Depending on the study design, randomization methods can ensure that key variables are evenly distributed. They may still add to the variability of our dataset overall, but we can at least be assured that their influence across groups should be similar.
3. Test
Sometimes, we want or need to know what kind of an impact a variable has on our outcome. Testing for specific impacts or outcomes helps us contextualize any effect of our independent variable by comparing it to minimum, maximum, or alternative outcomes to understand a difference. Sometimes this may mean creating a new “control group”; alternatively, we may record and test for the effect of a variable like sex to help us understand how and whether our findings apply to the general population.
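The three strategies above can be sketched side by side for the hypothetical Rareberry study. Everything here is invented for illustration - the roster, the ages, and the simulated rotarod times - but it shows where each strategy acts in the workflow.

```python
# Sketch of the three strategies applied to the hypothetical Rareberry
# study. The roster, ages, and simulated rotarod times are all invented
# placeholders for illustration.
import random
import statistics

pool = [{"id": i, "age_weeks": [6, 8, 8, 12][i % 4]} for i in range(40)]

# 1) Constrain: keep only 8-week-old mice, removing age as a variable.
eligible = [m for m in pool if m["age_weeks"] == 8]

# 2) Distribute: randomize the remaining mice across the two groups so
# that unmeasured differences should fall evenly on both sides.
rng = random.Random(0)  # fixed seed so the allocation is reproducible
rng.shuffle(eligible)
half = len(eligible) // 2
rareberry, control = eligible[:half], eligible[half:half * 2]

# 3) Test: compare the outcome (seconds on the rotarod) between groups;
# the times here are simulated stand-ins, not real measurements.
times = {m["id"]: rng.gauss(60, 10) for m in rareberry + control}
diff = (statistics.mean(times[m["id"]] for m in rareberry)
        - statistics.mean(times[m["id"]] for m in control))
print(f"mean difference (Rareberry - control): {diff:.1f} s")
```

Note how the strategies compose rather than compete: constraining shrinks the pool, distributing allocates what remains, and testing compares outcomes against the control group’s baseline.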
Activity: Address study concerns
When we design an experiment, we apply control strategies to create the best experiment we can. In many cases, we may not be fully aware of why we are choosing one strategy over another, or have even considered that alternatives exist.
In our next activity, you will be given a description of a familiar experiment with several “concerns.” Use your best intuition to sort these concerns into the strategy bin that you think best fits the approach you would take. While in some cases there may be common answers, there isn’t necessarily a best answer for any given variable.
After sorting your concerns and explaining a bit about how you envision a strategy being applied to it, you will see how others chose to address each concern.
Post-activity questions:
In sorting concerns into different categories, you had a chance to think about the differences and relationships among our options for experimental control. Now, consider the following questions:
- Why did you make the choices that you made?
- Were there some variables you always chose to distribute? To constrain?
- Were there other variables that didn’t make sense to handle with anything but a control group?
The term “control” can mean many different things. This is not bad. But, cultivating an awareness of the different applications of the term and understanding why each of these roles matters is key to doing more rigorous science.
Takeaways:
- Control, used colloquially, refers to things as divergent as setting up a formal control group and applying a statistical correction. Control groups, controlling for variables, controlling group allocations, and controlling variable influence on results are all different things, and it’s important to know which you mean when you use the word control.
- Controls are key determinants of the internal and external validity of an experiment. They narrow the pool of potential explanatory variables and improve the accuracy of attribution.
- Broadly speaking, strategies for controlling concerning components of an experimental design follow three patterns: constraining variables by keeping them constant or within a permitted range of low variability, distributing variables across subject allocations, and testing for the effects of variables, sometimes by creating control groups.
Reflection:
- What kinds of factors would be important to you in making decisions about how to handle concerning variables in your own research?
- Think of a choice you have had to make regarding control for your own research. How does your choice fit into our category bins? How would your research be different if you had chosen any of the other categories, as well or instead?
