Controls
Clarity, not coincidence!
Lesson 1:
The dangers of incomplete experimental design
Summary
This lesson introduces experimental controls as a vital check on compelling scientific breakthroughs that seem too good to be true. A brief review of lobotomy as a treatment for mental illness illustrates some of the problems caused by conducting research without proper controls. An interactive review of a retracted paper testing the effect of the serine protease inhibitor neuroserpin on hippocampal neuron length introduces some ways to carefully assess published work for issues with experimental controls.
Goal
Understand the importance of experimental controls and begin building an intuition for how to spot problems with controls in published work.
1.1 The hot new treatment
Imagine you are a clinician in 1935, working with patients who suffer from a terrible, incurable illness. A new treatment has been developed! Published studies emerge with promising results. They report a majority of patients as cured or improved. And, patients who did not experience positive outcomes are reported as unchanged in condition.
This seems extremely positive. Let’s hear from one research team directly:
“...I do not wish to make any comment since the facts speak for themselves. These were hospital patients who were well studied and well followed. The recoveries have been maintained. I cannot believe that the recoveries can be explained by simple coincidence.”
- The second of two small studies into the treatment, n=38 total.
One pilot study for the procedure reports that the treatment alleviates patients’ “insomnia, nervous tension, apprehension and anxiety.” Patients are “more comfortable” after the intervention. The authors acknowledge the limitations of this approach. They note that “every patient loses something by this operation . . . Some spontaneity, some sparkle.” But, the results are incontrovertible. The symptoms identified as the most severe hallmarks of illness appear, in a great many cases, to be cured.
A small but mounting body of evidence about the treatment appears to confirm these initial findings. Each study carefully profiles treated individuals in detail. This treatment seems to produce incredible results when comparing patients’ condition pre- and post-treatment.
Pause:
- Would you use this treatment?
- Is it ethical to delay treatment for a terrible disease?
- What information do you want to know from the research teams reporting their success with this treatment?
50,000 lobotomies were performed in the United States between 1949 and 1952.
In 1949, the Nobel Prize was awarded to António Egas Moniz for pioneering the technique. Champions of lobotomy were respected, scientifically trained, and sincere in their efforts to improve the lives of their patients.
There is more to the story, however. A comprehensive survey of US psychiatric facilities from 1949 to 1951 found that most lobotomy patients in the United States were women. A more recent methodological review suggests that markers of successful lobotomy outcomes were influenced by the presumptions of practitioners, and included unvalidated measures such as praise from patients’ husbands (Tone and Koziol, 2018).
Lobotomy is no longer believed to be an effective treatment for mental illness. But, what are we to make of all those patients that appeared to be cured in early studies? How did things get so askew as to convince reputable, well-trained scientists that they had discovered a miracle cure?
Part of this problem has to do with intake practices for lobotomy patients. Patients selected for treatment tended to be newer patients, often undergoing the procedure during or immediately following their first hospitalization. Mental illnesses are often imprecisely defined and can appear to resolve spontaneously, so for someone recently hospitalized with a chronic and severe mental illness, the experience of hospitalization alone could coincide with a significant change in the severity of their illness. In contrast, patients with a long history of hospitalization are precisely those who have not recovered on their own. Younger patients were also disproportionately selected for lobotomies under the belief that early intervention was important for success, and younger patients hospitalized for the first time may be the patients most likely to recover on their own. In this context, there is no way to know whether the observed benefits were entirely attributable to spontaneous recovery without comparison to a similar group of patients who did NOT receive the intervention.
1.2 A more accurate measure
Hindsight makes it easy to identify problems in research. Unfortunately, research success requires that we design for problems in advance! Let’s explore study designs that more accurately estimated the effects of lobotomy on severe mental illness.
McKenzie and Kaczanowski (1964) conducted a five-year study on 366 patients with severe mental illness. Half of the study participants received a frontal lobotomy (the treatment group). The other half of patients received standard psychiatric care and did not receive a frontal lobotomy (the control group). Patients between groups were intentionally matched on age, sex, type of mental illness, and severity of mental illness (such as chronic illness vs. recent internment). Over a five-year span, 43.17% (79/183) of patients in the treatment group were discharged from treatment facilities, while 37.16% (68/183) of patients in the control group were discharged.
True, a difference of 6 percentage points could be a meaningful effect size, but let’s see how the team reported these findings:
“One hundred and eighty-three patients who had received prefrontal leukotomy over a period of several years were compared with a closely similar group who had not undergone the operation. No significant differences in rate of hospital discharge after a five-year follow-up period were found between the two groups. The groups showed no consistent difference in outcome in relation to diagnosis, chronicity vs. intermittency of illness, heredity, education, family attitude, premorbid adjustment, or degree of insight on the part of the patients. It is concluded that prefrontal leukotomy does not produce any rate of remission significantly beyond that to be expected without the operation.”
When they tested the difference between the treatment and control groups, the research team found that it was not distinguishable from random chance. By conducting a rigorous analysis of their findings, and by using a control group to establish a baseline for no effect, the research team was able to demonstrate that lobotomy was not an effective treatment.
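To see how such a comparison plays out numerically, here is a minimal sketch of a chi-square test of independence on the discharge counts reported above. This is an illustration, not a reconstruction of the original team’s analysis: it uses only the Python standard library, applies no continuity correction, and compares the statistic against 3.841, the standard critical value for one degree of freedom at a 0.05 significance level.

```python
# A sketch of a chi-square test of independence on the reported 2x2
# counts (discharged vs. not discharged, treatment vs. control).
# Standard library only; no Yates continuity correction is applied.

def chi_square_2x2(a, b, c, d):
    """a, b = group 1 outcome yes/no; c, d = group 2 outcome yes/no."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,  # group 1, outcome yes
        (a + b) * (b + d) / n,  # group 1, outcome no
        (c + d) * (a + c) / n,  # group 2, outcome yes
        (c + d) * (b + d) / n,  # group 2, outcome no
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Treatment group: 79 of 183 discharged; control group: 68 of 183.
stat = chi_square_2x2(79, 183 - 79, 68, 183 - 68)
print(f"chi-square statistic: {stat:.2f}")  # ~1.38
# Critical value for df = 1 at alpha = 0.05 is 3.841.
print("significant at alpha = 0.05:", stat > 3.841)  # False
```

A statistic well below the critical value means the observed 6-point gap is comfortably within the range expected from chance allocation alone, which matches the authors’ stated conclusion.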
Unfortunately, lobotomy was not promptly abandoned in response to compelling findings like those of McKenzie and Kaczanowski. Alternative treatments were limited, and common practices can be slow to change. However, the popularity of the procedure gradually declined, and with the advent of new medications, it finally became obsolete.
Most researchers are not in a position to conduct research studies on irreversible procedures at such a large scale. But, non-significant findings happen all the time. Thinking about how things went so awry with lobotomy can help us to consider how we consume information about other flash-in-the-pan research findings.
Many studies preceding McKenzie and Kaczanowski did not include a dedicated control group. This includes the paper by Freeman and Watts (1937) quoted without attribution above, which found patients to be more comfortable after receiving a lobotomy even if they lost a certain “spark.”
Absent proper experimental controls, it is very difficult to evaluate whether an intervention has done anything at all. A well-designed control group makes it possible to: 1) compare outcomes between the control group and the treatment group, and 2) attribute those differences to the effects of the intervention received by the treatment group.
Later studies did include control groups. However, improperly designed controls may not help to isolate the effects of the intervention. For example, if a researcher has just spent an extraordinary amount of time and effort learning how to perform a lobotomy and then performed several lobotomies, they might be especially attentive to patients recovering from the procedure. Worse, as we cover in lesson 6 of this unit, researchers’ biases may accidentally influence their assessment of outcomes. And, without identifying and attempting to match the conditions experienced by both groups, there may be additional experimental factors or other systematic differences that could offer alternative explanations for any observed effects.
Research cannot distinguish false findings from true findings without methods to isolate the actual effects of the interventions under consideration and demonstrate that any results - positive or negative, large or small - cannot be attributed to any reasonable alternative. This is where controls come in.
1.3 Finding concerns in a study design
Consider the following experiment. A research team is attempting to identify causes of neuronal growth and connectivity in the developing and adult nervous system. They hypothesize that a brain-specific serine protease inhibitor called neuroserpin plays a key role in regulating neuronal development. We'll look at a simplified description of the study design and look for areas of potential concern.
The research team obtains a uniform group of 18-day-old rat embryos. They dissociate hippocampal cultures from the embryos by treatment with papain and trituration with a fire-polished Pasteur pipette. Neuron cultures in the control group are placed into a sterile neurobasal medium that is replaced every 7 days. Cultures in the intervention group are treated with filter-sterilized recombinant neuroserpin. Immunohistochemistry is used to stain axons, and algorithmic image analysis is used on randomly selected grids to measure axon length as a metric of neuronal growth.
Activity: Identify study concerns
In this activity, you will consider several elements of this simplified study design, which is based on a real experiment, and look for areas of potential concern. Then, you’ll review your concerns and those of others in a process diagram to visualize how they manifest throughout the research workflow.
Can you guess the concern that caused the paper upon which this example is based to be retracted? (Don’t read ahead yet if you don’t want spoilers!) You may not be familiar with this experiment, or even this kind of science, but you can do it! Identify variables in the study that you think could impact the authors’ ability to confidently attribute outcomes to their intervention, then share your thoughts on their potential implications.
Post-activity questions:
- What variables did you find concerning?
- Did you notice that filter sterilization was a component of treatment that was not shared with the control group? Why do you think this was done?
- What could the research team have done differently?
1.4 Science is hard!
The study that served as the basis of this activity was later retracted because the syringe filter the research team used in the intervention group had contaminants that were responsible for the effect. It’s very easy to say that errors occur because people are foolish, or that other researchers are careless. But, most people don’t want to make errors, and it’s hard to know that a syringe filter is responsible for your findings until someone points that out to you. Most problems are unforeseen.
Interventions are often complex, with many elements that may not be of interest to the scientific research question, but which could be responsible for certain outcomes. If one of these factors does influence the outcome measure (e.g. neuron growth), that systematic difference between groups will confuse the interpretation of results. Good experimental control is fundamentally about isolating variables so that we can confidently attribute outcomes - effect or no-effect - with accuracy.
We want to learn how to anticipate and avoid these unforeseen problems, too.
Thinking about controls can help us to become better at predicting concerns in our own work. But, attending to concerning components of a research study extends well beyond the conduct of the study itself. Transparent reporting of methods is the only way for others to review and identify issues with an experiment after the fact. Careful study design with well-controlled concerns, combined with clear and complete reporting, is the recipe for rigorous science.
Controls work to address concerns, isolating the relationships we are truly interested in understanding. There are many concerns that could impact the validity of a scientific claim. For many (different) concerns, many (different) controls may be needed. Across the remaining lessons in this unit we will explore how to properly establish experimental controls, avoid common issues with controls, and practice doing our best science in an imperfect world.
Takeaways:
- Research lacking proper controls might seem obviously flawed in hindsight, but in reality many concerns are frighteningly easy to miss. Learning to methodically anticipate potential concerns and implement careful controls is the best defense against a dangerous false result.
- Opportunities for control can be identified throughout the research workflow. A process diagram can help visualize that workflow and think through concerns at each step.
- Controls work to address concerns. Concerns are any factor that can impact the strength of a scientific claim.
Reflection:
- Can you recall a time you read research findings that were later walked back or retracted?
- When was the last time you performed an experiment and discovered an unforeseen variable impacted your results?
- What steps do you take to map out your research? Where do you incorporate and plan controls in that work?
Lesson 2:
Control, Controls, Controlled
Summary
This lesson disambiguates the word “control” in science. Is a “control group” the same as “controlling for” a variable? Not always. In this lesson, we’ll unpack these meanings and introduce a simple framework to help identify potential concerns and explore opportunities to manage variables that could mislead.
Goal
- Identify potential sources of concern and differentiate among strategies available to apply experimental control — including but not limited to control groups.
- Explore prior knowledge and consider the reasoning YOU use when selecting an approach to “control.”
2.1 The many faces of “control”
In everyday language, “control” means to take charge or hold something steady. In scientific research, though, the word wears many hats. Sometimes it refers to a comparison group without an intervention. Sometimes it means managing variability. Sometimes it’s a strategy for protecting results from bias.
We think we know what control means, but it can be surprisingly hard to nail down in an experimental setting.
How many of these uses of the word “control” look familiar to you?
Example 1. “We used untreated fruit flies, fed the same food and raised in the same environment, as a control group.” This is an example of control meaning a comparison group. In this example, control refers to another study population subjected to the same conditions as the intervention group for the duration of the experiment, but not given the intervention.
Example 2. “We controlled for the influence of circadian rhythms by testing all participants 2 hours after waking.” This is an example of control meaning removing the influence of a variable. In this example, controlling for a variable refers to keeping that variable constant within the study in order to insulate an experiment from the effects of that variable.
Example 3. “We used stratified randomization to distribute mice across treatment arms, controlling for sex and age.” This is an example of control meaning distribution to create comparable groups. This example describes a treatment allocation practice to achieve the same insulating effect while still allowing the full range of a variable to be included in the study. It does so by taking steps to avoid the potential for uneven distribution of variability across the study groups.
Example 4. “We included a cell lysate known to express Q in the western blot as a positive control.” This is an example of control meaning a test of a known sample for technical validation. Control in this example refers to a sample with a known, expected outcome that is run alongside the experimental samples. If the known sample fails to produce its expected result, it signals a problem with the technique rather than with the hypothesis under test.
Example 5. “Controlling for levels of daily exercise, drug ABC reduced the severity of neuropathic pain by an average of 62%.” This is an example of control meaning a statistical correction for the effect of a variable. Control in this example refers to the calculations performed to identify the impact of a variable on the outcome of an experiment in quantitative terms, then adjust all results in a way that removes that impact prior to hypothesis testing.
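To make Example 3 concrete, here is a minimal Python sketch of stratified randomization. The mouse roster, the strata, and the random seed are all invented for illustration: mice are grouped by sex and age, shuffled within each stratum, and then dealt alternately into the two arms so that each arm receives a balanced mix.

```python
# Sketch of stratified randomization, as in Example 3. The roster,
# strata, and seed are invented for illustration. Mice are grouped by
# sex and age, shuffled within each stratum, then dealt alternately
# into the two arms so each arm receives a balanced mix.
import random
from collections import defaultdict

mice = [
    {"id": i, "sex": sex, "age_weeks": age}
    for i, (sex, age) in enumerate(
        [("F", 6), ("F", 6), ("M", 6), ("M", 6),
         ("F", 10), ("F", 10), ("M", 10), ("M", 10)]
    )
]

# Group mice into strata keyed by (sex, age).
strata = defaultdict(list)
for mouse in mice:
    strata[(mouse["sex"], mouse["age_weeks"])].append(mouse)

# Shuffle within each stratum, then alternate assignments between arms.
arms = {"treatment": [], "control": []}
rng = random.Random(42)  # fixed seed so the allocation is reproducible
for stratum in strata.values():
    rng.shuffle(stratum)
    for i, mouse in enumerate(stratum):
        arm = "treatment" if i % 2 == 0 else "control"
        arms[arm].append(mouse["id"])

# Each arm ends up with one mouse from every sex-by-age stratum.
print(arms)
```

Contrast this with simple randomization of all eight mice at once, which could, by chance, place every young female in one arm; stratifying first makes that impossible while preserving randomness within each stratum.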
These examples are all very different uses of the word “control”! As it turns out, this is one of those words everyone thinks they understand, but may struggle to define in a way that applies to all uses.
Clearly, there is more to “control” than just control groups.
Before we set about breaking these different uses down, let’s take a moment to introduce a related term that has a somewhat more straightforward definition. Internal validity describes the extent to which a scientific claim is supported by data collected using the specific methods and subjects of a study. This pairs with external validity - the extent to which a claim can be generalized to contexts beyond the specific methods or subjects used - as a means to evaluate the usefulness of a project overall.
Well-crafted controls, in all their many forms, combine with other elements of rigorous study design to enhance internal validity and enable more confidence in scientific claims. They narrow the pool of potential alternative explanations, isolating variables so that effects, relationships, observations, or lack thereof can be attributed with accuracy. As we will see later in the unit, effects of controls on external validity can sometimes be more complex, such that studies may intentionally (and carefully) relax certain kinds of control in order to preserve external validity.
But how can we better understand what exactly we mean when we use this word “control,” and how each application affects internal and external validity? Let’s explore this further with some help from an example.
2.2 A new superfood?
Let’s imagine an example experiment. A research team is exploring the effects of a new superfood called the Rareberry. The Rareberry is reported to cause exceptional motor skill development in juvenile mammals. Does it? Let’s design an experiment to find out.
Our question: Do Rareberries affect coordination in mice? We hypothesize that some unknown phytonutrient they contain will improve development of motor coordination. The intervention, or independent variable, is the consumption of Rareberries. There are a lot of ways to demonstrate that a mouse has superior motor skills. The outcome, or dependent variable, we have selected to assess for this study is coordination, which serves as a proxy for their overall motor skills. We predict that juvenile mice fed Rareberries will develop into highly coordinated adults.
Our plan: supplement one group of lab mice with Rareberries, then test their motor coordination on a rotarod - a standard assay where mice must maintain balance on a horizontal rotating rod, just high enough above the floor that mice will try to stay on the rod rather than falling or jumping off. The rotation speed is increased steadily until the mouse falls. More coordinated mice are able to stay on the rod longer, so time to fall off is recorded as the outcome of this test.
If the Rareberry group mice perform well, will that provide convincing evidence that the berries are effective? What if they don’t perform well? Will we be prepared to walk away confident that we have busted this myth? Rigorous controls ensure that alternative explanations are limited.
2.3 Concerns about validity
In any experiment, we will have an independent variable - our intervention - and a dependent variable - our outcome. Inevitably, though, the world is chock full of other variables. Many of these are sources of concern for our experiment.
Our independent variable: Presence or absence of Rareberries in the diet
Our dependent variable: time on a rotarod
As a general rule, variables of concern will arise from one of four sources. Here are those sources, with some example variables to consider:
- Subject: age, sex, genetic strain, history
- Environment: lighting, noise, time of testing
- Intervention: vehicle solution, delivery method, handling, time
- Measurement & analysis: who does the scoring, what tools are used, how data are processed, bias in analysis
Variables of concern may vary from study to study. A study exploring effects of an intervention on participants’ smoking habits may be concerned with participants’ existing smoking habits. A study exploring the effects of an intervention on social dynamics within a group could be concerned about potential sources of environmental agitation or aggression. A study exploring the effects of an intervention on political views might be concerned about raters’ biases in favor of a specific result. And a team exploring the effects of the serine protease inhibitor neuroserpin on neuronal growth would now know to be concerned about the delivery and sterilization methods used to introduce neuroserpin into the experimental environment.
A Closer Look: The Rareberry and cage height
Consider this scenario for our Rareberry study: mice housed on higher shelves in the rack have a better view of the room. They can watch other mice playing below them. They receive more light exposure and tend to be active for longer periods of time. Over time, these environmental differences enhance their motor coordination.
If our Rareberry-fed mice happen to be more often found living on the upper shelves, how might this affect our results?
Pause to consider:
How might each of these categories (subject, environment, intervention, measurement & analysis) apply in our Rareberries experiment? Can you think of one variable that is likely to influence our outcome in each category?
Variables are concerning when we either know or suspect that they might affect our outcome. We have to be vigilant because - as we saw in the neuroserpin study - just because we don’t suspect a variable, that doesn’t mean we shouldn’t be concerned! Depending on how they act and how they are distributed, impactful variables might exaggerate, diminish, change, or obscure our outcome of interest. If this happens, it reduces the accuracy of our experiment: we will be unable to trust that any result - or lack thereof - is specifically attributable to our intervention. In other words, it harms the internal validity of our experiment.
At its heart, “control” in experimental design means managing variables that could distort our understanding of those that truly interest us. Apart from statistical errors, failed controls of one kind or another may be the single most common source of wrong answers in science.
2.4 Strategies for control
It can be helpful to organize our options for control into categories. Consider how you might apply each of these options for managing a variable:
1. Constrain
Sometimes we manage variables of concern by keeping them constant or narrowing the range of variability allowed in our study. For example, using only male mice of a specific age limits the impact that effects of sex or age could have on our outcomes.
2. Distribute
When a variable isn’t constrained in our study design, we often take steps to prevent it from affecting any one group disproportionately. Depending on the study design, randomization methods can ensure that key variables are evenly distributed. They may still add to the variability of our dataset overall, but we can at least be assured that their influence across groups should be similar.
3. Test
Sometimes, we want or need to know what kind of an impact a variable has on our outcome. Testing for specific impacts or outcomes helps us contextualize any effect of our independent variable by comparing it to minimum, maximum, or alternative outcomes to understand a difference. Sometimes this may mean creating a new “control group”; alternatively, we may record and test for the effect of a variable like sex to help us understand how and whether our findings apply to the general population.
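The three strategies above can be sketched side by side for the hypothetical Rareberry study. Everything here is invented for illustration - the roster, the ages, and the simulated rotarod times - but it shows where each strategy acts in the workflow.

```python
# Sketch of the three strategies applied to the hypothetical Rareberry
# study. The roster, ages, and simulated rotarod times are all invented
# placeholders for illustration.
import random
import statistics

pool = [{"id": i, "age_weeks": [6, 8, 8, 12][i % 4]} for i in range(40)]

# 1) Constrain: keep only 8-week-old mice, removing age as a variable.
eligible = [m for m in pool if m["age_weeks"] == 8]

# 2) Distribute: randomize the remaining mice across the two groups so
# that unmeasured differences should fall evenly on both sides.
rng = random.Random(0)  # fixed seed so the allocation is reproducible
rng.shuffle(eligible)
half = len(eligible) // 2
rareberry, control = eligible[:half], eligible[half:half * 2]

# 3) Test: compare the outcome (seconds on the rotarod) between groups;
# the times here are simulated stand-ins, not real measurements.
times = {m["id"]: rng.gauss(60, 10) for m in rareberry + control}
diff = (statistics.mean(times[m["id"]] for m in rareberry)
        - statistics.mean(times[m["id"]] for m in control))
print(f"mean difference (Rareberry - control): {diff:.1f} s")
```

Note how the strategies compose rather than compete: constraining shrinks the pool, distributing allocates what remains, and testing compares outcomes against the control group’s baseline.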
Activity: Address study concerns
When we design an experiment, we apply control strategies to create the best experiment we can. In many cases, we may not be fully aware of why we are choosing one strategy over another, or have even considered that alternatives exist.
In our next activity, you will be given a description of a familiar experiment with several “concerns.” Use your best intuition to sort these concerns into the strategy bin that you think best fits the approach you would take. While in some cases there may be common answers, there isn’t necessarily a best answer for any given variable.
After sorting your concerns and explaining a bit about how you envision a strategy being applied to it, you will see how others chose to address each concern.
Post-activity questions:
In sorting concerns into different categories, you had a chance to think about the differences and relationships among our options for experimental control. Now, consider the following questions:
- Why did you make the choices that you made?
- Were there some variables you always chose to distribute? To constrain?
- Were there other variables that didn’t make sense to handle with anything but a control group?
The term “control” can mean many different things. This is not bad. But, cultivating an awareness of the different applications of the term and understanding why each of these roles matters is key to doing more rigorous science.
Takeaways:
- Control, used colloquially, refers to things as divergent as setting up a formal control group and applying a statistical correction. Control groups, controlling for variables, controlling group allocations, and controlling variable influence on results are all different things, and it’s important to know which you mean when you use the word control.
- Controls are key determinants of the internal and external validity of an experiment. They narrow the pool of potential explanatory variables and improve the accuracy of attribution.
- Broadly speaking, strategies for controlling concerning components of an experimental design follow three patterns: constraining variables by keeping them constant or within a permitted range of low variability, distributing variables across subject allocations, and testing for the effects of variables, sometimes by creating control groups.
Reflection:
- What kinds of factors would be important to you in making decisions about how to handle concerning variables in your own research?
- Think of a choice you have had to make regarding control for your own research. How does your choice fit into our category bins? How would your research be different if you had chosen any of the other categories, as well or instead?
