Randomization
Randomize to know the effect is real.
Lesson 1:
Knowing results are real
Summary:
This lesson introduces the problem of baseline imbalance, a systematic difference between treatment and control groups at the start of a study that can distort conclusions about a treatment's effect. Baseline imbalances can arise from subject characteristics, experimental factors, or external influences, and they may be known to researchers or entirely unrecognized. They can stem from unconscious bias in the assignment process itself. Random allocation addresses these problems by removing bias from group assignment and distributing potentially influential factors across treatment and control groups, reducing the risk that group differences drive the observed outcome. The rest of this unit will teach appropriate methods to implement randomization to support valid conclusions.
Goal:
- Identify how baseline imbalances between groups can lead to incorrect conclusions about a treatment's effect.
- Examine data to identify subject characteristics or experimental factors that may be imbalanced between groups.
- Recognize how random allocation reduces the risk of baseline imbalance and supports rigorous causal inference.
1.1 Example study: lithium treatment in an ALS mouse model
To understand the role of randomization in experimental design, let's examine a neuroscience study. Building on previous findings from Fornai et al. (2008), a team of researchers is investigating whether lithium treatment could improve survival in an ALS mouse model. They're working with a transgenic mouse model of ALS, comparing the effects of a lithium carbonate solution against a saline vehicle control. To measure the effectiveness of the treatment, they'll track survival time after treatment initiation in a treatment group and a control group. Before treatment, the researchers record mouse characteristics including age, sex, weight, and motor ability (time to fall on the rotarod).
Fig. 1. Group allocation of mice to treatment and control conditions. For every cage of four mice, the first two mice caught are given the saline control. The remaining two mice receive the lithium treatment.
1.2 Balancing treatment groups
The research team develops a protocol to select which mice are treated and which mice will serve as controls. From each cage of four mice, the first two mice caught receive the vehicle control (saline solution). The remaining two mice in the cage are then assigned to receive the lithium treatment.
1.3 Did lithium slow ALS disease progression?
After running the experiment and analyzing their data, they find encouraging results. The mice in the lithium treatment group show longer survival times compared to the control group, and the treatment effect appears promising. Based on these initial findings, it seems lithium could potentially improve outcomes for ALS patients.
Fig. 2. Survival in ALS mice with and without treatment.
Survival in days following treatment with lithium or saline solution (control).
1.4 Examining the evidence
However, before accepting these results, let’s carefully examine their experimental design and data. Several questions emerge: What exactly is driving the survival difference between these groups? Can we confidently attribute this effect to the lithium treatment? We need to consider what other factors might influence both how mice were assigned to treatment groups and their survival. How can we validate these conclusions?
Let’s look at several mouse variables that could affect these results: weight, sex, which litter they came from, and age. We need to also look at the estimated disease stage of each mouse at the start of the experiment. In this model, a rotarod test is used to monitor motor ability, a proxy for disease progression. In ALS, as motor neurons die and muscles waste, the mice fall off the rotarod sooner. Thus, decreased “time to fall” may indicate a more advanced disease stage. Which of these factors might have influenced the treatment assignment or survival of the mice?
Activity: Look for an imbalance
Analyze each variable to identify potential relationships with treatment assignment and survival.
Post-activity questions:
- Which covariate(s) are a problem for drawing a conclusion about lithium treatment?
- Besides the confounders that were provided, are there others that could be relevant?
1.5 A closer look at mouse behavior
Our analysis of the experimental procedure reveals a pattern in how mice were selected for different treatment groups. The researchers notice that more active mice are consistently harder to catch, while less active mice are typically caught first. In this ALS mouse model, activity level correlates with disease progression. Mice in earlier disease stages are more active, while those in later stages show reduced activity (Knippenberg et al., 2010).
1.6 Disease progression influences treatment
Examination of the data confirms a disease progression bias in treatment selection. The mice that ended up in the lithium treatment group had longer times on the rotarod (time to fall), meaning they showed a less advanced disease stage. The control group contained more mice at a later disease stage. This difference in starting conditions is referred to as a baseline imbalance, and it can both influence outcomes and bias results (Roberts & Torgerson, 1999). In the next lesson, we will explore more ways that biases, both conscious and unconscious, can influence experimental design.
Fig. 3. Baseline distribution of time to fall in seconds by group allocation.
Disease stage is measured by the time a mouse remains on the rotarod (time to fall), in seconds. Shorter times indicate a more advanced disease stage.
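The catching-order mechanism can be illustrated with a small simulation. The sketch below is a hypothetical model, not the study's data: it assumes each mouse has a baseline time to fall drawn from a normal distribution, and that mice with shorter times tend to be caught earlier, with some noise. All parameter values and function names are illustrative assumptions.

```python
import random
import statistics

def simulate_cage(rng, n_mice=4, mean_ttf=120.0, sd_ttf=30.0, catch_noise=20.0):
    """Return (control, lithium) baseline times to fall for one cage.

    Assumption: mice with a shorter time to fall are less active and
    therefore tend to be caught first; catch_noise blurs that ordering.
    """
    times = [rng.gauss(mean_ttf, sd_ttf) for _ in range(n_mice)]
    # A lower "catch score" means the mouse is caught earlier.
    caught = sorted(times, key=lambda t: t + rng.gauss(0.0, catch_noise))
    half = n_mice // 2
    return caught[:half], caught[half:]  # first caught -> saline control

def simulate_study(n_cages=20, seed=1):
    """Pool the catch-order allocation over several cages."""
    rng = random.Random(seed)  # seed only makes this simulation repeatable
    control, lithium = [], []
    for _ in range(n_cages):
        ctrl, lith = simulate_cage(rng)
        control.extend(ctrl)
        lithium.extend(lith)
    return control, lithium

control, lithium = simulate_study()
print(f"mean baseline time to fall, control: {statistics.mean(control):.1f} s")
print(f"mean baseline time to fall, lithium: {statistics.mean(lithium):.1f} s")
```

Under these assumptions, the lithium group starts with longer times to fall even though nobody intended to sort the mice by disease stage.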
1.7 Disease progression relates to survival rates
Disease stage is related to survival. Mice in an earlier disease stage survive longer.
Fig. 4. Association between the baseline time to fall on rotarod and mouse survival at the end of study.
Disease stage is measured by the time a mouse remains on the rotarod. Shorter times indicate a more advanced baseline disease stage.
1.8 The problem of confounding variables
A confounding variable is a variable that influences both independent and dependent variables in a study. It creates an indirect relationship between independent and dependent variables. As a result, confounding variables can mask or amplify the influence of an independent variable on a dependent variable.
In this study, the relationship between disease stage, group assignment, and survival illustrates why disease stage is a confounding variable.
Disease stage creates a spurious correlation between treatment and survival that complicates our interpretation of the results. It influenced both whether mice were assigned to receive lithium treatment and how long they survived. This creates ambiguity about the treatment effect: are we seeing longer survival because of the lithium, or simply because the lithium group happened to include more mice with an earlier disease stage?
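One way to make this ambiguity concrete is a simple linear sketch. The model and symbols below are illustrative assumptions, not part of the study's analysis: Y is survival, T is 1 for lithium and 0 for saline, Z is baseline time to fall, \tau is the true treatment effect, and the noise \varepsilon is assumed to have the same mean in both groups.

```latex
Y = \alpha + \tau T + \beta Z + \varepsilon
\quad\Longrightarrow\quad
\mathbb{E}[Y \mid T=1] - \mathbb{E}[Y \mid T=0]
  = \tau + \beta\,\bigl(\mathbb{E}[Z \mid T=1] - \mathbb{E}[Z \mid T=0]\bigr)
```

Under these assumptions, the observed difference equals the true effect plus a bias term driven by the baseline imbalance. Even if \tau = 0, a positive \beta together with longer baseline times in the lithium group yields an apparent benefit. When T is assigned independently of Z, the two baseline expectations are equal and the bias term vanishes in expectation.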
1.9 Implications for this study: we don’t know if the effect is real
The implications of this confounding extend beyond just disease stage. While we've identified disease stage as a confounder, there might be other confounding variables that were not measured. Even more concerning, there could be confounding variables the researchers didn't even know about and therefore couldn't measure. This kind of systematic bias in treatment assignment makes it impossible to draw clear conclusions about the treatment's effectiveness. This will be explored in more detail in Lesson 2.
1.10 The power of proper randomization
What if these researchers revised their experimental approach? Instead of assigning treatments based on which mice are caught first, they could have implemented randomization of treatments from the very beginning (Kang et al., 2008). Before any handling begins, each mouse should be randomly assigned to an experimental condition. This ensures that the ease of catching a mouse has no relationship to which treatment it receives.
Fig. 5. Non-randomized treatment allocation resulted in a baseline imbalance.
Distribution of baseline time to fall by treatment group when mice were assigned by catching order.
Fig. 6. Randomized treatment allocation would have prevented the baseline imbalance.
Distribution of baseline time to fall by treatment group when mice were randomly assigned to each group.
1.11 Random allocation reduces the risk of bias
When we randomize treatment assignments, we distribute potential confounding factors - both the ones we know about and the ones we don't. Randomizing which mice receive lithium or control solutions will allow the researchers to study the relationship between lithium treatment and survival, and to make valid causal inferences about the treatment's effectiveness. In later lessons, we will explore the different types of randomization and how they can be applied to prevent baseline imbalance and minimize possible biases.
Fig. 7. Non-randomized treatment allocation resulted in a baseline imbalance that confounded the final result.
Survival of mice by treatment group when mice were assigned by catching order.
Fig. 8. Randomized treatment allocation would have prevented the appearance of a false effect in the study outcome.
Survival of mice by treatment group, when mice were randomly assigned to each group.
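The contrast between the two allocation schemes can be sketched as a simulation in which lithium has no effect at all. Everything below is a hypothetical model with made-up parameters: survival depends only on baseline time to fall plus noise, catching order favors slower mice, and `random.shuffle` stands in for a proper randomization procedure.

```python
import random
import statistics

def one_study(rng, randomized, n_cages=20):
    """Return mean survival difference (lithium - control); true effect is 0."""
    control, lithium = [], []
    for _ in range(n_cages):
        # Baseline time to fall (s) for the four mice in a cage.
        cage = [rng.gauss(120.0, 30.0) for _ in range(4)]
        if randomized:
            rng.shuffle(cage)  # order unrelated to any mouse trait
        else:
            # Slower mice tend to be caught first (noisy ordering).
            cage.sort(key=lambda t: t + rng.gauss(0.0, 20.0))
        for i, ttf in enumerate(cage):
            # Survival (days) depends on disease stage only, not on treatment.
            survival = 20.0 + 0.2 * ttf + rng.gauss(0.0, 5.0)
            (control if i < 2 else lithium).append(survival)
    return statistics.mean(lithium) - statistics.mean(control)

def mean_apparent_effect(randomized, n_studies=500, seed=0):
    """Average apparent effect over many simulated studies."""
    rng = random.Random(seed)  # seed only makes this simulation repeatable
    return statistics.mean(one_study(rng, randomized) for _ in range(n_studies))

print(f"catch-order allocation: {mean_apparent_effect(False):+.2f} days")
print(f"random allocation:      {mean_apparent_effect(True):+.2f} days")
```

In this sketch, catch-order allocation produces a consistent apparent survival benefit, while random allocation averages out to roughly zero, which is the true effect in the model.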
1.12 Next time, randomize!
Random allocation provides multiple benefits for experimental design. It reduces biases in how subjects are assigned to treatment groups. Depending on the method used, it can also help avoid unbalanced groups where one treatment has substantially more subjects than another. Perhaps most importantly, it distributes the effects of both known and unknown confounding variables across treatment groups. This distribution allows us to make stronger causal inferences about treatment and helps satisfy the assumptions required for our chosen statistical analysis.
1.13 Practical implementation
To implement proper randomization in this ALS study, the research team needs to make several specific changes to their protocol. Before beginning any experimental procedures, they must generate a complete randomized allocation sequence that assigns treatments to all mice. This sequence should be created using appropriate randomization software or methods, not by arbitrary selection. They must strictly follow this pre-generated sequence when providing treatments, regardless of how easy or difficult each mouse is to handle.
Throughout the experiment, the researchers must also carefully document the randomization process and any deviations that occur. In later lessons, we will talk through the details of how to generate this randomized sequence and implement it in different research scenarios, and how to ensure that the appropriate people are kept masked or blinded to which mice are receiving which treatment.
By following these more rigorous procedures, the researchers can be much more confident in their conclusions about whether lithium truly affects disease progression in this ALS model. The extra effort required for proper randomization is outweighed by the increased validity and reliability of results.
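As a minimal sketch of generating the allocation before any handling, assume the mice carry identifiers such as ear-tag numbers. The function names, the ID format, and the choice of independent draws per mouse are illustrative assumptions, not the research team's protocol. The sketch uses Python's `random.SystemRandom`, which draws from the operating system's entropy source.

```python
import csv
import io
import random

def make_allocation(mouse_ids, arms=("lithium", "saline"), rng=None):
    """Assign each mouse ID to an arm by an independent random draw."""
    if rng is None:
        rng = random.SystemRandom()  # OS entropy; unrelated to any mouse trait
    return {mouse_id: rng.choice(arms) for mouse_id in mouse_ids}

def allocation_to_csv(allocation):
    """Serialize the allocation so it can be filed before the study starts."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["mouse_id", "arm"])
    for mouse_id, arm in allocation.items():
        writer.writerow([mouse_id, arm])
    return buffer.getvalue()

# Hypothetical ear-tag IDs for five cages of four mice.
mouse_ids = [f"C{cage}-M{m}" for cage in range(1, 6) for m in range(1, 5)]
allocation = make_allocation(mouse_ids)
print(allocation_to_csv(allocation))
```

Because every mouse already has an assignment on file, the order in which mice are caught can no longer influence which treatment they receive. With independent draws the two groups may come out unequal in size.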
Takeaways:
- Without proper randomization, baseline imbalances between groups can create misleading correlations between treatment and outcomes, making it impossible to determine if observed effects are truly caused by the intervention.
- Random allocation distributes both known and unknown influential factors across groups, allowing researchers to make valid causal inferences about treatment effectiveness.
- Implementing a complete randomization sequence before beginning an experiment and strictly following it throughout the study is essential for reducing systematic bias and increasing the validity and reliability of research findings.
Reflection:
- What subject characteristics, experimental factors, or external influences could create baseline imbalances in your research?
- What challenges do you imagine could arise when not properly implementing randomization protocols?
- How would you explain to a non-scientist the importance of randomization for ensuring proper conclusions?
Lesson 2:
Random treatment allocation
Summary:
This lesson explores what it means to have a random allocation sequence. Why is simply alternating allocation between treatment group and control group an issue? Non-random allocation can introduce systematic biases that are a problem for producing rigorous results. Although true random allocation may appear unbalanced, it achieves the primary goal of eliminating systematic biases to produce valid conclusions.
Goal:
- Distinguish between random allocation sequences and other allocation sequences that are balanced.
- Recognize how non-random allocation sequences can introduce biases that challenge study validity.
- Recall that the primary goal of random allocation is to reduce systematic biases.
2.1 A need for balance
In Lesson 1, we explored how the lithium treatment study was compromised when researchers unknowingly assigned mice to treatment groups based on how easy they were to catch. Imbalanced group characteristics led to confounding, where differences in outcomes were caused by pre-existing differences between groups rather than the intervention itself. This cautionary tale illustrates a fundamental rigor principle: if our treatment and control groups aren’t comparable at baseline, we cannot confidently attribute any differences in outcomes to the intervention.
Therefore, when designing experiments, researchers face two fundamental balancing challenges:
- Balance in group characteristics: Creating treatment groups that are comparable in all aspects except the intervention being tested. This balance is needed for drawing valid conclusions about whether a treatment actually caused the outcome. Many statistical tests we rely on assume that observations are independent and groups are equivalent at baseline. When these assumptions are violated, the interpretation of results must take these violations into account.
- Balance in group sizes: Ensuring that treatment groups have a similar sample size, which gives us the most statistical power to detect real effects.
If one group has far fewer subjects than another group, the precision of estimates for the group with fewer subjects decreases, potentially obscuring real treatment effects or requiring larger overall sample sizes to achieve the same power in a statistical analysis. With uneven group sizes, our ability to detect real effects decreases, and we might miss important discoveries.
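The cost of unequal group sizes can be seen in a short calculation. For two groups with a common standard deviation sigma, the standard error of the difference in means is sigma * sqrt(1/n1 + 1/n2), which for a fixed total is smallest when the groups are equal. The total of 40 subjects and sigma = 1 below are arbitrary illustrative values.

```python
import math

def se_of_difference(n1, n2, sigma=1.0):
    """Standard error of (mean1 - mean2) for a common standard deviation."""
    return sigma * math.sqrt(1.0 / n1 + 1.0 / n2)

total = 40
for n1 in (20, 25, 30, 35):
    n2 = total - n1
    print(f"{n1:2d} vs {n2:2d}: SE = {se_of_difference(n1, n2):.3f}")
```

The standard error grows as the split moves away from 20 vs 20, so the same true effect becomes harder to detect.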
What if we want to balance treatment groups in a study - how would we do it?
2.2 Balancing treatment groups in the wild
Researchers often recognize the need for balance and try various approaches to achieve it. Let’s look at some common methods:
- A parasitologist is administering a parasite to mice in pairs. Within each pair, a control or experimental treatment is then assigned in an alternating fashion (first treatment, then control).
- A cell biologist always replaces the cell culture media of the control wells first followed by the treatment wells.
- A clinician assigns the patients in the first week of an epidemic wave to receive treatment and the next week to control.
- A neuroscientist assigns treatment by tank, such that all fish in a given tank belong to either the experimental or control group.
- A clinical trial coordinator assigns the treatment group based on each patient’s date of birth or date of trial entry.
- A surgeon conducts sham surgeries in the morning and an equal number of experimental surgeries in the afternoon.
These approaches may seem reasonable and can create the appearance of balance, but they can also introduce biases. Unfortunately, such approaches are often reported as “randomized” in the literature. Let’s see if we can identify the consequences of using truly random assignment versus these other approaches.
Activity: The challenge of true randomization
When researchers claim to use randomization in their studies, what does that actually mean in practice? In this activity, consider three sequences from published studies that all claimed to use "randomized allocation".
Which one represents true randomization?
Post-activity questions:
- What information did you use to decide whether a sequence was random?
- What did you notice about the non-random sequences?
2.3 The deceptive nature of alternation: the problem with Sequence 1
Looking at Sequence 1, we see a perfect alternating pattern between groups A (control) and B (treatment). This creates perfectly equal group sizes, but it introduces several potential biases:
- Selection bias: If researchers know the next assignment will be treatment B, they might unconsciously select a "better" participant for that group. Imagine a researcher thinking "the next animal looks stressed" and making a judgement call about whether to enroll it in a stressful intervention or move on to the next animal instead.
- Order effects: Subjects or samples could be unintentionally ordered in some way, such as by size, litter, batch, or cohort. A small but consistent difference could emerge. If the same order is followed throughout the experiment, environmental factors that vary systematically with time (morning vs. afternoon, day of week, seasonal effects) become confounded with treatment groups. This can include personnel effects if certain staff are more likely to work with one treatment group due to work scheduling. It can also include any changes that can happen over time, such as wearing down of an instrument.
Consider a study where treatment A is always administered to a zebrafish in the morning and treatment B in the afternoon due to this alternating pattern. Any natural variations in zebrafish physiology related to light or temperature would become entangled with the treatment effect. Additionally, the researcher might unconsciously select healthier-looking fish for a certain treatment or handle the fish differently based on the predictability of the treatment assignment.
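A short simulation sketch shows how such a time-of-day effect can masquerade as a treatment effect. The model is hypothetical: the outcome gets a fixed afternoon shift, neither treatment has any effect, and all parameter values are made up for illustration.

```python
import random
import statistics

def apparent_effect(assignments, rng, afternoon_shift=2.0):
    """Mean outcome of B minus A when neither treatment does anything.

    Sessions alternate morning, afternoon, morning, ... in testing order.
    """
    outcomes = {"A": [], "B": []}
    for i, arm in enumerate(assignments):
        is_afternoon = (i % 2 == 1)
        value = rng.gauss(10.0, 1.0) + (afternoon_shift if is_afternoon else 0.0)
        outcomes[arm].append(value)
    return statistics.mean(outcomes["B"]) - statistics.mean(outcomes["A"])

rng = random.Random(42)  # seed only makes this simulation repeatable
n = 200
alternating = ["A", "B"] * (n // 2)                # A always morning, B afternoon
randomized = [rng.choice("AB") for _ in range(n)]  # independent of session

print(f"alternating: {apparent_effect(alternating, rng):+.2f}")
print(f"randomized:  {apparent_effect(randomized, rng):+.2f}")
```

With alternation, the whole afternoon shift is attributed to treatment B. With random assignment, morning and afternoon sessions are spread across both groups, so the apparent difference stays close to zero.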
2.4 Human attempts at "randomness": the problem with Sequence 2
Sequence 2 might feel more “random” at first glance. It avoids patterns while maintaining balance between treatments. However, this sequence shows signs of manual allocation by a researcher attempting to "look random."
When humans try to generate random sequences manually, we have “tells” that make it easy to distinguish our sequences from real random sequences:
- We avoid long runs of the same value, rarely putting more than 2-3 As or Bs in a row
- We maintain closer to 50-50 balance than true randomness would produce
- We create patterns that "feel" random to us, but aren’t actually random
Manual allocation looks more “random” than alternating allocation, but it comes with the same risks. It introduces dependence between each consecutive assignment in order to maintain the appearance of randomness. Selection bias may also happen unconsciously if animals are manually assigned to groups, and we cannot trust our intuition as to whether our treatment assignments feel “random enough”. Learn more about the impact of unconscious bias in research in our Confirmation Bias Unit.
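One of these tells, the avoidance of long runs, can be checked with a simple count. The sketch below counts runs (maximal blocks of identical letters) and compares the count with the value expected for a random ordering of the same letters, 2 * nA * nB / (nA + nB) + 1, which is the mean used in the Wald-Wolfowitz runs test. The example sequence is made up for illustration and is not one of the activity's sequences.

```python
from itertools import groupby

def run_lengths(sequence):
    """Lengths of consecutive blocks of identical symbols."""
    return [len(list(block)) for _, block in groupby(sequence)]

def expected_runs(sequence):
    """Expected number of runs if the same symbols were randomly ordered."""
    n_a = sequence.count("A")
    n_b = sequence.count("B")
    return 2 * n_a * n_b / (n_a + n_b) + 1

# Hypothetical hand-made sequence that "looks random".
hand_made = "ABBABAABABBABAABBABAABABBABAAB"
lengths = run_lengths(hand_made)
print("number of runs:", len(lengths))
print("longest run:   ", max(lengths))
print("expected runs: ", expected_runs(hand_made))
```

A sequence with clearly more runs than expected and no long runs switches between A and B too often, which is the pattern hand-made sequences tend to show.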
2.5 True randomization revealed: why Sequence 3 is different
Sequence 3 might have seemed less "random" than Sequence 2, but it is the only randomized sequence! True randomization often produces results that surprise us:
- It can create long runs of the same value (like seven A's in a row)
- It may result in unequal group sizes (here, 20 A's and 10 B's)
- It doesn’t feel random! Human brains are pattern-seeking, and true randomness often feels wrong to us. We might notice how many random sequences in a row start with A, or think we see patterns or tendencies that feel non-random.
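How surprising are these features under simple randomization? A short exact calculation can help calibrate our intuition. The sketch assumes 30 independent fair coin flips, which is an assumption about how such a sequence might be generated, and the helper names are illustrative.

```python
from math import comb

def prob_split_at_least(n, larger):
    """P(one group gets >= `larger` of n subjects) under fair coin flips."""
    tail = sum(comb(n, k) for k in range(larger, n + 1))
    return 2 * tail / 2 ** n  # two-sided; valid when larger > n / 2

def prob_run_at_least(n, run):
    """P(some run of identical assignments has length >= run); needs run >= 2."""
    # ways[j]: number of sequences so far without a long run,
    # whose current (final) run has length j + 1.
    ways = [0] * (run - 1)
    ways[0] = 2  # first flip: A or B
    for _ in range(n - 1):
        new = [0] * (run - 1)
        new[0] = sum(ways)        # switch symbol: run restarts at length 1
        for j in range(run - 2):
            new[j + 1] = ways[j]  # same symbol: run grows by one
        ways = new
    return 1 - sum(ways) / 2 ** n

print(f"P(split at least 20/10 in 30 flips): {prob_split_at_least(30, 20):.3f}")
print(f"P(run of 7 or more in 30 flips):     {prob_run_at_least(30, 7):.3f}")
```

Running it shows how often simple randomization produces features that feel non-random to us.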
Our intuitions about randomness can lead us to try other methods instead, but true randomization is key for containing bias in research. Compared to these other approaches, only randomization:
- Reduces selection bias in research
- Ensures independence of assignments
- Allows valid statistical inference
- Balances both known and unknown confounding variables
With alternating allocation, order effects do not average out as the sample grows. If morning vs. afternoon creates even a small physiological difference, that difference is applied consistently to one group across the entire study, so a larger sample only makes the false treatment effect easier to detect.
Manual allocation gives unconscious biases more opportunities to create systematic differences between groups. Even small biases in how researchers assign treatments can accumulate into significant group differences.
In contrast, while true randomization might create some imbalance with smaller sample sizes, it tends to produce more comparable groups overall. The larger the study, the more likely that important variables will be distributed similarly between groups.
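The effect of study size on chance imbalance can be sketched with a simulation. The setup is hypothetical: one standard-normal baseline covariate, and subjects shuffled into two equal-sized groups as a stand-in for random allocation.

```python
import random
import statistics

def mean_abs_imbalance(n_per_group, n_trials=200, seed=3):
    """Average |difference in group means| of a baseline covariate
    when subjects are randomly split into two equal-sized groups."""
    rng = random.Random(seed)  # seed only makes this simulation repeatable
    diffs = []
    for _ in range(n_trials):
        covariate = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_group)]
        rng.shuffle(covariate)  # random allocation to the two groups
        a, b = covariate[:n_per_group], covariate[n_per_group:]
        diffs.append(abs(statistics.mean(a) - statistics.mean(b)))
    return statistics.mean(diffs)

for n in (5, 20, 80, 320):
    print(f"n per group = {n:3d}: mean |imbalance| = {mean_abs_imbalance(n):.3f}")
```

In this sketch the typical chance imbalance shrinks as the groups grow, and it has no systematic direction at any sample size.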
2.6 When “random” isn’t actually random
Sometimes, methods that appear random actually follow predictable patterns or systematic rules. One way to recognize this false randomization is that the allocation rule, applied again to the same subjects, would yield identical group assignments every time. Common examples include assigning treatments based on odd or even identification numbers, using subject birth dates or admission dates, or allocating treatments by position in plates, cages, or tanks.
False randomization can even occur when researchers start with proper randomization methods. Bias can infiltrate during implementation if researchers generate a proper random sequence but then selectively apply it, regenerate "unfavorable" sequences until getting one that looks better, or make post-hoc adjustments to achieve what feels like “better” balance.
To maintain the integrity of the randomization, selection should occur independently of assignment. For instance, group allocations can be made in advance using identification numbers. If a researcher instead consults the random sequence first and then selects a subject for the designated group, they might inadvertently introduce bias. They might think, "I got a 'B' assignment, let me choose this particular mouse for the B group." This procedural detail matters: keeping individual selection independent of treatment assignment preserves the core benefit of randomization, which is the elimination of systematic biases, both conscious and unconscious, that could affect experimental outcomes.
Instead, researchers should ensure their randomization method is stochastic and not replicable. Researchers should use random number generators, generate complete allocation sequences before beginning studies, conceal these sequences from personnel involved in recruitment and treatment, and strictly follow the pre-specified sequences without deviation.
2.7 What does true randomized allocation entail?
These are the essential steps for the creation of truly randomized groups:
- Create a randomized allocation sequence.
- Select an animal/participant/sample.
- Look at the next entry in the allocation sequence and perform the assignment.
Only in this way can we avoid the systematic biases that risk confounding our study. If we know the experimental group prior to selection of the subject, it can influence the selection!
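The three steps above can be sketched as a small helper that holds the sequence and reveals one entry only after a subject has been chosen. The class name, the subject IDs, and the use of independent draws are illustrative assumptions. `random.SystemRandom` is used by default, and the optional `rng` argument exists only so the behavior can be tested.

```python
import random

class ConcealedAllocation:
    """Holds a pre-generated sequence and reveals one entry per enrolled subject."""

    def __init__(self, n_subjects, arms=("A", "B"), rng=None):
        if rng is None:
            rng = random.SystemRandom()
        # Step 1: the full sequence exists before any subject is selected.
        self._sequence = [rng.choice(arms) for _ in range(n_subjects)]
        self.log = []  # (subject_id, arm) in enrollment order

    def assign(self, subject_id):
        """Steps 2-3: the caller has already chosen subject_id; reveal the next arm."""
        if any(sid == subject_id for sid, _ in self.log):
            raise ValueError(f"{subject_id} was already assigned")
        if len(self.log) >= len(self._sequence):
            raise RuntimeError("allocation sequence exhausted")
        arm = self._sequence[len(self.log)]
        self.log.append((subject_id, arm))
        return arm

allocation = ConcealedAllocation(n_subjects=8)
for subject_id in ["m01", "m02", "m03"]:  # hypothetical IDs, chosen first
    print(subject_id, "->", allocation.assign(subject_id))
```

Because the next entry is only revealed once a subject ID has been committed, the person selecting subjects cannot match a particular animal to a known upcoming assignment. The log doubles as documentation of what was actually done.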
2.8 Real-world consequences of non-random allocation
Remember those other balancing approaches mentioned in the beginning of this lesson? Let’s take a deeper look at how those choices impacted their studies:
Parasite buildup: When administering the parasite to each mouse pair, parasites built up in the dropper between the first dose and the second dose. This meant that the second mouse always received more parasites. With subsequent group allocation, treatment mice were consistently assigned first, and so the control group systematically received higher parasite loads, confounding the results.
Cell viability: In the cell culture experiment where controls were always processed first, the treatment cells experienced longer exposure to air and slightly more desiccation. This made the order of cell culture media replacement a confounding variable affecting cell viability independently of the treatment.
Epidemic virulence: When clinicians assigned the first week of patients in an epidemic to treatment and the next week to control, they inadvertently created a temporal confound. As epidemics progress, pathogens often evolve reduced virulence, meaning control patients (who came later) had less virulent infections.
Environmental gradients: In the zebrafish study, researchers assigned treatments "randomly" by tank position. However, undetected temperature and light gradients across the room created systematic environmental differences between treatment groups, violating the statistical assumption of independence.
Clinical identifier: Assignment of treatment based on a patient’s date of birth or trial entry creates a predictable rule that increases the risk of selection bias. It can also create a baseline imbalance due to correlations between the group and patient age and thus disease severity.
Skill improvement: In the surgical intervention study, the first patients of each day received the surgical sham procedure while later patients received the experimental procedure. Over the day, surgeons improved their technique, creating an advantage for the later patients that had nothing to do with the intervention itself.
2.9 Remember the goal
The goal of randomization isn't to create perfectly balanced groups or sequences that look random to our eyes. Instead, randomization serves to eliminate systematic biases and allow valid causal inference about treatment effects. The lesson from the examples above is not that we need to identify every source of bias. Rather, because we cannot know or anticipate all confounding variables, randomization helps minimize the risk from both known and unknown confounds.
In the next lesson, we'll explore different methods of randomization and how to select the best approach for our specific research context.
Takeaways:
- True randomization often produces results that appear non-intuitive, like long runs of the same treatment or unequal group sizes, yet it remains essential for reducing selection bias while distributing both known and unknown confounding variables across treatment groups.
- Common non-random allocation methods like alternation or manual allocation may create the appearance of balance but introduce systematic biases that can invalidate research findings.
- To maintain randomization integrity, researchers should use proper random number methods, create complete randomization sequences before beginning studies, conceal these sequences from personnel involved in recruitment and treatment, and follow pre-specified sequences without deviation.
Reflection:
- What are some simple ways you can implement random allocation in the lab to reduce the temptation to use manual or alternating allocation?
- What are other examples of time-related factors that could be a problem when using an alternating allocation sequence?
- How would you convince a colleague that an "unbalanced-looking" random allocation is actually more desirable than a non-random balanced allocation sequence?
