Can HIV cure trials strike a balance between the need for meaningful data and participant safety?


A recent analysis indicates that HIV cure trials thus far haven’t included enough participants to detect when treatments provide moderate benefits. As a result, researchers may be missing opportunities to study and improve upon drug combinations that could eventually lead to a cure. Dr Jillian Lau, Dr Deborah Cromer and colleagues, whose analysis was published in The Journal of Infectious Diseases, propose a hybrid trial design that would maximise the potential of finding treatment benefits while minimising participant risk.


HIV cure trials often involve an ‘analytical treatment interruption,’ which requires people with HIV who participate in them to stop taking their antiretroviral therapy (ART) in order for scientists to observe how uncontrolled HIV responds to the medicines being evaluated.

Joining these types of clinical trials can be daunting for a person with HIV who is used to maintaining an undetectable viral load. Furthermore, previous research conducted by Dr Lau indicated that many people with HIV were unsure how cure trials work, and many were not willing to accept long periods with a detectable viral load.

HIV researchers are working on different strategies to try to cure HIV. Some therapies target and try to shrink the latent HIV reservoir, which consists of immune cells that are infected with HIV but have not produced new virus for many months or years. Other treatments try to enhance the immune system’s ability to control HIV.



cure

To eliminate a disease or a condition in an individual, or to fully restore health. A cure for HIV infection is one of the ultimate long-term goals of research today. It refers to a strategy or strategies that would eliminate HIV from a person’s body, or permanently control the virus and render it unable to cause disease. A ‘sterilising’ cure would completely eliminate the virus. A ‘functional’ cure would suppress HIV viral load, keeping it below the level of detection without the use of ART. The virus would not be eliminated from the body but would be effectively controlled and prevented from causing any illness.

treatment interruption

Taking a planned break from HIV treatment, sometimes known as a ‘drugs holiday’. As this has been shown to lead to worse outcomes, treatment interruptions are not recommended. 

historical control

A comparison group of people not taking an experimental drug, taken from previous clinical trials (when old data is compared to new data).


placebo

A pill or liquid which looks and tastes exactly like a real drug, but contains no active substance.

control group

A group of participants in a trial who receive standard treatment, or no treatment at all, rather than the experimental treatment which is being tested. Also known as a control arm.

There are two types of analytical treatment interruption studies used to evaluate potential cure therapies: time to viral rebound (TVR) studies and set-point studies. In TVR studies, after participants stop ART, researchers measure the time it takes for viral loads to first reach detectable levels (50 copies/ml) and then the time it takes to reach a higher threshold (often 10,000 copies/ml), at which point participants resume ART. When participants receiving the treatment take longer to reach those thresholds than a control group does, that indicates the therapy is slowing their viral rebound. TVR studies are often used to evaluate therapies that target the reservoir, and they usually last just a few weeks.

Set-point studies typically evaluate how well participants’ immune systems control HIV. In these much longer studies (often months in duration), researchers allow participants’ viral loads to spike to high levels to see whether their immune systems will eventually control the virus at loads lower than the initial spike. The threshold to resume ART is typically set at higher levels (often 100,000). When participants receiving treatment are able to control HIV at lower viral load levels and for longer periods than those in a control group, that suggests the therapy is enhancing their immunity against the virus.

Although no specific guidelines dictate how to design treatment interruption studies, a group of HIV researchers met in 2018 to discuss ethical considerations and recommend practices to mitigate risk. Stopping ART during these studies exposes participants to prolonged periods of detectable and, in some cases, high viral loads, which increase the risk of illness. Higher viral loads can also expose participants’ HIV-negative sex partners to the virus.

Thus far, most studies have reduced risk by having small numbers of participants and by not using a control group who take an inactive placebo instead of the treatment. Including a control group is usually important because it enables researchers to compare the responses of those receiving the medicine with the responses of those who don’t. However, the scientific community has debated the ethics of using control groups in these kinds of cure trials because they expose those people to prolonged periods with elevated viral loads without receiving any medicine, even an experimental one.

The study

Dr Lau and colleagues used data from previous clinical trials to create mathematical models that enabled them to evaluate how various clinical design parameters impact a trial’s statistical power.

Statistical power is analogous to testing sensitivity. An HIV test with an 80% sensitivity, for example, would successfully identify 80% of people who are actually HIV positive and would miss 20% of people who are HIV positive. When a clinical trial evaluating a treatment is designed to achieve 80% statistical power, and assuming the treatment has an actual benefit to those taking it, the trial would have an 80% probability of detecting the benefit and a 20% probability of missing it.

Simplistically, the more data a trial has, the more statistical power it will have. Parameters such as the number of participants, the number of controls, the duration of a trial, and the frequency of blood tests affect the statistical power of a trial.
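To make the link between participant numbers and power concrete, here is a minimal sketch using a textbook normal approximation for a two-arm comparison with equal arm sizes. It is not the model used by Lau and colleagues; the function name `approx_power` and the standardised effect size of 0.5 are hypothetical choices made purely for illustration.

```python
from statistics import NormalDist


def approx_power(effect_size: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-arm z-test with equal arm sizes.

    effect_size is a standardised difference between arms (a hypothetical
    input, not a quantity taken from the article).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The expected test statistic grows with the square root of the arm size.
    shift = effect_size * (n_per_arm / 2) ** 0.5
    # Probability of clearing the critical value (the negligible far tail is ignored).
    return std_normal.cdf(shift - z_crit)


# Same assumed effect, increasingly large trials.
for n in (13, 50, 120):
    print(f"{n:>3} per arm: power ~ {approx_power(0.5, n):.2f}")
```

Under these assumptions, the same underlying benefit is far more likely to be detected as the arms grow, which is the pattern the article describes.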

TVR trial modelling results

Lau and colleagues modelled TVR studies designed to achieve 80% statistical power to see how the number of participants, controls, duration, and frequency of blood tests affected their ability to detect “reductions in reactivation frequency.”

This term relates to the reactivation of latent cells in the HIV reservoir. Latent cells in the reservoir periodically and (scientists believe) randomly reactivate themselves to produce more HIV. This reactivation may lead to latent cells producing virus, but under successful ART, a person’s viral load remains undetectable because the drug combination effectively controls the virus. In the absence of ART, as in treatment interruption trials, when latent cells reactivate, they eventually produce enough virus to detect. When researchers evaluate a therapy’s ability to “reduce reactivation frequency,” practically speaking, they’re evaluating a therapy’s ability to slow how quickly viral loads rebound after stopping ART.

Looking at a recent TVR trial that involved 13 participants, their model showed that at 80% power, the trial would only have detected treatments with very large reactivation reductions (between 70 and 80%). In order for a TVR trial to detect reactivation reductions as small as 30%, it would need 120 participants in both the treatment and control arms. Because most TVR studies use few participants, they are likely not detecting modest treatment benefits.

It may seem counterintuitive to design cure studies to detect lower treatment benefits when the ultimate goal is to find the highest possible benefit (i.e., a cure that results in 100% reactivation reduction). However, Dr Lau and Dr Cromer said that because we’re in the beginning stages of finding a cure, we need to detect when therapies provide moderate benefits in order to decide whether that therapy warrants further study.

“Are we throwing the baby out with the bathwater when we're rejecting all these trials because we're not seeing a difference? And are we missing something because they haven't been powered to detect small differences that we could take forward and learn from?” Dr Cromer asked. “No matter how we're doing these studies, we are going to be missing things because we're just not looking at enough people,” she said.

In addition to recommending more participants, the team proposes using ‘historical controls’ to supplement control groups. Historical data from previous trials where people interrupted ART could augment the number of controls included and improve a trial’s ability to detect lower treatment benefits. They modelled a hypothetical TVR study that included 50 participants and 50 controls. At 80% power, that design could detect reactivation reductions down to 43%. Adding 150 historical controls (a total of 200 controls) would allow the same trial to detect reductions down to 36%.
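One way to see why extra historical controls help, but with diminishing returns, is through the standard-error factor of a simple two-group comparison. The sketch below assumes equal variance in both groups and treats historical controls as interchangeable with concurrent ones; it is not the authors' time-to-rebound model and is not expected to reproduce the 43% and 36% figures. The function name `detectable_effect_factor` is hypothetical.

```python
from math import sqrt


def detectable_effect_factor(n_treated: int, n_controls: int) -> float:
    """Standard-error factor sqrt(1/n_t + 1/n_c).

    At fixed power and significance level, the smallest detectable
    difference between arms scales with this factor.
    """
    return sqrt(1 / n_treated + 1 / n_controls)


baseline = detectable_effect_factor(50, 50)  # 50 treated, 50 concurrent controls

# Keep 50 treated participants and add more and more controls.
for n_controls in (50, 200, 1_000, 1_000_000):
    ratio = detectable_effect_factor(50, n_controls) / baseline
    print(f"{n_controls:>9} controls: detectable effect x {ratio:.2f}")
```

In this simplified picture the factor can never drop below about 0.71 of its starting value, however many controls are added, because the 50 treated participants become the limiting term.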

This may be a modest improvement, but using historical controls could also reduce the size of a control arm. “We may not even necessarily need true placebo controlled trials going forward if we can access 20 years of historical control data,” said Dr Lau. If using historical data could lessen the probability of someone entering a placebo group from the typical 50% down to 25%, for example, Dr Lau said that more people might be willing to participate in these studies.

Their modelling also showed virtually no improvement in the ability to detect reactivation reductions by extending TVR trial durations beyond five weeks. Past this point, their model predicted improvements in detection no greater than 1%. Similarly, they found virtually no benefit to conducting lab monitoring more frequently than every week. Monitoring people twice a week barely nudged the ability to detect lower reactivation reductions by 1%.

The researchers conducted a separate analysis to estimate the maximum risk of HIV transmission during TVR studies, based on previous research that estimated the likelihood of transmission occurring at various viral load levels above the detectable limit. They estimated the maximum risk assuming participants engaged in unprotected sex and that PrEP or other prevention strategies were not used. They also assumed that, if weekly monitoring doesn’t include same-day reporting of viral loads, a participant who needed to resume ART (because their viral load rose above 1,000) would likely not do so until the next weekly visit.

For this scenario, they estimated that the maximum risk of transmission during a five-week TVR study with a viral load threshold of 1,000 to reinitiate ART was 3.6 out of 1,000 participants engaging in heterosexual sex. For those engaging in insertive anal sex, the maximum risk was about 7 out of 1,000, and for those engaging in receptive anal sex the maximum risk was around 70 out of 1,000. Changing a study’s design to include rapid viral load testing and same-day resumption of ART reduces the estimated maximum risk to 0.9 out of 1,000 for heterosexual sex, 1.8 out of 1,000 for insertive anal sex, and about 18 out of 1,000 for receptive anal sex.

Set-point study modelling results

Because set-point studies typically evaluate immune control of HIV, one complication is that some people, called post-treatment controllers, have immune systems that are naturally good at controlling HIV after stopping ART. A previous study, called CHAMP, found that about 4% of people with HIV were post-treatment controllers (defined in that study as people who maintained viral load below 400 at least two-thirds of the time for 48 weeks after stopping ART). The study also found that post-treatment control was much more frequent – at around 13% – among people who started treatment soon after they were infected with HIV.

Set-point studies need to have enough statistical power to distinguish between the benefits of a proposed cure therapy and post-treatment controllers, who would show some degree of immune control after stopping ART with or without the proposed cure therapy. Using the CHAMP study’s findings, the researchers assumed the lower baseline rate of post-treatment controllers of 4%. If a trial’s goal was to identify an increase in the number of controllers up to 20% (meaning the therapy has helped people who aren’t natural post-treatment controllers to suppress the virus), their model showed that a 24-week set-point study with a statistical power of 80% would need 60 participants.

Because of the extremely high viral loads in typical set-point studies (up to 100,000), the authors compared how using a more conservative threshold (1,000) to restart ART would affect set-point studies’ ability to detect increases in post-treatment controllers. The CHAMP study found that 55% of post-treatment controllers had initial spikes in viral loads below 1,000, and they continued to control viral loads below 1,000.

Using this data, Lau and Cromer assumed that a 1,000-copy threshold would mask 45% of post-treatment controllers, making it harder to detect when therapies enhance participants’ immune control of HIV (that is, lowering the power). To recover the power, more participants would be needed. As in the above example, assuming 80% power and a goal of detecting an increase in post-treatment controllers from a baseline of 4% to 20%, the lower viral load threshold would raise the participant requirement from 60 to 120 in both the treatment and control groups.
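The effect of masking on sample size can be sketched with the textbook normal-approximation formula for comparing two proportions. This is an illustration rather than the authors' method: it assumes a two-sided 5% significance level, equal arm sizes, and that the 45% masking applies equally to natural and therapy-induced controllers. The names `n_per_arm` and `VISIBLE_FRACTION` are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control: float, p_treated: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants needed per arm to tell two proportions apart (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha, z_beta = z(1 - alpha / 2), z(power)
    p_bar = (p_control + p_treated) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_control * (1 - p_control) + p_treated * (1 - p_treated))
    ) ** 2
    return ceil(numerator / (p_treated - p_control) ** 2)


# Share of controllers who stay below 1,000 (the CHAMP figure quoted in the article).
VISIBLE_FRACTION = 0.55

# Controller rates of 4% (baseline) and 20% (target), as in the article.
unmasked = n_per_arm(0.04, 0.20)
# Assumption: only 55% of controllers in either arm remain observable.
masked = n_per_arm(0.04 * VISIBLE_FRACTION, 0.20 * VISIBLE_FRACTION)

print(f"per arm, high threshold:  {unmasked}")
print(f"per arm, 1,000 threshold: {masked}")
```

Under these assumptions the requirement roughly doubles, landing close to the 60 and 120 participants reported in the article, although the exact numbers depend on the approximation used.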

As with the TVR studies, the researchers also estimated the maximum risk of HIV transmission during set-point studies using the same assumptions described above (no prevention strategies, unprotected sex, no rapid viral load testing, and a week’s delay before resuming ART). Because of the much longer trial durations and higher viral load thresholds (for this they referred to a set-point study that used 50,000), they estimated the maximum risk of HIV transmission was 13 per 1,000 for heterosexual sex, about 25 per 1,000 for insertive anal sex, and an incredibly high 214 per 1,000 for receptive anal sex.

Proposed hybrid trial design

Set-point studies are generally more robust because they are longer and generate more data, which means they tend to have higher statistical power than shorter TVR studies. “What this paper is saying is that the trade-off between the power that you get is not, in our opinion, always sufficient to warrant the increased risk that’s associated with a setpoint study,” Dr Deborah Cromer said.

Based on their modelling and estimates of maximum transmission risk, the researchers propose a hybrid model be used for treatment interruption trials. In their proposed scheme, treatment interruption trials would begin with a five-week TVR study in accordance with their finding that no detection benefits occur past this duration. Although TVR studies tend to be used when therapies target the HIV reservoir, people responding to therapies targeting enhanced immunity would also exhibit slower viral rebound (if the treatment was effective).

Beginning a trial to evaluate a potential cure treatment with a TVR study would expose participants to much shorter treatment interruptions while allowing researchers to determine whether that treatment showed enough benefit to proceed to a longer set-point study. If clinicians agree further study is warranted and participants controlling viral loads below 1,000 agree to move to the next phase, they would continue close monitoring for 24 weeks or until their viral loads exceed 1,000, when they would restart ART.

The researchers recommend including rapid viral load testing and same-day ART resumption to minimise the risk of transmission. In their estimation, limiting the set-point portion to those already showing control of viral loads below 1,000, reducing the viral load threshold to resume ART to 1,000, limiting the duration to 24 weeks, and incorporating point-of-care testing and same-day ART reduce the maximum risk of transmission to 0.2 per 1,000 for heterosexual sex, 0.35 per 1,000 for insertive anal sex, and 3.1 per 1,000 for receptive anal sex, a huge improvement over their estimate for traditional set-point studies.
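The size of these risk reductions is easier to compare as ratios. The short sketch below simply divides the maximum-risk estimates quoted in this article; several of them are given as approximate values (“about 7”, “around 70”), so the resulting ratios are rough. The dictionary keys and the helper `fold_reduction` are hypothetical names.

```python
# Maximum transmission risk per 1,000 participants, as quoted in the article.
RISK_PER_1000 = {
    "TVR, weekly reporting": {"heterosexual": 3.6, "insertive anal": 7.0, "receptive anal": 70.0},
    "TVR, same-day ART": {"heterosexual": 0.9, "insertive anal": 1.8, "receptive anal": 18.0},
    "traditional set-point": {"heterosexual": 13.0, "insertive anal": 25.0, "receptive anal": 214.0},
    "hybrid set-point phase": {"heterosexual": 0.2, "insertive anal": 0.35, "receptive anal": 3.1},
}


def fold_reduction(before: dict, after: dict) -> dict:
    """How many times lower the risk is in the 'after' design, per type of sex."""
    return {act: before[act] / after[act] for act in before}


tvr_gain = fold_reduction(RISK_PER_1000["TVR, weekly reporting"],
                          RISK_PER_1000["TVR, same-day ART"])
set_point_gain = fold_reduction(RISK_PER_1000["traditional set-point"],
                                RISK_PER_1000["hybrid set-point phase"])

for act in tvr_gain:
    print(f"{act:>15}: TVR x{tvr_gain[act]:.1f} lower, set-point x{set_point_gain[act]:.0f} lower")
```

On these figures, same-day ART resumption cuts the TVR estimates roughly fourfold, while the hybrid set-point phase is estimated to carry a risk well over fifty times lower than a traditional set-point study.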

If the initial TVR study indicates the treatment does not warrant further study (for example, if no benefit is detected), then researchers would avoid the cost of running a lengthy set-point study.


The researchers conclude that cure trials do not use enough participants to give them sufficient statistical power to detect moderate treatment benefits. Because most cure trials also don’t use control arms, quantifying a treatment’s benefit is extremely difficult. The researchers recommend collaborating to create a historical control database that would enable trial designs that don’t completely rely on placebo control groups. However, they do point out that using historical control data means including people more likely to have begun ART during chronic HIV infection and people using older ART formulations, which could act as confounders.

Nonetheless, reducing the number of people given a placebo, viral load thresholds, trial durations, and the risks of HIV transmission might encourage more people with HIV to participate in treatment interruption studies.

Though the researchers recommended a hybrid trial design, Dr Lau said they do not intend for every future trial to use this approach. Instead, they want HIV clinicians to weigh the benefits of learning something that contributes to developing a cure against the risks participants (and their sexual partners) face. “Where do we find that balance and could this hybrid model shift that towards something that's a bit safer, a bit more scientifically robust, and a bit more acceptable to the trial participants as well?”


Lau J et al. Balancing Statistical Power and Risk in HIV Cure Clinical Trial Design. The Journal of Infectious Diseases, online ahead of print, 2 February 2022.

doi: 10.1093/infdis/jiac032