Electronic health records can help select candidates for HIV PrEP

Douglas Krakower at IDWeek 2016. Photo by Liz Highleyman, hivandhepatitis.com

A machine learning algorithm used to analyse electronic health records (EHRs) identified high-risk individuals who could potentially benefit from HIV pre-exposure prophylaxis (PrEP), according to a report presented this week at IDWeek 2016 in New Orleans. Out of 800,000 patients in a large EHR database, more than 8000 were found to be potential PrEP candidates.

The US Food and Drug Administration (FDA) approved Truvada (tenofovir/emtricitabine) for HIV prevention in July 2012. Studies of gay and bisexual men have shown that PrEP reduces the risk of acquiring HIV by more than 90% if used consistently, with no new infections among people who took it at least four times a week.

PrEP use has accelerated rapidly in recent years as clinical trials and demonstration projects continue to confirm its safety and efficacy. It has been difficult to estimate how many people have used PrEP because this information is not centrally collected. A recent survey of retail pharmacies by Gilead Sciences found that more than 79,000 people in the US have taken PrEP over the past four years. But the Centers for Disease Control and Prevention estimates that more than 1.2 million people could potentially benefit from PrEP, including a quarter of sexually active gay and bisexual men.




Douglas Krakower of Beth Israel Deaconess Medical Center described an effort to develop an automated algorithm to identify people at increased risk for acquiring HIV using routinely collected information from electronic health records. This study was selected as a featured abstract by the HIV Medicine Association, one of the four infectious disease societies that sponsor IDWeek.

One of the major barriers to getting more people on PrEP is that sexual health and risk evaluations are not done as part of routine care, as providers have competing demands and may lack the training and comfort to discuss sexual health with their patients, Dr Krakower noted as background.

The effort involved three steps. The researchers first extracted potentially relevant data from the electronic health records of Atrius Health, a large group practice in the Boston area with approximately 800,000 patients. They looked at more than 100 variables including patient demographics, recorded diagnoses, medication prescriptions, laboratory tests and procedures.

The team then matched each of the 138 patients who became infected with HIV between 2006 and 2015 to 100 control subjects of the same sex and similar duration of Atrius Health membership who remained HIV-negative, comparing their characteristics and risk factors.
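The matching step described above can be sketched roughly as follows. This is an illustration only, not the investigators' code: the `Patient` fields, the one-year tolerance for "similar duration" (`max_year_gap`), random sampling of controls, and the toy cohort are all assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: int
    sex: str
    membership_years: float
    acquired_hiv: bool


def match_controls(patients, controls_per_case=100, max_year_gap=1.0, seed=0):
    """For each case, sample HIV-negative controls of the same sex whose
    membership duration is within max_year_gap years of the case's.

    Assumption: controls are drawn at random without replacement for a
    given case, but the same control may be reused for different cases.
    """
    rng = random.Random(seed)
    cases = [p for p in patients if p.acquired_hiv]
    controls = [p for p in patients if not p.acquired_hiv]
    matched = {}
    for case in cases:
        eligible = [
            c for c in controls
            if c.sex == case.sex
            and abs(c.membership_years - case.membership_years) <= max_year_gap
        ]
        k = min(controls_per_case, len(eligible))
        matched[case.patient_id] = rng.sample(eligible, k)
    return matched


# Toy cohort (entirely synthetic): 5 cases among 5000 patients.
toy_rng = random.Random(42)
cohort = [
    Patient(i, toy_rng.choice(["F", "M"]),
            round(toy_rng.uniform(0.5, 10.0), 1), acquired_hiv=(i < 5))
    for i in range(5000)
]
matches = match_controls(cohort)
for case_id, ctrls in matches.items():
    print(case_id, len(ctrls))
```

In a real analysis the matching criteria and the handling of reused controls would follow the study protocol, which the report does not spell out.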

They next used logistic regression modelling and machine learning to predict incident HIV infections among case versus control patients. Logistic regression is a more traditional approach that makes assumptions about what the data will look like, Krakower explained, while machine learning is a new approach in which the computer learns to recognise patterns in the data that may not have been apparent at the outset.

Finally, the researchers looked at whether the distribution of HIV prediction scores in the Atrius Health general population could point to a sub-population who might be candidates for PrEP.

In a comparison of computer algorithms to each other and to logistic regression, several machine learning methods did a good job – better than logistic regression – at predicting incident HIV infection. One method known as Ridge Regression demonstrated the best predictive performance (AUC = 0.76), and the LASSO method also performed well.
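To make the reported metric concrete, here is a minimal sketch of a ridge-penalised (L2) logistic regression and an AUC calculation in plain Python. It is not the study's model: the three binary features, the patient counts, the penalty strength `l2`, and the gradient-descent settings are invented for illustration, and the AUC is computed on the training data, which flatters the model. AUC is the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as half.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, l2=0.1, lr=0.5, epochs=300):
    """Batch gradient descent on mean log loss + (l2 / 2) * ||w||^2.
    The intercept b is not penalised."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
    return w, b


def predict(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def auc(case_scores, control_scores):
    # Share of case/control pairs where the case scores higher (ties = 0.5).
    wins = 0.0
    for s_case in case_scores:
        for s_ctrl in control_scores:
            if s_case > s_ctrl:
                wins += 1.0
            elif s_case == s_ctrl:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Synthetic data. Hypothetical features: [anal cytology test,
# benzathine penicillin prescription, positive gonorrhoea test].
patterns = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
cases = [p for p in patterns for _ in range(10)] + [[0, 0, 0]] * 20
controls = [p for p in patterns for _ in range(5)] + [[0, 0, 0]] * 485
X = cases + controls
y = [1] * len(cases) + [0] * len(controls)

w, b = fit_ridge_logistic(X, y)
case_scores = [predict(w, b, x) for x in cases]
control_scores = [predict(w, b, x) for x in controls]
print("weights:", [round(wj, 3) for wj in w])
print("in-sample AUC:", round(auc(case_scores, control_scores), 3))
```

A LASSO model would replace the squared penalty with the sum of absolute weights, which tends to push some weights to exactly zero. A fair comparison of methods would also score patients held out from fitting rather than the training set.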

Looking more closely at a few of the variables associated with HIV risk, 6.5% of people who became infected with HIV had undergone anal cytology testing, compared to less than 0.1% of uninfected control subjects. Similarly, 3.6% of people with HIV had received a recent prescription for benzathine penicillin G (Bicillin) – used to treat syphilis – compared to less than 0.1% of uninfected controls. And 5.8% of people with HIV had ever had a positive gonorrhoea test, compared to less than 0.1% among control subjects.

The vast majority of members had risk scores indicating they were at very low or low risk of HIV infection. However, after excluding people who were already HIV-positive or currently receiving PrEP, the algorithm identified 8414 individuals – 1.1% of the general population – as potential PrEP candidates. 

"When you see 8000 patients, that's a lot to think about providing PrEP to, but if you have a primary care provider handling 1000 patients, this 1.1% would represent 11 of their patients," Dr Krakower said. "I think this is a clinically reasonable and manageable sub-group of the population for more intensive screening."
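The arithmetic behind that comparison is simple to check. The sketch below uses the figures reported in the article. The population of 800,000 is approximate, and the exact denominator after exclusions is not given, so the computed share is only indicative. The function names are made up for this example.

```python
def flagged_share(n_flagged, n_population):
    # Fraction of the population flagged as potential PrEP candidates.
    return n_flagged / n_population


def expected_per_panel(panel_size, share):
    # Expected number of flagged patients in one provider's panel.
    return round(panel_size * share)


share = flagged_share(8414, 800_000)  # 800,000 is an approximate figure
print(f"{share:.1%}")                   # prints 1.1%
print(expected_per_panel(1000, 0.011))  # prints 11
```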

The investigators concluded that automated analysis of data routinely stored in electronic health records can identify patients at increased risk for HIV who are potential candidates for PrEP.

They next plan to optimise the predictive algorithm and validate it with patients at Fenway Health in Boston, a community health centre that specialises in care for sexual and gender minorities where PrEP use is much more common. They then hope to conduct a pilot study with clinicians to see if the algorithm leads to increased appropriate use of PrEP in a real-world setting.


Krakower D et al. Automated identification of potential candidates for HIV pre-exposure prophylaxis using electronic health record data. IDWeek, abstract 860, 2016.