HIV-1 mutational pathways under multidrug therapy
© Lawyer et al; licensee BioMed Central Ltd. 2011
Received: 29 March 2011
Accepted: 27 July 2011
Published: 27 July 2011
Skip to main content
© Lawyer et al; licensee BioMed Central Ltd. 2011
Received: 29 March 2011
Accepted: 27 July 2011
Published: 27 July 2011
Genotype-derived drug resistance profiles are a valuable asset in HIV-1 therapy decisions. Therapy decisions could be further improved, both in terms of predicting length of current therapy success and in preserving followup therapy options, through better knowledge of mutational pathways- here defined as specific locations on the viral genome which, when mutant, alter the risk that additional specific mutations arise. We limit the search to locations in the reverse transcriptase region of the HIV-1 genome which host resistance mutations to nucleoside (NRTI) and non-nucleoside (NNRTI) reverse transcriptase inhibitors (as listed in the 2008 International AIDS Society report), or which were mutant at therapy start in 5% or more of the therapies studied.
A Cox proportional hazards model was fit to each location with the hazard of a mutation at that location during therapy proportional to the presence/absence of mutations at the remaining locations at therapy start. A pathway from preexisting to occurring mutation was indicated if the covariate was both selected as important via smoothly clipped absolute deviation (a form of regularized regression) and had a small p-value. The Cox model also allowed controlling for non-genetic parameters and potential nuisance factors such as viral resistance and number of previous therapies. Results were based on 1981 therapies given to 1495 distinct patients drawn from the EuResist database.
The strongest influence on the hazard of developing NRTI resistance was having more than four previous therapies, not any one existing resistance mutation. Known NRTI resistance pathways were shown, and previously speculated inhibition between the thymidine analog pathways was evidenced. Evidence was found for a number of specific pathways between NRTI and NNRTI resistance sites. A number of common mutations were shown to increase the hazard of developing both NRTI and NNRTI resistance. Viral resistance to the therapy compounds did not materially effect the hazard of mutation in our model.
The accuracy of therapy outcome prediction tools may be increased by including the number of previous treatments, and by considering locations in the HIV genome which increase the hazard of developing resistance mutations.
Antiretroviral treatment has turned infection with the Human Immunodeficiency Virus (HIV-1) into a manageable disease. Yet eventually the HIV variants circulating in the patient develop resistance to the applied drugs. In many cases, it is known which mutations give resistance to which drugs, allowing accurate prediction of therapy efficacy based on HIV genotyping , with generally good results [2, 3]. Better understanding of which pre-existing mutations effect the development of resistance would further improve treatment, informing both choice of compounds for the current therapy and long-term strategies to maintain treatment options when the current therapy fails.
Reverse transcriptase inhibitors (RTIs) are the longest used and arguably the most important class of antiretrovirals. These compounds inhibit the reverse transcription of single-stranded viral RNA into double-stranded viral DNA suitable for incorporation into the host DNA. They are classified as either nucleoside (NRTIs), which incorporate into and terminate transcription of the viral DNA, or non-nucleoside (NNRTIs), which change the conformation of the RT polymerase into a non-functional state. RTIs are expected to remain a critical therapy component even as new classes of drugs, such as entry and integrase inhibitors, are added to the anti-HIV arsenal .
Accordingly, a great deal of work has investigated development of RTI resistance. Many RTI resistance mutations are known to occur in clusters . Two of the most studied NRTI clusters are the thymidine analog resistance mutations, TAM-1 (41L, 210W, 215Y) and TAM-2 (67N, 70R, 215F, 219E/Q), . which show evidence of appearing in ordered sequence [6, 7]. Less evidence supports pathways to NNRTI resistance, which can arise from a single mutation  with little impact on viral fitness [9–11]. Data from clinical trials of efavirenz (an NNRTI), however, suggested that mutation at location 103 preceded mutation at locations 100, 101, 108, and 225 [12, 13].
Standard of care generally dictates two NRTIs supplemented with additional compounds which may include an NNRTI. Understanding of the development of resistance under such multidrug regimes is far from complete [14, 15]. It has been shown that subjects with NNRTI resistance were at greater hazard of developing NRTI resistance, and vice versa , but not which specific factors explained this. Several sources have indicated interactions and other forms of crossplay between NRTI and NNRTI resistance mutations, but have not demonstrated clear pathways [4, 17].
Many of the mutations which commonly occur during therapy do not have a known, direct connection with drug resistance. In the data studied here, 45 different locations frequently harbored mutations at therapy start; 32 of these are not on the International AIDS Society list of RTI resistance mutations . Patterns within these other mutations may underly the just commented on interplay, lending high interest to pathways leading from commonly mutant locations to known resistance sites.
We place the question of identifying mutational pathways in a survival analysis framework. A mutational pathway from (genetic) location a to b is signalled if mutation at location a alters the hazard of mutation at location b. Survival analysis extends the specificity of investigations based on co-occurrence of mutations (i.e. [4, 7, 17]) by indicating both excitatory and inhibitory influences, incorporating temporal dynamics, and making full use of the data despite the abundant censoring. The framework further allows for control of nuisance parameters which are inherent in clinical data. In the current case, the most important of these is having a high number of previous therapies. While other techniques from survival analysis have been previously applied to RTI resistance [11, 13, 16], and protease inhibitor resistance , this is, to our knowledge, the first use of such methods to directly addresses the question of specific mutational pathways.
We tested for pathways between known resistance sites, and also for pathways between commonly mutant locations. Pathways were signalled by a two stage filtering process. For a given mutational site, we first applied smoothly clipped absolute deviation (SCAD) , a form of regularized regression, to identify a subset of pre-existing mutations which showed evidence of influencing the hazard of mutation at that site. These were further screened via standard significance testing to identify those with strong evidence for effect. The model also tested for effects associated with the clinical variables. Longitudinal data were available from the EuResist database . EuResist maintains, to our knowledge, the largest HIV resistance database available for public research.
3TC ABC AZT
Therapy stop causes
Therapy stop causes
Change of therapy
Age 1st genotyping
Num prev therps
Days between genotypings
During this investigation only the location was specified and not the amino acid substitution. The assumption is that mutations detectable by population sequencing would be heavily influenced by treatment history. Binarization offered several additional advantages. It simplifies ambiguities arising from the genotyping method. Further, the study included commonly mutant locations not necessarily listed as important to resistance and thus with little literature support to decide which substitutions were relevant. Binarization allowed these locations to be treated identically to the known resistance mutations.
A location was considered to have a preexisting mutation if its genotype at therapy start did not completely agree with the wild type. For example, a location with wild-type "M" which showed the mixture "MV" at therapy start would be coded as a mutation. A mutation was considered to have occurred during therapy if the observed sequence did not contain an amino acid observed in the preexisting sequence and was not wild type. A preexisting "ML" which changed to "L" during the course of therapy would not be considered a mutation, while a change to "R" would be.
where hc i (t) is the hazard that location c becomes mutant in subject i during therapy and hc0(t) is the baseline hazard of mutation at c. The Cox model does not require specification of hc0(t), which is integrated out during the model fitting. A preexisting mutation at location j th in subject i is coded by p1, i ... pj,i. Having the median or fewer previous therapies was coded by nuisa,i. Other potential nuisance factors, such as the total number of mutations, viral resistance to the therapy compounds, and treatment start year were deemed non-informative in preliminary investigations. An event was signalled when location c was mutant in subject i at the second genotype but not the first. The time to event was the number of days between the two genotypes. While the direction of effect is likely to be correct, this simplifying assumption regarding time to event implies that estimates of magnitude should be regarded with caution.
Within this framework, identification of pathways reduces to a variable selection problem; selecting those regressors with strong evidence of effect on the hazard of developing the target mutation. We first filtered the list of potential regressors using smoothly clipped absolute deviation (SCAD) . SCAD is a form of regularized regression, similar to the LASSO, but with the added benefit that the regularization parameter scales with the magnitude of the regression coefficient. The regressors included in the best SCAD model were then tested for significance using the Wald estimate, and those with p < 0.01 were deemed to have sufficient evidence to suggest a pathway.
While each individual model could only detect one-step pathways (i.e. pre-existing mutation at locations p3 and p7 increased the hazard of mutation at c), fitting the model to each of the candidate regressors in turn produces an adjacency matrix which can be viewed as a directed graph allowing multi-step pathways. We first searched for pathways among known resistance sites by considering the combined list of NRTI and NNRTI resistance locations. We then searched for pathways in our list of commonly mutant locations.
Statistics and figures were created in the R software environment, version 2.7.1 . SCAD was implemented using the R package SIS . Cox model fitting and the Wald estimate were performed using the R package survival . Visualization was aided by the packages ggplot2  and igraph .
The basic model identifies pre-existing mutation with strong evidence for effect on the hazard of developing mutation at one specific target location c. Given this location, therapy data is restricted to those at risk of developing mutation at c, that is, those whose HIV genotype did not exhibit mutation at c at the start of treatment. When c was an NNRTI resistance location, therapies were further restricted to those receiving an NNRTI. Note that all of the therapies under consideration included NRTIs. Candidate regressors were also dependent on c. Obviously c itself could not be a candidate. Further, some locations never exhibited a preexisting mutation in at-risk therapies, and were dropped to prevent convergence issues. Finally, if data is not available on a specific pre-existing mutation for more than 10 of the at-risk therapies, it was dropped from the model.
The single factor which most consistently influences the hazard of mutation at locations with known involvement in NRTI resistance is the number of previous therapies. The median number of previous therapies in the data studied here is 4. Therapies for patients with 4 or less previous therapies are associated with less than half the risk of developing RT resistance. The effect is observed at 9 out of 16 locations associated with NRTI resistance: codons 65, 67, 69, 70, 74, 115, 210, 215, and 219. The median hazard ratio is 0.37 (ranging from 0.15 to 0.52), with the lowest 95% confidence bound at 0.06 and the highest at 0.81. This effect is not due to the presence of more known resistance mutations in patients with a large number of previous therapies, as known resistance mutations were regressed out by the model. In addition, further testing indicated that genotypically estimated viral resistance has negligible effect in our models. The finding could, however, represent the accumulation of mutations in regions about which little is known because they are rarely sequenced. For example, the first investigation of mutations in the connection and ribonuclease H domains of RT has shown that such mutations strongly influence AZT resistance in combination with the TAM pathways .
Hazard Ratios for pathways between known resistance locations
Hazard ratios between known resistance locations
41 → 108
67 → 70
67 → 190
67 → 219
70 → 210*
70 → 219
74 → 100
77 → 103
115 → 106
116 → 62
151 → 116
181 → 65
184 → 181†
184 → 210
190 → 184
210 → 70
215 → 41
215 → 65
Hazard Ratios for pathways from commonly mutant locations to known resistance locations
Hazard ratios; common to resistance locations
43 → 103
49 → 103
67 → 70
67 → 190
68 → 184
70 → 103
70 → 181
70 → 219
74 → 184
118 → 219
135 → 210
142 → 67
162 → 70
179 → 103
184 → 181
184 → 190
196 → 103
196 → 190
196 → 210
200 → 190
210 → 70
211 → 210
214 → 69
215 → 41
Inhibition of TAM-2 by TAM-1 is also observed. Mutation at location 210 is associated with a tenfold reduction in the risk of mutation at 70. A similar effect is witnessed in the reverse direction, though the reduction is only threefold, and the significance level is just above threshold (p = 0.011). The to date most thorough report on the TAM pathways  presented speculative evidence that TAM-1 inhibits TAM-2. Independent support for this inhibition was presented by Sing et. al . Other studies, however, have reported that some patients develop mutations in both TAM clusters, or switching from TAM-1 to TAM-2 .
No pathways are seen between NNRTI resistance locations. This was not unexpected, as in vitro investigations suggest that resistance to most NNRTIs can result from a single mutation . This evidence is corroborated by a Bayesian analysis of combinatorial mutation patterns which indicates that interactions among mutations granting nevirapine (an NNRTI) resistance were very weak . Pathways have been suggested, however, in data from a clinical trial of efavirenz [12, 13].
The method also indicated several cross-class resistance pathways. Specific pathways from NRTI to NNRTI resistance included the following, all of which showed multi-fold increase in hazard: 41 → 108, 67 → 190, 74 → 100, and 77 → 103. Previous work has observed that mutation at 74 is associated with increased frequency of NNRTI failure ; mutation at location 100 grants resistance to most NNRTIs . Some evidence also suggests that the L74V mutation compensates loss of viral fitness incurred by the double NNRTI resistance mutations L100I + K103N .
Survival analysis has shown that having any (N)NRTI resistance mutation increases the hazard of developing a mutation in the other class, . We note that Healy et al.'s findings of general dependence showed stronger effect sizes than ours. This slight divergence could be dependent on the selection of subjects. Most of Healy et al.'s subjects had few previous treatments, while the EuResist subjects had failed a median of four previous therapies. As the EuResist subjects had a number of accumulated mutations, the risk profile of mutations which could arise in these subjects is likely to differ substantially from Healy et al.'s subjects. The evolutionary dynamics might also differ between the two groups. The accumulated mutations in the HIV variants circulating in patients with a long treatment history are likely to have reduced the virus's replicative capacity . Differences could also be specific to viral subtype. We note that neither Healy et. al. nor the report of the clinical trial which supplied their data provide subtype information. Finally, Healy et al.'s data came from a prospective study, whereas the EuResist data is retrospective.
Pathways which lead to known resistance sites could prove informative in predicting the development of (further) resistance. A number of pathways to NRTI resistance sites begin from locations not listed as providing RTI resistance, though the estimated increase in hazard is in general lower than that observed between known resistance mutations (see Table 5). Their influence suggests that therapy outcome prediction engines could be improved by incorporation of the following pathways: 142 → 67, 214 → 69, and 162 → 70. It was also observed that 68 → 184, and that mutation at either 177 or 181 increases the hazard of 68. This final observation suggests an indirect 181 → 68 → 184 pathway from NNRTI to NRTI resistance. An inhibitory pathway was also identified, with mutation at 135 reducing the hazard of mutation at 210 by a factor of 0.38 (95% confidence bounds of 0.25, 0.59).
Several pre-existing mutations are associated with increased hazard for mutation granting NNRTI resistance, concurring with previous research suggesting that pathways to NNRTI resistance may start from previously unsuspected mutations . Notably, mutation at any of 43, 179,or 196 increases the hazard of mutation at 103.
196 or 200 increase the hazard at 190. Mutation at location 103 or 190 grants strong resistance to both EFV and NVP. This again suggests that consideration might be given to mutations at locations 43, 179, 196, or 200 before prescribing these NNRTIs. Location 43 could play a part in NRTI to NNRTI resistance, since 210 → 43 → 103. The 184 → 181 pathway suggested in the known resistance sites was again observed, and now with sufficient evidence to pass our threshold.
Several mutations seemed to decrease the hazard of NNRTI resistance. Notably, mutation at location 70 (part of the TAM-1 complex granting NRTI resistance) strongly inhibits mutation at location 181. Location 184 was observed to slightly inhibit mutation at location 190.
This work is, as far as we know, the first to fully employ the Cox model to identify specific mutational pathways. The proportional hazards approach is more sophisticated than methods which do not include time to event or censored events and which cannot control for nuisance parameters. Modeling was simplified, however, by assuming right censored data whereas interval censoring better describes the process generating the data. This simplifying assumption was required as we had no good indication of what an appropriate interval would be. It was justified in that the principle benefit of interval censoring is a more accurate estimate of effect size, not effect significance. Effect size did not play a role in our estimation of pathways.
Some evidence suggests that viral subtype also effects hazard of mutation. Several studies have observed a relationship between resistance mutations and viral subtype, as reviewed in  and . A difficulty with such studies, however, is that the data on non-B subtypes tends to come from resource-limited regions where treatment regimes do not always meet European standards of best clinical care. This situation is likely to change as the demographics of the disease change; the UK Health Protection Agency now reports that the majority of UK infections are non-B . Also, molecular dynamics suggest that different patterns of DNA synthesis in subtype C variants compared to subtype B increase the hazard of developing the K65R mutation . Since subtype prevalence varies greatly by patient risk category, it is possible that analysis of subjects in the heterosexual risk group might demonstrate different pathways, or the same pathways with different hazard ratios, than observed in the full data. On the other hand, binarization of the data may have reduced this effect. The V106M resistance mutation may be preferred over V106A in subtype C ; the current study gives both of these events the same encoding. Preliminary analysis found subtype to be non-informative to the current model as applied to the EuResist data.
Data for the study came from a large observational database. This can lead to several forms of bias in the data. While we believe these effects to be small, they should be noted. The first bias deals with model assumptions. Our data was censored at the time of the second genotyping. This event is not necessarily independent from our outcome measure, the hazard of mutation. Dependence between the two, however, is difficult to ascertain. The bias would be strongest when the genotyping was conducted due to virological failure of the therapy, a factor which would increase the hazard of additional mutation. On the other hand, virological failure could be caused by the development of resistance mutations, which are present in the model and whose effect is therefore already included in the calculations.
The remaining biases are due to the clinical practices which underly the data gathering. For some, and perhaps most, of the therapies, genotyping was used to select treatment. Neither drug choices nor drug combinations were random, nor were the associations between drugs and resistance mutations. The EuResist database does not record which, if any, patients were left on failing regimens. Continuing a therapy in the presence of detectable viral loads is linked to increased hazard of mutation . Finally, it should always be remembered that correlation does not imply causality. The underlying reason for an observed dependency could be influenced by shared inheritance from a founder virus or shared evolutionary pressures.
The current work did not investigate pathways to protease inhibitor (PI) pathways resistance as our primary interest was on cross-class RT mutational pathways. Nonetheless, some key work on pathways to protease inhibitor (PI) resistance should be noted. Many of the groups researching RTI resistance have also worked on PI resistance, and the development of statistical approaches has come from both endeavors. PI resistance mutations form clusters . Temporal dependencies in HIV genetic changes in response to PIs have been indicated via Bayesian model selection . Bayesian networks have been used to uncover direct influences between protein residues and treatment in clinical settings, and were able to determine the specific role of many resistance mutations against nelfinavir . Phylogenetic approaches are also informative in determining pathways . A cross-class pathway with promising clinical implications is a finding that patients with PI resistance are less likely to develop resistance to Bevirimat (a maturation inhibitor) than those who are PI-naive .
This investigation found specific pathways both within and across drug classes. It was further found that the most consistent factor speeding the development of NRTI resistance was not any known mutation, but rather having failed more than four previous therapies.
Joachim Büch assisted with accessing the EuResist database. This study benefited from data provided by EuResist Network EIDB, which receives financial support via the CHAIN Project funded by the European Community's Seventh Framework Programme FP7/2007-2013 under grant agreement #223131.
This work was supported by the Max Planck Institute. Support for AA was provided by the CHAIN project (EU grant HEALTH-F3-2009-223131). None of the funding organizations took part in collection, management, analysis or interpretation of the data.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.