First-line HIV treatment failures in non-B subtypes and recombinants: a cross-sectional analysis of multiple populations in Uganda

Background Our understanding of HIV-1 and antiretroviral treatment (ART) is strongly biased towards subtype B, the predominant subtype in North America and western Europe. Efforts to characterize the response to first-line treatments in other HIV-1 subtypes have been hindered by the availability of large study cohorts in resource-limited settings. To maximize our statistical power, we combined HIV-1 sequence and clinical data from every available study population associated with the Joint Clinical Research Centre (JCRC) in Uganda. These records were combined with contemporaneous ART-naive records from Uganda in the Stanford HIVdb database. Methods Treatment failures were defined by the presence of HIV genotype records with sample collection dates after the ART start dates in the JCRC database. Drug resistances were predicted by the Stanford HIVdb algorithm, and HIV subtype classification and recombination detection was performed with SCUEAL. We used Bayesian network analysis to evaluate associations between drug exposures and subtypes, and binomial regression for associations with recombination. Results This is the largest database of first-line treatment failures (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=1724$$\end{document}n=1724) in Uganda to date, with a predicted statistical power of 80% to detect subtype associations at an odds ratio of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\ge 1.2$$\end{document}≥1.2. In the subset where drug regimen data were available, we observed that use of 3TC was associated with a higher rate of first line treatment failure, whereas regimens containing AZT and TDF were associated with reduced rates of failure. In the complete database, we found limited evidence of associations between HIV-1 subtypes and treatment failure, with the exception of a significantly lower frequency of failures among A/D recombinants that comprised about 7% of the population. First-line treatment failure was significantly associated with reduced numbers of recombination breakpoints across subtypes. Conclusions Expanding access to first-line ART should confer the anticipated public health benefits in Uganda, despite known differences in the pathogenesis of HIV-1 subtypes. Furthermore, the impact of ART may actually be enhanced by frequent inter-subtype recombination in this region. Electronic supplementary material The online version of this article (10.1186/s12981-019-0218-2) contains supplementary material, which is available to authorized users.

Sequence data from the Ugandan study populations were processed by the following steps. Multiple HIV genotypes from the same patients, including partial sequences of protease and RT respectively, were identified by parsing the sequence labels with study-specific regular expressions. Plurality consensus sequences at baseline and, if applicable, treatment failure were produced for each patient by using MAFFT 1 (version 7.305b) to generate a multiple sequence alignment that included the HXB2 pol reference sequence (Genbank accession number K03455) as a scaffold for incomplete sequences. If the mean p-distance (proportion of nucleotide differences including mixtures) in the alignment with the HXB2 reference removed exceeded a cutoff of 5% and the total overlap was greater than 100 bases, the consensus sequence was flagged as the potential result of merging mislabeled samples. A total of 7 consensus sequences were flagged. When multiple sequences (i.e., baseline and one or more failures) were available for a given patient, we used only the earliest failure sample for all cross-sectional analyses.
Amino acid polymorphisms were extracted from the sequence data by pairwise local alignment of each nucleotide sequence against the HXB2 reference. We used an implementation of the Gotoh algorithm in HyPhy with match/mismatch scores of 5 and −4 and gap open and extension penalties of 10 and 1, respectively. Any base insertions relative to the reference were discarded. The aligned sequences were translated into amino acids and recorded within the HXB2 coordinate system.
We obtained Stanford resistance predictions for the resulting consensus sequences using a custom Python script to automate transactions with the Stanford HIVdb Sierra web service 2 (algorithm version 8.1.1). The XML outputs generated by the Sierra web service were parsed using a Python script to extract drug-specific scores and to link the input sequences with their original labels. We calculated genotypic susceptibility scores (GSSs) in R by mapping Stanford resistance scores to five values according to the HIVdb algorithm version 7.0.1 ([−∞, 9] → 1.0, [10,14] → 0.75, [15, 29] → 0.5, [30, 59] → 0.25, [60, ∞] → 0) to produce matrix S with rows corresponding to patients and columns for different drugs. A value of S i, j = 1.0 indicates that patient i's infection is highly susceptible to drug j. Next, we extracted and sorted the drug exposure variables to generate a binary matrix E with the same dimensions as S and matching column order. Finally, we calculated GSS by summing across rows of the element-wise product of S and E.
HIV subtype classification and recombination detection was performed on the sequence data using both SCUEAL 3 and REGA 4 . The SCUEAL analysis was executed on our computing cluster with a high-sensitivity configuration of the genetic algorithm: a population size of 128 models; a stopping criterion of 100 generations without score improvement; a maximum of 5 recombination breakpoints; and a minimum recombination fragment length of 100 nucleotides. We used the 2011 HIV-1 pol reference alignment distributed with SCUEAL, which comprises 442 'pure' pol sequences from all HIV-1 group M, N, O and P subtypes and 39 circulating recombinant forms (CRFs). SCUEAL produces two levels of subtype and recombinant predictions: a low-level prediction that reports subtype reference assignments for each recombination fragment detected in the sequence, and a high-level summary of the subtype assignments. For example, a sequence with 2 recombination breakpoints (3 fragments) that are assigned respectively to subtypes A1, D and A1 is summarized as an A1/D recombinant; recombinants of three or more subtypes or sub-subtypes are labeled 'Complex'. We further reduced the high-level summaries by merging recombinants of sub-subtypes A1, A2, A3 and A4 as uniformly subtype A. Next, we re-categorized the sequences into subtypes A, C and D and A/D recombinants; any other subtypes, recombinants or CRFs were placed in an 'other' category. All subsequent references to SCUEAL subtype assignments will correspond to these simplified categories unless specified otherwise.
The REGA subtyping method was accessed using the web interface provided by the Stanford HIVdb database. Best subtype/recombinant assignments were extracted from REGA CSV outputs using an R script and reduced to high-level categories for comparison to the SCUEAL results. Subtype assignments for non-recombinant sequences were verified by phylogenetic reconstruction against the 2010 subtype reference set curated by the Los Alamos National Laboratory (LANL) HIV Sequence Database (http://www.hiv.lanl.gov/).
We used MAFFT to generate a multiple sequence alignment from all sequences classified as non-recombinant by SCUEAL, and manually refined the alignment using AliView 5 . For the purpose of visualizing subtypes, we reconstructed a phylogeny using FastTree2 6 under the general time-reversible model of nucleotide substitution. Bootstrap support values were estimated by the default Shimodaira-Hasegawa test 7 in FastTree2. Phylogenies were visualized and manually annotated using FigTree (A. Rambaut, http://tree.bio.ed.ac.uk/software/figtree).

Text S2: Bayesian network analysis
We used an implementation of the order Markov chain Monte Carlo (MCMC) method 8 in HyPhy 9 to perform a Bayesian network analysis. Bayesian networks are a class of machine learning methods, where the network is a probabilistic graphical model of the conditional dependencies among a number of variables 10 . Drug exposures and treatment failures were encoded as binary outcomes. Five drug exposure variables (ATV, DRV, SQV, IDV and ETR) were excluded for having an insufficient number of positives (n ≤ 5). Region values were encoded for Fort Portal, Kampala, and Mbale; otherwise the record was excluded. Subtype values were encoded for the simplified categories (subtype A, C, and D, A/D recombinant, and 'other'). The final data set for the Bayesian network analysis comprised 1750 rows (observations) and 18 variables. We ran two replicate order-MCMC chain samples for a burn-in period of 10 5 steps, followed by 10 6 steps that were sampled at regular intervals of 2000 steps. Next, we compared the replicate chains to assess their convergence to the posterior distribution. An edge e i j was included in the consensus Bayesian network if the sum of marginal posterior probabilities P(e i j ) + P(e ji ) exceeded a cutoff of 0.9. Edges were assigned the direction i → j if the proportion P(e i j )/(P(e i j ) + P(e ji )) was greater than 0.8, j → i if it was lesser than 0.2, and left undirected otherwise. Figure S1: Statistical power to detect an association between first-line treatment failure and a specific subtype (X). We assume that 10 4 individuals have started first-line treatment, that the overall prevalence of treatment failure is p = 5%, and that the prevalence of the subtype is x = 20%. The subtype-specific prevalence of treatment failure was set to q = pr/(1 − p(1 − r)), where r is the expected odds ratio for a Fisher's exact test on the resulting 2 × 2 contingency table. We simulated sampling of individuals at random without replacement to obtain a target sample size N, and estimated power from the proportion of 1000 samples where Fisher's exact test rejected the null hypothesis given r > 1 (left). Next, we simulated weighted sampling where the proportion of treatment failures was modified to a specific percentage, given N = 5000, to reflect the impact of targeted sampling in a retrospective study (right). Note that the x-axis of the second plot was rescaled. Odds ratio Power 50% 20% 10% 5% Figure S2: Phylogenetic tree of non-recombinant HIV-1 sequences annotated by subtype. The tree was arbitrarily rooted at the split between HIV-1 subtypes C and D. Simple subtype categories are indicated by colour (dark red = subtype A, blue = subtype C, green = subtype D). Orange branches indicate sequences in the 'other' category, including subtype U and circulating recombinant forms (CRFs), which are classified as non-recombinant because they were included in SCUEAL reference set. Two subtrees with a high proportion of subtype U are labeled on the plot; one of these subtrees (bootstrap support 81%) was placed between clades of subtype A (94%) and G (99%). We note that subtype G was recently proposed to be itself the product of a recombination involving a subtype A parental lineage 11 . Sequences within the subtype D clade that were classified as 'other' tended to be CRF10. The scale bar indicates the expected number of nucleotide substitutions per site.   Tables   Table S1: Summary table of