Standardized representation, visualization and searchable repository of antiretroviral treatment-change episodes

Background To identify the determinants of successful antiretroviral (ARV) therapy, researchers study the virological responses to treatment-change episodes (TCEs) accompanied by baseline plasma HIV-1 RNA levels, CD4+ T lymphocyte counts, and genotypic resistance data. Such studies, however, often differ in their inclusion and virological response criteria, making direct comparisons of study results problematic. Moreover, the absence of a standard method for representing the data comprising a TCE makes it difficult to apply uniform criteria in the analysis of published studies of TCEs. Results To facilitate data sharing for TCE analyses, we developed an XML (Extensible Markup Language) Schema that represents the temporal relationship between plasma HIV-1 RNA levels, CD4 counts and genotypic drug resistance data surrounding an ARV treatment change. To demonstrate the adaptability of the TCE XML Schema to different clinical environments, we collaborated with four clinics to create a public repository of about 1,500 TCEs. Despite the nascent state of this TCE XML Repository, we were able to perform an analysis that generated a novel hypothesis pertaining to the optimal use of second-line therapies in resource-limited settings. We also developed an online program (TCE Finder) for searching the TCE XML Repository and another program (TCE Viewer) for generating a graphical depiction of a TCE from a TCE XML Schema document. Conclusions The TCE Suite of applications – the XML Schema, Viewer, Finder, and Repository – addresses several major needs in the analysis of the predictors of virological response to ARV therapy. The TCE XML Schema and Viewer facilitate sharing data comprising a TCE. The TCE Repository, the only publicly available collection of TCEs, and the TCE Finder can be used for testing the predictive value of genotypic resistance interpretation systems and potentially for generating and testing novel hypotheses pertaining to the optimal use of salvage ARV therapy.


Background
To identify determinants of successful antiretroviral (ARV) therapy in HIV-1-infected patients for whom a previous ARV treatment regimen has failed, researchers study clinical data associated with treatment-change episodes (TCEs) [1]. These studies characterize the relationship between past ARV treatments, plasma HIV-1 RNA levels, HIV-1 drug resistance genotype results, and the subsequent virological response to a salvage therapy regimen [2][3][4][5][6][7][8]. Such studies, however, often differ in their inclusion criteria, salvage therapy requirements, and definition of virological response.
To facilitate data sharing and analyses of combined data, we have developed a TCE XML Schema to represent treatment-change episodes. XML (Extensible Markup Language) is a markup language for encoding documents that are readable by both humans and computers. An XML Schema defines constrained elements and attributes that can ensure a consistent representation of complex data. The TCE XML Schema is a richer representation of data than the flat files or spreadsheets that form the basis for most analyses [9]. Here we collaborated with four clinics to create a public repository of 1,500 TCE XML documents represented using the TCE XML Schema (TCE Repository). To demonstrate the utility of such a repository for hypothesis generation and knowledge discovery, we analyzed a subset of the repository to obtain insights into the optimal use of second-line therapy in resource-limited settings.
We also describe two online programs that complement the TCE XML Schema: a TCE Viewer and a TCE Finder. The TCE Viewer accepts a valid TCE XML Schema document and creates a graphical depiction of the temporal relationship between ARV regimens, plasma HIV-1 RNA levels, peripheral blood CD4+ T lymphocyte counts (CD4 counts), and genotypic resistance data. The TCE Finder searches the TCE Repository according to user-defined criteria and retrieves those that meet the search criteria.

TCE XML schema
The TCE XML Schema elements and constraints were developed to represent the temporal relationship among ARVs, plasma HIV-1 RNA levels, CD4 counts and genotypic drug resistance data surrounding a treatment change. Each valid TCE XML Schema document must have a treatment change time point (baseline or time zero). The TCE baseline must be assigned a date or, at the very minimum, a calendar year. The preceding and subsequent data are demarcated by the number of weeks from baseline. The complete treatment history received before baseline is represented as a list of regimens, their durations, and associated plasma HIV-1 RNA levels and CD4 counts. However, if these data are not available, the XML Schema can represent the past treatment history as a list of one or more ARVs or ARV classes. Genotypic drug resistance test results are represented as nucleotide sequences or lists of amino acid mutations obtained prior to the treatment change. Optional elements include the nadir CD4 count, gender, age, ethnicity, and a metadata element for either annotating or just naming the TCE. The TCE XML Schema can be found at http://hivdb.stanford.edu/TCEs/schema/TCE.xsd.
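To make the structure described above concrete, the following is a minimal sketch of what a TCE document could look like and how it might be read. The element and attribute names (TCE, Baseline, PastRegimen, SalvageRegimen, Genotype, RNA, CD4) and all values are illustrative assumptions, not the names defined in the published TCE.xsd. The sketch only mirrors the ideas stated in the text: a baseline with at least a calendar year, and all other data positioned in weeks relative to baseline.

```python
import xml.etree.ElementTree as ET

# Hypothetical TCE document. Element and attribute names are assumptions
# made for illustration; they are NOT taken from the published TCE.xsd.
TCE_XML = """
<TCE name="example-001">
  <Baseline year="2006"/>
  <PastRegimen startWeek="-120" stopWeek="0">
    <ARV>AZT</ARV><ARV>3TC</ARV><ARV>EFV</ARV>
  </PastRegimen>
  <Genotype week="-2">
    <Mutation gene="RT">M184V</Mutation>
    <Mutation gene="RT">K103N</Mutation>
  </Genotype>
  <RNA week="-3" log10="4.6"/>
  <CD4 week="-3" count="210"/>
  <SalvageRegimen startWeek="0" stopWeek="52">
    <ARV>TDF</ARV><ARV>FTC</ARV><ARV>LPV/r</ARV>
  </SalvageRegimen>
  <RNA week="12" log10="2.1"/>
  <RNA week="24" log10="1.7"/>
</TCE>
"""


def summarize(xml_text):
    """Parse the hypothetical TCE document into a plain dictionary."""
    root = ET.fromstring(xml_text)
    # Sort RNA measurements by their week offset from baseline (week 0).
    rna = sorted(
        (int(e.get("week")), float(e.get("log10"))) for e in root.findall("RNA")
    )
    return {
        "baseline_year": int(root.find("Baseline").get("year")),
        "past_arvs": [e.text for e in root.findall("PastRegimen/ARV")],
        "salvage_arvs": [e.text for e in root.findall("SalvageRegimen/ARV")],
        "mutations": [e.text for e in root.findall("Genotype/Mutation")],
        "rna_before": [(w, v) for w, v in rna if w < 0],
        "rna_after": [(w, v) for w, v in rna if w >= 0],
    }


summary = summarize(TCE_XML)
```

Expressing every measurement as a week offset from baseline, rather than as a calendar date, is what lets documents from different clinics be aligned on a common time axis.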
To demonstrate the adaptability of the TCE XML Schema to different clinical environments, we collaborated with four clinics from Kaiser-Permanente Medical Care Program-Northern/Southern California, University of Barcelona and EuResist Network Database. The study was approved by the Stanford University Institutional Review Board ("Clinical Significance of HIV-1 Drug Resistance: A Clinic Based Approach", Protocol ID: 13900).
The TCE Viewer accepts an XML file, validates the file against the TCE Schema, and generates a graphical depiction of the TCE containing three sections: (i) a figure with the ARV regimens, plasma HIV-1 RNA levels, and CD4 counts preceding and following the treatment change; (ii) a table with one or more genotypic resistance test results preceding the treatment change; and (iii) a compressed summary of the virological and immunological responses to past ARV regimens. The TCE Viewer provides an additional mechanism of validation because many clinicians are adept at visually recognizing anomalous clinical patterns that may have resulted from data entry errors.

TCE finder
The TCE Finder enables users to identify TCEs meeting user-defined search criteria (http://hivdb.stanford.edu/TCEs/cgi-bin/TCE_finder.cgi). The TCE Finder accepts input parameters pertaining to the ARVs used prior to the change in therapy and/or to the ARVs used for salvage therapy. A summary of the TCEs matching the input criteria is then presented to the user in a table that contains the following fields: ARVs received and genotypic resistance test results obtained prior to baseline, the salvage ARV regimen, and plasma HIV-1 RNA levels obtained while taking the salvage ARV regimen. The table also contains a thumbnail image of each TCE that links to the graphical depiction of the TCE created by the TCE Viewer.
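The kind of search described above can be sketched as a simple filter over TCE records. This is not the TCE Finder's implementation: the record layout, the function name find_tces, and the matching rule (every requested drug must be present) are assumptions chosen for illustration.

```python
def find_tces(tces, prior_arvs=(), salvage_arvs=()):
    """Return TCEs containing all requested prior and salvage ARVs.

    Assumed matching rule: a TCE matches when the requested drugs are a
    subset of the drugs recorded for it. Empty criteria match everything.
    """
    prior, salvage = set(prior_arvs), set(salvage_arvs)
    return [
        t for t in tces
        if prior <= set(t["past_arvs"]) and salvage <= set(t["salvage_arvs"])
    ]


# Hypothetical records, for illustration only.
EXAMPLE_TCES = [
    {"id": "a", "past_arvs": ["AZT", "3TC", "EFV"], "salvage_arvs": ["TDF", "FTC", "LPV/r"]},
    {"id": "b", "past_arvs": ["D4T", "3TC", "NVP"], "salvage_arvs": ["AZT", "DDI", "ATV/r"]},
    {"id": "c", "past_arvs": ["AZT", "3TC", "LPV/r"], "salvage_arvs": ["TDF", "FTC", "EFV"]},
]

efv_experienced = find_tces(EXAMPLE_TCES, prior_arvs=["EFV"])
lamivudine_then_tenofovir = find_tces(
    EXAMPLE_TCES, prior_arvs=["3TC"], salvage_arvs=["TDF"]
)
```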

TCE repository
To demonstrate the ability of the TCE Schema to represent data from different clinics, we collaborated with four clinics to create a publicly available TCE repository. For the purposes of our collaboration we selected TCEs meeting each of the following criteria: (i) evidence for virological failure prior to a change in therapy, defined as a plasma HIV-1 RNA level of >1,000 copies/ml obtained within 8 weeks before the change; (ii) a complete list of ARVs received prior to baseline; (iii) a change in ARVs occurring within 24 weeks of a baseline genotypic resistance test; (iv) a new salvage regimen administered for at least four weeks; (v) one or more CD4 counts within 24 weeks prior to the ARV change; and (vi) two or more plasma HIV-1 RNA levels within the first 36 weeks while taking the salvage regimen.
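The six inclusion criteria can be expressed as a single predicate. The sketch below is illustrative and is not the code used to build the Repository. The TCE record type and its field names are assumptions, as is the boundary handling: windows are treated as inclusive, and a genotype is assumed to precede baseline.

```python
from dataclasses import dataclass


@dataclass
class TCE:
    """Hypothetical record; weeks are offsets from baseline (week 0)."""
    rna: list                   # (week, copies_per_ml) pairs
    cd4_weeks: list             # weeks at which CD4 counts were obtained
    genotype_weeks: list        # weeks at which genotypes were obtained
    salvage_weeks: float        # duration of the salvage regimen
    complete_arv_history: bool  # criterion (ii)


def meets_inclusion_criteria(t):
    # (i) virological failure: >1,000 copies/ml within 8 weeks before change
    failure = any(-8 <= w <= 0 and c > 1000 for w, c in t.rna)
    # (iii) genotype within 24 weeks of the change (assumed: before baseline)
    genotype = any(-24 <= w <= 0 for w in t.genotype_weeks)
    # (iv) salvage regimen administered for at least four weeks
    long_enough = t.salvage_weeks >= 4
    # (v) one or more CD4 counts within 24 weeks prior to the change
    cd4 = any(-24 <= w <= 0 for w in t.cd4_weeks)
    # (vi) two or more RNA levels in the first 36 weeks on the salvage regimen
    last_week = min(36, t.salvage_weeks)
    follow_up = sum(1 for w, _ in t.rna if 0 < w <= last_week) >= 2
    return all(
        [failure, t.complete_arv_history, genotype, long_enough, cd4, follow_up]
    )


eligible = TCE(
    rna=[(-4, 25000), (12, 400), (24, 49)],
    cd4_weeks=[-4],
    genotype_weeks=[-6],
    salvage_weeks=52,
    complete_arv_history=True,
)
too_little_follow_up = TCE(
    rna=[(-4, 25000), (12, 400)],
    cd4_weeks=[-4],
    genotype_weeks=[-6],
    salvage_weeks=52,
    complete_arv_history=True,
)
```

Keeping each criterion as a named boolean makes it straightforward to report which criterion excluded a given TCE, which is useful when studies with different inclusion rules are compared.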
Overall, 1,527 TCEs met the above inclusion criteria in- and 257 (IQR: 139 to 402), respectively. Patients had received a median six years (IQR: 3 to 8) of ARV therapy prior to the TCE. Previous ARVs included a median of four NRTIs, two PIs, and one NNRTI.
A complete listing of the ARVs used in the salvage regimens is shown in Table 1. Table 2 summarizes the ARV class combinations comprising the salvage therapy regimens: (i) 1,382 regimens (denoted in Table 2 as Type 1 regimens) comprised combinations of the first three approved ARV classes: NRTIs, NNRTIs, and PIs; (ii) 145 regimens (denoted in Table 2 as Type 2 regimens) contained at least one of the newer classes, including the integrase inhibitor raltegravir (RAL), the fusion inhibitor enfuvirtide (ENF), and the CCR5 antagonist maraviroc (MVC).
The median duration of the salvage therapy regimen was 52 weeks (IQR: 38 to 52). Plasma HIV-1 RNA levels following the ARV change were available a median of every 13 weeks. One or more plasma HIV-1 RNA levels were available in 91% of TCEs during the 8 to 16 week window following the change in therapy, in 83% of TCEs during the 16 to 36 week window, and in 58% of TCEs during the 36 to 52 week window. Two or more plasma HIV-1 RNA levels were available in 49% of TCEs following the change in therapy during the 8 to 16 week window, in 37% of TCEs during the 16 to 36 week window, and in 17% of TCEs during the 36 to 48 week window.
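Coverage figures of this kind can be computed with a short helper. The sketch below is illustrative only. The function name coverage and the sample data are assumptions. The windows are treated as half-open intervals (lower bound included, upper bound excluded) because the text does not say how measurements that fall exactly on a shared boundary such as week 16 were counted.

```python
def coverage(rna_weeks_per_tce, window, min_count=1):
    """Fraction of TCEs with at least min_count RNA levels in [lo, hi) weeks."""
    lo, hi = window
    hits = sum(
        1
        for weeks in rna_weeks_per_tce
        if sum(1 for w in weeks if lo <= w < hi) >= min_count
    )
    return hits / len(rna_weeks_per_tce)


# Hypothetical follow-up schedules: one list of measurement weeks per TCE.
SCHEDULES = [[10, 14, 30], [12, 40], [20, 24], [50]]

one_or_more_early = coverage(SCHEDULES, (8, 16))
two_or_more_early = coverage(SCHEDULES, (8, 16), min_count=2)
one_or_more_mid = coverage(SCHEDULES, (16, 36))
one_or_more_late = coverage(SCHEDULES, (36, 52))
```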
Of the TCEs for which two or more plasma HIV-1 RNA levels were available during the 16 to 36 week window
The TCE XML documents have been placed in a publicly available repository found on the following web page: http://hivdb.stanford.edu/TCEs/. The TCE Repository provides three primary functions. First, users can employ the TCE Finder to identify TCEs matching specific criteria (Figure 1) and examine the virological responses associated with the TCEs. Second, users can obtain a graphical depiction of their own TCEs by submitting a TCE XML document to the TCE Viewer (Figure 2). Third, users can download the entire set of TCE XML documents in a compressed file format or browse each TCE document using the TCE Viewer.

Virological response to initial PI and NNRTI therapy: insights from the TCE repository
One of the most pressing clinical challenges in resource-limited settings is the design of salvage therapy strategies for patients developing virological failure following an initial NRTI/NNRTI-containing regimen or, less commonly, an initial NRTI/PI-containing regimen. Figure 3A illustrates that the TCE XML Repository contains 111 NRTI/NNRTI-experienced but PI-naïve patients who received salvage therapy with a ritonavir-boosted PI and an optimized NRTI backbone. Figure 3B illustrates that the Repository contains 144 NRTI/PI-experienced but NNRTI-naïve patients who received salvage therapy with an NNRTI and an optimized NRTI backbone. The proportion of patients attaining virological suppression (<50 copies/ml) in the first 6 to 12 months of therapy was significantly higher in those receiving salvage therapy with a boosted PI (88/111, 79%) compared with an NNRTI (66/144, 46%; p < 0.001). The drug class used for salvage therapy (boosted PI vs. NNRTI) remained significant in a multivariate analysis that controlled for baseline CD4 count, plasma HIV-1 RNA level, calendar year, and the expected activity of the optimized NRTI backbone (i.e., the NRTI genotypic susceptibility score, GSS).
Among those receiving boosted PIs, the proportions of responders were similar in those receiving atazanavir (26/33, 79%) compared with lopinavir (41/50, 82%). However, the mean baseline CD4 count was higher (343 vs. 263) and the mean baseline plasma HIV-1 RNA level was lower (3.9 vs. 4.2 log copies/ml) in those receiving atazanavir. Among those receiving NNRTIs, the proportions of responders were similar in those receiving efavirenz (52/108, 48%) compared with nevirapine (14/36, 39%). The mean baseline CD4 count and plasma HIV-1 RNA level were also similar in those receiving efavirenz compared with nevirapine (323 vs. 310; 4.1 vs. 4.2 log copies/ml).
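The headline comparison of boosted PI and NNRTI salvage therapy (88/111 vs. 66/144) can be checked from the reported counts alone. The text does not state which statistical test produced the p-value, so the pooled two-proportion z-test below is an assumption used only as a sanity check. It does not reproduce the multivariate analysis.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts reported in the text: suppression with boosted PI vs. NNRTI.
z, p = two_proportion_z_test(88, 111, 66, 144)
print(f"z = {z:.2f}, p < 0.001: {p < 0.001}")
```

Under this assumed test the difference is far beyond the 0.001 threshold, which agrees with the reported significance. With subgroups as small as the atazanavir and nevirapine groups, an exact test would be a more cautious choice.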

Discussion
Studies of TCEs typically do not analyze the complete treatment history of a patient. Rather, these studies parameterize essential features of the patient's past ARV exposures. This condensed treatment history combined with the response to a new therapy was called a "treatment-change episode" (TCE) by Larder et al. of the Resistance Database Initiative (RDI) [22]. The TCE XML Schema is therefore much less complex than the relational database implemented by the HIV Cohort Data Exchange Protocol (HICDEP) [23]. Moreover, the fact that the TCE XML Schema does not require demographic or epidemiologic data and allows relative (rather than absolute) dates makes it impossible to identify individual patients or clinics [24]. The TCE XML suite comprises four medical informatics tools: (1) The XML Schema; (2) The TCE Viewer, an online program that creates a graphical representation of data in the XML document; (3) The TCE Repository, which provides the proof-of-concept that the TCE XML Schema can be used to exchange data from multiple clinics; and (4) The TCE Finder, a search engine to identify TCEs meeting specific criteria. The TCE XML suite is useful for comparing genotypic resistance interpretations and for hypothesis generation and testing. It should therefore be distinguished from ongoing projects designed to optimize therapy for individual patients such as RDI's HIV Treatment Response Prediction System (TREPS) [22] and Genafor's Theo [25]. However, because the data in the TCE Repository are publicly available, they can be used to increase the training sets for machine learning systems such as Theo and TREPS.
Despite its nascent stage, the TCE Repository has already been shown to be useful for comparing different genotypic resistance test interpretation systems. Specifically, 734 of the TCEs were previously used in a study comparing the predictive value of three algorithms [26]. Without such a repository, comparisons of genotypic resistance interpretation systems can be performed solely by using proprietary datasets. In addition, we demonstrate here that the TCE Repository makes it possible to generate novel hypotheses that may be relevant to salvage therapy in resource-limited regions. Indeed, at least one other research team has proposed the use of three rather than two NRTIs for certain salvage therapy scenarios in regions without access to newer ARV classes [27]. However, considering the large number of covariates associated with treatment response, very large numbers of TCEs will be required to adequately test novel hypotheses.
Although the XML Schema and Viewer are useful to individual research groups and collaborations, the usefulness of the Finder and Repository depends on the willingness of researchers to contribute data to this effort. Therefore, we have collaborated with four research groups to demonstrate the utility of the XML suite of applications for collaboration between multiple clinics. We are continuing to work with clinics in North America, Spain, and the EuResist Network Database to expand the Repository with TCEs that are relevant to resource-limited regions (i.e., the regimens are confined primarily to NRTIs, NNRTIs, and PIs) and with TCEs involving the use of more recently approved ARV classes including the integrase inhibitors and maraviroc.

Conclusions
The TCE Suite of applications (the XML Schema, Viewer, Finder, and Repository) addresses several major needs in the analysis of predictors of virological response to ARV therapy. The TCE XML Schema facilitates data sharing for generating and testing new hypotheses. The TCE Viewer helps users validate the temporal relationship between different data elements, and it can be a useful teaching tool. The TCE Finder is an application designed for researchers who do not want to download the entire TCE repository but who would rather examine solely the clinical data of patients sharing similar ARV treatment and genotypic resistance characteristics. The TCE Repository is the largest collection of publicly available TCEs. It is already useful for comparing the predictive value of genotypic resistance interpretation systems. As it increases in size it will become an increasingly useful resource for hypothesis generation and knowledge discovery.