The WIHS is a multicenter prospective study, ongoing since 1994, designed to explore the natural and treated history of HIV disease among women. The WIHS study design and methods are detailed elsewhere. Briefly, a total of 3,768 HIV-seropositive or high-risk HIV-negative women aged 13 years or older were recruited from six consortia sites located in Chicago, Los Angeles, San Francisco, Washington D.C., and Brooklyn and the Bronx in New York City. The study was approved by the local institutional review board at each site and informed consent was obtained from all participants. Research visits are conducted semiannually and include extensive questionnaire-based interviews, specimen collection, and physical and obstetric/gynecologic examinations. Self-reported quality of life (QOL) was ascertained at each semiannual visit through 1999 and annually thereafter. This analysis uses data collected through September 2004 (study visit 20). For this study, a matched cohort design was adopted and our analyses were restricted to the HIV-positive participants who enrolled in WIHS during 1994–1995 and had at least one QOL measurement after the matching (baseline) visit, as described in detail below.
Among the many QOL instruments used for HIV-infected populations, the Medical Outcomes Study (MOS)-HIV has been one of the most widely used disease-specific instruments. In WIHS, a shortened version of MOS-HIV developed by Bozzette et al. was adopted to measure QOL. With this instrument, item redundancy is reduced while excellent reliability is maintained, and construct validity is comparable to that of MOS-HIV. The shortened form has 21 items representing 9 domains: physical functioning, role functioning, energy/fatigue, social functioning, cognitive functioning, pain, emotional well-being, perceived health index and current health perception. Following an established scoring recommendation, each domain score is derived by averaging the recoded raw scores of the domain's items and is expressed on a 0–100 scale, with higher values indicating better functioning and well-being. In addition, one summary score is generated from six domains (physical functioning, role functioning, energy/fatigue, social functioning, pain and emotional well-being) on the basis of a published algorithm. The summary and nine domain scores are the outcomes of interest in this study.
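The domain scoring described above can be illustrated with a minimal sketch. It assumes a simple linear recoding of each item onto 0–100 and skips unanswered items; the item ranges, the `reverse` flag and the function names are hypothetical and are not taken from the published scoring recommendation.

```python
def rescale_item(raw, low, high, reverse=False):
    """Linearly map a raw item response in [low, high] onto a 0-100 scale.

    `reverse=True` flips the item so that higher always means better
    (an assumed convention for negatively worded items).
    """
    if not low <= raw <= high:
        raise ValueError("raw response outside the item's range")
    score = 100.0 * (raw - low) / (high - low)
    return 100.0 - score if reverse else score


def domain_score(items):
    """Average the rescaled items of one domain.

    `items` is a list of (raw, low, high, reverse) tuples; items with
    raw=None are treated as unanswered and skipped (an assumption).
    Returns None when no item of the domain was answered.
    """
    scored = [
        rescale_item(raw, low, high, reverse)
        for raw, low, high, reverse in items
        if raw is not None
    ]
    return sum(scored) / len(scored) if scored else None
```

For example, a two-item domain scored on a 1–5 response scale with responses 1 and 5 would receive a domain score halfway along the 0–100 scale.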
HAART was defined following the Department of Health and Human Services/Kaiser Panel guidelines as: (a) two or more nucleoside reverse transcriptase inhibitors (NRTIs) in combination with at least one protease inhibitor (PI) or one non-nucleoside reverse transcriptase inhibitor (NNRTI); (b) one NRTI in combination with at least one PI and at least one NNRTI; or (c) an abacavir- or tenofovir-containing regimen of three or more NRTIs in the absence of both PIs and NNRTIs. Combinations of zidovudine (AZT) and stavudine (d4T) with either a PI or NNRTI were not considered HAART. While HAART use can vary over time, in this analysis we consider trends following first HAART initiation.
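The regimen rules above can be expressed as a small classifier. This is a sketch only: the function name, the input layout (a set of NRTI abbreviations plus counts of PIs and NNRTIs) and the abbreviations "ABC" and "TDF" for abacavir and tenofovir are assumptions made for illustration, not the coding used in WIHS.

```python
def is_haart(nrtis, n_pi, n_nnrti):
    """Return True if a regimen meets the HAART definition (a)-(c).

    nrtis   : iterable of NRTI abbreviations, e.g. {"AZT", "3TC"}
    n_pi    : number of protease inhibitors in the regimen
    n_nnrti : number of NNRTIs in the regimen
    """
    nrtis = set(nrtis)

    # Exclusion: AZT together with d4T, combined with a PI or an NNRTI.
    if {"AZT", "d4T"} <= nrtis and (n_pi > 0 or n_nnrti > 0):
        return False

    # (a) two or more NRTIs plus at least one PI or one NNRTI
    rule_a = len(nrtis) >= 2 and (n_pi >= 1 or n_nnrti >= 1)
    # (b) exactly one NRTI plus at least one PI and at least one NNRTI
    rule_b = len(nrtis) == 1 and n_pi >= 1 and n_nnrti >= 1
    # (c) three or more NRTIs including abacavir or tenofovir, no PI or NNRTI
    rule_c = (
        len(nrtis) >= 3
        and n_pi == 0
        and n_nnrti == 0
        and bool(nrtis & {"ABC", "TDF"})
    )
    return rule_a or rule_b or rule_c
```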
On the basis of results from prior studies and data available in WIHS, we selected a number of variables that could affect participants' and/or providers' decisions to initiate HAART, or participants' QOL. Age was determined at the matching visit. Race/ethnicity was categorized as White non-Hispanic, Black non-Hispanic, Latina/Hispanic and other. Education level at study entry was coded as less than high school, completed high school, and above high school. Annual gross income was dichotomized as greater than $12,000 or not. The number of HIV-related constitutional symptoms, including fever, diarrhea, memory problems, neuropathy symptoms (numbness, tingling or burning), unintentional weight loss, confusion and night sweats, was aggregated for each visit. Standardized three- or four-color flow cytometry was used to determine total CD4+ cells/mm³ at each visit. Plasma HIV-1 RNA levels were measured using the isothermal nucleic acid sequence based amplification (NASBA/Nuclisens) method (bioMérieux, Boxtel, NL) in laboratories participating in the NIH/NIAID Virology Quality Assurance Laboratory proficiency testing program. The current lower limit of quantification was 80 copies/ml using 1.0 ml sample input. Self-reported depressive symptoms were measured using the 20-item Center for Epidemiological Studies Depression Scale (CES-D), with a total score of 16 or greater used to define the presence of depression. Current employment, any insurance coverage, clinical AIDS diagnosis, and the number of outpatient visits, hospitalizations and medications taken (antiretroviral and non-antiretroviral) since the last visit were also included in our analysis. As calendar time affected the chance of HAART initiation[3, 16], it was also included as a covariate in estimating the propensity score.
Propensity score matching
Unlike in randomized trials, therapies in observational studies are not randomly assigned, and thus unbalanced distributions of background confounders may bias the estimated exposure effects. To account for this, conventional matching or stratification methods can sometimes be used to create groups of exposed and unexposed individuals with similar measured covariates. Given the large number of background covariates and the limited sample size in most observational studies, it is often infeasible to control for all covariates at one time in this way. As an alternative, propensity score methods have been developed that match or stratify on a scalar propensity score reflecting an individual's estimated probability of taking a treatment conditional on other variables. By selecting exposed and unexposed individuals matched on the propensity score, we eliminate the associations between HAART initiation and the measured covariates; thus, these factors will not serve as confounders when we evaluate the effect of HAART. As many factors could affect HAART initiation in WIHS, it is reasonable to use propensity score matching to help eliminate indication bias.
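In symbols, the idea can be summarized as follows. The notation is introduced here for illustration only: Z is an indicator of HAART initiation, X is the vector of measured covariates, and e(x) is the propensity score. The second line states the standard balancing property of the propensity score, and the third shows one common way of estimating it, a logistic model.

```latex
\begin{align}
  e(\mathbf{x}) &= \Pr(Z = 1 \mid \mathbf{X} = \mathbf{x}) \\
  \mathbf{X} &\perp Z \mid e(\mathbf{X}) \\
  \operatorname{logit}\, e(\mathbf{x}) &= \alpha + \boldsymbol{\beta}^{\top}\mathbf{x}
\end{align}
```

The balancing property concerns only the covariates included in X; unmeasured factors are not balanced by matching on e(x).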
To construct the propensity score of initiating HAART in our analysis, we used multiple logistic regression. For the HAART users, we selected the last visit before HAART initiation as the matching visit. For the HAART-naïve HIV-positive women, we included all of their QOL visits as candidate matching visits. The matching visit data from the HAART-exposed group and the candidate matching visit data from the HAART-naïve group were pooled, and a propensity score was obtained for each participant at each visit conditional on a number of variables: age, education, race/ethnicity, income, employment, health insurance, CD4+ cell count, viral load, history of AIDS diagnosis, clinical depression, number of symptoms, outpatient visits, hospitalizations and medications, QOL scores and calendar time. Every HAART user was matched to one randomly selected HAART-naïve participant at a baseline visit with an equivalent (within 0.1% rounding level) propensity score of HAART initiation. For any HAART-unexposed individual selected as a control, the rest of her visits were removed to ensure 1:1 matching. To evaluate the effect of propensity score matching, t tests and chi-square tests were performed to test differences in the distributions of background variables between the exposed and unexposed groups before and after matching.
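The matching step can be sketched as below, assuming the propensity scores have already been estimated. The sketch interprets "within 0.1% rounding level" as equality after rounding the score to three decimal places, which is an assumption; the function name, the tuple layout and the fixed random seed are also illustrative rather than part of the original analysis.

```python
import random
from collections import defaultdict


def match_on_propensity(exposed, candidates, decimals=3, seed=0):
    """1:1 match of HAART users to HAART-naive participant-visits.

    exposed    : list of (participant_id, score) at the matching visit
    candidates : list of (participant_id, visit, score), one row per
                 candidate visit of each HAART-naive participant
    Returns a list of (exposed_id, control_id, control_visit) triples.
    """
    rng = random.Random(seed)

    # Group candidate visits by their rounded propensity score.
    buckets = defaultdict(list)
    for pid, visit, score in candidates:
        buckets[round(score, decimals)].append((pid, visit))

    used_controls = set()
    pairs = []
    for pid, score in exposed:
        # Visits of already selected controls are no longer eligible,
        # which keeps the matching 1:1 at the participant level.
        pool = [c for c in buckets[round(score, decimals)]
                if c[0] not in used_controls]
        if not pool:
            continue  # no candidate with an equivalent score
        control_id, control_visit = rng.choice(pool)
        used_controls.add(control_id)
        pairs.append((pid, control_id, control_visit))
    return pairs
```

Exposed participants with no eligible candidate are left unmatched in this sketch; how such cases were handled in the original analysis is not stated in the text.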
Pattern mixture model analysis
After matching, the differences of the QOL summary score and the nine domain scores at each visit from their values at the matching baseline visit were used as the study outcomes. To evaluate the effect of HAART, a conventional random effects mixture model could be fit if data were missing only at random, i.e., not related to study outcomes. However, in our analysis, a substantial proportion (33%) of participants, especially those from the HAART-naïve group (46%), died during the study follow-up. To obtain a better estimate of changes over time, we utilized a pattern mixture model approach in which data were stratified by the pattern of follow-up and distinct models were constructed within each stratum. To implement this approach, we grouped the drop-out times into 4 categories (≤ 2, 2.1–4, 4.1–6, and ≥ 6 years) and assumed that the distribution of response would be a weighted mixture over drop-out categories. The overall estimates of variable coefficients and standard errors were obtained across the patterns.
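The pooling step can be sketched as follows, under stated assumptions. Stratum-specific coefficients are combined with weights equal to the observed proportion of participants in each drop-out category, and the pooled standard error treats those weights as fixed, which is a simplification. A drop-out time of exactly 6 years is placed in the third category here because the published boundaries overlap at that value. Function names are hypothetical.

```python
import math


def dropout_category(years):
    """Map a drop-out time in years to one of four pattern strata (0-3)."""
    if years <= 2:
        return 0
    if years <= 4:
        return 1
    if years <= 6:  # assumption: exactly 6 years falls in this stratum
        return 2
    return 3


def pattern_mixture_estimate(estimates, std_errors, counts):
    """Pool stratum-specific coefficients into one overall estimate.

    estimates  : coefficient estimated within each drop-out stratum
    std_errors : its standard error within each stratum
    counts     : number of participants in each stratum
    The weights (stratum proportions) are treated as known constants,
    so the pooled standard error ignores their sampling variability.
    """
    total = sum(counts)
    weights = [n / total for n in counts]
    pooled = sum(w * b for w, b in zip(weights, estimates))
    variance = sum((w * se) ** 2 for w, se in zip(weights, std_errors))
    return pooled, math.sqrt(variance)
```

A fuller treatment would add the multinomial variability of the stratum proportions to the pooled variance, for example through the delta method.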
In each model, we included an overall intercept term, a binary indicator for HAART vs. HAART-naïve groups, and a variable reflecting the time (in 6-month units) from the baseline visit; together these formed Model 1. Thus, the HAART indicator reflects short-term effects of HAART and the term for time reflects whether this change persists over follow-up. To assess whether HAART affects the overall long-term trend, we fit interaction terms between HAART and time. Furthermore, to account for residual confounding and explore possible mediators of the effect of HAART on QOL, a series of models was fit, each adding covariates to the previous one: Model 2 added baseline age, ethnicity, and education to Model 1; Model 3 added the time-varying socioeconomic variables of income, employment, and health insurance to Model 2; Model 4 added time-varying CD4+ cell counts and viral load to Model 3; and Model 5 added time-varying symptoms, outpatient visits, hospitalizations, medications, AIDS and depression to Model 4. All statistical analyses were performed using SAS version 9.1 (SAS Institute, Cary, NC) and S-Plus 7.0 (Insightful, Seattle, WA).
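Written out, Model 1 and its extension with a HAART-by-time interaction might take the following form. The notation is introduced here for illustration: the outcome is the change in QOL score from baseline for participant i at visit j, H_i is the HAART indicator, and t_ij is the time since baseline in 6-month units. The random intercept b_i is an assumption, since the random-effects structure is not specified in the text.

```latex
\begin{align}
  \Delta Y_{ij} &= \beta_0 + \beta_1 H_i + \beta_2 t_{ij} + b_i + \varepsilon_{ij}
    && \text{(Model 1)} \\
  \Delta Y_{ij} &= \beta_0 + \beta_1 H_i + \beta_2 t_{ij} + \beta_3 H_i t_{ij} + b_i + \varepsilon_{ij}
    && \text{(with interaction)} \\
  b_i &\sim N(0, \sigma_b^2), \qquad \varepsilon_{ij} \sim N(0, \sigma^2)
\end{align}
```

Under this form, \beta_1 corresponds to the short-term difference between the HAART and HAART-naïve groups, \beta_2 to the trend per 6 months, and \beta_3 to the difference in that trend between groups. Models 2 through 5 would add their covariates as further fixed-effect terms.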