Genetic analysis of HIV-1 Circulating Recombinant Form 02_AG, B and C subtype-specific envelope sequences from Northern India and their predicted co-receptor usage

The HIV-1 epidemic in India is largely driven by subtype C, but other subtypes and recombinants have also been reported from several Indian states. This is mainly attributed to the co-circulation of other genetic subtypes, which can potentially recombine to generate recombinant (mosaic) genomes. In this study, we report a detailed genetic characterization of HIV-1 envelope sequences from North India (Delhi and neighboring regions). Of the 13 sequences, six were related to subtype C, one to subtype B, and the remaining six to the CRF02_AG strain. The subtype C sequences possessed the highly conserved GPGQ motif in the V3 loop, whereas the subtype B sequence possessed the GPGR motif, as observed earlier. While most of the sequences were predicted to use the CCR5 co-receptor, one subtype C sample clearly indicated CXCR4 usage. Mother-to-child transmission was established in two pairs. Thus, the co-circulation of multiple subtypes (B and C) and recombinant CRF02_AG strains in North India suggests a rapidly evolving HIV-1 epidemic in this region, with implications for vaccine formulation. Since this is the first report of a CRF02_AG envelope from India, it will be important to monitor the spread of this strain and its impact on HIV-1 transmission in India.


Introduction
HIV-1 displays a tremendous amount of genetic diversity. Binding of HIV-1 to host cells is mediated by the envelope glycoprotein. When the envelope protein binds to its primary receptor, CD4, it undergoes conformational changes and then binds to one of the co-receptors (the chemokine receptors CCR5, CXCR4 or others) via its V3 loop. This tri-molecular interaction leads to fusion of the viral and cell membranes [1]. The HIV-1 envelope is composed of relatively conserved regions (C1 to C5) and variable regions (V1 to V5). The V3 region elicits neutralizing antibodies and also governs co-receptor usage [1,2]. Replacements in the V3 region with basic amino acids are associated with CXCR4 usage [2,3]. Subtypes A and C usually contain a highly conserved GPGQ amino acid motif, while GPGR is the predominant motif in the V3 loop of subtype B envelopes [4,5]. Mutational patterns in the V3 loop are likely to be of clinical significance because they can influence susceptibility to known CCR5 inhibitors. Although all HIV-1 genetic subtypes originated in Africa, it is not fully understood how certain subtypes come to dominate different regions of the world. For example, subtype B predominates in the US and UK, whereas subtype C is predominant in India, some other parts of Asia, and Africa [6].
It is fairly well established that HIV-1 that uses the CCR5 chemokine receptor (R5-tropic) is transmitted preferentially over virus that uses the CXCR4 chemokine receptor [7]. Individuals with a 32 bp deletion in the CCR5 open reading frame (ORF) are largely protected against HIV-1 infection [7-9]. Approximately 50% of individuals infected with HIV-1 subtype B show a co-receptor switch from CCR5 to CXCR4, which is associated with rapid progression of HIV/AIDS [10]. This is observed mainly in the US and UK, where subtype B predominates. However, in India, where subtype C predominates, the co-receptor switch has not been observed [11]. Replacements of charged amino acids within the V3 region are known to alter co-receptor usage [2,3,12]. Genetic variations in subtype C HIV-1 envelope sequences have recently been reported from Southern India, with some strains exhibiting multiple co-receptor usage, including the CXCR4 chemokine receptor, which is present predominantly on T-helper lymphocytes [13,14]. It is noteworthy that we recently reported novel B/C LTR [15] and B/C/D Vpr sequences [16] from North India.
Given the large size of India and increasing global travel, it is likely that subtypes other than B also co-circulate, creating an ideal situation for the formation of recombinants. With this in mind, we genetically characterized the HIV-1 envelope sequences from HIV-1 infected individuals from Northern India and report the presence of HIV-1 CRF02_AG for the first time.

Genomic DNA isolation and Polymerase chain reaction
Genomic DNA was isolated from fresh peripheral blood collected in EDTA using a kit from Qiagen (QIAamp Blood Minikit) as described previously [8,9]. All requisite ethical clearances were obtained before initiating this study. Mother and child samples were processed separately to avoid cross-contamination. All polymerase chain reactions (PCRs) were performed with high-fidelity Taq DNA polymerase (Ex-Taq, Takara, Japan). A DNA fragment of approximately 1.25 kb, corresponding to the V1 to V5 region, was amplified first using the following primers:
Forward primer: 5'-ATGGGATCAAAGCCTAAAGCCATGTG
Reverse primer: 5'-AGTGCTTCCTGCTGCTCCCAAGAACCCAAG
Thereafter, a 700 bp fragment (V3 to V5) was amplified using an internal set of primers with the following sequences:
Forward primer: CTGTTAAATGGCAGTCTAGC
Reverse primer: CACTTCTCCAATTGTCCCTCA
The cycling conditions for amplifying both fragments were 35 cycles of 98°C for 15 sec, 55°C for 30 sec and 72°C for 1 min, with a final extension at 72°C for 10 min. The final concentration of MgCl2 was 20 mM for both PCRs. PCR-amplified DNA was cloned into the pGem-T expression vector (Promega Biotech, WI, USA) and sequenced in both directions using T7- and SP6-specific primers. The sequence of one representative clone from each sample was used for phylogenetic analysis and sequence comparisons.
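As a quick sanity check on the primers listed above, their length and GC content can be tabulated. The following is a minimal sketch, not part of the original protocol: the sequences are copied from the text with any line-break hyphens removed (an assumption about the intended sequences), and the helper names `gc_fraction` and `reverse_complement` are hypothetical.

```python
# Illustrative primer bookkeeping; helper names are hypothetical and
# the sequences assume line-break hyphens in the text are not part of the primer.

PRIMERS = {
    "outer_forward": "ATGGGATCAAAGCCTAAAGCCATGTG",
    "outer_reverse": "AGTGCTTCCTGCTGCTCCCAAGAACCCAAG",
    "inner_forward": "CTGTTAAATGGCAGTCTAGC",
    "inner_reverse": "CACTTCTCCAATTGTCCCTCA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(primer: str) -> float:
    """Return the fraction of G and C bases in the primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(primer: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return primer.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: length={len(seq)} nt, GC={gc_fraction(seq):.2f}")
```

The reverse complement of a reverse primer gives the sequence as it would appear on the forward strand, which is convenient when locating the primer site in a reference envelope sequence.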

Patient population and genetic analysis
We carried out genetic analysis of 13 HIV-1 envelope sequences from Northern India. Nine unrelated individuals and two mother-child pairs (Pair 1, D & E 57; Pair 2, D & E 58) were selected randomly from two locations: GTB Hospital, Delhi (samples ND1 to 5, all from commercial sex workers, CSWs), and the Punjab/Haryana region (the remaining samples). Primers were designed to carry out nested PCR as described earlier. It is noteworthy that we were unable to amplify envelope sequences from several samples, which may be due to extreme genetic variability; it is therefore difficult to draw conclusions about the frequency of any genetic subtype from this study. Alternatively, since most of the HIV-1 infected individuals were on antiretrovirals, the amounts of proviral DNA may have been too small to amplify. At least 4 independent clones were examined from each sample, but only one representative clone per sample was included in the genetic analysis. Sequences were compared with reference strains (figure 1) from the Los Alamos HIV database (http://www.hiv.lanl.gov). Multiple sequence alignment was performed with ClustalW 1.8.3, available from the DNA Data Bank of Japan (DDBJ) website (http://clustalw.ddbj.nig.ac.jp/top-e.html). Phylogenetic analysis was carried out using MEGA 4.1 (beta) software. Genotyping was carried out using the viral genotyping tool at NCBI (http://www.ncbi.nlm.nih.gov/projects/genotyping/formpage.cgi), the REGA subtyping tool ver 2.0 (http://www.bioafrica.net/subtypetool/html) and the Recombination Identification Program (RIP) 3.0 (http://www.hiv.lanl.gov/content/sequence/RIP/RIP.html). Potential N-glycosylation sites were identified using the N-GlycoSite program (http://www.hiv.lanl.gov/content/sequence/GLYCOSITE/glycosite.html).
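Potential N-linked glycosylation sites are conventionally recognized by the sequon N-X-S/T, where X is any amino acid except proline. The sketch below illustrates that motif search only; it is not the N-GlycoSite implementation, and the function name `find_sequons` and the test peptides are hypothetical.

```python
import re

# N-X-[ST] sequon with X != P. The lookahead makes overlapping sequons
# (e.g. "NNTS") each count as a separate potential site.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of the N in each potential N-glycosylation sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy peptide, not one of the study's sequences.
    peptide = "CTRPNNNTRKSIRIGPGQ"
    print(find_sequons(peptide))
```

Positions are relative to the start of the supplied peptide, so mapping them onto a reference numbering would require an alignment step that is not shown here.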

Results and discussion
All of the HIV-1 infected individuals were infected through the heterosexual route (except in the mother-child pairs), and their CD4 counts ranged from 120-150 (samples A81 and A82) to 400-500 (D57 and D58). Most of them were on first-line antiretroviral treatment. The GPGQ motif present in the middle of the V3 loop was conserved among all subtype C and CRF02_AG strains. Remarkably, 5 of the subtype C samples showed conservation of an A residue just downstream of the GPGQ motif (not observed in consensus C), and 4 of them showed an H to Y change just prior to the second cysteine of the V3 region (figure 1). The subtype B sample (VT5) possessed the GPGR amino acid motif at the crown of the V3 loop, as expected. It is noteworthy that we recently reported novel mosaic B/C HIV-1 LTR and B/C/D recombinant Vpr structures from the same region of India (Punjab/Haryana) [15,16]. Group M subtype reference sequences, along with outlier sequences, were downloaded from the Los Alamos HIV database. The sequences were subjected to various genetic subtyping tools (phylogenetic analysis, RIP 3.0, the Viral Genotyping Tool and REGA subtyping). This analysis indicated that 6 of the 13 sequences were related to subtype C, one to subtype B, and the remaining 6 resembled the CRF02_AG strain (figure 2). Mother-to-child transmission was supported in both pairs by high bootstrap values (99 for Pair 1 and 71 for Pair 2; figure 2). It is noteworthy that no changes in the V3 sequences were observed in either of the mother-infant pairs. Maximum intra-patient proviral diversity was observed in two samples (A81 and C5) (manuscript under preparation). It was reported earlier that subtype determination based on phylogenetic analysis should also be confirmed using other tools or signature sequences present in the V3 region [17]. Representative subtype sequences identified by the RIP 3.0 program are given (additional file 1).
Each curve is a comparison between the envelope region being analyzed (the query, as indicated at the top of each square) and multiple reference sequences downloaded from the data bank. Using this kind of analysis, HXB2 (panel A) and the isolate with accession number FJ769836 (panel B) were identified as subtype B, isolate FJ968673 as CRF02_AG (panel C), and the isolate with accession number FJ968672 as subtype C.
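The query-versus-reference curves described above can be approximated by a sliding-window identity scan over aligned sequences. The following is a simplified sketch of that idea under stated assumptions, not the RIP 3.0 algorithm: sequences are assumed to be pre-aligned and of equal length, gaps are marked with "-", and the function names and toy sequences are hypothetical.

```python
def window_identity(query: str, reference: str, window: int, step: int = 1) -> list[float]:
    """Fraction of identical positions per window, ignoring columns with a gap."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for start in range(0, len(query) - window + 1, step):
        compared = 0
        matches = 0
        for q, r in zip(query[start:start + window], reference[start:start + window]):
            if q == "-" or r == "-":
                continue  # skip gapped columns
            compared += 1
            matches += q == r
        scores.append(matches / compared if compared else 0.0)
    return scores


def best_reference_per_window(query: str, references: dict[str, str],
                              window: int, step: int = 1) -> list[str]:
    """Name of the most similar reference in each window (ties go to the first listed)."""
    curves = {name: window_identity(query, seq, window, step)
              for name, seq in references.items()}
    n_windows = len(next(iter(curves.values())))
    return [max(curves, key=lambda name: curves[name][i]) for i in range(n_windows)]


if __name__ == "__main__":
    # Toy aligned sequences, not real subtype references.
    refs = {"ref_X": "AAAAGGGG", "ref_Y": "TTTTCCCC"}
    print(best_reference_per_window("AAAACCCC", refs, window=4, step=4))
```

A change in the best-matching reference along the alignment is the kind of signal that would prompt a closer look for a recombination breakpoint; real tools additionally assess whether the difference between the top matches is statistically meaningful.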
The most remarkable finding was the predominance of the CRF02_AG strain among the unrelated commercial sex workers (CSWs) from the Delhi (capital of India) region. All the isolates from the Punjab/Haryana region showed relatedness with consensus C. This recombinant form is predominantly found in Africa (Côte d'Ivoire, Mali, Senegal, ...
Figure 1. HIV-1 envelope sequence comparison and coreceptor usage. The HIV-1 envelope gene was amplified from infected individuals and subjected to sequencing as described in the text. Only the V3 loop region sequences with short flanking constant regions are shown, with their accession numbers, their subtype assignment and possible co-receptor usage. Dots in the sequence indicate identity with consensus C, B, 02_AG and A sequences; an asterisk indicates identical amino acids; a single dot at the bottom of the four groups of samples represents a semi-conserved amino acid substitution and double dots represent a conserved substitution. Subtypes were determined using the Viral Genotyping Tool, REGA Subtyping Tool and RIP 3.0 with maximum BLAST identity.
This is important because in some instances hypo-glycosylated forms of the envelope have been associated with better transmission and with their ability to interact with neutralizing antibody [18].

HIV-1 envelope sequence comparison and coreceptor usage
It was remarkable that sample A81 was clearly predicted to use the CXCR4 co-receptor by both programs designed to predict HIV-1 co-receptor usage (WebPSSM and Geno2Pheno). This is important because earlier studies with Indian subtype C envelopes showed exclusive use of the CCR5 co-receptor [11]. It is important to note that samples A82 and C5 showed a discrepancy in their predicted co-receptor usage; this is because the two programs use different parameters [19,20].
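The link between basic residues in V3 and CXCR4 usage can be illustrated with the widely cited "11/25 rule", under which a basic amino acid at V3 position 11 or 25 suggests CXCR4 usage. The sketch below is a deliberately simple heuristic and not the method used by WebPSSM or Geno2Pheno; it assumes an ungapped 35-residue V3 loop, treats only K and R as basic, and uses hypothetical function names and illustrative sequences that are not from this study.

```python
BASIC = {"K", "R"}    # assumption: histidine is not counted as basic here
ACIDIC = {"D", "E"}
V3_LENGTH = 35        # assumption: ungapped, canonical-length V3 loop


def _check(v3: str) -> str:
    v3 = v3.upper()
    if len(v3) != V3_LENGTH:
        raise ValueError(f"expected a {V3_LENGTH}-residue V3 loop, got {len(v3)}")
    return v3


def crown_motif(v3: str) -> str:
    """Return the tetrapeptide at V3 positions 15-18 (e.g. GPGQ or GPGR)."""
    return _check(v3)[14:18]


def predict_11_25(v3: str) -> str:
    """Apply the 11/25 rule: basic residue at position 11 or 25 suggests CXCR4."""
    v3 = _check(v3)
    return "CXCR4" if v3[10] in BASIC or v3[24] in BASIC else "CCR5"


def net_charge(v3: str) -> int:
    """Crude net charge: (K + R) minus (D + E)."""
    v3 = _check(v3)
    return sum(aa in BASIC for aa in v3) - sum(aa in ACIDIC for aa in v3)


if __name__ == "__main__":
    # Illustrative V3 loop resembling a subtype C consensus, and a variant
    # with arginine substituted at position 11.
    typical = "CTRPNNNTRKSIRIGPGQTFYATGDIIGDIRQAHC"
    variant = "CTRPNNNTRKRIRIGPGQTFYATGDIIGDIRQAHC"
    for name, seq in (("typical", typical), ("variant", variant)):
        print(name, crown_motif(seq), predict_11_25(seq), net_charge(seq))
```

Because the two prediction programs weigh the whole V3 sequence rather than two positions, a simple rule like this can disagree with either of them, which is one way to see how discordant calls for individual samples can arise.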
Transmission of virus, as judged by high bootstrap values, was observed in both mother-child pairs. It is important to study the functional implications of changes in the viral gene sequences between mother-infant pairs to understand the molecular basis of successful transmission [21]. The VT5 (subtype B) sample, as expected, showed CXCR4 usage, and all of the CRF02_AG strains showed CCR5 usage.
In summary, we show for the first time the presence and transmission of the CRF02_AG HIV-1 strain in India (Delhi, the capital of India), along with the presence of subtypes B and C in North India. These observations have implications for T-cell epitope-based vaccine design. The existence of multiple HIV-1 genetic subtypes in this region is likely to generate novel and complex recombinants.