The HIV reverse transcriptase and protease sequence database is an on-line relational database that catalogues evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of antiretroviral therapy ( http://hivdb.stanford.edu ). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions to GenBank, sequences published in journal articles and sequences of HIV isolates from persons participating in clinical trials. Sequences are linked to data about the source of the sequence, the antiretroviral drug treatment history of the person from whom the sequence was obtained and the results of in vitro drug susceptibility testing. Sequence data on two new molecular targets of HIV drug therapy—gp41 (cell fusion) and integrase—will be added to the database in 2003.
Received September 14, 2002; Revised and Accepted October 9, 2002
Antiretroviral drug resistance is a major obstacle to the successful treatment of human immunodeficiency virus type 1 (HIV-1) infection. A large number of retrospective and prospective studies have demonstrated that the presence of drug resistance before starting a treatment regimen is an independent predictor of the success or failure of that regimen ( 1 ). As a result, several expert panels have recommended that HIV reverse transcriptase (RT) and protease sequencing be done to help physicians select antiretroviral drugs for their patients, and genotypic resistance testing has been part of routine clinical care for the past several years ( 2 ).
The HIV RT and protease sequence database (HIVRT&PrDB) is intended to assist scientists designing new HIV-1 drugs, clinical investigators studying HIV-1 drug resistance and clinicians using genotypic HIV-1 drug resistance tests ( 3 ). The database links sequence changes in the molecular targets of HIV-1 therapy to other forms of data including treatment history and phenotypic (drug susceptibility) data. Data on the virological response (plasma HIV-1 RNA levels) to a new treatment regimen have been added and will soon be accessible over the web.
The HIVRT&PrDB is a relational database with 19 normalized (nonredundant) core tables, 10 look-up tables and about 20 derived tables. The database is implemented using MySQL on a Linux platform. There are several major hierarchical relationships linking key entities in the database: (i) patient → treatment history (list of drug regimens and their start and stop dates); (ii) patient → isolate (clinical) → sequence → drug susceptibility result; (iii) isolate (laboratory) → drug susceptibility result; and (iv) patient → plasma HIV-1 RNA level. Sequences are stored in a virtual alignment with the subtype B consensus sequence; thus amino acid sequences are also represented as lists of differences from the consensus sequence.
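The idea of storing a sequence as a list of differences from a consensus can be illustrated with a minimal Python sketch. It is not the database's own code: the function name, the toy ten-residue strings and the assumption that both sequences are already aligned with no insertions or deletions are all illustrative.

```python
def mutations_from_consensus(consensus: str, sequence: str) -> list[str]:
    """Return differences from the consensus as strings such as 'E4Q'.

    Assumes both amino acid strings are already aligned and equal in
    length (no insertions or deletions), which is a simplification.
    """
    if len(consensus) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{obs}"  # consensus residue, 1-based position, observed residue
        for pos, (ref, obs) in enumerate(zip(consensus, sequence), start=1)
        if ref != obs
    ]


# Made-up strings for illustration; not a real protease or RT consensus.
toy_consensus = "ACDEFGHIKL"
toy_isolate = "ACDQFGHIRL"
print(mutations_from_consensus(toy_consensus, toy_isolate))  # ['E4Q', 'K9R']
```

Storing only the differences keeps each record short and makes position-based queries straightforward, at the cost of needing the consensus to reconstruct the full sequence.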
The HIVRT&PrDB contains data from more than 420 published papers. Sequences are available on HIV-1 isolates from more than 7000 individuals and from about 500 laboratory isolates containing mutations generated by virus passage or site-directed mutagenesis. About 20 000 drug susceptibility results from tests performed on more than 2000 virus isolates are available. Figures 1 and 2 contain composite alignments showing 193 protease and 395 RT mutations present at a frequency of >0.1% in HIV-1 isolates from treated and untreated persons. Figure 3 shows a summary of the drug susceptibility results available on each of the 16 approved antiretroviral drugs.
The database allows users to retrieve sets of sequences meeting specific criteria. Commonly submitted queries include: (i) the retrieval of sequences of HIV-1 isolates from patients receiving a specific drug treatment, (ii) the retrieval of sequences of HIV-1 isolates containing mutations at specific protease or RT positions, (iii) the retrieval of drug susceptibility data on HIV-1 isolates containing specific mutations or combinations of mutations, and (iv) a summary of data in any particular reference.
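A query of type (ii), retrieving isolates with mutations at specific positions, might look roughly like the following sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table names, column names and rows are hypothetical; the actual schema of the database is not described here.

```python
import sqlite3

# In-memory stand-in database with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE isolate (isolate_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE mutation (
        isolate_id INTEGER REFERENCES isolate(isolate_id),
        gene TEXT, position INTEGER, amino_acid TEXT
    );
""")
conn.executemany("INSERT INTO isolate VALUES (?, ?)",
                 [(1, "iso-A"), (2, "iso-B"), (3, "iso-C")])
# Toy rows for illustration only.
conn.executemany("INSERT INTO mutation VALUES (?, ?, ?, ?)", [
    (1, "RT", 41, "L"), (1, "RT", 215, "Y"),
    (2, "RT", 215, "Y"),
    (3, "PR", 90, "M"),
])


def isolates_with_all(conn, gene, positions):
    """Names of isolates that carry a mutation at every listed position."""
    positions = sorted(set(positions))
    if not positions:
        raise ValueError("at least one position is required")
    placeholders = ",".join("?" for _ in positions)
    sql = f"""
        SELECT i.name
        FROM isolate i JOIN mutation m ON m.isolate_id = i.isolate_id
        WHERE m.gene = ? AND m.position IN ({placeholders})
        GROUP BY i.isolate_id
        HAVING COUNT(DISTINCT m.position) = ?
        ORDER BY i.name
    """
    rows = conn.execute(sql, [gene, *positions, len(positions)]).fetchall()
    return [name for (name,) in rows]


print(isolates_with_all(conn, "RT", [41, 215]))  # ['iso-A']
```

The `HAVING` clause is what turns "any of these positions" into "all of these positions"; dropping it would give the looser query.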
Each query initially returns data in the form of a table and each record in the returned table contains 8 or more columns of data. The data returned include: (i) hyperlinks to the MEDLINE abstract and GenBank record, (ii) a list of mutations in the sequence, (iii) a classification of the sequence by patient and time point, (iv) drug treatment history, and (v) additional data depending upon the query (e.g. drug susceptibility results, phylogenetic data, technical data about virus isolation and sequencing). Together with this table, users are given the option of downloading or viewing the raw sequence data in a variety of formats.
SEQUENCE INTERPRETATION PROGRAMS
The database website contains three sequence interpretation programs. The first program, HIVseq, accepts user-submitted RT and protease sequences, compares them to a reference sequence and uses the differences (mutations) as query parameters for interrogating the database ( 4 ). HIVseq allows users to examine new sequences in the context of previously published sequences, providing two main advantages. First, unusual sequence results can be detected and immediately rechecked. Second, unexpected associations between sequences or isolates can be discovered when the program retrieves data on isolates sharing one or more mutations with the new sequence.
The second program, a drug resistance interpretation program (HIVdb), accepts user-submitted protease and RT sequences and returns inferred levels of resistance to the 16 FDA-approved antiretroviral drugs. Each drug resistance mutation is assigned a drug penalty score; the total score for a drug is derived by adding the scores associated with each mutation. Using the total drug score, the program reports one of the following levels of inferred drug resistance: susceptible, potential low-level resistance, low-level resistance, intermediate resistance and high-level resistance.
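The additive scoring scheme described above can be sketched in a few lines of Python. The drug names, penalty values and score cut-offs below are placeholders chosen for illustration; the text does not give the values used by HIVdb, so none of these numbers should be read as the program's actual parameters.

```python
# Hypothetical penalty tables: {drug: {mutation: penalty score}}.
PENALTIES = {
    "DRUG_A": {"M41L": 15, "T215Y": 40, "K70R": 10},
    "DRUG_B": {"L90M": 60},
}

# Hypothetical upper bounds (exclusive) for each level, in ascending order.
LEVELS = [
    (10, "susceptible"),
    (15, "potential low-level resistance"),
    (30, "low-level resistance"),
    (60, "intermediate resistance"),
]


def infer_level(total_score: int) -> str:
    """Map a total drug score to one of the five reported levels."""
    for upper_bound, label in LEVELS:
        if total_score < upper_bound:
            return label
    return "high-level resistance"


def interpret(mutations, penalties=PENALTIES):
    """Sum the penalty of each mutation per drug and attach the inferred level."""
    report = {}
    for drug, table in penalties.items():
        total = sum(table.get(m, 0) for m in mutations)
        report[drug] = (total, infer_level(total))
    return report


print(interpret(["M41L", "T215Y"]))
```

With these placeholder values, a sequence carrying M41L and T215Y totals 55 for `DRUG_A` (intermediate resistance) and 0 for `DRUG_B` (susceptible).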
The third program, HIValg, allows researchers to compare the output of different publicly available drug-resistance algorithms on the same sequence or set of sequences. The algorithms used by this program are encoded using a programming platform, the Algorithm Specification Interface (ASI), developed to facilitate the comparison of HIV genotypic resistance algorithms. ASI consists of an XML format for specifying an algorithm and a compiler that transforms the XML into executable code.
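The general pattern of specifying an algorithm in XML, turning it into something executable and running several algorithms on the same input can be sketched as follows. The element and attribute names (`algorithm`, `drug`, `rule`, `mutation`, `score`) are invented for this example and are not the ASI schema, and the "compiler" here simply builds a Python closure.

```python
import xml.etree.ElementTree as ET

# Two toy specifications in a made-up XML dialect (not the real ASI format).
SPEC_ONE = """<algorithm name="toy-one">
  <drug name="DRUG_A">
    <rule mutation="M41L" score="15"/>
    <rule mutation="T215Y" score="40"/>
  </drug>
</algorithm>"""

SPEC_TWO = """<algorithm name="toy-two">
  <drug name="DRUG_A">
    <rule mutation="T215Y" score="60"/>
  </drug>
</algorithm>"""


def compile_spec(xml_text):
    """Parse a toy spec and return (algorithm name, scoring function)."""
    root = ET.fromstring(xml_text)
    tables = {
        drug.get("name"): {
            rule.get("mutation"): int(rule.get("score"))
            for rule in drug.findall("rule")
        }
        for drug in root.findall("drug")
    }

    def evaluate(mutations):
        # Additive scoring per drug; unknown mutations contribute nothing.
        return {d: sum(t.get(m, 0) for m in mutations) for d, t in tables.items()}

    return root.get("name"), evaluate


def compare(specs, mutations):
    """Run every compiled algorithm on the same mutation list."""
    return {name: evaluate(mutations)
            for name, evaluate in (compile_spec(s) for s in specs)}


print(compare([SPEC_ONE, SPEC_TWO], ["M41L", "T215Y"]))
```

Keeping the rules in a declarative file means a new algorithm can be added or revised without touching the code that evaluates it, which is the property that makes side-by-side comparison practical.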
NEW ADDITIONS PLANNED FOR THE HIVRT&PrDB
Two additions to the database are planned. (i) gp41 sequences and data on resistance to fusion inhibitors: the first fusion inhibitor, enfuvirtide (T-20), has been shown to have potent antiretroviral activity in clinical trials ( 5 , 6 ) and is likely to be approved in 2003. A wide range of mutations in gp41 contributing to T-20 resistance have been reported; most occur between residues 36 and 45, but mutations outside this region also appear to contribute to drug resistance ( 7 , 8 ). (ii) Integrase sequences: a new class of compounds that inhibit HIV-1 integrase has been shown to be active in vitro and in a SHIV rhesus macaque model of infection ( 9 , 10 ).