Molecular mechanisms of retroviral integration site selection

Retroviral replication proceeds through an obligate integrated DNA provirus, making retroviral vectors attractive vehicles for human gene-therapy. Though most of the host cell genome is available for integration, the process of integration site selection is not random. Retroviruses differ in their choice of chromatin-associated features and also prefer particular nucleotide sequences at the point of insertion. Lentiviruses including HIV-1 preferentially integrate within the bodies of active genes, whereas the prototypical gammaretrovirus Moloney murine leukemia virus (MoMLV) favors strong enhancers and active gene promoter regions. Integration is catalyzed by the viral integrase protein, and recent research has demonstrated that HIV-1 and MoMLV targeting preferences are in large part guided by integrase-interacting host factors (LEDGF/p75 for HIV-1 and BET proteins for MoMLV) that tether viral intasomes to chromatin. In each case, the selectivity of epigenetic marks on histones recognized by the protein tether helps to determine the integration distribution. In contrast, nucleotide preferences at integration sites seem to be governed by the ability for the integrase protein to locally bend the DNA duplex for pairwise insertion of the viral DNA ends. We discuss approaches to alter integration site selection that could potentially improve the safety of retroviral vectors in the clinic.


INTRODUCTION
Retroviral replication requires the covalent integration of the reverse transcribed viral genome into the host cell chromatin. The integrated form of the virus, referred to as the provirus, provides a template for viral gene expression. Because the provirus is an integral part of the host genome, retroviruses persist in the host for the lifetime of the infected cell. This trait of irreversible integration makes retroviruses particularly attractive vehicles for human-based genetic therapy (1).
Although most of the host cell genome is amenable to integration (2), retroviral integration is not a random process (3), with several factors influencing integration site selectivity. There are seven different retroviral genera--alpha through epsilon, lenti and spuma--and the selection of host DNA sequence and chromatin-associated features seems to largely follow genera-specific patterns (4,5). For example, lentiviruses including HIV-1 prefer to integrate within the bodies of active genes located within gene dense regions of chromosomes (6), while gammaretroviruses such as Moloney murine leukemia virus (MoMLV) display bias for integrating in the vicinity of strong enhancers, active gene promoters and associated CpG islands (7)(8)(9). The deltaretrovirus human T-lymphotropic virus type 1 and the alpharetrovirus avian sarcoma-leukosis virus (ASLV) each display a pattern that differs from HIV-1 and MoMLV, as neither shows a strong preference for active genes or transcription start sites (TSSs) (4,10). The betaretrovirus mouse mammary tumor virus (MMTV) seems the least selective of all, displaying an integration pattern on the genomic level that is basically indistinguishable from random (11,12).
Studies of the mechanisms of retroviral integration have revealed two key players that determine integration site selection: the retroviral integrase (IN) protein and cognate cellular binding partners (13,14). In the case of lentiviral INs, integration site targeting is in large part guided by the cellular chromatin binding protein lens epithelium-derived growth factor (LEDGF)/p75, which facilitates integration into active gene bodies (15)(16)(17)(18). More recent studies have identified the bromo- and extra-terminal domain (BET) proteins (bromodomain (BRD) proteins 2, 3 and 4) as the main cellular binding partners of MoMLV IN and demonstrated their role in promoting efficient MoMLV integration near TSSs (19)(20)(21). Collectively, these findings have provided clues as to why different retroviruses exhibit markedly distinct integration site selectivity. Although retroviruses from the other five genera show less dramatic targeting of chromatin-associated features than do either the lentiviruses or gammaretroviruses, we nonetheless expect that these IN proteins also interact with specific nuclear factors to facilitate virus integration.
The significance of integration site selection has been highlighted by studies that have used retroviral vectors in human gene-therapy. Retroviruses present efficient vehicles for the delivery of therapeutic genes due to their trait of stable DNA integration and because they are amenable to pseudotyping with a variety of envelope glycoproteins (1,22,23). In particular, MoMLV-based vectors have been successfully utilized in the treatment of primary immunodeficiencies (24,25). However, adverse effects associated with integration of MoMLV-based vectors near proto-oncogenes were observed in these clinical trials (25)(26)(27)(28). Therefore, understanding the underlying mechanisms for integration site specificity could lead to the development of safer vectors for human gene-therapy. The recent identification of BET proteins as principal binding partners of MoMLV IN offers a new means to understand and address this problem. The present review compares the mechanisms of action of LEDGF/p75 and BET proteins in their ability to navigate HIV-1 and MoMLV integration to select chromatin sites and the implications for human gene-therapy.

INTEGRATION: CATALYTIC MECHANISM AND TARGET SITE SELECTION
Retroviral IN exhibits two distinct catalytic activities, termed 3′ processing and strand transfer, to covalently insert the viral DNA into the host genome. Productive 3′ processing involves the cleavage of a dinucleotide from the 3′ ends of the viral DNA, yielding invariant CA-OH-3′ sequences. During the following strand transfer reaction, IN uses these 3′-hydroxyl groups to generate a staggered cut in complementary strands of the target DNA and concomitantly join the viral DNA ends to the host genome (29,30). The different retroviral IN proteins recognize scissile phosphodiester bonds in target DNA that are separated by 4-6 bp for strand transfer. The single-strand gaps in the DNA recombination intermediate are repaired by cellular enzymes, which accordingly yield 4-6 bp duplications of target DNA flanking the integrated provirus. In infected cells, IN functions in the context of a large nucleoprotein complex called the preintegration complex (PIC), which in addition to IN and viral DNA is comprised of a number of viral and cellular proteins.
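The link between the staggered cut and the flanking duplication can be illustrated with a minimal sketch. It assumes a simplified top-strand view, a hypothetical coordinate convention (`pos` is the first target base between the two cuts) and a placeholder provirus string; none of these names come from the cited studies.

```python
def integrate(target: str, provirus: str, pos: int, stagger: int = 4) -> str:
    """Top-strand sequence after strand transfer and gap repair.

    pos     -- 0-based index of the first target base lying between the
               two staggered cuts (hypothetical convention for this sketch)
    stagger -- spacing between the scissile bonds, 4-6 bp per the text
    """
    if stagger not in (4, 5, 6):
        raise ValueError("stagger is expected to be 4-6 bp")
    if not 0 <= pos <= len(target) - stagger:
        raise ValueError("staggered cut must lie within the target")
    # The bases between the cuts are single-stranded on either side of the
    # provirus in the intermediate; gap repair copies them on both flanks,
    # so they appear once upstream and once downstream of the provirus.
    return target[:pos + stagger] + provirus + target[pos:]


target = "GGGACGTCCC"
# "tgNNNNca" is a placeholder provirus; lowercase marks viral sequence.
print(integrate(target, "tgNNNNca", pos=3, stagger=4))
# prints GGGACGTtgNNNNcaACGTCCC (ACGT is duplicated on both flanks)
```

The product is always longer than target plus provirus by exactly the stagger length, which is why the duplication size reports the spacing of the two strand transfer events.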
X-ray crystal structures of the spumaviral prototype foamy virus (PFV) IN in complex with viral and target DNA substrates have provided a major breakthrough for understanding the mechanism of integration (31,32). One key feature observed in the functional complex is that the target DNA is significantly kinked to optimally position IN active sites for the pair-wise strand transfer events. These findings augmented earlier biochemical data (33)(34)(35)(36)(37) showing that IN favors integration into DNA acceptor sites that display inherent bendability, including nucleosomal DNA wrapped around core histones. In particular, the widened major groove, where nucleosomal DNA is relatively distorted, appears to be preferentially targeted by retroviral INs. Retroviral INs additionally exhibit a preference for weakly conserved palindromic sequences that center around the staggered cut in target DNA (3,38-41). It is logical that these sequences are only weakly conserved, as strong nucleotide sequence specificity would be disadvantageous for viral fitness since this would limit the distribution of proviral sites suitable for optimal gene expression. Accordingly, the majority of contacts between PFV IN and target DNA in the crystal structures are mediated through the phosphodiester backbone (31).
Recent research has indicated that the nature of the palindromic sequence in large part underlies the bendability of the substrate at sites of viral DNA joining. In particular, the preferred PFV integration site (-3)KWK↓VYRBMWM(6) (written using International Union of Biochemistry base codes; the arrow marks the position of viral DNA plus-strand insertion and the underline notes the target site duplication, which is 4 bp for PFV) (42) preferentially harbors the YR dinucleotide at the center of the integration site. The varying combinations of purine (R)/pyrimidine (Y) dinucleotides possess inherently different base stacking propensity and hence flexibility: YR and RY are the most and least distortable, respectively, while RR and YY fall in between (43). Retroviral IN proteins harbor three common domains: the zinc-chelating N-terminal domain (NTD), central catalytic core domain (CCD) that harbors the enzyme active site and C-terminal domain (CTD) (reviewed in 44). Because PFV IN amino acids Ala188 in the CCD and Arg329 in the CTD interact with the eight bases of the target DNA consensus nucleotide signature that abut the central YR, the DNA palindrome represents preferentially bendable sequences that result from PFV IN-base interactions and the centrally flexible YR sequence (31). Similar scenarios are likely for the other retroviruses. Recent analysis of the consensus HIV-1 integration site (-3)TDG↓(G/V)TWA(C/B)CHA(7) highlighted the dinucleotide signature (0)RYXRY(4) at its center. Though enriched in rigid RY dinucleotides at first glance, this pattern actually ensures relatively flexible sequences overlapping the center of the presumed DNA bend: Y at the central X position yields YY and YR at nucleotide positions 1 and 2 and at 2 and 3, respectively, whereas R at the central position yields YR and RR at these respective positions.
Due to the lack of HIV-1 IN-DNA structures, less is known about the details of IN-target DNA contacts that abut these central flexible motifs than is known for PFV IN. Nevertheless, mutagenesis studies indicate that HIV-1 IN CCD residue Ser119, like its Ala188 analog in PFV IN, interacts with the bases that lie three positions upstream from the points of viral DNA joining (45).
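The purine/pyrimidine step reasoning above can be checked mechanically. The following is a minimal sketch: the function names and the ordinal flexibility scores are illustrative assumptions that encode only the ranking stated in the text (YR most distortable, RY least, RR and YY in between), not measured values.

```python
# Ordinal ranks only (assumption for illustration): higher = more distortable.
FLEX_RANK = {"YR": 2, "RR": 1, "YY": 1, "RY": 0}


def base_class(base: str) -> str:
    """Map a nucleotide to purine (R) or pyrimidine (Y)."""
    base = base.upper()
    if base in "AG":
        return "R"
    if base in "CT":
        return "Y"
    raise ValueError(f"unexpected base: {base!r}")


def step_classes(seq: str) -> list[str]:
    """Classify each consecutive dinucleotide step of seq as YR, RY, RR or YY."""
    classes = [base_class(b) for b in seq]
    return [a + b for a, b in zip(classes, classes[1:])]


def central_steps(x_class: str) -> tuple[str, str]:
    """Steps at positions 1-2 and 2-3 of the (0)RYXRY(4) signature."""
    pattern = "RY" + x_class + "RY"
    return pattern[1:3], pattern[2:4]


for x in "YR":
    steps = central_steps(x)
    print(x, steps, [FLEX_RANK[s] for s in steps])
# prints:
# Y ('YY', 'YR') [1, 2]
# R ('YR', 'RR') [2, 1]
```

Either choice of the central base places one YR step and one intermediate step at the middle of the signature, so neither central step is the rigid RY class.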

LEDGF/P75 AND HIV-1 INTEGRATION
LEDGF/p75 was identified as an HIV-1 IN binding protein using different proteomic screens, including mass spectrometry (MS)-based analysis of cellular factors associated with ectopically expressed IN in human cells and by yeast 2-hybrid (46)(47)(48)(49). Because LEDGF/p75 knockdown significantly reduced the steady-state level of ectopically expressed HIV-1 IN, the host factor appeared to be the principal cellular binding partner of the viral protein in human cells (50,51).
LEDGF/p75 is a ubiquitous cellular protein. It was initially discovered as a human transcriptional coactivator (52) and has been shown to move around the nucleus of living cells by interacting with chromatin in a hop/scan mode that is common among transcription factors (53). LEDGF/p75 also interacts with a number of cellular proteins including JPO2 (54,55), Cdc7-activator of S-phase kinase (ASK) (56), the 'domesticated' transposase pogZ (57) and menin, which links LEDGF/p75 with mixed-lineage leukemia (MLL) histone methyltransferase and results in MLL-dependent transcription and leukemic transformation (58).
The structural organization of LEDGF/p75 (Figure 1A) reveals an N-terminal PWWP domain, a basic-type nuclear localization signal, two AT-hook DNA binding motifs and three highly charged regions (CR1-3) that allow this protein to tightly engage chromatin throughout the cell cycle (59,60). The C-terminal region contains a domain that is termed the IN binding domain (IBD) for its ability to directly interact with HIV-1 IN (61). LEDGF/p75 belongs to the hepatoma-derived growth factor related protein (HRP) family that comprises five additional members (HDGF, HRP1-3 and LEDGF/p52). HRP2 is the only other known cellular protein that contains both an IBD and PWWP domain (Figure 1A). The other family members, including the smaller alternatively spliced isoform of LEDGF (LEDGF/p52), lack the IBD and thus fail to interact with HIV-1 IN (50).
LEDGF/p75 binds tightly to a number of lentiviral INs but fails to interact with INs from the other retroviral genera (62)(63)(64). Accordingly, in vitro assays with purified INs have revealed that LEDGF/p75 significantly stimulated the strand transfer activities of lentiviral but not of other retroviral INs (47,60,61,64). Initially it was unclear as to whether LEDGF/p75 also promoted efficient HIV-1 integration in cells, as significant knockdown of LEDGF/p75 either failed to reduce infectious HIV-1 titer (62) or yielded only ∼2-fold reductions in integration (65,66). However, parallel findings indicated that residual amounts of chromatin-associated LEDGF/p75, which could persist among cell clones despite an overall efficient level of knockdown, were sufficient to support near wild-type levels of HIV-1 infection and integration (67). Consistent with this finding, studies using LEDGF/p75 knockout cells revealed 5 to 80-fold defects in HIV-1 titer associated with ∼2 to 12-fold reductions in integration (16)(17)(18)68). The integration defect of the ungulate lentivirus equine infectious anemia virus (EIAV) in mouse knockout cells was reportedly >50-fold (16). Significant inhibitory effects on HIV-1 replication were also observed in cells engineered to express constructs that contained the IBD but lacked the N-terminal elements present in full length LEDGF/p75 that confer chromatin binding (67,69,70). The monitoring of viral replication intermediates has pinpointed that LEDGF/p75 depletion or overexpression of dominant-interfering IBD constructs does not significantly affect HIV-1 reverse transcription but instead selectively impairs integration. Collectively, the in vitro and cell culture experiments conclusively demonstrated a stimulatory role for LEDGF/p75 on lentiviral DNA integration.
Genome-wide integration site mapping experiments were carried out to explore the role of LEDGF/p75 in integration site selectivity. The first line of evidence for its role in HIV-1 target site selection emerged from the analysis of LEDGF/p75 knockdown cells, where significantly reduced frequencies of HIV-1 integration into active genes were observed even though these cells supported normal levels of HIV-1 infection (15,62). Subsequent experiments using LEDGF/p75 knockout cells corroborated these findings and extended them to show that a significant percentage of HIV-1 proviruses were aberrantly located near TSSs in the absence of the host factor (16,17). Furthermore, chimeric constructs that replaced the N-terminal chromatin-binding portions of LEDGF/p75 with the chromatin binding regions of other proteins supported efficient HIV-1 infection (71)(72)(73) and retargeted integration away from active genes and toward the sites preferentially bound by the heterologous chromatin binding domains (72)(73)(74). For example, replacing the N-terminal PWWP domain and AT hooks of LEDGF/p75 with a plant homeodomain (PHD) finger redirected HIV-1 DNA integration to TSSs. Integration frequencies within 2.5 kb of TSSs were 50.3 and 3.8% in the presence of the ectopically expressed fusion and wild-type LEDGF/p75 proteins, respectively (72). Similar use of the chromobox homolog 1 (CBX1) and heterochromatin protein 1 (HP1) alpha chromatin binding modules imposed a genomic pattern of HIV-1 integration that resembled random (72,73).
Mapping of the LEDGF/p75 chromatin-binding profile along the Encyclopedia of DNA Elements has revealed a preference for binding active transcription units, which paralleled the enhanced HIV-1 integration frequencies at these locations (75). These observations extended the prior finding that LEDGF/p75 was required for the chromosomal association of ectopically expressed HIV-1 IN in human cells (50). Collectively, these findings provide strong evidence that LEDGF/p75 tethers PICs to active transcription units during HIV-1 integration. Although LEDGF/p75 can potently stimulate HIV-1 IN catalytic function in vitro (47,60,61,76,77), it is somewhat unclear if the host factor provides this function during virus infection. LEDGF/p75 immunoprecipitation can recover PIC activity from infected cell extracts, indicating that the IN-binding factor is a component of the HIV-1 PIC (62). Interestingly, the wild-type level of HIV-1 PIC activity is maintained in samples isolated from LEDGF/p75 knockout cells (17). Therefore, LEDGF/p75 may provide chromatin-tethering function to lentiviral PICs without contributing to the formation of the catalytically active intasome.
Figure 1. (A) The N-terminal region of LEDGF/p75, which contains a PWWP domain, charged regions (CR) 1-3, nuclear localization signal (NLS), and AT-hooks, interacts with chromatin. Similar to LEDGF/p75, HRP2 contains an N-terminal PWWP domain and AT-hooks. HRP2 has an additional domain, termed the homology region III (HR3) that is conserved in multiple HRP2 homologs as well as in LEDGF/p75. The C-terminal regions of both proteins exhibit the IBD that directly interacts with lentiviral INs. (B) The BET proteins consist of BRD2, 3, 4 and T (not pictured). Whereas BRD3 is expressed as a single isoform, BRD2 is expressed as four isoforms (isoform 1 is pictured) and BRD4 as three isoforms (isoforms A and C are pictured; as compared to isoform C, isoform B harbors a unique 75 amino acid C-terminal tail that interacts with condensin II complexes; 183). Known domains and their respective start and end amino acid numbers are indicated. Two N-terminal bromodomains (BD I and II) and motifs A and B collectively contribute to high affinity chromatin binding. In the C-terminal region of the BET proteins, the conserved ET domain interacts with multiple proteins including MoMLV IN. Other domains include the SEED domain, which is present in all BET proteins, the BID, which is present in all BRD4 isoforms, and the CTM, which is unique to BRD4 isoform A.
The frequency of HIV-1 integration into active transcription units remained greater than random in LEDGF/p75 knockout cells (16)(17)(18), suggesting a potential role for other cellular proteins in integration targeting. In particular, a role for HRP2 was investigated due to its close structural similarity with LEDGF/p75 (Figure 1A). In vitro assays with purified proteins demonstrated that HRP2 tightly binds HIV-1 IN and significantly stimulates its catalytic function (61). However, unlike LEDGF/p75, HRP2 does not remain bound to chromatin throughout the cell cycle (78).
HRP2 depletion in cells containing normal levels of LEDGF/p75 did not have any detectable effect on HIV-1 titer or integration targeting (18,67,79,80). When HRP2 was depleted in LEDGF/p75 knockout cells, a further reduction in the preference of HIV-1 for integrating into active genes was observed (80,81). These findings argue that LEDGF/p75 is the principal cellular determinant for targeting HIV-1 integration to active transcriptional units and that HRP2 could play a secondary role. Notably, the preference for HIV-1 to integrate into active genes remained greater than random in LEDGF/HRP2 double knockout cells, suggesting that subsidiary targeting roles might exist for as of yet undefined lentiviral IN-binding proteins (80).

BET PROTEINS AND MOMLV INTEGRATION
Nucleic Acids Research, 2014, Vol. 42, No. 16 10213
The observation that the distribution of MoMLV proviruses along chromatin differed markedly from HIV-1 suggested that MoMLV IN relies on cellular binding partners other than LEDGF/p75 for integration target-site selection. Initial experiments with yeast 2-hybrid screening revealed a number of potential binding partners of MoMLV IN, including BRD2 (82). More recent MS-based proteomic analysis of human cellular proteins that co-purified with recombinantly expressed MoMLV IN identified BET proteins (BRD2, 3 and 4) as main binding partners of the viral protein (19,21). BRD2, 3, 4 and T belong to the BET protein family, which in turn is a part of the extended BET family that includes BRD1, 7, 8 and 9. BRD2, 3 and 4 are ubiquitously expressed, whereas BRDT is only expressed in testis. BET proteins have been implicated in transcription, DNA replication and cell cycle control (reviewed in 83,84). They exhibit several conserved domains and motifs (Figure 1B). Two N-terminal bromodomains (BD-I and BD-II) bind acetylated H3 and H4 tails on chromatin (85,86). Two conserved motifs, A and B, are positively charged and could contribute to DNA binding (87). An additional basic residue-enriched interaction domain (BID) has recently been described for BRD4 and shown to interact with cellular factor p53 (88). While this domain has not been confirmed in BRD2 or BRD3, sequence alignments identify a similar region corresponding to BRD2 residues 533-584 that is 44% identical and 66% homologous considering conservative amino acid substitutions. A short 17-residue region of BRD3 (amino acids 476-493) shows homology to the N-terminal part of the BID, though this could be due to the overlapping B motif.
The C-terminal extra-terminal (ET) and SEED (Ser/Glu/Asp-rich region) domains that are present throughout the BET proteins directly engage various cellular proteins including transcription factors, chromatin modifying proteins and histone modifying enzymes, and also interact with different viral proteins (reviewed in 84). BRD3 exhibits a single isoform, whereas BRD2 and BRD4 are expressed in several isoforms (Figure 1B). In addition to the above structural elements, BRD4 isoform A contains a C-terminal motif (CTM) that has been shown to bind a number of cellular proteins and also plays a role in HIV-1 latency (89)(90)(91).
Notably, BET proteins play an active role in the life cycles of different viral families including Papillomaviridae, Herpesviridae, Polyomaviridae and Retroviridae. In addition to regulating transcriptional activation of Epstein-Barr virus, Kaposi's sarcoma-associated herpesvirus (KSHV), and papillomavirus, they can repress the transcription of papillomavirus E6 and E7 promoters, aid papillomavirus episomal maintenance and genome segregation, and control reactivation of latent HIV-1 reservoirs (reviewed in 83,92; also see 90,91,93-96). Of these, the most pertinent to this review is the role of BET proteins in tethering papillomavirus genomes to condensed mitotic chromosomes (93,94), which is mediated through the binding of BRD4 with papillomavirus E2 protein (93,97,98). In particular, the C-terminal DNA-binding domain of E2 protein binds viral DNA, whereas the N-terminal transactivation domain of the E2 protein directly interacts with the C-terminal region of BRD4. In turn, this nucleoprotein complex is tethered to host chromatin by the two N-terminal bromodomains of BRD4, which associate with acetylated lysines in the tail regions of histones H3 and H4 (83,86,99,100). This tethering mechanism ensures papillomavirus episomal maintenance by coupling the viral genomes to host chromosomes during mitosis, and subsequent distribution to daughter cells after cell division.
Investigation into the role of BET proteins during MoMLV integration has also revealed a bimodal tethering mechanism. Biochemical assays with purified proteins have revealed direct, high affinity interactions between BET proteins and MoMLV IN as well as between BET proteins and mononucleosomes (19)(20)(21)87). Furthermore, purified recombinant BRD4-C (19,87) and, to a lesser degree, the isolated ET domains of BRD2, 3 and 4 (20), significantly enhanced the pair-wise or concerted integration activity of MoMLV IN in vitro (19,20,87). BET protein binding and enzymatic stimulation was specific for IN proteins derived from gammaretroviruses and not for other retroviruses (19)(20)(21). The stimulation of MoMLV IN in vitro activity was mediated through the bimodal interaction of BRD4 with naked DNA and MoMLV IN (87). Interestingly, the levels of stimulation of MoMLV IN integration activities by BRD4 were comparable to those of HIV-1 IN by LEDGF/p75 (19). In addition, MoMLV IN and BET proteins have been shown to colocalize in cell nuclei (20,21).
Small molecules JQ-1 and I-BET, which specifically impair interactions of all three BET proteins with cognate histone marks (86,101), were exploited to examine the role of BET proteins in MoMLV replication. These inhibitors selectively impaired MoMLV but not HIV-1 replication (19)(20)(21). Furthermore, the analysis of replication intermediates revealed that the inhibition of BET proteins with JQ-1 impaired MoMLV integration in a dose-dependent manner, yielding inhibitory concentration 50% (IC50) values of ∼50-100 nM (19,20). Taken together, in vitro and cell culture experiments indicate that the BET proteins function for MoMLV like LEDGF/p75 does for HIV-1: specific chromatin tethers that interact with PICs by binding their cognate IN and potentially stimulating its enzymatic function.
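To make the meaning of an IC50 in the ∼50-100 nM range concrete, the sketch below uses a generic Hill-type dose-response curve. This model, the Hill coefficient of 1 and the function name are assumptions chosen for illustration; the cited studies are not stated here to have fitted this particular form.

```python
def fraction_remaining(dose_nm: float, ic50_nm: float, hill: float = 1.0) -> float:
    """Fraction of integration remaining at a given inhibitor dose.

    Generic Hill-type inhibition curve (assumed model, not from the source):
    the response is 1.0 with no inhibitor and 0.5 when dose equals the IC50.
    """
    if dose_nm < 0 or ic50_nm <= 0:
        raise ValueError("dose must be >= 0 and IC50 must be > 0")
    return 1.0 / (1.0 + (dose_nm / ic50_nm) ** hill)


# Compare the two ends of the reported IC50 range at a few hypothetical doses.
for ic50 in (50.0, 100.0):
    curve = [round(fraction_remaining(d, ic50), 3) for d in (0, 50, 100, 500)]
    print(ic50, curve)
```

Under this assumed model, a twofold difference in IC50 shifts the whole curve along the dose axis without changing its shape.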
The chromatin binding sites of BET proteins have been mapped using ChIP-Seq experiments (100), which when compared with retroviral integration sites showed a positive correlation with MoMLV but not with HIV-1 or ASLV (19,21). In particular, MoMLV exhibited a strong preference for promoters associated with the binding sites of BET proteins. Treatment with JQ-1 and I-BET was used to experimentally examine the roles of BET proteins in MoMLV integration site selectivity (19,20). Alternatively, the effect of concurrent down-regulation of BRD2, 3 and 4 by a pool of short interfering (si) RNAs was investigated (19). Treatment with inhibitors or siRNA significantly reduced the characteristic preference of MoMLV for integrating near TSSs. For example, in the absence of JQ-1, MoMLV integration in HEK293T cells was strongly favored (39% integration events) within 2-kb regions of Ref-Seq TSSs, whereas after JQ-1 treatment this frequency was reduced in a dose-dependent manner to 11% at the highest dose tested (19). A complementary approach investigated an artificial fusion protein that contained the N-terminal chromatin binding segment of LEDGF/p75 (amino acids 1-324) and the C-terminal BRD4(ET/SEED) segment that interacts with MoMLV IN (21). Ectopic expression of the chimeric LEDGF(1-324)/BRD4(ET/SEED) protein retargeted MoMLV integration away from TSSs and toward the bodies of active genes, a pattern that is reminiscent of LEDGF/p75-mediated lentiviral integration. Taken together, these studies (19)(20)(21) have dissected the role of BET proteins in targeting MoMLV integration near TSSs.
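Statistics such as the percentage of integration events near TSSs reduce to a window count over annotated positions. The following is a minimal sketch of that calculation, assuming integration sites given as (chromosome, position) pairs and TSS coordinates held as sorted lists per chromosome. The function name, data layout, toy coordinates and the reading of "within 2-kb regions" as a distance of at most 2000 bp are all illustrative assumptions.

```python
from bisect import bisect_left


def fraction_near_tss(sites, tss_by_chrom, window=2000):
    """Fraction of integration sites lying within `window` bp of any TSS.

    sites        -- iterable of (chromosome, position) pairs
    tss_by_chrom -- dict mapping chromosome to a SORTED list of TSS positions
    """
    sites = list(sites)
    if not sites:
        return 0.0
    near = 0
    for chrom, pos in sites:
        tss = tss_by_chrom.get(chrom, [])
        i = bisect_left(tss, pos)
        # Only the nearest TSS on each side can be the closest one.
        neighbours = tss[max(i - 1, 0):i + 1]
        if any(abs(pos - t) <= window for t in neighbours):
            near += 1
    return near / len(sites)


# Toy coordinates, not real annotation data.
tss = {"chr1": [10_000, 50_000]}
sites = [("chr1", 9_000), ("chr1", 12_001), ("chr1", 51_999), ("chr2", 100)]
print(fraction_near_tss(sites, tss))  # prints 0.5
```

Running the same count on a matched set of computer-generated random sites would give the baseline against which a preference is judged.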

STRUCTURAL ASPECTS OF HIV-1 IN-LEDGF/P75 AND MOMLV IN-BET PROTEIN INTERACTIONS
Highly conserved structural features of retroviral IN proteins include the catalytic triad of Asp, Asp and Glu (DDE) residues that coordinates a pair of essential Mg2+ ions during 3′ processing and strand transfer (32) and the Zn-binding motif (HH-CC type) in the NTD that contributes to IN multimerization and DNA binding (102,103) (Figure 2A). Furthermore, the crystal structure of the PFV IN-viral DNA complex, or intasome (103), has enabled plausible molecular modeling studies of HIV-1 IN interactions with its DNA substrates (45,(104)(105)(106)(107). These studies in turn suggest that the overall architectures of different retroviral intasomes may exhibit significant resemblance with the PFV structure. However, despite potential overall similarity among different retroviral intasome structures, studies with LEDGF/p75 and BET proteins have revealed that retroviruses from different genera markedly differ in their interactions with their cognate cellular binding partners.
The principal LEDGF/p75 IBD binding determinant on HIV-1 IN is the CCD (108), although the NTD is additionally required for high affinity binding (50,77,109). A short interhelical loop from the IBD docks into a narrow, V-shaped cavity at the interface of two IN CCD molecules and establishes functionally critical hydrogen bonds between LEDGF/p75 hotspot residue Asp366 and the backbone amides of IN residues Glu170 and His171 (Figure 2B and C) (108,110). The LEDGF/p75 binding pocket is conserved amongst lentiviral INs, whereas the corresponding segments in other retroviral INs exhibit significant differences (108). While the dimeric organization of the PFV CCDs is present in the functional intasome, the two interacting subunits of the PFV CCD create an ∼90° angle compared with an acute angle observed for the lentiviral CCDs. Additionally, IN CCD residues that interact with LEDGF/p75 show greater degrees of conservation amongst lentiviral as compared to the other retroviral proteins despite the fact that all retroviral IN CCDs contain the invariant DDE catalytic triad (64,109,111; also see Figure 2B).
Alignment of the primary sequences of different retroviral INs has revealed that the C-terminal 28 amino acid tail of MoMLV IN is unique to the gammaretroviruses (112) (Figure 2D). For a long time, the functional significance of this tail had remained enigmatic, as various deletions of it did not significantly affect MoMLV infectivity (113). Recent reports have clarified that the role of the C-terminal tail is to directly bind the BET proteins (21,87,112). Nuclear magnetic resonance, MS-based protein footprinting and site-directed mutagenesis experiments have collectively identified that the MoMLV IN C-terminal tail (amino acid residues 386-405) directly mediates interactions with BET proteins (21,87,112). This region of MoMLV IN was disordered in the unliganded CTD structure but became ordered in the presence of the BRD3 ET domain (112). Importantly, C-terminal truncation mutants of recombinant MoMLV IN lacking all or part of the C-terminal tail exhibited markedly impaired interaction with BRD4, but retained wild-type levels of IN catalytic activities (87). Consistent with this observation, the MoMLV C-terminal deletion mutant 1-385 lost the ability to interact with BRD2, 3 and 4, and a 24-mer peptide composed of IN residues 386-408 (Figure 2D) disrupted the interaction between full length IN and the BRD3 ET domain in vitro (112). Somewhat unclear is the extent to which MoMLV IN regions outside the C-terminal tail might contribute to BET protein binding. Whereas Sharma et al. did not detect any binding to a CTD deletion mutant of MoMLV IN (19), Gupta et al. reported that an N-terminal deletion mutation that removed the NTD and first 50 residues of the CCD significantly reduced binding despite the fact that this construct harbored an intact C-terminal tail (20). Results of Ala-scanning mutagenesis led these investigators to suggest that residues that compose CCD α helix 6 contributed to BET protein binding (20).
In summary, whereas numerous groups have highlighted the importance of the MoMLV IN C-terminal tail region in BET protein binding (19)(20)(21)87,112), additional work is required to help clarify the extent to which the CCD contributes to overall binding affinity.
There is precedent for integration targeting mediated through the C-terminal tail of an IN protein and a cognate chromatin binding protein. Retrotransposons are analogous to retroviruses, with the exception that they lack an extracellular phase of replication. To avoid inactivation of essential genes, the integration of these elements is tightly linked to subsets of genomic loci. In the case of the yeast Ty5 retrotransposon, integration is favored into heterochromatin (114). Ty5 integration targeting is mediated by a 6-mer peptide at the IN C-terminus and the host heterochromatin protein Sir4p (115). With now two examples of retroelement integration targeting mediated between the C-terminal tail of IN and a cognate chromatin binding protein, we predict that other viruses/transposons will also be found to take advantage of this design to steer the integration of their reverse transcripts.
Recent truncation mutagenesis and MS-based protein footprinting experiments have identified the ET domain of BET proteins as the primary interface interacting with MoMLV IN (19)(20)(21)87). The BID region of BRD4 contributed additional interactions and increased the binding affinity to MoMLV IN (87). Complementary NMR and mutagenesis experiments have defined the MoMLV IN binding sites on the BRD4 ET domain in more detail (20,21,87). The majority of interacting residues are located on ET helices 2 and 3 and the loop connecting these two helices (Figure 2E). These studies have provided structural clues for the specificity and high affinity binding between BRD4 and MoMLV IN. The determination of the structure of the MoMLV IN CTD bound to a BRD ET domain is expected to provide further valuable details about how the viral and cellular proteins recognize each other.
Interestingly, whereas expression of the LEDGF/p75 IBD in target cells can potently inhibit HIV-1 infection and integration (67,69,70), over expressing the IBD of BRD2 (residues 640-801) stimulated MoMLV infection and integration ∼2-fold (20). Although the reason behind this rather dramatic difference is unclear, we speculate it may lie in the mode of host factor-IN binding. As discussed, the LEDGF/p75 IBD engages IN at the CCD-CCD dimer interface (108) (Figure 2C) [...] (116)(117)(118). Conceivably, over expressed IBD protein in target cells, which functionally inhibits viral DNA integration, could similarly inhibit IN catalysis. BET proteins by contrast engage a functionally inert aspect of gammaretroviral IN structure, the disordered C-terminal tail (87,112). Accordingly, we conclude that forced expression of a BET protein IBD in target cells is unlikely to deregulate IN catalytic function. The purified ET domains of BRD2, 3 and 4 could moreover stimulate MoMLV IN strand transfer activity in vitro (20), indicating that the protein domains might have similar IN stimulatory activity during virus infection. Consistent with this interpretation, immunoprecipitation of ectopically expressed green fluorescent protein fusions to either BRD2 or BRD4 co-precipitated IN from cells infected with MoMLV (20). The analysis of HIV-1 and MoMLV PICs derived from cells over expressing the LEDGF/p75 or BET protein IBD should reveal if different levels of IN catalytic function determine the differences observed in viral titer under these infection conditions.
[Figure 2 caption, fragment] ... (7,186). These IN proteins have also been shown to bind BET proteins in vitro (19)(20)(21). Below the alignment is the consensus Wx xxpxxPLb b xR sequence, where p stands for small polar (S or T) residue, b stands for basic (R, K, or H), stands for small hydrophobic (M, V, I, or L) and x refers to a position that is not conserved across the alignment. (E) Ribbon diagram of the NMR structure of BRD4 ET domain, with residues in red implicated in interacting with the MoMLV IN CTD as determined by chemical shift perturbations. These interactions were predominantly observed in helices 2 and 3 and the short loop connecting them (indicated by an arrow).
The N-terminal PWWP domain is the key determinant for the site selective association of LEDGF/p75 with chromatin. HIV-1 integration sites in the presence of wild type LEDGF/p75 differed substantially from those generated in the presence of truncation mutants that lacked the PWWP domain (127). NMR structures of the LEDGF/p75 PWWP domain revealed two distinct functional interfaces: a well-defined hydrophobic pocket that interacts with the H3K36me3 histone tail, and an adjacent basic interface that non-specifically engages DNA (120,121). Interestingly, the LEDGF/p75 PWWP domain exhibited low binding affinities for both an isolated H3K36me3 peptide and for naked DNA, whereas it interacted tightly with mononucleosomes that contained a tri-methyl-lysine analogue at position 36 of H3. These results indicate that cooperative binding of LEDGF/p75 with both the H3K36me3 tail and nucleosomal DNA is essential for the tight and site-selective association of LEDGF/p75 with chromatin (120). Indeed, mutations introduced in either the hydrophobic pocket or the basic surface significantly compromised the ability of LEDGF/p75 to both associate with chromatin and stimulate HIV-1 integration (128). These findings collectively indicate that LEDGF/p75-mediated navigation of lentiviral PICs to actively transcribed genes provides IN with increased access to nucleosomal DNA, which is the favored site for integration both in vitro and in infected cells (35,36,129).
Similar to HIV-1, MoMLV integration sites are periodically distributed on nucleosomal DNA along cellular chromosomes (122). Furthermore, a recent study has suggested that akin to LEDGF/p75, BET proteins engage both DNA wrapped around the histone core and their cognate epigenetic marks to tightly bind chromatin (87). For example, BRD4 bound purified native mononucleosomes with significantly higher affinity than either naked DNA or isolated acetylated peptides. The two N-terminal bromodomains of BET proteins have been shown to interact with a number of acetylated H3 and H4 peptides but not with their unmodified counterparts (123)(124)(125). Furthermore, peptides containing multiple acetylated sites were particularly favored (123,130). Yet, the tightest binding affinity reported to date, ∼3 µM, which was seen between BRD4 BD-I and a tetraacetylated H4 substrate, is a comparatively weak interaction (130). Two conserved motifs, A and B, which are located adjacent to the bromodomains (Figure 1B), exhibit highly basic interfaces and contribute to BET protein binding to DNA. However, BRD4 interacted with naked DNA with a relatively low binding affinity (∼2 µM) compared with the much tighter binding (Kd ∼60 nM) detected with native mononucleosomes (87). Thus the cooperative binding to both cognate histone marks and nucleosomal DNA could be a generic mechanism employed by various chromatin tethers to allow their tight interactions with select regions of chromatin.
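The gap between the two BRD4 affinities quoted above can be made concrete with a short calculation. The following Python sketch is illustrative only: it assumes simple 1:1 binding and a temperature of 298 K (neither assumption is taken from the cited studies), it uses the approximate dissociation constants from the text, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
TEMP_K = 298.0      # assumed temperature; not taken from the cited studies

def fold_tighter(kd_weak_molar, kd_tight_molar):
    """Ratio of dissociation constants (how many times tighter the second interaction is)."""
    return kd_weak_molar / kd_tight_molar

def delta_delta_g(kd_weak_molar, kd_tight_molar, temp_k=TEMP_K):
    """Binding free energy difference in kcal/mol; negative means the second interaction is tighter."""
    return -R_KCAL * temp_k * math.log(kd_weak_molar / kd_tight_molar)

def fraction_bound(ligand_molar, kd_molar):
    """Fractional occupancy for simple 1:1 binding: [L] / (Kd + [L])."""
    return ligand_molar / (kd_molar + ligand_molar)

KD_NAKED_DNA = 2e-6         # ~2 uM, BRD4 with naked DNA (value quoted in the text)
KD_MONONUCLEOSOME = 60e-9   # ~60 nM, BRD4 with native mononucleosomes (value quoted in the text)

fold = fold_tighter(KD_NAKED_DNA, KD_MONONUCLEOSOME)
ddg = delta_delta_g(KD_NAKED_DNA, KD_MONONUCLEOSOME)

print(f"Mononucleosome binding is ~{fold:.0f}-fold tighter than naked DNA binding")
print(f"Free energy difference: {ddg:.1f} kcal/mol at {TEMP_K:.0f} K")
# Occupancy at an arbitrary, illustrative 100 nM concentration of binding partner
for name, kd in (("naked DNA", KD_NAKED_DNA), ("mononucleosome", KD_MONONUCLEOSOME)):
    print(f"{name}: {fraction_bound(100e-9, kd):.2f} fraction bound")
```

Under these assumptions the roughly 33-fold difference in Kd corresponds to about 2 kcal/mol of additional binding free energy, which is the scale of gain that cooperative recognition of histone marks and nucleosomal DNA would need to supply.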
Recent reports that significantly extended the number of unique MoMLV integration sites analyzed (∼3.9 million) have yielded novel insight into the mechanism of MoMLV PIC targeting (8,9). For example, approximately half of all MoMLV integrations occurred within 1.6-2.0% of the human genome (8). Close examination revealed that strong enhancers and active promoters are superior predictors of MoMLV integration as compared to TSSs. Clustered transcription factor binding sites essentially comprise enhancer elements, which function to form a platform for transcriptional regulatory complex recruitment (reviewed in 131). In terms of MoMLV integration, the greatest enrichment was found in enhancers that are characterized by H3K4me1, H3K4me2, H3K4me3, H3K27ac and H3K9ac marks (8,9). However, the precise hierarchy of favored histone modifications varied among cell types (8,9), which likely reflects the observation that the activities of many enhancers are cell-type specific (132). Independent studies suggested that enhancers are the major source of BRD4-dependent transcriptional activation (133) and that genes that are regulated by strong enhancers are particularly sensitive to BET inhibition (134). Because BET proteins are unlikely to directly engage methylated histone tails, their association with strong enhancers could be mediated through direct interactions with H3K27ac and H3K9ac marks and/or with congregated heterologous transcription factors (134).
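The degree of clustering implied by "half of all integrations within 1.6-2.0% of the genome" can be expressed as a fold enrichment over a uniform distribution. The short Python sketch below is a back-of-the-envelope illustration using only the figures quoted above; the function name is hypothetical, and the uniform-genome baseline is a simplifying assumption that ignores unmappable or inaccessible sequence.

```python
def fold_enrichment(fraction_of_sites, fraction_of_genome):
    """Observed share of integration sites divided by the share expected
    if sites were spread uniformly over the genome."""
    if not 0 < fraction_of_genome <= 1:
        raise ValueError("fraction_of_genome must be in (0, 1]")
    if not 0 <= fraction_of_sites <= 1:
        raise ValueError("fraction_of_sites must be in [0, 1]")
    return fraction_of_sites / fraction_of_genome

# ~50% of MoMLV integrations fall within 1.6-2.0% of the human genome (values from the text)
low = fold_enrichment(0.5, 0.020)
high = fold_enrichment(0.5, 0.016)
print(f"Enrichment over uniform: ~{low:.0f}- to ~{high:.0f}-fold")
```

In other words, the favored regions receive integrations roughly 25 to 31 times more often than a uniform model would predict.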
Although the specificity of favored enhancer-associated epigenetic marks can vary among cell types, a significant number, about one-third, of targeted H3K4me1 marks in CD4+ T cells and CD34+ hematopoietic stem cells overlapped (9). Therefore, the correlation of integration sites in one cell type to mapped positions of histone epigenetic marks in a second cell type can yield overall global patterns of MoMLV site selectivity in response to BET protein disruption, for example through targeted siRNA depletion of host factors or JQ-1 treatment (21,87) (Figure 3B). The observation that JQ-1 treatment or concurrent BET protein depletion significantly reduced MoMLV integration frequencies at sites associated with enhancer and promoter-associated histone marks is consistent with the BET protein-mediated tethering mechanism of MoMLV integration (9,21,87). As the genomic occupancies of BRD2, 3 and 4 are nonidentical (135), further experimentation that correlates particular BET protein binding sites and histone epigenetic marks across cell types will help to better understand the detailed mechanisms that underlie MoMLV integration site selectivity. Despite the fact that LEDGF/p75 and BET proteins recognize distinct histone marks and bind different retroviral INs, the overall bimodal interaction (Figure 4) used to tether retroviral PICs to chromatin seems to be a common mechanism. HIV-1 and MoMLV depend on these cellular factors for effective and timely access for the integration of their viral DNAs into host chromosomes. Relatively rapid targeting to chromatin acceptor sites for IN-mediated strand transfer is likely crucial for virus survival, as the propensity for unintegrated DNA to be either degraded or modified by cellular proteins increases over time. For example, the two viral DNA ends can be ligated to form 2-long terminal repeat (LTR)-containing circles, which are a dead end for the viruses because they cannot support replication (136). Another key reason for HIV-1 and MoMLV to utilize LEDGF/p75 and BET proteins is to preferentially position their viral DNA into transcriptionally active regions of the host genome. Having such a distribution of proviral DNA should facilitate viral gene expression. Therefore, the ability of LEDGF/p75 and BET proteins to both enhance integration efficiency and preferentially target the site of integration into favorable regions for HIV-1 and MoMLV collectively ensures effective viral replication.
[Figure caption, fragment] ... is able to bind selectively and with high affinity to mononucleosomes through the cooperative binding of the dual bromodomains with acetylated H3 and H4 histone tails (H4 acetylation depicted here) and motifs A and B with DNA (shown in red) wrapped around the histones (shown in gray). The C-terminal region of the BET protein is able to engage the MoMLV intasome (depicted with a tetramer of MoMLV IN in purple and viral DNA in a dark red single line) through its extra terminal (ET) domain, which binds to the C-terminal tail of MoMLV IN. The SEED domain does not directly contribute to these interactions but may play an accessory role in complex stability (87).

OTHER VIRAL AND CELLULAR FACTORS AFFECTING INTEGRATION SITE SELECTIVITY
HIV-1 and MoMLV take different paths en route to the host chromosomal sites for integration. Lentiviruses can efficiently infect non-dividing cells, and their PICs can accordingly traverse through the nuclear pore complexes that perforate the interphase nuclear envelope (137 for review). MoMLV PICs lack this ability, and gammaretroviruses accordingly rely on mitosis and nuclear envelope dissolution to access cell chromosomes (138). Therefore, it is not surprising that a number of cellular proteins that are involved in nuclear transport have also been shown to influence HIV-1 but not MoMLV integration. The key HIV-1 determinant that governs PIC nuclear import is the viral capsid (CA) protein, which is expressed as part of the Gag structural precursor protein (137,139).
Genome-wide siRNA screens have identified host cell factors that are important for efficient HIV-1 infection (140)(141)(142). Of these hits, nucleoporin (NUP) proteins NUP358 (also known as RanBP2) and NUP153, as well as the beta-karyopherin transportin-3/TNPO3 (also known as TRN-SR2), have been scrutinized for their roles in the early steps of HIV-1 replication (143)(144)(145)(146)(147)(148). Depletion of these cellular proteins could not only adversely affect PIC nuclear localization and integration efficiency, but also alter the pattern of HIV-1 proviruses along chromatin. In particular, RanBP2, TNPO3 or NUP153 depletion resulted in reduced HIV-1 integration frequencies in gene dense regions of chromosomes (144,146,147). As noted earlier, IN is the key viral protein that governs integration site selection (13). To investigate potential roles for other viral proteins in integration site selection, chimeric HIV-1 viruses in which Gag components were replaced by their MoMLV counterparts were previously examined. Interestingly, these chimeric viruses displayed reduced integration frequencies in gene rich regions, which suggested a Gag-dependent role in integration site targeting (13,144,146). Moreover, a single missense mutation that resulted in an N74D change in HIV-1 CA counteracted the preference for HIV-1 to integrate into gene-rich regions of chromosomes (146,149). Because the mutant virus with an N74D CA substitution efficiently infected cells that were depleted for RanBP2, TNPO3 or NUP153, its novel integration profile may be linked to an alternative pathway of PIC nuclear import. Collectively, these findings suggest that the route taken by HIV-1 PICs during nuclear import is directly linked to integration site selection. Accordingly, a two-step model has been proposed, where during nuclear entry the nuclear pore components direct HIV-1 PICs toward regions of high gene density, after which the PICs engage LEDGF/p75 to gain access to active gene bodies (144,146). Consistent with this interpretation, results of fluorescent imaging indicate that HIV-1 has the propensity to integrate into chromatin that is associated with the nuclear periphery (150).
Nucleic Acids Research, 2014, Vol. 42, No. 16 10219
The interaction of MoMLV IN with BET proteins is the best studied example of a virus-host interaction that determines gammaretroviral integration target site selection. However, it is noteworthy that even with potent inhibition of BET proteins by JQ-1, which blocks their interactions with cognate histone marks, integration events at TSSs, while significantly reduced, were still substantially higher than random or when compared to HIV-1 (19,21). These findings suggest that additional host and/or viral factors could contribute to the integration pattern characteristic of MoMLV. One component of MoMLV PICs is the p12 Gag protein, which has been shown to mediate the association between PICs and condensed mitotic chromosomes (151). However, mutations that altered p12 interactions with chromatin had no detectable effects on MoMLV integration target site selection (152). Identification of new players in MoMLV integration target site selection will not only help to elucidate the molecular mechanisms of MoMLV integration, but will also inform ongoing efforts to develop retroviral vectors for human gene-therapy.

IMPLICATIONS FOR DEVELOPING RETROVIRAL VECTORS FOR HUMAN GENE-THERAPY
Retroviral vectors have been successfully used in clinical human gene-therapy to rectify monogenic disorders by stably expressing the therapeutic transgene in patients. Replication-defective vectors have been derived from various retroviral genera, such as gammaretrovirus, lentivirus and spumavirus, as well as from retrotransposons (reviewed in 153). The widespread success of first-generation gammaretrovirus-based vectors for human gene-therapy resulted from their use in the correction of primary immunodeficiencies, such as X-linked severe combined immunodeficiency (SCID-X1), adenosine deaminase-SCID (ADA-SCID), Wiskott-Aldrich syndrome (WAS) and X-linked chronic granulomatous disease (CGD) (reviewed in 154-157). The therapeutic concept for utilizing gammaretroviral MoMLV-based vectors was first successfully demonstrated for autologous hematopoietic stem cell (HSC) gene therapy for SCID-X1 (24). In separate clinical trials from 1999 to 2009, a total of 20 SCID-X1 patients underwent treatment for a gene defect in interleukin 2 common gamma chain (IL2γc) using MoMLV-based HSC gene therapy (158). Autologous CD34+ cells derived from patient bone marrow were transduced ex vivo with MoMLV-based vectors carrying the IL2γc transgene. Clinical benefits were achieved in 17 of 20 patients who displayed transgene expression, restoration of T-cell function and long-term immunological correction. Unfortunately, severe adverse events occurred in five of the 20 patients, who developed leukemia (158). The associated cancer in these patients was linked to the insertion of MoMLV-based vectors near the LMO-2 proto-oncogene in four instances and near the CCND2 proto-oncogene in the remaining case (26)(27)(28). The integrations led to MoMLV-LTR driven transcriptional upregulation of the nearby proto-oncogenes (26)(27)(28). In separate studies for the treatment of different genetic diseases, such as WAS and CGD, patients likewise have developed cancer (159)(160)(161).
The adverse outcomes from these clinical trials have highlighted the significance of exploring the molecular mechanisms of retroviral integration site selection for developing 'safer' retroviral vectors for human gene-therapy (162). The genotoxicity associated with retroviral vector integration in the host genome can be explained by the following mechanisms (also reviewed in 163-165): (i) activation of host gene promoters by enhancers present in the viral LTRs, leading to transcriptional activation and upregulation of host genes, (ii) transcriptional read-through of the host gene resulting in aberrant and/or chimeric transcripts whose expression can result in adverse effects and (iii) deregulation of host gene expression due to cryptic splicing or premature polyadenylation of host genes caused by RNA elements present in the viral LTR. Interestingly, ∼0.12% of all MoMLV integrations occurred in the vicinity of the LMO-2 proto-oncogene in CD34+ cells whereas integration in this region was not detected in CD4+ cells (9). These observations highlight the utility of determining retroviral integration site distributions in clinically relevant target cells prior to in vivo transplantation.
In-depth analysis of the SCID-X1 gene-therapy trial has established that the initiation of leukemia was due to the transcriptional activation of proto-oncogenes by MoMLV vector LTRs (26)(27)(28). Additionally, it was shown that the leukemic T cell clones accumulated secondary genetic aberrations such as translocations and deletions, consistent with the 'multiple-hit' hypothesis of oncogenesis (166,167). Thus, integration of retroviral vectors near proto-oncogenes or growth control genes can prime the transformation process and lead to the expansion of aberrant clones by clonal dominance. In light of these points, second-generation retroviral vectors have been developed. These self-inactivating (SIN) vectors bear deletions in the U3 region of the viral LTR, which contains the viral enhancer/promoter elements. The SIN vectors have displayed a safer profile in in vitro genotoxicity assays (168)(169)(170) and have been used in recent clinical trials for SCID-X1, ADA-SCID, WAS and X-CGD (171).
Recent identification of the key role of BET proteins for MoMLV integration site selectivity has opened up new paths to modulate gene-therapy applications with the goal to suppress unwanted genotoxicity. For example, JQ-1 treatment has been shown in a cell line model to reduce the frequency of MoMLV integration in the vicinity of proto-oncogenes (19,87). Accordingly, CD34+ cells could be treated with BET protein inhibitors during ex vivo transduction with MoMLV-based vectors, though initial work would need to determine integration frequencies near proto-oncogenes in comparison to previously utilized cell line models and additionally address any potential toxicity of the small molecules in CD34+ cells. Ongoing clinical trials to determine the safety profiles of second-generation BET inhibitors such as I-BET762 (172) and OTX015 (173) in the treatment of human cancers should help to identify which molecules could have utility with MoMLV-based gene-therapy vectors.
An alternative approach to counteract the genotoxicity of MoMLV-based vectors would be to utilize chimeric cellular proteins to redirect MoMLV integration away from proto-oncogenes and toward 'safer' chromosomal sites. For example, a proof of concept study showed that ectopically expressed LEDGF(1-324)/BRD4(ET/SEED) redirected MoMLV integration away from TSSs and toward active genes (21). However, the potential clinical applications of BET protein-mediated MoMLV retargeting are unclear. One significant drawback is the requirement of having the chimeric tethering factor in the target cell, presumably in advance of challenge with the therapeutic retroviral vector. A more direct approach might be to utilize MoMLV IN C-terminal tail deletion mutant vectors. While various deletions or mutations of the MoMLV IN C-terminal tail markedly compromised IN binding to BET proteins (19,21,87,112), the terminal tail region is not essential for catalytic activities of the enzyme in vitro (174)(175)(176) or for virus replication in cell culture (113,177-179). MoMLV mutants with deleted or altered IN C-terminal tails displayed markedly reduced integration frequencies near TSSs, CpG islands and BET protein-binding sites (112). For example, integration frequencies for wild-type and mutant MLVs within 1 kb of TSSs averaged ∼12 and ∼2.5%, respectively. Yet, the residual preference of mutant MoMLVs for this chromatin region was still evident when compared to HIV-1. One potential explanation for this observation is the residual affinity of BET proteins to bind C-terminal truncation mutants of MoMLV IN (20). Alternatively, secondary chromatin-associated factors might come into play in the absence of BET protein engagement, as occurs for HIV-1 in the absence of LEDGF/p75 protein (16,17).
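Frequencies such as the ∼12% and ∼2.5% values quoted above are obtained by counting the integration sites that fall within a fixed window of an annotated TSS. The Python sketch below shows one minimal way to compute such a fraction. It is not the pipeline used in the cited study; the function names are hypothetical, and the coordinates are invented toy values chosen only to make the example runnable.

```python
from bisect import bisect_left
from collections import defaultdict

def build_tss_index(tss_list):
    """Group TSS coordinates by chromosome and sort them for binary search."""
    index = defaultdict(list)
    for chrom, pos in tss_list:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return index

def near_tss(index, chrom, pos, window=1000):
    """Return True if pos lies within `window` bp of any TSS on chrom."""
    positions = index.get(chrom)
    if not positions:
        return False
    i = bisect_left(positions, pos)
    # The nearest TSS is either the first one at or after pos (i) or the one before it (i - 1)
    for j in (i - 1, i):
        if 0 <= j < len(positions) and abs(positions[j] - pos) <= window:
            return True
    return False

def fraction_near_tss(index, sites, window=1000):
    """Fraction of integration sites located within `window` bp of a TSS."""
    if not sites:
        return 0.0
    hits = sum(near_tss(index, chrom, pos, window) for chrom, pos in sites)
    return hits / len(sites)

# Toy data (invented for illustration; not from any real annotation or data set)
toy_tss = [("chr1", 10_000), ("chr1", 50_000), ("chr2", 20_000)]
toy_sites = [("chr1", 10_500), ("chr1", 30_000), ("chr2", 21_000),
             ("chr2", 25_000), ("chr3", 100)]

tss_index = build_tss_index(toy_tss)
toy_fraction = fraction_near_tss(tss_index, toy_sites, window=1000)
print(f"{toy_fraction:.0%} of toy sites lie within 1 kb of a TSS")
```

In practice the same calculation would be repeated on a matched set of randomly placed control sites, so that the observed fraction can be compared with the value expected by chance.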
The problems encountered with MoMLV-based vectors have prompted the development of HIV-derived lentiviral vectors. As discussed above, lentiviral vectors, unlike MoMLV, are able to transduce both dividing and nondividing differentiated cells with high efficiency (180). Even more importantly, HIV-1 integration is disfavored near TSSs and proto-oncogenes, which could reduce the risks of transcriptional activation. In a recent clinical trial involving one patient, a lentiviral vector was successfully employed to correct beta thalassemia major (181). Interestingly, in this case the vector integrated within the HMGA2 proto-oncogene; however, the respective cell clone expanded without leukemic progression (181). Clinical trials with larger numbers of individuals will allow assessment of the risk versus benefit ratio for the clinical utility of lentiviral vectors.
Electrostatic interactions between the HIV-1 IN NTD and LEDGF/p75 IBD, which are important for the high affinity interaction, have been scrutinized to artificially control HIV-1 integration site selectivity (109,182). Reverse-charged mutations were engineered at the interacting interfaces of both proteins to allow mutant HIV-1 IN to recognize complementary mutant LEDGF/p75, but not the respective wild-type counterparts. The transduction efficiency of an optimized mutant IN vector, which was reduced to ∼10-20% compared with the wild-type vector in cells expressing wild-type LEDGF/p75, increased to ∼75% upon ectopic expression of complementary reverse-engineered LEDGF/p75 (182). The application of this approach can be extended to develop heterologous fusion proteins containing the mutant LEDGF/p75 IBD and desired chromatin tethering modules to control the HIV-1 integration pattern. Nevertheless, customized lentiviral retargeting strategies suffer the common drawback of ectopic expression of the retargeting factor in susceptible target cells (182). Clinical trials could potentially compare modified MoMLV and HIV-1 based vectors to MMTV-based constructs, as this betaretrovirus reportedly targets host chromatin in a random fashion (11,12).

SUMMARY
Recent research has clarified the molecular mechanisms that underlie integration site selection of retroviruses. The propensity for weakly conserved palindromic sequences at the sites of integration seemingly reflects IN-target DNA interactions that preferentially bend the DNA to position it near the two IN active sites within the functional intasome complex. Of the six profiled retroviral genera--the genome-wide distribution of epsilonretrovirus integration has yet to be reported--gammaretroviruses and lentiviruses display the most distinctive profiles. Whereas the interaction between LEDGF/p75 and HIV-1 IN targets integration into active gene bodies, MoMLV IN engages BET proteins to integrate in the vicinity of strong enhancers and the TSSs of active gene promoters. Though many differences exist between these two systems, including the regions of the IN protein that interact with its cognate host cellular protein and the resulting epigenetic mark to which the intasome complex is tethered, the overall concept of bimodal tethering of PIC-associated IN to specific regions of chromatin is strikingly similar and parallels findings in the related area of retrotransposon integration targeting. For retroviruses these interactions likely help to situate the provirus within well-expressed regions of the host genome. These discoveries have spurred novel initiatives toward controlling the specificity of retroviral DNA integration, in particular for the field of human gene-therapy. For example, clinical vectors that lack the C-terminal tail of MoMLV IN, and hence do not engage BET proteins, could potentially target integration away from TSSs and oncogenes and may prove less genotoxic than previous MoMLV-based clinical vectors. The field of retroviral integration site targeting is quickly evolving, and exciting advances in the basic mechanism and in the utility of virus-derived vectors for treating human genetic disorders are expected to be forthcoming.

ACKNOWLEDGMENT
We wish to thank Matthew R. Plumb and Lei Feng for their help with preparing the figures and providing critical comments on the manuscript.