Integrated Single-cell Multiomic Analysis of HIV Latency Reversal Reveals Novel Regulators of Viral Reactivation

Abstract Despite the success of antiretroviral therapy, human immunodeficiency virus (HIV) cannot be cured because of a reservoir of latently infected cells that evades therapy. To understand the mechanisms of HIV latency, we employed an integrated single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq) approach to simultaneously profile the transcriptomic and epigenomic characteristics of ∼ 125,000 latently infected primary CD4+ T cells after reactivation using three different latency reversing agents. Differentially expressed genes and differentially accessible motifs were used to examine transcriptional pathways and transcription factor (TF) activities across the cell population. We identified cellular transcripts and TFs whose expression/activity was correlated with viral reactivation and demonstrated that a machine learning model trained on these data was 75%–79% accurate at predicting viral reactivation. Finally, we validated the role of two candidate HIV-regulating factors, FOXP1 and GATA3, in viral transcription. These data demonstrate the power of integrated multimodal single-cell analysis to uncover novel relationships between host cell factors and HIV latency.


Introduction
The formation of a latently infected reservoir in CD4 þ T cells is a key barrier to an human immunodeficiency virus (HIV) cure [1,2].The reservoir is highly stable and long lived, and is not eliminated by current antiretroviral therapy [3].Additionally, clonal expansion in vivo counteracts gradual erosion of infected cells [4][5][6][7][8].A key mechanism in the formation of this reservoir is the phenomenon of viral latency, in which HIV transcription is reversibly silenced post integration, allowing it to evade the host immune defenses.Sporadic reactivation of these cells generates "blips" of viremia during therapy [9] and seeds rapid rebound upon interruption of therapy [10,11].Current cure strategies for HIV involve transiently inducing viral gene expression in latent proviruses using latency reversing agents (LRAs), followed by immune clearance of the reactivated cells.However, this approach has thus far only achieved limited success [12][13][14][15][16]. Reactivation of latently infected cells with existing LRAs is inefficient, with typically � 10% of replication-competent proviruses being reactivated in patient-derived cells, even with "potent" LRAs [17].For the LRA/clearance approach to be successful, broad reactivation of the reservoir will be required.This inefficiency of latency reversal with LRAs is likely due to a combination of stochastic processes that regulate viral gene expression, the existence of multiple layers of repression to HIV gene expression, and the heterogeneity in both the cellular environment and proviral integration sites in infected cells.Fully defining the nature of these repressive mechanisms is urgently needed to allow the development of broader acting LRAs or approaches with a combination of different LRAs.Prior work indicates that silencing of HIV in CD4 þ T cells involves the combined effects of low levels of positive transcription factors (TFs), such as nuclear factor kappa B (NF-κB) and activating protein-1 (AP-1) [18][19][20], active repression by cellular factors (e.g., NELF and DSIF) [21,22], sequestration of the elongation complex P-TEFb (cyclinT1/CDK9) [23], and histone modifications that create a repressive heterochromatin environment around the virus promoter [23][24][25][26][27]. Nevertheless, the regulation of HIV gene expression remains incompletely understood, and additional mechanisms likely exist that will need to be fully characterized.
Gene expression is regulated by TFs that bind to promoter or enhancer regions near the gene, and mediate chromatin remodeling of these regions [28], during which nucleosomes are removed, repositioned, or modified by covalent molecules such as acetylation or methylation [29].This remodeling then allows recruitment of additional TFs, as well as RNA polymerase II (RNA Pol II) and transcriptional elongation factors to promoters and enhancers.Thus, TF binding, chromatin remodeling, and RNA transcription are key regulatory steps in cellular gene expression that act in a coordinated fashion to determine the expression level for each gene.However, fully understanding how these processes interact to regulate gene activities in individual cells has been difficult due to the technical limitations of profiling individual cells.Recent advances in multimodal single-cell analyses now allow concurrent observation of both the RNA levels for a given gene and the accessibility of the chromatin associated with this gene in single cells [30].These approaches have led to fundamental new insights into the regulation of gene expression.Thus, in this study, we used an integrated single-cell RNA sequencing (scRNA-seq) and single-cell assay for transposaseaccessible chromatin with sequencing (scATAC-seq) approach to characterize transcriptomic and epigenomic profiles from same cells in both a cell line model of HIV latency and a primary CD4 þ T cell latency model to uncover the networks of transcriptional regulators that control HIV gene expression in response to hallmark LRAs.By correlating HIV viral RNA (vRNA) levels with individual cellular transcripts from scRNA-seq, as well as with chromatin peak dynamics and TF deviation scores inferred from scATAC-seq peak accessibility, we revealed several novel candidate regulators of HIV gene expression.Finally, by using short hairpin RNA (shRNA) knockdown and clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) knockout, we identified FOXP1 and GATA3 as regulators of HIV latency and reactivation.

Latency reversal in in vitro models of HIV latency
To understand the relationship between cellular transcriptome, chromatin state, and HIV transcription, we stimulated 2D10 cells [24] -a Jurkat cell line model of HIV latency, and in vitro latently infected primary CD4 þ T cells from two different donors with one of three different LRAs, or with control vehicle [dimethylsulfoxide (DMSO)] (Figure 1A, Figures S1A and S2A).The primary CD4 þ T cell model of latency has been previously described [31] and involves infection of activated CD4 þ T cells with a reporter strain of HIV encoding a destabilized enhanced green fluorescent protein (HIV-GFP), followed by culture for 3 weeks to allow loss of viral gene expression as the cells return to a resting state (Figure 1A, Figure S2A).The LRAs that we used represent different mechanisms of action -histone deacetylase inhibitor (HDACi) by vorinostat, bromodomain inhibitor (BRDi) by iBET151, and protein kinase C activator (PKCa) by prostratin.For 2D10 cells, we added limiting amounts of each LRA to induce HIV reactivation in only a subset of cells (up to 20%-30%) at 24 h post stimulation (Figure S1B).Approximately 7% of the 2D10 cells displayed some level of background HIV expression as measured by detection of GFP in the absence of LRA stimulation.For the HIV-GFP infected primary CD4 þ T cells, a GFP þ population of 15%-20% was observed in the absence of LRA stimulation (Figure 1B and C, Figure S2B and C).Notably, while vorinostat and prostratin both significantly increased the fraction of GFP þ primary CD4 þ T cells for both donors by 10%-15%, iBET151 had no apparent latency reversing effect in either primary cell donor, indicating differences between the cell line model and primary CD4 þ T cells (Figure 1C, Figures S1B  and S2C).
Each population was then analyzed by integrated scRNAseq/scATAC-seq, yielding data for a total of � 4000 cells for 2D10 cells, � 80,000 cells for primary CD4 þ T cells from donor 1, and � 40,000 cells for primary CD4 þ T cells from donor 2. Quality control metrics indicated high-quality library preparation and sequencing quality with well-balanced depth of coverage between the assay for transposase-accessible chromatin (ATAC) and RNA domains (Figure S3).For both 2D10 cells and primary CD4 þ T cells, dimension reduction using Uniform Manifold Approximation and Projection (UMAP) [32] of the scRNA-seq data showed a clear separation of vorinostat-and prostratin-treated cells from the DMSO-treated (control) cells along UMAP_1 and UMAP_2, indicating LRA-specific modulation of cellular genes (Figure 1D, Figures S1C and S2D).iBET151-stimulated primary CD4 þ T cells, by contrast, mostly overlapped with DMSO-treated cells, indicating a limited impact of this drug on the cellular transcriptome, consistent with our observation of limited effect on viral transcription at this concentration.For primary CD4 þ T cells, UMAP display of the matching scATAC-seq data also indicated separation of prostratin-and vorinostat-treated cells from DMSO-and iBET151-treated cells (Figure 1D, Figure S2D).For 2D10 cells, separation of the four conditions at the epigenomic level was less clear -DMSO-and iBET151-treated cells occupied overlapping regions of the scATAC-seq UMAP plot, while vorinostat-and prostratin-treated cells were partially separated from other cell conditions (Figure S1C).
We also labeled clusters of cells based on their transcriptomes, irrespective of condition, using graph-based clustering.We observed 10 clusters for 2D10 cells, and 22 and 20 clusters for primary CD4 þ T cell donors 1 and 2, respectively (Figure 1E, Figures S1D and S2E).Each cluster exhibited a distinct expression profile indicated by the set of RNAs that were most selectively enriched in each cluster (Table S1).This clustering pattern indicated that diverse states of the cells exist within each condition.Similar to our observations when the cells were labeled by condition, we observed that these transcriptomic clusters were still observable but somewhat harder to discern from the scATAC-seq data, particularly for 2D10 cells.Overall, we observed that clusters that were adjacent in the scRNA-seq UMAP plots were typically adjacent in scATAC-seq UMAP plots as well.

Heterogeneous viral gene expression in individual cells
We next examined the expression of HIV transcripts across the cell populations.Reads were aligned to a HIV reference genome derived from the viral clones used to generate the 2D10 cells and used to infect the primary CD4 þ T cells [33].For 2D10 cells, the majority of cells (80%) did not express detectable vRNAs in the absence of LRA stimulation.All three LRA stimulations caused a significant increase in the abundance of HIV mapping reads in 2D10 cells, with significant P values from the Tukey test and one-way analysis of variance (ANOVA) test (Figure S4A).Specifically, 20% of cells had detectable vRNAs with at least one unique molecular identifier (UMI) read for DMSO-treated cells, whereas the percentages were 41%, 52%, and 43% upon treatments with iBET151, prostratin, and vorinostat, respectively.The higher fraction of 2D10 cells with detectable vRNAs compared to the GFP þ fraction observed by flow cytometry (Figure S1B) indicates that flow cytometry for GFP underestimates the fraction of cells with active viral transcription.Abundance of vRNA reads across the vRNA-expressing cells within each condition was highly variable and ranged from 0.6% to 17.3% of the reads for each cell (Figure S4B).For primary CD4 þ T cells, we observed that, for both donors, the majority of infected cells (� 80%) exhibited a low but detectable vRNAs even in the absence of LRA stimulation, even though only � 15%-20% of cells had detectable GFP expression by flow cytometry, indicating that most infected cells in this model retain low levels of persistent viral transcription (Figure 2A and B, Figure S5A and B).Nevertheless, both prostratin and vorinostat caused a significant up-regulation of vRNA abundance across the primary CD4 þ T cell populations, indicating transcriptional up-regulation, although prostratin was less potent at up-regulating vRNAs in donor 2 than in donor 1 (Figure 2A, Figure S5A).By contrast, iBET151 caused only a small increase in vRNA abundance in both primary CD4 þ T cell donors, consistent with the limited transcriptomic and epigenomic impact of this compound in primary CD4 þ T cells (Figure 2A, Figure S5A).The reason for the difference in potency of iBET151 between 2D10 cells and primary CD4 þ T cells is unclear and may reflect differing roles of the bromodomain (BRD)containing targets of iBET151 in enforcing latency between cell lines and primary CD4 þ T cells.
We next examined whether induction of viral gene expression was linked to the transcriptomic or epigenomic characteristics of the infected cells.UMAP display of the scRNA-seq data showed that vRNA-expressing cells were enriched in specific regions of the plots/clusters (Figure 2B, Figures S4C, S4D, and S5B).This clustering of vRNAexpressing cells was less prominent when vRNAs were excluded from the dataset when performing clustering, indicating that the viral transcripts themselves are contributing to the formation of these clusters (Figures S6-S8).Nevertheless, vRNA-expressing cells displayed a non-random distribution across the scRNA-seq UMAP plot even without inclusion of vRNAs, indicating that reactivation of viral gene expression is correlated with transcriptional profiles within the host cells.

Viral gene expression is correlated with increased accessibility of the viral genome
To understand how the chromatin configuration of the viral genome was related to viral genome expression, we next examined the accessibility of the HIV provirus within our datasets by examining scATAC-seq reads that mapped to the HIV reference genome.This approach has been previously used to identify HIV-infected cells in samples from people with HIV [34].Overall, the HIV mapping scATAC-seq data for both model systems were sparse, with the majority of cells exhibiting no detectable HIV mapping scATAC-seq reads (Figure 2C   and P ¼ 0 for donor 2) correlation between HIV mapping scATAC-seq reads and the abundance of vRNAs within each cell for both primary CD4 þ T cell donors (Figure 2E, Figure S5E), consistent with the hypothesis that viral gene expression is linked to overall proviral chromatin accessibility.This correlation between viral gene expression and proviral chromatin accessibility is further recapitulated in the 2D10 cells (Figure S4E-G).When we examined the proportion of scATAC-seq reads that were mapped to HIV across different stimulation conditions, we observed differential impacts of the LRAs on proviral accessibility.Specifically, vorinostat caused increased overall proviral accessibility in both primary CD4 þ T cell donors, while prostratin increased proviral accessibility in donor 1, but led to a more modest change in donor 2, consistent with its lower impact on vRNA abundance in this donor (Figure 2C and F-H, Figure S5C, F, and G).iBET151 had little impact on proviral accessibility in either primary CD4 þ T cell donor, also consistent with its lack of latency reversing activity in this system (Figure 2C, Figure S5C).By aggregating scATAC-seq data for all cells within each condition and examining read density across the viral genome, we observed distinct accessibility peaks within the HIV genome for 2D10 cells and for primary CD4 þ T cells (Figure 2F, Figures S4E and S5F).These peaks were located at the 5 0 end of the viral genome, overlapping with the 5 0 long terminal repeat (LTR).The HIV genome contains several known nucleosomal positions within the 5 0 end of the virus (Nuc0, Nuc1, Nuc2, and Nuc3) [35][36][37][38].The most prominent peak (position: 251-550 bp) for both 2D10 cells and primary CD4 þ T cells corresponds to a known DNase I hypersensitive (DHS) region between Nuc0 and Nuc1 and overlaps the viral transcription start site (TSS) at position 455 bp.In 2D10 cells, a second prominent peak was observed immediately downstream of Nuc1 between Nuc1 and Nuc2 (position: 521-850 bp) (Figure S4E).In primary CD4 þ T cells, the LTR promoter associated peak displayed increased accessibility in response to vorinostat and prostratin in donor 1, and in response to vorinostat in donor 2, indicating that, in primary CD4 þ T cells, increased promoter accessibility is a key aspect of potent latency reversal.We observed a consistent increase in accessibility across the gene body of the HIV provirus in prostratin-treated cells for both primary CD4 þ T cell donors, highlighting that this LRA is particularly effective at promoting increases in accessibility downstream of the promoter associated peaks (Figure 2F-H, Figure S5F and G).
Overall, these observations are consistent with the hypothesis that an overall increase in accessibility across the HIV genome is correlated with increased viral gene expression, likely due to nucleosomal remodeling and histone modification to permit passage of RNA Pol II through the provirus.These data also indicate significant differences between LRAs in terms of the impact on accessibility and transcription during viral reactivation.

LRAs promote defined changes to the transcriptome and epigenome of HIV-infected cells
To understand the impact of individual LRAs on host cells, we examined the host-cell transcripts that were up-regulated or down-regulated by the addition of the LRAs.As expected, each LRA displayed a distinct transcriptomic signature when considering the up-regulated genes (Figure 3, Figures S9 and  S10).In general, prostratin had the largest impact on the transcriptome [1558 differentially expressed genes (DEGs) in donor 1], consistent with its known ability to activate multiple signaling pathways, while vorinostat (813 DEGs in donor 1) and iBET151(294 DEGs in donor 1) had more modest impacts.In 2D10 cells, only eight genes/transcripts exhibited increased expression with all three LRAs (Figure S9A), while in primary CD4 þ T cells, 23 and 7 up-regulated transcripts were shared by the three LRAs for donors 1 and 2, respectively (Figure 3A, Figure S10A).
To probe the biological significance of the DEGs, we performed Gene Ontology (GO) enrichment analysis [39] to identify enrichment of specific functional classes of genes.For the down-regulated genes in 2D10 cells, we detected a shared set of enriched biological process terms across all three LRAs (Table S2).For the down-regulated genes in primary CD4 þ T cells, some of the enriched sets include T cell activation (GO:0042110) and host-virus interaction (KW-0945) (Tables S3 and S4).For the up-regulated genes, the GO enrichment analyses in 2D10 cells and primary CD4 þ T cells indicated that prostratin induced transcriptional responses consistent with the activation of T cell receptor, tumor necrosis factor (TNF), Rap-1, PD-1 checkpoint, FOXO/MAPK/ RAS, and NF-κB signaling pathways.Inhibition of T cell, NF-κB, and MAPK/RAS signaling pathways block reversal of HIV expression, suggesting that prostratin treatment provides sufficient signals for HIV expression in latently infected cells.Conversely, we did find T cell receptor signaling pathways as common figure between 2D10 cells and primary CD4 þ T cells induced by vorinostat (Tables S2-S4).
We next examined the impact of the LRAs on the cellular chromatin of HIV-infected cells by using the scATAC-seq data to examine enrichment of TF-binding sites within the open chromatin.During activation of a TF in a cell, TF binding to promoter and enhancer regions is frequently associated with an increase in accessibility at binding sites due to the action of nucleosomal remodeling complexes recruited by the TFs.Thus, increased enrichment of binding sites for a TF in the open chromatin of a cell can serve as a proxy for overall activity of a TF in that cell.Using chromVAR [40], we calculated a TF deviation score for a set of 633 TFs with annotated binding motifs for each cell and examined the impact of each LRA on these TF deviation scores via a test of differential motif accessibility.Similar to the scRNA-seq data, each LRA caused a distinctive impact on the pattern of TF-binding motif accessibilities in the host cells (Figure 4, Figures S11 and  S12).Each LRA modulated the motif accessibilities of a set of TFs that were unique to each LRA, but there were also a set of TFs whose motif accessibilities were coregulated by all three LRAs.Consistent with its more potent transcriptomic impact, prostratin up-regulated the motif accessibilities of the greatest number of TFs in both 2D10 cells and primary CD4 þ T cells.Vorinostat and iBET151 had milder impact on the host-cell TF deviation scores in primary CD4 þ T cells, in concordance with the differential gene expression analysis.Notably, TF deviation scores were significantly increased after prostratin stimulation for two members of the AP-1 family (JUN and FOS), consistent with this family's known role in the PKC signaling pathway (Figure 4C, Figure S11C).In 2D10 cells, no TFs displayed increased motif accessibilities by all three LRAs; however, four TFs exhibited down-regulated motif accessibilities in response to all three LRAs, including FIGLA (a bHLH TF) as well as SNAI1, SNAI2, and SNAI3 (three members of the SNAIL family of transcriptional repressors) (Figure S12C).In primary CD4 þ T cells, 34 and 24 TFs displayed increased motif accessibilities by all three LRAs in donors 1 and 2, respectively, while the motif accessibilities of 36 and 28 TFs were down-regulated by all three LRAs for donors 1 and 2, respectively (Figure 4A, Figure S11A).Seven of the TFs with up-regulated motif accessibilities and 11 of the TFs with down-regulated motif accessibilities overlapped between primary CD4 þ T cell donors.This down-regulated set also included members of the SNAIL family, indicating a conserved pathway between cell lines and primary CD4 þ T cells.Additionally, the regulatory roles of these differentially accessible TFs (DATFs) were corroborated by the enrichment of Tn5 integration events in the flanking regions of the TF-binding sites using DNA footprinting analyses (Figure S13).

Identification of cellular transcripts that correlate with HIV transcription
To understand the relationship between vRNA, cellular RNA, and chromatin accessibility more deeply, we examined pairwise correlations between summed vRNA levels for each cell and individual cellular transcript levels across cells of the dataset (Figure 5A, Figures S14A and S15A).We performed this analysis for each individual condition, as well as across the aggregated cell population.Although we found some condition-specific associations, we observed that including all conditions in this analysis identified a significantly greater number of HIV-correlated transcripts, likely due to the increased cellular diversity and statistical power provided by the larger cell numbers (Figures S16-S18).Therefore, we focused our analysis on the aggregated cell populations.In 2D10 cells, we identified 229 and 123 cellular transcripts whose expression was correlated positively or negatively with HIV vRNA levels using a 0.05 cutoff of false discovery rate (FDR)-adjusted P value, respectively (Tables S5A and S6).In primary CD4 þ T cells, we observed 2437 (donor 1) and 1204 (donor 2) transcripts that were significantly correlated with HIV vRNA levels.The top 20 significantly linked genes in primary CD4 þ T cells are included in Table 1 and Table S7A for donors 1 and 2, respectively, with the full testing results shown in Tables S8 and S9.
Overall, each transcript in these sets individually correlated with HIV vRNA levels only weakly, albeit the small P values, with a 95% confidence interval for the significant R 2 .The most highly correlated cellular transcript in 2D10 cells was TSPOAP1 (Figure S14B), which encodes a cytoplasmic protein with no previously known role in HIV replication.Although the cellular role of TSPOAP1 is unclear, it has recently been shown that TSPOAP1 enhances influenza A virus replication by antagonizing innate immune signaling [41].Interestingly, two of the most significantly correlated transcripts in 2D10 cells were long non-coding RNAs, MALAT1 and NEAT (Figure S14B), both of which have been reported to be implicated in HIV replication [42].
Amongst the highly HIV-correlated transcripts in primary CD4 þ T cells, several noteworthy genes were observed.The transcript for the centromere protein CENPF was significantly positively associated with HIV transcription in both donors.Additionally, transcripts for two proteins associated with active cell metabolism and cell division (GAPDH and MKI67) were significantly correlated with vRNA levels.The transcriptional regulator BACH2, which has been previously identified as a recurrent integration target for HIV, was also

No. of down-regulated genes by LRAs
No. of up-regulated genes by LRAs highly associated with HIV transcription [5].Overall, there was a pattern of increased expression of T cell activation markers with greater HIV vRNA levels -specifically, expression of transcripts for the activation markers CD38, IL2RA (CD25), and CD69 was positively correlated with vRNA levels in both donors.Similar to 2D10 cells, MALAT1 expression was positively correlated with vRNA levels in donor 2 but was uncorrelated in donor 1.Interestingly, the transcript for the DNA helicase TOP2A, which is required for unwinding of DNA templates during transcription, was also positively correlated with HIV transcription [43].
When we examined genes whose transcription was negatively correlated with vRNA levels in primary CD4 þ T cells, we observed that high expression of transcripts for LTB and IL7RA was associated with lower vRNA levels (Figure 5B, Figure S15B).These two genes are highly expressed in quiescent memory T cells, consistent with our previous observations that HIV latency preferentially occurs in cells with a quiescent memory phenotype [31].Notably, expression of the transcripts encoding two well characterized HIV restriction factors, SERINC5 and SAMHD1, was negatively correlated with HIV transcription across the cell population.Expression of STK38, encoding a kinase that has been shown to be incorporated into HIV particles, was also negatively correlated with HIV vRNA levels, along with numerous genes with no known previous roles in HIV infection, such as TC2N and the long intergenic non-protein coding RNA LINC00861.
We further carried out a GO enrichment analysis using a set of genes that are significantly linked with vRNA levels.In 2D10 cells, we observed that correlated transcripts were enriched in the GO terms such as translation, cytoplasmic translation, and ATP binding (Table S10); in primary CD4 þ T cells, enrichment was observed in the GO terms such as DNA binding, ATP binding, metal ion binding, and T cell receptor signaling pathway (Table S10).
We then focused on TFs whose transcripts were found to be correlated with vRNA levels.In 2D10 cells, GATA3 was the most significantly positively correlated TF-coding transcript (P ¼ 0.0005), and interestingly, CTCF was the most significant negatively associated TF-coding transcript (P ¼ 0.0072).We have recently reported that CTCF is a novel repressor of HIV gene expression in CD4 þ T cells, and knockdown of CTCF expression in latently infected Jurkat cells and primary CD4 þ T cells reactivates HIV gene expression [44].Transcripts for YY1 and HSF1, two TFs that have been confirmed to repress HIV replication, were also found to be negatively correlated with vRNA levels in 2D10 cells.In primary CD4 þ T cells, we found that transcripts encoding the BACH2, ETV6, and IRF4 TFs were among the top positively correlated TFs for both donors.Members of the AP-1, NF-κB, and RUNX families of TFs were also highly represented within the set of positively correlated factors, while the transcripts of the TCF, KLF, and forkhead (FOX) TFs were negatively correlated with vRNA levels.Overall, these data indicate that this approach not only successfully identifies known regulators of HIV expression and reactivation, but also reveals novel HIV associations with cellular transcripts that could represent important but uncharacterized viral regulators.

Activities of individual TFs correlate with HIV transcription
The integrated multiomic approach also allowed us to examine connections between the epigenomic state of the cells and HIV transcription.Using the chromVAR motif scores calculated from the scATAC-seq reads for each cell, we investigated whether activity of individual TFs correlates with vRNA levels across the cell population.We expect that TFs that promote HIV transcription will have motif deviation scores that positively correlate with vRNA levels, while TFs that repress HIV will have motif deviation scores that negatively correlate with vRNA levels.The top 20 significantly linked TFs for primary CD4 þ T cells of donor 1 and donor 2 are included in Table 2 and Table S7B, respectively.In 2D10 cells, 31 TFs had motif deviation scores that were positively or negatively correlated with vRNA levels using a cutoff of FDR-adjusted P value 0.05 (Figure S14C and D; Table S5B).Similar to the correlations between the vRNA level and the abundance of cellular transcripts, the overall Spearman correlation coefficients between vRNA levels and TF deviation scores, although statistically significant, were individually weak (95% confidence interval for the R 2 ).Among TFs positively correlated with HIV vRNA levels in 2D10 cells, multiple members of the ETS, GATA, and AP-1 families were identified.Notably, the HIV LTR promoter contains binding sites for members of all three of these families, and previous work has shown that individual members of these families can positively regulate HIV gene expression, indicating that this approach successfully identifies HIV regulators [45][46][47][48][49]. Amongst the negatively correlated TFs in 2D10 cells, two families of TFs were prominently represented -TCF and SNAIL families.In particular, three members of the TCF family were associated with lower HIV vRNA levels -TCF3, TCF4, and TCF12.Some previous data have linked a related TCF protein (TCF1) to HIV regulation through direct binding [50].Additionally, TCF4 has been shown to repress HIV transcription [51,52].Interestingly, our analysis also identified SNAI1, SNAI2, and SNAI3 (members of the SNAIL family of transcriptional repressors) as being negatively correlated with HIV transcription.This family of TFs binds to Ebox motifs and has not been previously implicated in HIV transcription.
In primary CD4 þ T cells, we identified 174 TFs whose motif deviation scores were significantly correlated with vRNA levels in both donors (Figure 5C and D, Figure S15D and E).For both donors, members of the AP-1 and NF-κB (RelA, RelB) families of TFs were highly represented amongst the most positively correlated TFs, consistent with the known roles of these factors in activation of HIV transcription.Among the negatively correlated TFs, we observed transcripts for several members of the FOX TF family represented for both donors, including FOXP1, FOXO3, FOXO4, and FOXO6.This finding is consistent with our previous observations from bulk ATAC-seq of HIV-infected primary CD4 þ T cells that transcript levels for FOX TFs are also negatively associated with HIV vRNA levels [44].Similar to 2D10 cells, transcripts of members of the SNAIL family of transcriptional repressors, as well as those of members of the TCF family, were also negatively associated with vRNA levels (Tables S8 and S9).
Dynamic chromatin peaks that correlate with HIV transcription are enriched for binding sites of specific TFs We also examined the correlation between accessibility at individual cellular chromatin peaks and HIV transcription.In general, in 2D10 cells, correlations between individual peaks and HIV transcription were even weaker than those for transcripts and for TF deviation scores, with typical Spearman correlation coefficients between 0.02 and 0.05 (Figure S14E and F; Table S5C).Note that the statistical power is low due to the large number of testing/peaks and the data sparsity.Nevertheless, in 2D10 cells, we detected 224 peaks whose accessibility was significantly correlated with HIV vRNA levels, with a significance threshold of 0.01 on the nominal P values.In primary CD4 þ T cells, we found 6379 and 1886 peaks that were correlated with vRNA levels in donor 1 and donor 2, respectively (Figure 5E, Figure S15G).
We then examined these sets of positively and negatively linked peaks for motif enrichment compared with a set of background peaks using Signac [53].In 2D10 cells, we found significant enrichment of motifs that represent binding sites of 91 TFs (Table S5D).Members of the specificity protein (Sp) and Kruppel-like factor (KLF) families were prominently represented in this list, including Sp4, Sp1, Sp9, and KLF15, as well as EGR1 and ZNF148 (Figure S14F).In primary CD4 þ T cells, we found that, for both donors, the most enriched TF-binding motifs within enriched peaks were members of the AP-1, FOX, and KLF families, although enrichment of several other TF families could also be observed  (Sp1, NF-κB, and ETS) (Figure 5F, Figure S15H; Table 3, Table S7C).These findings highlight the presence of conserved associations between cell line models of latency and primary CD4 þ T cell models, but also indicate some important differences.
To examine the reproducibility of these datasets across donors, we compared the sets of correlated transcripts, TF activities, and TF-binding sites enriched in correlated peaks between donor 1 and donor 2. As expected, we observed an overall higher number of correlated transcripts and TF activities for donor 1, consistent with the greater transcriptomic impact of the LRAs in this donor.When we examined the correlated features for donor 2, however, the majority of these were also detected for donor 1, suggesting a significant degree of conserved associations.Specifically, 65.9% of the correlated DEGs, 83.3% of the correlated DATFs, and 88.5% of the TF-binding sites enriched in correlated peaks for donor 2 were also detected in donor 1 (Figure S15C, F,  and I).
To examine coordinate behavior between HIV-associated TFs, we examined overlap between TFs detected as being associated with vRNA expression through three independent analyses (i.e., TF-coding gene expression, differential accessibility, and motif enrichment in correlated peaks) (Figure 6A, Figures S19A and S20A).The list of TFs with enriched binding sites in significantly linked peaks was highly overlapped with the other two analyses.We observed that the majority of TFs with HIV-linked expression also exhibited differential accessibility that was correlated with HIV vRNA expression.We further examined pairwise correlations of consensus principal components (PCs) derived from both expression levels of TF-coding genes and deviation scores of TF-binding motifs, within a set of top 50 TFs that had the smallest P values from the Cauchy combination test.The pairwise correlations were visualized as heatmaps shown in Figure 6B and Figures S19B and S20B.An adjacency matrix was constructed for pairs with correlation coefficient > 0.5, with visualized networks shown in Figure 6C and Figures S19C  and S20C (see Materials and methods for details).For both 2D10 cells and primary CD4 þ T cells, the data indicated that these sets of TFs could be grouped into several distinct clusters.For 2D10 cells (Figure S19C), one cluster was composed mainly of KLF and Sp TFs but also included chromatin insulators CTCF and CTCFL.A second cluster was composed almost entirely of ETS TFs, while a third cluster contained just three TFs that are members of the SNAIL family of transcriptional repressors -SNAI1, SNAI2, and SNAI3.In primary CD4 þ T cells (Figure 6C, Figure S20C), we could also observe clear groups of HIV-associated TFs with correlated activity for both donors.Sp1 and KLF TFs formed the largest cluster, followed by clusters that included AP-1 and ATF TFs.FOX, TCF, and RUNX TFs also formed a more loosely associated group of TFs.Overall, these results from TF network analysis shed light upon the regulatory relationships and synergies among the previously identified TFs.

Machine learning modeling of HIV transcription from multiomic data
Since each transcript, TF deviation score, or peak was only weakly correlated with vRNA levels, we next examined whether considering multiple features of the dataset at once in a multivariate machine learning model could lead to a more predictive model of HIV gene expression from the dataset.Beginning with the 2D10 cell dataset, we first used Spearman correlation coefficients between individual features (transcripts, TF deviation scores, and peaks) and vRNA levels to select the 60 highest ranked features and then incorporated these features into a boosting-based machine learning model.To simplify the prediction goal of the model, we converted the data into a binary classification of vRNAþ and vRNA− cells based on whether vRNAs were detected in each cell.For machine learning training and evaluation, we split the data into five sets -four for separate training and one for testing.By examining the receiver operating characteristic (ROC) curves to evaluate model performance, the model was able to achieve an area under the curve (AUC) of 0.79 for the training set and 0.68 for the test set (Figure S21A).We were also able to determine the rank importance of individual features to the model performance (Figure S21B).Based on this ranking, the most important predictive features for 2D10 cells were the transcripts TSPOAP1 and MALAT1 followed by accessibility of binding sites for the ETS TF SPIB (MA0081.2).
We then applied machine learning tools to the dataset from HIV-infected primary CD4 þ T cells.Since primary CD4 þ T cells exhibit a more continuous level of viral gene expression, with most cells expressing some vRNAs, for these cells we focused on predicting whether a given cell was in the top 10% of vRNA expression cells based on features derived from the multiomic dataset.We first examined classification of cells using Generalized and Scalable Optimal Sparse Decision Trees (GOSDT) [54].GOSDT is an interpretable decision tree model that generates provably optimal and sparse trees by splitting the dataset at given thresholds of individual features.To identify features of interest with respect to predicting HIV expression, the dataset was processed in three stages (Figure 7A).First, all features (transcript levels and TF deviation scores) were ranked by Spearman correlation coefficients against HIV expression level, and the highest-ranked 1000 features were then considered for further analysis.Second, the feature set was further reduced and binarized via the threshold guessing column elimination procedure proposed by McTavish and his colleagues [55].A total of 2258 GOSDT models were created by varying the threshold guessing stage hyper-parameters (number of boosting stages and sample weights), and GOSDT hyper-parameters (tree depth, regularization, and sample weights controlled via random oversampling) (Figure 7B).These models were all evaluated by determining their precision (fraction of guessed cells that are actually in the top 10% of HIV-expressing cells) and their recall (the fraction of top 10% HIV-expressing cells accurately guessed).A randomly guessing model would, for example, be expected to have an AUC of the precision-recall curve of 0.1.The majority of our GOSDT models trained and tested on our data significantly outperformed this threshold, indicating significant ability to predict high HIV expression from the multiomic data.By counting the most commonly used split features in the set of final GOSDT decision trees, we observed that three cellular genes (CENPF, AHI1, and LINC00486) that have not been previously associated with HIV infection were the most abundantly represented features across the GOSDT models (Figure 7C and D; Table S11).
We next used a boosting-based approach to predict HIV vRNA expression across the cell population.While not as simple or interpretable as GOSDT models, boosting-derived models frequently have high predictive accuracy.An XGBoost [56] gradient boosting model was trained on the top 1000 Spearman correlated features to predict whether a given cell was within the top 10% of HIV-expressing cells.We observed that the model produced by this approach that was trained on data from both primary CD4 þ T cell donors performed well on test sets of cells from both donors.Specifically, the AUC values were 0.791 and 0.751 for donor 1 and donor 2, respectively (Figure 7E).Additionally, the model trained on one donor still had high predictive performance for cells from the other donor, indicating biological reproducibility and robustness to inter-donor variation.When we visualized a ranked list of cells in the test set based on XGBoost prediction of being in the highest 10% of HIVexpressing cells, we observed that cells that were confirmed to be in the top 10% were highly enriched at the top of the predicted list, further demonstrating model accuracy (Figure 7F).We also examined the importance of individual features for the performance of the XGBOOST.Interestingly, a ranked list of the most important features for this model included many highly ranked cellular genes that were also identified by the GOSDT approach, and that were also identified by our machine learning analysis of 2D10 cells, suggesting conserved pathways of importance across these models.For example, MALAT1 and TSMB4X were in the top 5 features for both 2D10 cells and primary CD4 þ T cells (Table S12).

GATA3 and FOXP1 regulate HIV latency
We next chose to further investigate the role of two TFs in HIV latency -GATA3 and FOXP1.Our scRNA-seq data from 2D10 cells had identified the transcript for GATA3 as being positively correlated with HIV vRNA levels, and the scATAC-seq data indicated that the activity of GATA3 was positively correlated with HIV vRNA levels.Furthermore, GATA3-binding motif (MA0037.3)accessibility played a significant role in the performance of our machine learning model of HIV reactivation in 2D10 cells (Figure S21).Additionally, we observed that in primary CD4 þ T cells, GATA3 expression was positively correlated with HIV vRNA levels in both donors (Tables S8 and S9).For FOXP1, scATAC-seq data from primary CD4 þ T cell data demonstrated that the activity of FOXP1 was negatively correlated with HIV vRNA levels for both donors (Tables S8 and S9).When we examined the activity and transcript expression of GATA3 and FOXP1 across the LRA conditions, we observed up-regulation of GATA3 activity/transcript expression and down-regulation of FOXP1 activity/transcript expression within specific LRA-stimulated conditions (Figure 8A).Thus, we hypothesize that GATA3 and FOXP1 regulate HIV expression and may play an important role in baseline HIV expression or the response to one or more of the LRAs detected in this study.
We first investigated the role of GATA3 in HIV latency in 2D10 cells by transducing these cells with a GATA3-targeting shRNA (shGATA3) lentivirus or a control non-targeting shRNA (shNT) lentivirus.Transduced cells were selected by puromycin, and efficient knockdown of GATA3 was confirmed by Western blot (Figure 8B).Viral gene expression due to transduction in 2D10 cells was then measured by flow cytometry for GFP þ signals both at baseline and 24 h after prostratin stimulation.We observed that cells transduced with shGATA3 exhibited a significantly reduced response (30.8%) to prostratin compared with cells transduced with shNT (36.9%) (Figure 8C).Although GATA3-independent pathways clearly account for a majority of the response to prostratin, these data demonstrate that GATA3 plays an important accessory role in viral reactivation by this molecule in 2D10 cells.
In contrast with GATA3, we observed that lentiviral transduction of 2D10 cells with a FOXP1-targeting shRNA (shFOXP1) showed depletion of FOXP1 by Western blot (Figure 8D) but caused reactivation of vRNA expression (� 8% GFP þ cells) compared with non-targeting control (� 2% GFP þ cells) by 5 days post knockdown (Figure 8E and  F).To further examine the role of FOXP1 in HIV expression in cell line model of latency, we co-transfected HEK293T cells with a GFP-expressing HIV clone (HIV-GFP) [33] and either a FOXP1-mCherry-expressing plasmid or a mCherryexpressing plasmid (control).Overexpression of FOXP1 isoform D (FOXP1D) in HEK293T cells was confirmed by Western blot (Figure 8G).We noted that the most abundant isoform of endogenous FOXP1 expressed in HEK293T cells was a larger isoform, consistent with the size of FOXP1A.Overexpression of FOXP1D along with HIV in HEK293T cells caused attenuated viral gene expression (51% GFP þ cells) compared with the control plasmid (60% GFP þ cells) (Figure 8H).Overall, these data indicate that, in cell lines, FOXP1 can play a role in repressing HIV expression.
We also examined the role of GATA3 and FOXP1 in HIV-GFP-infected primary CD4 þ T cells.To address this, we nucleofected Cas9/single-guide RNA (sgRNA) ribonucleoprotein (RNP) complexes targeting GATA3 or FOXP1 into HIV-GFP-infected primary CD4 þ T cells at 1 week post infection (Figure 8I).Western blot of nucleofected cells confirmed significant depletion of GATA3 and FOXP1 with their respective targeting sgRNAs (Figure 8J, Figure S22A).When we examined HIV expression at 1 week post nucleofection (Figure 8K, Figure S22B), we observed that knockout of GATA3 or FOXP1 caused a small reduction in viral gene expression at baseline.We then stimulated the infected cells with 500 nM prostratin for 24 h.This stimulation caused a significant increase in the percentage of GFP þ cells in the cell population, indicating latency reversal.While this increase also occurred in GATA3-and FOXP1-depleted cells, the response was attenuated compared with cells nucleofected with a non-targeting Cas9/sgRNA RNP.These data clearly indicate that GATA3 and FOXP1 play accessory roles in the reactivation of HIV expression in response to prostratin treatment in HIV-infected primary CD4 þ T cells.The divergent behavior of FOXP1 in primary CD4 þ T cells compared with that in 2D10 cells also indicates that TFs can play complex and sometimes opposing roles in viral gene expression depending on the system/model.Additional investigation will clarify both the repressive and accessory roles of FOXP1.
Overall, these findings highlight the value of the combined multiomics approach to identify and prioritize new candidate regulators of HIV and reveal FOXP1 and GATA3 as previously unappreciated factors that impact the expression of HIV in CD4 þ T cells.It is likely that additional analyses of HIV-correlated factors derived from this dataset will identify further regulators of HIV transcription.Nevertheless, these data identify GATA3 and FOXP1 as novel and potentially important regulators of HIV expression and reactivation from latency.

Discussion
Inefficient reactivation of HIV by current LRAs is a major impediment to achieving an HIV cure, and developing methods that can broadly reactivate the replication competent reservoir is an urgent priority [17].However, the mechanisms β-actin was used as a loading control.Non-transfected HEK293T cells were used as mock.H. Flow cytometry analysis of viral GFP expression in HEK293 cells transfected with a FOXP1-mCherry-expressing plasmid or a mCherry-expressing plasmid (control) plus HIV-GFP.Data were represented by mean ± SD (n ¼ 3).P value was determined by Student's t-test.I. Schematic overview of experimental design for CRISPR/Cas9 targeting of GATA3 and FOXP1 in HIV-GFP-infected primary CD4 þ T cells.Activated CD4 þ T cells were infected with HIV-GFP.At 3 days post infection, infected (GFP þ ) cells were enriched by flow sorting and cultured for 1 week before nucleofection of Cas9/sgRNA complexes targeting GATA3 or FOXP1.One week post nucleofection of sgRNAs, knockout of the targets was analyzed by Western blot.Furthermore, GFP þ cells were quantified after stimulation with prostratin for 24 h at 1 week post nucleofection.J.Western blot showing the depletion of GATA3 and FOXP1 at 1 week post Cas9/sgRNA nucleofection.β-actin was used as a loading control.HIV-GFP-infected primary CD4 þ T cells without Cas9/sgRNA nucleofection were used as mock.K. Flow cytometry analysis of viral GFP expression in HIV-GFP-infected primary CD4 þ T cells nucleofected with sgGATA3 or sgFOXP1 at baseline and 24 h after prostratin stimulation.sgNT was used as a negative control, and sgTat was used as a positive control.Data were represented by mean ± SD (n ¼ 3), and P values were determined by one-way ANOVA Tukey's multiple comparisons test.shRNA, short hairpin RNA; shNT, non-targeting shRNA; shGATA3, GATA3targeting shRNA; shFOXP1, FOXP1-targeting shRNA; FOXP1A, FOXP1 isoform A; FOXP1D, FOXP1 isoform D; FOXP1-OE, FOXP1 overexpression; sgRNA, single-guide RNA; sgNT, non-targeting sgRNA; sgGATA3, GATA3-targeting sgRNA; sgFOXP1, FOXP1-targeting sgRNA; sgTat, Tat-targeting sgRNA; CRISPR, clustered regularly interspaced short palindromic repeats; Cas9, CRISPR-associated protein 9; SD, standard deviation.
responsible for this inefficiency remain largely unknown.Reactivation is regulated in part by stochastic processes relating to fluctuations in the abundance of host cell factors and in the level of the viral trans-activator Tat [57,58].Nevertheless, reactivation is also likely influenced by the overall host cell environment, through the expression or activity of cellular transcriptional regulators that bind and directly regulate the viral LTR, or that influence cell fate decisions that indirectly impact viral gene expression [59].Previous work from our lab and others has demonstrated that viral silencing in CD4 þ T cells is not entirely random but is associated with a distinct transcriptomic and epigenomic program [31,44,60,61].We hypothesize that during the initiation of HIV latency, a set of heritable epigenetic modifications are made to the provirus that maintain viral silencing through cell division, and that the frequency with which these modifications occur is connected to the overall host cell program [27].Furthermore, this program, in combination with stochastic responses to latency reversing drugs, likely regulates the response of latent proviruses to external latency reversing stimuli [62].Rather than representing a single state, we propose that latency represents a set of adjacent and interconnected states each characterized by a different set of restrictions to full viral reactivation.The low efficiency of overall reactivation for a given LRA is thus explained by the fact that the pathway targeted by each LRA is limiting only a minor fraction of cells (responders) while for other cells (nonresponders) a different set of restrictions continue to block viral expression.As such, detailed investigation of viral reactivation at the single-cell level may reveal additional cellular factors that control reactivation and could be targeted in parallel with current LRAs to greatly improve the overall potency and breadth of LRAs.
Understanding the heterogeneous and dynamic natures of HIV latency will require the application of single-cell methods to model systems of latency in which, ideally, multiple modalities regarding the state of the host cell and the provirus can be simultaneously measured [63].By combining high dimensional, multimodal observations of latency reversal with analytical methods to connect these observations to viral gene expression, we aim to reveal the complex relationship between host cell factors and viral reactivation.In particular, combining scRNA-seq with scATAC-seq will allow us to connect viral expression to the behavior of cellular chromatin, and, by inference, the activity of individual cellular TFs [64].Since TFs frequently recruit chromatin remodeling complexes to cellular promoters and enhancers, the re-opening of cellular chromatin in response to stimulation can reveal activity levels for sequence-specific cellular TFs [65].To date only a handful of publications have used scRNA-seq/scATAC-seq or combined scATAC-seq/surface protein analysis to characterize HIV-infected cells [34,66].
In this study, we reactivated an established cell line and primary CD4 þ T cell model of latently infected cells with three different LRAs in parallel, using concentrations that produce variegated reactivation of the provirus.As expected, each LRA induced a distinct set of gene expression and TF activity across the population.The overall diversity of transcriptomic and epigenomic phenotypes across the cell population then allowed us to examine the correlation of proviral accessibility and expression with numerous host cell factors.The overall expression of HIV transcripts was significantly correlated with proviral accessibility, consistent with the hypothesis that accessibility of the provirus is a key barrier to reactivation of expression.
By examining the correlation of cellular transcripts and TF activities with HIV vRNA levels, we were able to identify a set of cellular transcripts and TFs that significantly correlated (P < 0.05) with vRNA levels in each of these models.Furthermore, we were able to examine the correlation of accessibility of specific chromatin peaks with HIV vRNA levels, and examine these for TF-binding motif enrichment.Altogether, this approach has identified a set of candidate regulators of HIV expression and reactivation that could be investigated further for functional roles in HIV infection.Reassuringly, this approach identified several known HIV regulators of HIV transcription and latency including NF-κB, AP-1, Sp1, CTCF, and MALAT1 [42,67,68].This observation suggests that this approach is powered to detect regulators of HIV transcription.In addition to these known HIV regulating factors, many genes or TFs with no known connection to HIV transcription were found.These included genes/TFs that were either positively correlated with HIV transcription (candidate activators) or negatively correlated (candidate repressors).Amongst the positively correlated TFs, ETS and GATA families were highly represented.Previous data have linked the activity of some ETS family members to HIV expression, and an ETSbinding site in the HIV 5 0 LTR enhancer has been demonstrated [69][70][71][72][73]. Amongst the negatively associated factors, transcriptional repressors of the SNAIL family were identified.No previous reports have connected SNAIL members to HIV gene expression.This family of Zn 2þ finger TFs has been characterized for their role in regulating epithelial to mesenchymal fate decisions that bind to E-box CANNTG sequences [74].Notably, the HIV LTR (NL4-3) contains four potential E-box sequences.Additional functional experiments will be needed to confirm whether these novel factors contribute directly or indirectly to HIV gene expression.Through cross-correlation analysis, we have also identified groups of TFs within the HIVcorrelated set that are connected to each other, indicating that sets of functionally related TFs can act as part of a coordinated group.As such, carefully defining the roles of individual members of these groups will be important.
Notably, the overall correlations of individual TFs and transcripts, although statistically significant, were weak.We speculate that these low correlations result from the additional influence of background noise or stochastic processes that also contribute to reactivation.Alternatively, this observation could result from latency reversal being determined by the combined activity of several different host cell factors.To address this possibility, we used a machine learning approach to develop a multivariate model to predict whether individual cells expressed vRNAs across the dataset.We successfully developed a boosting-based model with � 75%-79% accuracy to classify cells highly expressing vRNAs (top 10%), confirming that a machine learning approach improves the overall understanding of viral reactivation.
To validate this overall approach, we chose to focus on the functional roles of two cellular factors, GATA3 and FOXP1.Although a previous report has shown that GATA3 can bind and activate an LTR-driven reporter plasmid in an epithelial cell line, the role of GATA3 in HIV transcription and latency is unknown [75].GATA3 was identified by both correlations of the GATA3 transcript and GATA3-binding motif accessibility with vRNA levels.Across the experimental conditions, GATA3 transcript expression and activity were most clearly induced by prostratin, suggesting a possible role for GATA3 in regulating reactivation of HIV latency by prostratin.Selective knockdown of GATA3 in 2D10 cells and in primary CD4 þ T cells diminished the ability of prostratin to reactivate HIV.Although, clearly, prostratin is not strictly dependent on GATA3 for its latency reversing activity, we propose that GATA3 induction represents one of several pathways activated by prostratin in CD4 þ T cells that ultimately combine to promote HIV reactivation.As such, a GATA3-targeting drug could potentially be combined with prostratin or other LRAs to enhance reservoir reactivation during latency reversal therapy.To date, toxicity concerns have limited enthusiasm for PKC agonists such as prostratin in therapy [76], but a combined stimulation of a PKC agonist with a GATA3 agonist could potentially permit a lower and safer dose of prostratin to be used.Conversely, a GATA3 inhibitor such as pyrrothiogatain could be used to block viral reactivation and promote deeper latency [77].An issue of concern with this approach is that GATA3 plays important physiological roles in promoting the development of important T cell subsets [78].It also remains to be determined whether the role of GATA3 in this response is direct or indirect.
Our data also suggest that FOXP1 represents a novel regulator of HIV expression.FOXP1 activity was negatively correlated with vRNA levels, leading us to hypothesize that FOXP1 could repress HIV transcription.Consistent with this hypothesis, knockdown of FOXP1 in 2D10 cells reactivated viral gene expression, and overexpression of FOXP1D inhibited HIV expression in HEK293T cells.Curiously, in primary CD4 þ T cells, knockout of FOXP1 diminished HIV expression after prostratin stimulation, suggesting that this factor may play opposing roles in different cell models of HIV latency.Additional studies will be required to resolve this issue.Notably, a previous study has identified another forkhead TF (FOXO1) as a repressor of HIV, suggesting that multiple FOX TFs may participate in regulating HIV expression [79].
This study should also be considered in light of several inherent limitations and caveats.Firstly, 2D10 cells are an immortalized cell line which have several differences with primary resting CD4 þ T cells, the natural host cell for the HIV reservoir.Furthermore, the proviral clones used for both the 2D10 cell and the primary CD4 þ T cell latency models are not complete HIV strains and have inactivating mutations in several viral proteins that could conceivably also contribute to the behavior of the virus in the context of latency reversal.It is also important to note that our approach rested on examining correlations between cellular factors and viral transcription, with the assumption that some of these correlations represent functional contributors to viral expression.However, it is also possible that while correlated, some factors may be induced by the LRAs tested, but not be causally linked to viral gene reactivation.Other correlated factors may also represent non-contributing "bystanders".Nevertheless, the confirmation of the functional roles of GATA3 and FOXP1 demonstrates the power of this approach to identify novel host-viral connections that could be exploited to enhance latency reversal approaches.
Primary CD4 + T cell model of HIV latency CD4 þ T cells were isolated from uninfected donor peripheral blood mononuclear cells (PBMCs) using the EasySep Human CD4 þ T Cell Isolation Kit (Catalog No. 17952, StemCell Technologies, Vancouver, Canada) using negative selection following the manufacturer's protocol.Purity of CD4 þ cells were determined by flow cytometry, which showed a high purity (97.5%)CD4 þ T cell population.Cells were plated at 1 × 10 6 cells/ml in RPMI supplemented with 10% FBS and penicillin/streptomycin, and incubated overnight at 37 � C. The cells were activated using Dynabeads Human T-Activator CD3/CD28 beads (Catalog No. 111.31D,Thermo Fisher Scientific, Waltham, MA) for 72 h at a ratio of 1:1 (beads:cells).Dynabeads were then removed, and activated cells were spin-infected with NL4-3-�6-dreGFP-IRES-thy1.2 virus [81,82] at a speed of 600 g for 2 h.Successful infection was confirmed by measuring GFP expression by flow cytometry at 2 days post infection (dpi) followed by enrichment of infected cells by flow sorting using a BD FACSAria II (Becton Dickinson, Franklin Lakes, NJ).Enriched cells expressing GFP were kept in culture for 3 weeks, during which time viral gene expression diminished, leaving most infected cells GFP − .The GFP − population was then re-sorted to enrich the latently infected population.Latently infected cells were then treated with the LRAs, vorinostat, prostratin, or iBET151, at concentration of 500 nM along with vehicle control (DMSO, 0.05%).After 24 h treatment, cells were harvested, washed, and quantified for reactivation (GFP) by flow cytometry.The remainder of the reactivated cells were used for integrated single-cell multiomic scRNA-seq/ scATAC-seq library preparation.
Single-cell multiome (scRNA-seq and scATAC-seq) library construction and sequencing Construction of single-cell multiome libraries was performed using a 10X Genomics Chromium Controller and a Single Cell Multiome ATAC þ Gene Expression kit (Catalog No. PN-1000285, 10X Genomics, Pleasanton, CA) following manufacturer's protocol (CG000338 • Rev D).Nuclei were first isolated following the 10X Genomics protocol (CG000365 • Rev B) as described above.Then, the intact nuclei suspensions were subjected to Tn5 transposition in bulk, followed by barcoding using Gel Beads-in-emulsion (GEM) beads.Silane magnetic beads were used to purify the barcoded products from the post GEM-Reverse Transcription (RT) reaction mixture.Incubation of the GEMs produces barcoded DNA from the transposed DNA (for scATAC-seq) and barcoded, full-length complementary DNA (cDNA) from poly-adenylated messenger RNA (mRNA) (for scRNA-seq).Barcoded transposed DNA and barcoded full-length cDNA were then pre-amplified by PCR to facilitate library construction.The pre-amplified product was used as input for both ATAC-seq and RNA-seq library constructions.P5 and P7 indexes were added to the pre-amplified transposed DNA for ATAC-seq libraries.cDNA amplification, enzymatic fragmentation followed by end repair, A-tailing, adaptor ligation, and PCR were performed to incorporate P5, P7, i7, and i5 sample indices, and TruSeq Read 2 (read 2 primer sequence) for RNA-seq libraries.The libraries were quantified using an Agilent TapeStation 4200 (Agilent Technologies, Santa Clara, CA) and the Qubit dsDNA High Sensitivity Assay Kit (Catalog No. Q33230, Invitrogen, Waltham, MA).Pooled samples from ATAC-seq and RNA-seq libraries were sequenced using paired-end, single-index (ATAC-seq) and dual index (RNA-seq) sequencing on a NextSeq 2000 instrument (Illumina, San Diego, CA).For RNA-seq libraries, the read format was: Read 1 -28 cycles; Read 2 -90 cycles; i7 -10 cycles; i5 -10 cycles.For ATAC-seq libraries, the read format was: Read 1 -50 cycles; Read 2 -49 cycles; i7 -8 cycles; i5 -16 cycles.Paired-end reads of pooled libraries were demultiplexed prior to downstream analysis.

scRNA-seq and scATAC-seq data processing
For the 10X Genomics multiome data, we used Cell Ranger ARC (version 2.0.0) to perform sample demultiplexing, barcode processing, peak calling, and counting of RNA-seq and ATAC-seq reads in single cells.HIV-human reference genome was generated based on Sanger sequencing of the NL4-3-�6-dreGFP-IRES-thy1.2 reporter plasmid (described above).The 3 0 LTR was masked to prevent issues with multimapping.A fasta file representing the viral genome was concatenated to a GRCh38 human reference genome, and a Cell Ranger compatible reference genome was generated using the cellranger-arc mkref function.Alignment was performed against this reference genome with the cellrangerarc count function.To validate our HIV-specific mapping, we analyzed scRNA-seq and scATAC-seq data from uninfected PBMCs using our HIV-human reference genome.The uninfected dataset used was retrieved from 10X Genomics (https://www.10xgenomics.com/resources/datasets/10-k-humanpbm-cs-multiome-v-1-0-chromium-controller-1-standard-2-0-0)covering 57,425 mean raw read pairs per cell for ATAC-seq and 54,464 mean raw reads per cell for RNA-seq, respectively.Using this dataset, we did not observe any HIV mapping reads in uninfected PBMCs, demonstrating that the viral data do indeed reflect genuine information about HIV expression and accessibility.
Cells with high percentages of mitochondrial reads, extreme RNA-seq read counts, and/or extreme ATAC-seq fragment counts were removed (see quality control metrics before and after cell filtering in Figure S3).Doublet cells, identified by scDblFinder across multiplexed samples corresponding to the treatment groups, were excluded from subsequent analysis [83].For read count normalization, we used sctransform by Seurat [64] for scRNA-seq and term frequency-inverse document frequency (TF-IDF) by Signac [53] for scATACseq.This was followed by principal component analysis (PCA) and latent semantic indexing (LSI) for dimension reduction, respectively.We kept the top 50 PCs from scRNAseq and the top 2-50 LSIs from scATAC-seq.Jointly, these are used as input to construct a weighted nearest neighbor (WNN) graph, which is further used for joint UMAP [32] visualization and clustering.We used the Leiden clustering algorithm for community detection with the resolution parameter visualized and optimized by the clustree algorithm [84], which builds a tree to visualize and examine how clusters are related to each other at varying resolutions, allowing researchers to assess which clusters are distinct and which are unstable with the use of additional metrics such as the SC3 stability index [85].For modality-specific processing, we built the unweighted K-nearest neighbor (KNN) and the weighted shared nearest neighbor (SNN) graphs, followed by UMAP visualization and clustering identification.For scATAC-seq data, we also adopted weighted PCA by Destin [86] to weigh chromatin accessibilities by existing genomic annotations and publicly available regulomic data.
We obtained the position frequency matrices and annotated 633 pairs of TFs and motifs from the JASPAR 2020 database [87]; we further applied chromVAR [40] to derive, for each TF, its motif score, which measures the deviation in chromatin accessibility across the set of peaks containing the corresponding motif, compared with a set of background peaks.We also constructed a pseudo-gene activity matrix [88], summing ATAC-seq read counts in gene bodies and promoter regions (2 kb upstream of genes' TSSs), followed by sctransform normalization.

Differential gene expression and motif accessibility
For each treated group (iBET151, prostratin, and vorinostat), we performed a nonparametric Wilcoxon rank sum test to identify DEGs and differentially accessible motifs compared to the DMSO control.We used the normalized gene expression levels by sctransform and the motif deviation scores by chromVAR as input, and adopted FDR for multiple testing correction.Cauchy combination testing [89] was performed to integrate the three sets of condition-specific P values that are correlated.For significant DEGs, we carried out a GO enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [90].

Gene, TF, and peak linkage analyses
For HIV expression, we calculated, for each cell, the proportion of HIV vRNA reads out of the total number of reads.For HIV chromatin accessibility, we directly used the total number of HIV ATAC-seq reads, due to the sparsity of the scATAC-seq data.We used the square root (sqrt) of the proportion of HIV vRNA reads as the outcome variable in various supervised frameworks, aiming to recover the molecular biomarkers that are related to HIV expression upon LRA treatment and thus latency reversal.The dependent variables in such linkage analyses include gene expression, TF motif scores, and peak accessibilities.For gene and TF linkage analyses, we adopted Q-value based FDR to control for false positives and to adjust P values for multiple testing correction [91].Due to the large number of testing in the peak linkage analysis and the low power due to the ATAC sparsity, we used a significance threshold of 0.01.For the significantly linked peaks, we further searched for motifs that were overrepresented in them compared with a background peak set using the hypergeometric test of enrichment.

TF network analysis
For TF-specific testing from the TF expression linkage analysis, motif enrichment analysis, and differential accessibility analysis, we compared the calling results and generated a master output to identify key TF regulators associated with latency reversal.We performed consensus PCA to jointly reduce dimensions of TF-coding gene expression levels and deviation scores of TF-binding motifs.For each pair of TFs, we calculated the correlation coefficients between their consensus PCs and constructed an adjacency matrix for the TF network by creating an edge between two TFs if the correlation between them is greater than 0.5.

Machine learning model of HIV reactivation
Due to the large number of features in the dataset, feature selection is necessary to create an interpretable and usable set of features for a machine learning model.In order to select a high-performance subset of features for prediction, we first calculated Spearman correlation coefficients between each feature within the multiomic datasets against the HIV gene expression before binarization of the cell data to HIVþ or HIV− based on the presence or absence of any detectable vRNA for 2D10 cells or ranking above or below the 90th percentile for HIV vRNA levels for primary CD4 þ T cells.From each feature set (RNA expression, motifs, and peaks), features with top k ranking Spearman correlation coefficients were then selected for modeling.Here, k is a hyperparameter to be determined by parameter tuning.For 2D10 cells, we performed parameter tuning, choosing 30 features from RNA expression, 20 from motifs, and 10 from peaks that were then used to train a gradient boosting tree classifier, which constructed a strong final classifier by combining several weaker classifiers.In gradient boosting, weak learners were optimized in an iterative fashion with loss functions that focus on the residuals (errors) of their predecessors.Here we used LightGBM, an efficient implementation of this algorithm [92].The parameters of this model were then tuned via 10-fold cross-validation and grid search.The feature importance in LightGBM was calculated as the total number of splits of each feature in all trees and was used in our experiment to rank each selected input feature.
For primary CD4 þ T cells, the initial modeling feature set consisted of the RNA expression and TF activity features observed across both donors (n ¼ 8497).For both the XGBoost models and GOSDT models, 80% of the data was randomly selected as a training set, while a test set of 18,508 was held out to evaluate the performance of the final models.Using the training data, absolute Spearman correlation coefficients were calculated between individual features and the vRNA label.The highest-ranked 1000 features by absolute Spearman correlation coefficients were considered for further modeling analysis.These 1000 features excluded mitochondrial features.Next, vRNA was binarized at a split value of 52, where cells with a vRNA value greater than 52 represented the top 10% of the overall data, while those with a vRNA value less than or equal to 52 comprised the bottom 90%.
XGBoost models [56] were trained directly using a binary loss objective through the XGBoost python package (version 1.6.2).Models that trained only using data from one donor only used the subset of the training data coming from that donor.Training involved hyperparameter tuning via a randomized grid search procedure with 100 iterations and 5-fold cross-validation of the training set using an objective of average precision.The hyperparameters tuned include the number of tree estimators, tree depth, subsampling ratio, column sample ratio, label weights, learning rate (eta), and the minimum number of observations.A complete list of hyperparameter thresholds chosen can be found at https://github.com/CalebKornfein/latency.Feature importance was ranked using permutation testing on the test set.Each feature was randomly shuffled five times, and the features were ranked by average drop in test area under the precision-recall curve.
GOSDT models [54] are a class of interpretable, optimal, and sparse decision trees.As with XGBoost, the top 1000 non-mitochondrial features ranked by absolute Spearman correlation coefficients were considered.The feature set of continuous features was reduced into a small set of binarized features using the threshold guessing column elimination procedure proposed by McTavish and his colleagues [55].This procedure iteratively drops columns of least importance as determined by a black-box boosting model, in our case scikitlearn's implementation of a gradient boosting classifier.A stopping condition for backward selection was met when both (1) further dropping resulted in deviance performance loss of the gradient boosting classifier and (2) the number of remaining thresholds to binarize features was less than or equal to 30.We varied hyperparameters of the gradient boosting classifier in the threshold guessing procedure including the number of estimators and the weight of the sample labels by randomly oversampling the minority class.Finally, GOSDT models were trained on the remaining binary splits using the python package gosdt (version 0.1.7).GOSDT trees were created for a variety of tree depths, regularization levels, and relative sample weights.Between varying the thresholds and gosdt hyperparameters, 2258 models were produced.

shRNA knockdown and overexpression
For knockdown experiments, lentiviruses encoding shRNAs against FOXP1, GATA3, or scrambled control were generated by co-transfecting viral expression plasmids with Gag-Pol and VSV-G packaging plasmids into HEK293T cells.Virus containing supernatant was subjected to low-speed centrifugation (300 g, 5 min) and filtration using a 0.45-μm filter to removed cellular contaminants.The remaining supernatant was added to 2D10 cells in complete RPMI media for 24 h to facilitate infection.For GATA3 knockdown, the targeting lentivirus encoded a puromycin resistance gene, and transduced cells were selected by inclusion of 1μg/ml puromycin in the cell media.For FOXP1 knockdown, the targeting lentivirus also encoded an mCherry expression cassette, and transduced cells were distinguished from non-transduced cells by flow cytometry and gating for mCherry þ cells.Five days post transduction, cells were quantified for viral gene expression (GFP) by flow cytometry.
For overexpression of FOXP1, plasmids encoding a FOXP1-mCherry fusion separated by a 2A peptide or mCherry only were co-transfected with NL4-3-�6-dreGFP-IRES-thy1.2 into HEK293T cells.Three days post transfection, GFP expression in the mCherry þ population was quantified.Cells from knockdown and overexpression experiments were also harvested for Western blot analysis.

Figure 1
Figure 1 Single-cell multiomic analysis of HIV latency reversal A. A schematic of the experimental design of primary CD4 þ T cell HIV latency model followed by stimulation with LRAs.DMSO was used as a control.B. Flow cytometry showing the expression of the viral GFP reporter in HIV-GFP-infected primary CD4 þ T cells from donor 1 after exposure to LRAs (vorinostat, prostratin, and iBET151) or control vehicle (DMSO).C. Bar plot showing the percentage of GFP þ cells in infected cells for each condition.D. and E. UMAP dimension reduction of scRNA-seq and scATAC-seq, and WNN with cells labeled by conditions (D) and by graph-based clusters (E).Data shown here are from primary CD4 þ T cells of donor 1.Data from 2D10 cells and primary CD4 þ T cells of donor 2 are shown in Figures S1 and S2, respectively.LRA, latency reversing agent; HIV, human immunodeficiency virus; GFP, green fluorescent protein; DMSO, dimethylsulfoxide; WNN, weighted nearest neighbor; UMAP, Uniform Manifold Approximation and Projection; scRNA-seq, single-cell RNA sequencing; scATAC-seq, single-cell assay for transposase-accessible chromatin with sequencing.

Figure 2
Figure 2 HIV vRNA expression and chromatin accessibility A. Transformed HIV vRNA reads under different treatment conditions.P values from one-way ANOVA test and for significant pairwise differences based on the Tukey test are shown.B. UMAP plot of the scRNA-seq data color-coded based on the level of HIV vRNA expression.C. Transformed HIV mapping scATAC-seq reads under different treatment conditions.P values from one-way ANOVA test and for significant pairwise differences based on the Tukey test are shown.D. UMAP plot of the scATAC-seq data color-coded based on the levels of HIV mapping scATAC-seq reads.E. Correlation scatter plot showing the sqrt proportion of scATAC-seq reads mapping to HIV (X-axis) vs. the sqrt proportion of scRNA-seq reads mapping to HIV (Y-axis) across the cell population.F. Coverage plots showing normalized read counts from scATAC-seq along the HIV reference genome for different LRA conditions.The dark blue and gray boxes represent gene annotations and ATAC-seq peaks, respectively.G. Dot plot showing the percentage of cells expressing HIV and the average expression level for vRNAs.H. Percentage of cells with accessible HIV DNA and the average accessibility for different regions of the HIV proviral genome.Data shown here are from primary CD4 þ T cells of donor 1.Data from 2D10 cells and primary CD4 þ T cells of donor 2 are shown in FiguresS4 and S5, respectively.ANOVA, analysis of variance; ATAC, assay for transposase-accessible chromatin; scATAC-seq, single-cell assay for transposase-accessible chromatin with sequencing; vRNA, viral RNA; RNA-seq, RNA sequencing; sqrt, square root.

Figure 3
Figure 3 DEGs stimulated by LRAs A. Venn diagrams showing the number of down-regulated (upper) and up-regulated (lower) genes upon treatment with LRAs (FDR-adjusted P value < 0.05).B. Heatmaps showing normalized and scaled RNA expression levels for different treatment conditions.The rows contain representative genes that were down-regulated (left) and up-regulated (right), respectively.The columns represent single cells.Data shown here are from primary CD4 þ T cells of donor 1.Data from 2D10 cells and primary CD4 þ T cells of donor 2 are shown in Figures S9 and S10, respectively.FDR, false discovery rate.

Figure 4
Figure 4 DATFs stimulated by LRAs A. Venn diagrams showing the number of TFs whose motif accessibilities were significantly down-regulated (upper) or up-regulated (lower) upon treatment with LRAs compared with DMSO (FDR-adjusted P value < 0.05).B. Box plots showing distributions of motif accessibilities (deviation scores) of selected TFs with decreased accessibility scores (SNAI2, SREBF1, and ZNF135; upper) or increased accessibility scores (NFATC2, RBPJ, and TEAD2; lower) in response to all three LRAs.P values from Wilcoxon rank sum tests comparing DMSO-treated and one of the LRA-treated samples were integrated using the Cauchy combination.C. Box plots showing distributions of motif accessibilities of positive controls, FOS and JUN.Data shown here are from primary CD4 þ T cells of donor 1.Data from primary CD4 þ T cells of donor 2 and 2D10 cells are shown in Figures S11 and S12, respectively.TF, transcription factor; DATF, differentially accessible transcription factor.

Figure 5
Figure 5 Linkage analyses between HIV transcription and cellular transcripts/TFs/peaks A. Distribution of nominal P values from cellular RNA expression linkage analysis.B. Visualization of selected cellular transcripts linked with HIV vRNA expression.The dotted line represents the fitted line from a simple regression.Spearman correlation coefficient (r) and nominal P value are shown.C. Distribution of nominal P values from TF activity linkage analysis.D. Visualization of activities of selected TFs linked with HIV vRNA expression.E. Distribution of nominal P values from peak linkage analysis.F. Visualization of motifs enriched in ATAC-seq peaks that were positively (top) and negatively (bottom) associated with the HIV vRNA expression.Data shown here are from primary CD4 þ T cells of donor 1.Data from 2D10 cells and primary CD4 þ T cells of donor 2 are shown in Figures S14 and S15, respectively.

Figure 6 Figure 7
Figure6TF regulators of HIV latency reversal A. Venn diagram showing the number of HIV-linked TFs from three different testing schemes: expression linkage analysis of TF-coding genes (TF expression linkage), enrichment analysis of TF-binding motifs using significantly linked peaks (motif enrichment), and analysis of TFs with differential motif accessibility (differential accessibility).B. Heatmap showing correlations between TF pairs for selective 50 TFs with the highest combined P values from the three testing schemes shown in (A).The correlation is computed using the TF-specific consensus PCs, calculated using both the TF-coding gene expression level and its motif accessibility.C. TF regulatory network constructed from TFs' motif deviation scores and expression levels (see Materials and methods for details).Data shown here are from primary CD4 þ T cells of donor 1.Data from 2D10 cells and primary CD4 þ T cells of donor 2 are shown in FiguresS19 and S20, respectively.PC, principal component.

Figure 8
Figure 8 GATA3 and FOXP1 regulate HIV latency A. Binding site accessibility and RNA expression of GATA3 and FOXP1 across drug conditions.B. Western blot showing the depleted GATA3 in 2D10 cells transduced with shGATA3.β-actin was used as a loading control.Non-transduced 2D10 cells were used as mock.C. Flow cytometry analysis of viral GFP expression in 2D10 cells transduced with shGATA3 or shNT at baseline and 24 h after prostratin stimulation.Data were represented by mean ± SD (n ¼ 3).P values were determined by Student's t-test.D. Western blot showing the depleted FOXP1 in 2D10 cells transduced with shFOXP1.β-actin was used as a loading control.Non-transduced 2D10 cells were used as mock.E. Flow cytometry analysis of viral GFP expression in 2D10 cells at 5 days post transduction with shFOXP1 or shNT.Data were represented by mean ± SD (n ¼ 3).P value was determined by Student's t-test.F. Dot plot showing representative flow cytometry for viral GFP expression in 2D10 cells at 5 days post transduction with shFOXP1 or shNT.G.Western blot showing the overexpression of FOXP1D in HEK293T cells transfected with a FOXP1-mCherry-expressing plasmid or a mCherry-expressing plasmid (control) plus HIV-GFP.β-actin was used as a loading control.Non-transfected HEK293T cells were used as mock.H. Flow cytometry analysis of viral GFP expression in HEK293 cells transfected with a FOXP1-mCherry-expressing plasmid or a mCherry-expressing plasmid (control) plus HIV-GFP.Data were represented by mean ± SD (n ¼ 3).P value was determined by Student's t-test.I. Schematic overview of experimental design for CRISPR/Cas9 targeting of GATA3 and FOXP1 in HIV-GFP-infected primary CD4 þ T cells.Activated CD4 þ T cells were infected with HIV-GFP.At 3 days post infection, infected (GFP þ ) cells were enriched by flow sorting and cultured for 1 week before nucleofection of Cas9/sgRNA complexes targeting GATA3 or FOXP1.One week post nucleofection of sgRNAs, knockout of the targets was analyzed by Western blot.Furthermore, GFP þ cells were quantified after stimulation with prostratin for 24 h at 1 week post nucleofection.J.Western blot showing the depletion of GATA3 and FOXP1 at 1 week post Cas9/sgRNA nucleofection.β-actin was used as a loading control.HIV-GFP-infected primary CD4 þ T cells without Cas9/sgRNA nucleofection were used as mock.K. Flow cytometry analysis of viral GFP expression in HIV-GFP-infected primary CD4 þ T cells nucleofected with sgGATA3 or sgFOXP1 at baseline and 24 h after prostratin stimulation.sgNT was used as a negative control, and sgTat was used as a positive control.Data were represented by mean ± SD (n ¼ 3), and P values were determined by one-way ANOVA Tukey's multiple comparisons test.shRNA, short hairpin RNA; shNT, non-targeting shRNA; shGATA3, GATA3targeting shRNA; shFOXP1, FOXP1-targeting shRNA; FOXP1A, FOXP1 isoform A; FOXP1D, FOXP1 isoform D; FOXP1-OE, FOXP1 overexpression; sgRNA, single-guide RNA; sgNT, non-targeting sgRNA; sgGATA3, GATA3-targeting sgRNA; sgFOXP1, FOXP1-targeting sgRNA; sgTat, Tat-targeting sgRNA; CRISPR, clustered regularly interspaced short palindromic repeats; Cas9, CRISPR-associated protein 9; SD, standard deviation.
Ashokkumar M et al / HIV Latency Reversal by Single-cell Multiomics Dot plot showing the percentage of cells expressing HIV and the average expression level for vRNAs.H. Percentage of cells with accessible HIV DNA and the average accessibility for different regions of the HIV proviral genome.Data shown here are from primary CD4 þ T cells of donor 1.Data from 2D10 cells and primary CD4 The dark blue and gray boxes represent gene annotations and ATAC-seq peaks, respectively.G. þ T cells of donor 2 are shown in FiguresS4 and S5, respectively.ANOVA, analysis of variance; ATAC, assay for transposase-accessible chromatin; scATAC-seq, single-cell assay for transposase-accessible chromatin with sequencing; vRNA, viral RNA; RNA-seq, RNA sequencing; sqrt, square root.

Table 1 Top 20 genes significantly linked with HIV transcription in primary CD4 + T cells of donor 1
Note: HIV, human immunodeficiency virus; FDR, false discovery rate.

Table 2 Top 20 TFs significantly linked with HIV transcription in primary CD4 + T cells of donor 1 TF Motif Spearman correlation coefficient Nominal P value FDR-adjusted P value
Note: TF, transcription factor.Ashokkumar M et al / HIV Latency Reversal by Single-cell Multiomics

Table 3 Top 20 enriched TF-binding motifs significantly linked with HIV transcription in primary CD4 + T cells of donor 1 TF Motif Observed Background Percent.observed Percent.background Fold enrichment P value FDR-adjusted P value
Note: A list of the top 20 TF-binding motifs enriched in ATAC-seq peaks that were significantly associated with HIV transcription.Corresponding TFs, the numbers of motifs occurring in the linked peaks and background peaks (Observed and Background), the percentages of linked peaks and background peaks containing the motif (Percent.observedand Percent.background),fold enrichment (Percent.observedvs. Percent.background),P values from hypergeometric tests examining the degree of enrichment, and FDR-adjusted P values are shown as columns.