BERMAD: batch effect removal for single-cell RNA-seq data using a multi-layer adaptation autoencoder with dual-channel framework

Abstract

Motivation: Removing batch effects between datasets from different experimental platforms has become an urgent problem as single-cell RNA sequencing (scRNA-seq) techniques develop rapidly. Although several methods address this problem, most still face the challenge of under-correction or over-correction. Specifically, handling batch effects in highly nonlinear scRNA-seq data requires a more powerful model to avoid under-correction. Meanwhile, some previous methods focus too heavily on removing differences between batches, which may disturb the heterogeneous biological signals of datasets generated from different experiments, thereby leading to over-correction.

Results: In this article, we propose BERMAD, a novel multi-layer adaptation autoencoder with a dual-channel framework that addresses both the under-correction and over-correction problems in batch effect removal and achieves better scRNA-seq data integration and joint analysis. First, we design a multi-layer adaptation architecture to model the distribution difference between batches at different feature granularities. Distribution matching on multiple autoencoder layers with different feature dimensions yields more accurate batch correction. Second, we propose a dual-channel framework in which the deep autoencoder processing each dataset is trained independently. Hence, heterogeneous information that is not shared between batches is retained more completely, which alleviates over-correction. Comprehensive experiments on multiple scRNA-seq datasets demonstrate the effectiveness and superiority of our method over state-of-the-art methods.

Availability and implementation: The code implemented in Python and the data used for experiments have been released on GitHub (https://github.com/zhanglabNKU/BERMAD) and Zenodo (https://zenodo.org/records/10695073) with detailed instructions.


Introduction
Single-cell RNA sequencing technology combined with computational biological analysis can reveal the specific and independent transcriptome information of cells in heterogeneous states (Kharchenko 2021). However, as scRNA-seq techniques multiply, the scale of the generated datasets is also increasing. Large-scale scRNA-seq datasets generated from different experimental platforms usually contain batch-specific systematic variations, the so-called 'batch effect' (Tran et al. 2020). In order to accurately carry out cell- and gene-level research on such datasets, it is crucial to first remove the batch effect.
There have already been research efforts on batch effect removal. The linear model was the first kind of method proposed to remove batch effect from data; in fact, it was originally used to process microarray gene expression data. Smyth and Speed (2003) proposed limma, which fits the input data with a linear model to capture the batch effect, which is then subtracted from the original data to obtain the batch-corrected gene expression matrix. Combat (Smyth and Speed 2003) is another method proposed for microarray gene expression data, and Risso et al. (2018) applied it to scRNA-seq data. Combat uses an empirical Bayes method to fit the standardized input data to standard distributions for estimating the batch effect, and then uses the obtained batch correction operator to remove it. With the continuous growth of the scale of scRNA-seq datasets, linear models struggle to handle such complex and highly nonlinear data because of their limited representation ability. Haghverdi et al. (2018) first proposed using mutual nearest neighbours (MNNs) between batches to remove batch effect: the identified MNN pairs are used to compute a batch correction vector that is applied to all cells in the dataset. Later, Lun (2019) improved this method and proposed fastMNN. The improvement is that MNN pairs are identified in the dimension-reduced PCA space, avoiding the high computational cost of the original high-dimensional gene expression space. BBKNN (Polański et al. 2020) uses the neighborhood graph of cells across batches to perform data integration and batch correction. Scanorama (Hie et al. 2019) is analogous to computer vision algorithms for panorama stitching: it automatically identifies scRNA-seq datasets containing cells with similar transcriptional profiles and leverages those matches for batch correction. All MNN-based methods rely on the accuracy of MNN pair recognition and tend to work correctly only when the cell type compositions across batches are relatively similar.
Some approaches are based on component analysis and dimension reduction; Seurat 3 (Stuart et al. 2019) is a typical example. It uses canonical correlation analysis (CCA) for dimension reduction and identifies MNN pairs in the dimension-reduced CCA space to calculate the correction vector. Another method of this kind is Harmony (Korsunsky et al. 2019), which first projects all batches into a dimension-reduced space with the help of PCA and then iterates between cell clustering and batch correction until convergence. Poličar et al. (2023) first construct a t-SNE embedding of one dataset as a reference, then embed another dataset into the reference-defined low-dimensional space to remove batch effect. However, these methods are highly dependent on the quality of the dimension reduction and thus only work well under specific circumstances.
In recent years, some deep learning-based methods have been dedicated to removing batch effect. scVI (Lopez et al. 2018) utilizes a deep generative model to approximate the underlying distribution of the input gene expression values, and the output is subsequently used to deal with batch effects. BERMUDA (Wang et al. 2019) aligns similar clusters in a low-dimensional space by training a deep autoencoder to achieve batch effect removal. HDMC (Wang et al. 2022) addresses the batch effect problem through a hierarchical distribution-matching framework combined with contrastive learning. iMAP (Wang et al. 2021) removes batch effect based on both deep autoencoders and generative adversarial networks. mtSC (Duan et al. 2021) regards each dataset as an individual task to train a multitask-based deep neural network for single-cell alignment. Among these methods, scVI and BERMUDA either pay excessive attention to bridging the distribution difference between datasets or focus too much on the alignment of similar cell clusters, which may sacrifice the independence and specificity of individual datasets. Although HDMC uses contrastive learning to address over-correction, it essentially processes different batches jointly without considering batch specificity. iMAP needs to obtain MNN pairs between batches before the batch correction step, so it still relies on the accuracy of MNN pair recognition. As for mtSC, it is proposed for dataset assignment rather than batch correction or data integration.
As discussed above, some previous methods rely on assumptions that do not always hold or cannot deal with highly nonlinear data, which may lead to under-correction. Meanwhile, other methods pay too much attention to reducing the difference between batches and ignore important information exclusive to an individual dataset, which may cause over-correction. To address these concerns, we propose a novel multi-layer adaptation autoencoder with a dual-channel framework, attempting to develop a unified framework that simultaneously tackles the under-correction and over-correction problems in batch effect removal.
For our BERMAD model, in order to integrate datasets from different experimental platforms more completely and accurately, we design a multi-layer adaptation architecture, which carries out distribution matching at different dimensions in different feature layers of the autoencoder and then generates a batch-effect-corrected embedding for downstream tasks. At the same time, to explicitly account for the specificity and independence of an individual dataset, we propose a dual-channel framework. In detail, we train network parameters for different batches separately, so that the autoencoders processing different batches have the same model architecture but unshared weights, thus solving the over-correction problem caused by the loss of dataset specificity. Experiments on a simulated dataset and real scRNA-seq datasets prove our method's effectiveness and superiority over other state-of-the-art methods.
Our method has two advantages over previous methods: (i) the novel multi-layer adaptation architecture can reduce the distribution difference of similar cell clusters between different batches more effectively. The outputs of different layers of a deep autoencoder have different feature dimensions and contain information of different granularity, allowing comprehensive distribution matching at multiple levels and thereby producing more accurate batch correction results. (ii) The dual-channel framework of BERMAD can preserve the unique biological signals of a single dataset, such as unshared cell types. The same network architecture with independently trained parameters ensures correct clustering and integration, effectively reducing over-correction.

Materials and methods
The overall architecture of our method is shown in Fig. 1. Let X1 = {x1, …, xm} and X2 = {x1, …, xn} denote the input scRNA-seq datasets from two batches, and X′1 = {x′1, …, x′m} and X′2 = {x′1, …, x′n} their reconstructions. The goal is to train a deep autoencoder to obtain the batch-corrected low-dimensional embeddings Z1 and Z2 of the original data, which are combined into the final output Z after the task is completed. Specifically, we use the inherent reconstruction loss of the autoencoder to obtain an appropriate low-dimensional embedding of the original high-dimensional gene expression data. We then calculate the transfer loss simultaneously on the outputs of the three hidden layers h1, h2, and z, which have different dimensions, to reduce the distribution difference between the two batches. A detailed description of the workflow and the diagram of our method can be found in Supplementary material SI.
In fact, our design was inspired by recent advances in domain adaptation research, which can be successfully transferred to the batch effect removal tasks we are interested in. Domain adaptation is a subfield of machine learning that attempts to generalize a model learned on a source domain to a target domain. It enables model transfer and generalization by minimizing the distribution difference between domains (Farahani et al. 2021). The mainstream approach in domain adaptation is to perform distribution matching between the source and target domains, similar to how we perform distribution matching between different batches. To our knowledge, we are the first to apply multi-layer adaptation to batch correction and to design a corresponding approach for batch effect removal.

Basic model
We choose a deep autoencoder as the basic model for our method, since it is an unsupervised learning method widely used in domain adaptation research and performs excellently in data dimension reduction, feature extraction, and data reconstruction (Zhang et al. 2020). Specifically, the basic model of our framework is a deep autoencoder consisting of an input layer, three hidden layers with dimensions of 200, 20, and 200, and an output layer, which is a common setting in domain adaptation. The reconstruction loss between the original input and the reconstructed output is calculated to obtain an appropriate low-dimensional embedding of the original gene expression data, so the first term of the loss function of our method is the inherent reconstruction loss of the autoencoder:

L_{rec} = \sum_i \lVert x_i - x'_i \rVert_2^2,    (1)

where x_i is the original input sample and x'_i is its reconstruction.
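To make the basic model concrete, the sketch below shows the forward pass and reconstruction loss of such a 200-20-200 autoencoder in plain numpy. This is a minimal illustration, not the released implementation: the input dimension (50 genes), the random initialization, and the ReLU activations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_autoencoder(d_in, dims=(200, 20, 200)):
    """Random weight matrices for an autoencoder with hidden sizes 200-20-200."""
    sizes = [d_in, *dims, d_in]
    return [rng.normal(0, 0.05, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]

def forward(weights, X):
    """Forward pass; returns the intermediate embeddings h1 (200-d),
    z (20-d bottleneck), h2 (200-d) and the reconstruction."""
    h, acts = X, []
    for i, W in enumerate(weights):
        h = h @ W
        if i < len(weights) - 1:          # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
        acts.append(h)
    h1, z, h2, x_rec = acts
    return h1, z, h2, x_rec

def reconstruction_loss(X, X_rec):
    """L_rec: mean squared error between input and reconstruction."""
    return float(np.mean((X - X_rec) ** 2))

X = rng.normal(size=(8, 50))              # toy input: 8 cells, 50 genes
W = init_autoencoder(50)
h1, z, h2, X_rec = forward(W, X)
loss = reconstruction_loss(X, X_rec)
```

During training, minimizing this loss pushes the 20-dimensional bottleneck z to retain the information needed to reconstruct the expression profile.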

Multi-layer adaptation
In order to reduce the distribution difference between clusters of the same type from different batches for accurate batch effect removal, we conduct distribution matching on the intermediate outputs of the deep autoencoder. We choose the maximum mean discrepancy (MMD) (Gretton et al. 2012) between batches as the measure of distribution difference, a common choice in domain adaptation. MMD is a nonparametric method that measures the distance between embeddings of probability distributions in a reproducing kernel Hilbert space (RKHS), and it performs well in many deep transfer learning tasks (Ghifary et al. 2014, Long et al. 2017, Ying et al. 2018). A detailed introduction to MMD is given in Supplementary material SII.
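As an illustration of the discrepancy measure, the following is a minimal numpy sketch of the (biased) squared-MMD estimate with a Gaussian kernel; the bandwidth sigma and the toy data are illustrative assumptions, not values used by BERMAD.

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=5.0):
    """Biased squared-MMD estimate between samples X and Y under a
    Gaussian (RBF) kernel, i.e. the distance between the kernel mean
    embeddings of the two empirical distributions in the RKHS."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

rng = np.random.default_rng(0)
A = rng.normal(0.0, 1.0, size=(200, 5))
B = rng.normal(0.0, 1.0, size=(200, 5))   # same distribution as A
C = rng.normal(3.0, 1.0, size=(200, 5))   # shifted distribution

near = mmd2_rbf(A, B)   # near zero: same underlying distribution
far = mmd2_rbf(A, C)    # clearly larger: distributions differ
```

Minimizing this quantity between embeddings of matched cell clusters pulls their distributions together, which is exactly the role the transfer loss plays below.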

The calculation of transfer loss
In order to correctly calculate the MMD between cell clusters of different batches before distribution matching, we first need to divide each batch into several clusters individually. To this end, we cluster the datasets of each batch separately using the Seurat package provided by Butler et al. (2018), which performs graph-based clustering on the data with the Louvain algorithm. Since the cell types contained in different batches may not be identical, and only the distribution difference between similar cell clusters contributes to the task, we need to filter out similar pairs of cell clusters between batches. Therefore, we run the MetaNeighbor algorithm of Crow et al. (2018) to calculate the similarity score of the cell cluster pairs between different batches; we briefly describe the working principle of MetaNeighbor in Supplementary material SIII. Only cell cluster pairs with a similarity score larger than the similarity threshold (0.85 by default) are included in the distribution discrepancy calculation. In general, our MMD-based transfer loss is computed as:

L_{tran} = \sum_{i,j} s(i,j) \, \mathrm{MMD}^2\!\left(e(c_i), e(c_j)\right),    (2)

where c_i and c_j are the cell clusters of batch 1 and batch 2, respectively, e(c_i) and e(c_j) are their intermediate output embeddings, and s(i,j) is the similarity score between c_i and c_j, binarized according to our requirement:

s(i,j) = 1 if sim(i,j) > thr, and s(i,j) = 0 otherwise,    (3)

where thr is the similarity threshold used to filter similar cell cluster pairs, set to 0.85 by default.
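The transfer-loss computation above can be sketched as follows. This is a simplified illustration under stated assumptions: per-cluster embeddings and MetaNeighbor-style similarity scores are taken as given inputs, and `transfer_loss`/`mmd2_rbf` are hypothetical helper names, not functions from the released code.

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=5.0):
    """Biased squared-MMD estimate with a Gaussian kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

def transfer_loss(clusters_1, clusters_2, similarity, thr=0.85):
    """Single-layer transfer loss: sum MMD^2 over cluster pairs whose
    similarity score passes the threshold (the binarized s(i, j))."""
    loss = 0.0
    for i, ci in enumerate(clusters_1):
        for j, cj in enumerate(clusters_2):
            if similarity[i, j] > thr:    # s(i, j) = 1; otherwise the pair is skipped
                loss += mmd2_rbf(ci, cj)
    return loss

rng = np.random.default_rng(1)
# toy per-cluster embeddings: cluster 0 is shared across batches, cluster 1 is not
c1 = [rng.normal(0.0, 1.0, (100, 5)), rng.normal(4.0, 1.0, (100, 5))]
c2 = [rng.normal(0.5, 1.0, (100, 5)), rng.normal(-4.0, 1.0, (100, 5))]
sim = np.array([[0.95, 0.10],
                [0.20, 0.30]])            # only pair (0, 0) passes thr = 0.85
loss = transfer_loss(c1, c2, sim)
```

Only the similar pair contributes, so dissimilar (potentially batch-specific) clusters exert no pull on each other.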

Design of multi-layer adaptation
Since a deep autoencoder has multiple intermediate layers from input to output, it is a new problem to determine on which layers distribution matching should be performed, i.e. which layers' transfer loss should be included in the loss function of our framework. Yosinski et al. (2014) showed that features in deep neural networks transition from general to task-specific along the layers, so different layers carry information of different transferability. Inspired by this, we design the multi-layer adaptation architecture of BERMAD. Specifically, we calculate the transfer loss on the embeddings h1, h2, and z of the three intermediate layers of the deep autoencoder shown in Fig. 1, according to Equation (2). We then perform distribution matching on embeddings of different dimensions to more completely reduce the distribution differences of similar cell clusters between batches. Finally, the second term of our framework's loss function is a weighted sum of the three intermediate transfer losses:

L_{tran} = a L_{h_1} + b L_{h_2} + c L_z,    (4)

where L_{h_1}, L_{h_2}, and L_z are the single-layer transfer losses on the intermediate layers h1, h2, and z, respectively, and a, b, and c are hyperparameters balancing the transfer loss terms of different layers; their optimal values may differ across datasets but are generally robust. To verify this robustness, we conduct comparative experiments with different hyperparameter values on the simulated dataset and the real dataset Pancreas, and present the results in Supplementary material SIV. At the same time, considering the convergence of the algorithm, their values should not be too large; they are set to 0.2, 0.2, and 0.2 by default.
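The weighted multi-layer combination can be sketched as below. For brevity, the discrepancy here defaults to a squared-mean-difference stand-in rather than the MMD used by BERMAD, and the dictionary layout and function name are illustrative assumptions.

```python
import numpy as np

def multilayer_transfer_loss(emb1, emb2, weights=(0.2, 0.2, 0.2), disc=None):
    """L_tran = a*L_h1 + b*L_h2 + c*L_z, where emb1/emb2 map layer names to
    per-batch embeddings and `disc` is any distribution-discrepancy measure
    (MMD in BERMAD; a squared-mean-difference stand-in keeps this sketch short)."""
    if disc is None:
        disc = lambda X, Y: float(((X.mean(0) - Y.mean(0)) ** 2).sum())
    a, b, c = weights
    return (a * disc(emb1["h1"], emb2["h1"])
            + b * disc(emb1["h2"], emb2["h2"])
            + c * disc(emb1["z"], emb2["z"]))

rng = np.random.default_rng(0)
e1 = {"h1": rng.normal(0, 1, (50, 200)), "h2": rng.normal(0, 1, (50, 200)),
      "z": rng.normal(0, 1, (50, 20))}
e2 = {"h1": rng.normal(1, 1, (50, 200)), "h2": rng.normal(1, 1, (50, 200)),
      "z": rng.normal(1, 1, (50, 20))}
L = multilayer_transfer_loss(e1, e2)
```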

Total loss of BERMAD
The total loss function of our framework combines the reconstruction loss inherent in the deep autoencoder with our multi-layer adaptation transfer loss:

L = L_{rec} + \lambda L_{tran},    (5)

where L_{rec} is the reconstruction loss used to obtain an appropriate low-dimensional representation of the original data, L_{tran} is the transfer loss used to match the distributions of similar cell clusters between batches, and \lambda is a hyperparameter balancing the two components. We follow the strategy proposed by Ganin and Lempitsky (2015) to gradually increase \lambda from 0 to 1:

\lambda = \frac{2}{1 + \exp(-10 p / n)} - 1,    (6)

where n is the total number of epochs and p is the current epoch. With this setting, the network focuses on finding an appropriate low-dimensional representation of the data early in training, and on matching similar cell clusters later in training.
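A sketch of this schedule, assuming the standard Ganin-Lempitsky form with steepness constant 10 (the exact constants used by BERMAD may differ):

```python
import math

def lam(p, n, gamma=10.0):
    """Ganin & Lempitsky-style schedule: lambda rises smoothly from 0
    toward 1 as training progress p/n goes from 0 to 1."""
    return 2.0 / (1.0 + math.exp(-gamma * p / n)) - 1.0

def total_loss(l_rec, l_tran, p, n):
    """L = L_rec + lambda * L_tran: reconstruction dominates early,
    distribution matching dominates late."""
    return l_rec + lam(p, n) * l_tran

# lambda at epochs 0, 25, 50, 100 of a 100-epoch run
schedule = [lam(p, 100) for p in (0, 25, 50, 100)]
```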

Dual-channel framework
There is another common and thorny problem in the field of batch correction: over-correction caused by paying too much attention to distribution matching between batches. For example, although most cell types contained in scRNA-seq data from different batches are the same or similar, some real tasks contain unshared cell types present only in an individual dataset. If we ignore this kind of dataset specificity, dissimilar cell clusters may be incorrectly aligned after integration. Therefore, we should avoid such over-correction. This phenomenon also exists in domain adaptation: in some cases, the domain invariance obtained by over-mitigating domain differences can be detrimental to the discriminative power of the method, leading to degraded performance (Rozantsev et al. 2018). Rozantsev et al. (2018) proposed a two-stream architecture that explicitly models the domain shift, where each stream serves a single domain (source or target). This allows the models to handle data from different domains with the same network architecture but different parameters. At the same time, a specifically designed loss constrains the differences between domains, so that the parameters of the two models do not drift too far from each other.
Inspired by the above method, we propose a dual-channel framework to address over-correction and the loss of dataset specificity in batch correction. We make the models handle datasets from different batches independently of each other (as depicted in Fig. 1). In other words, the deep autoencoders encoding different datasets have the same architecture but different parameter values. This structural separation helps us explicitly model the difference between batches and the specificity of each dataset. Since each deep autoencoder is trained independently, its reconstruction loss is also optimized independently, which helps the low-dimensional embedding of the data retain unshared information more completely. At the same time, the transfer loss between data embeddings in different channels constrains the parameters of the two channels from differing too much, preventing the failure of data integration. Through this dual-channel framework, we can accurately align similar cell clusters between datasets while maintaining the independence of unshared cell clusters.
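Structurally, the dual-channel idea amounts to instantiating the same encoder twice with unshared weights, as the toy sketch below illustrates; the dimensions and initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_channel(d_in, dims=(200, 20, 200)):
    """One channel = one autoencoder with its own, unshared weights."""
    sizes = [d_in, *dims, d_in]
    return [rng.normal(0, 0.05, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]

def encode(weights, X):
    """Encoder half only: input -> h1 -> z (the 20-d embedding)."""
    h = X
    for W in weights[:2]:
        h = np.maximum(h @ W, 0.0)
    return h

# same architecture, independent parameters: the dual-channel idea
channel_1 = make_channel(50)
channel_2 = make_channel(50)
X1 = rng.normal(size=(30, 50))            # batch 1 (toy)
X2 = rng.normal(size=(40, 50))            # batch 2 (toy)
z1, z2 = encode(channel_1, X1), encode(channel_2, X2)

# no layer's weight matrix is shared between the two channels
shared_weights = any(np.array_equal(a, b) for a, b in zip(channel_1, channel_2))
```

In training, each channel would minimize its own reconstruction loss while the transfer loss on (z1, z2) and the other intermediate embeddings couples the two channels, keeping their parameters from drifting apart.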

Simulated data
First, we evaluated our method on a simulated dataset generated using the splatter package in R (Zappia et al. 2017). The dataset consists of two batches (denoted Batch 1 and Batch 2); Batch 1 contains 2000 cells and Batch 2 contains 1000 cells. Since the early linear methods were not designed for highly nonlinear scRNA-seq data, we exclude them from the comparison. We compared our method with the state-of-the-art methods introduced above. The qualitative visualization results on the simulated dataset are shown in Fig. 2. They show a significant margin between cell clusters of the same type from different batches, which is known as batch effect; leaving it uncorrected lowers the accuracy of downstream analyses such as cell clustering. It can be seen that all the selected methods can correctly perform data integration and batch correction on the simulated dataset, but they produce clusters of different densities. Among all methods, BERMAD and BERMUDA have the most impressive visualization results, as they generate compact and independent cell cluster structures. In this article, we only show qualitative figures colored by cell type; all qualitative visualization results colored by batch ID are presented in Supplementary material SV.
For quantitative evaluation, we choose divergence, silhouette, and Adjusted Rand Index (ARI) to evaluate our method and the comparison methods from different perspectives. Divergence measures the distribution discrepancy between batches, so a lower value is better. Silhouette evaluates whether dissimilar cell clusters still maintain good separation after integration, so a higher value is preferred. A higher ARI means that the clustering result is closer to the ground truth, which also indicates better batch correction performance. Detailed explanations and calculation formulas for these three metrics are given in Supplementary material SVI. The quantitative results for the three metrics are presented in Table 1. Due to the simple composition of the simulated dataset, the ARI of all methods is roughly the same, so we only analyze divergence and silhouette. From the table, BERMAD has the lowest divergence and the highest silhouette, showing the effectiveness and superiority of our method. Note that most methods cannot optimize divergence and silhouette simultaneously, as they are conflicting evaluation metrics, while BERMAD achieves the best results on both. Meanwhile, although BERMUDA's visualization is comparable to BERMAD's, its metric results are not as good as ours. On this dataset, we set thr = 0.90 and a = b = c = 0.3.
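As a concrete example of one of these metrics, silhouette can be computed directly from pairwise distances. The sketch below implements the standard silhouette definition in plain numpy; the exact divergence and ARI formulas used in the paper are given in Supplementary material SVI and are not reproduced here.

```python
import numpy as np

def silhouette(X, labels):
    """Standard silhouette score: mean over cells of (b - a) / max(a, b),
    where a is the mean distance to the cell's own cluster and b the smallest
    mean distance to any other cluster. Higher = better-separated clusters."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    labels = np.asarray(labels)
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() < 2:                     # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = D[i, own].sum() / (own.sum() - 1)  # D[i, i] = 0, so this excludes self
        b = min(D[i, labels == c].mean() for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
# two well-separated toy clusters vs. the same points with random labels
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
y = [0] * 50 + [1] * 50
good = silhouette(X, y)                       # near 1: clusters well separated
bad = silhouette(X, list(rng.integers(0, 2, 100)))  # near 0: labels uninformative
```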

Pancreas data
To compare our method with others on real data, we further applied BERMAD to a human pancreas scRNA-seq dataset. The dataset consists of two batches, called Muraro (Muraro et al. 2016) and Baron (Baron et al. 2016), generated by the different sequencing techniques CEL-Seq2 and droplet RNA-seq; each batch contains eight cell types. In order to compare the performance of different methods more comprehensively, we divided the experiment on this dataset into three parts. In the first part, we performed batch correction on the whole dataset, which corresponds to the situation where batches have identical cell type composition. We refer to it as Pancreas_Same, where batch Muraro contains 2042 cells and batch Baron contains 8012 cells. In the second part, we removed the alpha and beta types from Baron, which corresponds to the situation where there are unshared cell types between batches. We refer to it as Pancreas_Diff, where batch Muraro still contains 2042 cells, while batch Baron contains 3161 cells. In the third part, we introduced another dataset, Seger (Segerstolpe et al. 2016), generated with a different sequencing technique called SMART-Seq2. We removed the alpha type from Muraro and the beta type from Seger, again producing unshared cell types between batches. We refer to it as

other clusters of the same type. On the contrary, the deep learning-based methods are able to perform data integration and batch correction appropriately, and our method achieved the most compact and distinct visualization among these four methods. At the same time, Harmony, scVI, and BERMUDA all mixed cells of different types to varying degrees, while HDMC and BERMAD avoided such over-correction. In addition, the epsilon cells are relatively rare and can be covered during plotting by other, more numerous cells, which may make them difficult to recognize visually. This phenomenon is expected and does not affect the performance evaluation or visualization analysis.
The calculation results of the performance evaluation metrics are presented in Table 2. The quantitative metrics broadly reflect the qualitative visualization, and our method BERMAD achieved the best results on all three metrics. The lowest divergence indicates that BERMAD can integrate cell clusters of the same type from different batches most appropriately. The highest silhouette indicates that BERMAD forms the most independent and compact cell clusters, and the highest ARI directly illustrates the clustering accuracy of BERMAD. Note that every other method exhibits worse values on one or more evaluation metrics, which is consistent with our analysis of their visualization results. On this dataset, we set thr = 0.85 and a = b = c = 0.2.

PBMC data
To verify the data integration and batch correction capabilities of BERMAD on more complex scRNA-seq datasets, we compared it with the comparison methods on the PBMC dataset. This is a human peripheral blood mononuclear cell dataset (Kang et al. 2018) in which technical variations are confounded with biological signals. Its 24 679 cells are divided into two groups, one treated with interferon-beta (IFN-β) and the other serving as a control. Due to the different responses of different cells to IFN-β, there is batch effect caused by technical processing between the two groups. Meanwhile, compared to the Pancreas dataset, the PBMC dataset is larger in scale, making batch correction on it more challenging.
From Fig. 4, BERMAD still produced excellent performance in the face of more complex batch effects. There is a significant margin between cell clusters of the same type from the IFN-β-treated group and the control group in the uncorrected visualization, which is the batch effect. MNN and BBKNN can only merge CD14+ cells together, while ignoring other cell types. Scanorama presented a chaotic and disorderly visualization, mistakenly mixing all cell types. Although Harmony successfully merged some cell clusters of the same type (CD4 T, CD14+, and FCGR3A+), it caused cell clusters of different types (B, CD8 T, and NK) to approach each other, resulting in over-correction. iMAP failed to combine all the similar cells between the two batches, leaving some cells of the same type (e.g. B and CD14+) far from each other. Compared with the above methods, the deep learning-based methods scVI, BERMUDA, HDMC, and BERMAD more accurately integrated cells of the same type while forming more independent cluster structures.

Table 2. Calculation results of evaluation metrics on Pancreas_Same dataset. The optimal results are highlighted in bold font.

However, the cell clusters produced by scVI are relatively loose and far less compact and clear than those of BERMAD. There are also varying margins between some cells of the same type (such as CD14+) in the result of BERMUDA, indicating that its data integration and batch correction are not as thorough as BERMAD's. As for HDMC, it mistakenly confused the two different types CD14+ and FCGR3A+, which is an over-correction problem. Note that CD4 T, CD8 T, and NK cells are biologically highly similar, so they remain close to each other after correction.
The calculation results of the three performance evaluation metrics on the PBMC dataset are presented in Table 3. It can be seen that all non-deep-learning methods struggled to achieve good results on all three metrics simultaneously, as their optimization goals usually conflict. On the contrary, deep learning-based methods can improve silhouette and ARI while reducing divergence. Our method BERMAD outperformed all other methods in terms of silhouette and ARI, and its divergence is also lower than that of most other methods. Although scVI achieved the lowest divergence, its silhouette and ARI were lower than ours, and, as mentioned before, its qualitative visualization was not as good as ours. These experimental results demonstrate the effectiveness and superiority of our method on complex datasets such as PBMC. On this dataset, we set thr = 0.85, a = b = 0.1, and c = 0.5.

Multiple batch
We also generalized our method to data integration and batch correction for multiple batches, which better matches current application scenarios in single-cell analysis. Since BERMAD focuses on cluster pairs between different batches, it can be naturally generalized to multiple batches. Specifically, we find and constrain the similar cluster pairs among all batches simultaneously. Meanwhile, we train an individual autoencoder for each dataset and reconstruct the input via the reconstruction loss, with the parameters of the different autoencoders constrained by the transfer loss. We tested BERMAD and all the other comparison methods on the Multi-Batch dataset, which consists of the three batches Muraro, Baron, and Seger mentioned before. The numbers of cells in the three batches are 1230, 3161, and 1791, respectively.
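The multi-batch generalization can be sketched as a loop over all batch pairs. As before, the discrepancy defaults to a squared-mean-difference stand-in for MMD, and the data layout and function name are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def multibatch_transfer_loss(embeddings, similarity, thr=0.85, disc=None):
    """Sum the similarity-gated discrepancy over cluster pairs for every
    pair of batches, so that all batches are aligned simultaneously.
    `embeddings[b]` is a list of per-cluster embeddings for batch b;
    `similarity[(b1, b2)][i, j]` is the cross-batch cluster similarity."""
    if disc is None:  # squared-mean-difference stand-in for MMD
        disc = lambda X, Y: float(((X.mean(0) - Y.mean(0)) ** 2).sum())
    loss = 0.0
    for b1, b2 in combinations(range(len(embeddings)), 2):
        sim = similarity[(b1, b2)]
        for i, ci in enumerate(embeddings[b1]):
            for j, cj in enumerate(embeddings[b2]):
                if sim[i, j] > thr:
                    loss += disc(ci, cj)
    return loss

rng = np.random.default_rng(2)
# three toy batches with one cluster each, centered at 0, 1 and 2
batches = [[rng.normal(b, 1.0, (40, 20))] for b in range(3)]
sim = {pair: np.array([[0.9]]) for pair in [(0, 1), (0, 2), (1, 2)]}
L = multibatch_transfer_loss(batches, sim)
```

Each batch still keeps its own autoencoder channel; only this pairwise transfer term couples them.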
The qualitative visualization results of multi-batch integration are shown in Fig. 5, and the quantitative results of the evaluation metrics are given in Table 4. From Fig. 5, it can be seen that most compared methods are unable to work well with more than two batches: they either could not integrate similar clusters between batches correctly, or mistakenly combined dissimilar clusters. On the contrary, BERMAD managed to remove batch effect between similar clusters and keep dissimilar clusters away from each other, outputting clear and compact clusters. Meanwhile, BERMAD achieved the best results on all three metrics. Specifically, the lowest divergence means that BERMAD removes batch effect more thoroughly, while the highest silhouette and ARI mean that BERMAD maintains biological specificity better, alleviating over-correction. These experimental results demonstrate our method's good generalization ability in the multi-batch integration scenario.

Ablation studies
In order to verify the efficacy of the different components of our framework, we further evaluated two variants of BERMAD on the Pancreas_Same dataset and compared them with the complete framework. Specifically, the two variants are: (i) BERMAD without the multi-layer adaptation architecture, which only performs distribution matching on the intermediate layer z (as shown in Fig. 1), referred to as woMultiLayer; and (ii) BERMAD without the dual-channel framework, where the two batches share the same model architecture and parameters, referred to as woDualChannel. The calculation results of the three metrics are shown in Table 5. The visualization results can be found in Supplementary material SVIII.
From the table, the complete BERMAD framework has the best overall performance. Although woDualChannel achieves a lower divergence, its silhouette is much lower than BERMAD's, indicating that without the dual-channel framework the method focuses more on distribution matching between batches but ignores the independence of the cluster structure.
The dual-channel framework helps generate clearer and more independent cluster structures while ensuring accurate correction, which alleviates over-correction. As for woMultiLayer, it achieves a higher silhouette than woDualChannel, but its divergence is significantly higher and its ARI slightly lower, indicating that the multi-layer adaptation architecture contributes to more comprehensive and accurate data integration and batch correction. Our method BERMAD combines these two components to obtain the best performance.

Conclusion
In this article, we propose a deep learning-based method BERMAD for batch effect removal of scRNA-seq data.
BERMAD is a multi-layer adaptation autoencoder with a dual-channel framework. Qualitative and quantitative experiments on simulated and real datasets show that BERMAD produces better visualization results than the competing methods and outperforms all other methods on different metrics. In general, our method can accurately and sufficiently integrate cells of the same type between different batches while keeping cells of different types apart, demonstrating its effectiveness and superiority. In the future, we will focus on integrative analysis of scRNA-seq data with spatial transcriptomic data (Zhang et al. 2022), expanding the application scenarios and research depth of our method.

Figure 1. Overall framework of BERMAD. The basic architecture is a deep autoencoder with four components and three intermediate outputs. The inherent reconstruction loss of the autoencoder is used to obtain the appropriate low-dimensional representation of the input data, and the transfer loss on the three intermediate outputs is used to match the distribution between the two batches. Besides, the depiction of the two batches with the same but separate architectures reflects our dual-channel framework.

Figure 2. Visualization for batch effect removal on simulated dataset.

Figure 3. Visualization for batch effect removal on Pancreas_Same dataset.

Figure 4. Visualization for batch effect removal on PBMC dataset.
By performing distribution matching between batches in multiple intermediate layers of the autoencoder with different dimensions, BERMAD considers batch effects in the data at different granularities and addresses the insufficient-correction issue of other methods. On the other hand, by feeding data from different batches into different, independently trained deep autoencoder networks, BERMAD explicitly considers batch-specific biological signals in the data, effectively reducing over-correction.

Figure 5. Visualization for batch effect removal on Multi_Batch dataset.
Long et al. (2015) found that the feature transferability of different layers in a deep neural network varies greatly with domain discrepancy, so the performance of single-layer adaptation is often limited. To solve this problem, Long et al. (2015) proposed the Deep Adaptation Network (DAN). DAN performs distribution matching between the source and target domains, learning transferable features with statistical guarantees by calculating the transfer loss on multiple intermediate layers of the deep neural network.

Table 1. Calculation results of evaluation metrics on simulated dataset.

Table 5. Calculation results of evaluation metrics for ablation studies.