AcrNET: predicting anti-CRISPR with deep learning

Abstract

Motivation: Anti-CRISPR proteins, an important group of proteins discovered in phages, inhibit the immune system of bacteria (i.e., CRISPR-Cas), offering promise for gene editing and phage therapy. However, predicting and discovering anti-CRISPR proteins is challenging due to their high variability and fast evolution. Existing biological studies rely on known CRISPR and anti-CRISPR pairs, which may be impractical considering the huge number of candidates, while computational methods struggle with prediction performance. To address these issues, we propose a novel deep neural network for anti-CRISPR analysis (AcrNET), which achieves significant performance.

Results: On both cross-fold and cross-dataset validation, our method outperforms the state-of-the-art methods. Notably, AcrNET improves the F1 score by at least 15% on the cross-dataset test compared with the state-of-the-art deep learning method. Moreover, AcrNET is the first computational method to predict the detailed anti-CRISPR classes, which may help illustrate the anti-CRISPR mechanism. By taking advantage of the Transformer protein language model ESM-1b, which was pre-trained on 250 million protein sequences, AcrNET overcomes the data scarcity problem. Extensive experiments and analysis suggest that the Transformer feature, evolutionary feature, and local structure feature complement each other, indicating critical properties of anti-CRISPR proteins. AlphaFold prediction, further motif analysis, and docking experiments demonstrate that AcrNET implicitly captures the evolutionarily conserved patterns and the interaction between anti-CRISPR proteins and their targets.

Availability and implementation: Web server: https://proj.cse.cuhk.edu.hk/aihlab/AcrNET/. Training code and the pre-trained model are available at.

information of the protein fully. We first consider the traditional secondary structure, which can be divided into three classes: two regular types, alpha-helix (H) and beta-strand (E), and one irregular type, the coil region (C) [6]. We then further consider the extended secondary structure with eight classes: 3_10-helix (G), alpha-helix (H), pi-helix (I), beta-strand (E), beta-bridge (B), beta-turn (T), high-curvature loop (S), and irregular (L) [2]. Using one-hot encoding, these two types of secondary structure features are transformed into matrices of shape L × 3 and L × 8, respectively, where L is the sequence length; the two matrices are then combined to capture more secondary structure information. In our implementation, we adopt the RaptorX tool [3] to predict the secondary structure.
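The one-hot encoding step above can be sketched as follows; this is a minimal illustration, not the paper's exact code, and the alphabet orderings are assumptions:

```python
import numpy as np

# Alphabets for the 3-class and 8-class secondary structure schemes
# (orderings are illustrative assumptions).
SS3 = "HEC"       # alpha-helix, beta-strand, coil
SS8 = "GHIEBTSL"  # 3_10-helix, alpha-helix, pi-helix, beta-strand,
                  # beta-bridge, beta-turn, high-curvature loop, irregular

def one_hot(labels: str, alphabet: str) -> np.ndarray:
    """Encode a per-residue label string into an L x |alphabet| matrix."""
    mat = np.zeros((len(labels), len(alphabet)), dtype=np.float32)
    for i, c in enumerate(labels):
        mat[i, alphabet.index(c)] = 1.0
    return mat

# Encode both schemes for the same residues and combine them column-wise.
ss3 = one_hot("HHHETS"[:6].replace("T", "C").replace("S", "C"), SS3)  # (6, 3)
ss8 = one_hot("GHHETS", SS8)                                          # (6, 8)
combined = np.concatenate([ss3, ss8], axis=1)                         # (6, 11)
```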
Solvent accessibility. Three solvent accessibility states are derived from two thresholds: buried (0-10%), medium (11-40%), and exposed (41-100%). These states are one-hot encoded into an L × 3 matrix and appended to the secondary structure matrices described above. Similarly, the RaptorX tool of Källberg et al. [3] is used to compute the solvent accessibility information.
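The two-threshold binning can be sketched as below; the function name and the use of NumPy are illustrative assumptions:

```python
import numpy as np

def accessibility_states(rsa: np.ndarray) -> np.ndarray:
    """Map per-residue relative solvent accessibility (in %) to an L x 3
    one-hot matrix with states buried (0-10%), medium (11-40%),
    exposed (41-100%)."""
    # right=True makes the boundaries 10% and 40% fall into the lower bin.
    states = np.digitize(rsa, [10.0, 40.0], right=True)  # 0, 1, or 2
    return np.eye(3, dtype=np.float32)[states]

acc = accessibility_states(np.array([5.0, 25.0, 80.0]))
# one row per residue: buried, medium, exposed
```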
Transformer feature. Finally, we utilize the ESM-1b Transformer to compute Transformer features. ESM-1b is a 33-layer Transformer model pre-trained by the authors of [7, 8] on the UR50/S dataset of 250 million sequences, selected by comparing Transformer models of different sizes and training datasets. Each of the 33 encoder blocks contains a multi-headed self-attention unit and a feed-forward network unit. The model is pre-trained to minimize the masked language modeling (MLM) loss:

$$\mathcal{L}_{\mathrm{MLM}} = \mathbb{E}_{x}\,\mathbb{E}_{M} \sum_{i \in M} -\log p\left(x_i \mid x_{/M}\right),$$

where $x$ represents a protein sequence, $M$ is the set of mask indices, and $x_{/M}$ denotes the protein sequence with mask tokens at the indices in $M$. In our implementation, the outputs of the last encoder block are used. To deal with unequal protein sequence lengths, we take the mean of the hidden states over all tokens and record it as the Transformer feature.
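The mean-pooling step that turns variable-length per-token hidden states into a fixed-length feature can be sketched as follows (a minimal sketch with NumPy arrays standing in for the ESM-1b outputs; the 1280-dimensional hidden size is ESM-1b's, but the function name is our own):

```python
import numpy as np

def transformer_feature(hidden_states: np.ndarray) -> np.ndarray:
    """Collapse per-token hidden states from the last encoder block into a
    fixed-length feature by averaging over the sequence dimension.

    hidden_states: (L, D) array, one D-dimensional vector per residue token.
    Returns a (D,) vector, independent of the sequence length L.
    """
    return hidden_states.mean(axis=0)

# Two proteins of different lengths map to same-size features.
f1 = transformer_feature(np.random.randn(120, 1280))
f2 = transformer_feature(np.random.randn(300, 1280))
assert f1.shape == f2.shape == (1280,)
```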
Implementation details of AcrNET. Here we list the details of our model.
Hardware. The proposed AcrNET, which mainly consists of CNN and FCN modules, is implemented in Python 3.7 and PyTorch 1.8 [5], and is trained on an NVIDIA GeForce RTX 3090 GPU.
Architecture details. The one-hot encoded inputs (sequence, secondary structure, and relative solvent accessibility) are concatenated and then processed by a 2D CNN to learn higher-level, more informative features. For convenience, the kernel width of the CNN module is set equal to the width of the concatenated feature. A max-pooling layer follows the CNN layer. The four evolutionary features and the Transformer feature are first fed into a 2-layer FCN and then concatenated with the high-level features learned from the one-hot encoded features by the CNN.
The concatenated features are finally fed into another FCN layer, which has a two-dimensional output for the Acr prediction task and a five-dimensional output for the Acr classification task. The detailed architecture is shown in Table 1.
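The fusion pattern described above can be sketched in PyTorch as below. This is an illustrative sketch only: the channel counts, hidden sizes, and kernel height are assumptions, not the exact values from Table 1, and `AcrNetSketch` is our own name. The one-hot width of 34 assumes 20 amino-acid channels plus the 3 + 8 secondary structure channels and 3 accessibility channels.

```python
import torch
import torch.nn as nn

class AcrNetSketch(nn.Module):
    """Illustrative fusion of one-hot features (via CNN) with dense features (via FCN)."""
    def __init__(self, onehot_width=34, dense_dim=1280, n_classes=2):
        super().__init__()
        # Kernel width equals the width of the concatenated one-hot feature,
        # so the CNN slides only along the sequence dimension.
        self.cnn = nn.Conv2d(1, 64, kernel_size=(5, onehot_width))
        self.pool = nn.AdaptiveMaxPool2d((1, 1))  # max-pooling after the CNN
        # 2-layer FCN for the evolutionary / Transformer features.
        self.dense_fcn = nn.Sequential(
            nn.Linear(dense_dim, 256), nn.ReLU(), nn.Linear(256, 64), nn.ReLU()
        )
        # Final FCN head: 2 outputs for prediction, 5 for classification.
        self.head = nn.Linear(64 + 64, n_classes)

    def forward(self, onehot, dense):
        # onehot: (B, 1, L, onehot_width); dense: (B, dense_dim)
        x = self.pool(torch.relu(self.cnn(onehot))).flatten(1)  # (B, 64)
        y = self.dense_fcn(dense)                               # (B, 64)
        return self.head(torch.cat([x, y], dim=1))

model = AcrNetSketch()
out = model(torch.randn(2, 1, 100, 34), torch.randn(2, 1280))  # (2, 2)
```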
Training. During training, the batch size is set to 16, the number of epochs to 3000, and the learning rate to 0.001. We selected the number of epochs from {30, 300, 3000} and the learning rate from {0.1, 0.01, 0.001, 0.0001}. We utilize the Adam optimizer [4] provided by PyTorch to train the model.

Implementation details of baselines. Here we provide the implementation details of the methods that we compare with.
AcRanker. The authors provide the code at https://github.com/amina01/AcRanker, and we directly adopt their implementation.
PaCRISPR. We thank the authors of PaCRISPR, who provided their source code for the five-fold cross-validation and cross-dataset tests, along with patient guidance.

Motif.
Here we provide more motif results. We selected sub-classes with a sample size of at least 20 for motif finding, yielding six sub-classes: AcrIIA7 (499 samples), AcrIIA8 (48 samples), AcrIIA9 (194 samples), AcrIIA11 (20 samples), AcrIF11 (68 samples), and AcrID (46 samples). All motifs were found by MEME [1] with the default setting, under which three motifs are reported for each class. The results are shown in Figs. 2-7.

Table 12 . Detailed class prediction performance comparison. We adopt the one-vs-rest strategy for AcRanker and PaCRISPR, converting these binary classification methods into five-class classification methods, and compare their performance on the class prediction problem with AcrNET ("mi": micro-average; "ma": macro-average). AcrNET outperforms the other methods significantly and consistently across all evaluation criteria, especially on the macro-average, suggesting that AcrNET is an unbiased predictor for small classes.
Results in this table are averaged over 10 different random seeds in our experiments.
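The one-vs-rest conversion can be sketched with scikit-learn on synthetic data; the logistic regression here is only a stand-in for the binary predictors (AcRanker, PaCRISPR), and the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Synthetic 5-class data standing in for Acr sub-class features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = rng.integers(0, 5, size=200)

# One-vs-rest: fit one binary classifier per class and predict the class
# whose binary score is highest.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
pred = clf.predict(X)
```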

The 3000 training epochs complete in about 179.22 seconds, and inference takes 0.013 seconds. Computing the Transformer features for all 2256 sequences takes 202.62 seconds. (ESM-1b itself was pre-trained for 56 epochs on 64 GPUs, at about 8.5 hours per epoch.) To deal with the class imbalance in the Acr classification task, we sample each sequence with a weight chosen so that every class has the same probability of being sampled.
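The class-balanced sampling weights can be sketched as follows (a minimal sketch; the function name is our own):

```python
import numpy as np

def balanced_sample_weights(labels):
    """Per-sample weights such that every class is drawn with equal probability."""
    labels = np.asarray(labels)
    counts = np.bincount(labels)
    weights = 1.0 / counts[labels]   # weight inversely proportional to class size
    return weights / weights.sum()   # normalize to a sampling distribution

w = balanced_sample_weights([0, 0, 0, 1])
# class 0 (3 samples) and class 1 (1 sample) each receive total probability 0.5
```

Weights of this form can be passed directly to PyTorch's `torch.utils.data.WeightedRandomSampler`, which does not require them to be normalized.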

Table 1 .
The detailed architecture of AcrNET for the prediction and classification problems.

DeepAcr. The authors of DeepAcr provide source code at https://github.com/BackofenLab/DeepAcr. Using their LSTM, Linear, and GRU model architectures and the same combination of models as in their evaluation code, we used the Adam optimizer from PyTorch with learning rate 0.001, batch size 30 (the same as in the source code), and 75 epochs. We trained each model on our data and used the mean prediction of the models for evaluation.

Gussow et al. Since the features in this paper are obtained from biological experiments, which is beyond our scope, we only use the dataset from their paper and apply a five-fold cross-validation test. The authors provide well-implemented source code at https://github.com/gussow/acr.
3 CROSS-DATASET TEST WITH SEPARATIONS 2 AND 3

Table 2 .
Cross-dataset test results of anti-CRISPR prediction with separation 2.

Table 3 .
Cross-dataset test results of anti-CRISPR prediction with separation 3.

Table 4 .
Five-fold cross-validation test results with the 40% similarity dataset.

We evaluated datasets with 40% and 70% sequence similarity cutoffs and compared the performance, using the same training/testing split as in the previous experiments. AcrNET again outperforms the other methods, and the results are consistent with the previous performance.

Table 5 .
Cross-dataset test results with 40% similarity dataset and separation 1.

Table 6 .
Cross-dataset test results with 40% similarity dataset and separation 2.

Table 7 .
Cross-dataset test results with 40% similarity dataset and separation 3.

Table 9 .
Cross-dataset test results with 70% similarity dataset and separation 1.

Table 10 .
Cross-dataset test results with 70% similarity dataset and separation 2.

Table 11 .
Cross-dataset test results with 70% similarity dataset and separation 3.