SARS-CoV-2 3D database: understanding the coronavirus proteome and evaluating possible drug targets

Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes a rapidly spreading infectious disease with high mortality rates. Since the release of the SARS-CoV-2 genome sequence in March 2020, there has been an international focus on target-based drug discovery, which also requires knowledge of the 3D structure of the proteome. Where there are no experimentally solved structures, our group has created 3D models with coverage of 97.5% and characterized them using state-of-the-art computational approaches. Models of protomers and oligomers, together with predictions of substrate and allosteric binding sites, protein-ligand docking, SARS-CoV-2 protein interactions with human proteins, impacts of mutations, and mapped experimentally solved structures are freely available for download. These are implemented in SARS CoV-2 3D, a comprehensive and user-friendly database, available at https://sars3d.com/. This provides essential information for drug discovery, both to evaluate targets and to design new potential therapeutics.

Bridget Bannerman is a postdoc at the Molecular Immunity Unit, Department of Medicine, University of Cambridge, MRC Laboratory of Molecular Biology. Her research focuses on developing predictive models for various pathogenic micro-organisms, reviewing treatment management strategies for SARS-CoV-2 and designing tools and strategies for surveillance of antimicrobial resistance.
Sundeep Chaitanya Vedithi is Research Director of the American Leprosy Mission and leads a group of postdocs in the Department of Biochemistry, University of Cambridge, focusing on bioinformatics and drug discovery for Mycobacterium leprae.
Pedro Torres is a Professor at the Laboratório de Modelagem e Dinâmica Molecular, Instituto de Biofísica Carlos Chagas Filho, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brasil. His research focuses on bioinformatics tools for proteomic databases and virtual screening and docking for early drug discovery.
Tom Blundell is a Professor at the Department of Biochemistry, University of Cambridge. His research focuses on structural biology, bioinformatics and drug discovery for cancer and mycobacterial infections.

Figure S1
The Nsp2 protein structure, shown in green.

Papain-like proteinase (Nsp3)
Nsp3 is the largest SARS CoV-2 protein and is a multidomain polypeptide. PLpro, part of Nsp3, cleaves the N-terminus of the replicase polyprotein at a specific site to produce a functional protein. PLpro is essential and is considered one of the most useful drug targets [1]. The modeled structure was built using multiple templates (PDB ID: 2GRI_A, 6VXS_A, 2W2G_A, 6W9C_A, 2K87_A, 3GA8_A, 1YX1_A, 6ORH_B, 1QWG_A, 3C8F_A, 1HA8_A) with a MolProbity score of 3.22.

Figure S2
The assembly of the modelled multidomain homotrimer Nsp3 structure.

Non-structural protein 4 (Nsp4)
Nsp4 is a transmembrane protein that is vital for host membrane rearrangement and necessary for viral replication [2]. The modeled structure was built using multiple templates (PDB ID: 1BCP_F, 3VC8_A, 3A7K_A, 1T70_A) with a MolProbity score of 3.18.
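
The template labels quoted throughout this section appear to follow the common convention of a four-character PDB entry ID, optionally followed by an underscore and a chain ID (for example, 1BCP_F). As a minimal sketch under that assumption, the labels can be split into entry and chain before looking the templates up; the names `Template` and `parse_template` are hypothetical and are not part of the SARS CoV-2 3D database.

```python
from typing import NamedTuple, Optional


class Template(NamedTuple):
    """One modelling template: PDB entry ID plus optional chain ID."""
    pdb_id: str
    chain: Optional[str]


def parse_template(label: str) -> Template:
    """Split a label such as '1BCP_F' into entry ID and chain ID.

    Assumes the 'ENTRY_CHAIN' convention; labels without an underscore
    (e.g. '6XDC') are treated as whole-entry templates with no chain.
    """
    pdb_id, _, chain = label.strip().partition("_")
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {label!r}")
    return Template(pdb_id.upper(), chain or None)


# Template list for the Nsp4 model, copied from the text above.
nsp4_templates = [
    parse_template(label)
    for label in "1BCP_F, 3VC8_A, 3A7K_A, 1T70_A".split(",")
]

for template in nsp4_templates:
    print(template.pdb_id, template.chain)
```

The same function can be applied to the template lists given for the other models in this section.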

Figure S3
The Nsp4 homodimer transmembrane structure, shown in green and white; the membrane is represented as a red and blue circular structure.

Proteinase 3CL-PRO Main protease (nsp5)
The main protease (Mpro, 3CLpro, nsp5) cleaves the C-terminus of the replicase polyprotein into functional proteins. It is one of the best-characterized drug targets among coronaviruses. This target has been solved experimentally, so no model has been built.

Non-structural protein 6 (nsp6)
Nsp6 is a transmembrane protein with seven putative transmembrane helices. It plays important roles in host membrane rearrangement. The model structure was downloaded from the AlphaFold website [3]. To our knowledge, this is the closest available 3D structural model of the nsp6 protein, and for this reason we have included it in our database.

Figure S5
AlphaFold prediction of the nsp3 transmembrane structure, shown in green; the membrane is represented as a red and blue circular structure.

Non-structural proteins (nsp7, nsp8, nsp12, nsp13)
The RNA-dependent RNA polymerase (RdRp) proteins are essential for SARS CoV-2 genome replication. The complex consists of nsp12, nsp13, and two accessory proteins, nsp7 and nsp8. This complex has been solved experimentally (PDB ID: 6YYT) [4], although not with full amino acid coverage. The modeled structure was built using multiple templates (PDB ID: 6M5I, 6YYT, 7C2K) with a MolProbity score of 2.78.

Figure S6
The RNA polymerase is shown in green, in complex with the homodimer of Nsp13, shown in white and light red, and the homodimer of Nsp8, highlighted in yellow and cyan, whereas nsp7 is shown in magenta. The ligands bound to Nsp13 are shown as green and cyan sticks. The double-stranded RNA, which passes through the complex, is coloured in light orange.

Non-structural protein 10 (Nsp10)
Nsp10 plays essential roles in viral transcription by interacting with other Nsp proteins, such as Nsp16. The complex of Nsp10-Nsp16 has been solved experimentally (PDB ID: 6W75). In addition, we have built the homo-12-mer based on PDB ID: 2G9T, with a MolProbity score of 2.59.

Figure S7
Monomeric modeled structure of Nsp10 protein, with zinc atoms shown as gray spheres.

Structural proteins: Surface glycoprotein (S)
The Spike protein is located on the surface of the virus, where it is available for interaction with human receptors such as ACE2 in order to facilitate virus entry into human cells. The closed conformation has been modelled using multiple templates (PDB ID:

Nucleoprotein (N)
The Nucleoprotein (N) plays a vital role in enhancing viral transcription and replication. It also interacts with viral membrane proteins [5]. The process of building this model is described in the Methods section of this manuscript. The modeled structure was built using multiple templates (PDB ID: 6M3M_A, 2CJR_A, 6K12_A, 1F15_A, 5NP3_A) with a MolProbity score of 3.17.

Figure S10
Homodimer model of the Nucleoprotein (N) protein structure, with the equivalent two chains coloured green and grey in order to facilitate visualization of interactions.

Accessory proteins (ORF3a, ORF6, ORF7a, ORF7b, ORF8, ORF10)
ORF3a is the largest accessory protein, with 275 amino acids. It is a transmembrane protein, encoded between the Spike and the Envelope proteins in the genome. The modeled structure was built using multiple templates (PDB ID: 6XDC, 4BKW_A, 5ZBE_A, 5L2F_A) with a MolProbity score of 2.44. The model of ORF7a, another transmembrane protein, was built using multiple templates (PDB ID: 1XAK_A, 1YO4_A, 5XSY_B, 6W37_A) with a MolProbity score of 3. ORF7b, a single-pass membrane protein, was modeled based on PDB ID: 5XSY_B, with a MolProbity score of 2.51. ORF6 suppresses host innate immune activation [6]. Its modeled structure was built using a single template (PDB ID: 4GQT_A) with a MolProbity score of 2.37. ORF8 may play a role in viral interactions. Its modeled structure was built using multiple templates (PDB ID: 6P65_A, 5O32_I, 5L74_A) with a MolProbity score of 1.95. ORF10 is the smallest SARS CoV-2 protein.
ORF9b plays a role in the inhibition of the host innate immune response. Its modeled structure was built using PDB ID: 6Z4U, with a MolProbity score of 2.3.
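
To compare the accessory protein models at a glance, the MolProbity scores reported above can be collected and ranked. The following is a minimal sketch; it relies on the MolProbity convention that a lower score indicates better stereochemical quality, and the variable names are illustrative only. ORF10 is omitted because no score is reported for it.

```python
# MolProbity scores for the accessory protein models, copied from the text above.
scores = {
    "ORF3a": 2.44,
    "ORF6": 2.37,
    "ORF7a": 3.0,
    "ORF7b": 2.51,
    "ORF8": 1.95,
    "ORF9b": 2.3,
}

# Sort ascending: by MolProbity convention, lower scores indicate better geometry.
ranked = sorted(scores.items(), key=lambda item: item[1])

for name, score in ranked:
    print(f"{name:<6} {score:.2f}")
```

The ranking reflects only the geometric quality measure reported here and says nothing about template coverage or biological relevance.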

Figure S11
Structures of the accessory proteins. The modelled transmembrane homodimer structure of the ORF3a protein is coloured green and white. The modelled transmembrane monomer structures of ORF7a and ORF7b are coloured green. In each case, the membrane is represented as a red and blue circular structure. ORF6, ORF8, and ORF10 are all modeled as monomers and shown in green.

Figure S12
Structural changes upon mutation in the main protease. (A) Nsp5 V20N, represented as sticks, with the wild-type structure coloured in cyan and the mutant structure coloured in green. (B) Nsp5 V148D, represented as sticks, with the wild-type structure coloured in cyan and the mutant structure coloured in green.