Journal of Prevention and Infection Control Open Access

  • ISSN: 2471-9668

Mini Review - (2021) Volume 7, Issue 2

Bioinformatic Algorithm and Prediction to Control SARS-Cov-2 Spike Protein Variant on COVID-19 Pandemic Progression

Hyunjo Kim1*, Jae-Hoon Song2

1Department of Pharmacology, School of Pharmacy, Yonsei University, South Korea

2Department of Infectious Diseases, Chairman of Asian-Pacific Research Foundation for Infectious and ANSORP, South Korea

Corresponding Author:
Hyunjo Kim
Department of Pharmacology
School of Pharmacy
Yonsei University, South Korea
Tel: +82 10-5447-5369
E-mail: hjkim19@yonsei.ac.kr

Received date: January 19, 2021; Accepted date: February 26, 2021; Published date: March 05, 2021

Citation: Kim H, Song JH (2021) Bioinformatic algorithm and prediction to control SARS-Cov-2 spike protein variant on COVID-19 pandemic progression. J Prev Infec Contr Vol.7 No.2:59


Keywords

COVID-19, SARS-CoV-2 variant/mutant, Spike protein (S), D614G, Tracking algorithms, Predictive stewardship

Introduction

The coronavirus family has caused several human illnesses; the latest, caused by SARS-CoV-2, has led to the COVID-19 pandemic, which poses a serious threat to global health [1-3]. A SARS-CoV-2 variant encoding a D614G mutation in the viral spike (S) protein has become the most prevalent form of the virus worldwide, suggesting a fitness advantage for the mutant [4-7]. The G614 variant is associated with higher upper respiratory tract viral load, higher infectivity, increased total S protein incorporation into the virion, reduced S1 shedding, and a conformational change leading to a more ACE2-binding and fusion-competent state. However, it does not seem to be correlated with increased disease severity or with escape from neutralizing antibodies [8-12]. The D614G S protein variant supplanted the ancestral virus worldwide, reaching near fixation in a matter of months. D614G has been shown to be more infectious than the ancestral form on human lung cells, colon cells, and cells rendered permissive by ectopic expression of human ACE2 or of ACE2 orthologs from various mammals. In that work, D614G did not alter S protein synthesis, processing, or incorporation into SARS-CoV-2 particles, but D614G affinity for ACE2 was reduced due to a faster dissociation rate [13,14]. D614G adopts a more open conformation and, consistent with this, the neutralization potency of antibodies targeting the S protein receptor-binding domain was not attenuated. Over the course of the pandemic, most of the SNPs identified were seen only once in the dataset, and only four SNPs rose to high frequency as reported in GISAID [15,16]. This suggests that D614G confers a replication advantage on SARS-CoV-2 that increases the likelihood of human-to-human transmission. Future prospective comparisons of D614G transmission with that of D614 seem unlikely, given that D614G has gone to near fixation worldwide.
However, the SARS-CoV-2 genomes that have been sequenced are only a narrow snapshot of the pandemic, and additional sequencing of archived samples might pinpoint the origin of D614G or better resolve the variant's trajectory. D614G is associated with increased viral load in people with COVID-19, although these studies quantitated SARS-CoV-2 RNA and did not measure infectious virus.

When the SARS-CoV-2 S protein RBD is in its closed conformation, the binding site for ACE2 is physically blocked [17]. Models of coronavirus S-mediated membrane fusion describe ACE2 binding to all three RBD domains in the open conformation as destabilizing the pre-fusion S trimer, leading to dissociation of S1 from S2 and promoting transition to the post-fusion conformation.

According to these models, the well-populated all-open conformation of D614G would reflect an intermediate that is on-pathway to S-mediated membrane fusion [18]. Structural biology thus provides key insights into 3D structures and into critical residues and mutations in SARS-CoV-2 proteins that are implicated in infectivity, molecular recognition, and susceptibility to a broad range of host species [19].

A detailed understanding of viral proteins and their complexes with host receptors and candidate epitopes or lead compounds is the key to structure-guided therapeutic design. Since the discovery of SARS-CoV-2, several structures of its proteins have been determined experimentally at unprecedented speed and deposited in the Protein Data Bank [20,21]. Further, specialized structural bioinformatics tools and resources have been developed for theoretical models, data on protein dynamics from computer simulations, the impact of variants/mutations, and molecular therapeutics. Here, we provide an overview of ongoing efforts to develop structural bioinformatics algorithms and predictions for COVID-19 research. We also discuss the impact of these resources and structure-based studies on understanding various aspects of SARS-CoV-2 variant infection and therapeutic development. These include understanding the differences between SARS-CoV-2 and its variants that lead to increased infectivity, and deciphering the key residues of SARS-CoV-2 involved in variants.

Methods

Coronavirus sequences and structures analysis

Amino acid sequences and mutant data for the S protein used in the analysis were obtained from NCBI GenBank and GISAID. Since the first complete genome sequence of SARS-CoV-2 was released on NCBI GenBank (accession number: NC_045512.2), a large number of genome sequences have followed; the proteome of the reference genome is summarized in Table 1. Mutant information on the S protein from high-coverage whole-genome sequences of SARS-CoV-2 strains from infected individuals around the world was obtained from the GISAID database (https://www.gisaid.org). Sequence analysis and k-means clustering are described in detail elsewhere. For structural analyses, visualization, analysis, and in silico mutation of protein structures were performed (see Table 2). First, we downloaded the molecular structure of the spike protein from the Protein Data Bank (PDB). This structure corresponds to that resolved by colleagues and deposited in the PDB.

Table 1: The SARS-CoV-2 proteome (NCBI reference genome NC_045512.2)

Gene Protein length Position in Genome Description
Nsp1 180 266-805 Interferes with host mRNA translation and processing.
Nsp2 638 806-2719 Specific function is not known, it may play an auxiliary role to other viral proteins.
Nsp3 1945 2720-8554 Papain-like protease with phosphatase activity. Performs proteolytic cleavage of the polyproteins, membrane arrangements
Nsp4 500 8555-10054 Involved in membrane rearrangements during viral infection.
Nsp5 306 10055-10972 3C-like proteinase that cleave the viral polyprotein to produce the active forms of the nonstructural proteins.
Nsp6 290 10973-11842 Involved in membrane rearrangements during viral infection and autophagy.
Nsp7 83 11843-12091 Forms a hexadecameric complex with nsp8 that helps in viral RNA replication.
Nsp8 198 12092-12685 Forms a hexadecameric complex with nsp7 that helps in viral RNA replication.
Nsp9 113 12686-13024 Binds and protects the viral genome from host degradation during replication.
Nsp10 139 13025-13441 Interacts with nsp14 and nsp16 to perform 3′–5′ exoribonuclease and 2′-O-methyltransferase activities, respectively.
Nsp11 13 13442-13480 Short peptide with potential role in RNA synthesis.
Nsp12 932 13442-16236 RNA-dependent RNA polymerase.
Nsp13 601 16237-18039 Viral RNA helicase.
Nsp14 527 18040-19620 3′-to-5′ exonuclease with proofreading activity.
Nsp15 346 19621-20658 Nidoviral RNA uridylate-specific endoribonuclease (NendoU)
Nsp16 298 20659-21552 2′-O-ribose methyltransferase. Involved in capping of viral mRNA to protect it from host degradation.
S 1273 21563-25384 Spike glycoprotein. Interacts with human ACE2 to enter target cells
M 222 26523-27191 Membrane glycoprotein. Required for viral particle assembly.
N 419 28274-29533 Nucleocapsid protein. Binds viral RNA during viral particle formation.
E 75 26245-26472 Envelope protein. Forms ion channels in host ER membranes. Involved in exaggerated immune response.
ORF3a 275 25393-26220 Forms ion channels in the host membrane. Linked to inflammation, IFN signaling, innate immunity, apoptosis, and cell cycle regulation.
ORF6 61 27202-27387 Viral replication enhancer.
ORF7a 121 27394-27759 Viral replication enhancer. Prevents virus tethering at the plasma membrane by inactivating the BST-2 protein.
ORF7b 43 27756-27887 Envelope protein. Forms ion channels in host ER membranes. Involved in exaggerated immune response.
ORF8 121 27894-28259 Viral replication enhancer.
ORF9b* 97 28284-28580 Expressed from an alternative reading frame in the N gene. Suppresses host antiviral responses by promoting MAVS degradation.
ORF10 38 29558-29674 Potential role in hijacking components of the host ubiquitin-proteasome system (UPS).
ORF14** 73 28734-28946 Expressed from an alternative reading frame in the N gene. Unknown function.

Table 2: Oligonucleotide primers used for amplification of the SARS-CoV-2 nucleoprotein (N) gene, and single point mutations (SNPs) in the N gene.

Type Name Sequence Remark
N2-Probe Probe ACAATTTGCCCCCAGCGCTTCAG  
N2-FP Control TTACAAACATTGGCCGCAAA  
  27870fwd GAAACTTGTCACGCCTAAACGAAC  
  28268fwd ACTAAAATGTCTGATAATGGACC  
  28923fwd CTGCTCTTGCTTTGCTGCTGC  
  29338fwd GCATATTGACGCATACAAAAC  
N2-RP Control TTCTTCGGAATGTCGCGCA  
  28943rev GCAGCAGCAAAGCAAGAGCAG  
  29358rev GTTTTGTATGCGTCAATATGC  
  29588rev AGCGAAAACGTTTATATAGCCCATCTG  
  29880pArev TTTTTTTTTTGTCATTCTCCTAAGAAGCTATT  
SNP C28858T   N-Nterm
  C29200T   N-mid
  C29451T   N-Cterm

Genetic algorithm (GA) application

A population with a defined fitness undergoes random mutation, crossover, and selection, and the individuals with high fitness are retained; in this way, GAs mimic evolution in nature. In our case, the individuals in the population are RNA sequences, and the fitness function is the number of residues that have the same 2D structure in both the target (provided for fitness calculation) and the predicted structure, as determined by the Hamming distance, a metric for comparing two data strings of equal length. Based on prediction performance on the frameshifting RNA element (FSE) and on computational complexity (see the Results section below), we chose NUPACK [22] for our GA [23,24]. NUPACK, the Nucleic Acid Package, is available through a web server (http://www.nupack.org). The initial population is obtained by randomly assigning nucleotides to the mutation region of the RNA sequence. This population is then subjected to iterations of random mutation, crossover, selection, and nomination.
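The loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, population size, mutation rate, and the `toy_fold` placeholder are all assumptions made for illustration. In particular, `toy_fold` is not a folding algorithm; it only stands in for a real structure predictor such as NUPACK, which would be passed in through the `fold` argument.

```python
import random

NUCLEOTIDES = "ACGU"


def hamming_fitness(predicted, target):
    """Number of positions whose 2D-structure symbol matches the target."""
    if len(predicted) != len(target):
        raise ValueError("structures must have equal length")
    return sum(p == t for p, t in zip(predicted, target))


def toy_fold(seq):
    """Placeholder predictor (assumption): NOT real RNA folding."""
    return "".join("(" if b == "G" else ")" if b == "C" else "." for b in seq)


def mutate(seq, region, rate, rng):
    """Randomly reassign nucleotides inside the mutation region."""
    chars = list(seq)
    for i in region:
        if rng.random() < rate:
            chars[i] = rng.choice(NUCLEOTIDES)
    return "".join(chars)


def crossover(a, b, rng):
    """Single-point crossover of two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def run_ga(seed_seq, target, region, fold=toy_fold,
           pop_size=30, generations=50, rate=0.1, seed=0):
    rng = random.Random(seed)
    # Initial population: random nucleotides in the mutation region only.
    population = []
    for _ in range(pop_size):
        chars = list(seed_seq)
        for i in region:
            chars[i] = rng.choice(NUCLEOTIDES)
        population.append("".join(chars))

    def score(seq):
        return hamming_fitness(fold(seq), target)

    for _ in range(generations):
        # Selection: keep the fitter half, refill with mutated offspring.
        population.sort(key=score, reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), region, rate, rng))
        population = survivors + children
    best = max(population, key=score)
    return best, score(best)
```

Calling `run_ga("AAAAAAAA", "((....))", range(8))` returns the best sequence found and its fitness under the placeholder predictor; positions outside `region` are never changed.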

Nucleotide and amino acid variant detection

We first aligned each of the SARS-CoV-2 sequences using the BLAT software [25,26]. After alignment, we extracted the nucleotide sequences corresponding to individual SARS-CoV-2 proteins, translated them to amino acid sequences, and compared them with the reference amino acid sequences. From the nucleotide mutations, the resulting amino acid mutations throughout the SARS-CoV-2 proteome were determined. The amino acid changes were annotated automatically with the bioinformatics tool, and in a subsequent run the resulting proteome for each SARS-CoV-2 genome was created and edited using CLC Genomics Workbench 20.0.3 [27]. This software is used by hundreds of microbiology and virology labs around the world for basic research and infectious disease epidemiology. The whole proteome was then aligned for phylogenetic analysis and for identification of the resulting amino acid mutations.
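The translate-and-compare step can be sketched in a few lines of Python. This is an illustration only, not the BLAT or CLC Genomics Workbench workflow: the function names are hypothetical, the two coding sequences are assumed to be already aligned, in frame, and free of indels, and the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate a coding sequence; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


def amino_acid_changes(ref_cds, sample_cds):
    """List substitutions such as 'D614G' between two aligned CDSs."""
    ref_protein = translate(ref_cds)
    sample_protein = translate(sample_cds)
    return [
        f"{r}{pos}{s}"
        for pos, (r, s) in enumerate(zip(ref_protein, sample_protein), start=1)
        if r != s
    ]
```

For example, a GAT to GGT codon change (the kind of single-nucleotide change that underlies D614G) is reported as an aspartate-to-glycine substitution, whereas GAT to GAC is synonymous and produces no entry.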

Energy calculation

The effects of mutations on the stability of the S protein and on the binding affinity of the RBD for hACE2 were estimated [28] by the folding energy change (ΔΔG) and the binding energy change (ΔΔΔG), respectively, between the mutant (MUT) and wild-type (WT) structures [29]. FoldX was used for the energy calculations [30,31]. The performance of FoldX compares favourably with that of other random-based approaches for protein engineering research, including therapeutic antibody design. In particular, FoldX is widely used for computational saturation mutagenesis in biomedical studies. All protein structures were repaired and stability analysis was performed. The folding energy change was calculated as:

$$\Delta\Delta G_{\text{stability}} = \Delta G_{\text{folding}}^{\text{MUT}} - \Delta G_{\text{folding}}^{\text{WT}}$$

A negative ΔΔG value suggests that the mutation stabilizes the protein, and a positive value indicates that it destabilizes the protein. The structure-based tools DUET and CUPSAT were applied to check the reliability of FoldX for protein stability predictions [32,33]. Additionally, interaction analysis was carried out and the binding energy change was computed as [34]:

$$\Delta\Delta\Delta G_{\text{binding}} = \Delta\Delta G_{\text{binding}}^{\text{MUT}} - \Delta\Delta G_{\text{binding}}^{\text{WT}}$$

A negative ΔΔΔG value suggests that the mutation strengthens the binding affinity, whereas a positive value indicates that the mutation weakens the RBD–ACE2 interaction.
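The sign conventions of the two equations can be made concrete with a short Python sketch. The function names, the optional `tolerance` parameter, and the numbers in the usage note are assumptions for illustration; they are not FoldX output.

```python
def energy_change(mutant_energy, wild_type_energy):
    """Mutant minus wild-type energy, in the same units (e.g. kcal/mol)."""
    return mutant_energy - wild_type_energy


def interpret_stability(ddg, tolerance=0.0):
    """Negative values stabilize the protein, positive values destabilize it."""
    if ddg < -tolerance:
        return "stabilizing"
    if ddg > tolerance:
        return "destabilizing"
    return "neutral"


def interpret_binding(dddg, tolerance=0.0):
    """Negative values strengthen binding, positive values weaken it."""
    if dddg < -tolerance:
        return "strengthens binding"
    if dddg > tolerance:
        return "weakens binding"
    return "neutral"
```

With hypothetical folding energies of -10.5 (mutant) and -12.0 (wild type), `energy_change(-10.5, -12.0)` gives 1.5, which `interpret_stability` labels as destabilizing. A non-zero `tolerance` can be used to treat small changes as neutral.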

TopNetTree model for PPI BFE changes upon mutation

TopNetTree is a recently developed deep learning algorithm that integrates the advantages of convolutional neural networks and gradient-boosting trees [35]. The topology-based network tree (TopNetTree) was constructed by integrating a topological representation with NetTree to predict changes in the protein-protein interaction (PPI) binding free energy (BFE) following mutation, ΔΔG [36,37]. In this work, TopNetTree is applied to predict the BFE changes of mutations that occurred on the SARS-CoV-2 RBD in complex with ACE2. Topology-based feature generation is the first step, followed by a convolutional neural network-assisted model, as described in more detail in the Results section below. The topological representation uses element- and site-specific persistent homology to simplify the structural complexity of protein–protein complexes and to encode vital biological information into topological invariants.

LSTM Network Based on NLP and Infection Rate

We propose a deep learning model based on Long Short-Term Memory (LSTM) for sentiment classification of COVID-19-related comments, which produces better results than several other well-known machine-learning methods [38,39]. Deep neural networks have the capacity to fit complex distributions but tend to overfit without sufficient supervision. Because infection rate features are based on the growth percentage of each factor, they are stable across time. However, epidemic models based on the infection rate can neither predict policy changes and emergency conditions nor adjust the prediction for short-term influences. Therefore, we introduce an LSTM network based on natural language processing (NLP) features to model current policy and social media. In this way, both short-term flexibility and long-term stability are ensured.

Statistical data analysis

Fisher's exact test was used to analyze the enrichment of epitopes and the differences in mutation rates of SARS-CoV-2 isolated from different areas. Statistical analysis was carried out using the R statistical environment, version 3.6.1. Mutation hotspots were defined as genome sites with two or more mutations, whereas mutation cold spots are sites with no mutations. Nucleotide mutations were characterized by the nature of the nucleotide substitution (transition or transversion) and as insertions and deletions (indels). The mutation densities in the genome and proteome of SARS-CoV-2 were determined using the following equation [40].

Mutation density = number of mutations ÷ size of genomic (nt length) or proteomic (aa length) region
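The definitions in this subsection (mutation density, transition versus transversion, hotspots and cold spots) can be sketched in Python as follows. The original analysis was carried out in R; the function names and the input format used here (a list of mutated positions, one entry per observed mutation) are assumptions for illustration.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def mutation_density(n_mutations, region_length):
    """Number of mutations divided by region size (nt or aa length)."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return n_mutations / region_length


def substitution_type(ref, alt):
    """Purine<->purine or pyrimidine<->pyrimidine is a transition."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("ref and alt must differ")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def hotspots_and_coldspots(mutated_positions, region):
    """Hotspots: sites with >= 2 mutations; cold spots: sites with none."""
    counts = Counter(mutated_positions)
    hot = sorted(p for p, n in counts.items() if n >= 2)
    cold = [p for p in region if counts[p] == 0]
    return hot, cold
```

As a usage example, the S gene spans positions 21563-25384 in Table 1, so its nucleotide length is 25384 - 21563 + 1 = 3822, and `mutation_density(n, 3822)` gives the density for `n` observed mutations in that gene.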

Bioinformatic analysis: Data quality control and processing

Read quality control was carried out using FastQC with the default parameters [41]. Adapter sequences and low-quality bases were removed. Low-complexity reads, reads shorter than 40 bases, and duplicates were excluded using CD-HIT-DUP v.4.6.8; CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. Off-target reads were then filtered out using Bowtie2 v2.3.4.3 with the default parameters, using the human genome version GRCh38.p13 (Genome Reference Consortium Human Build 38 patch release 13) and the SILVA ribosomal RNA database as references to filter out human DNA and ribosomal sequences [42].
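The length and duplicate criteria above can be illustrated with a small Python sketch. It is a simplified stand-in, not the CD-HIT-DUP algorithm: it removes only exact duplicates, does not assess sequence complexity, and the function name and in-memory list input are assumptions.

```python
def filter_reads(reads, min_length=40):
    """Drop reads shorter than min_length and exact duplicate reads.

    The first occurrence of each distinct sequence is kept, in input order.
    """
    seen = set()
    kept = []
    for read in reads:
        seq = read.upper()
        if len(seq) < min_length:
            continue  # too short
        if seq in seen:
            continue  # exact duplicate of an earlier read
        seen.add(seq)
        kept.append(seq)
    return kept
```

Real pipelines operate on FASTQ files and keep quality strings alongside the sequences; the sketch only shows the filtering logic.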

Results

It has been reported that increased COVID-19 severity has not been detected in association with D614G infection [15, 43-45]. Perhaps there are fitness tradeoffs for D614G in vivo due to the more open conformation of its RBD, which potentially renders D614G more immunogenic [44, 46-48]. As shown in Figure 1, coronavirus mutants have been prevalent since the end of 2019, and the location of D614G within the S protein is remote from the receptor-binding domain. Although D614G affinity for ACE2 is lower than that of D614, and the relatively better-concealed D614 receptor-binding domain is likely to be advantageous for immune evasion, the D614G and D614 variants are equally sensitive to neutralization by human monoclonal antibodies targeting the S protein RBD. If SARS-CoV-2 spike D614G were an adaptive variant selected for increased human-to-human transmission after spillover from an animal reservoir, one might expect that increased infectivity would be evident only on cells bearing ACE2 orthologs similar to that of humans. However, the increased infectivity of D614G was equally evident on cells bearing ACE2 orthologs from a range of mammalian species. Among these viruses, only SARS-CoV-2 possesses a polybasic furin cleavage site at the S1-S2 junction of the S protein, which is required for SARS-CoV-2 to infect human lung cells but not other cell types (Figure 1). The mutation that loosens the spike protein is illustrated in Figure 2. Spike proteins on SARS-CoV-2 bind to receptors on human cells, helping the virus to enter. A spike protein is made up of three smaller peptides in open or closed orientations; when more are open, it is easier for the protein to bind. The D614G mutation, the result of a single-letter change in the viral RNA code, seems to relax the connections between peptides. This makes open conformations more likely and might increase the chance of infection, as shown at the top of Figure 2.
Further, variants and combined variants with D614G across the entire S gene, excluding the RBD region, are detailed at the bottom of Figure 2.


Figure 1: Structural mapping of coronavirus amino acid changes and clusters of variation in the spike protein, based on a meta-analysis covering two decades and on COVID-19, respectively.


Figure 2: Illustration of Amino Acid Changes Selected for This Study: Variants and combined variants with D614G across the entire S gene excluding the RBD region.

Presence of a novel mutation and a high frequency mutation in SARS‐CoV‐2

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is an RNA coronavirus responsible for the COVID-19 pandemic. RNA viruses are characterized by a high mutation rate, up to a million times higher than that of their hosts. The mutagenic capability of a virus depends on several factors, including the fidelity of the viral enzymes that replicate nucleic acids, such as the SARS-CoV-2 RNA-dependent RNA polymerase (RdRp). The mutation rate drives viral evolution and genome variability, thereby enabling viruses to escape host immunity and to develop drug resistance (Figure 3).


Figure 3: Neutralization of SARS-CoV-2 RBD mutants by monoclonal antibodies and position of substitutions conferring neutralization.

The morbidity of SARS-CoV-2 infection (COVID-19) is a serious global public health concern, and it remains unclear why several antiviral and antibody treatments were not effective at different periods across the globe [49]. With the drastically increasing number of positive cases around the world, the WHO stressed the importance of assessing the risk of spread and of understanding the genetic modifications that may have occurred in SARS-CoV-2.

In the present work, using all available deep sequencing data of complete genomes from all over the world (NCBI repository), we compared the SARS-CoV-2 reference genome with genomes exported from the GISAID database, with the aim of gaining insight into virus mutations and their occurrence over time and across geographic areas. SARS-CoV-2 sequences were collected from the NCBI and GISAID databases and aligned against the SARS-CoV-2 reference sequence NC_045512.2, as presented in Tables 1 and 3.

Table 3: Conserved mutations in SARS-CoV-2 genome.

Mutations Amino Acid Change Gene Remark on Type
C to U (nt241) - 5′ UTR Non-coding
C to U (nt313) No (L16) Nsp1 Synonymous
C to U (nt1059) T85I Nsp2 Missense
G to A (nt1397) V198I Nsp2 Missense
Deletion (nt1606-1609) D268 deletion Nsp2 Missense
C to U (nt3037) No (F106) Nsp3 Synonymous
C to U (nt8782) No (S76) Nsp4 Synonymous
C to U (nt9802) No (A416) Nsp4 Synonymous
G to U (nt9803) No (A417) Nsp4 Synonymous
G to U (nt11083) L37F Nsp6 Missense
C to U (nt14408) P232L Nsp12 Missense
C to U (nt14805) No (Y455) Nsp12 Synonymous
U to C (nt17247) No (R337) Nsp13 Synonymous
A to G (nt23403) D614G S Missense
C to U (nt24034) No (N824) S Synonymous
G to U (nt25563) Q57H ORF3a Missense
G to U (nt26144) G251V ORF3a mRNA targeting
C to U (nt27964) S24L ORF8 Missense
U to C (nt28144) L84S ORF8 Missense
C to U (nt28311) P13L N Missense
U to C (nt28688) No (L139) N Synonymous
GGG to AAC (nt28881-28884) R203K and G204R N Missense
G to U (nt29742) - 3′ UTR Non-coding

Additionally, Table 2 lists the primers used for sequencing, with related descriptions.

Structural and functional impact of mutations

Upon viral infection, viral proteins are expressed in the infected cells and processed into small peptides by proteasomes [50]. These peptides are then presented by HLA molecules on the surface of the infected cells and recognized by T cells through their T cell receptors. Thus, potential T cell epitopes can be derived from any of the viral structural and non-structural proteins.

Nucleotide conservation: Shannon entropy (SE) is a measure of the amount of information (a measure of uncertainty). The conservation of each of the four nucleotides was determined using Shannon entropy. Note that log(0) = 0 is assumed for smooth calculation of the SE, so that absent nucleotides contribute nothing to the sum. For a given sequence of length l, the conservation SE (Conv_SE) is calculated as follows:

$$\text{Conv\_SE} = -\sum_{i=1}^{4} p_{N_i} \log_4\left(p_{N_i}\right)$$

where $p_{N_i} = f_i / l$, and $f_i$ represents the occurrence frequency (count) of nucleotide $N_i$ in the given sequence [51].
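A short Python sketch of this entropy calculation follows. The function name is hypothetical, and one simplifying assumption is made: characters outside the four-letter alphabet (for example ambiguity codes such as N) are ignored, so the length l is taken as the number of recognized nucleotides.

```python
import math
from collections import Counter


def conservation_entropy(sequence, alphabet="ACGU"):
    """Shannon entropy with log base 4 over the nucleotide frequencies.

    Nucleotides with zero frequency contribute nothing to the sum.
    """
    seq = sequence.upper().replace("T", "U")
    counts = Counter(base for base in seq if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no recognized nucleotides")
    entropy = 0.0
    for base in alphabet:
        p = counts[base] / total
        if p > 0:
            entropy -= p * math.log(p, 4)
    return entropy
```

With base 4, the value ranges from 0 (a single nucleotide throughout, as in "AAAA") to 1 (all four nucleotides equally frequent, as in "ACGU").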

Functional importance of the D614G mutation in the SARS-CoV-2 spike protein

Point mutations in SARS-CoV-2 variants: We performed phylogenetic network analysis using the sequences published in GISAID to investigate the frequency of point mutations in SARS-CoV-2 variants [52-55]. These sequences were collected and the point mutations were counted by phylogenetic network analysis. We also analyzed the locations of these point mutations and observed a higher frequency of point mutations at several locations. In addition, we counted the number of point mutations per gene in order to analyze the polarization of point mutations in each gene, and found more point mutations in ORF-1a and ORF-1b. However, open reading frame (ORF)-1a and ORF-1b are much longer than other regions, which may result in more mutations; hence, we estimated the rate of point mutations per 100 bases in each gene (Figure 4). When normalized by gene length, the highest frequency of point mutations occurred in the 5′-untranslated region (UTR) and the 3′-UTR. These results indicate that point mutations are present in SARS-CoV-2 variants but do not cluster within the gene coding regions.


Figure 4: Distribution of point mutations in the SARS-CoV-2 genome.

The impact on viral infectivity and antigenicity: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an enveloped virus that binds its cellular receptor, angiotensin-converting enzyme 2 (ACE2), and enters host cells through the action of the spike (S) glycoprotein displayed on its surface. Compared with the reference strain of SARS-CoV-2, the majority of currently circulating isolates possess an S protein variant characterized by an aspartic acid-to-glycine substitution at amino acid position 614 (D614G). Residue 614 lies outside the receptor-binding domain (RBD) (Figure 5), and the mutation does not alter the affinity of monomeric S protein for ACE2. However, S(G614), compared with S(D614), mediates more efficient ACE2-mediated transduction of cells by S-pseudotyped vectors and more efficient infection of cells by live SARS-CoV-2. This review summarizes and synthesizes the epidemiological and functional observations on the D614G spike mutation, with a focus on the biochemical and cell-biological impact of this mutation and its consequences for S protein function. We further discuss the significance of these recent findings in the context of the current global pandemic. The spike protein of SARS-CoV-2 has been undergoing mutations and is highly glycosylated, and it is critically important to investigate the biological significance of these mutations. Here, we investigated 80 variants and 26 glycosylation site modifications for infectivity and for reactivity to a panel of neutralizing antibodies from convalescent patients [56-59]. D614G, along with several variants containing both D614G and another amino acid change, was significantly more infectious. Most variants with an amino acid change in the receptor-binding domain were less infectious, but some variants became resistant to some neutralizing antibodies.
These findings could be of value in the development of incrementally modified vaccines and future therapeutic antibodies to eventually quench the COVID-19 pandemic.


Figure 5: The receptor-binding domain (RBD) and the effect of mutations on the affinity of monomeric S protein for ACE2.

Predicting the affinity of ACE2 mutants to SARS-CoV-2 S1 protein using fast methods

We predicted the effect of the detected mutations in ACE2 on the affinity of ACE2 variants for S1 using different computational methods. Many bioinformatics methods are available to predict the stability of a protein complex and the affinity between its subunits using various approaches, as shown in Figure 6. A possible source of the variation between the predictions of thermodynamics-based and other descriptor-based affinity predictors is the interface issue.


Figure 6: Bioinformatic methods to predict the stability of the protein complex.

Long short-term memory networks: An LSTM network is a subclass of recurrent neural networks (RNNs) designed to circumvent the inability of RNNs to learn long-term dependencies in data sequences [38,39,60,61]. This limitation is addressed by the LSTM unit, and LSTM networks are constructed by combining several layers of LSTM units [60]. Figure 7 shows the structure of an LSTM unit and its sequence across time. Each LSTM unit consists of three gates that operate on the input vector, xt, to generate the cell state, Ct, and the hidden state, ht. Physically, the cell state can be viewed as the memory of the cell, while the gates control the flow of information into and out of the memory. The input gate determines the incorporation of new information, the forget gate determines which information should be discarded, and the output gate controls the information that passes along to the next layer. Following the interconnections presented in Figure 7, the following formulas hold for each category of variables:


Figure 7: Schematic representation of SARS-CoV-2 infection and the structure of an LSTM unit and its sequence across time.

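Gating variables

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \quad (1)$$

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \quad (2)$$

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \quad (3)$$

Candidate (memory) cell state variable

$$\tilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c) \quad (4)$$

Cell and hidden state variables

$$C_t = f_t \circ C_{t-1} + i_t \circ \tilde{C}_t \quad (5)$$

$$h_t = o_t \circ \tanh(C_t) \quad (6)$$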

where W and U are the learnable weights of the LSTM layer for the input and recurrent connections, respectively, and b the learnable biases, of the input, output, and forget gates and the cell state; ◦ is the element-wise product of two vectors; σ is the sigmoid function, σ(x) = (1 + e^(−x))^(−1), used as the gate activation function; and the hyperbolic tangent function (tanh) is used as the state activation function [62].
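Equations (1) to (6) can be written out as a single time step in plain Python. This is a minimal sketch for illustration, not a trained model or a library implementation: the function names and the parameter dictionary keys (`Wf`, `Uf`, `bf`, and so on) are assumptions, matrices are represented as lists of rows, and vectors as lists of floats.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Return W x + U h + b, with matrices given as lists of rows."""
    return [
        sum(w * xj for w, xj in zip(w_row, x))
        + sum(u * hj for u, hj in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names such as 'Wf' to parameters."""
    # Gating variables, Eqs. (1)-(3).
    f_t = [sigmoid(z) for z in affine(p["Wf"], x_t, p["Uf"], h_prev, p["bf"])]
    i_t = [sigmoid(z) for z in affine(p["Wi"], x_t, p["Ui"], h_prev, p["bi"])]
    o_t = [sigmoid(z) for z in affine(p["Wo"], x_t, p["Uo"], h_prev, p["bo"])]
    # Candidate cell state, Eq. (4).
    c_tilde = [
        math.tanh(z) for z in affine(p["Wc"], x_t, p["Uc"], h_prev, p["bc"])
    ]
    # Cell and hidden states, Eqs. (5)-(6), element-wise products.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

A simple sanity check: with all weights and biases set to zero, every gate equals 0.5 and the candidate state is 0, so the new cell state is half the previous one.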

Potential role of the SARS-CoV-2 ORF3a protein in viral immunopathogenicity: Since none of the accessible SARS-CoV-2 genomes are incorporated into protein databases such as STRING, we could not directly obtain the ORF3a (SARS-CoV-2)-human protein interactome. Our data establish the structural resemblance of the ORF3a protein between SARS-CoV and SARS-CoV-2, which permits further functional prediction. Thus, we infer the functional relevance of the putative SARS-CoV-2 ORF3a protein from interactome and pathway enrichment analysis of SARS-CoV. The aforementioned neutralizing antibody escape mutations were artificially generated during in vitro replication of a recombinant virus. However, as monoclonal antibodies are developed for therapeutic and prophylactic applications, vaccine candidates are deployed, and the possibility of SARS-CoV-2 reinfection becomes greater, it is important both to understand pathways of antibody resistance and to monitor the prevalence of resistance-conferring mutations in naturally circulating SARS-CoV-2 populations [63-65]. We used the GISAID and CoV-GLUE SARS-CoV-2 databases to survey the natural occurrence of mutations that might confer resistance to the monoclonal and plasma antibodies used in our experiments. Among the SARS-CoV-2 sequences in the CoV-GLUE database at the time of writing, different non-synonymous mutations were present in natural populations of SARS-CoV-2 S protein sequences. Consistent with the finding that none of the mutations that arose in our selection experiments gave an obvious fitness deficit, most were also present in natural viral populations. CoV-GLUE is an online web application for the interpretation and analysis of SARS-CoV-2 virus genome sequences, with a focus on amino acid sequence variation [66].
It is based on the GLUE data-centric bioinformatics environment and provides a browsable database of amino acid replacements and coding region indels that have been observed in sequences from the pandemic. Users may also analyse their own SARS-CoV-2 sequences by submitting them to the web application to receive an interactive report containing visualisations of phylogenetic classification and highlighting genomic variation of potentially high impact, for example variation linked to primer mismatches (see Table 2 for reference). Figure 8 shows the positions of neutralization resistance-conferring substitutions: the structure of the RBD is shown with the positions occupied by amino acids where mutations were acquired during replication in the presence of each monoclonal antibody. Additionally, Figure 9 shows the position and frequency of amino acid substitutions in the SARS-CoV-2 S protein. We identified and analysed the amino acid mutations that gained prominence worldwide from the early months of the pandemic. Eight mutations were identified along the viral genome, mostly located in conserved segments of the structural proteins that show low variability among coronaviruses, which indicates that they might have a functional impact. At the time of writing, these mutations show varied success in the SARS-CoV-2 population, including a change in the spike protein that has become absolutely prevalent.

Figure 8: Structure of the RBD with positions occupied by amino acids where mutations were acquired during replication.

Figure 9: Structural representation of the position and frequency of S amino acid substitutions in SARS-CoV-2 S key residues.

Future selection of combinations of monoclonal antibodies for therapeutic and prophylactic applications

The ability of SARS-CoV-2 monoclonal antibodies and plasma to select variants that are apparently fit and that naturally occur at low frequencies in circulating viral populations suggests that therapeutic use of single antibodies might select for escape mutants. Figure 10 presents a schematic diagram of the implications of new antiviral agents or vaccines. We tested whether combinations of monoclonal antibodies could suppress the emergence of resistant variants, in order to mitigate the emergence or selection of escape mutations during therapy or during population-based prophylaxis [67].

Figure 10: Schematic diagram of the implications of new antiviral agents or vaccines.

Miscellaneous comment on recent spike protein changes: As seen on many occasions before, mutations are naturally expected for viruses and are most often simply neutral regional markers useful for contact tracing. The changes seen have rarely affected viral fitness and almost never affected clinical outcome, but the detailed effects of these mutations remain to be determined fully. Changes in the spike protein are relevant because they may affect both host receptor binding and antibody binding, with possible consequences for infectivity, transmission potential, and antibody and vaccine escape. Actual effects need to be measured and verified experimentally. GISAID reports updates on spike mutations of recent submissions via gisaid.org/spike, and any sequence can be tested for spike mutations via gisaid.org/covsurver or from the internal analysis interface, where individual countries/regions and time periods can be selected for custom analysis. This allows highlighting and tracking the rise of mutations like D614G or the currently most common receptor binding mutations, as well as combinations of these mutations with deletions altering the spike protein surface [68, 69].
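Tracking the rise of a mutation over time amounts to computing, per time period, the fraction of sampled sequences that carry it. The sketch below assumes each record is a pair of a collection date in `YYYY-MM-DD` form and a set of mutation labels; the function name `monthly_frequency` and the records are hypothetical and are not GISAID data or the GISAID interface.

```python
from collections import defaultdict


def monthly_frequency(records, mutation):
    """Fraction of sequences per collection month that carry `mutation`.

    `records` is an iterable of (collection_date, mutation_labels) pairs,
    where collection_date is a 'YYYY-MM-DD' string.
    """
    totals = defaultdict(int)
    carriers = defaultdict(int)
    for collection_date, mutations in records:
        month = collection_date[:7]  # keep 'YYYY-MM'
        totals[month] += 1
        if mutation in mutations:
            carriers[month] += 1
    return {month: carriers[month] / totals[month] for month in sorted(totals)}


# Hypothetical records for illustration only.
toy_records = [
    ("2020-02-10", set()),
    ("2020-02-21", {"D614G"}),
    ("2020-03-05", {"D614G"}),
    ("2020-03-18", {"D614G"}),
    ("2020-03-30", set()),
    ("2020-04-02", {"D614G"}),
]
for month, freq in monthly_frequency(toy_records, "D614G").items():
    print(month, f"{freq:.2f}")
```

Restricting `records` to one country or region before calling the function would give the kind of custom, region-specific view described above. A rising fraction on its own reflects sampling as well as spread, so it does not establish a fitness advantage.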

As shown in Figure 11, it has become evident that these few S gene mutations and some deletions are found in multiple genomic contexts (different clades in different countries). This may be an early indication of some potential advantage for these viruses, but it needs to be verified and does not necessarily mean a change in clinical severity or transmission efficiency.

Figure 11: Schematic representation of the possible implications of mutations in the nucleocapsid (N protein) of SARS-CoV-2.

Discussion

The severity of COVID-19 varies greatly from patient to patient. The majority of patients either remain asymptomatic or develop mild to moderate symptoms. However, some COVID-19 patients who develop severe disease die even after hospitalization and intensive care. Why disease severity differs so much from one person to another is one of the mysteries scientists are still trying to solve. The present study was designed to explore whether genetic variation in SARS-CoV-2 can explain the variable severity of COVID-19. Mutation profiles of SARS-CoV-2 isolated from mildly affected and severely affected COVID-19 patients were explored and compared. Among the numerous mutations observed in this study, two missense mutations, affecting the RdRp and spike protein genes respectively, were found predominantly in the severely affected group compared with the mildly affected group. Along with these two mutations, variation in the 5′ UTR and a silent mutation in ORF1ab were found predominantly in the severely affected group, the latter not significantly [70]; however, these do not alter the amino acid sequence of any protein. Many other mutations that were found at low frequency in the present study are unlikely to exert an effect on the severity of COVID-19. Therefore, the effect of spike protein and RdRp mutations on the severity of COVID-19 needs to be considered.
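Whether a mutation is over-represented in the severely affected group can be checked on a 2x2 table of counts. The sketch below uses a two-sided Fisher exact test as one reasonable choice; the text above does not state which test was applied, and the function name `fisher_exact_two_sided` and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: severe cases with the mutation,  b: severe cases without it,
    c: mild cases with the mutation,    d: mild cases without it.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for illustration only; not data from the study.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"p = {p_value:.4f}")
```

With small groups an exact test avoids the large-sample approximation of a chi-squared test. When many mutations are screened in this way, a correction for multiple comparisons would also be needed.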

Conclusion

Ongoing worldwide efforts to track variants of SARS-CoV-2 with bioinformatics algorithms and predictions are providing valuable insights into the structural mechanisms of action.

Structural biology can help explain the effect of amino acid variations on interactions with other proteins, which lead to changes in the infection rate, associated symptoms and so on. This knowledge, when combined with structure-guided efforts to design stable vaccine antigens, provides an important foundation for countering the impacts of the disease. Vaccines and drugs should target the regions of the protein that do not mutate fast, because therapeutics directed against regions with a high mutation propensity will be strain-specific. Bioinformatics-guided approaches provide an important framework for understanding the increased virulence of this pathogen and for designing therapeutics, and they will be important for understanding the emergence of drug-resistant and antibody-resistant variants/mutations of SARS-CoV-2.
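One simple way to look for regions that do not mutate fast is to score each column of a multiple sequence alignment by its Shannon entropy and keep the positions with little or no variation. The sketch below assumes equal-length aligned sequences; the function names, the threshold and the toy alignment are hypothetical, and entropy is only one of several possible conservation measures.

```python
from collections import Counter
from math import log2


def column_entropy(alignment):
    """Shannon entropy in bits for each column of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must have the same aligned length")
    entropies = []
    for column in zip(*alignment):
        total = len(column)
        fractions = [n / total for n in Counter(column).values()]
        entropies.append(sum(-p * log2(p) for p in fractions))
    return entropies


def conserved_positions(alignment, max_entropy=0.0):
    """1-based alignment positions whose entropy does not exceed `max_entropy`."""
    return [i + 1 for i, h in enumerate(column_entropy(alignment)) if h <= max_entropy]


# Toy alignment for illustration only; not real spike sequences.
toy_alignment = ["MKTD", "MKTG", "MKSG", "MKTG"]
print(conserved_positions(toy_alignment))
```

An entropy of zero means every sampled sequence shares the same residue at that position. Low observed variation depends on how many sequences were sampled, so it suggests rather than proves that a site is constrained.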

Conflicts of Interest

No potential conflict of interest relevant to this article was reported.

Acknowledgments

We gratefully acknowledge the authors, originating and submitting laboratories of the sequences from GenBank and GISAID’s hCoV-19 database on which this research is based.

References