iACVP

Introduction

Computational prediction of anticoronavirus peptides (ACVPs) using their sequences.
The COVID-19 pandemic caused several million deaths worldwide. Development of anti-coronavirus drugs is thus urgent. Unlike conventional non-peptide drugs, antiviral peptide drugs are highly specific, easy to synthesize and modify, and not highly susceptible to drug resistance. To reduce the time and expense involved in screening thousands of peptides and assaying their antiviral activity, computational predictors of anti-coronavirus peptides (ACVPs) are needed. However, few experimentally verified ACVP samples are available, even though a relatively large number of antiviral peptides (AVPs) have been discovered. We challenged to predict ACVPs using an AVP dataset and small collection of ACVPs. Using conventional features, a binary profile, and a word-embedding word2vec (W2V), we systematically explored five different machine learning methods: Transformer, Convolutional Neural Network, bidirectional Long Short-Term Memory, Random Forest (RF), and Support Vector Machine. Via exhaustive searches, we found that the RF classifier with W2V consistently achieved better performance on different datasets. The two main controlling factors were (i) the dataset-specific W2V dictionary was generated from the training and independent test datasets instead of the widely used general UniProt proteome, and (ii) the optimal k-mer value in W2V was determined using a systematic search. The optimal W2V provided greater discrimination between positive and negative samples. Therefore, our proposed method, named iACVP, consistently provides better prediction performance compared with existing state-of-the-art methods.