BERT6mA

help

Predictions are made in the following steps:

1. Inputting DNA sequences for prediction
This webserver predicts whether the central adenine of an input DNA sequence is N6-methyladenine (6mA), based on the distribution of nucleotides on both sides of it. Two ways of inputting DNA sequences are provided: “quick prediction” and “upload file”. For “quick prediction”, paste the IDs and DNA sequences into the entry box in FASTA format; if you are unsure of the format, click the example button. For “upload file”, upload a FASTA file smaller than 10MB. An example file can be downloaded from here.
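Before pasting or uploading, it can help to check the input locally. The following Python sketch parses FASTA text into (ID, sequence) pairs and checks a file against the upload limit. The function names are hypothetical helpers, not part of the webserver, and the 10MB limit is assumed here to mean 10 * 1024 * 1024 bytes.

```python
import os

# Assumption: "10MB" is interpreted as 10 MiB; the server's exact limit may differ.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def parse_fasta(text):
    """Return a list of (id, sequence) pairs from FASTA-formatted text.

    Sequence lines following a '>' header are concatenated and upper-cased.
    """
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks).upper()))
            seq_id, chunks = line[1:].strip(), []
        else:
            if seq_id is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks).upper()))
    return records


def check_upload_size(path):
    """Raise ValueError unless the file is smaller than the assumed upload limit."""
    size = os.path.getsize(path)
    if size >= MAX_UPLOAD_BYTES:
        raise ValueError(f"{path} is {size} bytes; uploads must be under 10MB")
    return size


if __name__ == "__main__":
    example = ">seq1\nGATCAGT\nACGA\n>seq2\nttaacc\n"
    print(parse_fasta(example))
```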


2. Selecting a model
Using this webserver, you can predict 6mA in 11 species. Please select the species of the DNA sequences you entered from the following list:
A. thaliana
C. elegans
C. equisetifolia
D. melanogaster
F. vesca
H. sapiens
R. chinensis
R. chinensis (pre-trained on F. vesca; -20_20 sequence window only)
S. cerevisiae
T. thermophila
Ts. SUP5-1
Xoc. BLS256


3. Sequence window
Input DNA sequences must match one of the following sequence windows.

-20_20 Sequence window including the target adenine and the 20 nucleotides upstream and downstream of it.
The input DNA sequence must be exactly 41 bp long, and the central nucleotide must be adenine.
-15_15 Sequence window including the target adenine and the 15 nucleotides upstream and downstream of it.
The input DNA sequence must be exactly 31 bp long, and the central nucleotide must be adenine.
-10_10 Sequence window including the target adenine and the 10 nucleotides upstream and downstream of it.
The input DNA sequence must be exactly 21 bp long, and the central nucleotide must be adenine.
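Each window has length 2n + 1 with the target adenine at index n (0-based), where n is 20, 15 or 10. The Python sketch below checks a sequence against a window and, as a hypothetical preparation step that the webserver does not perform itself, cuts every fully flanked adenine out of a longer sequence. It does not check for characters other than A, C, G and T.

```python
# Flank length n for each window; the full window is 2n + 1 nucleotides.
WINDOWS = {"-20_20": 20, "-15_15": 15, "-10_10": 10}


def is_valid_window(seq, window="-20_20"):
    """True if seq has length 2n + 1 and its central nucleotide is adenine."""
    flank = WINDOWS[window]
    seq = seq.upper()
    return len(seq) == 2 * flank + 1 and seq[flank] == "A"


def extract_windows(seq, window="-20_20"):
    """Yield (0-based position, subsequence) for every adenine with a full window.

    Adenines closer than n nucleotides to either end are skipped,
    because they cannot be centred in a complete window.
    """
    flank = WINDOWS[window]
    seq = seq.upper()
    for i in range(flank, len(seq) - flank):
        if seq[i] == "A":
            yield i, seq[i - flank : i + flank + 1]


if __name__ == "__main__":
    s = "C" * 10 + "A" + "G" * 10
    print(is_valid_window(s, "-10_10"))
    print(list(extract_windows(s, "-10_10")))
```

Each yielded subsequence could then be written as one FASTA record (for example with the position in its ID) before submission.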


4. Obtaining prediction results
Once you have made your selections, click the submit button to start the prediction. Make a note of the job ID displayed on the next screen. Depending on the number of DNA sequences you entered, the prediction may take some time. You can access the results via "job retrieval" using your job ID. Please note that results are deleted after about 24 hours!
You can download the attention weights as well as the probability scores and predicted labels. If the score is higher than the threshold, the "6mA" label is given; otherwise, the "non-6mA" label is given. The threshold is set to 0.5. The results are provided as CSV and pickle files.
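The labelling rule above can be reproduced from the scores, for example to try a stricter threshold. This Python sketch assumes the CSV has a header row and a score column named "score"; that column name is a hypothetical placeholder, since the actual CSV layout is not described here. Following the wording "higher than", a score of exactly 0.5 receives "non-6mA" in this sketch.

```python
import csv
import io

THRESHOLD = 0.5  # default threshold stated in this help page


def predict_label(score, threshold=THRESHOLD):
    """Return "6mA" if the score is strictly higher than the threshold."""
    return "6mA" if score > threshold else "non-6mA"


def labels_from_csv(csv_text, score_column="score", threshold=THRESHOLD):
    """Re-derive labels from CSV text.

    score_column is a hypothetical name; replace it with the real column header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [predict_label(float(row[score_column]), threshold) for row in reader]


if __name__ == "__main__":
    demo = "id,score\nseq1,0.9\nseq2,0.5\nseq3,0.1\n"
    print(labels_from_csv(demo))
    print(labels_from_csv(demo, threshold=0.95))
```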



5. Attention analysis
The attention weights are provided as a pickle (.pkl) file. The weights form a five-dimensional array with axes (sample, layer, head, sequence, sequence). A sample program for converting the pkl file to CSV format is available from here.
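As an illustration of the five-dimensional layout, the Python sketch below selects one sample and one layer, averages the attention over heads, and writes the resulting sequence-by-sequence matrix to CSV. It is not the sample program offered on this site. It assumes the unpickled object can be indexed as weights[sample][layer][head][i][j] (nested lists or a NumPy array both work this way; a NumPy array requires NumPy to be installed to unpickle). Whether the sequence axes map one-to-one onto nucleotides is not stated here.

```python
import csv
import pickle


def load_attention(path):
    """Load the attention weights; only unpickle files from a source you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def head_averaged(weights, sample, layer):
    """Average over heads for one sample and layer, giving a seq x seq matrix.

    Assumes weights is indexable as [sample][layer][head][i][j].
    """
    heads = weights[sample][layer]
    n_heads = len(heads)
    size = len(heads[0])
    return [
        [sum(heads[h][i][j] for h in range(n_heads)) / n_heads for j in range(size)]
        for i in range(size)
    ]


def write_matrix_csv(matrix, path):
    """Write a 2-D matrix to CSV, one row per query position."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(matrix)


if __name__ == "__main__":
    # Toy data: 1 sample, 1 layer, 2 heads, sequence length 2.
    toy = [[[[[1, 0], [0, 1]], [[0, 1], [1, 0]]]]]
    print(head_averaged(toy, 0, 0))
```

If the file holds a NumPy array, weights[sample, layer].mean(axis=0) computes the same head average more directly.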