Abstract
Computational identification of translation initiation sites (TISs) has been of great importance in gene discovery and gene loci annotation because it predicts the start of protein coding regions. Many methods have been developed to identify TISs from cDNA and mRNA sequences, but much less work has considered TIS recognition directly from genomic DNA. In addition, to provide insight into TIS signals conserved between distantly related eukaryotic species, the authors developed a human TIS recognition model that, when applied without modification to TIS prediction in the Arabidopsis thaliana genome, achieved an accuracy of over 83 percent. When the model was trained on A. thaliana data, the accuracy increased to 91 percent.
Their results suggest that, in spite of the considerable evolutionary distance between Homo sapiens and A. thaliana, their approach successfully recognized deeply conserved genomic signals that characterize TISs. Moreover, they report the highest accuracy of TIS recognition in A. thaliana genomic DNA sequences.