Abstract
Protein sequence comparison is the most powerful tool for identifying novel protein structure and function. Such inference commonly rests on the "similar sequence, similar structure, similar function" paradigm, and is drawn by searching databases of protein sequences for similar sequences. As entire genomes are being determined at a rapid rate, computational methods for comparing protein sequences will become increasingly essential for probing the complexity of molecular machines. In this paper we introduce a pattern-comparison algorithm for the comparison and identification of protein sequences, based on the linear-predictive-coding (LPC) cepstral distortion measure. Experimental results on a real data set of functionally related and functionally unrelated protein sequences demonstrate the effectiveness of the proposed approach in terms of both accuracy and computational efficiency.