On Identifying Minimal Absent and Unique Words: An Efficient Scheme

Aqil M. Azmi

doi:10.1007/s12559-016-9385-9

Journal article

Peer reviewed

Cognitive computation, Vol.8(4), pp.603-613

01/08/2016

DOI: https://doi.org/10.1007/s12559-016-9385-9

Abstract

Computer Science

Computer Science, Artificial Intelligence

Life Sciences & Biomedicine

Neurosciences

Neurosciences & Neurology

Science & Technology

Technology

One of the basic tasks in genomic research is the analysis of a sequence. An absent word in a sequence is a substring that does not occur in the given sequence. Many studies have looked into finding the shortest absent words, with some recent studies noting that longer absent words are also of interest. A simple extension of the shortest ones is impractical, as the list tends to grow exponentially in the size of the sequence. A better choice is the minimal absent words, since these are known to grow linearly in the size of the sequence. An absent word is minimal if none of its proper factors is missing in the sequence, that is, every proper factor occurs in the sequence. Similarly, a unique word is (left-fixed) minimal unique if none of its proper prefixes is unique. In this paper we present an efficient algorithm that discovers all words up to a user-specified length that are either minimal absent or left-fixed minimal unique in the input sequence. We employ a purely deterministic approach, which guarantees that nothing is overlooked. At each successive iteration, the algorithm works on larger words, using a simple list structure for all the operations. Theoretically, the algorithm has a space complexity that is linear in the size of the input sequence, while the time bound scales well with alphabet size. Experimental results using real biological sequences and randomly generated ones over different-sized alphabets show that the algorithm's running time behaves linearly.
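
The two definitions in the abstract can be illustrated with a small brute-force sketch in Python. This is not the algorithm of the paper, whose list structure and complexity bounds are not described in this record; it is a naive enumeration written only to make the definitions concrete. The function names, the explicit `alphabet` parameter, and the choice to treat a letter that occurs exactly once as minimal unique are assumptions made for illustration. The sketch relies on two observations: a word has all its proper factors present exactly when its longest proper prefix and longest proper suffix are both present, and a unique word has no unique proper prefix exactly when its longest proper prefix occurs more than once.

```python
from collections import Counter


def factor_counts(seq, max_len):
    """Count every factor (substring) of length 1..max_len, overlaps included."""
    counts = Counter()
    for k in range(1, max_len + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def minimal_absent_words(seq, alphabet, max_len):
    """Absent words of length <= max_len whose proper factors all occur in seq."""
    counts = factor_counts(seq, max_len)
    # Length 1: a letter of the alphabet that never occurs in seq.
    found = {a for a in alphabet if a not in counts}
    # Length >= 2: extend each occurring factor u by one letter b.
    # The prefix w[:-1] == u occurs by construction, so only the word itself
    # (must be absent) and its suffix w[1:] (must occur) need checking.
    for u in counts:
        if len(u) >= max_len:
            continue
        for b in alphabet:
            w = u + b
            if w not in counts and w[1:] in counts:
                found.add(w)
    return found


def left_fixed_minimal_unique_words(seq, max_len):
    """Words occurring exactly once whose proper prefixes all occur more than once."""
    counts = factor_counts(seq, max_len)
    # Occurrence counts never increase when a word is extended to the right,
    # so checking the longest proper prefix w[:-1] covers all shorter prefixes.
    return {
        w
        for w, c in counts.items()
        if c == 1 and (len(w) == 1 or counts[w[:-1]] > 1)
    }


if __name__ == "__main__":
    print(sorted(minimal_absent_words("abaab", "ab", 4)))
    print(sorted(left_fixed_minimal_unique_words("abaab", 4)))
```

For the sequence `abaab` over the alphabet `{a, b}` with a maximum length of 4, the minimal absent words found are `aaa`, `aaba`, `bab` and `bb`, and the left-fixed minimal unique words are `aa`, `aba` and `ba`. The sketch stores every factor up to `max_len`, so its memory use grows with both the sequence length and `max_len`; it does not reproduce the linear space bound claimed in the abstract.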

Details

Title: On Identifying Minimal Absent and Unique Words: An Efficient Scheme
Creators - without role: Aqil M. Azmi - King Saud University
Publication Details: Cognitive computation, Vol.8(4), pp.603-613
Publisher: Springer Nature
Number of pages: 11
Grant note: special fund in the research center of College of Computer and Information Sciences (CCIS) at King Saud University
Identifiers: 9947445608331
Academic Unit: King Saud University
Language: English
Resource Type: Journal article