Abstract
In Molecular Biology, biological macromolecules such as DeoxyriboNucleic Acids (DNA) and proteins are coded by strings called primary structures. For a long time, biologists have gathered these primary structures in large databases; they now focus on analyzing them in order to extract useful knowledge. Data Mining approaches can help reach this goal. In this paper, we present a data mining approach based on Machine Learning techniques to classify biological sequences. Our approach proceeds in four steps: (i) During the first step, we construct the set of all the discriminant substrings, called Discriminant Descriptors (DDs), associated with each family of primary structures. This construction relies on an adaptation of the Karp, Miller and Rosenberg (KMR) algorithm. (ii) During the second step, we use the DDs constructed during the first step to encode the families of primary structures as a table of examples versus attributes, called a context. (iii) During the third step, we extract knowledge from the context constructed during the second step and represent it as production rules. This extraction uses an incremental production rule approach. (iv) Finally, during the last step, we use the obtained production rules to classify primary structures.
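To make the pipeline concrete, here is a minimal Python sketch on toy data. It rests on several assumptions that do not come from this paper. First, a substring is treated as discriminant for a family if it occurs in at least one sequence of that family and in no sequence of any other family. Second, candidate substrings are enumerated by brute force over a fixed length range instead of with the KMR-based construction of step (i). Third, step (iii) is replaced by trivial one-attribute rules ("IF d occurs THEN family F") combined by majority vote, not by the incremental production rule approach. All function names and the `min_len`/`max_len` parameters are hypothetical.

```python
from collections import defaultdict


def substrings(seq, min_len, max_len):
    """Return all distinct substrings of seq whose length is in [min_len, max_len]."""
    out = set()
    for i in range(len(seq)):
        for k in range(min_len, max_len + 1):
            if i + k <= len(seq):
                out.add(seq[i:i + k])
    return out


def discriminant_descriptors(families, min_len=2, max_len=3):
    """Step (i), simplified by brute force (NOT the KMR adaptation).

    Assumed definition: a substring is discriminant for family F if it occurs in
    some sequence of F and in no sequence of any other family.
    """
    per_family = {
        name: set().union(*(substrings(s, min_len, max_len) for s in seqs))
        for name, seqs in families.items()
    }
    dd = {}
    for name, subs in per_family.items():
        others = set().union(*(v for k, v in per_family.items() if k != name))
        dd[name] = subs - others
    return dd


def build_context(families, dd):
    """Step (ii): a binary table of examples (sequences) versus attributes (DDs)."""
    attributes = sorted(set().union(*dd.values()))
    rows = []
    for name, seqs in families.items():
        for s in seqs:
            # 1 if the descriptor occurs as a substring of the sequence, else 0.
            rows.append(([int(a in s) for a in attributes], name))
    return attributes, rows


def extract_rules(dd):
    """Simplified stand-in for step (iii): one rule 'IF d occurs THEN F' per DD."""
    return [(attr, name) for name, attrs in dd.items() for attr in sorted(attrs)]


def classify(seq, rules):
    """Step (iv): fire every rule whose descriptor occurs in seq; majority vote wins."""
    votes = defaultdict(int)
    for attr, family in rules:
        if attr in seq:
            votes[family] += 1
    if not votes:
        return None  # no rule fires: the sequence is left unclassified
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Hypothetical toy families, not data from the paper.
    families = {"A": ["ACGTAC", "ACGGAC"], "B": ["TTGCAT", "TTGAAT"]}
    dd = discriminant_descriptors(families)
    attributes, rows = build_context(families, dd)
    rules = extract_rules(dd)
    for query in ["ACGT", "GTTGCA", "CCCC"]:
        print(query, classify(query, rules))
```

Enumerating every substring this way becomes expensive on long sequences and large families when lengths are not tightly bounded; an efficient repeated-substring construction such as the KMR adaptation of step (i) is presumably what makes the approach practical on real databases.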