Abstract
Information in bilingual and monolingual corpora covering numerous languages may be duplicated. This duplication leads to slow and inaccurate search results from bilingual and monolingual databases. It is therefore essential to structure the sequences in a way that reduces redundancy, so that analysis of the corpora is accurate and helps in analyzing the features of certain complex and subjective languages. Detecting duplicates also supports the selection of the right solution from large corpora.
In this paper, we present an algorithm, which we call DSDR, that operates on a set of bilingual and monolingual corpora and iterates over that set to find all duplications present in it. Once the duplications are found, DSDR removes the duplicated chains and refreshes the databases, resulting in a substantial reduction in database size. In addition, searches for particular chains in the bilingual and monolingual corpora become faster and more accurate.
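The abstract does not specify how DSDR decides that two chains are duplicates, so the following is only a minimal sketch of the find-and-remove pass described above. It assumes that chains are plain strings and that two chains count as duplicates when they match after case folding and whitespace normalization. The names `normalize` and `remove_duplicates` are hypothetical and are not taken from DSDR.

```python
from typing import Iterable, List, Tuple


def normalize(chain: str) -> str:
    """Hypothetical duplicate key: case-folded text with collapsed whitespace."""
    return " ".join(chain.casefold().split())


def remove_duplicates(chains: Iterable[str]) -> Tuple[List[str], int]:
    """Return the chains with duplicates dropped, plus the number removed.

    The first occurrence of each chain is kept and original order is preserved.
    """
    seen = set()      # normalized keys already encountered
    kept = []         # chains that survive the pass
    removed = 0       # count of duplicate chains dropped
    for chain in chains:
        key = normalize(chain)
        if key in seen:
            removed += 1
            continue
        seen.add(key)
        kept.append(chain)
    return kept, removed


# Illustrative input only; not data from the paper.
corpus = ["the cat sat", "The  cat sat", "le chat", "the cat sat", "le chat "]
kept, removed = remove_duplicates(corpus)
print(kept, removed)
```

Tracking normalized keys in a set lets a single pass over the corpus find every duplicate, and the smaller `kept` list is what would then be written back to the database.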