A Pattern Matching Approach for Redundancy Detection in Bi-lingual and Mono-lingual Corpora

Muneer Ahmad; Hassan Mathkour

Conference proceeding

IMECS 2009: INTERNATIONAL MULTI-CONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, pp.526-531

Lecture Notes in Engineering and Computer Science

01/01/2009

Abstract

Computer Science

Computer Science, Artificial Intelligence

Engineering

Engineering, Multidisciplinary

Science & Technology

Technology

Information in bi-lingual and mono-lingual corpora covering numerous languages may be duplicated. This duplication leads to slow and inaccurate search results from bi-lingual and mono-lingual databases. It is essential to structure the sequences so that redundant sequence structure is reduced; the analysis of corpus structure is then accurate and helps in analyzing the features of certain complex and subjective languages. Detecting the duplications helps in selecting the right solution from large corpora. In this paper, we present an algorithm, which we call DSDR, that operates on a set of bi-lingual and mono-lingual corpora and iterates over that set to find all duplications present in it. Once the duplications are found, DSDR removes the duplicated chains and refreshes the databases, resulting in remarkable reductions in database size. In addition, searches for particular chains in the bi-lingual and mono-lingual corpora become faster and more accurate.
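The abstract does not describe how DSDR works internally, so the following is only a minimal sketch of the general idea it names: iterate over a set of chains, detect duplicates, and keep a reduced set. It is not the authors' algorithm. The function names `normalize` and `remove_duplicate_chains`, the whitespace and case normalization, and the sample sentences are all assumptions made for illustration.

```python
from typing import Iterable, List


def normalize(chain: str) -> str:
    """Collapse whitespace and case so trivially different copies compare equal.

    Assumption: exact match after this normalization counts as a duplicate.
    The paper's own matching criterion is not given in the abstract.
    """
    return " ".join(chain.split()).casefold()


def remove_duplicate_chains(chains: Iterable[str]) -> List[str]:
    """Return the chains with later duplicates dropped, keeping first occurrences in order."""
    seen = set()
    unique = []
    for chain in chains:
        key = normalize(chain)
        if key not in seen:
            seen.add(key)
            unique.append(chain)
    return unique


# Hypothetical mixed-language sample, invented for this sketch.
corpus = [
    "the cat sat on the mat",
    "le chat est sur le tapis",
    "The cat  sat on the mat",
    "a dog barked",
    "le chat est sur le tapis",
]
deduplicated = remove_duplicate_chains(corpus)
print(len(corpus), "->", len(deduplicated))
```

A hash set keeps the pass linear in the number of chains, which is one plausible way to obtain the kind of size reduction the abstract reports; a pairwise pattern-matching comparison, as the title suggests, would trade speed for the ability to catch near-duplicates.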

Details

Title: A Pattern Matching Approach for Redundancy Detection in Bi-lingual and Mono-lingual Corpora
Creators: Muneer Ahmad (King Saud University); Hassan Mathkour (King Saud University)
Contributors: O Castillo; C Douglas; D D Feng; J A Lee
Publication Details: IMECS 2009: INTERNATIONAL MULTI-CONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, pp.526-531
Series: Lecture Notes in Engineering and Computer Science
Publisher: International Association of Engineers (IAENG)
Number of pages: 6
Identifiers: 9952641008331
Academic Unit: King Saud University
Language: English
Resource Type: Conference proceeding