A Hybrid Stemmer of Punjabi Shahmukhi Script

Abdul Mateen; M. Kamran Malik; Zubair Nawaz; H. M. Danish; M. Hassan Siddiqui; Qaiser Abbas

Journal article

INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, Vol.17(8), pp.90-97

30/08/2017

Subjects: Computer Science; Computer Science, Information Systems; Science & Technology; Technology

Abstract

Stemming is a heuristic process that chops off the ends of words, and sometimes appends letters to them, to recover the basic meaningful forms of surface words. The basic goal of stemming is to reduce inflected word forms to their root words, using one or more techniques. In this paper, a hybrid approach is used to stem Punjabi words. No stemmer had previously been reported for the Punjabi (Shahmukhi) script. Our Punjabi stemmer combines a database lookup approach with rule-based stemming. The dataset consists of 2.5 million tokens, divided into three parts of 1,500,000, 500,000 and 500,000 tokens used for training, development and testing, respectively. Using 63 rules, the stemmer achieved 86.01% accuracy when tested on this dataset.
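The hybrid scheme described in the abstract (database lookup first, suffix rules as a fallback) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `HybridStemmer`, `SuffixRule` and `accuracy`, the longest-suffix-first rule ordering, the minimum stem length, and the single lookup entry and two rules are all hypothetical examples, not the paper's database or its 63 rules.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SuffixRule:
    """Hypothetical rule format: strip `suffix`, then append `replacement`."""
    suffix: str
    replacement: str = ""
    min_stem_len: int = 2  # assumed guard: leave at least this many characters


@dataclass
class HybridStemmer:
    lookup: dict[str, str] = field(default_factory=dict)  # surface form -> stem
    rules: list[SuffixRule] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumed ordering: try longer suffixes first so the most specific rule wins.
        self.rules = sorted(self.rules, key=lambda r: len(r.suffix), reverse=True)

    def stem(self, word: str) -> str:
        # 1. Database lookup: known (e.g. irregular) forms are returned directly.
        if word in self.lookup:
            return self.lookup[word]
        # 2. Rule-based fallback: apply the first matching suffix rule.
        for rule in self.rules:
            if word.endswith(rule.suffix):
                base = word[: len(word) - len(rule.suffix)]
                if len(base) >= rule.min_stem_len:
                    return base + rule.replacement
        # 3. Neither the lookup nor any rule applies: keep the word unchanged.
        return word


def accuracy(stemmer: HybridStemmer, gold: list[tuple[str, str]]) -> float:
    """Fraction of (word, gold_stem) pairs the stemmer gets exactly right."""
    correct = sum(stemmer.stem(word) == stem for word, stem in gold)
    return correct / len(gold)


# Illustrative entries only; not taken from the paper's database or 63 rules.
stemmer = HybridStemmer(
    lookup={"گئے": "جا"},  # illustrative irregular form
    rules=[SuffixRule("ے", "ا"), SuffixRule("اں")],
)
gold = [("کتاباں", "کتاب"), ("منڈے", "منڈا"), ("گئے", "جا")]
print(accuracy(stemmer, gold))  # 1.0
```

Checking the lookup table before the rules lets irregular forms bypass suffix stripping: without the lookup entry, the `ے` → `ا` rule would turn `گئے` into the incorrect stem `گئا`.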

Details

Title: A Hybrid Stemmer of Punjabi Shahmukhi Script
Creators: Abdul Mateen - University of the Punjab
M. Kamran Malik - University of the Punjab
Zubair Nawaz - University of the Punjab
H. M. Danish - University of the Punjab
M. Hassan Siddiqui - University of the Punjab
Qaiser Abbas - University of Sargodha
Publication Details: INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, Vol.17(8), pp.90-97
Publisher: International Journal of Computer Science and Network Security (IJCSNS)
Number of pages: 8
Identifiers: 9917085508331
Academic Unit: Islamic University of Al Madinah
Language: English
Resource Type: Journal article