Abstract
Stemming is a heuristic process that chops off the ends of words, and sometimes appends additional letters, to obtain the meaningful base forms of surface words. Its basic goal is to reduce inflected forms of words to their root forms using a variety of techniques. In this paper, we use a hybrid approach to stem Punjabi words written in the Shahmukhi script, for which no stemmer has previously been reported. Our Punjabi stemmer combines a database lookup approach with rule-based stemming. Our dataset consists of 2.5 million tokens, divided into three parts of 1,500,000, 500,000 and 500,000 tokens used for training, development and testing, respectively. Using 63 rules, the stemmer achieved 86.01% accuracy when evaluated on this dataset.
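The hybrid idea of a dictionary lookup backed by suffix-stripping rules can be illustrated with a minimal Python sketch. It assumes that the lookup table is consulted first, that rules are (suffix, replacement) pairs tried longest suffix first, and that a minimum stem length guards against over-stemming. These are illustrative assumptions, not the paper's published design. The romanized entries and the names `LOOKUP`, `RULES`, `MIN_STEM_LEN` and `stem` are hypothetical placeholders. They are not Shahmukhi data and not any of the paper's 63 rules.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table (surface form -> stem); placeholder romanized
# entries standing in for a real Shahmukhi database.
LOOKUP: Dict[str, str] = {
    "gaye": "ja",  # irregular form that the suffix rules alone would mis-stem
}

# Hypothetical suffix rules: (suffix to strip, letters to append afterwards).
RULES: List[Tuple[str, str]] = [
    ("iyan", "i"),
    ("an", ""),
    ("e", "a"),
]

MIN_STEM_LEN = 2  # assumed guard: never strip down to a stem shorter than this


def stem(word: str,
         lookup: Dict[str, str] = LOOKUP,
         rules: List[Tuple[str, str]] = RULES,
         min_len: int = MIN_STEM_LEN) -> str:
    """Return a stem for `word`: lookup first, then suffix rules."""
    # Step 1: database lookup; an exact match overrides every rule.
    if word in lookup:
        return lookup[word]
    # Step 2: rule-based stripping, trying the longest suffix first.
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix):
            base = word[: len(word) - len(suffix)]
            if len(base) >= min_len:
                return base + replacement
    # Step 3: no entry and no applicable rule, so keep the word unchanged.
    return word


if __name__ == "__main__":
    for w in ["gaye", "kuriyan", "kitaban", "munde", "ghar"]:
        print(w, "->", stem(w))
```

Checking the lookup table before the rules lets irregular forms such as the placeholder `"gaye"` bypass the `("e", "a")` rule, which would otherwise produce `"gaya"`. Sorting by suffix length makes `"kuriyan"` match `"iyan"` rather than the shorter `"an"`.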