Urdu part of speech tagging using conditional random fields

Wahab Khan; Ali Daud; Jamal Abdul Nasir; Tehmina Amjad; Sachi Arafat; Naif Aljohani; Fahd S. Alotaibi

doi:10.1007/s10579-018-9439-6

Journal article

Peer reviewed

Language resources and evaluation, Vol.53(3), pp.331-362

1 September 2019

Subjects: Computer Science; Computer Science, Interdisciplinary Applications; Science & Technology; Technology

Abstract

Part of speech (POS) tagging, the assignment of syntactic categories to words in running text, is an important preliminary task in natural language processing applications such as speech processing and information extraction. Urdu is challenging to process because various Urdu POS tags behave differently in different contexts (morphosyntactic ambiguity). This paper addresses this challenge with a novel tagging approach based on linear-chain conditional random fields (CRFs). Our work is the first application of CRFs to Urdu POS tagging. The proposed model uses a strong, stable and balanced feature set that combines language-dependent and language-independent features. The language-dependent features are the POS tag of the previous word and the suffix of the current word, while the language-independent feature is a window of context words. We evaluated our approach on two benchmark datasets against support vector machine (SVM) techniques, which are considered the state of the art for Urdu POS tagging. The results show that our CRF approach improves on the F-measure of prior attempts by 8.3-8.5%.
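The abstract names the feature set but not how it is encoded. The following is a minimal sketch, not the authors' implementation, of how such features might plug into a linear-chain CRF. It assumes string-valued observation features (a context-word window and suffixes of the current word). The dependency on the previous word's tag is modeled as transition weights over (previous tag, current tag) pairs, which is the standard way a linear-chain CRF conditions on the preceding label. The function names `token_features` and `viterbi`, the window size, the suffix lengths, the tag names, the toy romanized-Urdu sentence and all weights are hypothetical. Real weights would be learned by maximizing conditional log-likelihood, which this sketch does not do; it only shows scoring and Viterbi decoding.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]


def token_features(words: List[str], i: int, window: int = 2, max_suffix: int = 3) -> List[str]:
    """Observation features for position i (hypothetical encoding)."""
    feats = ["bias"]
    # Language-independent: the words in a +/- `window` context around position i.
    for offset in range(-window, window + 1):
        j = i + offset
        word = words[j] if 0 <= j < len(words) else "<PAD>"
        feats.append(f"w[{offset}]={word}")
    # Language-dependent: suffixes of the current word, lengths 1..max_suffix.
    w = words[i]
    for k in range(1, min(max_suffix, len(w)) + 1):
        feats.append(f"suf{k}={w[-k:]}")
    return feats


def viterbi(words: List[str], tags: List[str], emit: Weights, trans: Weights) -> List[str]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emit[(feature, tag)] weights observation features; trans[(prev_tag, tag)]
    weights tag bigrams, which is how the previous word's tag enters the model.
    Missing weights count as 0.0.
    """
    n = len(words)
    if n == 0:
        return []

    def local(i: int, t: str) -> float:
        return sum(emit.get((f, t), 0.0) for f in token_features(words, i))

    # score[i][t]: best total score of any tag sequence for words[:i+1] ending in t.
    score: List[Dict[str, float]] = [
        {t: trans.get(("<START>", t), 0.0) + local(0, t) for t in tags}
    ]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, n):
        score.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: score[i - 1][p] + trans.get((p, t), 0.0))
            score[i][t] = score[i - 1][prev] + trans.get((prev, t), 0.0) + local(i, t)
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[n - 1][t])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy example: romanized Urdu, hypothetical tags and hand-set weights.
TAGS = ["PRP", "NN", "VB", "AUX"]
EMIT: Weights = {
    ("w[0]=woh", "PRP"): 2.0,
    ("w[0]=kitab", "NN"): 2.0,
    ("suf2=ta", "VB"): 1.5,  # verb-like suffix, as in "parhta"
    ("w[0]=hai", "AUX"): 2.0,
}
TRANS: Weights = {("VB", "AUX"): 1.0}

if __name__ == "__main__":
    print(viterbi(["woh", "kitab", "parhta", "hai"], TAGS, EMIT, TRANS))
    # prints ['PRP', 'NN', 'VB', 'AUX']
```

In this toy run the (VB, AUX) transition weight rewards an auxiliary following a verb, so the decoder draws on the previous tag even though no observation feature mentions it. Only the suffix feature tells it that "parhta" is verb-like.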

Details

Title: Urdu part of speech tagging using conditional random fields
Creators: Wahab Khan - Department of Computer Science & Software Engineering, International Islamic University, Islamabad 44000, Pakistan
Ali Daud - King Abdulaziz University
Jamal Abdul Nasir - International Islamic University, Islamabad
Tehmina Amjad - International Islamic University, Islamabad
Sachi Arafat - King Abdulaziz University
Naif Aljohani - King Abdulaziz University
Fahd S. Alotaibi - King Abdulaziz University
Publication Details: Language resources and evaluation, Vol.53(3), pp.331-362
Publisher: Springer Nature
Number of pages: 32
Identifiers: 9932958008331
Academic Unit: University of Jeddah; King Abdulaziz University
Language: English
Resource Type: Journal article