Named Entity Recognition Using Word-Embedding Techniques for ArabicWeb16: An Empirical Study

Sharefah Al-Ghamdi; Mashael Al-Duwais; Hend Al-Khalifa; Abdulmalik Al-Salman

Conference proceeding

The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings, p.1

01/01/2018

Subjects: Datasets; Embedding; Natural language processing; Recognition; Websites

Abstract

Conference Title: 2018 21st Saudi Computer Society National Computer Conference (NCC)
Conference Start Date: 2018, April 25
Conference End Date: 2018, April 26
Conference Location: Riyadh, Saudi Arabia

The 3rd Workshop on Open-Source Arabic Corpora and Processing Tools introduces the ArabicWeb16 Data Challenge track. The challenge is about experimenting with the ArabicWeb16 dataset, the largest publicly available Arabic Web dataset, with about 150M Arabic Web pages. In this paper, we explore the ArabicWeb16 dataset and experiment with it to build word-embedding models for the Named Entity Recognition (NER) task. Word-embedding models are powerful tools for many Natural Language Processing (NLP) tasks, including NER. We tried two word-embedding models: the Google Word2Vec model and the Stanford GloVe model. The two models were used to recognize similar words for each named entity type. The ArabicWeb16 dataset was somewhat hard to pre-process; however, the final results showed promising outputs.
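The abstract says the embedding models were used to recognize words similar to each named entity type, but this record does not say how similarity was computed. As a minimal sketch, assuming cosine similarity over word vectors (a common choice, not confirmed by the source), the idea can be illustrated as follows. The vectors, the words, and the function names `cosine_similarity` and `most_similar` are hypothetical and invented for illustration; they are not taken from the paper or from ArabicWeb16.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# Real Word2Vec or GloVe vectors would be learned from the corpus
# and would typically have hundreds of dimensions.
TOY_VECTORS = {
    "riyadh": [0.9, 0.1, 0.0],
    "jeddah": [0.8, 0.2, 0.1],
    "cairo": [0.7, 0.3, 0.0],
    "ahmed": [0.1, 0.9, 0.1],
    "fatimah": [0.0, 0.8, 0.2],
    "book": [0.1, 0.1, 0.9],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_u * norm_v)


def most_similar(seed, vectors, top_n=2):
    """Rank every other word by cosine similarity to the seed word."""
    seed_vec = vectors[seed]
    scored = [
        (word, cosine_similarity(seed_vec, vec))
        for word, vec in vectors.items()
        if word != seed
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Seed words standing in for two entity types (assumed: location, person).
location_like = most_similar("riyadh", TOY_VECTORS)
person_like = most_similar("ahmed", TOY_VECTORS)
```

With a seed word chosen per entity type, the top-ranked neighbors serve as candidate entities of that type; in practice a trained model's vectors would replace the toy dictionary.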

Details

Title: Named Entity Recognition Using Word-Embedding Techniques for ArabicWeb16: An Empirical Study
Creators - without role: Sharefah Al-Ghamdi
Mashael Al-Duwais
Hend Al-Khalifa
Abdulmalik Al-Salman
Publication Details: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) Conference Proceedings, p.1
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Identifiers: 9952208308331
Academic Unit: King Saud University
Language: English
Resource Type: Conference proceeding