An Adaptive Wordpiece Language Model for Learning Chinese Word Embeddings

BinChen Xu; Lu Ma; Liang Zhang; HaoHai Li; Qi Kang; MengChu Zhou

doi:10.1109/COASE.2019.8843151

Conference proceeding

2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), Vol.2019-, pp.812-817

08/2019

DOI: https://doi.org/10.1109/COASE.2019.8843151

Abstract

Adaptation models

Computational modeling

Data mining

Electromagnetic interference

Road transportation

Semantics

Task analysis

Word representations are crucial for many natural language processing tasks. Most existing approaches learn contextual information by assigning a distinct vector to each word and pay less attention to morphology, which makes it hard for them to deal with large vocabularies and rare words. In this paper we propose an Adaptive Wordpiece Language Model (AWLM) for learning Chinese word embeddings, inspired by the previous observation that subword units are important for improving the learning of Chinese word representations. Specifically, a novel approach called BPE+ is established to adaptively generate grams of variable length, which overcomes the limitation of stroke n-grams. Semantic information extraction is completed in three elaborate parts, i.e., extraction of morphological information, reinforcement of fine-grained information, and extraction of semantic information. Empirical results on word similarity, word analogy, text classification, and question answering verify that our method significantly outperforms several state-of-the-art methods.
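
The abstract does not spell out how BPE+ works, so the following is only a minimal sketch of standard byte-pair encoding (BPE), the technique the name suggests BPE+ builds on. It shows how repeatedly merging the most frequent adjacent pair yields grams of variable length, in contrast to fixed-length n-grams. The function names, the stopping rule, and the toy input are assumptions for illustration: each letter stands in for a hypothetical stroke code and does not reflect the paper's actual encoding or algorithm.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe_merges(sequences, num_merges):
    """Plain BPE (not the paper's BPE+): learn up to `num_merges` merge rules.

    `sequences` is a list of strings; each character is treated as one
    atomic symbol (here, a hypothetical stroke code).
    """
    corpus = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs across the corpus.
        pair_counts = Counter()
        for symbols in corpus:
            pair_counts.update(zip(symbols, symbols[1:]))
        if not pair_counts:
            break
        best_pair, best_count = pair_counts.most_common(1)[0]
        # Assumed stopping rule: a pair seen only once is not worth merging.
        if best_count < 2:
            break
        merges.append(best_pair)
        corpus = [merge_pair(symbols, best_pair) for symbols in corpus]
    return merges, corpus


# Toy input: three "words", each a sequence of hypothetical stroke codes.
merges, corpus = learn_bpe_merges(["abcd", "abce", "abf"], num_merges=10)
print(merges)  # [('a', 'b'), ('ab', 'c')]
print(corpus)  # [['abc', 'd'], ['abc', 'e'], ['ab', 'f']]
```

In this toy run the learned units have lengths 1, 2, and 3, which is the variable-length behaviour the abstract contrasts with stroke n-grams. How BPE+ adapts the merging process is described only in the paper itself.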

Details

Title: An Adaptive Wordpiece Language Model for Learning Chinese Word Embeddings
Creators - without role: BinChen Xu - Tongji University
Lu Ma - Tongji University
Liang Zhang - Tongji University
HaoHai Li - Ravenscroft School, Raleigh, NC, 27615, USA
Qi Kang - Tongji University
MengChu Zhou - New Jersey Institute of Technology
Publication Details: 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), Vol.2019-, pp.812-817
Publisher: IEEE
Identifiers: 9934429508331
Academic Unit: King Abdulaziz University
Language: English
Resource Type: Conference proceeding