Topical Term Weighting based on Extended Random Sets for Relevance Feature Selection

Abdullah Semran Alharbi; Yuefeng Li; Yue Xu; ACM

doi:10.1145/3106426.3106440

Conference proceeding

2017 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2017), pp.654-661

01/01/2017

DOI: https://doi.org/10.1145/3106426.3106440

Computer Science

Computer Science, Artificial Intelligence

Computer Science, Information Systems

Science & Technology

Technology

Abstract

Discovering relevant features in long documents that describe user information needs is challenging because synonymy, polysemy, noise, and high dimensionality are inherent problems of text. Traditional feature selection methods cannot deal with these problems effectively because they assume that each document describes only one topic. Topic-based techniques such as Latent Dirichlet Allocation (LDA) relax this assumption: they are built on the premise that a document can exhibit multiple hidden topics. However, LDA does not show encouraging results in selecting relevant features, because it calculates term weights from local documents and does not generalise them globally at the collection level. To address this problem, we propose an innovative and effective extended random set model that generalises the LDA weights of local document terms. The model is used as a weighting scheme for topical terms and can assign a more discriminative and accurate weight to these terms based on their appearance in LDA topics and relevant documents. Experimental results on the standard RCV1 dataset, with TREC topics and five standard performance measures, show that the proposed model significantly outperforms eight state-of-the-art baseline models in information filtering.

Details

Title: Topical Term Weighting based on Extended Random Sets for Relevance Feature Selection
Creators - without role: Abdullah Semran Alharbi - Queensland University of Technology
Yuefeng Li - Queensland University of Technology
Yue Xu - Queensland University of Technology
ACM
Publication Details: 2017 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2017), pp.654-661
Publisher: Assoc Computing Machinery
Number of pages: 8
Grant note: DP140103157 / Australian Research Council (ARC Discovery Project); Australian Research Council
Identifiers: 9928563108331
Academic Unit: Qassim University
Language: English
Resource Type: Conference proceeding