CNN with Paragraph to Multi-Sequence Learning for Sensitive Text Detection

Khudran Alzhrani; Fahad Saud Alrasheedi; Faris Anwar Kateb; Terrance E. Boult; IEEE

Conference proceeding

2019 2ND INTERNATIONAL CONFERENCE ON COMPUTER APPLICATIONS & INFORMATION SECURITY (ICCAIS)

01/01/2019

Abstract

Computer Science

Computer Science, Information Systems

Computer Science, Interdisciplinary Applications

Science & Technology

Technology

The problem of sensitive information leaks became apparent in recent infamous security breaches such as WikiLeaks, the DNC emails, and the Panama Papers. Detecting sensitive texts on the fly enhances the ability of security solutions to monitor and protect critical information flow within the network. Automated text security classification is a relatively new research area, in which sensitive texts are marked with labels such as Secret, Confidential, and Unclassified with no human interaction. This paper examines the performance of deep learning networks in detecting the sensitivity level of a given text. In deep text classification networks, each paragraph or sentence is represented by a single sequence, regardless of the length of the text sample. We propose techniques to expand the training set size, minimize the number of padding characters in sequences, and lower the inputs' dimensionality by learning from segments of long paragraphs as independent instances. We also introduce a wide variation of a Convolutional Neural Network (CNN), evaluated on four large sets of U.S. embassy diplomatic cables. We are not aware of any paper that has applied deep networks to sensitive text classification. Thus, we further evaluate our multi-sequencing technique and CNN on well-researched non-sensitive text corpora. Our approach outperformed the state-of-the-art models on the non-sensitive text datasets and was competitive with traditional classifiers on the sensitive text datasets.
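
The multi-sequence idea in the abstract (treating segments of a long paragraph as independent training instances) can be illustrated with a minimal sketch. The abstract does not specify how segments are cut, so the fixed-length, non-overlapping split below is an assumption, and the names `PAD`, `split_into_sequences`, `expand_dataset`, and `padding_count` are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token, not from the paper


def split_into_sequences(tokens: List[str], seq_len: int) -> List[List[str]]:
    """Cut a token list into consecutive chunks of seq_len tokens.

    Only the last chunk is padded, so padding per paragraph is below seq_len.
    Assumption: fixed-length, non-overlapping segments.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    chunks = [tokens[i:i + seq_len] for i in range(0, len(tokens), seq_len)]
    if not chunks:  # an empty paragraph still yields one all-padding sequence
        chunks = [[]]
    return [chunk + [PAD] * (seq_len - len(chunk)) for chunk in chunks]


def expand_dataset(
    samples: List[Tuple[str, str]], seq_len: int
) -> List[Tuple[List[str], str]]:
    """Turn each (paragraph, label) pair into several (sequence, label) pairs.

    Every segment inherits the label of its paragraph and is treated as an
    independent training instance.
    """
    expanded = []
    for text, label in samples:
        for seq in split_into_sequences(text.split(), seq_len):
            expanded.append((seq, label))
    return expanded


def padding_count(sequences: List[List[str]]) -> int:
    """Count padding tokens across a list of sequences."""
    return sum(tok == PAD for seq in sequences for tok in seq)


if __name__ == "__main__":
    # Toy data; the labels mirror the sensitivity levels named in the abstract.
    samples = [("a b c d e f g", "Secret"), ("x y", "Unclassified")]
    expanded = expand_dataset(samples, seq_len=3)
    print(len(expanded), "training instances")
    print(padding_count([seq for seq, _ in expanded]), "padding tokens")
```

In this toy case the two paragraphs become four shorter instances. Padding both paragraphs to the longest length (7 tokens) as single sequences would instead need 5 padding tokens, which is the kind of trade-off the abstract refers to.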

Details

Title: CNN with Paragraph to Multi-Sequence Learning for Sensitive Text Detection
Creators - without role: Khudran Alzhrani - Umm al-Qura University
Fahad Saud Alrasheedi - Minist Def, Riyadh, Saudi Arabia
Faris Anwar Kateb - King Abdulaziz University
Terrance E. Boult - Univ Colorado, Vis & Secur Technol Vast Lab, Dept Comp Science, Colorado Springs, CO 80907 USA
IEEE
Publication Details: 2019 2ND INTERNATIONAL CONFERENCE ON COMPUTER APPLICATIONS & INFORMATION SECURITY (ICCAIS)
Publisher: IEEE
Number of pages: 6
Identifiers: 9931763808331
Academic Unit: Umm Al Qura University
Language: English
Resource Type: Conference proceeding