Abstract
Conference Title: 2018 Sixth International Conference on Digital Information, Networking, and Wireless Communications (DINWC)
Conference Start Date: 2018, April 25
Conference End Date: 2018, April 27
Conference Location: Beirut, Lebanon

One-hot encoding (OHE) is currently the norm in text encoding for deep learning neural models. The main problem with OHE is that the size of the input vector, and hence the number of neurons in the input layer, depends on the size of the vocabulary. For example, if the size of the vocabulary is 10,000, then the size of the input vector will be 10,000, implying 10,000 neurons in the input layer. Experience has shown that the training time for text classification neural models grows exponentially with the size of the vocabulary when OHE is used. This paper proposes and illustrates the use of an alternative, the Reversible Integer Transformation (RIT), whereby each word in the training/testing set is transformed into base-64 integer format. The transformation is reversible, so the output of the network can easily be converted back to string format without the need for an index. Another important feature is that each character in the word is represented using only six bits at the appropriate position in the resulting base-64 integer. The maximum number of neurons needed in the input layer is 64, but the actual number of neurons depends on the maximum word length in the vocabulary and is usually below 64.
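The six-bits-per-character idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the symbol table, the bit ordering, or how the integer is fed to the input layer, so the alphabet `SYMBOLS`, the choice to reserve code 0 as "no character", and the helper names `rit_encode`, `rit_decode`, and `to_bits` are all assumptions made for illustration.

```python
import string

# Hypothetical alphabet (an assumption, not taken from the paper).
# Code 0 is reserved as "no character" so that decoding is unambiguous,
# which leaves 63 usable six-bit codes (1..63).
SYMBOLS = string.ascii_lowercase + string.ascii_uppercase + string.digits + "-"
CODE = {ch: i + 1 for i, ch in enumerate(SYMBOLS)}
BITS = 6  # bits per character


def rit_encode(word):
    """Pack a word into one integer, six bits per character.

    The first character occupies the lowest six bits (assumed ordering).
    """
    value = 0
    for pos, ch in enumerate(word):
        value |= CODE[ch] << (BITS * pos)
    return value


def rit_decode(value):
    """Recover the word from the integer without any vocabulary index."""
    chars = []
    while value:
        code = value & 0x3F  # lowest six bits
        if code == 0:
            raise ValueError("code 0 is reserved and cannot appear inside a word")
        chars.append(SYMBOLS[code - 1])
        value >>= BITS
    return "".join(chars)


def to_bits(value, max_word_len):
    """Expand the integer into a fixed-width list of 0/1 inputs.

    The width is 6 * max_word_len, so it depends on the longest word,
    not on the vocabulary size (one possible reading of the abstract).
    """
    width = BITS * max_word_len
    return [(value >> i) & 1 for i in range(width)]


if __name__ == "__main__":
    n = rit_encode("cat")
    print(n, rit_decode(n), len(to_bits(n, max_word_len=10)))
```

Under these assumptions, a vocabulary whose longest word has 10 characters needs 60 binary inputs regardless of how many distinct words it contains, whereas a one-hot input layer needs one input per vocabulary entry.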