Abstract
Lip reading is the task of decoding and understanding speech from the movements of a speaker's mouth. It can be extremely beneficial in helping hearing-impaired people 'listen' to those who do not know sign language, particularly in real-world environments with heavy noise pollution. Conventional methods have relied mainly on heavy preprocessing. Despite the tremendous potential of deep learning algorithms, their application in this field has been limited. Here we present a convolutional neural network model that predicts words from videos without any audio. The model builds on the VGG Net architecture, pre-trained on the ImageNet database, which we adapted with some custom modifications and trained on the MIRACL-VC1 dataset of 10 words. The model achieved an accuracy of 94.86% in training, 93.82% in validation and 60% in testing. We have also developed an app that uses cloud computing to run this model in real time on any smartphone. The app aims to aid hearing-impaired people in their day-to-day activities and to make conversations with them more natural, organic and cost-effective.