Abstract
Automatically recognizing emotions of patients can be a good facilitator of a connected healthcare framework. It can give automatic feedback to the stakeholders of the healthcare industry about patients' states and satisfaction levels. In this article, we propose an automatic audio-visual emotion recognition system in a connected healthcare framework. The system uses a 2D CNN model for the speech modality and a 3D CNN model for the visual modality. For the speech signal, preprocessing is done to extract the PS-PA feature vector. The features from the two CNN models are blended by two ELM networks. The first ELM is trained with gender-specific data, while the other one is trained with emotion-specific data. The proposed system is evaluated using three databases, and the experiments prove the success of the system. In the healthcare framework, we use edge computing prior to intensive-processing cloud computing. In the edge computing, we realize edge caching, which can store the CNN model parameters and thereby perform the testing fast.