Please use this identifier to cite or link to this item: http://hdl.handle.net/2080/5368
Title: EnhanceNet: Leveraging Facial, Speech, and Textual Cues for Multimodal Emotion Recognition
Authors: Sahoo, Prachyut Priyadarshi
Patra, Dipti
Keywords: Multimodal Emotion Recognition
Deep Learning
Facial Expression Recognition
Speech Emotion Analysis
Natural Language Processing
Mental Health
Issue Date: Oct-2025
Citation: 4th IEEE International Conference on Computer Vision and Machine Intelligence (CVMI), NIT, Rourkela, 12-13 October 2025
Abstract: Multimodal emotion recognition plays a vital role in affective computing, with applications spanning mental health monitoring and adaptive learning systems. This paper introduces EnhanceNet, a comprehensive deep learning framework that fuses three key modalities: facial expression recognition, speech emotion analysis, and spoken language understanding. Each modality employs a specialized neural network architecture — a residual CNN with squeeze-and-excitation blocks for facial cues, a CNN-LSTM model for speech signals, and a BiLSTM network for textual transcripts — trained on diverse, widely used datasets. Unlike conventional late fusion approaches, EnhanceNet adopts an early fusion strategy by averaging the predicted emotion vectors from each modality to form a robust, unified emotional profile: E_final = (E_face + E_speech + E_text) / 3. This approach leverages the complementary strengths of the individual modalities, mitigating challenges such as facial occlusion, ambiguous vocal intonation, and sparse linguistic content. The fused model achieves an overall accuracy of 78.23%, outperforming unimodal baselines. The system supports real-time facial expression analysis via webcam and asynchronous processing of audio and text inputs, demonstrating robustness to environmental variability such as lighting conditions and background noise. The results suggest EnhanceNet as a practical foundation for scalable, real-world multimodal emotion recognition systems, with potential impact on next-generation affective technologies.
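The fusion rule stated in the abstract, E_final = (E_face + E_speech + E_text) / 3, can be sketched as a simple element-wise average of per-modality probability vectors. The emotion label set and the example probability values below are illustrative assumptions, not taken from the paper:

```python
# Minimal sketch of EnhanceNet's averaging fusion, assuming each modality
# outputs a probability distribution over the same ordered label set.
EMOTIONS = ["angry", "happy", "neutral", "sad"]  # hypothetical label set


def fuse_emotions(e_face, e_speech, e_text):
    """Element-wise average of three per-modality emotion probability vectors."""
    return [(f + s + t) / 3.0 for f, s, t in zip(e_face, e_speech, e_text)]


# Illustrative per-modality predictions (each sums to 1).
e_face = [0.1, 0.6, 0.2, 0.1]
e_speech = [0.2, 0.5, 0.2, 0.1]
e_text = [0.0, 0.4, 0.5, 0.1]

fused = fuse_emotions(e_face, e_speech, e_text)
# Final label is the argmax of the fused distribution.
label = EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]
```

Because each input is a valid probability distribution, the averaged vector also sums to one, so no renormalization step is needed after fusion.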
Description: Copyright belongs to the proceeding publisher.
URI: http://hdl.handle.net/2080/5368
Appears in Collections: Conference Papers

Files in This Item:
File: 2025_CVMI_PPSahoo_Enhance.pdf — Size: 2.72 MB — Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.