Adaptive Contextual Feature Fusion: Leveraging Human-Robot Interaction with Speech Emotion Recognition

Please use this identifier to cite or link to this item: http://hdl.handle.net/2080/4921

Full metadata record

DC Field	Value	Language
dc.contributor.author	Biswas, Sougatamoy	-
dc.contributor.author	Mishra, Romala	-
dc.contributor.author	Sahoo, Pratik Kumar	-
dc.contributor.author	Nandy, Anup	-
dc.date.accessioned	2025-01-10T06:04:59Z	-
dc.date.available	2025-01-10T06:04:59Z	-
dc.date.issued	2024-12	-
dc.identifier.citation	21st IEEE INDICON 2024, IIT Kharagpur, India, 19-21 December 2024	en_US
dc.identifier.uri	http://hdl.handle.net/2080/4921	-
dc.description	Copyright belongs to the proceeding publisher.	en_US
dc.description.abstract	Speech Emotion Recognition (SER) is essential in Human-Robot Interaction (HRI) because it enables robots to detect and react to human emotions. However, existing SER systems struggle to capture the full range of emotional expression because of the complex interaction among speech features. This research introduces an Adaptive Contextual Feature Fusion (ACFF) technique that dynamically fuses a hybrid feature set comprising the Mel-scaled spectrogram, Mel-frequency Cepstral Coefficients (MFCCs), Zero-Crossing Rate (ZCR), and Root Mean Square Energy (RMSE), which together capture the spectral and temporal characteristics needed for accurate emotion recognition. A Convolutional Neural Network with Long Short-Term Memory (CNN-LSTM) architecture then learns spatial and temporal dependencies from the adaptively fused features. The approach is evaluated on the publicly available RAVDESS emotional speech dataset, where the CNN-LSTM with ACFF and hybrid features achieves 75.45% accuracy and outperforms other state-of-the-art methods.	en_US
dc.subject	Mel-frequency Cepstral Coefficients	en_US
dc.subject	Speech Signal Processing	en_US
dc.subject	Adaptive Contextual Feature Fusion	en_US
dc.subject	Real-Time Emotion Recognition	en_US
dc.subject	Human-Robot Interaction	en_US
dc.title	Adaptive Contextual Feature Fusion: Leveraging Human-Robot Interaction with Speech Emotion Recognition	en_US
dc.type	Article	en_US
Appears in Collections:	Conference Papers
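
The abstract names Zero-Crossing Rate (ZCR) and Root Mean Square Energy (RMSE) among the hybrid features, but this record does not give the framing parameters or the fusion details. The following is only a minimal sketch of the standard frame-wise definitions of those two features, not the authors' implementation. The function names, the frame length of 400 samples, and the hop of 160 samples are illustrative assumptions; the ACFF fusion step and the CNN-LSTM model are not attempted here.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zcr_rms_features(samples, frame_len=400, hop=160):
    """Return one (ZCR, RMS) pair per frame.

    frame_len and hop are assumed defaults (25 ms and 10 ms at 16 kHz),
    not values taken from the paper.
    """
    return [(zero_crossing_rate(f), rms_energy(f))
            for f in frame_signal(samples, frame_len, hop)]


if __name__ == "__main__":
    # Synthetic 100 Hz tone sampled at 16 kHz, standing in for real speech.
    sample_rate = 16000
    tone = [0.5 * math.sin(2 * math.pi * 100 * n / sample_rate)
            for n in range(sample_rate)]
    for zcr, rms in zcr_rms_features(tone)[:3]:
        print(f"zcr={zcr:.4f} rms={rms:.4f}")
```

For a pure tone of frequency f sampled at rate sr, the ZCR is roughly 2f/sr, which makes a synthetic tone a convenient sanity check before applying the functions to recorded speech.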

Files in This Item:

File	Description	Size	Format
2024_INDICON_SBiswas_Adaptive.pdf		514.13 kB	Adobe PDF
