Video Anomaly Detection Using Self-Attention-Enabled Convolutional Spatiotemporal Autoencoder

Please use this identifier to cite or link to this item: http://hdl.handle.net/2080/4104

Title:	Video Anomaly Detection Using Self-Attention-Enabled Convolutional Spatiotemporal Autoencoder
Authors:	Nayak, Rashmiranjan; Pati, Umesh Chandra; Das, Santos Kumar
Keywords:	Auto-encoders; Convolutional LSTM; Convolutional spatiotemporal autoencoder; Self-attention; Video anomaly detection
Issue Date:	Oct-2023
Citation:	International Symposium on Communications and Information Technologies (ISCIT), Sydney, Australia, 16-18 October 2023
Abstract:	Video anomaly detection is the process of automatically detecting abnormal video patterns within an intelligent surveillance framework. It is challenging because anomalies are equivocal in nature, data are imbalanced and scarce, and the entities involved in an anomaly are complex, among other factors. Hence, a self-attention-enabled convolutional spatiotemporal autoencoder is proposed to detect video anomalies efficiently. The proposed Self-Attention-enabled Convolutional Long Short-Term Memory Auto-Encoder (SA-ConvLSTM2DAE)-based video anomaly detector comprises three sequential stages: a spatial encoder to learn spatial (appearance) features of individual frames, a temporal encoder-decoder to learn temporal (motion) features of the encoded spatial features, and a spatial decoder to decode the encoded spatial features and reconstruct the individual frames. Here, the self-attention mechanism is embedded into the convolutional Long Short-Term Memory (ConvLSTM) block in the temporal encoder-decoder section to produce a Self-Attention-enabled ConvLSTM block that learns better spatiotemporal features. An efficient threshold selection criterion is implemented, based on finding the optimal geometric mean of the sensitivity and specificity on the Receiver Operating Characteristic (ROC) curve. The model is trained only on video frame sequences corresponding to normal incidents. Consequently, the model reconstructs test frame sequences containing video anomalies poorly, as it is never exposed to anomalous samples during training. Hence, an anomaly is said to be detected when the anomaly score of an individual frame exceeds the selected optimum threshold.
Description:	Copyright belongs to the proceedings publisher
URI:	http://hdl.handle.net/2080/4104
Appears in Collections:	Conference Papers
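
The threshold selection described in the abstract (choosing the ROC operating point that maximizes the geometric mean of sensitivity and specificity) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `gmean_threshold`, the toy labels and scores, and the convention that a frame is flagged when its score is greater than or equal to the threshold are all assumptions made for this example.

```python
import numpy as np


def gmean_threshold(labels, scores):
    """Return (threshold, g_mean) maximizing sqrt(sensitivity * specificity).

    labels: 1 for anomalous frames, 0 for normal frames (ground truth).
    scores: per-frame anomaly scores, higher meaning more anomalous.
    Hypothetical helper written for illustration only.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n_pos = labels.sum()
    n_neg = (~labels).sum()
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one normal and one anomalous frame")

    best_thr, best_g = None, -1.0
    # Every distinct score is a candidate ROC operating point.
    for thr in np.unique(scores):
        pred = scores >= thr  # assumed convention: flag when score >= threshold
        sensitivity = (pred & labels).sum() / n_pos    # true positive rate
        specificity = (~pred & ~labels).sum() / n_neg  # true negative rate
        g = np.sqrt(sensitivity * specificity)
        if g > best_g:
            best_thr, best_g = thr, g
    return float(best_thr), float(best_g)


# Toy data (made up): three normal frames followed by two anomalous frames.
toy_labels = [0, 0, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.8, 0.9]
thr, g = gmean_threshold(toy_labels, toy_scores)
print(thr, g)
```

The geometric mean rewards thresholds that balance both error types, which suits the imbalanced data the abstract mentions: a threshold that flags nothing has perfect specificity but zero sensitivity, so its geometric mean is zero.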

Files in This Item:

File	Description	Size	Format
2023_ISCIT_RNayak_Video.pdf		973.41 kB	Adobe PDF
