Please use this identifier to cite or link to this item: http://hdl.handle.net/2080/5271
Title: Malware Classification with n-gram Based NLP and Cosine Similarity
Authors: Shahnawaz, Md
Gond, Bishwajit Prasad
Mohapatra, Durga Prasad
Keywords: Malware classification
Natural Language Processing (NLP)
Cosine similarity
API call sequences
Issue Date: Aug-2025
Citation: 22nd Control Instrumentation Systems Conference (CISCON), MIT Manipal, Karnataka, 1-2 August 2025
Abstract: Malware classification is a fundamental aspect of cybersecurity, essential for detecting and mitigating threats. This paper introduces a malware classification technique that applies Natural Language Processing (NLP) methods alongside Cosine similarity. Our approach involves using n-grams of API call sequences, including the API names, their arguments, and categories, to characterize malware behavior. By computing Cosine similarity between these n-grams, we effectively capture both similarities and differences in malware behavior. Our experimental results show that different n-gram configurations exhibit varying classification capabilities, with some proving more effective for specific types of malware. Overall, our technique presents a promising solution for malware classification, leveraging NLP and Cosine similarity to improve the accuracy and efficiency of malware variant detection.
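The core idea in the abstract, comparing n-gram profiles of API call sequences with cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation. The traces, the helper names (`ngrams`, `profile`, `cosine_similarity`), the choice of n = 2, and the use of raw n-gram counts over API names only are all assumptions made for illustration. The record does not state how arguments and categories are encoded or how similarity scores are turned into class labels.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def profile(api_calls, n=2):
    """Build a sparse count vector of n-grams for one API call trace."""
    return Counter(ngrams(api_calls, n))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical API call traces (illustrative only, not data from the paper).
trace_a = ["CreateFileW", "WriteFile", "CloseHandle", "RegSetValueExW"]
trace_b = ["CreateFileW", "WriteFile", "CloseHandle", "CreateProcessW"]
trace_c = ["socket", "connect", "send", "recv"]

sim_ab = cosine_similarity(profile(trace_a), profile(trace_b))
sim_ac = cosine_similarity(profile(trace_a), profile(trace_c))

print(f"similarity(a, b) = {sim_ab:.3f}")
print(f"similarity(a, c) = {sim_ac:.3f}")
```

In this sketch, traces that share most of their bigrams score close to 1, while traces with no bigrams in common score 0. A classifier built on this idea might assign a sample to the family whose reference profile it most resembles, though that decision rule is an assumption here and not something the abstract specifies.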
Description: Copyright belongs to the proceeding publisher.
URI: http://hdl.handle.net/2080/5271
Appears in Collections: Conference Papers

Files in This Item:
File: 2025_CISCON_MShahnawaz_Malware.pdf
Size: 1.44 MB
Format: Adobe PDF