Please use this identifier to cite or link to this item:
http://hdl.handle.net/2080/5271
Title: | Malware Classification with n-gram Based NLP and Cosine Similarity |
Authors: | Shahnawaz, Md; Gond, Bishwajit Prasad; Mohapatra, Durga Prasad |
Keywords: | Malware classification; Natural Language Processing (NLP); Cosine similarity; API call sequences |
Issue Date: | Aug-2025 |
Citation: | 22nd Control Instrumentation Systems conference (CISCON), MIT Manipal, Karnataka, 1-2 August 2025 |
Abstract: | Malware classification is a fundamental aspect of cybersecurity, essential for detecting and mitigating threats. This paper introduces a malware classification technique that applies Natural Language Processing (NLP) methods alongside Cosine similarity. Our approach involves using n-grams of API call sequences, including the API names, their arguments, and categories, to characterize malware behavior. By computing Cosine similarity between these n-grams, we effectively capture both similarities and differences in malware behavior. Our experimental results show that different n-gram configurations exhibit varying classification capabilities, with some proving more effective for specific types of malware. Overall, our technique presents a promising solution for malware classification, leveraging NLP and Cosine similarity to improve the accuracy and efficiency of malware variant detection. |
Description: | Copyright belongs to the proceeding publisher. |
URI: | http://hdl.handle.net/2080/5271 |
Appears in Collections: | Conference Papers |
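The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It builds n-gram count vectors from API call sequences and compares them with cosine similarity, then assigns an unknown sample the label of its most similar labelled sequence. The function names, the nearest-neighbour decision rule, the family labels, and the sample call sequences are all assumptions made for illustration. The tokens here are API names only, whereas the paper also uses arguments and categories.

```python
from collections import Counter
from math import sqrt


def ngram_counts(calls, n=2):
    """Count contiguous n-grams (as tuples) in a sequence of API call tokens."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sequence is treated as similar to nothing
    return dot / (norm_a * norm_b)


def classify(sample, labelled, n=2):
    """Return the label of the labelled sequence most similar to `sample`.

    Nearest-neighbour voting is an assumption; the paper may decide differently.
    """
    sample_vec = ngram_counts(sample, n)
    best = max(
        labelled,
        key=lambda item: cosine_similarity(sample_vec, ngram_counts(item[1], n)),
    )
    return best[0]


# Illustrative data only: labels and call sequences are invented for this sketch.
labelled = [
    ("ransomware", ["NtCreateFile", "NtReadFile", "CryptEncrypt", "NtWriteFile", "NtClose"]),
    ("spyware", ["RegOpenKeyExW", "RegQueryValueExW", "GetAsyncKeyState", "send", "NtClose"]),
]
unknown = ["NtCreateFile", "NtReadFile", "CryptEncrypt", "NtWriteFile", "DeleteFileW"]

predicted = classify(unknown, labelled, n=2)
```

In this toy data the unknown sequence shares three of its four bigrams with the first labelled sequence, giving a cosine similarity of 0.75, and none with the second, so `predicted` is `"ransomware"`. Changing `n` changes which behavioural patterns are captured, in line with the abstract's remark that different n-gram configurations vary in classification capability.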
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
2025_CISCON_MShahnawaz_Malware.pdf | | 1.44 MB | Adobe PDF |