Please use this identifier to cite or link to this item:
http://hdl.handle.net/2080/4576
Title: | Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy |
Authors: | Gond, Bishwajit Prasad; Rajneekant; Kishor, Pushkar; Mohapatra, Durga Prasad |
Keywords: | API calls; Malware Classifier; n-grams; Portable executable |
Issue Date: | May-2024 |
Citation: | 4th International Conference on Machine Learning and Big Data Analytics (ICMLBDA), NIT Kurukshetra, India, Hybrid Mode, 09-11 May 2024 |
Abstract: | This paper investigates the application of natural language processing (NLP)-based n-gram analysis and machine learning techniques to enhance malware classification. We explore how NLP can be used to extract and analyze textual features from malware samples through n-grams, i.e., contiguous sequences of strings or API calls. This approach effectively captures distinctive linguistic patterns of malware and benign families, enabling finer-grained classification. We examine n-gram size selection, feature representation, and classification algorithms. When evaluated on real-world malware samples, the proposed method achieves significantly higher accuracy than traditional methods. Combining the n-gram approach with a hybrid feature selection technique that addresses high dimensionality, we achieve an accuracy of 99.02% across various machine learning algorithms. The hybrid feature selection technique reduces the feature set to only 1.6% of the original features. |
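The n-gram feature extraction described in the abstract can be illustrated with a minimal Python sketch. It assumes each sample is already available as an ordered list of API call names (the traces below are hypothetical), represents each sample as a bag of n-gram counts, and uses a simple document-frequency threshold as a stand-in for dimensionality reduction. That threshold is not the paper's hybrid feature selection technique, whose details are not given in this record; all function names here are illustrative.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# An n-gram is a tuple of n consecutive API call names.
NGram = Tuple[str, ...]


def api_ngrams(calls: Sequence[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of an API call sequence (sliding window of width n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def ngram_counts(calls: Sequence[str], n: int) -> Counter:
    """Bag-of-n-grams representation: how often each n-gram occurs in one sample."""
    return Counter(api_ngrams(calls, n))


def select_by_document_frequency(samples: List[Counter], min_df: int) -> List[NGram]:
    """Keep n-grams present in at least min_df samples.

    Hypothetical stand-in filter, NOT the paper's hybrid feature selection.
    """
    df: Counter = Counter()
    for counts in samples:
        df.update(counts.keys())  # count each n-gram once per sample
    return sorted(g for g, c in df.items() if c >= min_df)


def vectorize(counts: Counter, vocabulary: List[NGram]) -> List[int]:
    """Map one sample's n-gram counts onto a fixed feature order."""
    return [counts.get(g, 0) for g in vocabulary]


if __name__ == "__main__":
    # Hypothetical API call traces (one list per sample).
    traces = [
        ["CreateFileW", "ReadFile", "WriteFile", "CloseHandle"],
        ["CreateFileW", "ReadFile", "CloseHandle"],
        ["RegOpenKeyExW", "RegSetValueExW", "CloseHandle"],
    ]
    samples = [ngram_counts(t, 2) for t in traces]
    vocab = select_by_document_frequency(samples, min_df=2)
    print(vocab)
    print([vectorize(s, vocab) for s in samples])
```

Here the three traces yield six distinct bigrams, and the frequency filter keeps only one of them; in a real pipeline the resulting vectors would be passed to a classifier, with n and the selection method as the main tuning choices.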
Description: | Copyright belongs to the proceedings publisher. |
Appears in Collections: | Conference Papers |
Files in This Item:
File | Description | Size | Format
---|---|---|---
2024_ICMLABDA_BPGond_Malware_.pdf | | 1.12 MB | Adobe PDF