Please use this identifier to cite or link to this item: http://hdl.handle.net/2080/4576
Title: Malware Classification Leveraging NLP & Machine Learning for Enhanced Accuracy
Authors: Gond, Bishwajit Prasad
Rajneekant, .
Kishor, Pushkar
Mohapatra, Durga Prasad
Keywords: API calls
Malware Classifier
𝑛-grams
Portable executable
Issue Date: May-2024
Citation: 4th International Conference on Machine Learning and Big Data Analytics (ICMLBDA), NIT Kurukshetra, India, Hybrid Mode, 09-11 May 2024
Abstract: This paper investigates the application of natural language processing (NLP)-based 𝑛-gram analysis and machine learning techniques to enhance malware classification. We explore how NLP can be used to extract and analyze textual features from malware samples through 𝑛-grams, that is, contiguous sequences of strings or API calls. This approach effectively captures distinctive linguistic patterns among malware and benign families, enabling finer-grained classification. We delve into 𝑛-gram size selection, feature representation, and classification algorithms. Evaluating our proposed method on real-world malware samples, we observe significantly improved accuracy compared to traditional methods. With our 𝑛-gram approach, we achieved an accuracy of 99.02% across various machine learning algorithms, using a hybrid feature selection technique to address high dimensionality. This technique reduces the feature set to only 1.6% of the original features.
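As an illustration of the 𝑛-gram idea described in the abstract, the following is a minimal sketch of how contiguous API call sequences could be turned into count-based feature vectors. It is not the authors' implementation: the function names, the example traces, and the choice of raw counts as the feature representation are all assumptions made for this example, and the hybrid feature selection step is not shown because the abstract does not specify it.

```python
from collections import Counter
from typing import Iterable, Sequence


def api_ngrams(calls: Sequence[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous run of n API calls, in trace order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A trace shorter than n yields an empty list.
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def build_vocabulary(samples: Iterable[Sequence[str]], n: int) -> list[tuple[str, ...]]:
    """Collect the sorted set of n-grams seen across all samples."""
    vocab: set[tuple[str, ...]] = set()
    for calls in samples:
        vocab.update(api_ngrams(calls, n))
    return sorted(vocab)


def to_vector(calls: Sequence[str], n: int,
              vocabulary: Sequence[tuple[str, ...]]) -> list[int]:
    """Map one trace to n-gram counts, ordered by the shared vocabulary."""
    counts = Counter(api_ngrams(calls, n))
    return [counts[gram] for gram in vocabulary]


# Hypothetical API call traces, invented for illustration only.
trace_a = ["CreateFileW", "WriteFile", "CloseHandle", "CreateFileW", "WriteFile"]
trace_b = ["RegOpenKeyExW", "RegSetValueExW", "RegCloseKey"]

vocabulary = build_vocabulary([trace_a, trace_b], n=2)
vector_a = to_vector(trace_a, 2, vocabulary)
vector_b = to_vector(trace_b, 2, vocabulary)
```

The vocabulary grows quickly with 𝑛 and with the number of distinct API calls, which is the high-dimensionality problem the abstract says feature selection is meant to address. The resulting vectors could be passed to any standard classifier.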
Description: Copyright belongs to the proceedings publisher
URI: http://hdl.handle.net/2080/4576
Appears in Collections: Conference Papers

Files in This Item:
File: 2024_ICMLABDA_BPGond_Malware_.pdf
Size: 1.12 MB
Format: Adobe PDF

