Please use this identifier to cite or link to this item: http://hdl.handle.net/2080/4632
Title: NLP-Driven Malware Classification: A Jaccard Similarity Approach
Authors: Gond, Bishwajit Prasad
Shahnawaz, Md
Rajneekant
Mohapatra, Durga Prasad
Keywords: Malware
Malware classifier
n-grams
Jaccard similarity
Portable executable
Issue Date: Jun-2024
Citation: IEEE International Conference on Information Technology, Electronics and Intelligent Communication Systems (ICITEICS), Karnataka, India, Jun 28-29, 2024
Abstract: Malware classification is a critical task in cybersecurity, essential for identifying and mitigating threats. This paper presents an approach to malware classification using Natural Language Processing (NLP) techniques coupled with Jaccard similarity. We propose utilizing n-grams of API call sequences, comprising API names and their arguments, to represent the behaviour of malware samples. By computing the Jaccard similarity between these n-grams, we can effectively capture the similarities and differences in malware behaviour. Our experiments reveal that different n-grams exhibit varying classification abilities, with some performing better for specific types of malware. Moreover, we observe that increasing the value of n in n-grams leads to improved evaluation metrics, indicating the effectiveness of our approach. Overall, our method offers a promising approach to malware classification, leveraging NLP and Jaccard similarity to enhance accuracy and effectiveness in identifying malware variants.
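Illustrative note (not part of the paper): the core idea in the abstract, comparing sets of n-grams drawn from API call sequences with Jaccard similarity, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names `ngrams` and `jaccard`, the two sample traces, and the choice to return 0.0 when both n-gram sets are empty are all hypothetical. For brevity the tokens are API names only; the paper also includes call arguments, which could be folded into each token.

```python
from typing import Sequence, Set, Tuple


def ngrams(calls: Sequence[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of contiguous n-grams in an API call sequence."""
    # A sequence shorter than n yields an empty set.
    return {tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two n-gram sets."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as having no similarity.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical API call traces for two samples (names only, no arguments).
sample_a = ["CreateFileW", "WriteFile", "CloseHandle", "RegSetValueExW"]
sample_b = ["CreateFileW", "WriteFile", "CloseHandle", "DeleteFileW"]

# Compare the two traces at two n-gram sizes.
sim_2 = jaccard(ngrams(sample_a, 2), ngrams(sample_b, 2))
sim_3 = jaccard(ngrams(sample_a, 3), ngrams(sample_b, 3))
print(f"bigram similarity:  {sim_2:.3f}")
print(f"trigram similarity: {sim_3:.3f}")
```

One plausible way to use such a score for classification is to assign an unknown sample the label of the known sample it is most similar to. That step is an assumption here, since the abstract does not describe the classifier in detail.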
Description: Copyright belongs to proceeding publisher
URI: http://hdl.handle.net/2080/4632
Appears in Collections: Conference Papers

Files in This Item:
File: 2024_IEEE_DPMohapatra_NLP.pdf
Size: 1.06 MB
Format: Adobe PDF

