Please use this identifier to cite or link to this item: http://hdl.handle.net/2080/4733
Full metadata record
DC Field | Value | Language
dc.contributor.author | Nayak, Subhashish | -
dc.contributor.author | Dash, Swayam Smruti | -
dc.contributor.author | Khilar, Pabitra Mohan | -
dc.date.accessioned | 2024-11-05T11:43:09Z | -
dc.date.available | 2024-11-05T11:43:09Z | -
dc.date.issued | 2024-10 | -
dc.identifier.citation | 3rd IEEE International Conference on Computer Vision and Machine Intelligence (IEEE CVMI), IIIT Allahabad, Prayagraj, India, 19-20 October 2024 | en_US
dc.identifier.uri | http://hdl.handle.net/2080/4733 | -
dc.description | Copyright belongs to the proceedings publisher | en_US
dc.description.abstract | In this emerging technological era, data is the new oil. Missing values have long posed a major challenge to machine learning, statistics, data mining and other data-driven fields. Because discovering meaningful information is essential, various imputation methods exist to handle missing data. However, the most widely used approach to missing values in a huge dataset is simply to discard them, which loses crucial information. A novel imputation method is therefore needed to handle those missing values. Soft clustering-based approaches are widely employed in current data imputation techniques. This paper proposes an accurate Fuzzy C-Means (FCM) clustering approach that uses membership values for weighted imputation. The contributions include a novel methodology for estimating missing values in healthcare datasets that retains the dataset’s underlying distribution while maintaining vital information, a proposed workflow, and handling of both numerical and categorical data types. This multi-step procedure outperformed other state-of-the-art methods, namely Mean imputation and Fuzzy C-Means with Genetic Algorithm (FCMGA), yielding more accurate and more representative results. The experimentation is carried out on two benchmark datasets to assess the efficacy of the proposed approach. The proposed method gave significantly improved MSE, NRMSE, UCE and CCD scores on the Diabetes and Heart datasets. | en_US
dc.subject | Clustering | en_US
dc.subject | Fuzzy C-Means | en_US
dc.subject | Imputation | en_US
dc.subject | k-NN | en_US
dc.subject | Optimization | en_US
dc.subject | Data Analysis | en_US
dc.subject | Big Data | en_US
dc.subject | Healthcare | en_US
dc.subject | Machine Learning | en_US
dc.title | A Multi-Step Fuzzy C-Means Approach for Accurate Data Imputation in Healthcare | en_US
dc.type | Article | en_US
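
The abstract describes imputation weighted by Fuzzy C-Means membership values but does not spell out the individual steps. The following is a minimal illustrative sketch of that general idea, not the authors' multi-step algorithm. It assumes numeric features only, fits cluster centers on the complete rows, computes memberships for incomplete rows from their observed features, and fills each missing entry with the membership-weighted average of the center values. All function names and parameters (`fit_fcm`, `impute`, `n_clusters`, the fuzzifier `m`) are hypothetical.

```python
import random


def sq_dist(row, center, dims):
    """Squared Euclidean distance restricted to the feature indices in dims."""
    return sum((row[j] - center[j]) ** 2 for j in dims)


def memberships(row, centers, dims, m=2.0):
    """Standard FCM membership of one row in each cluster (fuzzifier m > 1)."""
    d = [sq_dist(row, c, dims) for c in centers]
    # A row lying exactly on one or more centers is shared among those centers.
    if any(x == 0.0 for x in d):
        hits = [1.0 if x == 0.0 else 0.0 for x in d]
        return [h / sum(hits) for h in hits]
    inv = [x ** (-1.0 / (m - 1.0)) for x in d]
    total = sum(inv)
    return [v / total for v in inv]


def fit_fcm(rows, n_clusters=2, m=2.0, n_iter=100, tol=1e-9, seed=0):
    """Fit cluster centers on complete rows with plain Fuzzy C-Means."""
    rng = random.Random(seed)
    dims = range(len(rows[0]))
    # Assumption: centers are initialised from randomly chosen data rows.
    centers = [list(r) for r in rng.sample(rows, n_clusters)]
    for _ in range(n_iter):
        u = [memberships(r, centers, dims, m) for r in rows]
        new_centers = []
        for k in range(n_clusters):
            w = [u_i[k] ** m for u_i in u]
            new_centers.append(
                [sum(wi * r[j] for wi, r in zip(w, rows)) / sum(w) for j in dims]
            )
        shift = max(sq_dist(a, b, dims) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers


def impute(data, n_clusters=2, m=2.0):
    """Fill None entries with the membership-weighted mix of center values."""
    complete = [r for r in data if None not in r]
    centers = fit_fcm(complete, n_clusters, m)
    filled = []
    for r in data:
        observed = [j for j, v in enumerate(r) if v is not None]
        if len(observed) == len(r):
            filled.append(list(r))
            continue
        # Memberships use only the features that are actually observed.
        u = memberships(r, centers, observed, m)
        filled.append([
            v if v is not None else sum(uk * c[j] for uk, c in zip(u, centers))
            for j, v in enumerate(r)
        ])
    return filled


def mse(true_vals, est_vals):
    """Mean squared error between held-out true values and imputed values."""
    return sum((t - e) ** 2 for t, e in zip(true_vals, est_vals)) / len(true_vals)


# Toy data (made up for illustration): two well-separated groups, two gaps.
data = [
    [1.0, 1.1], [0.9, 1.0], [1.1, 0.9],
    [5.0, 5.1], [4.9, 5.0], [5.1, 4.9],
    [1.0, None], [5.0, None],
]
filled = impute(data)
```

On real data one would typically scale the features first and encode categorical columns separately; how the paper handles categorical data and computes NRMSE, UCE and CCD is not stated in this record, so those parts are left out of the sketch.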
Appears in Collections: Conference Papers

Files in This Item:
File | Description | Size | Format
2024_IEE-CVMI_SNayak_AMulti-Step.pdf | | 68.29 kB | Adobe PDF

