A Multi-Step Fuzzy C-Means Approach for Accurate Data Imputation in Healthcare

Please use this identifier to cite or link to this item: http://hdl.handle.net/2080/4733

Full metadata record

DC Field	Value	Language
dc.contributor.author	Nayak, Subhashish	-
dc.contributor.author	Dash, Swayam Smruti	-
dc.contributor.author	Khilar, Pabitra Mohan	-
dc.date.accessioned	2024-11-05T11:43:09Z	-
dc.date.available	2024-11-05T11:43:09Z	-
dc.date.issued	2024-10	-
dc.identifier.citation	3rd IEEE International Conference on Computer Vision and Machine Intelligence (IEEE CVMI), IIIT Allahabad, Prayagraj, India, 19-20 October 2024	en_US
dc.identifier.uri	http://hdl.handle.net/2080/4733	-
dc.description	Copyright belongs to proceeding publisher	en_US
dc.description.abstract	In this emerging technological era, data is the new oil. Missing values have long posed a major challenge to machine learning, statistics, data mining, and other data-driven fields. Because discovering meaningful information is essential, various imputation methods exist to handle missing data. However, the most widely used approach for a large dataset is still to discard the missing values, which loses crucial information. A more accurate imputation method is therefore needed. Soft clustering-based approaches are widely employed in current data imputation techniques. This paper proposes an accurate Fuzzy C-Means (FCM) clustering approach that uses membership values for weighted imputation. The contributions include a novel methodology for estimating missing values in healthcare datasets that retains the dataset's underlying distribution and vital information, a proposed workflow, and handling of both numerical and categorical data types. This multi-step procedure yielded more accurate and representative results than the other state-of-the-art methods compared: Mean imputation and Fuzzy C-Means with Genetic Algorithm (FCMGA). Experiments on two benchmark datasets assess the efficacy of the proposed approach, which gave significantly improved MSE, NRMSE, UCE and CCD scores on the Diabetes and Heart datasets.	en_US
dc.subject	Clustering	en_US
dc.subject	Fuzzy C-Means	en_US
dc.subject	Imputation	en_US
dc.subject	k-NN	en_US
dc.subject	Optimization	en_US
dc.subject	Data Analysis	en_US
dc.subject	Big Data	en_US
dc.subject	Healthcare	en_US
dc.subject	Machine Learning	en_US
dc.title	A Multi-Step Fuzzy C-Means Approach for Accurate Data Imputation in Healthcare	en_US
dc.type	Article	en_US
Appears in Collections:	Conference Papers
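
Illustrative note: the abstract describes imputation weighted by Fuzzy C-Means membership values, but this record does not give the algorithm itself. The sketch below is only a generic, minimal illustration of that idea, not the authors' multi-step method. It assumes numerical data only, column-mean pre-filling before clustering, and a plain NumPy FCM; the function names fcm and fcm_impute and all parameter defaults are hypothetical.

```python
import numpy as np


def fcm(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means on a complete matrix X of shape (n_samples, n_features).

    Returns the cluster centers and the membership matrix U, whose rows sum to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Each center is the fuzzily weighted mean of all samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance of every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def fcm_impute(X, n_clusters=2, m=2.0):
    """Replace NaN entries with a membership-weighted average of cluster centers.

    Assumption (not from the paper): missing entries are pre-filled with
    column means so that standard FCM can run on a complete matrix.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    centers, U = fcm(filled, n_clusters=n_clusters, m=m)
    estimates = U @ centers  # shape (n_samples, n_features)
    # Observed entries are kept as-is; only missing ones are estimated.
    return np.where(missing, estimates, X)


if __name__ == "__main__":
    data = np.array([
        [1.0, 2.0],
        [1.2, np.nan],
        [0.9, 2.1],
        [8.0, 9.0],
        [8.2, 9.1],
        [7.9, 8.8],
    ])
    print(fcm_impute(data, n_clusters=2))
```

Because each estimate is a convex combination of cluster centers, an imputed value always stays within the observed range of its column. Categorical attributes, which the abstract says the proposed method handles, are outside the scope of this sketch.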

Files in This Item:

File	Description	Size	Format
2024_IEE-CVMI_SNayak_AMulti-Step.pdf		68.29 kB	Adobe PDF
