Please use this identifier to cite or link to this item: http://hdl.handle.net/2080/4733
Title: A Multi-Step Fuzzy C-Means Approach for Accurate Data Imputation in Healthcare
Authors: Nayak, Subhashish
Dash, Swayam Smruti
Khilar, Pabitra Mohan
Keywords: Clustering
Fuzzy C-Means
Imputation
k-NN
Optimization
Data Analysis
Big Data
Healthcare
Machine Learning
Issue Date: Oct-2024
Citation: 3rd IEEE International Conference on Computer Vision and Machine Intelligence (IEEE CVMI), IIIT Allahabad, Prayagraj, India, 19-20 October 2024
Abstract: In this emerging technological era, data is the new oil. Missing values have long posed a major challenge to machine learning, statistics, data mining, and other data-driven fields. Because discovering meaningful information is essential, various imputation methods exist to handle missing data. However, the most widely used approach to missing values in a large dataset is to discard them, which loses crucial information. A novel imputation method is therefore needed to handle those missing values. Soft clustering-based approaches are widely employed in current data imputation techniques. This paper proposes an accurate Fuzzy C-Means (FCM) clustering approach and integrates it with membership values for weighted imputation. The contributions include a novel methodology for estimating missing values in healthcare datasets that retains the dataset’s underlying distribution while preserving vital information, a proposed workflow, and support for both numerical and categorical data types. This multi-step procedure yielded more accurate and representative results than other state-of-the-art methods, namely Mean imputation and Fuzzy C-Means with Genetic Algorithm (FCMGA). Experiments are carried out on two benchmark datasets to assess the efficacy of the proposed approach. The proposed method gave significantly improved MSE, NRMSE, UCE and CCD scores on the Diabetes and Heart datasets.
Description: Copyright belongs to proceeding publisher
URI: http://hdl.handle.net/2080/4733
Appears in Collections: Conference Papers

Files in This Item:
File: 2024_IEE-CVMI_SNayak_AMulti-Step.pdf
Size: 68.29 kB
Format: Adobe PDF

