Please use this identifier to cite or link to this item:
http://hdl.handle.net/2080/4733
Title: | A Multi-Step Fuzzy C-Means Approach for Accurate Data Imputation in Healthcare |
Authors: | Nayak, Subhashish; Dash, Swayam Smruti; Khilar, Pabitra Mohan |
Keywords: | Clustering; Fuzzy C-Means; Imputation; k-NN; Optimization; Data Analysis; Big Data; Healthcare; Machine Learning |
Issue Date: | Oct-2024 |
Citation: | 3rd IEEE International Conference on Computer Vision and Machine Intelligence (IEEE CVMI), IIIT Allahabad, Prayagraj, India, 19-20 October 2024 |
Abstract: | In this emerging technological era, data is the new oil. Missing values have long posed a major challenge to machine learning, statistics, data mining, and other data-driven fields. Because discovering meaningful information is essential, various imputation methods exist to handle missing data. However, the most widely used approach to missing values in a large dataset is to discard them, which loses crucial information. A novel imputation method is therefore needed to handle those missing values. Soft clustering-based approaches are widely employed in current data imputation techniques. This paper proposes an accurate Fuzzy C-Means (FCM) clustering approach and integrates it with membership values for weighted imputation. The contributions include a novel methodology for estimating missing values in healthcare datasets that retains the dataset’s underlying distribution and vital information, a proposed workflow, and handling of both numerical and categorical data types. This multi-step procedure yielded more accurate and more representative results than other state-of-the-art methods, namely Mean imputation and Fuzzy C-Means with Genetic Algorithm (FCMGA). Experiments are carried out on two benchmark datasets to assess the efficacy of the proposed approach, which gave significantly improved MSE, NRMSE, UCE and CCD scores on the Diabetes and Heart datasets. |
Description: | Copyright belongs to the proceedings publisher |
URI: | http://hdl.handle.net/2080/4733 |
Appears in Collections: | Conference Papers |
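
The abstract above describes imputation weighted by Fuzzy C-Means membership values but gives no algorithmic detail, so the following is only a minimal sketch of the general idea and not the authors' multi-step procedure. It assumes numerical features only, missing entries marked as `None`, column-mean initial fills, evenly spaced rows as initial centres, and a fuzzifier `m = 2`; the function name `fcm_impute` and all parameters are hypothetical.

```python
import math


def fcm_impute(rows, n_clusters=2, m=2.0, n_iter=50):
    """Fill None entries with a membership-weighted average of FCM centres.

    Illustrative sketch only (assumption): not the method from the paper.
    Every column must contain at least one observed value.
    """
    n, d = len(rows), len(rows[0])

    # Start from column means so every row has a complete working copy.
    col_means = []
    for j in range(d):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(sum(observed) / len(observed))
    data = [[col_means[j] if r[j] is None else r[j] for j in range(d)]
            for r in rows]

    # Deterministic initial centres: evenly spaced rows (an assumption).
    centres = [list(data[k * n // n_clusters]) for k in range(n_clusters)]

    for _ in range(n_iter):
        # Standard FCM membership update: u_ik proportional to d_ik^(-2/(m-1)).
        memberships = []
        for x in data:
            dist = [max(math.dist(x, c), 1e-12) for c in centres]
            inv = [dd ** (-2.0 / (m - 1.0)) for dd in dist]
            total = sum(inv)
            memberships.append([v / total for v in inv])

        # Standard FCM centre update: mean of the data weighted by u_ik^m.
        for k in range(n_clusters):
            w = [memberships[i][k] ** m for i in range(n)]
            tot = sum(w)
            centres[k] = [sum(w[i] * data[i][j] for i in range(n)) / tot
                          for j in range(d)]

        # Re-estimate only the originally missing cells from the centres,
        # weighting each centre by the row's membership in that cluster.
        for i, r in enumerate(rows):
            for j in range(d):
                if r[j] is None:
                    data[i][j] = sum(memberships[i][k] * centres[k][j]
                                     for k in range(n_clusters))
    return data
```

Observed values are never modified; only cells that were `None` are re-estimated on each pass, so a row that sits close to one cluster receives a value close to that cluster's centre. Categorical features, which the abstract says the proposed method handles, are outside the scope of this sketch.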
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
2024_IEE-CVMI_SNayak_AMulti-Step.pdf | | 68.29 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.