Comparison of machine learning algorithms in statistically imputed water potability dataset
Keywords: ANN, K Nearest Neighbor, LR, missing values, RF
Lack of safe drinking water is a growing concern. Because missing data is common in most available datasets, this study aims to identify the algorithm that performs best on a statistically imputed dataset and gives the best prediction of whether water is potable. Water potability is predicted from a dataset of nine features using four algorithms. Three of these features, specifically pH, chloramine, and trihalomethane, have missing values in the dataset. Each missing value is replaced by the median of its feature. The performance of four machine learning algorithms, LR, K-NN, RF, and ANN, is compared under these conditions. In our experiments, RF with 700 decision trees at a maximum depth of 30 is the best-performing algorithm for the statistically imputed water potability dataset. The study answers the question of which algorithm performs best, but further study is needed to optimize that algorithm in order to provide the best prediction.
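The median imputation step described in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library; the function name `impute_median`, the column names, and the toy values are assumptions made for illustration and are not taken from the study's dataset or code.

```python
from statistics import median


def impute_median(rows, columns):
    """Return (imputed_rows, medians).

    Each None in the listed columns is replaced by the median of the
    observed (non-missing) values of that column. The input rows are
    not modified.
    """
    medians = {}
    for col in columns:
        observed = [row[col] for row in rows if row[col] is not None]
        medians[col] = median(observed)

    imputed = [
        {
            key: (medians[key] if key in medians and value is None else value)
            for key, value in row.items()
        }
        for row in rows
    ]
    return imputed, medians


# Toy rows with made-up values; None marks a missing measurement.
rows = [
    {"ph": 7.0, "chloramines": 6.5, "trihalomethanes": 80.0, "potability": 0},
    {"ph": None, "chloramines": 7.5, "trihalomethanes": 60.0, "potability": 1},
    {"ph": 8.0, "chloramines": None, "trihalomethanes": None, "potability": 0},
    {"ph": 6.0, "chloramines": 8.5, "trihalomethanes": 70.0, "potability": 1},
]

imputed, medians = impute_median(rows, ["ph", "chloramines", "trihalomethanes"])
```

After imputation, the completed feature columns could be passed to any classifier. With scikit-learn, for example, a random forest matching the configuration reported here would be constructed as `RandomForestClassifier(n_estimators=700, max_depth=30)`; whether the study used that library is not stated.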
Copyright (c) 2022 JIEE and the authors
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Upon acceptance of an article, the copyright for the published work remains with JIEE, Thapathali Campus, and the authors.