Effects of Data Imputation Methods on Data Missingness in Data Mining

Marvin L. Brown, Chien-Hua Mike Lin

Abstract


The purpose of this paper is to study the
effectiveness of data imputation methods in dealing
with data missingness in the data mining phase of
Knowledge Discovery in Databases (KDD). Applying
data mining techniques without careful consideration
of missing data can produce biased results and skewed
conclusions. This research explores the impact of data
missingness at various levels in KDD models that
employ neural networks as the primary data mining
algorithm. Four of the most commonly used data
imputation methods (Case Deletion, Mean Substitution,
Regression Imputation, and Multiple Imputation) were
evaluated using root mean square (RMS) values, ANOVA
testing, t-tests, and Tukey's Honestly Significant
Difference (HSD) test to assess differences in
performance between various knowledge discovery and
neural network models, both in the presence and in the
absence of missing data.
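The abstract does not describe how the methods were implemented, so the following is only a minimal Python sketch under stated assumptions: synthetic data with one predictor x and one target y, where y is missing completely at random (MCAR) at a hypothetical 20% rate. It illustrates three of the four named methods (case deletion, mean substitution, and single regression imputation) and scores each imputation by RMS error against the known true values. Multiple imputation, the neural network models, and the ANOVA/t-test/Tukey comparisons are not shown; the data generator, the missingness rate, and every function name here are hypothetical.

```python
import math
import random
import statistics


def make_data(n, seed=0):
    """Hypothetical synthetic data: y depends linearly on x plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def mask_mcar(values, rate, seed=1):
    """Replace roughly `rate` of the values with None, independently of the data (MCAR)."""
    rng = random.Random(seed)
    return [None if rng.random() < rate else v for v in values]


def case_deletion(xs, ys):
    """Listwise deletion: keep only the rows whose y value is observed."""
    return [(x, y) for x, y in zip(xs, ys) if y is not None]


def mean_substitution(ys):
    """Fill every missing y with the mean of the observed y values."""
    mu = statistics.fmean(y for y in ys if y is not None)
    return [mu if y is None else y for y in ys]


def regression_imputation(xs, ys):
    """Fit y = a + b*x by least squares on complete cases, then predict missing y."""
    pairs = case_deletion(xs, ys)
    mx = statistics.fmean(x for x, _ in pairs)
    my = statistics.fmean(y for _, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * x if y is None else y for x, y in zip(xs, ys)]


def rms_error(truth, imputed, masked):
    """Root mean square error over only the positions that were masked as missing."""
    sq = [(t - i) ** 2 for t, i, m in zip(truth, imputed, masked) if m is None]
    return math.sqrt(sum(sq) / len(sq))


if __name__ == "__main__":
    xs, ys = make_data(200)
    ys_missing = mask_mcar(ys, rate=0.2)
    kept = case_deletion(xs, ys_missing)
    print("rows kept after case deletion:", len(kept), "of", len(ys))
    for name, filled in [
        ("mean substitution", mean_substitution(ys_missing)),
        ("regression imputation", regression_imputation(xs, ys_missing)),
    ]:
        print(name, "RMS error:", round(rms_error(ys, filled, ys_missing), 3))
```

On data like this, where y depends linearly on x, regression imputation would be expected to yield a lower RMS error than mean substitution, since mean substitution ignores the relationship with x. Case deletion imputes nothing; its cost shows up instead as a smaller sample for the downstream model.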


