Importance of data preprocessing for neural networks modeling: The case of estimating the compaction parameters of soils


Isik F., ÖZDEN G., KUNTALP M.

ENERGY EDUCATION SCIENCE AND TECHNOLOGY PART A-ENERGY SCIENCE AND RESEARCH, vol.29, no.1, pp.463-474, 2012 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 29 Issue: 1
  • Publication Date: 2012
  • Journal Name: ENERGY EDUCATION SCIENCE AND TECHNOLOGY PART A-ENERGY SCIENCE AND RESEARCH
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED)
  • Page Numbers: pp.463-474
  • Keywords: Artificial neural networks, Data transformation, Data division, Clustering analysis, Fuzzy c-means clustering, NATURAL-GAS CONSUMPTION, PREDICTION
  • Dokuz Eylül University Affiliated: Yes

Abstract

In recent years, artificial neural networks (ANNs) have been successfully applied to a variety of engineering problems to uncover unknown relationships in the problem at hand. In the majority of these applications, ANNs were used to model the non-linear relationship between the input variables and the corresponding target(s). Although ANNs have undeniable advantages, they are not without shortcomings. One of these arises at the preprocessing stage of modeling: the data preprocessing methodology (i.e., data transformation and data division) has a significant effect on the performance of ANN models. This study examines the effect of four data transformation methods (statistical normalization, min-max normalization, non-linear transformation, and whitening transformation) and two data division methods (random division and fuzzy c-means clustering) on the performance of ANN prediction models, using as a case study the prediction of the compaction parameters of both coarse- and fine-grained soils at the standard Proctor compaction energy level. The findings show that the raw data should be transformed by a data transformation method. They also show that the main data set should be subjected to clustering analysis and divided into training, testing, and validation subsets by a systematic approach. The success of particular preprocessing methods may vary for other neural network applications; nevertheless, this study demonstrates the importance of data preprocessing for neural network modelers.
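The preprocessing steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions the abstract does not give: the non-linear transformation is taken to be a natural logarithm, whitening is done by PCA, the fuzzy c-means fuzzifier is m = 2, the split fractions are 70/15/15, and the input matrix is synthetic stand-in data with three hypothetical soil-index columns. The function names are illustrative only.

```python
import numpy as np

def zscore(X):
    """Statistical normalization: zero mean, unit sample std per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def min_max(X, lo=0.0, hi=1.0):
    """Min-max normalization: rescale each column to [lo, hi]."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return lo + (X - mn) * (hi - lo) / (mx - mn)

def log_transform(X):
    """Assumed non-linear transform (natural log); needs positive inputs."""
    return np.log(X)

def whiten(X, eps=1e-12):
    """PCA whitening: decorrelate columns and scale them to unit variance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return (Xc @ eigvecs) / np.sqrt(eigvals + eps)

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Basic fuzzy c-means; returns cluster centers and membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per sample
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = np.fmax(d, 1e-12) ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def cluster_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Draw train/test/validation indices from every cluster in proportion."""
    rng = np.random.default_rng(seed)
    subsets = [[], [], []]
    for k in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == k))
        n_train = int(round(fractions[0] * len(idx)))
        n_test = int(round(fractions[1] * len(idx)))
        subsets[0].extend(idx[:n_train])
        subsets[1].extend(idx[n_train:n_train + n_test])
        subsets[2].extend(idx[n_train + n_test:])   # remainder goes to validation
    return [np.array(s, dtype=int) for s in subsets]

# Synthetic stand-in data (NOT from the paper): 60 samples, 3 hypothetical inputs.
rng = np.random.default_rng(0)
X = rng.uniform(low=[5.0, 20.0, 10.0], high=[95.0, 60.0, 30.0], size=(60, 3))

Z = zscore(X)                                   # transform first, then cluster
centers, U = fuzzy_c_means(Z, n_clusters=3)
labels = U.argmax(axis=1)                       # hard label = highest membership
train_idx, test_idx, val_idx = cluster_split(labels)
```

Sampling each subset from every cluster is one way to read the "systematic approach" to data division: it aims to keep the training, testing, and validation sets statistically similar, which purely random division does not guarantee on small data sets.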