Importance of data preprocessing for neural networks modeling: The case of estimating the compaction parameters of soils

Isik, Fatih; Özden, Gürkan; Kuntalp, Mehmet

ENERGY EDUCATION SCIENCE AND TECHNOLOGY PART A-ENERGY SCIENCE AND RESEARCH, vol.29, no.1, pp.463-474, 2012 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 29 Issue: 1
Publication Date: 2012
Journal Name: ENERGY EDUCATION SCIENCE AND TECHNOLOGY PART A-ENERGY SCIENCE AND RESEARCH
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED)
Pages: pp.463-474
Keywords: Artificial neural networks, Data transformation, Data division, Clustering analysis, Fuzzy c-means clustering
Dokuz Eylül University Affiliated: Yes

Abstract

In recent years, artificial neural networks (ANNs) have been successfully applied to a variety of engineering problems to uncover the unknown behavior of the problem at hand. In the majority of these applications, ANNs were used to predict the non-linear relationship between the input variables and the corresponding target(s). Although ANNs have undeniable advantages, they are not faultless. One of their shortcomings arises at the preprocessing stage of modeling: the data preprocessing methodologies (i.e. data transformation and data division) have a significant effect on the performance of ANN models. This study examines the effect of four data transformation methods (statistical normalization, min-max normalization, non-linear transformation and whitening transformation) and two data division methods (random division and fuzzy c-means clustering) on the performance of ANN models that predict the compaction parameters of both coarse- and fine-grained soils at the standard Proctor compaction energy level. The findings reveal that raw data should be transformed by a data transformation method. They also show that the main data set should be subjected to clustering analysis and divided into training, testing and validation subsets by a systematic approach. The success of particular preprocessing methods may vary for other neural network applications. Nevertheless, this study shows the importance of data preprocessing for neural network modelers.
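
The preprocessing steps named in the abstract can be illustrated with a minimal NumPy sketch. The abstract gives no formulas, so everything below is an assumption rather than the authors' implementation: the textbook definitions of statistical (z-score) and min-max normalization, a log(1 + x) transform standing in for the unspecified non-linear transformation, PCA whitening, a plain fuzzy c-means loop, and a hypothetical helper `split_by_cluster` with illustrative 70/15/15 proportions. The data are synthetic, not the soil data set from the study.

```python
import numpy as np

def zscore(X):
    """Statistical normalization: zero mean, unit variance per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def min_max(X, lo=0.0, hi=1.0):
    """Min-max normalization: rescale each column to [lo, hi]."""
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    return lo + (hi - lo) * (X - xmin) / (xmax - xmin)

def log_transform(X):
    """Assumed non-linear transformation: log(1 + x) for x >= 0."""
    return np.log1p(X)

def whiten(X, eps=1e-12):
    """PCA whitening: decorrelate columns and scale them to unit variance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ eigvecs / np.sqrt(eigvals + eps)

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; returns centers and an (n, c) membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)           # memberships sum to 1 per sample
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

def split_by_cluster(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Draw training/testing/validation indices from every cluster alike."""
    rng = np.random.default_rng(seed)
    parts = ([], [], [])
    for k in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == k))
        n_train = int(round(fractions[0] * len(idx)))
        n_test = int(round(fractions[1] * len(idx)))
        parts[0].extend(idx[:n_train])
        parts[1].extend(idx[n_train:n_train + n_test])
        parts[2].extend(idx[n_train + n_test:])  # remainder goes to validation
    return tuple(np.array(p, dtype=int) for p in parts)

# Synthetic stand-in data with four columns on different scales.
rng = np.random.default_rng(42)
X = rng.random((120, 4)) * np.array([100.0, 50.0, 30.0, 10.0])

Z = zscore(X)
M = min_max(X)
Wh = whiten(X)

# Cluster the scaled data, assign each sample to its strongest cluster,
# then sample every subset from every cluster.
centers, U = fuzzy_c_means(M, c=3)
labels = U.argmax(axis=1)
train_idx, test_idx, val_idx = split_by_cluster(labels)
```

Drawing each subset from every cluster is one way to make the division systematic, since the training, testing and validation sets then cover the same regions of the input space. A purely random division offers no such guarantee on small data sets.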