IMPUTING MISSING DATA IN A SWAT WATER QUALITY MODELLING STUDY USING STATISTICAL METHODS


Boyacıoğlu H., Kaya Uyar M., Boyacıoğlu H.

ENVIRONMENTAL ENGINEERING AND MANAGEMENT JOURNAL, vol.23, no.3, pp.579-586, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 23 Issue: 3
  • Publication Date: 2024
  • Journal Name: ENVIRONMENTAL ENGINEERING AND MANAGEMENT JOURNAL
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), CAB Abstracts, Environment Index, Greenfile, Pollution Abstracts, Veterinary Science Database
  • Page Numbers: pp.579-586
  • Dokuz Eylül University Affiliated: Yes

Abstract

Large water-quality databases are valuable in modeling studies that aim to identify optimal measures for pollution mitigation and river basin management. The objective of this study was to apply statistical methods to impute missing data in a water-quality simulation study of the Küçük Menderes River Basin, Türkiye, where missing data caused by a lack of periodic sampling is a major challenge. The Soil and Water Assessment Tool (SWAT) was used to simulate nitrate-nitrogen (NO3-N) concentrations. Water-quality data collected at the basin outlet between 2001 and 2012 were subjected to regression-based imputation, in which simple regression models were developed to estimate the missing values. The resulting continuous data set was then used to calibrate and validate the SWAT water-quality model. Because the calculated Nash–Sutcliffe model efficiency coefficient (NSE) values were above 0.65, the model simulations were judged "good". In addition, the Mann–Whitney U test was applied to assess model performance by comparing the continuous data generated by the SWAT model with the limited observed water-quality data. The results indicate that simple regression models can be used to impute missing data, and the non-parametric Mann–Whitney U test to evaluate model performance, in modeling studies of data-scarce basins.
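The three statistical steps named in the abstract (simple-regression imputation, the Nash–Sutcliffe efficiency, and the Mann–Whitney U test) can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say which explanatory variable the regression used, so `predictor` (for example, a complete streamflow record) is a hypothetical placeholder; missing observations are assumed to be stored as `None`; and the U test uses the two-sided normal approximation with a tie correction and no continuity correction, which may differ from the software used in the study. All function names are hypothetical.

```python
"""Sketch of regression imputation, NSE and a Mann-Whitney U test.

Assumption: `predictor` is a hypothetical complete series; the abstract does
not name the variable the authors regressed NO3-N on.
"""
import math
from statistics import NormalDist


def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def impute_missing(predictor, target):
    """Replace None entries in target with regression estimates from predictor.

    The model is fitted only on dates where target was actually observed.
    """
    pairs = [(p, t) for p, t in zip(predictor, target) if t is not None]
    a, b = fit_simple_regression([p for p, _ in pairs], [t for _, t in pairs])
    return [t if t is not None else a + b * p for p, t in zip(predictor, target)]


def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((O - S)^2) / sum((O - mean(O))^2); 1.0 is a perfect fit."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def _midranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for sample x, p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = _midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, min(p, 1.0)
```

In a workflow like the one described, one would impute the gaps in the observed NO3-N series, calibrate against the filled series and check that `nash_sutcliffe` exceeds the chosen threshold (0.65 in the abstract), then pass the simulated series and the original, unfilled observations to `mann_whitney_u`; a large p-value means the test found no significant difference between the two distributions.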