High-temporal resolution ozone forecasting and missing data imputation in a topographically influenced agricultural region using Gaussian process regression


GÜNDOĞDU S., ELBİR T.

Environmental Research Communications, vol.8, no.5, 2026 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 8 Issue: 5
  • Publication Date: 2026
  • DOI: 10.1088/2515-7620/ae67e5
  • Journal Name: Environmental Research Communications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Keywords: Gaussian process regression, machine learning, ozone, prediction
  • Dokuz Eylül University Affiliated: Yes

Abstract

Ground-level ozone (O3) is a secondary air pollutant with pronounced adverse effects on human health, ecosystems, and atmospheric chemistry. Its formation and temporal variability are strongly governed by complex, nonlinear interactions between precursor emissions and meteorological conditions. This study presents a robust probabilistic machine learning framework based on Gaussian process regression (GPR) for forecasting hourly O3 concentrations at a semi-urban site in Türkiye, located within a major agricultural region where accurate O3 prediction is particularly critical due to topographical constraints and enhanced summertime photochemical activity. Unlike traditional deterministic models, the GPR approach was selected for its ability to capture intricate spatiotemporal nonlinear dependencies. By leveraging high-quality ERA5 reanalysis meteorological data alongside local observations, the performance of the GPR framework was benchmarked against four widely recognized algorithms: support vector machine, wide neural network, regression tree, and linear regression. The GPR model, utilizing a Matérn 3/2 kernel and optimized via grid search, emerged as the superior framework, achieving a high correlation coefficient (R) of 0.96 and a low root mean square error of 6.36 μg m−3. Feature importance analysis confirmed that the boundary layer height and 2 m temperature were the most influential predictors. Beyond point forecasting, the optimized GPR framework demonstrated substantial utility as a robust imputation tool, successfully reconstructing 3853 missing hourly O3 observations to ensure data integrity for long-term trend analysis. The results confirm that the GPR-based approach provides a scientifically defensible, highly accurate decision-support tool for air quality management in topographically complex agricultural regions.
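The workflow described in the abstract (a GPR with a Matérn 3/2 kernel, tuned by grid search, then reused to impute missing hourly values) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not state the software, the search grid, or the data split, so everything here is an assumption. The data are synthetic, the two predictors only loosely mimic 2 m temperature and boundary layer height, and the names `matern32`, `gpr_predict`, `noise_var`, and the length-scale grid are hypothetical.

```python
import numpy as np

def matern32(xa, xb, length_scale, variance):
    """Matern 3/2 kernel: variance * (1 + z) * exp(-z), with z = sqrt(3) * r / length_scale."""
    # Pairwise Euclidean distances between the rows of xa and xb.
    r = np.sqrt(((xa[:, None, :] - xb[None, :, :]) ** 2).sum(axis=-1))
    z = np.sqrt(3.0) * r / length_scale
    return variance * (1.0 + z) * np.exp(-z)

def gpr_predict(x_train, y_train, x_test, length_scale, noise_var):
    """Posterior mean and standard deviation of a GP with a constant mean."""
    y_mean = y_train.mean()
    variance = y_train.var()  # assumed signal variance, taken from the data
    k = matern32(x_train, x_train, length_scale, variance)
    k[np.diag_indices_from(k)] += noise_var  # observation noise on the diagonal
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    k_star = matern32(x_test, x_train, length_scale, variance)
    mean = k_star @ alpha + y_mean
    v = np.linalg.solve(chol, k_star.T)
    var = variance - (v ** 2).sum(axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic hourly series (10 days); all values are illustrative only.
rng = np.random.default_rng(0)
hours = np.arange(240)
diurnal = np.sin(2.0 * np.pi * (hours % 24) / 24.0 - np.pi / 2.0)
t2m = 20.0 + 8.0 * diurnal + rng.normal(0.0, 0.5, hours.size)
blh = 800.0 + 600.0 * np.clip(diurnal, 0.0, None) + rng.normal(0.0, 30.0, hours.size)
o3_true = 40.0 + 2.5 * (t2m - 20.0) + 0.02 * (blh - 800.0) + rng.normal(0.0, 3.0, hours.size)

# Standardize the predictors so one length scale suits both columns.
raw = np.column_stack([t2m, blh])
X = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# Knock out roughly 20 % of the hours to emulate gaps in the monitor record.
missing = rng.random(hours.size) < 0.2
o3_obs = o3_true.copy()
o3_obs[missing] = np.nan

# Grid search over the length scale, using a holdout of the observed hours.
obs_idx = np.flatnonzero(~missing)
val_idx = obs_idx[::4]
trn_idx = np.setdiff1d(obs_idx, val_idx)
grid = [0.5, 1.0, 2.0, 4.0]
noise_var = 9.0  # assumed observation-noise variance
best_ls = min(
    grid,
    key=lambda ls: rmse(
        gpr_predict(X[trn_idx], o3_obs[trn_idx], X[val_idx], ls, noise_var)[0],
        o3_obs[val_idx],
    ),
)

# Refit on every observed hour and impute the gaps with the posterior mean.
mean, std = gpr_predict(X[obs_idx], o3_obs[obs_idx], X[missing], best_ls, noise_var)
filled = o3_obs.copy()
filled[missing] = mean

gap_rmse = rmse(mean, o3_true[missing])
gap_r = float(np.corrcoef(mean, o3_true[missing])[0, 1])
print(f"length scale={best_ls}, RMSE={gap_rmse:.2f}, R={gap_r:.2f}, gaps filled={missing.sum()}")
```

Because the truth is known for the synthetic gaps, the sketch can report RMSE and R on the imputed hours; with real monitor data those scores would have to come from artificially withheld observations. The posterior standard deviation `std` is what distinguishes a GPR from the deterministic baselines named in the abstract, since each imputed value carries its own uncertainty estimate.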