Comparison of Decision Tree Algorithms for Predicting Potential Air Pollutant Emissions with Data Mining Models


Birant D.

JOURNAL OF ENVIRONMENTAL INFORMATICS, vol.17, no.1, pp.46-53, 2011 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 17 Issue: 1
  • Publication Date: 2011
  • DOI Number: 10.3808/jei.201100186
  • Journal Name: JOURNAL OF ENVIRONMENTAL INFORMATICS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.46-53
  • Keywords: air pollution, data mining, classification and prediction, decision support systems, artificial intelligence, NEURAL-NETWORK, QUALITY, SYSTEM, SO2, NO2
  • Dokuz Eylül University Affiliated: Yes

Abstract

Predicting air pollutant emissions from potential industrial installations is important for controlling air pollution and for planning future air quality management. This paper proposes classifying and predicting the emission levels of industrial air pollutant sources using the decision tree technique. It compares eleven decision tree algorithms (C4.5, CART, NBTree, BFTree, LADTree, REPTree, Random Tree, Random Forest, LMT, FT and Decision Stump) in terms of running time, classification accuracy and applicability. Six performance metrics were used in the comparison: classification accuracy, precision, recall, F-measure, mean absolute error and mean squared error. The aim of the study is to determine the best classifier as a data mining model for predicting the emission level of an industrial plant (the dependent variable) from known values of the independent variables: the physical region of the plant, the height of the plant, working hours, the height of the stack, the diameter of the stack, the velocity of the waste in the stack, the temperature of the waste in the stack, plume rise, source classification code, control equipment type and emissions method code. In the experimental studies, all of these algorithms were applied to a dataset consisting of the sulphur oxide emission levels of industrial pollutants in Izmir. According to the results, the C4.5 algorithm has the highest accuracy, while the Decision Stump algorithm is the fastest. The average classification accuracy of 82.4% empirically shows the benefit of using the decision tree technique for classifying and predicting emission levels.
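
As a rough illustration of the six metrics named in the abstract, the following minimal sketch computes them for one classifier's predictions. Everything in it is an assumption made for illustration: the function name `evaluate`, the three integer-coded emission levels (0 = low, 1 = medium, 2 = high) and the toy labels are hypothetical, the per-class precision, recall and F-measure are averaged with class-frequency weights, and the two error measures are taken over the integer-coded levels. The abstract does not state how the paper defines these quantities, so the paper's own definitions may differ.

```python
from collections import Counter


def evaluate(y_true, y_pred, labels):
    """Compute six metrics for integer-coded ordinal class labels.

    Precision, recall and F-measure are per-class values averaged with
    class-frequency weights (an assumption, not taken from the paper).
    """
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))

    # Fraction of instances whose predicted level matches the true level.
    accuracy = sum(t == p for t, p in pairs) / n

    support = Counter(y_true)
    precision = recall = f_measure = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec_c = tp / (tp + fp) if tp + fp else 0.0
        rec_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * prec_c * rec_c / (prec_c + rec_c) if prec_c + rec_c else 0.0
        weight = support[c] / n
        precision += weight * prec_c
        recall += weight * rec_c
        f_measure += weight * f_c

    # Error measures over the integer-coded levels (an assumption: this
    # treats the emission levels as ordinal, e.g. 0 < 1 < 2).
    mae = sum(abs(t - p) for t, p in pairs) / n
    mse = sum((t - p) ** 2 for t, p in pairs) / n

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "mae": mae,
        "mse": mse,
    }


# Hypothetical toy labels, not data from the study.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = evaluate(y_true, y_pred, labels=[0, 1, 2])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Running the same function on the predictions of each algorithm, together with a wall-clock timing of training, would give the kind of side-by-side table the abstract describes. Note that the frequency-weighted recall always equals the accuracy, so the two are not independent in this formulation.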