Genetic algorithms for outlier detection in multiple regression with different information criteria


Alma O. G., Kurt S., Uğur A.

JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, vol.81, no.1, pp.29-47, 2011 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 81 Issue: 1
  • Publication Date: 2011
  • Doi Number: 10.1080/00949650903136782
  • Journal Name: JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.29-47
  • Keywords: simultaneous outlier detection, AIC, BIC, and ICOMP information criterion, genetic algorithms, multiple regression, INFLUENTIAL OBSERVATIONS, COMPLEXITY CRITERIA, VARIABLE SELECTION, BAYESIAN-APPROACH, IDENTIFICATION
  • Dokuz Eylül University Affiliated: Yes

Abstract

Outliers are abnormal or aberrant observations in data that can distort the estimates of statistical models. Identifying outliers is therefore important for preventing faulty conclusions in statistical analysis. Simultaneous outlier detection, which genetic algorithms (GAs) provide, is more successful than methods that detect outliers one by one when the order of detection matters. In this study, we derived new information criteria based on Akaike's information criterion (AIC) and Bozdogan's information complexity criterion (ICOMP), and we used them as the fitness functions of GAs to detect outliers in multiple regression. The performances of the derived AIC' and ICOMP' are compared with that of the Bayesian information criterion (BIC'). Simulation results for AIC', BIC' and ICOMP', obtained for different sample sizes, penalty (kappa) values of the information criteria and different numbers of explanatory variables, are presented and discussed.
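
For illustration only, the general idea described in the abstract (a GA whose fitness function is a penalized information criterion) might be sketched as below. This is not the authors' method. The sketch assumes a binary chromosome in which a 1 flags an observation as an outlier, refits ordinary least squares on the unflagged observations, and scores the chromosome with a generic AIC-style stand-in, n*log(RSS/n) + kappa*(k + m), where k is the number of regression coefficients and m the number of flagged observations. The exact forms of AIC', BIC' and ICOMP' are not given in the abstract and are not reproduced here. The function names, the default kappa, and the GA settings (population size, tournament selection, one-point crossover, mutation rate) are all hypothetical choices.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, via the normal equations (X'X)b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, k), key=lambda row: abs(A[row][c]))
        if abs(A[piv][c]) < 1e-12:
            return math.inf  # singular design matrix
        A[c], A[piv] = A[piv], A[c]
        for row in range(k):
            if row != c:
                f = A[row][c] / A[c][c]
                A[row] = [a - f * b for a, b in zip(A[row], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def fitness(mask, X, y, kappa=3.0):
    """Assumed AIC-style criterion (lower is better); NOT the paper's AIC'."""
    n, k, m = len(y), len(X[0]), sum(mask)
    if n - m <= k:  # too few clean observations left to fit the model
        return math.inf
    Xc = [r for r, z in zip(X, mask) if not z]
    yc = [v for v, z in zip(y, mask) if not z]
    rss = ols_rss(Xc, yc)
    return n * math.log(max(rss, 1e-12) / n) + kappa * (k + m)

def ga_outliers(X, y, kappa=3.0, pop_size=40, generations=60, p_mut=0.02, seed=0):
    """Search over outlier masks with a simple elitist GA (hypothetical settings)."""
    rng = random.Random(seed)
    n = len(y)
    score = lambda c: fitness(c, X, y, kappa)
    # Start from the "no outliers" chromosome plus sparse random masks.
    pop = [[0] * n] + [[int(rng.random() < 0.1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score)
        nxt = [pop[0][:]]  # elitism: carry the best chromosome forward unchanged
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 2), key=score)  # binary tournament selection
            b = min(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, n)               # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    best = min(pop, key=score)
    return best, score(best)
```

Here `X` is a list of rows that includes a leading 1.0 for the intercept, and `ga_outliers(X, y)` returns the best mask found together with its criterion value. Because the whole mask is scored at once, several outliers are evaluated jointly rather than one at a time, which is the property the abstract attributes to GA-based detection.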