A Novel Algorithmic Model for Accurate Reference Interval Calculation in Highly Skewed Datasets


Deveci Kocakoç İ., İşbilen Başok B.

26th IFCC-EFLM EuroMedLab Congress, Brussels, Belgium, 18 - 22 May 2025, pp.1, (Summary Text)

  • Publication Type: Conference Paper / Summary Text
  • City: Brussels
  • Country: Belgium
  • Page Numbers: pp.1
  • Dokuz Eylül University Affiliated: Yes

Abstract

Aim: Various indirect methods have been proposed for establishing reference intervals (RIs). Many of them assume that test data follow a normal distribution, or attempt to transform non-normally distributed data toward normality using techniques such as Box-Cox. This can be problematic when calculating RIs, particularly for tests with skewed distributions. In this study, we aimed to compare the RIs obtained with our algorithmic model on two simulated datasets, one moderately and one highly skewed, with those generated by the RefineR and Kosmic models.

Materials and Methods: First, a moderately skewed dataset (test1) was employed, containing 100,000 simulated measurements of which the majority (90%) were non-pathological (np). The ground truth (GT) RI for test1 was 10–50 (2.5th and 97.5th percentiles). Second, a highly skewed dataset (test2) was adopted, consisting of 50,000 simulated measurements with 60% np samples and most values heavily concentrated in the lower range. The GT RI for test2 was 59.8–160. A novel method (EazyRI), coded in Python 3.6, was used; it applies distribution theory, through peak detection and distribution fitting, to better model skewed data and obtain precise RIs. The RIs of both datasets were calculated with Kosmic, RefineR and EazyRI and then compared.
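
To illustrate how a ground-truth RI relates to a simulated mixture, the following minimal sketch builds a test1-like dataset and reads the 2.5th and 97.5th percentiles off the non-pathological component. It is not the authors' simulation: the lognormal shape, the upward shift of the pathological component, the seed, and all function names are assumptions chosen only so that the np component has a central 95% interval of 10–50.

```python
import math
import random
from statistics import NormalDist


def lognormal_params(lower, upper, coverage=0.95):
    """Return (mu, sigma) of a lognormal whose central `coverage` interval is [lower, upper]."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95% coverage
    mu = (math.log(lower) + math.log(upper)) / 2
    sigma = (math.log(upper) - math.log(lower)) / (2 * z)
    return mu, sigma


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 100]) of an already sorted list."""
    k = (len(sorted_values) - 1) * p / 100
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def simulate_mixture(n, np_fraction, gt_lower, gt_upper, seed=0):
    """Simulate non-pathological and pathological values (both components are assumptions)."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(gt_lower, gt_upper)
    n_np = round(n * np_fraction)
    non_path = [rng.lognormvariate(mu, sigma) for _ in range(n_np)]
    # Assumed pathological component: same spread, shifted toward higher values.
    path = [rng.lognormvariate(mu + 1.0, sigma) for _ in range(n - n_np)]
    return non_path, path


# test1-like setup: 100,000 measurements, 90% non-pathological, GT RI of 10-50.
non_path, path = simulate_mixture(100_000, 0.90, 10, 50)
ref = sorted(non_path)
ri = (percentile(ref, 2.5), percentile(ref, 97.5))
print(f"empirical RI of the np component: {ri[0]:.2f} to {ri[1]:.2f}")
```

An indirect method sees only the pooled values (`non_path + path`) and has to recover this interval without knowing which measurements are non-pathological.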

Results: The np ratios (%) determined by RefineR for test1 and test2 were 91 and 66, respectively, while these values were 91 and 62 in EazyRI. For test1 with a GT of 10–50, the RIs determined by Kosmic, RefineR, and EazyRI were 9.8–49.77, 9.62–49.5, and 10–49.63, respectively. For test2 with a GT of 59.8–160, Kosmic could not calculate an RI, while RefineR and EazyRI produced RIs of 32.73–147.8 and 53.12–151.21, respectively.
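
The distance of each reported limit from the GT can be tabulated directly from the figures above. The short sketch below does only that arithmetic; the helper name `limit_errors` and the dictionary layout are illustrative, and no values beyond those reported are used.

```python
# Ground-truth and reported RIs, copied from the results above.
ground_truth = {"test1": (10.0, 50.0), "test2": (59.8, 160.0)}
reported = {
    "test1": {"Kosmic": (9.8, 49.77), "RefineR": (9.62, 49.5), "EazyRI": (10.0, 49.63)},
    "test2": {"RefineR": (32.73, 147.8), "EazyRI": (53.12, 151.21)},  # Kosmic gave no RI
}


def limit_errors(estimate, truth):
    """Absolute error of the (lower, upper) limits, rounded to two decimals."""
    return tuple(round(abs(e - t), 2) for e, t in zip(estimate, truth))


errors = {
    dataset: {method: limit_errors(ri, ground_truth[dataset]) for method, ri in methods.items()}
    for dataset, methods in reported.items()
}

for dataset, methods in errors.items():
    for method, (lower_err, upper_err) in methods.items():
        print(f"{dataset} {method}: lower off by {lower_err}, upper off by {upper_err}")
```

For test2 this gives absolute lower/upper limit errors of 27.07/12.2 for RefineR and 6.68/8.79 for EazyRI.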

Conclusion: EazyRI showed np separation similar to that of RefineR in the moderately skewed dataset and superior np separation in the highly skewed dataset. Notably, in the highly skewed dataset, EazyRI produced a lower reference limit that was closer to the GT (53.12 vs. 32.73 for RefineR, against a GT of 59.8). EazyRI's distribution theory approach demonstrated superior performance in both np separation and RI calculation in the highly skewed dataset.

Keywords: EazyRI, reference interval, indirect method, algorithm, distribution theory, skewed data.