PGSXplorer: an integrated nextflow pipeline for comprehensive quality control and polygenic score model development


Yaraş T., OKTAY Y., KARAKÜLAH G.

PeerJ, vol.13, no.2, 2025 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 13 Issue: 2
  • Publication Date: 2025
  • DOI: 10.7717/peerj.18973
  • Journal Name: PeerJ
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, BIOSIS, MEDLINE, Veterinary Science Database, Directory of Open Access Journals
  • Keywords: GWAS, Nextflow, PGS, Pipeline, Polygenic risk score, Polygenic score, PRS, Quality control
  • Dokuz Eylül University Affiliated: Yes

Abstract

The rapid development of next-generation sequencing technologies and genomic data sharing initiatives during the post-Human Genome Project era has catalyzed major advances in individualized medicine research. Genome-wide association studies (GWAS) have become a cornerstone of efforts towards understanding the genetic basis of complex diseases, leading to the development of polygenic scores (PGS). Despite the immense potential of PGS, the scarcity of standardized development pipelines limits their widespread adoption. Herein, we introduce PGSXplorer, a comprehensive Nextflow DSL2 pipeline that enables quality control of genomic data and automates the phasing, imputation, and construction of PGS models using reference GWAS data. PGSXplorer integrates various PGS development tools such as PLINK, PRSice-2, LD-Pred2, Lassosum2, MegaPRS, SBayesR-C, PRS-CSx and MUSSEL, improving the generalizability of PGS through multi-origin data integration. Tested with synthetic datasets, our fully Docker-encapsulated tool has demonstrated scalability and effectiveness for both single- and multi-population analyses. Continuously updated as an open-source tool, PGSXplorer is freely available with user tutorials at https://github.com/tutkuyaras/PGSXplorer, making it a valuable resource for advancing precision medicine in genetic research.