A Generating System for Creating a Training Set to Convert Table Images into HTML Files


Sadigzade M., NASİBOĞLU R., NASİBOĞLU E.

5th International Conference on Problems of Cybernetics and Informatics, PCI 2023, Baku, Azerbaijan, 28 - 30 August 2023, (Full Text Paper)

  • Publication Type: Conference Paper / Full Text Paper
  • DOI Number: 10.1109/pci60110.2023.10325951
  • City of Publication: Baku
  • Country of Publication: Azerbaijan
  • Keywords: Deep Learning, HTML, Image processing, Table detection
  • Affiliated with Dokuz Eylül Üniversitesi: Yes

Abstract

In this study, we developed a software system that automatically generates a training set of 'image to HTML' pairs for the table detection problem. The system randomly generates table images with varied designs that reflect real-world scenarios, with the designs controlled by a set of parameters. HTML files are generated first according to these parameters and are then rendered into table images using a web browser. Each table image is then paired with its corresponding HTML text to form the training set. The resulting training set can be useful to academic and industrial researchers and can be applied in automatic document conversion processes.
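
The HTML generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`random_table_html`, `random_cell_text`), the word list, and the specific parameters (row and column counts, border width, padding, font) are all assumptions chosen for illustration, since the abstract does not list the parameters actually used.

```python
import random
from html import escape

# Hypothetical vocabularies; the paper's actual parameter set is not given.
WORDS = ["Name", "Price", "Total", "Date", "Item", "Qty", "Region", "Score"]
FONTS = ["Arial", "Times New Roman", "Courier New"]


def random_cell_text(rng):
    """Return a short word or a number, chosen at random."""
    if rng.random() < 0.5:
        return rng.choice(WORDS)
    return str(rng.randint(0, 9999))


def random_table_html(rng, max_rows=6, max_cols=5):
    """Build one randomly parameterized table; return (html, n_rows, n_cols)."""
    # Sample the design parameters (assumed ones, for illustration only).
    n_rows = rng.randint(2, max_rows)
    n_cols = rng.randint(2, max_cols)
    border = rng.choice([0, 1, 2])
    padding = rng.choice([2, 4, 8])
    font = rng.choice(FONTS)

    cell_style = f"border: {border}px solid black; padding: {padding}px;"
    rows = []
    for r in range(n_rows):
        # The first row is emitted as header cells, the rest as data cells.
        tag = "th" if r == 0 else "td"
        cells = "".join(
            f'<{tag} style="{cell_style}">{escape(random_cell_text(rng))}</{tag}>'
            for _ in range(n_cols)
        )
        rows.append(f"<tr>{cells}</tr>")

    table_style = f"border-collapse: collapse; font-family: '{font}';"
    body = "".join(rows)
    html = (
        "<!DOCTYPE html><html><body>"
        f'<table style="{table_style}">{body}</table>'
        "</body></html>"
    )
    return html, n_rows, n_cols


if __name__ == "__main__":
    html, n_rows, n_cols = random_table_html(random.Random(0))
    print(n_rows, n_cols, len(html))
```

To complete a training pair, each generated HTML string would be saved to a file, opened in a web browser (for example a headless one driven by an automation tool, which is an assumption here), and captured as a screenshot. The screenshot and the HTML string together form one 'image to HTML' pair. Passing a seeded `random.Random` instance keeps the generated set reproducible.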