5th International Conference on Problems of Cybernetics and Informatics (PCI 2023), Baku, Azerbaijan, 28-30 August 2023, pp. 1-5
In this study, a software
system was developed to automatically generate a training set of “image
to HTML” pairs for the table detection problem. The system randomly generates
table images with different designs that reflect real-world scenarios, with each
design determined by a set of parameters. HTML files are first generated from
these parameters and then rendered into table images using a web
browser. A training set is then created by pairing each table image with its
corresponding HTML text. This training set can be a useful resource for academic and
industrial researchers and can be used in automatic document conversion
processes.
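
The generation pipeline described above can be illustrated with a minimal Python sketch. It is not the system developed in the study: the function names, the chosen parameters (row and column counts, border width, font family, cell padding) and their ranges are all assumptions made for illustration, as is the use of the table markup alone as the paired HTML text. The rendering step is shown only as a command line for a headless Chromium-based browser; the binary name `chromium` is likewise an assumption and varies by installation.

```python
import random
from pathlib import Path


def random_table_html(rng):
    """Return (full_page_html, table_html) for one randomly styled table.

    All parameters and ranges here are illustrative assumptions.
    """
    rows = rng.randint(2, 6)
    cols = rng.randint(2, 5)
    border = rng.choice([0, 1, 2])
    font = rng.choice(["serif", "sans-serif", "monospace"])
    padding = rng.randint(2, 10)

    # Doubled braces produce literal CSS braces inside the f-strings.
    style = (
        f"table{{border-collapse:collapse;font-family:{font}}}"
        f"td,th{{border:{border}px solid #000;padding:{padding}px}}"
    )
    header = "".join(f"<th>H{c}</th>" for c in range(cols))
    body = ""
    for _ in range(rows):
        cells = "".join(f"<td>{rng.randint(0, 999)}</td>" for _ in range(cols))
        body += f"<tr>{cells}</tr>"

    table = f"<table><tr>{header}</tr>{body}</table>"
    page = (
        "<!DOCTYPE html><html><head>"
        f"<style>{style}</style></head><body>{table}</body></html>"
    )
    return page, table


def screenshot_command(html_path, png_path, browser="chromium"):
    """Build (but do not run) a headless-browser screenshot command.

    The default binary name is an assumption; adjust it to the local setup.
    """
    return [
        browser,
        "--headless",
        f"--screenshot={png_path}",
        "--window-size=800,600",
        Path(html_path).resolve().as_uri(),
    ]


def build_pairs(n, seed=0):
    """Generate n samples, each holding the page to render and its HTML label."""
    rng = random.Random(seed)
    samples = []
    for i in range(n):
        page, table = random_table_html(rng)
        samples.append(
            {"image": f"table_{i:04d}.png", "page": page, "html": table}
        )
    return samples
```

In such a setup, each `page` would be written to disk and rendered with the command from `screenshot_command` (for example via `subprocess.run`), so that the resulting image and the stored `html` string form one training pair. Fixing the seed keeps the generated set reproducible.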