Deep Learning-based Table Detection in Documents
Kovalova, Andrea (2023)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
Permanent address of the publication:
https://urn.fi/URN:NBN:fi:amk-202303183820
Abstract
Extracting content from tables in documents had been identified as problematic, especially for tables without clear borders. The proposed solution was to apply document layout analysis, in particular the LayoutParser library, to identify tables in company documents. The hypothesis was that fine-tuning a model from the LayoutParser Model Zoo on a custom dataset would significantly improve table detection, at least for documents with borderless tables.
A custom dataset of 302 company documents was annotated and used in experiments with, and evaluations of, different data augmentations. A Faster R-CNN model trained on the TableBank dataset was selected as the pre-trained model for fine-tuning. The experiments showed that fine-tuning with resizing of the input images as the only augmentation was the best approach: it substantially improved table detection in company documents, especially the detection of borderless tables. Based on these results, the thesis hypothesis was accepted.
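The abstract does not state the resizing parameters. As a hedged illustration only, the following minimal sketch shows what a resize-only augmentation typically does in a Detectron2-style pipeline (the TableBank Faster R-CNN models in the LayoutParser Model Zoo are Detectron2 models): the page image is rescaled so that its shorter side reaches a target length, the longer side is capped, and the table bounding boxes are scaled by the same factors. The sizes 800 and 1333 are Detectron2's default values, not values taken from this thesis, and the names `Box`, `shortest_edge_size` and `resize_annotations` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical table annotation: pixel coordinates (x0, y0, x1, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float


def shortest_edge_size(h: int, w: int, short_edge: int = 800,
                       max_size: int = 1333) -> tuple[int, int]:
    """Return (new_h, new_w) so the shorter side equals short_edge,
    unless that would make the longer side exceed max_size.
    Assumed defaults mirror Detectron2, not the thesis."""
    scale = short_edge / min(h, w)
    new_h, new_w = h * scale, w * scale
    if max(new_h, new_w) > max_size:
        # Shrink both sides by the same factor so the aspect ratio is kept.
        cap = max_size / max(new_h, new_w)
        new_h, new_w = new_h * cap, new_w * cap
    # Round to the nearest whole pixel.
    return int(new_h + 0.5), int(new_w + 0.5)


def resize_annotations(boxes: list[Box], h: int, w: int,
                       new_h: int, new_w: int) -> list[Box]:
    """Scale each box by the factors that were applied to the page image."""
    sx, sy = new_w / w, new_h / h
    return [Box(b.x0 * sx, b.y0 * sy, b.x1 * sx, b.y1 * sy) for b in boxes]


if __name__ == "__main__":
    # A portrait A4 page scanned at 150 dpi is roughly 1754 x 1240 pixels.
    print(shortest_edge_size(1754, 1240))  # → (1132, 800)
```

In multi-scale training, as done by Detectron2's `ResizeShortestEdge`, `short_edge` would be drawn at random for each image, for example with `random.choice((640, 672, 704, 736, 768, 800))`. Because the boxes are scaled together with the image, the annotations stay aligned with the table regions.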