This page contains the models that power the PDF document conversion package docling.
Layout Model
The layout model takes an image of a page and applies an RT-DETR model to detect its layout components. It currently detects the following labels: Caption, Footnote, Formula, List-item, Page-footer, Page-header, Picture, Section-header, Table, Text, Title. For reference, the table below (from the DocLayNet paper) compares the performance (mAP) of standard object detection methods on the DocLayNet dataset with human evaluation:
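| Class          | human | MRCNN R50 | MRCNN R101 | FRCNN R101 | YOLO v5x6 |
|----------------|-------|-----------|------------|------------|-----------|
| Caption        | 84-89 | 68.4      | 71.5       | 70.1       | 77.7      |
| Footnote       | 83-91 | 70.9      | 71.8       | 73.7       | 77.2      |
| Formula        | 83-85 | 60.1      | 63.4       | 63.5       | 66.2      |
| List-item      | 87-88 | 81.2      | 80.8       | 81.0       | 86.2      |
| Page-footer    | 93-94 | 61.6      | 59.3       | 58.9       | 61.1      |
| Page-header    | 85-89 | 71.9      | 70.0       | 72.0       | 67.9      |
| Picture        | 69-71 | 71.7      | 72.7       | 72.0       | 77.1      |
| Section-header | 83-84 | 67.6      | 69.3       | 68.4       | 74.6      |
| Table          | 77-81 | 82.2      | 82.9       | 82.2       | 86.3      |
| Text           | 84-86 | 84.6      | 85.8       | 85.4       | 88.1      |
| Title          | 60-72 | 76.7      | 80.4       | 79.9       | 82.7      |
| All            | 82-83 | 72.4      | 73.5       | 73.4       | 76.8      |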
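To illustrate how detections for the labels listed above might be consumed downstream, here is a minimal sketch. It is not docling's API: the `Detection` type, the `decode_detections` helper, the index order of `LAYOUT_LABELS`, and the 0.5 score threshold are all assumptions made for this example.

```python
from dataclasses import dataclass

# Label names as listed in this section. The index order is an assumption
# for illustration; the real model's class-id mapping may differ.
LAYOUT_LABELS = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
]


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one detected layout component."""
    label: str
    score: float
    bbox: tuple  # (x0, y0, x1, y1) in page-image pixels


def decode_detections(class_ids, scores, boxes, threshold=0.5):
    """Turn parallel detector outputs into labelled detections.

    Detections whose score is below `threshold` are dropped.
    """
    detections = []
    for class_id, score, box in zip(class_ids, scores, boxes):
        if score < threshold:
            continue
        detections.append(
            Detection(LAYOUT_LABELS[class_id], float(score), tuple(box))
        )
    return detections


# Made-up raw outputs for a single page image.
raw_ids = [8, 9, 0]
raw_scores = [0.93, 0.88, 0.31]
raw_boxes = [(50, 120, 550, 400), (50, 420, 550, 700), (50, 405, 550, 418)]

for det in decode_detections(raw_ids, raw_scores, raw_boxes):
    print(det.label, det.score, det.bbox)
```

In this sketch the low-scoring Caption candidate is filtered out, leaving one Table and one Text region.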
TableFormer
The TableFormer model identifies the structure of a table, starting from an image of that table. It relies on the table regions predicted by the layout model to locate the tables. TableFormer achieves state-of-the-art table structure identification, as measured by TEDS:
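| Model (TEDS) | Simple table | Complex table | All tables |
|--------------|--------------|---------------|------------|
| Tabula       | 78.0         | 57.8          | 67.9       |
| Traprange    | 60.8         | 49.9          | 55.4       |
| Camelot      | 80.0         | 66.0          | 73.0       |
| Acrobat Pro  | 68.9         | 61.8          | 65.3       |
| EDD          | 91.2         | 85.4          | 88.3       |
| TableFormer  | 95.4         | 90.1          | 93.6       |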
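The hand-off described above, where table regions predicted by the layout model become the input images for TableFormer, can be sketched roughly as follows. This is only an illustration under stated assumptions: the page is modelled as a plain list of pixel rows, and `select_table_boxes`, `crop`, and the `(label, score, bbox)` tuple layout are hypothetical names and shapes, not part of docling.

```python
def select_table_boxes(detections, min_score=0.5):
    """Return bounding boxes of detections labelled 'Table'.

    `detections` is a list of (label, score, bbox) tuples, a hypothetical
    shape chosen for this sketch.
    """
    return [
        bbox
        for label, score, bbox in detections
        if label == "Table" and score >= min_score
    ]


def crop(image, bbox):
    """Crop a row-major image (list of pixel rows) to (x0, y0, x1, y1).

    Coordinates are rounded to integers and clamped to the image bounds.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return [row[x0:x1] for row in image[y0:y1]]


# A tiny 8x6 synthetic "page" and made-up layout predictions.
page = [[(x + y) % 256 for x in range(8)] for y in range(6)]
layout = [
    ("Text", 0.90, (0, 0, 8, 2)),
    ("Table", 0.95, (1, 2, 7, 6)),
    ("Table", 0.20, (0, 0, 3, 3)),  # below the score threshold, ignored
]

# Each crop would then be passed to the table structure model.
table_crops = [crop(page, box) for box in select_table_boxes(layout)]
print(len(table_crops), "table crop(s)")
```

A real pipeline would operate on image arrays rather than nested lists, but the selection and cropping logic follows the same pattern.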
References
@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {{Docling Technical Report}},
  url = {https://arxiv.org/abs/2408.09869},
  eprint = {2408.09869},
  doi = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year = {2024}
}
@article{doclaynet2022,
  title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis},
  doi = {10.1145/3534678.3539043},
  url = {https://arxiv.org/abs/2206.01062},
  author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
  year = {2022}
}
@InProceedings{TableFormer2022,
  author = {Nassar, Ahmed and Livathinos, Nikolaos and Lysak, Maksym and Staar, Peter},
  title = {TableFormer: Table Structure Understanding With Transformers},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2022},
  pages = {4614-4623},
  doi = {10.1109/CVPR52688.2022.00457}
}