PP-DocLayout_plus-L is a higher-precision layout region localization model based on RT-DETR-L, trained on a self-built dataset of Chinese and English papers, PPT slides, multi-layout magazines, contracts, books, exam papers, ancient books, and research reports. The layout detection model covers 20 common categories: document title, paragraph title, text, page number, abstract, table of contents, references, footnotes, header, footer, algorithm, formula, formula number, image, table, seal, figure/table title, chart, sidebar text, and lists of references. The key metrics are as follows:
Model | mAP(0.5) (%)
PP-DocLayout_plus-L | 83.2
Note: The evaluation set for the metrics above is a self-built layout region detection dataset of 1,000 document images, including Chinese and English papers, magazines, newspapers, research reports, PPT slides, test papers, and textbooks.
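To make the metric concrete: assuming the conventional definition of mAP(0.5), a predicted region counts as a correct detection only when its intersection over union (IoU) with a ground-truth region of the same category is at least 0.5. The sketch below shows that matching test for boxes in (x1, y1, x2, y2) form. The function names are illustrative and are not part of PaddleOCR, and this is not the evaluation code used to produce the number above.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """True when the prediction overlaps the ground truth enough to count."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    ground_truth = (0, 0, 10, 10)
    print(is_match((2, 0, 12, 10), ground_truth))  # shifted slightly
    print(is_match((5, 0, 15, 10), ground_truth))  # shifted by half its width
```

Average precision is then computed per category from the ranked matches, and mAP is the mean over categories.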
Quick Start
Installation
PaddlePaddle
Please refer to the following commands to install PaddlePaddle using pip:
# for CUDA11.8
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
# for CUDA12.6
python -m pip install paddlepaddle-gpu==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# for CPU
python -m pip install paddlepaddle==3.0.0 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
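After installing, you may want to confirm which build ended up in the active environment. The following is a minimal sketch that uses only the Python standard library; the distribution names and the pinned version 3.0.0 are taken from the commands above, while the helper name `find_installed` is made up for this example.

```python
from importlib import metadata

# Distribution names as they appear in the pip commands above.
PADDLE_DISTRIBUTIONS = ("paddlepaddle-gpu", "paddlepaddle")


def find_installed(distributions=PADDLE_DISTRIBUTIONS):
    """Return (name, version) of the first installed distribution, else None."""
    for name in distributions:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = find_installed()
    if found is None:
        print("PaddlePaddle is not installed in this environment.")
    else:
        name, version = found
        status = "matches" if version == "3.0.0" else "differs from"
        print(f"{name} {version} {status} the pinned version 3.0.0")
```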
You can also integrate the layout detection module's inference into your own project. Before running the following code, download the sample image to your local machine.
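from paddleocr import LayoutDetection

# Load the layout detection model by name.
model = LayoutDetection(model_name="PP-DocLayout_plus-L")
# Run inference on the downloaded sample image.
output = model.predict("N5C68HPVAI-xQAWTxpbA6.jpeg", batch_size=1, layout_nms=True)
for res in output:
    res.print()  # Print the detection result
    res.save_to_img(save_path="./output/")  # Save the visualization image
    res.save_to_json(save_path="./output/res.json")  # Save the result as JSON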
For details about usage commands and parameter descriptions, please refer to the Document.
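Once detections are available, a common next step is to drop low-confidence regions and group the rest by category. The sketch below works on a hand-written dictionary, not on real model output. It assumes each result exposes a "boxes" list whose entries carry "label", "score", and "coordinate" ([x1, y1, x2, y2]) fields; check the saved JSON for the exact layout and label names before relying on it. The helper `group_by_label` is hypothetical.

```python
from collections import defaultdict

# Hand-written sample that mimics the assumed result layout.
# The label strings and numbers are illustrative, not real predictions.
sample_result = {
    "boxes": [
        {"label": "doc_title", "score": 0.97, "coordinate": [40.0, 30.0, 560.0, 80.0]},
        {"label": "text", "score": 0.93, "coordinate": [40.0, 100.0, 560.0, 300.0]},
        {"label": "text", "score": 0.31, "coordinate": [40.0, 310.0, 200.0, 330.0]},
        {"label": "table", "score": 0.88, "coordinate": [40.0, 340.0, 560.0, 600.0]},
    ]
}


def group_by_label(result, min_score=0.5):
    """Group box coordinates by label, keeping boxes scored at least min_score."""
    grouped = defaultdict(list)
    for box in result["boxes"]:
        if box["score"] >= min_score:
            grouped[box["label"]].append(box["coordinate"])
    return dict(grouped)


if __name__ == "__main__":
    for label, coords in group_by_label(sample_result).items():
        print(label, len(coords))
```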
Pipeline Usage
A single model has limited capability, but a pipeline composed of several models can solve more difficult problems in real-world scenarios.
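As a rough illustration of that idea, the sketch below routes each detected region to a recognizer chosen by its layout label. Everything in it is hypothetical: the recognizer functions are stand-ins that return strings, and the routing scheme is an assumption made for this example, not a description of how PaddleOCR pipelines are implemented internally.

```python
from typing import Callable, Dict, List

Region = Dict[str, object]
Recognizer = Callable[[Region], str]


def recognize_text(region: Region) -> str:
    # Stand-in for a text recognition model.
    return f"text@{region['coordinate']}"


def recognize_table(region: Region) -> str:
    # Stand-in for a table recognition model.
    return f"table@{region['coordinate']}"


def run_pipeline(
    regions: List[Region],
    recognizers: Dict[str, Recognizer],
    fallback: Recognizer,
) -> List[str]:
    """Send each region to the recognizer registered for its label."""
    return [recognizers.get(str(r["label"]), fallback)(r) for r in regions]


if __name__ == "__main__":
    detected = [
        {"label": "table", "coordinate": [0, 0, 10, 10]},
        {"label": "seal", "coordinate": [1, 2, 3, 4]},
    ]
    print(run_pipeline(detected, {"table": recognize_table}, recognize_text))
```

The layout detector's labels act as the routing key, which is why detection quality affects every later stage.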
PP-StructureV3
Layout analysis is a technique used to extract structured information from document images. PP-StructureV3 comprises six modules.
You can run pipeline inference with just a few lines of code. Taking the PP-StructureV3 pipeline as an example:
from paddleocr import PPStructureV3

pipeline = PPStructureV3()
# pipeline = PPStructureV3(use_doc_orientation_classify=True)  # Enable/disable the document orientation classification model
# pipeline = PPStructureV3(use_doc_unwarping=True)  # Enable/disable the document unwarping module
# pipeline = PPStructureV3(use_textline_orientation=True)  # Enable/disable the text line orientation classification model
# pipeline = PPStructureV3(device="gpu")  # Run model inference on the GPU
output = pipeline.predict("./KP10tiSZfAjMuwZUSLtRp.png")
for res in output:
    res.print()  # Print the structured prediction output
    res.save_to_json(save_path="output")  # Save the current image's structured result in JSON format
    res.save_to_markdown(save_path="output")  # Save the current image's result in Markdown format
The default layout detection model used in the pipeline is PP-DocLayout_plus-L.
For details about usage commands and parameter descriptions, please refer to the Document.
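If you save results to a directory as in the pipeline example, a small helper can load them back for downstream processing. This is a sketch under stated assumptions: it assumes the output directory contains `.json` and `.md` files and pairs them by file stem, which may not match the actual file naming used by the library. Files whose stems differ simply appear as separate entries. The helper `collect_outputs` is hypothetical.

```python
import json
import tempfile
from pathlib import Path


def collect_outputs(output_dir):
    """Map each file stem to its parsed JSON and/or Markdown text."""
    collected = {}
    for path in sorted(Path(output_dir).iterdir()):
        if not path.is_file():
            continue  # Skip subdirectories such as extracted image folders.
        if path.suffix == ".json":
            data = json.loads(path.read_text(encoding="utf-8"))
            collected.setdefault(path.stem, {})["json"] = data
        elif path.suffix == ".md":
            text = path.read_text(encoding="utf-8")
            collected.setdefault(path.stem, {})["markdown"] = text
    return collected


if __name__ == "__main__":
    # Demonstrate on a temporary directory with placeholder files.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "page.json").write_text(json.dumps({"k": 1}), encoding="utf-8")
        Path(tmp, "page.md").write_text("# Title", encoding="utf-8")
        print(collect_outputs(tmp))
```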