OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 28
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16 • 105
PaddleOCR-VL Collection Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model • 3 items • Updated Oct 17 • 21
PP-OCRv5 Collection PP-OCRv5 is the latest text recognition solution, supporting Simplified Chinese, Chinese Pinyin, Traditional Chinese, English, and Japanese • 13 items • Updated Sep 15 • 50
PP-StructureV3 Collection PP-StructureV3 is a SOTA document parsing solution on OmniDocBench, supporting the conversion of PDFs and do cument images to Markdown and JSON. • 17 items • Updated Sep 15 • 10