@@ -1,9 +1,134 @@
 ---
 tags:
-- art
 - ocr
 license: other
 license_name: model-distribution-disclaimer-license
 license_link: https://huggingface.co/spaces/deepghs/RDLicence
 ---
-Onnx version of [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR).

 ---
+pipeline_tag: image-to-text
 tags:
 - ocr
+- text-detection
+- text-recognition
+- onnx
+- computer-vision
+- multilingual
 license: other
 license_name: model-distribution-disclaimer-license
 license_link: https://huggingface.co/spaces/deepghs/RDLicence
+library_name: dghs-imgutils
 ---
+# PaddleOCR ONNX Models
+## Summary
+This repository provides **ONNX** exports of **PaddleOCR** models for multilingual **optical character recognition**, covering both text detection and text recognition. The models are exported from the original [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) framework so that they can be run with ONNX-based inference tooling.
+The repository contains two types of models: **text detection** models, which locate text regions in an image, and **text recognition** models, which transcribe each detected region into text. The detection models are segmentation-based: their output is post-processed with score thresholding, contour detection, and polygon approximation to produce the final text boxes.
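The thresholding part of that post-processing can be illustrated with a small pure-Python sketch. It is not the code used by PaddleOCR or `dghs-imgutils`: it assumes the detector yields a 2D heat map of per-pixel text scores, and it swaps in a flood-fill over connected pixels where the real pipeline uses contour detection and polygon approximation. The function name `find_text_boxes` and the sample heat map are hypothetical; the parameter names mirror the thresholds listed under Model Configuration.

```python
from collections import deque


def find_text_boxes(heat_map, heat_threshold=0.3, box_threshold=0.7, max_candidates=1000):
    """Return [((x0, y0, x1, y1), score), ...] for connected regions that pass both thresholds."""
    height, width = len(heat_map), len(heat_map[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    candidates = 0
    for y in range(height):
        for x in range(width):
            if seen[y][x] or heat_map[y][x] <= heat_threshold:
                continue
            if candidates >= max_candidates:
                return boxes
            candidates += 1
            # Flood-fill one connected region of pixels above heat_threshold.
            queue = deque([(x, y)])
            seen[y][x] = True
            pixels = []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and not seen[ny][nx] and heat_map[ny][nx] > heat_threshold:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            # Score the region by its mean heat value and drop weak candidates.
            score = sum(heat_map[py][px] for px, py in pixels) / len(pixels)
            if score < box_threshold:
                continue
            xs = [px for px, _ in pixels]
            ys = [py for _, py in pixels]
            boxes.append(((min(xs), min(ys), max(xs) + 1, max(ys) + 1), score))
    return boxes


# Hypothetical 4x6 heat map: one strong region and one weak region.
heat_map = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0, 0.4, 0.0],
    [0.0, 0.9, 0.9, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
boxes = find_text_boxes(heat_map)
print(boxes)
```

With the default thresholds only the strong region survives; lowering `box_threshold` to 0.3 also keeps the weak one, which is the precision versus recall trade-off these thresholds control.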
+For text recognition, the models cover **multiple languages and scripts**, including Chinese (with a separate Traditional Chinese model), English, Japanese, Korean, Arabic, Cyrillic, Devanagari, Latin, Kannada, Tamil, and Telugu. Each recognition model comes with its own character dictionary (`dict.txt`) for its language or script family. The recognition pipeline handles input normalization, feature extraction, and sequence decoding, and returns a transcription with a confidence score for each region.
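To make the sequence decoding step concrete, here is a minimal sketch of greedy CTC-style decoding against a character dictionary. It is an illustration under stated assumptions, not the library's implementation: it assumes the model emits one probability row per time step, that index 0 is a reserved blank symbol, and that the remaining indices map to dictionary entries. The `greedy_decode` function, the toy `charset`, and the sample probabilities are all hypothetical.

```python
def greedy_decode(step_probs, charset, is_remove_duplicate=False, blank_index=0):
    """Decode per-step probability rows into (text, mean confidence)."""
    chars, confs = [], []
    previous = None
    for row in step_probs:
        # Greedy choice: the most probable symbol at this time step.
        index = max(range(len(row)), key=row.__getitem__)
        repeated = is_remove_duplicate and index == previous
        previous = index
        if index == blank_index or repeated:
            continue
        chars.append(charset[index])
        confs.append(row[index])
    confidence = sum(confs) / len(confs) if confs else 0.0
    return ''.join(chars), confidence


# Hypothetical dictionary; index 0 is assumed to be the blank symbol.
charset = ['<blank>', 'a', 'b', 'c']
step_probs = [
    [0.10, 0.80, 0.05, 0.05],  # 'a'
    [0.10, 0.70, 0.10, 0.10],  # 'a' again (consecutive repeat)
    [0.90, 0.05, 0.03, 0.02],  # blank
    [0.10, 0.10, 0.70, 0.10],  # 'b'
    [0.20, 0.10, 0.10, 0.60],  # 'c'
]
text_raw, conf_raw = greedy_decode(step_probs, charset)
text_dedup, conf_dedup = greedy_decode(step_probs, charset, is_remove_duplicate=True)
print(text_raw, text_dedup)
```

In this sketch, duplicate removal only collapses consecutive identical predictions, so a blank between two identical characters keeps both of them.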
+The models come in several variants for different deployment targets, from lightweight mobile models to larger server models aimed at higher accuracy. The pipeline also supports rotated text, optional duplicate character removal, and configurable confidence thresholds for balancing precision and recall.
+## Usage
+The models can be used through the `dghs-imgutils` library:
+```bash
+# Installation
+pip install dghs-imgutils
+```
+```python
+# Basic OCR usage
+from imgutils.ocr import ocr, list_det_models, list_rec_models
+# List available models
+print("Detection models:", list_det_models())
+print("Recognition models:", list_rec_models())
+# Perform OCR on an image
+results = ocr('your_image.jpg')
+for bbox, text, confidence in results:
+    print(f"Text: {text}, Confidence: {confidence:.4f}, BBox: {bbox}")
+# Custom model selection
+results = ocr('your_image.jpg',
+              detect_model='ch_PP-OCRv4_det',
+              recognize_model='japan_PP-OCRv3_rec')
+```
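Results in the `(bbox, text, confidence)` shape shown above can be post-filtered in plain Python. The sketch below drops low-confidence items and orders the rest top to bottom, then left to right. It assumes each `bbox` is an `(x0, y0, x1, y1)` tuple; the helper `filter_and_sort` and the sample data are hypothetical and stand in for real `ocr()` output.

```python
def filter_and_sort(results, min_confidence=0.5):
    """Keep confident items and sort them by the top-left corner of their box."""
    kept = [item for item in results if item[2] >= min_confidence]
    return sorted(kept, key=lambda item: (item[0][1], item[0][0]))


# Hypothetical results in the (bbox, text, confidence) shape.
sample_results = [
    ((120, 80, 300, 110), 'second line', 0.93),
    ((20, 10, 200, 40), 'first line', 0.98),
    ((15, 150, 60, 170), 'n0ise', 0.21),
]
ordered = filter_and_sort(sample_results)
for bbox, text, confidence in ordered:
    print(f"{text} ({confidence:.2f}) at {bbox}")
```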
+```python
+# Text detection only
+from imgutils.ocr import detect_text_with_ocr
+# Detect text regions without recognition
+detections = detect_text_with_ocr('your_image.jpg')
+# Each item is (bbox, label, confidence); the label is not needed here
+for bbox, _label, confidence in detections:
+    print(f"BBox: {bbox}, Confidence: {confidence:.4f}")
+```
+## Available Models
+### Text Detection Models
+- `ch_PP-OCRv2_det` - Chinese text detection v2
+- `ch_PP-OCRv3_det` - Chinese text detection v3
+- `ch_PP-OCRv4_det` - Chinese text detection v4
+- `ch_PP-OCRv4_server_det` - Server-optimized Chinese detection v4
+- `ch_ppocr_mobile_slim_v2.0_det` - Lightweight mobile detection
+- `ch_ppocr_mobile_v2.0_det` - Mobile-optimized detection
+- `ch_ppocr_server_v2.0_det` - Server-optimized detection
+- `en_PP-OCRv3_det` - English text detection
+### Text Recognition Models
+- `arabic_PP-OCRv3_rec` - Arabic text recognition
+- `ch_PP-OCRv2_rec` - Chinese text recognition v2
+- `ch_PP-OCRv3_rec` - Chinese text recognition v3
+- `ch_PP-OCRv4_rec` - Chinese text recognition v4
+- `ch_PP-OCRv4_server_rec` - Server-optimized Chinese recognition v4
+- `ch_ppocr_mobile_v2.0_rec` - Mobile-optimized Chinese recognition
+- `ch_ppocr_server_v2.0_rec` - Server-optimized Chinese recognition
+- `chinese_cht_PP-OCRv3_rec` - Traditional Chinese recognition
+- `cyrillic_PP-OCRv3_rec` - Cyrillic script recognition
+- `devanagari_PP-OCRv3_rec` - Devanagari script recognition
+- `en_PP-OCRv3_rec` - English text recognition v3
+- `en_PP-OCRv4_rec` - English text recognition v4
+- `en_number_mobile_v2.0_rec` - Mobile-optimized English and number recognition
+- `japan_PP-OCRv3_rec` - Japanese text recognition
+- `ka_PP-OCRv3_rec` - Kannada text recognition
+- `korean_PP-OCRv3_rec` - Korean text recognition
+- `latin_PP-OCRv3_rec` - Latin script recognition
+- `ta_PP-OCRv3_rec` - Tamil text recognition
+- `te_PP-OCRv3_rec` - Telugu text recognition
+## Model Configuration
+The OCR pipeline supports several configurable parameters:
+- `heat_threshold`: Heat map threshold for text detection (default: 0.3)
+- `box_threshold`: Box confidence threshold (default: 0.7)
+- `max_candidates`: Maximum number of text candidates (default: 1000)
+- `unclip_ratio`: Expansion ratio for detected boxes (default: 2.0)
+- `rotation_threshold`: Aspect ratio threshold for rotation detection (default: 1.5)
+- `is_remove_duplicate`: Whether to remove duplicate characters (default: False)
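One way to keep these settings in one place is a small helper that starts from the defaults listed above and applies overrides. This is a sketch under the assumption that `ocr()` accepts these names as keyword arguments; `build_ocr_kwargs` and `run_ocr` are hypothetical helpers, not part of `dghs-imgutils`.

```python
# Defaults copied from the parameter list above.
OCR_DEFAULTS = {
    'heat_threshold': 0.3,
    'box_threshold': 0.7,
    'max_candidates': 1000,
    'unclip_ratio': 2.0,
    'rotation_threshold': 1.5,
    'is_remove_duplicate': False,
}


def build_ocr_kwargs(**overrides):
    """Merge overrides into the defaults, rejecting misspelled parameter names."""
    unknown = set(overrides) - set(OCR_DEFAULTS)
    if unknown:
        raise ValueError(f'Unknown OCR parameters: {sorted(unknown)}')
    return {**OCR_DEFAULTS, **overrides}


def run_ocr(image, **overrides):
    """Call ocr() with the merged settings (assumes these keyword names are accepted)."""
    from imgutils.ocr import ocr  # imported lazily so the helper above works standalone
    return ocr(image, **build_ocr_kwargs(**overrides))


# A lower box threshold favors recall; duplicate removal is switched on.
kwargs = build_ocr_kwargs(box_threshold=0.5, is_remove_duplicate=True)
print(kwargs)
```

A call such as `run_ocr('your_image.jpg', box_threshold=0.5)` would then run the pipeline with one override and the remaining defaults.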
+## Performance Notes
+- The default detection model `ch_PP-OCRv4_det` provides a good balance of accuracy and speed
+- The default recognition model `ch_PP-OCRv4_rec` supports both Chinese and English with high accuracy
+- For specific languages, choose the corresponding recognition model for optimal results
+- Server versions generally offer higher accuracy at the cost of increased computational requirements
+- Mobile versions are optimized for speed and resource efficiency
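Choosing a recognition model per language can be reduced to a lookup table. The sketch below uses model names from the lists above; the language keys, the choice of one model per language, and the `pick_recognition_model` helper are illustrative assumptions, with the default recognition model as the fallback.

```python
# Model names come from the recognition model list; the keys are hypothetical labels.
RECOGNITION_MODELS = {
    'arabic': 'arabic_PP-OCRv3_rec',
    'chinese': 'ch_PP-OCRv4_rec',
    'chinese_traditional': 'chinese_cht_PP-OCRv3_rec',
    'cyrillic': 'cyrillic_PP-OCRv3_rec',
    'devanagari': 'devanagari_PP-OCRv3_rec',
    'english': 'en_PP-OCRv4_rec',
    'japanese': 'japan_PP-OCRv3_rec',
    'kannada': 'ka_PP-OCRv3_rec',
    'korean': 'korean_PP-OCRv3_rec',
    'latin': 'latin_PP-OCRv3_rec',
    'tamil': 'ta_PP-OCRv3_rec',
    'telugu': 'te_PP-OCRv3_rec',
}
DEFAULT_RECOGNITION_MODEL = 'ch_PP-OCRv4_rec'


def pick_recognition_model(language):
    """Return the recognition model for a language label, or the default model."""
    return RECOGNITION_MODELS.get(language.strip().lower(), DEFAULT_RECOGNITION_MODEL)


print(pick_recognition_model('Japanese'))
print(pick_recognition_model('unknown'))
```

The returned name can be passed as `recognize_model` in the `ocr()` call shown in the Usage section.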
+## Original Content
+Onnx version of [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR).
+## Citation
+```bibtex
+@misc{paddleocr_onnx,
+  title        = {{PaddleOCR ONNX Models}},
+  author       = {PaddlePaddle and Repository Contributors},
+  howpublished = {\url{https://huggingface.co/deepghs/paddleocr}},
+  year         = {2023},
+  note         = {ONNX-format implementations of PaddleOCR models for multilingual text detection and recognition},
+  abstract     = {This repository provides ONNX-format implementations of PaddleOCR models, offering comprehensive optical character recognition capabilities for multilingual text detection and recognition. The models are exported from the original PaddleOCR framework and optimized for efficient inference across various deployment scenarios. The repository contains text detection models that identify text regions in images and text recognition models that convert detected text regions into actual text content, supporting multiple languages including Chinese, English, Japanese, Korean, Arabic, Cyrillic, Devanagari, and several other scripts.},
+  keywords     = {OCR, text-detection, text-recognition, multilingual, ONNX}
+}
+```