---
tags:
- object-detection
- ocr
- text-detection
- dbnet
library_name: transformers
pipeline_tag: object-detection
---
# DBNet for Text Detection
Model Hub: [shuzi-mewtant/dbnet_res18_text_detection_v0.1](https://huggingface.co/shuzi-mewtant/dbnet_res18_text_detection_v0.1)
This is a DBNet model for text detection, ported to Hugging Face Transformers.
It uses a ResNet-18 backbone and Feature Pyramid Network (FPN) for multi-scale feature fusion.
The model was trained on the ICDAR 2015 dataset and detects text in natural scene images.
## Usage
```python
from transformers import pipeline

# Load the custom text-detection pipeline
ocr_pipe = pipeline(
    "object-detection",
    model="shuzi-mewtant/dbnet_res18_text_detection_v0.1",
    trust_remote_code=True,
)

# Run inference
image_path = "path/to/image.jpg"
results = ocr_pipe(image_path)

for res in results:
    print(f"Box: {res['box']}, Score: {res['score']}")
```
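To check the detections visually, the sketch below draws them on the input image with OpenCV (already listed under Installation). It assumes each `box` is a flat `[x_min, y_min, x_max, y_max]` list; this custom pipeline may instead return a dict or a polygon, so adapt the unpacking to the actual output.
```python
import cv2

# Illustrative only: assumes res["box"] is [x_min, y_min, x_max, y_max].
# If the pipeline returns dicts or polygons, adjust the unpacking below.
image = cv2.imread(image_path)
for res in results:
    x_min, y_min, x_max, y_max = map(int, res["box"])
    cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", image)
```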
### Local usage
If you have downloaded the model locally, you can use it directly:
```python
from transformers import AutoModel, AutoImageProcessor
# pipeline.py ships with the model repository; run this script from the model
# directory (or add it to PYTHONPATH) so the import resolves
from pipeline import DBNetPipeline

model_path = "/path/to/dbnet_res18_text_detection_v0.1"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(model_path, trust_remote_code=True)

pipe = DBNetPipeline(model=model, image_processor=processor, task="object-detection")
results = pipe("path/to/image.jpg")

print(f"Found {len(results)} text regions")
for result in results:
    box = result["box"]
    score = result["score"]
    print(f"Box: {box}, Score: {score:.3f}")
```
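If you want the raw probability maps described under Model Details rather than post-processed boxes, you can call the model directly, as sketched below. The field names of the output object are defined by the custom model code in this repository, so inspect `outputs` to see what it actually exposes.
```python
from PIL import Image
import torch

# Sketch only: the image processor handles resizing/padding and normalization,
# and the model returns DBNet's probability maps (shrink / threshold / binary).
# The exact output field names are repo-specific, so print `outputs` to inspect them.
image = Image.open("path/to/image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs)
```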
### Testing the model
You can test the model using the provided test script:
```bash
cd /path/to/dbnet_res18_text_detection_v0.1
python test_model.py
```
## Model Details
- **Architecture**: DBNet with ResNet-18 backbone and FPN
- **Input size**: 1024x1024 pixels (automatically padded/resized)
- **Output**: 3-channel probability maps (shrink, threshold, binary)
- **Training data**: ICDAR 2015 dataset
- **Normalization**: RGB with mean [123.675, 116.28, 103.53] and std [58.395, 57.12, 57.375] (see the preprocessing sketch below)
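The normalization values above are on the 0-255 pixel scale. The bundled image processor applies them automatically; the sketch below only illustrates the equivalent manual steps (it stretches the image to 1024x1024 for simplicity, whereas the real processor may pad to preserve aspect ratio).
```python
import numpy as np
from PIL import Image

# Manual preprocessing sketch; AutoImageProcessor does the equivalent for you.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

image = Image.open("path/to/image.jpg").convert("RGB").resize((1024, 1024))
pixels = np.asarray(image, dtype=np.float32)   # HWC, RGB, values in 0-255
pixels = (pixels - MEAN) / STD                 # per-channel normalization
pixels = pixels.transpose(2, 0, 1)[None]       # NCHW batch of one
print(pixels.shape)                            # (1, 3, 1024, 1024)
```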
## Performance
The model achieves competitive performance on text detection benchmarks:
- Trained on ICDAR 2015 dataset
- Detects both horizontal and oriented (rotated) text
- Post-processing applies non-maximum suppression (NMS) and box expansion for better localization (see the sketch below)
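The box-expansion step follows the usual DBNet idea: each detected (shrunken) region is grown outward by an offset proportional to its area divided by its perimeter. The sketch below illustrates this with Shapely; it is not part of the listed requirements and is not necessarily the exact implementation used by this repository.
```python
from shapely.geometry import Polygon

def unclip(points, unclip_ratio=1.5):
    """Grow a polygon DBNet-style: offset = area * unclip_ratio / perimeter."""
    poly = Polygon(points)
    distance = poly.area * unclip_ratio / poly.length
    return list(poly.buffer(distance).exterior.coords)

# Example: expand a small quadrilateral before extracting the final box.
print(unclip([(10, 10), (110, 10), (110, 40), (10, 40)]))
```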
## Installation
```bash
pip install torch torchvision transformers opencv-python pillow safetensors
```
Or install from the requirements file:
```bash
pip install -r requirements.txt
```