---
tags:
- object-detection
- ocr
- text-detection
- dbnet
library_name: transformers
pipeline_tag: object-detection
---
# DBNet for Text Detection
Model Hub: [shuzi-mewtant/dbnet_res18_text_detection_v0.1](https://huggingface.co/shuzi-mewtant/dbnet_res18_text_detection_v0.1)
This is a DBNet model for text detection, ported to Hugging Face Transformers.
It uses a ResNet-18 backbone and Feature Pyramid Network (FPN) for multi-scale feature fusion.
The model was trained on the ICDAR 2015 dataset and detects text in natural scene images.
## Usage
```python
from transformers import pipeline

# Load the custom text-detection pipeline
ocr_pipe = pipeline(
    "object-detection",
    model="shuzi-mewtant/dbnet_res18_text_detection_v0.1",
    trust_remote_code=True,
)

# Run inference
image_path = "path/to/image.jpg"
results = ocr_pipe(image_path)

for res in results:
    print(f"Box: {res['box']}, Score: {res['score']}")
```
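To check the detections visually, the sketch below draws them on the input image with OpenCV (already listed under Installation). It assumes each `box` is a flat `[x_min, y_min, x_max, y_max]` list; this custom pipeline may instead return a dict or a polygon, so adapt the unpacking to the actual output.
```python
import cv2

# Illustrative only: assumes res["box"] is [x_min, y_min, x_max, y_max].
# If the pipeline returns dicts or polygons, adjust the unpacking below.
image = cv2.imread(image_path)
for res in results:
    x_min, y_min, x_max, y_max = map(int, res["box"])
    cv2.rectangle(image, (x_min, y_min), (x_max, y_max), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", image)
```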
### Local usage
If you have downloaded the model locally, you can use it directly:
```python
from transformers import AutoModel, AutoImageProcessor
# pipeline.py ships with the model repository; run this script from the model
# directory (or add it to PYTHONPATH) so the import resolves
from pipeline import DBNetPipeline

model_path = "/path/to/dbnet_res18_text_detection_v0.1"
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained(model_path, trust_remote_code=True)

pipe = DBNetPipeline(model=model, image_processor=processor, task="object-detection")
results = pipe("path/to/image.jpg")

print(f"Found {len(results)} text regions")
for result in results:
    box = result["box"]
    score = result["score"]
    print(f"Box: {box}, Score: {score:.3f}")
```
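If you want the raw probability maps described under Model Details rather than post-processed boxes, you can call the model directly, as sketched below. The field names of the output object are defined by the custom model code in this repository, so inspect `outputs` to see what it actually exposes.
```python
from PIL import Image
import torch

# Sketch only: the image processor handles resizing/padding and normalization,
# and the model returns DBNet's probability maps (shrink / threshold / binary).
# The exact output field names are repo-specific, so print `outputs` to inspect them.
image = Image.open("path/to/image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs)
```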
### Testing the model
You can test the model using the provided test script:
```bash
cd /path/to/dbnet_res18_text_detection_v0.1
python test_model.py
```
## Model Details
- **Architecture**: DBNet with ResNet-18 backbone and FPN
- **Input size**: 1024x1024 pixels (automatically padded/resized)
- **Output**: 3-channel probability maps (shrink, threshold, binary)
- **Training data**: ICDAR 2015 dataset
- **Normalization**: RGB with mean [123.675, 116.28, 103.53] and std [58.395, 57.12, 57.375] (see the preprocessing sketch below)
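The normalization values above are on the 0-255 pixel scale. The bundled image processor applies them automatically; the sketch below only illustrates the equivalent manual steps (it stretches the image to 1024x1024 for simplicity, whereas the real processor may pad to preserve aspect ratio).
```python
import numpy as np
from PIL import Image

# Manual preprocessing sketch; AutoImageProcessor does the equivalent for you.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

image = Image.open("path/to/image.jpg").convert("RGB").resize((1024, 1024))
pixels = np.asarray(image, dtype=np.float32)   # HWC, RGB, values in 0-255
pixels = (pixels - MEAN) / STD                 # per-channel normalization
pixels = pixels.transpose(2, 0, 1)[None]       # NCHW batch of one
print(pixels.shape)                            # (1, 3, 1024, 1024)
```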
## Performance
The model achieves competitive performance on text detection benchmarks:
- Trained on ICDAR 2015 dataset
- Detects both horizontal and oriented (rotated) text
- Post-processing applies non-maximum suppression (NMS) and box expansion for better localization (see the sketch below)
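The box-expansion step follows the usual DBNet idea: each detected (shrunken) region is grown outward by an offset proportional to its area divided by its perimeter. The sketch below illustrates this with Shapely; it is not part of the listed requirements and is not necessarily the exact implementation used by this repository.
```python
from shapely.geometry import Polygon

def unclip(points, unclip_ratio=1.5):
    """Grow a polygon DBNet-style: offset = area * unclip_ratio / perimeter."""
    poly = Polygon(points)
    distance = poly.area * unclip_ratio / poly.length
    return list(poly.buffer(distance).exterior.coords)

# Example: expand a small quadrilateral before extracting the final box.
print(unclip([(10, 10), (110, 10), (110, 40), (10, 40)]))
```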
## Installation
```bash
pip install torch torchvision transformers opencv-python pillow safetensors
```
Or install from the requirements file:
```bash
pip install -r requirements.txt
```