PaddleOCR-VL repeats tokens forever
When using PaddleOCR-VL for OCR on certain forms, we observe the model getting stuck, repeating the same token forever.
docker run -d \
--runtime nvidia \
--gpus '"device=0"' \
--name paddleOCR-VL_test \
-e DEBUG="true" \
-p 8001:8000 \
--ipc=host \
vllm/vllm-openai:nightly-ca00b1bfc69e71d860485340f0a197bf584ec004 \
--model PaddlePaddle/PaddleOCR-VL \
--trust-remote-code \
--max-num-batched-tokens 16384 \
--no-enable-prefix-caching \
--mm-processor-cache-gb 0 \
--gpu-memory-utilization 0.15
Device: A100, CUDA 12.8.

Code to reproduce the error; it is essentially the same as the official vLLM example:
from openai import OpenAI
import base64
from mimetypes import guess_type

def local_data_url(image_path):
    # Guess the MIME type of the image based on the file extension
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'  # Default MIME type if none is found
    # Read and encode the image file
    with open(image_path, "rb") as image_file:
        base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')
    # Construct the data URL
    return f"data:{mime_type};base64,{base64_encoded_data}"

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8001/v1",
    timeout=3600
)

# Task-specific base prompts
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

image_path = "Screenshot 2025-11-13 100012.png"
data_url = local_data_url(image_path)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": data_url
                }
            },
            {
                "type": "text",
                "text": TASKS["ocr"]
            }
        ]
    }
]

response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL",
    messages=messages,
    temperature=0,
    stream=True
)

# Consume the stream; on affected images this loop never finishes because
# the model keeps emitting the same token.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
We observe that setting frequency_penalty=0.1 seems to alleviate the issue; however, we're not sure that's the best approach, and it isn't mentioned in the official instructions.
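For reference, the workaround is just the standard frequency_penalty sampling parameter on the same request; 0.1 is simply the value we found to help here, not a tuned recommendation:

response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL",
    messages=messages,
    temperature=0,
    frequency_penalty=0.1,  # workaround: penalize tokens that keep recurring
    stream=True
)

Setting max_tokens on the request also bounds how long a runaway generation can get, though it only caps the damage rather than fixing the repetition.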
I have the same issue with some images on vLLM 0.11.1 and 0.11.2. Setting frequency_penalty=0.1 or higher stops the runaway tokens, but in my case the result quality is worse than with frequency_penalty=0.0. Maybe this is caused by vLLM.
Does anyone know what is going on?
I updated vLLM to 0.12.0; the endless token stream no longer appears with frequency_penalty=0.0, but the response is still strange and repeats the same tokens.
Try repetition_penalty=1.05 or higher.
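Note that repetition_penalty is not part of the standard OpenAI API, so with the OpenAI Python client it has to go through extra_body, which vLLM's OpenAI-compatible server accepts for its extra sampling parameters. A minimal sketch (1.05 is a starting point, not a tuned value):

response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL",
    messages=messages,
    temperature=0,
    stream=True,
    extra_body={"repetition_penalty": 1.05}  # vLLM-specific sampling parameter
)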