PaddleOCR-VL repeats tokens forever

#73
by yz342 - opened

When using PaddleOCR-VL for OCR on certain forms, we observe the model getting stuck, repeating the same token forever.

docker run -d \
  --runtime nvidia \
  --gpus '"device=0"' \
  --name paddleOCR-VL_test \
  -e DEBUG="true" \
  -p 8001:8000 \
  --ipc=host \
  vllm/vllm-openai:nightly-ca00b1bfc69e71d860485340f0a197bf584ec004 \
  --model PaddlePaddle/PaddleOCR-VL \
  --trust-remote-code \
  --max-num-batched-tokens 16384 \
  --no-enable-prefix-caching \
  --mm-processor-cache-gb 0 \
  --gpu-memory-utilization 0.15

Device: A100, CUDA 12.8.

[Attached image: Screenshot 2025-11-13 100012.png]
Code to reproduce the error (essentially the same as the official vLLM example):

from openai import OpenAI
import base64
from mimetypes import guess_type

def local_data_url(image_path):
    """Encode a local image file as a base64 data URL."""
    # Guess the MIME type of the image based on the file extension
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'  # Default MIME type if none is found

    # Read and encode the image file
    with open(image_path, "rb") as image_file:
        base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')

    # Construct the data URL
    return f"data:{mime_type};base64,{base64_encoded_data}"

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8001/v1",
    timeout=3600
)

# Task-specific base prompts
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

image_path = "Screenshot 2025-11-13 100012.png"
data_url = local_data_url(image_path)
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": data_url
                }
            },
            {
                "type": "text",
                "text": TASKS["ocr"]
            }
        ]
    }
]

response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL",
    messages=messages,
    temperature=0,
    stream=True
)

# Consume the stream and print tokens as they arrive
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

We observe that setting frequency_penalty=0.1 seems to alleviate the issue; however, we are not sure this is the best fix, and it is not mentioned in the official instructions.
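For reference, frequency_penalty is a standard OpenAI Chat Completions sampling parameter, so it can be passed directly to client.chat.completions.create with no vLLM-specific plumbing. A minimal sketch of the modified request arguments (the build_request_kwargs helper name is illustrative, not from the original repro):

```python
# Sketch: collect the chat-completion arguments in one place so the
# frequency_penalty workaround is easy to toggle. frequency_penalty is a
# standard OpenAI API parameter; values > 0 discourage tokens that have
# already appeared, which can break repetition loops.
def build_request_kwargs(model, messages, frequency_penalty=0.1):
    """Assemble chat-completion kwargs with a mild frequency penalty."""
    return {
        "model": model,
        "messages": messages,
        "temperature": 0,
        "stream": True,
        "frequency_penalty": frequency_penalty,
    }

# Usage with the client from the repro script:
# response = client.chat.completions.create(**build_request_kwargs(
#     "PaddlePaddle/PaddleOCR-VL", messages))
```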

I have the same issue with some images on vLLM 0.11.1 and 0.11.2. Setting frequency_penalty=0.1 or higher stops the repeated tokens, but in my case the result quality is worse than with frequency_penalty=0.0. Maybe this is caused by vLLM.

Does anyone know what is going on?

I updated vLLM to 0.12.0; the endless tokens no longer appear with frequency_penalty=0.0, but the response is still strange and repeats the same tokens.

PaddlePaddle org

Try repetition_penalty=1.05 or higher.
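Since repetition_penalty is not part of the standard OpenAI API, it has to be forwarded through the client's extra_body field, which vLLM's OpenAI-compatible server accepts for its extra sampling parameters. A minimal sketch (the with_repetition_penalty helper is illustrative, not part of either project):

```python
# Sketch: vLLM's OpenAI-compatible server reads extra sampling parameters
# such as repetition_penalty from the request body; the OpenAI Python
# client forwards arbitrary fields via extra_body. Values > 1.0 penalize
# previously generated tokens multiplicatively.
def with_repetition_penalty(base_kwargs, penalty=1.05):
    """Return chat-completion kwargs that forward repetition_penalty to vLLM."""
    kwargs = dict(base_kwargs)  # shallow copy; leave the caller's dict intact
    kwargs["extra_body"] = {"repetition_penalty": penalty}
    return kwargs

# Usage with the client from the repro script:
# response = client.chat.completions.create(**with_repetition_penalty({
#     "model": "PaddlePaddle/PaddleOCR-VL",
#     "messages": messages,
#     "temperature": 0,
#     "stream": True,
# }))
```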
