PaddleOCR-VL repeats tokens forever
When using PaddleOCR-VL for OCR on certain forms, we observe the model getting stuck, repeating the same token forever.
docker run -d \
--runtime nvidia \
--gpus '"device=0"' \
--name paddleOCR-VL_test \
-e DEBUG="true" \
-p 8001:8000 \
--ipc=host \
vllm/vllm-openai:nightly-ca00b1bfc69e71d860485340f0a197bf584ec004 \
--model PaddlePaddle/PaddleOCR-VL \
--trust-remote-code \
--max-num-batched-tokens 16384 \
--no-enable-prefix-caching \
--mm-processor-cache-gb 0 \
--gpu-memory-utilization 0.15
Device: A100, CUDA 12.8.

Code to reproduce the error; it is essentially the same as the official vLLM example:
from openai import OpenAI
import base64
from mimetypes import guess_type

def local_data_url(image_path):
    # Guess the MIME type of the image based on the file extension
    mime_type, _ = guess_type(image_path)
    if mime_type is None:
        mime_type = 'application/octet-stream'  # Default MIME type if none is found
    # Read and encode the image file
    with open(image_path, "rb") as image_file:
        base64_encoded_data = base64.b64encode(image_file.read()).decode('utf-8')
    # Construct the data URL
    return f"data:{mime_type};base64,{base64_encoded_data}"

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8001/v1",
    timeout=3600
)

# Task-specific base prompts
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

image_path = "Screenshot 2025-11-13 100012.png"
data_url = local_data_url(image_path)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": data_url
                }
            },
            {
                "type": "text",
                "text": TASKS["ocr"]
            }
        ]
    }
]

response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL",
    messages=messages,
    temperature=0,
    stream=True
)

# Consume the stream; on affected images this loop never finishes because
# the model keeps emitting the same token.
for chunk in response:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
We observe that setting frequency_penalty=0.1 seems to alleviate the issue; however, we're not sure that's the best approach, and it isn't mentioned in the official instructions.
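For reference, the workaround is just the standard frequency_penalty sampling parameter on the same request; 0.1 is simply the value we found to help here, not a tuned recommendation:

response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL",
    messages=messages,
    temperature=0,
    frequency_penalty=0.1,  # workaround: penalize tokens that keep recurring
    stream=True
)

Setting max_tokens on the request also bounds how long a runaway generation can get, though it only caps the damage rather than fixing the repetition.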
I have the same issue with some images on vLLM 0.11.1 and 0.11.2. Setting frequency_penalty=0.1 or higher stops the runaway tokens, but in my case the result quality is worse than with frequency_penalty=0.0. Maybe this is caused by vLLM.
Does anyone know what is going on?
I updated vLLM to 0.12.0; the endless token stream no longer appears with frequency_penalty=0.0, but the response is still strange and repeats the same tokens.
Try repetition_penalty=1.05 or higher.
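Note that repetition_penalty is not part of the standard OpenAI API, so with the OpenAI Python client it has to go through extra_body, which vLLM's OpenAI-compatible server accepts for its extra sampling parameters. A minimal sketch (1.05 is a starting point, not a tuned value):

response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL",
    messages=messages,
    temperature=0,
    stream=True,
    extra_body={"repetition_penalty": 1.05}  # vLLM-specific sampling parameter
)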