Missing the opening <think> tag in model output

#2 opened by O-delicious

I hosted the model via vLLM without a reasoning_parser, and the output starts directly with the reasoning text, without an opening <think> tag, while the closing </think> tag still appears later.

root@iv-ydzbs5zshss6ipm6s5gu /h/n/d/ark_http_proxy# curl --location 'http://localhost/v1/chat/completions' \
    --header 'Authorization: Bearer YOUR_API_KEY' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "GLM-4.7-FP8",
        "stream": true,
        "messages": [
            {
                "role": "user",
                "content": "what is cryptography"
            }
        ],
        "chat_template_kwargs": {"enable_thinking": true},
        "skip_special_tokens": false,
        "thinking": {
            "type": "enabled"
        },
        "max_tokens": 1024,
        "temperature": 1.0
    }'
data: {"id":"chatcmpl-9fbc092d919f9e51","object":"chat.completion.chunk","created":1766599479,"model":"GLM-4.7-FP8","choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning_content":null},"logprobs":null,"finish_reason":null}],"prompt_token_ids":null}

data: {"id":"chatcmpl-9fbc092d919f9e51","object":"chat.completion.chunk","created":1766599479,"model":"GLM-4.7-FP8","choices":[{"index":0,"delta":{"content":"1","reasoning_content":null},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-9fbc092d919f9e51","object":"chat.completion.chunk","created":1766599479,"model":"GLM-4.7-FP8","choices":[{"index":0,"delta":{"content":". ","reasoning_content":null},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-9fbc092d919f9e51","object":"chat.completion.chunk","created":1766599479,"model":"GLM-4.7-FP8","choices":[{"index":0,"delta":{"content":" **An","reasoning_content":null},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-9fbc092d919f9e51","object":"chat.completion.chunk","created":1766599479,"model":"GLM-4.7-FP8","choices":[{"index":0,"delta":{"content":"alyze the","reasoning_content":null},"logprobs":null,"finish_reason":null,"token_ids":null}]}

I confirmed that the chat template already appends the opening <think> token at the end of the prompt, so the model never generates it itself:

root@iv-ydzbs5zshss6ipm6s5gu /h/n/d/ark_http_proxy# curl -sS 'http://127.0.0.1/tokenize' \
    -H 'Content-Type: application/json' \
    -d '{"model":"GLM-4.7-FP8","messages":[{"role":"user","content":"hi"}],"add_generation_prompt":true,"return_token_strs":true}'
{"count":6,"max_model_len":202752,"tokens":[151331,151333,151336,6023,151337,151350],"token_strs":["[gMASK]","<sop>","<|user|>","hi","<|assistant|>","<think>"]}

I think this is a vLLM bug. I wrote a patch and opened an issue: https://github.com/vllm-project/vllm/issues/31319

I will wait for the vLLM team to confirm, then close this one.
