fix: run tokenizer.encode in executor to avoid blocking event loop#1258

Open
sufubao wants to merge 1 commit into main from fix/tokenizer-encode-blocking-eventloop

Conversation


@sufubao sufubao commented Apr 6, 2026

Summary

  • tokenizer.encode in httpserver/manager.py::_encode() is CPU-bound and was called synchronously on the asyncio event loop, blocking all concurrent I/O (streaming output, new request acceptance) during tokenization
  • Moved to run_in_executor so the event loop stays responsive under concurrent long-text requests

Test results (Qwen3.5-35B-A3B, 22k tokens)

A heartbeat coroutine ticks every 1ms to detect event loop blocking:

| Mode | Encode time | Heartbeat ticks | Max gap |
| --- | --- | --- | --- |
| sync (before) | 41.7ms | 0 (loop fully blocked) | N/A |
| run_in_executor (after) | 40.2ms | 36 | 2.1ms |

Test plan

  • python test/test_tokenizer_blocking.py — verifies event loop is not blocked during tokenization
  • End-to-end server test with concurrent streaming requests

fix: run tokenizer.encode in executor to avoid blocking event loop

tokenizer.encode is CPU-bound and blocks the event loop when called
synchronously, stalling all concurrent I/O (streaming, new requests).
Move it to run_in_executor so the event loop stays responsive.

Test results (Qwen3.5-35B-A3B, 22k tokens):
  [sync]     encode: 41.7ms | heartbeat ticks: 0 (loop fully blocked)
  [executor] encode: 40.2ms | heartbeat ticks: 36 | max gap: 2.1ms

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the HTTP server manager to execute tokenizer encoding within a thread pool executor, preventing synchronous operations from blocking the asyncio event loop. It also introduces a new test script to verify the non-blocking behavior. Feedback includes a recommendation to use asyncio.get_running_loop() for better consistency and a suggestion to avoid hardcoded local paths in the test suite to ensure portability.

Comment on lines +451 to +461

```python
            prompt_ids = await asyncio.get_event_loop().run_in_executor(
                None,
                lambda: self.tokenizer.encode(
                    prompt, multimodal_params, add_special_tokens=sampling_params.add_special_tokens
                ),
            )
        else:
            prompt_ids = self.tokenizer.encode(prompt, add_special_tokens=sampling_params.add_special_tokens)
            prompt_ids = await asyncio.get_event_loop().run_in_executor(
                None,
                lambda: self.tokenizer.encode(prompt, add_special_tokens=sampling_params.add_special_tokens),
            )
```


Severity: medium

Using asyncio.get_running_loop() is preferred over asyncio.get_event_loop() inside coroutines. get_running_loop() is more explicit and ensures that an error is raised if no loop is currently running, which helps in catching bugs during development. Additionally, it avoids potential deprecation warnings in newer Python versions.

Suggested change

```diff
-            prompt_ids = await asyncio.get_event_loop().run_in_executor(
-                None,
-                lambda: self.tokenizer.encode(
-                    prompt, multimodal_params, add_special_tokens=sampling_params.add_special_tokens
-                ),
-            )
-        else:
-            prompt_ids = self.tokenizer.encode(prompt, add_special_tokens=sampling_params.add_special_tokens)
-            prompt_ids = await asyncio.get_event_loop().run_in_executor(
-                None,
-                lambda: self.tokenizer.encode(prompt, add_special_tokens=sampling_params.add_special_tokens),
-            )
+            prompt_ids = await asyncio.get_running_loop().run_in_executor(
+                None,
+                lambda: self.tokenizer.encode(
+                    prompt, multimodal_params, add_special_tokens=sampling_params.add_special_tokens
+                ),
+            )
+        else:
+            prompt_ids = await asyncio.get_running_loop().run_in_executor(
+                None,
+                lambda: self.tokenizer.encode(prompt, add_special_tokens=sampling_params.add_special_tokens),
+            )
```

```python
import statistics
from transformers import AutoTokenizer

MODEL_DIR = "/nvme/models/Qwen3.5-35B-A3B"
```


Severity: medium

The MODEL_DIR is hardcoded to a specific local path (/nvme/models/Qwen3.5-35B-A3B). This makes the test non-portable and likely to fail in CI environments or on other developers' machines. Consider using a small, publicly available model identifier (e.g., "Qwen/Qwen2.5-0.5B") or allowing the path to be set via an environment variable.
