fix: run tokenizer.encode in executor to avoid blocking event loop#1258
Conversation
tokenizer.encode is CPU-bound and blocks the event loop when called synchronously, stalling all concurrent I/O (streaming, new requests). Move it to run_in_executor so the event loop stays responsive.

Test results (Qwen3.5-35B-A3B, 22k tokens):

[sync] encode: 41.7ms | heartbeat ticks: 0 (loop fully blocked)
[executor] encode: 40.2ms | heartbeat ticks: 36 | max gap: 2.1ms
Code Review
This pull request updates the HTTP server manager to execute tokenizer encoding within a thread pool executor, preventing synchronous operations from blocking the asyncio event loop. It also introduces a new test script to verify the non-blocking behavior. Feedback includes a recommendation to use asyncio.get_running_loop() for better consistency and a suggestion to avoid hardcoded local paths in the test suite to ensure portability.
```diff
     prompt_ids = await asyncio.get_event_loop().run_in_executor(
         None,
         lambda: self.tokenizer.encode(
             prompt, multimodal_params, add_special_tokens=sampling_params.add_special_tokens
         ),
     )
 else:
-    prompt_ids = self.tokenizer.encode(prompt, add_special_tokens=sampling_params.add_special_tokens)
+    prompt_ids = await asyncio.get_event_loop().run_in_executor(
+        None,
+        lambda: self.tokenizer.encode(prompt, add_special_tokens=sampling_params.add_special_tokens),
+    )
```
Using asyncio.get_running_loop() is preferred over asyncio.get_event_loop() inside coroutines. get_running_loop() is more explicit and ensures that an error is raised if no loop is currently running, which helps in catching bugs during development. Additionally, it avoids potential deprecation warnings in newer Python versions.
Suggested change:

```diff
-prompt_ids = await asyncio.get_event_loop().run_in_executor(
-    None,
-    lambda: self.tokenizer.encode(
-        prompt, multimodal_params, add_special_tokens=sampling_params.add_special_tokens
-    ),
-)
-else:
-    prompt_ids = self.tokenizer.encode(prompt, add_special_tokens=sampling_params.add_special_tokens)
-    prompt_ids = await asyncio.get_event_loop().run_in_executor(
-        None,
-        lambda: self.tokenizer.encode(prompt, add_special_tokens=sampling_params.add_special_tokens),
-    )
+prompt_ids = await asyncio.get_running_loop().run_in_executor(
+    None,
+    lambda: self.tokenizer.encode(
+        prompt, multimodal_params, add_special_tokens=sampling_params.add_special_tokens
+    ),
+)
+else:
+    prompt_ids = await asyncio.get_running_loop().run_in_executor(
+        None,
+        lambda: self.tokenizer.encode(prompt, add_special_tokens=sampling_params.add_special_tokens),
+    )
```
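The behavioral difference the reviewer describes can be seen outside a coroutine; a minimal standalone sketch (not part of the PR, `len` stands in for any offloaded CPU-bound call):

```python
import asyncio

# With no loop running, get_running_loop() raises immediately,
# surfacing misuse instead of silently fetching or creating a loop.
try:
    asyncio.get_running_loop()
    raised = False
except RuntimeError:
    raised = True
assert raised


async def main():
    # Inside a coroutine, get_running_loop() returns the active loop,
    # which run_in_executor can use to offload CPU-bound work.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, len, "some prompt")


assert asyncio.run(main()) == 11
```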
```python
import statistics
from transformers import AutoTokenizer


MODEL_DIR = "/nvme/models/Qwen3.5-35B-A3B"
```
The MODEL_DIR is hardcoded to a specific local path (/nvme/models/Qwen3.5-35B-A3B). This makes the test non-portable and likely to fail in CI environments or on other developers' machines. Consider using a small, publicly available model identifier (e.g., "Qwen/Qwen2.5-0.5B") or allowing the path to be set via an environment variable.
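A portable version along the reviewer's suggestion might look like the following sketch; the environment-variable name is illustrative, not taken from the PR, and the fallback is the small public model the reviewer mentions:

```python
import os

# Resolve the tokenizer path from the environment when provided,
# otherwise fall back to a small public checkpoint so the test
# also runs in CI and on other developers' machines.
# The variable name TEST_MODEL_DIR is illustrative, not from the PR.
MODEL_DIR = os.environ.get("TEST_MODEL_DIR", "Qwen/Qwen2.5-0.5B")
```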
Summary
- tokenizer.encode in httpserver/manager.py::_encode() is CPU-bound and was called synchronously on the asyncio event loop, blocking all concurrent I/O (streaming output, new request acceptance) during tokenization
- Moved it to run_in_executor so the event loop stays responsive under concurrent long-text requests

Test results (Qwen3.5-35B-A3B, 22k tokens)
A heartbeat coroutine ticks every 1ms to detect event loop blocking:
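The mechanism can be reproduced standalone; the following is a minimal sketch, not the PR's test script — a busy-loop stands in for tokenizer.encode, and the 1ms tick mirrors the heartbeat described above:

```python
import asyncio
import time


def cpu_bound_work():
    # Stand-in for tokenizer.encode: ~50ms of pure CPU on the calling thread.
    end = time.perf_counter() + 0.05
    while time.perf_counter() < end:
        pass


async def heartbeat(ticks, stop):
    # Tick every 1ms; while the event loop is blocked, no ticks accumulate.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.001)


async def measure(use_executor):
    ticks, stop = [], asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    if use_executor:
        # Offload to a thread: the loop keeps driving the heartbeat.
        await asyncio.get_running_loop().run_in_executor(None, cpu_bound_work)
    else:
        # Run inline: the loop is blocked for the whole duration,
        # so the heartbeat task never gets a chance to run.
        cpu_bound_work()
    stop.set()
    await hb
    return len(ticks)


sync_ticks = asyncio.run(measure(False))
executor_ticks = asyncio.run(measure(True))
print(f"sync ticks: {sync_ticks}, executor ticks: {executor_ticks}")
```

In the synchronous case the heartbeat records zero ticks, matching the "loop fully blocked" result above; with the executor the heartbeat keeps firing throughout.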
Test plan
Run python test/test_tokenizer_blocking.py — verifies the event loop is not blocked during tokenization