Skip to content

Conversation

@SlightwindSec
Copy link
Contributor

@SlightwindSec SlightwindSec commented Dec 11, 2025

In the PD separation scenario, the D node does not need to perform get operations, and therefore does not need to create ZeroMQ (ZMQ) communication.

Signed-off-by: SlightwindSec <[email protected]>
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request attempts to conditionally initialize the Mooncake lookup client and server. However, the conditions introduced are too restrictive and lead to incorrect behavior. Specifically, the MooncakeLookupServer is not started for the kv_both role, and the MooncakeLookupClient is not created in cases where it's needed (e.g., kv_both role, or kv_consumer with loading enabled), which will cause a runtime crash. I've provided critical feedback and suggestions to fix these issues.

Comment on lines 165 to 168
self.client = MooncakeLookupClient(
vllm_config) if self.kv_role == "kv_producer" else None
self.consumer_is_to_load = vllm_config.kv_transfer_config.kv_connector_extra_config.get(
"consumer_is_to_load", False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The condition for creating MooncakeLookupClient is too restrictive and will cause a NoneType error. The client is needed for lookups when the kv_role is kv_producer or kv_both, or when it's kv_consumer and consumer_is_to_load is true. The current logic only creates the client for kv_producer, which will cause a crash in other cases where a lookup is attempted.

To fix this, consumer_is_to_load should be initialized before self.client, and the condition for client creation should be corrected.

Suggested change
self.client = MooncakeLookupClient(
vllm_config) if self.kv_role == "kv_producer" else None
self.consumer_is_to_load = vllm_config.kv_transfer_config.kv_connector_extra_config.get(
"consumer_is_to_load", False)
self.consumer_is_to_load = vllm_config.kv_transfer_config.kv_connector_extra_config.get(
"consumer_is_to_load", False)
if self.kv_role != "kv_consumer" or self.consumer_is_to_load:
self.client = MooncakeLookupClient(vllm_config)
else:
self.client = None


assert self.connector_worker is not None
if vllm_config.parallel_config.rank == 0:
if vllm_config.parallel_config.rank == 0 and self.kv_role == "kv_producer":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The condition to start MooncakeLookupServer is too restrictive. It should be started if the role is kv_producer or kv_both. With the current change, a node with kv_both role will not start the server, which is incorrect as it's also a producer.

Suggested change
if vllm_config.parallel_config.rank == 0 and self.kv_role == "kv_producer":
if vllm_config.parallel_config.rank == 0 and self.kv_role != "kv_consumer":

@wangxiyuan
Copy link
Collaborator

please update the commit message and point that which PR is picked from main.

@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:‌‌

  • A PR should do only one thing, smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests ‌to ensure it works and is not broken by other future PRs.
  • Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according Contributing and Testing.

Signed-off-by: SlightwindSec <[email protected]>
@SlightwindSec SlightwindSec changed the title mooncake: update self.client [v0.11.0-dev] Remove the ZMQ communication setup on the D node Dec 11, 2025
Signed-off-by: SlightwindSec <[email protected]>
@SlightwindSec SlightwindSec changed the title [v0.11.0-dev] Remove the ZMQ communication setup on the D node [0.11.0][bugfix] Remove the ZMQ communication setup on the D node Dec 11, 2025
@SlightwindSec SlightwindSec changed the title [0.11.0][bugfix] Remove the ZMQ communication setup on the D node [0.11.0][Bugfix] Remove the ZMQ communication setup on the D node Dec 11, 2025
@SlightwindSec
Copy link
Contributor Author

please update the commit message and point that which PR is picked from main.

done

@wangxiyuan wangxiyuan merged commit 9c0ad46 into vllm-project:v0.11.0-dev Dec 12, 2025
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants