Conversation

@eicherseiji eicherseiji commented Sep 4, 2025

Purpose

Test Plan

Run Ray Serve LLM app:

from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config={
        "model_id": "qwen-0.5b",
        "model_source": "Qwen/Qwen2.5-0.5B-Instruct",
    },
    deployment_config={
        "autoscaling_config": {
            "min_replicas": 2,
            "max_replicas": 2,
        },
    },
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app, blocking=True)

Test Result

Curl the Prometheus endpoint, check that ReplicaId tag is included:

(base) ray@ip-10-1-113-171:~/default/work/ray$ curl localhost:8085 | grep vllm_time_to
...
ray_vllm_time_to_first_token_seconds_count{Component="core_worker",NodeAddress="10.1.113.171",ReplicaId="3r4e76ag",SessionName="session_2025-12-01_11-05-34_460418_3579",Version="2.52.0",WorkerId="6f8b9bc4ba0fda1ef6769ce262129be2b522f90a31a593ec37379267",engine="0",model_name="qwen-0.5b"} 266.0
ray_vllm_time_to_first_token_seconds_sum{Component="core_worker",NodeAddress="10.1.113.171",ReplicaId="3r4e76ag",SessionName="session_2025-12-01_11-05-34_460418_3579",Version="2.52.0",WorkerId="6f8b9bc4ba0fda1ef6769ce262129be2b522f90a31a593ec37379267",engine="0",model_name="qwen-0.5b"} 2.077499999999997
ray_vllm_time_to_first_token_seconds_bucket{Component="core_worker",NodeAddress="10.1.113.171",ReplicaId="r7owse5i",SessionName="session_2025-12-01_11-05-34_460418_3579",Version="2.52.0",WorkerId="2a2948dc4ab2c93c9731b7fb1fc0d2ec4c0dd33794454bacf527fa53",engine="0",le="0.001",model_name="qwen-0.5b"} 0.0
...
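The manual grep check above can also be scripted: parse the scraped Prometheus text and assert every TTFT sample carries a `ReplicaId` label. A minimal sketch; the embedded sample lines are abridged copies of the output above, and with `min_replicas: 2` a full scrape would show two distinct ids.

```python
import re

# Abridged sample lines from the scrape above (most labels omitted).
SCRAPE = """\
ray_vllm_time_to_first_token_seconds_count{Component="core_worker",ReplicaId="3r4e76ag",model_name="qwen-0.5b"} 266.0
ray_vllm_time_to_first_token_seconds_sum{Component="core_worker",ReplicaId="3r4e76ag",model_name="qwen-0.5b"} 2.0774
"""

def replica_ids(scrape: str, metric_prefix: str) -> set[str]:
    """Collect the ReplicaId label value from every matching sample line."""
    ids = set()
    for line in scrape.splitlines():
        if not line.startswith(metric_prefix):
            continue
        m = re.search(r'ReplicaId="([^"]+)"', line)
        # Fail loudly if any sample is missing the tag.
        assert m is not None, f"missing ReplicaId: {line}"
        ids.add(m.group(1))
    return ids

print(replica_ids(SCRAPE, "ray_vllm_time_to_first_token_seconds"))
```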
(Screenshot: 2025-12-01 at 2:30:57 PM)

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@mergify mergify bot added the v1 label Sep 4, 2025
Signed-off-by: Seiji Eicher <[email protected]>
@eicherseiji eicherseiji marked this pull request as ready for review December 1, 2025 22:34
@eicherseiji eicherseiji requested a review from markmc as a code owner December 1, 2025 22:34
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

Signed-off-by: Seiji Eicher <[email protected]>
@ruisearch42 ruisearch42 left a comment (Collaborator)

Otherwise LGTM


self.metric.set_default_tags(labelskwargs)
@staticmethod
def _get_tag_keys(labelnames: list[str] | None) -> tuple[str, ...]:
ruisearch42 (Collaborator):
can we call it label_names?

eicherseiji (Contributor, Author):

The variable is called labelnames in the Prometheus class we are trying to emulate, unfortunately:

labelnames=["position", "model", "engine_index"],

So it's not safe to rename, since existing call sites pass it as a keyword argument.
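The constraint can be seen in a toy example: when callers pass `labelnames` by keyword, the parameter name is part of the API, and renaming it breaks them. This is a sketch; `make_metric` is a hypothetical stand-in for the real metric constructor, not code from the PR.

```python
def make_metric(labelnames=None):
    # Emulates the prometheus_client-style signature. Renaming this
    # parameter to label_names would break every keyword caller below
    # with a TypeError, even though positional callers would still work.
    return tuple(labelnames or ())

# Existing call sites pass the argument by keyword, so the name is load-bearing.
keys = make_metric(labelnames=["position", "model", "engine_index"])
print(keys)
```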

labels.append("ReplicaId")
return tuple(labels)

def labels(self, *labels, **labelskwargs):
ruisearch42 (Collaborator):

not sure why it's named labelskwargs in the first place, but a bit less readable. Can we rename to labels_kwargs?

eicherseiji (Contributor, Author):

Also named labelkwargs in the prometheus_client package.

    def labels(self: T, *labelvalues: Any, **labelkwargs: Any) -> T:

f"Expected {len(self.metric._tag_keys)}, got {len(labels)}"
f"Expected {expected}, got {len(labels)}"
)
labelskwargs.update(zip(self.metric._tag_keys, labels))
ruisearch42 (Collaborator):

this modifies the original dict? should we make a copy?

eicherseiji (Contributor, Author):

In my understanding, **labelskwargs is bound to a fresh dict on every call, so we are not modifying any caller data.
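The point about `**labelskwargs` can be verified directly: Python binds a `**` parameter to a new dict on each call, so mutating it inside the function leaves the caller's dict untouched. A minimal demo (the `coerce` function is illustrative, not the PR's code):

```python
def coerce(**labelskwargs):
    # Mutates only the per-call dict that ** created, never caller state.
    for k, v in labelskwargs.items():
        if not isinstance(v, str):
            labelskwargs[k] = str(v)
    return labelskwargs

caller_tags = {"engine": 0}
result = coerce(**caller_tags)
print(result)       # values coerced to str in the returned dict
print(caller_tags)  # caller's dict is unchanged
```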

if labelskwargs:
for k, v in labelskwargs.items():
if not isinstance(v, str):
labelskwargs[k] = str(v)
ruisearch42 (Collaborator):

ditto, should we modify the copy rather than the original?

eicherseiji (Contributor, Author):

(as above) I think **labelskwargs will be a fresh dict on each function call, so we are not modifying any caller data.
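Putting the two threads together, the `labels()` wrapper under review roughly does the following. This is a reconstruction from the diff fragments quoted above, with the metric plumbing stubbed out; `FakeMetric` and `LabelsWrapper` are stand-ins, not the actual vLLM classes.

```python
class FakeMetric:
    """Stand-in exposing only the piece the wrapper touches."""
    def __init__(self, tag_keys):
        self._tag_keys = tag_keys

class LabelsWrapper:
    def __init__(self, metric):
        self.metric = metric

    def labels(self, *labels, **labelskwargs):
        if labels:
            # ReplicaId is appended by _get_tag_keys, so positional
            # callers supply one fewer value than there are tag keys.
            expected = len(self.metric._tag_keys) - 1
            if len(labels) != expected:
                raise ValueError(f"Expected {expected}, got {len(labels)}")
            labelskwargs.update(zip(self.metric._tag_keys, labels))
        # Ray tags must be strings; coerce values in the per-call dict.
        for k, v in labelskwargs.items():
            if not isinstance(v, str):
                labelskwargs[k] = str(v)
        return labelskwargs

w = LabelsWrapper(FakeMetric(("model", "engine", "ReplicaId")))
print(w.labels("qwen-0.5b", 0))
```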

Comment on lines +44 to +45
expected = len(self.metric._tag_keys) - 1
if len(labels) != expected:
ruisearch42 (Collaborator):

could there be a backwards compatibility issue? say an old version of ray/vllm is used?

eicherseiji (Contributor, Author):

Hm there is no corresponding change in Ray for this to be (in)compatible with. This change is self-contained within vLLM.

@staticmethod
def _get_tag_keys(labelnames: list[str] | None) -> tuple[str, ...]:
labels = list(labelnames) if labelnames else []
labels.append("ReplicaId")
ruisearch42 (Collaborator):

Can you add a comment here to make the intention explicit?

eicherseiji (Contributor, Author):

Done

@eicherseiji eicherseiji left a comment (Contributor, Author)

Thanks for the review @ruisearch42!



@ruisearch42 ruisearch42 added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 2, 2025
@ruisearch42 ruisearch42 enabled auto-merge (squash) December 2, 2025 01:11
@ruisearch42 ruisearch42 merged commit 22274b2 into vllm-project:main Dec 2, 2025
47 checks passed
xbfs pushed a commit to xbfs/vllm that referenced this pull request Dec 5, 2025
Signed-off-by: Seiji Eicher <[email protected]>
Co-authored-by: rongfu.leng <[email protected]>
Signed-off-by: Bofeng BF1 Xue <[email protected]>
charlotte12l pushed a commit to charlotte12l/vllm that referenced this pull request Dec 5, 2025
Signed-off-by: Seiji Eicher <[email protected]>
Co-authored-by: rongfu.leng <[email protected]>
Signed-off-by: Xingyu Liu <[email protected]>