-
-
Notifications
You must be signed in to change notification settings - Fork 11.8k
[Misc] Add ReplicaId to Ray metrics #24267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Misc] Add ReplicaId to Ray metrics #24267
Conversation
…t#22159 Signed-off-by: Seiji Eicher <[email protected]> Co-authored-by: rongfu.leng <[email protected]>
…metrics-replicaid Signed-off-by: Seiji Eicher <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]>
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Signed-off-by: Seiji Eicher <[email protected]>
ruisearch42
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM
|
|
||
| self.metric.set_default_tags(labelskwargs) | ||
| @staticmethod | ||
| def _get_tag_keys(labelnames: list[str] | None) -> tuple[str, ...]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call it label_names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable is called labelnames in the Prometheus class we are trying to emulate, unfortunately:
vllm/tests/v1/metrics/test_metrics_reader.py
Line 110 in 342c4f1
| labelnames=["position", "model", "engine_index"], |
So it's not safe to rename, due to keyword arguments
| labels.append("ReplicaId") | ||
| return tuple(labels) | ||
|
|
||
| def labels(self, *labels, **labelskwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why it's named labelskwargs in the first place, but a bit less readable. Can we rename to labels_kwargs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also named labelkwargs in the prometheus_client package.
def labels(self: T, *labelvalues: Any, **labelkwargs: Any) -> T:
| f"Expected {len(self.metric._tag_keys)}, got {len(labels)}" | ||
| f"Expected {expected}, got {len(labels)}" | ||
| ) | ||
| labelskwargs.update(zip(self.metric._tag_keys, labels)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this modifies the original dict? should we make a copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my understanding, **labelskwargs is a fresh dict each time. So we are not modifying any caller data
| if labelskwargs: | ||
| for k, v in labelskwargs.items(): | ||
| if not isinstance(v, str): | ||
| labelskwargs[k] = str(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, should we modify the copy rather than the original?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(as above) I think **labelskwargs will be a fresh dict each function call. So we are not modifying any caller data
| expected = len(self.metric._tag_keys) - 1 | ||
| if len(labels) != expected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could there be a backwards compatibility issue? say an old version of ray/vllm is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm there is no corresponding change in Ray for this to be (in)compatible with. This change is self-contained within vLLM.
| @staticmethod | ||
| def _get_tag_keys(labelnames: list[str] | None) -> tuple[str, ...]: | ||
| labels = list(labelnames) if labelnames else [] | ||
| labels.append("ReplicaId") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here to make the intention explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
eicherseiji
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @ruisearch42!
|
|
||
| self.metric.set_default_tags(labelskwargs) | ||
| @staticmethod | ||
| def _get_tag_keys(labelnames: list[str] | None) -> tuple[str, ...]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable is called labelnames in the Prometheus class we are trying to emulate, unfortunately:
vllm/tests/v1/metrics/test_metrics_reader.py
Line 110 in 342c4f1
| labelnames=["position", "model", "engine_index"], |
So it's not safe to rename, due to keyword arguments
| labels.append("ReplicaId") | ||
| return tuple(labels) | ||
|
|
||
| def labels(self, *labels, **labelskwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also named labelkwargs in the prometheus_client package.
def labels(self: T, *labelvalues: Any, **labelkwargs: Any) -> T:
| @staticmethod | ||
| def _get_tag_keys(labelnames: list[str] | None) -> tuple[str, ...]: | ||
| labels = list(labelnames) if labelnames else [] | ||
| labels.append("ReplicaId") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| expected = len(self.metric._tag_keys) - 1 | ||
| if len(labels) != expected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm there is no corresponding change in Ray for this to be (in)compatible with. This change is self-contained within vLLM.
| f"Expected {len(self.metric._tag_keys)}, got {len(labels)}" | ||
| f"Expected {expected}, got {len(labels)}" | ||
| ) | ||
| labelskwargs.update(zip(self.metric._tag_keys, labels)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my understanding, **labelskwargs is a fresh dict each time. So we are not modifying any caller data
| if labelskwargs: | ||
| for k, v in labelskwargs.items(): | ||
| if not isinstance(v, str): | ||
| labelskwargs[k] = str(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(as above) I think **labelskwargs will be a fresh dict each function call. So we are not modifying any caller data
Signed-off-by: Seiji Eicher <[email protected]> Co-authored-by: rongfu.leng <[email protected]> Signed-off-by: Bofeng BF1 Xue <[email protected]>
Signed-off-by: Seiji Eicher <[email protected]> Co-authored-by: rongfu.leng <[email protected]> Signed-off-by: Xingyu Liu <[email protected]>
Purpose
WorkerIdin default tags.ReplicaIdis the more readable, dashboard-visible way to distinguish model replicas in metrics.Test Plan
Run Ray Serve LLM app:
Test Result
Curl the Prometheus endpoint, check that ReplicaId tag is included:
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.