@alisonshao alisonshao commented Dec 6, 2025

Summary

  • Add /model_info endpoint to mini_lb to fix the regression introduced by #13607 ("tiny remove deprecated endpoint call")
  • Keep /get_model_info for backwards compatibility
  • Update backend call to use /model_info (the non-deprecated endpoint)

Root Cause

PR #13607 changed runtime_endpoint.py to use /model_info instead of /get_model_info, but mini_lb.py was not updated to support the new endpoint. This caused test_disaggregation_basic.py to fail with 404 errors when accessing /model_info through the mini load balancer.

Error example: https://github.com/sgl-project/sglang/actions/runs/19977384736/job/57312528074
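
The failure mode can be reproduced with a toy dispatcher. This is a rough sketch only: `make_proxy`, the route table, and the payload are hypothetical stand-ins for illustration and are not taken from mini_lb.py.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[[], dict]


def make_proxy(routes: Dict[str, Handler]) -> Callable[[str], Tuple[int, dict]]:
    """Return a dispatcher that answers 404 for any unregistered path."""

    def dispatch(path: str) -> Tuple[int, dict]:
        handler = routes.get(path)
        if handler is None:
            return 404, {"detail": "Not Found"}
        return 200, handler()

    return dispatch


def backend_model_info() -> dict:
    # Placeholder payload; the real response fields are not shown in this PR.
    return {"model_path": "example-model"}


# Before the fix: the proxy only registers the deprecated path.
before = make_proxy({"/get_model_info": backend_model_info})

# After the fix: both paths are registered and share one handler.
after = make_proxy(
    {
        "/model_info": backend_model_info,
        "/get_model_info": backend_model_info,
    }
)

print(before("/model_info")[0])     # 404: the client moved to a path the proxy lacks
print(after("/model_info")[0])      # 200
print(after("/get_model_info")[0])  # 200: old clients keep working
```

A client that switched to /model_info hits the first case, which matches the 404 seen in test_disaggregation_basic.py.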

Test plan

  • Verify test_disaggregation_basic.py passes in CI after this fix

@gemini-code-assist
Contributor

Summary of Changes

Hello @alisonshao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical regression in the mini load balancer (mini_lb) where it failed to recognize the newly introduced /model_info endpoint, leading to 404 errors. The changes ensure that the load balancer correctly routes requests for model information to the updated endpoint while also preserving compatibility with the older /get_model_info endpoint. This fix restores proper functionality for model information retrieval and resolves associated test failures.

Highlights

  • New Endpoint Added: Introduced the /model_info endpoint to mini_lb.py to address a regression caused by a previous change (PR #13607, "tiny remove deprecated endpoint call") that updated the model information endpoint in runtime_endpoint.py.
  • Backward Compatibility: Maintained the existing /get_model_info endpoint for backward compatibility, ensuring that older clients or services can still retrieve model information without disruption.
  • Internal Endpoint Update: Updated the internal call within mini_lb to use the new /model_info endpoint when fetching model information from backend servers, aligning with the non-deprecated endpoint.
  • Code Refactoring: Refactored the core logic for retrieving model information into a shared private asynchronous function, _get_model_info_impl, which is now called by both /model_info and /get_model_info endpoints.
  • Regression Fix: Resolved the 404 errors encountered by test_disaggregation_basic.py when attempting to access /model_info through the mini load balancer, ensuring the test now passes.
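
The refactoring described in the highlights can be sketched roughly as follows. Only the name `_get_model_info_impl` and the two endpoint paths come from this PR; `BACKEND_URL`, `fake_fetch`, and the injected `fetch` parameter are assumptions added so the example runs without a real backend.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical stand-in for an HTTP GET against a backend server.
Fetch = Callable[[str], Awaitable[dict]]

BACKEND_URL = "http://backend.invalid:30000"  # assumed address, illustration only


async def fake_fetch(url: str) -> dict:
    # Echo the requested URL instead of performing network I/O.
    return {"requested": url}


async def _get_model_info_impl(fetch: Fetch = fake_fetch) -> dict:
    # The shared logic always calls the non-deprecated backend path.
    return await fetch(f"{BACKEND_URL}/model_info")


async def model_info() -> dict:
    # Would be served at /model_info.
    return await _get_model_info_impl()


async def get_model_info() -> dict:
    # Would be served at /get_model_info, kept for backward compatibility.
    return await _get_model_info_impl()


new_result = asyncio.run(model_info())
old_result = asyncio.run(get_model_info())
print(new_result == old_result)  # True: both handlers delegate to the same code
```

Keeping the backend call in one place means a future path change only has to be made once.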

@alisonshao
Collaborator Author

/rerun-stage unit-test-backend-2-gpu

@github-actions
Contributor

github-actions bot commented Dec 6, 2025

✅ Triggered unit-test-backend-2-gpu to run independently (skipping dependencies).

Check the Actions tab for progress.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively resolves the regression by introducing the /model_info endpoint in mini_lb.py and ensuring backward compatibility by retaining /get_model_info. The refactoring of the shared logic into the _get_model_info_impl function is a clean approach that improves code maintainability by avoiding duplication. The changes are well-targeted and correctly address the issue.

Comment on lines +377 to +384
@app.get("/model_info")
async def model_info():
    return await _get_model_info_impl()


@app.get("/get_model_info")
async def get_model_info():
    return await _get_model_info_impl()

Priority: medium

To improve maintainability and reduce code duplication, you can register multiple paths to a single handler function in FastAPI. This would make it more explicit that both /model_info and /get_model_info are served by the same logic, and it reduces the number of similar functions to maintain.

Suggested change
-@app.get("/model_info")
-async def model_info():
-    return await _get_model_info_impl()
-@app.get("/get_model_info")
-async def get_model_info():
-    return await _get_model_info_impl()
+@app.get("/model_info")
+@app.get("/get_model_info")
+async def get_model_info():
+    """Handle both /model_info and /get_model_info for backward compatibility."""
+    return await _get_model_info_impl()
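
Why stacking works can be shown with a toy router. This is a sketch, not FastAPI itself: `ToyApp` is a hypothetical class that mimics only the "record the path, return the function unchanged" behaviour that stacked registration decorators rely on, and the payload is a placeholder.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Handler = Callable[[], Awaitable[dict]]


class ToyApp:
    """Hypothetical mini-router used only to illustrate decorator stacking."""

    def __init__(self) -> None:
        self.routes: Dict[str, Handler] = {}

    def get(self, path: str) -> Callable[[Handler], Handler]:
        def register(func: Handler) -> Handler:
            self.routes[path] = func
            # Returning the function unchanged lets the next decorator
            # in the stack register the same object under another path.
            return func

        return register


app = ToyApp()


async def _get_model_info_impl() -> dict:
    return {"model_path": "example-model"}  # placeholder payload


@app.get("/model_info")
@app.get("/get_model_info")
async def get_model_info() -> dict:
    """Handle both /model_info and /get_model_info for backward compatibility."""
    return await _get_model_info_impl()


print(sorted(app.routes))  # ['/get_model_info', '/model_info']
print(app.routes["/model_info"] is app.routes["/get_model_info"])  # True
```

Decorators apply bottom-up, so /get_model_info is registered first and /model_info second, with both entries pointing at the same function.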

Collaborator

@b8zhong b8zhong left a comment

Thanks, I think it's the right fix.

@slin1237 slin1237 added the run-ci label Dec 6, 2025
@slin1237 slin1237 merged commit cee93a6 into main Dec 6, 2025
90 of 94 checks passed
@slin1237 slin1237 deleted the fix-mini-lb-model-info-endpoint branch December 6, 2025 19:34
eternally-z pushed a commit to AniZpZ/sglang that referenced this pull request Dec 8, 2025