@zhewenl zhewenl commented Oct 30, 2025

Purpose

More details in #27844.

Structured outputs tests were added in #12388 with @pytest.mark.skip_global_cleanup to reduce test time. However, skipping the global cleanup causes the tests to run out of memory (OOM) on AMD CI specifically. (example)


2025-10-30 05:48:54 UTC | (EngineCore_DP0 pid=3997) ERROR 10-30 05:48:54 [core.py:779]     self.driver_worker.init_device()
2025-10-30 05:48:54 UTC | (EngineCore_DP0 pid=3997) ERROR 10-30 05:48:54 [core.py:779]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 308, in init_device
2025-10-30 05:48:54 UTC | (EngineCore_DP0 pid=3997) ERROR 10-30 05:48:54 [core.py:779]     self.worker.init_device()  # type: ignore
2025-10-30 05:48:54 UTC | (EngineCore_DP0 pid=3997) ERROR 10-30 05:48:54 [core.py:779]     ^^^^^^^^^^^^^^^^^^^^^^^^^
2025-10-30 05:48:54 UTC | (EngineCore_DP0 pid=3997) ERROR 10-30 05:48:54 [core.py:779]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 207, in init_device
2025-10-30 05:48:54 UTC | (EngineCore_DP0 pid=3997) ERROR 10-30 05:48:54 [core.py:779]     raise ValueError(
2025-10-30 05:48:54 UTC | (EngineCore_DP0 pid=3997) ERROR 10-30 05:48:54 [core.py:779] ValueError: Free memory on device (24.92/255.98 GiB) on startup is less than desired GPU memory utilization (0.9, 230.39 GiB). Decrease GPU memory utilization or reduce GPU memory used by other processes.
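
The ValueError in the log comes from a startup check that compares free device memory against the requested share of total memory. A minimal sketch of that arithmetic with the numbers from the log follows; the function name `check_startup_memory` is hypothetical and this is not vLLM's actual implementation:

```python
def check_startup_memory(free_gib: float, total_gib: float,
                         gpu_memory_utilization: float) -> float:
    """Return the requested memory in GiB, or raise if not enough is free.

    Hypothetical helper for illustration only; it mirrors the comparison
    described in the error message, not vLLM's real code.
    """
    requested_gib = gpu_memory_utilization * total_gib
    if free_gib < requested_gib:
        raise ValueError(
            f"Free memory ({free_gib:.2f}/{total_gib:.2f} GiB) is less than "
            f"the requested {requested_gib:.2f} GiB "
            f"(utilization {gpu_memory_utilization})."
        )
    return requested_gib


if __name__ == "__main__":
    # Numbers from the log: 24.92 GiB free out of 255.98 GiB, utilization 0.9.
    # 0.9 * 255.98 is about 230.38 GiB; the log prints 230.39 GiB, presumably
    # because the total shown there is rounded for display.
    try:
        check_startup_memory(24.92, 255.98, 0.9)
    except ValueError as exc:
        print(exc)
```

With only about 25 GiB free, a previous test's engine is most likely still holding device memory, which is consistent with the skipped cleanup described above.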

As a short-term mitigation, we call cleanup_dist_env_and_memory when these tests run on AMD.
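
A minimal sketch of the platform-gated cleanup described above. The helper name `cleanup_if_rocm` and the stand-in callables are hypothetical; the sketch only shows the control flow, not the exact change in this PR:

```python
from typing import Callable


def cleanup_if_rocm(is_rocm: Callable[[], bool],
                    cleanup: Callable[[], None]) -> bool:
    """Run `cleanup` only when `is_rocm()` reports an AMD (ROCm) platform.

    Returns True if cleanup ran. Both callables are injected so the sketch
    runs without a GPU or vLLM installed.
    """
    if is_rocm():
        cleanup()
        return True
    return False


if __name__ == "__main__":
    calls = []

    # Stand-ins (assumptions): a platform probe and a cleanup function.
    ran_on_amd = cleanup_if_rocm(lambda: True, lambda: calls.append("cleanup"))
    ran_elsewhere = cleanup_if_rocm(lambda: False, lambda: calls.append("cleanup"))

    print(ran_on_amd, ran_elsewhere, calls)
```

In the vLLM test suite, the two callables would presumably be `current_platform.is_rocm` from `vllm.platforms` and `cleanup_dist_env_and_memory` from `vllm.distributed`; that wiring is an assumption here, since the diff itself is not shown in this description.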

Test Plan

 pytest -v -s tests/v1/entrypoints/llm/test_struct_output_generate.py
...
================================================================== warnings summary ==================================================================
<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

tests/v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output[mistralai/Ministral-8B-Instruct-2410-xgrammar-auto-None]
tests/v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output[mistralai/Ministral-8B-Instruct-2410-guidance-auto-None]
tests/v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output[mistralai/Ministral-8B-Instruct-2410-lm-format-enforcer-auto-None]
tests/v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output[mistralai/Ministral-8B-Instruct-2410-outlines-auto-speculative_config6]
tests/v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output[mistralai/Ministral-8B-Instruct-2410-guidance-auto-speculative_config7]
tests/v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output_auto_mode[mistralai/Ministral-8B-Instruct-2410-auto]
  /data/users/zhewenli/gitrepos/vllm-fork/vllm/transformers_utils/tokenizer.py:287: FutureWarning: It is strongly recommended to run mistral models with `--tokenizer-mode "mistral"` to ensure correct encoding and decoding.
    return get_tokenizer(

tests/v1/entrypoints/llm/test_struct_output_generate.py::test_structured_output_with_structural_tag[xgrammar]
  /data/users/zhewenli/gitrepos/vllm-fork/tests/v1/entrypoints/llm/test_struct_output_generate.py:902: DeprecationWarning: guided_decoding is deprecated. This will be removed in v0.12.0 or v1.0.0, which ever is soonest. Please use structured_outputs instead.
    },

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================================================== 20 passed, 9 warnings in 1123.22s (0:18:43) =====================================================
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

TODO
CI still hits OOM
CI: https://buildkite.com/vllm/amd-ci/builds/774

Signed-off-by: zhewenli <[email protected]>
@zhewenl zhewenl force-pushed the fix-amd-ci-struct-output-oom branch from 0a25a5e to 35ab86d Compare October 30, 2025 23:52
@zhewenl zhewenl marked this pull request as ready for review October 31, 2025 00:01
@zhewenl zhewenl requested review from aarnphm and mgoin as code owners October 31, 2025 00:01
@zhewenl zhewenl changed the title from "[CI/Build] Fix AMD structured outputs tests OOM" to "[WIP][CI/Build] Fix AMD structured outputs tests OOM" Oct 31, 2025
Signed-off-by: zhewenli <[email protected]>