[Bugfix] fix dp parallel + tp > 1 offline inference port conflict #4539
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Code Review
This pull request addresses a port conflict that occurs during offline inference in distributed environments using both data parallelism and tensor parallelism (tp > 1). The change removes a monkey patch of ParallelConfig.get_next_dp_init_port that was causing the conflict.

The removed code attempted to manage data-parallel ports by incrementing a counter on the ParallelConfig instance. That approach is not safe across multiple processes: each process holds its own instance and its own counter, leading to race conditions and port collisions. By deleting the faulty patch, the code now relies on the default port-allocation mechanism of the underlying vLLM framework, which correctly handles multi-process scenarios. This is a clean and effective fix for the described problem.
…lm-project#4539)

### What this PR does / why we need it?
fix dp parallel + tp > 1 offline inference port conflict issue

import PR: vllm-project#429

- vLLM version: v0.11.2
- vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.2

Signed-off-by: leo-pony <[email protected]>
Signed-off-by: Che Ruan <[email protected]>
What this PR does / why we need it?
Fixes the port conflict that occurs in offline inference when data parallelism is combined with tensor parallelism (tp > 1).
Imports PR: #429
Does this PR introduce any user-facing change?
How was this patch tested?