Member

@WenyXu WenyXu commented Nov 7, 2025

I hereby agree to the terms of the GreptimeDB CLA.

Refer to a related PR or issue link (optional)

What's changed and what's your intention?

This PR fixes multiple issues to improve system stability and consistency:

1. Leader State Management

  • Reset leader state after campaign failures for Postgres and MySQL backends
  • Ensure failed campaigns properly clear cached leader information to prevent stale leadership
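The reset-on-failure behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `LeaderState`, `campaign_once`, and the `try_campaign` closure are hypothetical names standing in for the election backend and its cached leadership flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for the leadership flag an election backend caches.
#[derive(Default)]
pub struct LeaderState {
    is_leader: AtomicBool,
}

impl LeaderState {
    pub fn is_leader(&self) -> bool {
        self.is_leader.load(Ordering::Acquire)
    }

    /// Clears the cached leadership so callers never observe a stale leader.
    pub fn reset(&self) {
        self.is_leader.store(false, Ordering::Release);
    }
}

/// Runs one campaign round. `try_campaign` is a hypothetical closure that
/// stands in for the backend call (for example a query against the SQL store).
pub fn campaign_once<F>(state: &LeaderState, try_campaign: F) -> Result<(), String>
where
    F: FnOnce() -> Result<bool, String>,
{
    match try_campaign() {
        Ok(elected) => {
            // Record the outcome the backend reported.
            state.is_leader.store(elected, Ordering::Release);
            Ok(())
        }
        Err(err) => {
            // The campaign failed: drop any previously cached leadership
            // before propagating the error.
            state.reset();
            Err(err)
        }
    }
}
```

The point of the sketch is the `Err` arm: without the `reset()` call, a node that was leader before the failure would keep reporting itself as leader.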

2. Region Migration Procedures

  • Fix incorrect locking in region migration procedures
  • Remove unnecessary table route cache to maintain consistent state
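One way to read "remove the table route cache" is that the migration context fetches the route from the metadata store on every access instead of holding its own copy. The sketch below illustrates that idea only; `RouteStore`, `TableRoute`, and `MigrationContext` are hypothetical types, not GreptimeDB's actual ones, and the locking fix itself is not modelled here.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type TableId = u32;

/// Hypothetical, simplified table route record.
#[derive(Clone, Debug, PartialEq)]
pub struct TableRoute {
    pub version: u64,
    pub leader_datanode: u64,
}

/// Hypothetical in-memory stand-in for the metadata store.
#[derive(Default)]
pub struct RouteStore {
    routes: RwLock<HashMap<TableId, TableRoute>>,
}

impl RouteStore {
    pub fn put(&self, id: TableId, route: TableRoute) {
        self.routes.write().unwrap().insert(id, route);
    }

    pub fn get(&self, id: TableId) -> Option<TableRoute> {
        self.routes.read().unwrap().get(&id).cloned()
    }
}

/// Hypothetical migration context that keeps no cached route of its own.
pub struct MigrationContext {
    store: Arc<RouteStore>,
    table_id: TableId,
}

impl MigrationContext {
    pub fn new(store: Arc<RouteStore>, table_id: TableId) -> Self {
        Self { store, table_id }
    }

    /// Always reads from the store, so updates made by other procedures
    /// are visible on the next call.
    pub fn table_route(&self) -> Option<TableRoute> {
        self.store.get(self.table_id)
    }
}
```

With this shape, a route updated by another procedure between two steps of the migration is picked up on the next `table_route()` call rather than being masked by a copy held in the context.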

3. Region Lease Renewal Error Handling

  • Handle region lease renewal errors gracefully without failing the heartbeat handler
  • Add error logging when region lease renewal fails, including the datanode id and the affected regions
  • Add a warning log in the heartbeat service when a handler fails, including the pusher_id for easier debugging
  • Prevent datanodes from being incorrectly marked as failed due to transient lease renewal errors
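The error-handling pattern above can be sketched as a handler that logs and continues instead of returning the error. This is an illustration under assumptions: `handle_region_lease`, `HeartbeatResponse`, and the `renew` closure are hypothetical, `eprintln!` stands in for the project's logging, and returning an empty response on failure is an assumption about what "gracefully" means here.

```rust
/// Hypothetical, simplified heartbeat response.
#[derive(Debug, Default, PartialEq)]
pub struct HeartbeatResponse {
    pub granted_regions: Vec<u64>,
}

/// Handles the lease part of a heartbeat. `renew` is a hypothetical closure
/// standing in for the real lease renewal call.
pub fn handle_region_lease<F>(datanode_id: u64, regions: &[u64], renew: F) -> HeartbeatResponse
where
    F: FnOnce(u64, &[u64]) -> Result<Vec<u64>, String>,
{
    match renew(datanode_id, regions) {
        Ok(granted) => HeartbeatResponse {
            granted_regions: granted,
        },
        Err(err) => {
            // Log the datanode id and affected regions, then keep going so a
            // transient failure does not fail the whole heartbeat handler.
            eprintln!(
                "Failed to renew region lease, datanode: {}, regions: {:?}, error: {}",
                datanode_id, regions, err
            );
            // Assumption: an empty response is returned on failure.
            HeartbeatResponse::default()
        }
    }
}
```

Because the function returns a response in both arms, the caller still completes the heartbeat, and the datanode is not treated as failed on the basis of a single renewal error.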

PR Checklist

Please convert it to a draft if some of the following conditions are not met.

  • I have written the necessary rustdoc comments.
  • I have added the necessary unit tests and integration tests.
  • This PR requires documentation updates.
  • API changes are backward compatible.
  • Schema or data changes are backward compatible.

@github-actions github-actions bot added the size/S and docs-not-required (This change does not impact docs.) labels Nov 7, 2025
@WenyXu WenyXu mentioned this pull request Nov 10, 2025
11 tasks
@github-actions github-actions bot added size/M and removed size/S labels Nov 10, 2025
@WenyXu WenyXu force-pushed the fix/region-migration-downgraded-region branch from 34f138e to f5e009a November 10, 2025 10:59
@WenyXu WenyXu changed the title from "fix(meta): remove table route cache in region migration ctx" to "fix: correct leader state reset and region migration locking consistency" Nov 10, 2025
@WenyXu WenyXu marked this pull request as ready for review November 10, 2025 11:06
@WenyXu WenyXu requested a review from fengjiachun November 10, 2025 11:13
Collaborator

@fengjiachun fengjiachun left a comment

LGTM

@MichaelScofield MichaelScofield added this pull request to the merge queue Nov 11, 2025
Merged via the queue into GreptimeTeam:main with commit ac0e95c Nov 11, 2025
47 checks passed
WenyXu added a commit to WenyXu/greptimedb that referenced this pull request Nov 11, 2025
…ncy (GreptimeTeam#7199)

* fix(meta): remove table route cache in region migration ctx

Signed-off-by: WenyXu <[email protected]>

* fix: fix unit tests

Signed-off-by: WenyXu <[email protected]>

* chore: fix clippy

Signed-off-by: WenyXu <[email protected]>

* fix: fix campaign reset not clearing leader state-s

Signed-off-by: WenyXu <[email protected]>

* feat: gracefully handle region lease renewal errors

Signed-off-by: WenyXu <[email protected]>

---------

Signed-off-by: WenyXu <[email protected]>
@killme2008
Contributor

I think we should add tests for these fixes. @fengjiachun @WenyXu

Labels

docs-not-required (This change does not impact docs.), size/M

4 participants