Skip to content

Conversation

@kostasrim
Copy link
Contributor

@kostasrim kostasrim commented Oct 27, 2025

Add ProtocolClient::Shutdown() which does not close the socket.

For cluster:

When a migration is cancelled, we should not Close() the socket inside Migration::Finish(). The reason is that

  195   if (should_cancel_flows) { 
  196     OnAllShards([](auto& migration) {
  197       CHECK(migration != nullptr);
  198       migration->Cancel();
  199     });   
  200     exec_st_.JoinErrorHandler();                                                                   
  201   }                                                                                                
  202 }   

does not block and wait Migration::SyncFb to complete. In this context, error is empty so there is no error handler to join on.

Then within sync fb the dispatched callbacks of OnAllShards execute:

  285     OnAllShards([this](auto& migration) { migration->PrepareFlow(cf_->MyID()); });
  286     if (!exec_st_.IsRunning())
  287       // exit loop here

And initiate IO on a Closed() fd triggering the check fd >0 within the socket destructor.

Replication:

Also similarly, calling sock->close() early can lead to the same crash in dfly flows. The reason is that the soon to be cancelled fiber attempts to do IO and crashes because of the check fail. Now instead, we shutdown, wait for the flows to complete and then close the socket.

resolves #5968 and #5982

@kostasrim kostasrim self-assigned this Oct 27, 2025
@kostasrim kostasrim changed the title chore: close socket on destructor for outgoing migrations [wip] chore: close socket on destructor for outgoing migrations Oct 27, 2025
@kostasrim kostasrim force-pushed the kpr4 branch 2 times, most recently from d02a54f to 2235891 Compare October 31, 2025 12:19
@kostasrim kostasrim changed the title [wip] chore: close socket on destructor for outgoing migrations chore: close socket on destructor for outgoing migrations Nov 3, 2025
@kostasrim kostasrim marked this pull request as ready for review November 3, 2025 08:58
@kostasrim kostasrim changed the title chore: close socket on destructor for outgoing migrations chore: separate shutdown and close for ProtocolClient Nov 3, 2025
Copy link
Contributor

@vyavdoshenko vyavdoshenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@kostasrim kostasrim merged commit fc657ca into main Nov 3, 2025
21 of 34 checks passed
@kostasrim kostasrim deleted the kpr4 branch November 3, 2025 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

test_cluster_migration

4 participants