Description
refs:
rabbit.schema#L1552-L1563
Schema Syncing from Online Peers:
A restarted node will sync the schema and other information from its peers on boot. Before this process completes, the node won't be fully started and functional.
A stopping node picks an online cluster member (only disc nodes are considered) to sync with after restart. Upon restart, the node will try to contact that peer 10 times by default, with a 30-second response timeout for each attempt.
If the peer becomes available within that time interval, the node starts successfully, syncs what it needs from the peer, and keeps going.
If the peer does not become available, the restarted node will give up and voluntarily stop. This condition can be identified by the timeout (timeout_waiting_for_tables) warning messages in the logs that eventually lead to node startup failure.
This window of time can be adjusted using two configuration settings:
# wait for 60 seconds instead of 30
mnesia_table_loading_retry_timeout = 60000
# retry 15 times instead of 10
mnesia_table_loading_retry_limit = 15
By adjusting these settings, and with them the time window in which a known peer has to come back, it is possible to account for cluster-wide redeployment scenarios that can take longer than 5 minutes to complete (the default window of 10 retries with 30-second timeouts).
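To make the arithmetic behind the window concrete, here is a minimal Python sketch. It assumes the total wait is simply the retry limit multiplied by the per-attempt timeout and ignores any overhead between attempts; the function name `table_loading_window_seconds` is hypothetical and is not part of RabbitMQ.

```python
def table_loading_window_seconds(retry_limit: int = 10,
                                 retry_timeout_ms: int = 30_000) -> float:
    """Approximate upper bound, in seconds, on how long a restarted node
    waits for its peer (assumption: window = retries * timeout)."""
    return retry_limit * retry_timeout_ms / 1000


# Defaults described above: 10 retries, 30-second timeouts.
default_window = table_loading_window_seconds()

# Values from the example config: 15 retries, 60-second timeouts.
adjusted_window = table_loading_window_seconds(retry_limit=15,
                                               retry_timeout_ms=60_000)

print(f"default:  {default_window / 60:.0f} min")   # default:  5 min
print(f"adjusted: {adjusted_window / 60:.0f} min")  # adjusted: 15 min
```

Under that assumption, the example settings widen the window from about 5 minutes to about 15 minutes, so a redeployment that keeps the peer offline for up to roughly 15 minutes should still allow the node to start.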