-
Notifications
You must be signed in to change notification settings - Fork 73
Description
I experienced this working with postgres. If you don't use timeout_seconds then it's possible for two processes to get deadlocked against each other and get stuck in a sleep-retry loop forever.
You can reproduce this by opening up two rails consoles and running these two transactions:
# In console 1
ApplicationRecord.transaction do
ApplicationRecord.with_advisory_lock("lock_test_a", transaction: true)
pp "sleeping"
sleep(3)
pp "waking up"
ApplicationRecord.with_advisory_lock("lock_test_b", transaction: true)
"success"
end
# In console 2
ApplicationRecord.transaction do
ApplicationRecord.with_advisory_lock("lock_test_b", transaction: true)
pp "sleeping"
sleep(3)
pp "waking up"
ApplicationRecord.with_advisory_lock("lock_test_a", transaction: true)
"success"
endThe only different between them is the lock strings are swapped, which is what creates the deadlock. The sleep(3) is there just so you have time to start the second command in time to create the deadlock.
I have a few thoughts on ways to prevent this:
Add a global setting for timeout_seconds
Now that I know about this issue, I plan to use with_advisory_lock! with timeout_seconds everywhere. But it would be nice to be able to set timeout_seconds globally.
Add an option to not use try commands
Normally Postgres detects a deadlock and kills one of the queries (raising a PG::TRDeadlockDetected) allowing the other one to proceed. For example it does this with row-level locks, which you can see by running these two queries at the same time in two consoles:
# Console 1
User.transaction do
User.lock.find(1)
pp "sleeping"
sleep(3)
pp "waking up"
User.lock.find(2)
"success"
end
# Console 2
User.transaction do
User.lock.find(2)
pp "sleeping"
sleep(3)
pp "waking up"
User.lock.find(1)
"success"
endThis doesn't happen when you use with_advisory_lock because the try commands return immediately when they can't acquire the lock, and they don't actually create a true deadlock within postgres — instead the deadlock is created by the retry and sleep logic within with_advisory_lock in conjunction with the locks.
But if instead you use the non-try version of the locking functions, which will wait for the lock to be available rather than returning, then postgres will detect the deadlock and kill one of them, just as it does with the row-level locks above. You can test with this:
# Console 1
ApplicationRecord.transaction do
ApplicationRecord.connection.execute("SELECT pg_advisory_xact_lock(101,0)")
pp "sleeping"
sleep(3)
pp "waking up"
ApplicationRecord.connection.execute("SELECT pg_advisory_xact_lock(102,0)")
"success"
end
# Console 2
ApplicationRecord.transaction do
ApplicationRecord.connection.execute("SELECT pg_advisory_xact_lock(102,0)")
pp "sleeping"
sleep(3)
pp "waking up"
ApplicationRecord.connection.execute("SELECT pg_advisory_xact_lock(101,0)")
"success"
endI assume the idea with using try was to not tie up the postgres thread while waiting for the lock to be available, but this deadlock issue is a drawback of that approach.