Skip to content

Conversation

@p-datadog
Copy link
Member

@p-datadog p-datadog commented Oct 7, 2025

What does this PR do?
Uses Mutex#synchronize instead of manual lock & unlock calls to (hopefully) properly unlock the mutex in remote config worker.

Motivation:
Flaky tests on JRuby: https://github.com/DataDog/ruby-guild/issues/256, https://datadoghq.atlassian.net/browse/LANGPLAT-871

The issue is that sometimes mutex is attempted to be unlocked when it is not locked.

Change log entry
Yes: core/fix: improve locking code for remote configuration worker

Additional Notes:

I also attempted this alternate diff which also fixed the issue:

diff --git a/lib/datadog/core/remote/component.rb b/lib/datadog/core/remote/component.rb
index 18e33bea25..f73d9742b6 100644
--- a/lib/datadog/core/remote/component.rb
+++ b/lib/datadog/core/remote/component.rb
@@ -137,11 +137,13 @@ module Datadog
           def lift
             @mutex.lock
 
-            @once ||= true
+            begin
+              @once ||= true
 
-            @condition.broadcast
-          ensure
-            @mutex.unlock
+              @condition.broadcast
+            ensure
+              @mutex.unlock
+            end
           end
         end
 

How to test the change?

Following @Strech 's investigation in https://datadoghq.atlassian.net/browse/LANGPLAT-871, I reproduced the failure locally (both in and not in docker) using the following:

export BUNDLE_GEMFILE=gemfiles/ruby_3.4_mongo_min.gemfile

while true; do if ! TEST_DATADOG_INTEGRATION=1 BUNDLE_GEMFILE=./gemfiles/jruby_9.4_contrib.gemfile bundle exec rspec --pattern spec/datadog/tracing/contrib/suite/\*\*/\*_spec.rb --format=doc; then break; fi; done

After a couple of iterations, this would fail with the above error. The seeds specified in the ticket did not reproduce the failure for me - running the tests without explicitly specifying the seeds was needed.

With the change in this PR, I ran the above command for ~5 minutes without errors.

I have not further investigated the interaction between remote config worker and test polling intervals.

@p-datadog p-datadog requested a review from a team as a code owner October 7, 2025 18:57
@github-actions github-actions bot added the core Involves Datadog core libraries label Oct 7, 2025
@datadog-official
Copy link

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 100.00%
Total Coverage: 101.94% (-0.03%)

View detailed report

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: ee83db1 | Docs | Was this helpful? Give us feedback!

@pr-commenter
Copy link

pr-commenter bot commented Oct 7, 2025

Benchmarks

Benchmark execution time: 2025-10-07 19:22:39

Comparing candidate commit ee83db1 in PR branch jruby-deadlock with baseline commit d865382 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 44 metrics, 2 unstable metrics.

Copy link
Member

@Strech Strech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏🏼

@Strech Strech merged commit 0640641 into master Oct 8, 2025
359 checks passed
@Strech Strech deleted the jruby-deadlock branch October 8, 2025 06:04
@github-actions github-actions bot added this to the 2.22.0 milestone Oct 8, 2025
Copy link
Member

@ivoanjo ivoanjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I believe this is the best + correct solution

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Involves Datadog core libraries

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants