Handle changes to MutableSettings and ExporterSettings without rebuilding
#7724
## Execution-Time Benchmarks Report ⏱️

Execution-time results for samples comparing This PR (7724) and master.

✅ No regressions detected - check the details below

### Comparison explanation

Execution-time benchmarks measure the whole time it takes to execute a program, and are intended to measure the one-off costs. Cases where the execution time results for the PR are worse than latest master results are highlighted in **red**.

Note that these results are based on a single point-in-time result for each branch. For full results, see the dashboard. Each row below gives the mean value of the run and the p99 interval based on the mean and StdDev of the test run.

### Duration charts

Execution time (ms):

| Sample | Runtime | Section | This PR (7724) mean | This PR (7724) interval | master mean | master interval |
|---|---|---|---|---|---|---|
| FakeDbCommand | .NET Framework 4.8 | Baseline | 76 | 70-82 | 77 | 71-82 |
| FakeDbCommand | .NET Framework 4.8 | Bailout | 80 | 75-86 | 79 | 73-85 |
| FakeDbCommand | .NET Framework 4.8 | CallTarget+Inlining+NGEN | 1,060 | 1013-1108 | 1,061 | 983-1138 |
| FakeDbCommand | .NET Core 3.1 | Baseline | 118 | 111-124 | 119 | 113-125 |
| FakeDbCommand | .NET Core 3.1 | Bailout | 120 | 113-127 | 119 | 111-126 |
| FakeDbCommand | .NET Core 3.1 | CallTarget+Inlining+NGEN | 765 | 736-794 | 763 | 707-819 |
| FakeDbCommand | .NET 6 | Baseline | 107 | 101-113 | 105 | 100-110 |
| FakeDbCommand | .NET 6 | Bailout | 106 | 101-112 | 106 | 100-111 |
| FakeDbCommand | .NET 6 | CallTarget+Inlining+NGEN | 718 | 679-757 | 708 | 671-745 |
| FakeDbCommand | .NET 8 | Baseline | 105 | 97-114 | 103 | 95-111 |
| FakeDbCommand | .NET 8 | Bailout | 104 | 99-110 | 105 | 99-111 |
| FakeDbCommand | .NET 8 | CallTarget+Inlining+NGEN | 695 | 665-725 | 690 | 658-721 |
| HttpMessageHandler | .NET Framework 4.8 | Baseline | 195 | 192-199 | 193 | 189-197 |
| HttpMessageHandler | .NET Framework 4.8 | Bailout | 199 | 197-201 | 197 | 194-200 |
| HttpMessageHandler | .NET Framework 4.8 | CallTarget+Inlining+NGEN | 1,130 | 1092-1167 | 1,123 | 1049-1198 |
| HttpMessageHandler | .NET Core 3.1 | Baseline | 281 | 276-285 | 278 | 273-283 |
| HttpMessageHandler | .NET Core 3.1 | Bailout | 282 | 277-287 | 278 | 274-282 |
| HttpMessageHandler | .NET Core 3.1 | CallTarget+Inlining+NGEN | 929 | 890-968 | 913 | 856-971 |
| HttpMessageHandler | .NET 6 | Baseline | 274 | 270-277 | 272 | 265-278 |
| HttpMessageHandler | .NET 6 | Bailout | 275 | 270-280 | 271 | 267-274 |
| HttpMessageHandler | .NET 6 | CallTarget+Inlining+NGEN | 899 | 866-933 | 888 | 844-931 |
| HttpMessageHandler | .NET 8 | Baseline | 273 | 269-278 | 270 | 264-276 |
| HttpMessageHandler | .NET 8 | Bailout | 273 | 269-277 | 270 | 266-274 |
| HttpMessageHandler | .NET 8 | CallTarget+Inlining+NGEN | 844 | 820-868 | 830 | 812-849 |
…n't respond to changes

I left it like this because the debugger already doesn't respond to changes like other services do.
- Move statsd instance creation to a separate factory
- Create a StatsdManager to handle automatic updating in response to setting changes
- Always create a statsd instance, as it's hard to know if we're _ever_ going to need one, and it reduces some of the complexity
… reconfiguration is not allowed
…s though, and doesn't respond to changes
This isn't necessary with the current design, and it causes issues today
Make sure we can't dispose a stats consumer that's in use (as it will throw). Rework to use a "lease" mechanism to track usages. Make passing in a statsmanager required.
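A minimal sketch of what a "lease" mechanism of this kind could look like. The `LeasedResource<T>` type and its members are hypothetical names chosen for illustration, not the tracer's actual implementation: the owner holds the first lease, each user takes one before using the resource, and whoever returns the last lease performs the dispose.

```csharp
using System;
using System.Threading;

// Hypothetical illustration only: not the tracer's real StatsdManager code.
public sealed class LeasedResource<T>
    where T : class, IDisposable
{
    private readonly T _resource;
    private int _leases = 1; // the owner holds the initial lease

    public LeasedResource(T resource) => _resource = resource;

    // Try to take a lease. Fails once the lease count has dropped to zero,
    // i.e. once the resource has been (or is being) disposed.
    public bool TryAcquire(out T resource)
    {
        while (true)
        {
            var current = Volatile.Read(ref _leases);
            if (current == 0)
            {
                resource = null;
                return false;
            }

            // Only increment if nobody changed the count in the meantime
            if (Interlocked.CompareExchange(ref _leases, current + 1, current) == current)
            {
                resource = _resource;
                return true;
            }
        }
    }

    // Return a lease. The caller that returns the last lease disposes the
    // resource, so it can never be disposed while another caller holds one.
    public void Release()
    {
        if (Interlocked.Decrement(ref _leases) == 0)
        {
            _resource.Dispose();
        }
    }
}
```

When a setting change replaces the resource, the owner calls `Release()` once to give up its own lease; the old instance is then disposed as soon as the last in-flight user also calls `Release()`. The sketch assumes every successful `TryAcquire` is paired with exactly one `Release`.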
The statsd client does sync-over-async in the flush and dispose paths, which can lead to deadlocks and thread exhaustion. To work around that, we push the dispose to happen on a thread-pool thread instead, in the background
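The workaround described above can be sketched roughly as follows. `BackgroundDisposer` and its `onError` callback are hypothetical names for illustration; only `Task.Run` and `IDisposable` are real APIs here.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, shown only to illustrate the idea.
public static class BackgroundDisposer
{
    // Queue a potentially blocking Dispose() onto the thread pool so the
    // calling thread is not the one that waits on sync-over-async code.
    public static Task DisposeInBackground(IDisposable resource, Action<Exception> onError = null)
    {
        return Task.Run(() =>
        {
            try
            {
                resource.Dispose();
            }
            catch (Exception ex)
            {
                // The returned task is typically not awaited, so report
                // failures through the callback instead of losing them
                onError?.Invoke(ex);
            }
        });
    }
}
```

Note that this moves the blocking call rather than removing it: a thread-pool thread is still occupied for the duration of the dispose, but the thread that triggered the reconfiguration is free to continue.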
… config changes (#7796)

## Summary of changes

A fix for #7724 to handle telemetry reporting in dynamic config "reset" scenarios.

## Reason for change

The system tests for #7724 were failing in some dynamic configuration scenarios. Specifically, the tests were sending remote config _without_ any configuration values (i.e. "reset to use defaults") and were waiting for a telemetry update. However, we never sent it, because there was "no telemetry to record".

Note that we _did_ correctly apply the new configuration; we just didn't report the telemetry correctly, primarily due to limitations in the telemetry protocol.

This PR adds a fix for that, and will be merged into #7724.

## Implementation details

The solution is to "remember" the telemetry from the default mutable configuration values, _without_ any dynamic sources, and "replay" this telemetry when we update telemetry.

This feels kind of hacky, but it's something I suspected we might need to do, and had been avoiding up to this point because we do a "full reconfigure" anyway.

## Test coverage

Added a specific unit test that mimics the behaviour of the system test (i.e. an "empty" dynamic config response) and confirms the telemetry is recorded as expected.

## Other details

https://datadoghq.atlassian.net/browse/LANGPLAT-819

Part of a config stack

- #7522
- #7525
- #7530
- #7532
- #7543
- #7544
- #7721
- #7722
- #7695
- #7723
- #7724
- #7796 👈

Unlike other PRs in the stack, I'll merge this directly into #7724 to fix the tests there; I just thought I'd keep this separate for easier reviewing.
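The "remember and replay" approach from the implementation details above could look something like the following sketch. `ConfigEntry` and `ReplayingConfigTelemetry` are hypothetical types invented for this example (the real telemetry pipeline is not shown in the commit message), and the key and origin strings are placeholders.

```csharp
using System.Collections.Generic;

// Hypothetical types for illustration; not the tracer's telemetry API.
public sealed record ConfigEntry(string Key, string Value, string Origin);

public sealed class ReplayingConfigTelemetry
{
    private readonly List<ConfigEntry> _defaults;

    // Captured once, from the settings built WITHOUT any dynamic sources
    public ReplayingConfigTelemetry(IEnumerable<ConfigEntry> defaults)
        => _defaults = new List<ConfigEntry>(defaults);

    // Replay the remembered defaults first, then append whatever the dynamic
    // config supplied. An empty dynamic payload ("reset to use defaults")
    // therefore still produces a non-empty telemetry update.
    public List<ConfigEntry> BuildUpdate(IEnumerable<ConfigEntry> dynamicEntries)
    {
        var update = new List<ConfigEntry>(_defaults);
        update.AddRange(dynamicEntries);
        return update;
    }
}
```

With this shape, `BuildUpdate` called with an empty collection returns exactly the remembered defaults, which is the case the failing system tests were waiting on.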
NachoEchevarria
left a comment
Impressive! Thanks!
## Summary of changes

Updates a couple of places where we're calling `Tracer.Instance` where we don't need to.

## Reason for change

In one of my other PRs I accidentally broke something that should _only_ have affected integration tests, but a bunch of unit tests broke. It highlighted where they were using `Tracer.Instance` and setting the global tracer for tests. In other places, it revealed that tests that _looked_ independent of other tests really weren't...

Calling `Tracer.Instance` potentially does a _lot_ of work, as it initializes the tracer. It's also hard to follow if we're making static calls out in places we don't need to. So just pass through the settings we want instead.

## Implementation details

Instead of calling `Tracer.Instance.Settings`, use the value of `Tracer` or `TracerSettings` that's already available wherever possible. It makes the tests cleaner too.

## Test coverage

Same coverage, just a bit cleaner.

## Other details

Included as part of the config stack, just because I already refactored some of this code, and can't be bothered to faff with merge conflicts:

https://datadoghq.atlassian.net/browse/LANGPLAT-819

- #7522
- #7525
- #7530
- #7532
- #7543
- #7544
- #7721
- #7722
- #7695
- #7723
- #7724
- #7744 👈
## Summary of changes

## Reason for change

This is the "endpoint" that we've been heading for - services only being disposed/rebuilt at the end of the app, and otherwise only rebuilding the necessary parts. For example - we don't need to tear down all the API factories when a customer changes a global tag via remote config; they only need to change if the `ExporterSettings` change.

The hope is that overall this reduces the overhead of using configuration in code and/or remote configuration, while also reducing the number of issues due to managing disposal of services.
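To make the "only rebuild the necessary parts" idea concrete, here is a small sketch under stated assumptions. All of the types (`DemoMutableSettings`, `DemoExporterSettings`, `DemoSettingsChange`, `DemoApiClient`) are hypothetical stand-ins invented for this example; the real settings types and notification API in the tracer may look quite different.

```csharp
using System;

// Hypothetical stand-ins; the real settings types have many more members.
public sealed record DemoMutableSettings(string GlobalTags);
public sealed record DemoExporterSettings(string AgentUri);

// Describes one settings update; a null property means "this part did not change".
public sealed class DemoSettingsChange
{
    public DemoSettingsChange(DemoMutableSettings updatedMutable, DemoExporterSettings updatedExporter)
    {
        UpdatedMutable = updatedMutable;
        UpdatedExporter = updatedExporter;
    }

    public DemoMutableSettings UpdatedMutable { get; }

    public DemoExporterSettings UpdatedExporter { get; }
}

// A service that depends only on exporter settings.
public sealed class DemoApiClient
{
    private string _agentUri;

    public DemoApiClient(DemoExporterSettings initial) => _agentUri = initial.AgentUri;

    public string AgentUri => _agentUri;

    public int RebuildCount { get; private set; }

    public void OnSettingsChanged(DemoSettingsChange change)
    {
        if (change.UpdatedExporter is null)
        {
            // e.g. only a global tag changed: nothing for this service to do
            return;
        }

        _agentUri = change.UpdatedExporter.AgentUri;
        RebuildCount++;
    }
}
```

In this sketch a global-tag change arrives with `UpdatedExporter` set to `null`, so `DemoApiClient` keeps its existing state; only an exporter change triggers the (comparatively expensive) rebuild.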
## Implementation details

Overall, this PR is kind of a pain. Moving from "rebuild everything" to "reconfigure each service" couldn't be done piecemeal, so this is the one-shot PR. What's more, different services need different patterns (though we can probably consolidate some of them; this has taken a lot of work and I likely changed patterns unnecessarily in some places).

In general, there are a couple of patterns:

- `Managed*` versions of some services …
- … `Volatile.Read()` (to ensure changes are visible) and are generally cached to a local variable (as the underlying field may be updated in the background).

## Test coverage

In the vast majority of places, this should be covered by existing tests.

I plan to add some additional integration tests around reconfiguring and a bunch of manual testing to make sure I'm confident.
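As an illustration of the `Volatile.Read()` and local-caching pattern mentioned in the implementation details, here is a minimal sketch. `DemoExporter` and `ManagedDemoExporter` are hypothetical names for this example and are not the actual `Managed*` types in the PR; `Volatile.Read` and `Volatile.Write` are the real `System.Threading` APIs.

```csharp
using System.Threading;

// Hypothetical inner service that gets swapped out when settings change.
public sealed class DemoExporter
{
    public DemoExporter(string endpoint) => Endpoint = endpoint;

    public string Endpoint { get; }
}

// Hypothetical "managed" wrapper: callers keep one long-lived reference to
// the wrapper while the inner instance is replaced in the background.
public sealed class ManagedDemoExporter
{
    private DemoExporter _current;

    public ManagedDemoExporter(DemoExporter initial) => _current = initial;

    // Called from a settings-changed callback, possibly on another thread
    public void Replace(DemoExporter next) => Volatile.Write(ref _current, next);

    public string Describe()
    {
        // Read the field once into a local. Every later use in this method
        // sees the same instance, even if Replace() runs concurrently.
        var exporter = Volatile.Read(ref _current);
        return exporter.Endpoint + " (" + exporter.Endpoint.Length + " chars)";
    }
}
```

Reading `_current` twice inside `Describe()` instead of caching it could observe two different instances within a single call, which is the hazard the local variable avoids.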
## Other details

I strongly recommend reviewing commit-by-commit. They're generally self-contained, and hopefully simple enough to understand one commit at a time.

https://datadoghq.atlassian.net/browse/LANGPLAT-819

Part of a config stack

- … `MutableSettings` from `TracerSettings` #7522
- … `MutableSettings` on dynamic config changes #7525
- … `DefaultServiceName` to `MutableSettings` #7530
- … `PerTraceSettings.GetServiceName()` #7532
- … `TracerSettings` to use `MutableSettings` where appropriate #7543
- … `IsIntegrationEnabled()`, `IsErrorStatusCode()`, and `GetIntegrationAnalyticsSampleRate()` #7544
- … `DictionaryExtensions.SequenceEqual` #7722
- … `SettingsManager` for managing mutable settings and ExporterSettings #7695
- … `TracerSettings` which can change at runtime #7723
- Handle changes to `MutableSettings` and `ExporterSettings` without rebuilding #7724 👈

This isn't the final PR in the stack, as there will be a bunch of cleaning up to do, but it's the final "implementation" PR.