Conversation

@ericfirth
Contributor

@ericfirth ericfirth commented Sep 22, 2025

What does this PR do?
This implements Data Streams Monitoring in Ruby. It autoinstruments 2 libraries:

  • ruby-kafka
  • karafka (consumer only; waterdrop, the producer library, is not instrumented)

Data Streams Monitoring works by setting a checkpoint tagged with the topic (i.e. the queue), the direction (produce or consume, i.e. out or in), and the service, plus a few other tags. Producers then write the checkpoint's context into the message (usually via the headers, but for SQS and libraries without header support it goes in the message body). Consumers read that context and set a checkpoint linked to the previous one. This lets us report unsampled latency stats across queues and long pipelines.
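
To make the flow above concrete, here is a minimal sketch of a producer checkpoint, context propagation through headers, and a linked consumer checkpoint. Everything in it is an assumption for illustration: the `PathwaySketch` module, the `sketch-pathway-ctx` header key, the SHA-256 based hash, and the `;`-separated encoding are hypothetical and are not the format this PR uses.

```ruby
require 'digest'

# Hypothetical, simplified model of a checkpoint chain.
# The hash function, header key, and encoding are illustrative assumptions.
module PathwaySketch
  HEADER_KEY = 'sketch-pathway-ctx' # hypothetical header name

  Context = Struct.new(:pathway_hash, :pathway_start, :edge_start)

  # Set a checkpoint: derive a new hash from the parent hash plus this
  # checkpoint's tags, and measure the latency since the parent checkpoint.
  def self.checkpoint(parent, service:, topic:, direction:, now: Time.now.to_f)
    tags = ["direction:#{direction}", "service:#{service}", "topic:#{topic}"].sort
    parent_hash = parent ? parent.pathway_hash : 0
    digest = Digest::SHA256.hexdigest("#{parent_hash}|#{tags.join(',')}")
    new_hash = digest[0, 16].to_i(16)
    pathway_start = parent ? parent.pathway_start : now
    edge_latency = parent ? now - parent.edge_start : 0.0
    [Context.new(new_hash, pathway_start, now), edge_latency]
  end

  # Producer side: write the checkpoint context into the message headers.
  def self.inject(context, headers)
    headers[HEADER_KEY] = [context.pathway_hash, context.pathway_start, context.edge_start].join(';')
    headers
  end

  # Consumer side: read the context back, or nil if the message carries none.
  def self.extract(headers)
    raw = headers[HEADER_KEY]
    return nil unless raw

    pathway_hash, pathway_start, edge_start = raw.split(';')
    Context.new(pathway_hash.to_i, pathway_start.to_f, edge_start.to_f)
  end
end

# Usage: a producer in one service, a consumer in another.
produced, = PathwaySketch.checkpoint(nil, service: 'orders', topic: 'payments', direction: 'out', now: 100.0)
headers = PathwaySketch.inject(produced, {})
parent = PathwaySketch.extract(headers)
consumed, latency = PathwaySketch.checkpoint(parent, service: 'billing', topic: 'payments', direction: 'in', now: 100.25)
```

Because every message carries its context, the consumer can compute a latency for each message rather than for a sampled subset.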

Motivation:
There have been many requests for DSM in Ruby over the years, and there is an issue on the repo to add it.

Change log entry
Yes. DataStreams: Adds Data Streams Monitoring capability, instrumenting ruby-kafka and karafka consumers with manual instrumentation available

Additional Notes:

How to test the change?
The following branch is being tested on staging with our test apps. You can see them here

They are defined here:
https://github.com/DataDog/data-streams-dev/pull/252

@github-actions github-actions bot added core Involves Datadog core libraries integrations Involves tracing integrations tracing labels Sep 22, 2025
@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch 2 times, most recently from 2b9b824 to cb04944 Compare September 29, 2025 14:36
@datadog-official

datadog-official bot commented Sep 29, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage
Patch Coverage: 97.48%
Total Coverage: 98.58% (+0.03%)


This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 728b9c7

@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch 3 times, most recently from 2738b51 to 29cf40f Compare September 29, 2025 20:33
@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch from 3a25253 to 072e090 Compare October 14, 2025 12:25
@github-actions

github-actions bot commented Oct 14, 2025

Typing analysis

Ignored files

This PR introduces 1 ignored file. It increases the percentage of typed files from 35.94% to 36.87% (+0.93%).

Ignored files (+1-0)Introduced:
lib/datadog/data_streams/configuration/settings.rb

Note: Ignored files are excluded from the next sections.

Untyped methods

This PR introduces 7 untyped methods and 8 partially typed methods. It increases the percentage of typed methods from 51.66% to 53.09% (+1.43%).

Untyped methods (+7-0)Introduced:
sig/datadog/data_streams.rbs:19
└── def self.components: () -> untyped
sig/datadog/tracing/contrib/kafka/instrumentation/consumer.rbs:7
└── def self.included: (untyped base) -> untyped
sig/datadog/tracing/contrib/kafka/instrumentation/consumer.rbs:9
└── def each_message: (**untyped kwargs) { (untyped) -> untyped } -> untyped
sig/datadog/tracing/contrib/kafka/instrumentation/consumer.rbs:10
└── def each_batch: (**untyped kwargs) { (untyped) -> untyped } -> untyped
sig/datadog/tracing/contrib/kafka/instrumentation/producer.rbs:7
└── def self.included: (untyped base) -> untyped
sig/datadog/tracing/contrib/kafka/instrumentation/producer.rbs:9
└── def deliver_messages: (**untyped kwargs) -> untyped
sig/datadog/tracing/contrib/kafka/instrumentation/producer.rbs:10
└── def send_messages: (untyped messages, **untyped kwargs) -> untyped
Partially typed methods (+8-0)Introduced:
sig/datadog/data_streams/processor.rbs:52
└── def process_kafka_produce_event: (::Hash[::Symbol, untyped] event) -> void
sig/datadog/data_streams/processor.rbs:53
└── def process_kafka_consume_event: (::Hash[::Symbol, untyped] event) -> void
sig/datadog/data_streams/processor.rbs:54
└── def process_checkpoint_event: (::Hash[::Symbol, untyped] event) -> void
sig/datadog/data_streams/processor.rbs:67
└── def send_stats_to_agent: (::Hash[::String, untyped] payload) -> void
sig/datadog/data_streams/processor.rbs:68
└── def send_dsm_payload: (::String data, ::Hash[::String, ::String] headers) -> untyped
sig/datadog/data_streams/processor.rbs:71
└── def serialize_buckets: () -> ::Array[::Hash[::String, untyped]]
sig/datadog/data_streams/processor.rbs:72
└── def serialize_consumer_backlogs: () -> ::Array[::Hash[::String, untyped]]
sig/datadog/data_streams/transport/stats.rbs:36
└── def send_stats: (::Hash[::String, untyped] payload) -> (Core::Transport::HTTP::Response | Core::Transport::InternalErrorResponse)

If you believe a method or an attribute is rightfully untyped or partially typed, you can add # untyped:accept to the end of the line to remove it from the stats.

@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch from 37705c0 to 7ab70ad Compare October 14, 2025 12:28
@pr-commenter

pr-commenter bot commented Oct 14, 2025

Benchmarks

Benchmark execution time: 2025-11-03 18:46:49

Comparing candidate commit 2ae6bf8 in PR branch eric.firth/dsm-ruby with baseline commit 0808070 in branch master.

Found 2 performance improvements and 1 performance regression! Performance is the same for 41 metrics, 2 unstable metrics.

scenario:profiling - Allocations (baseline)

  • 🟩 throughput [+272517.572op/s; +286292.951op/s] or [+5.450%; +5.726%]

scenario:tracing - Propagation - Datadog

  • 🟩 throughput [+2702.065op/s; +2773.593op/s] or [+9.407%; +9.656%]

scenario:tracing - Propagation - Trace Context

  • 🟥 throughput [-3545.328op/s; -3447.408op/s] or [-9.502%; -9.240%]

@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch 3 times, most recently from 77dfd10 to 8242233 Compare October 17, 2025 12:48
@github-actions

github-actions bot commented Oct 17, 2025

Thank you for updating Change log entry section 👏

Visited at: 2025-10-23 12:59:15 UTC

@ericfirth ericfirth marked this pull request as ready for review October 17, 2025 13:31
@ericfirth ericfirth requested review from a team as code owners October 17, 2025 13:31
@ericfirth ericfirth requested a review from vpellan October 17, 2025 13:31
@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch from 0d9c1ce to b1f669a Compare October 17, 2025 17:12
@ericfirth ericfirth marked this pull request as draft October 17, 2025 18:24
@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch from efd426f to 988cd35 Compare October 17, 2025 20:39
@ericfirth ericfirth marked this pull request as ready for review October 20, 2025 13:47
@ericfirth
Contributor Author

Removed a ton of worthless comments and de-namespaced it from tracing. It is ready for re-review @y9v

@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch from 0de7f3c to 5abf907 Compare October 21, 2025 13:29
- Move Data Streams configuration to use self.extended hook pattern
- Add Extensions module to activate Data Streams configuration
- Move ENV_ENABLED constant to lib/datadog/data_streams/ext.rb
- Add comprehensive RBS type signatures for Data Streams modules
- Fix type errors: add missing timestamp field to consumer stats
- Improve type coverage to 88-100% across data_streams files
- Add configuration settings tests

This addresses review comments to align with the existing AppSec
configuration pattern and improve type safety.
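
For readers unfamiliar with the `self.extended` hook pattern, here is a minimal sketch of how it can attach product settings to a core settings class. All names here (`CoreSettingsSketch`, `DataStreamsSettingsSketch`, `ExtensionsSketch`, and the `SKETCH_DSM_ENABLED` environment variable) are hypothetical stand-ins, not the classes or constants in this PR.

```ruby
# Hypothetical stand-in for the core settings class.
class CoreSettingsSketch
end

# Hypothetical settings module. Ruby calls `extended` when an object
# (here, the settings class) extends this module.
module DataStreamsSettingsSketch
  ENV_ENABLED = 'SKETCH_DSM_ENABLED' # hypothetical variable name

  def self.extended(base)
    base.class_eval do
      attr_writer :data_streams_enabled

      # Fall back to the environment when no explicit value was set.
      def data_streams_enabled
        return @data_streams_enabled unless @data_streams_enabled.nil?

        ENV[DataStreamsSettingsSketch::ENV_ENABLED] == 'true'
      end
    end
  end
end

# Hypothetical activation point, called once when the product is loaded.
module ExtensionsSketch
  def self.activate!
    CoreSettingsSketch.extend(DataStreamsSettingsSketch)
  end
end

ExtensionsSketch.activate!
```

The point of the hook is that the core settings class needs no knowledge of the product: loading the extension is what adds the option.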
Update set_produce_checkpoint and set_consume_checkpoint methods to accept
tags as Hash instead of Array<String>, aligning with the common pattern used
throughout the Datadog library.

Tags are now passed as { key: 'value' } and internally converted to the
'key:value' string format used by the DSM pathway hash computation.

This change addresses @marcotc's review feedback.
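
As an illustration of the Hash to `'key:value'` conversion described above, a sketch under assumptions: the helper name `tags_hash_to_strings` is hypothetical, and sorting the result is an assumption made here so that the output is deterministic; the actual conversion in the library may differ.

```ruby
# Hypothetical helper: converts { key: 'value' } tags into 'key:value' strings.
# Sorting is an assumption made here so that equal tag sets produce equal output.
def tags_hash_to_strings(tags)
  tags.map { |key, value| "#{key}:#{value}" }.sort
end

tag_strings = tags_hash_to_strings({ type: 'kafka', topic: 'orders' })
```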
The processor method is an internal implementation detail and should not be
part of the public API. Users should interact with DataStreams through the
public methods (set_produce_checkpoint, set_consume_checkpoint, etc.).

- Move processor method to private section
- Update tests to use .send(:processor) for internal access
- Update RBS to mark processor as private

This addresses @marcotc's review feedback.
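
A small sketch of what this change means for callers and tests. `DataStreamsFacadeSketch` and `record_checkpoint` are hypothetical names used only to show the visibility rules; they are not the module or methods in this PR.

```ruby
# Hypothetical facade: public methods delegate to a private processor.
class DataStreamsFacadeSketch
  def initialize
    @recorded = []
  end

  # Public entry point (hypothetical name and signature).
  def record_checkpoint(direction, tags = {})
    processor.call(direction, tags)
  end

  private

  # Internal detail: not callable from outside without `send`.
  def processor
    @processor ||= ->(direction, tags) { (@recorded << [direction, tags]).size }
  end
end

facade = DataStreamsFacadeSketch.new
facade.record_checkpoint(:out, { topic: 'orders' })

# Tests that need the internals can still reach them explicitly.
internal = facade.send(:processor)
```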
Add comprehensive documentation to encode_var_int_64 and decode_varint
methods explaining that they implement unsigned LEB128 (Little Endian Base
128) encoding as specified in DWARF5 standard section 7.6.

This addresses @marcotc's review feedback.
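
For reference, unsigned LEB128 stores an integer seven bits at a time, least significant group first, and sets the high bit on every byte except the last. The sketch below is a standalone illustration of that encoding; `encode_uleb128` and `decode_uleb128` are hypothetical names, not the `encode_var_int_64` and `decode_varint` methods from this PR.

```ruby
# Encode a non-negative Integer as unsigned LEB128 (returns a binary String).
def encode_uleb128(value)
  raise ArgumentError, 'value must be non-negative' if value.negative?

  bytes = []
  loop do
    byte = value & 0x7F # take the low seven bits
    value >>= 7
    if value.zero?
      bytes << byte # last byte: high bit clear
      break
    end
    bytes << (byte | 0x80) # more bytes follow: high bit set
  end
  bytes.pack('C*')
end

# Decode unsigned LEB128 from the start of a String.
# Returns [value, number_of_bytes_consumed].
def decode_uleb128(data)
  result = 0
  shift = 0
  data.each_byte.with_index do |byte, index|
    result |= (byte & 0x7F) << shift
    return [result, index + 1] if (byte & 0x80).zero?

    shift += 7
  end
  raise ArgumentError, 'truncated LEB128 input'
end

encoded = encode_uleb128(624_485) # bytes: 0xE5 0x8E 0x26
decoded, consumed = decode_uleb128(encoded)
```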
…ta structures

These methods were writing to @produce_offsets and @commit_offsets instance
variables, but serialize_buckets was reading from bucket[:latest_produce_offsets]
and bucket[:latest_commit_offsets]. This meant offset data would never be sent
to the agent.

Fixed by writing directly to the bucket structure like track_kafka_consume does.

Related to @marcotc's review feedback about unused methods.
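
The bug is easier to see in a reduced form. The sketch below is a hypothetical model, not the processor from this PR: `OffsetBucketSketch`, its method names, and the serialized keys are made up, and only the bucket key `:latest_produce_offsets` comes from the commit message.

```ruby
# Hypothetical reduction of the write/read mismatch described above.
class OffsetBucketSketch
  def initialize
    @buckets = Hash.new { |hash, start| hash[start] = { latest_produce_offsets: {} } }
  end

  # Buggy variant: writes to an instance variable that serialization never reads.
  def track_produce_buggy(topic, partition, offset)
    (@produce_offsets ||= {})[[topic, partition]] = offset
  end

  # Fixed variant: writes into the bucket, keeping the highest offset seen.
  def track_produce(bucket_start, topic, partition, offset)
    offsets = @buckets[bucket_start][:latest_produce_offsets]
    key = [topic, partition]
    offsets[key] = [offset, offsets.fetch(key, offset)].max
  end

  # Serialization only looks at the bucket structure.
  def serialize_buckets
    @buckets.map do |start, bucket|
      produce_offsets = bucket[:latest_produce_offsets].map { |(topic, partition), offset|
        { 'topic' => topic, 'partition' => partition, 'offset' => offset }
      }
      { 'start' => start, 'produce_offsets' => produce_offsets }
    end
  end
end
```

With the buggy variant, `serialize_buckets` returns nothing for the tracked offset, which matches the symptom of offset data never reaching the agent.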
@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch from 8c47ea9 to 6d9d801 Compare October 30, 2025 18:15
@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch from 6d9d801 to 182ce81 Compare October 30, 2025 18:26
Contributor

@vpellan vpellan left a comment

LGTM for the tracing part! (non-blocking comment)

- Fix hook from 'included' to 'prepended' in Producer/Consumer instrumentation
- Add missing env.verb in Stats::API::Endpoint#call
- Update tests to use prepend instead of include
- Fix async buffer test to call process_events
- Remove track_kafka_commit tests (method was removed)
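
The difference between `include` and `prepend` matters for this kind of wrapping: an included module sits behind the class in the method lookup chain, so the class's own method wins and the wrapper never runs, while a prepended module sits in front and can call `super`. A minimal sketch, using hypothetical classes rather than the real ruby-kafka producer:

```ruby
# Hypothetical instrumentation module that wraps deliver_messages.
module InstrumentationSketch
  def self.events
    @events ||= []
  end

  # Called by Ruby when a class prepends this module.
  def self.prepended(base)
    events << [:prepended, base]
  end

  def deliver_messages(**kwargs)
    InstrumentationSketch.events << :before_deliver
    super # hand over to the original implementation
  end
end

# Hypothetical producers standing in for the instrumented library class.
class IncludedProducerSketch
  include InstrumentationSketch

  def deliver_messages(**_kwargs)
    :delivered
  end
end

class PrependedProducerSketch
  prepend InstrumentationSketch

  def deliver_messages(**_kwargs)
    :delivered
  end
end

IncludedProducerSketch.new.deliver_messages   # wrapper is skipped
included_events = InstrumentationSketch.events.count(:before_deliver)

PrependedProducerSketch.new.deliver_messages  # wrapper runs, then the original
prepended_events = InstrumentationSketch.events.count(:before_deliver)
```

Note also that the `included` hook is not called for `prepend`, which is why the hook name has to change along with the way the module is mixed in.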
@ericfirth ericfirth force-pushed the eric.firth/dsm-ruby branch from fad63ac to 728b9c7 Compare November 4, 2025 16:49
@p-datadog
Copy link
Member

p-datadog commented Nov 5, 2025

I have a follow-up PR to improve ddsketch loading: #5008. I think it should stay a separate PR because I would like the guild to review it, since it is a bit of a conceptual change in how components from libdatadog are loaded and referenced.

As far as I can tell, this PR does not add any circular dependencies, so I think it's OK to merge in its present state.

@ericfirth ericfirth merged commit 4d6369e into master Nov 5, 2025
557 checks passed
@ericfirth ericfirth deleted the eric.firth/dsm-ruby branch November 5, 2025 16:14
@github-actions github-actions bot added this to the 2.23.0 milestone Nov 5, 2025

Labels

core Involves Datadog core libraries integrations Involves tracing integrations tracing

7 participants