Contributor

@lym953 lym953 commented Dec 9, 2025

Problem

A customer reported that their Lambda is behind a proxy and that the Rust-based extension can't send traces to Datadog through the proxy, while the previous Go-based extension could.

This PR

Supports the env var DD_TLS_CERT_FILE: the path to a file of concatenated CA certificates in PEM format.
Example: DD_TLS_CERT_FILE=/opt/ca-cert.pem. When the extension flushes traces/stats to Datadog, the HTTP client it creates loads this certificate and can connect through the proxy properly.
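
To illustrate what "a file of concatenated CA certificates in PEM format" means, here is a minimal std-only sketch that splits such a bundle into its individual certificate blocks. It is not the code in this PR (the extension presumably hands the file to its TLS library); the function name `split_pem_bundle` is hypothetical, and the sketch only recognizes `CERTIFICATE` blocks without validating the base64 body.

```rust
/// Splits a bundle of concatenated PEM certificates into individual blocks.
/// Each returned string spans one BEGIN/END CERTIFICATE pair, inclusive.
/// Hypothetical helper for illustration; it does not decode or validate certs.
fn split_pem_bundle(bundle: &str) -> Vec<String> {
    const BEGIN: &str = "-----BEGIN CERTIFICATE-----";
    const END: &str = "-----END CERTIFICATE-----";

    let mut certs = Vec::new();
    let mut current: Option<Vec<&str>> = None;

    for line in bundle.lines() {
        let line = line.trim();
        if line == BEGIN {
            // Start a new block; an unterminated earlier block is discarded.
            current = Some(vec![line]);
        } else if line == END {
            // Close the block only if a BEGIN line opened it.
            if let Some(mut block) = current.take() {
                block.push(line);
                certs.push(block.join("\n"));
            }
        } else if let Some(block) = current.as_mut() {
            // Base64 body line inside a block; blank lines are skipped.
            if !line.is_empty() {
                block.push(line);
            }
        }
        // Text outside any block (comments, blank lines) is ignored.
    }
    certs
}
```

Reading the file named by DD_TLS_CERT_FILE with `std::fs::read_to_string` and passing the contents to this function would return one entry per CA certificate in the bundle.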

Testing

Steps

  1. Create a Lambda in a VPC with an NGINX proxy.
  2. Add a layer to the Lambda, which includes the CA certificate ca-cert.pem
  3. Set env vars:
    • DD_TLS_CERT_FILE=/opt/ca-cert.pem
    • DD_PROXY_HTTPS=http://10.0.0.30:3128, where 10.0.0.30 is the private IP of the proxy EC2 instance
    • DD_LOG_LEVEL=debug
  4. Update routing rules of security groups so the Lambda can reach http://10.0.0.30:3128
  5. Invoke the Lambda

Result

Before
Trace flush failed with error logs:

DD_EXTENSION | ERROR | Max retries exceeded, returning request error error=Network error: client error (Connect) attempts=1
DD_EXTENSION | ERROR | TRACES | Request failed: No requests sent

After
Trace flush is successful:

DD_EXTENSION | DEBUG | TRACES | Flushing 1 traces
DD_EXTENSION | DEBUG | TRACES | Added root certificate from /opt/ca-cert.pem
DD_EXTENSION | DEBUG | TRACES | Proxy connector created with proxy: Some("http://10.0.0.30:3128")
DD_EXTENSION | DEBUG | Sending with retry url=https://trace.agent.datadoghq.com/api/v0.2/traces payload_size=1120 max_retries=1
DD_EXTENSION | DEBUG | Received response status=202 Accepted attempt=1
DD_EXTENSION | DEBUG | Request succeeded status=202 Accepted attempts=1
DD_EXTENSION | DEBUG | TRACES | Flushing took 1609 ms

Notes

This fix only covers the trace flusher and the stats flusher, which use ServerlessTraceFlusher::get_http_client() to create the HTTP client. It doesn't cover the logs flusher and the proxy flusher, which use a different function (http.rs:get_client()) to create the HTTP client. However, logs flushing succeeded in my tests even though no certificate was added. We can come back to the logs/proxy flushers if someone reports an error.

@lym953 lym953 marked this pull request as ready for review December 9, 2025 22:52
@lym953 lym953 requested a review from a team as a code owner December 9, 2025 22:52
#[serde(deserialize_with = "deserialize_optional_string")]
pub http_protocol: Option<String>,
#[serde(deserialize_with = "deserialize_optional_string")]
pub ssl_ca_cert: Option<String>,
Contributor

Where does the DD_SSL_CA_CERT environment variable come from?

Contributor Author

.merge(Env::prefixed("DD_"));

It's handled by this code, which puts the value of the env var DD_SSL_CA_CERT into the field ssl_ca_cert. Is this what you are asking?
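
As a rough illustration of that mapping, the sketch below strips a prefix from an environment variable name and lowercases the rest to get the config field key. This is an assumption about how a prefixed env provider behaves by default, not the library's actual implementation; the helper name `field_key_for_env_var` is hypothetical, and unlike a real provider it matches the prefix case-sensitively.

```rust
/// Maps an environment variable name to a config field key by stripping
/// `prefix` and lowercasing the remainder. Returns None when the name does
/// not start with the prefix or nothing follows it.
/// Hypothetical helper: a simplified model of a prefixed env provider.
fn field_key_for_env_var(name: &str, prefix: &str) -> Option<String> {
    // `strip_prefix` yields None when `name` does not begin with `prefix`.
    let rest = name.strip_prefix(prefix)?;
    if rest.is_empty() {
        return None;
    }
    Some(rest.to_lowercase())
}
```

Under that assumption, `field_key_for_env_var("DD_SSL_CA_CERT", "DD_")` yields the key `ssl_ca_cert`, which matches the struct field, while a variable without the `DD_` prefix is ignored.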

Contributor

No, I'm asking where the DD_SSL_CA_CERT naming convention comes from. Does it come from the Datadog Agent config? Does it come from the docs? Did you come up with it?

Contributor Author

Good question. I came up with it. After searching further, I found tls_ca_cert here:
https://github.com/DataDog/integrations-core/blob/master/http_check/datadog_checks/http_check/data/conf.yaml.example#L477
so I'll rename the env var to DD_TLS_CA_CERT. Does this sound good to you?

Contributor

Yeah, although I'd confirm with the Agent team!

Contributor Author

Sure, will do!

Env var name LGTM from the perspective of agent-configuration team!

Contributor Author

Discussed offline. Changing to DD_TLS_CERT_FILE to be consistent with:
https://github.com/DataDog/datadog-agent/blob/0638dfce1e1f3a9ae336334d4df01cb2a5e35120/pkg/config/setup/config.go#L1410

The same config option has different names in different places. This PR just picks one of them.

@lym953 lym953 changed the title [SVLS-7934] feat: Support SSL certificate for trace/stats flusher [SVLS-7934] feat: Support CLS certificate for trace/stats flusher Dec 10, 2025
@lym953 lym953 changed the title [SVLS-7934] feat: Support CLS certificate for trace/stats flusher [SVLS-7934] feat: Support TLS certificate for trace/stats flusher Dec 10, 2025
Contributor

@duncanista duncanista left a comment

Great work!

One last thing: do the logs/metrics/proxy clients need this change too?

Contributor Author

lym953 commented Dec 12, 2025

One last thing: do the logs/metrics/proxy clients need this change too?

@duncanista Maybe, but I didn't see any errors with the logs/metrics clients in my tests. If this or a future customer reports such an error, we can reproduce it and then fix it in a separate PR.

@lym953 lym953 merged commit bae97ec into main Dec 15, 2025
47 checks passed
@lym953 lym953 deleted the yiming.luo/ssl-2 branch December 15, 2025 16:27
