fingerprint: Add retry and failure config to env fingerprinters. #27161

jrasell · 2025-11-26T14:29:42Z

This change introduces new optional client fingerprinter configuration fields which can be used to control how the env fingerprinters perform retries and whether errors should halt the agent startup.

The retry wrapper is used by the env_aws, env_azure, env_gce, and env_digitalocean fingerprinters and handles the retry and error logic around each of them. The change is backwards compatible: running it without any of the new config options results in the same behaviour as before.

- retry_interval: Specifies the time to wait between fingerprint attempts. This will default to 2 seconds.
- retry_attempts: Specifies the maximum number of fingerprint retries to be made. This will default to 0 and can be set to -1 if the operator wants infinite retries.
- exit_on_failure: Determines how the agent handles failure in performing the fingerprint.

The change helps alleviate problems in cloud providers where a machine starts before the metadata service and its endpoint are available. In this situation, Nomad times out the fingerprinter quickly and marks it as skipped, thus assuming it is not running within that environment. Operators can use the new configuration options to handle these race conditions and wait for the metadata service to become available and respond.

Links

Jira: https://hashicorp.atlassian.net/browse/NMD-1061

Contributor Checklist

- Changelog Entry: If this PR changes user-facing behavior, please generate and add a changelog entry using the make cl command.
- Testing: Please add tests to cover any new functionality or to demonstrate bug fixes and ensure regressions will be caught.
- Documentation: If the change impacts user-facing functionality such as the CLI, API, UI, and job configuration, please update the Nomad product documentation, which is stored in the web-unified-docs repo. Refer to the web-unified-docs contributor guide for docs guidelines. Please also consider whether the change requires notes within the upgrade guide. If you would like help with the docs, tag the nomad-docs team in this PR.

Reviewer Checklist

- Backport Labels: Please add the correct backport labels as described by the internal backporting document.
- Commit Type: Ensure the correct merge method is selected, which should be "squash and merge" in the majority of situations. The main exceptions are long-lived feature branches or merges where history should be preserved.
- Enterprise PRs: If this is an enterprise only PR, please add any required changelog entry within the public repository.

pkazmierczak

LGTM! Left some minor typo-related comments, but nothing blocking.

client/config/fingerprint.go

client/fingerprint/env_aws.go

command/agent/agent.go

Co-authored-by: Piotr Kazmierczak <[email protected]>

tgross · 2025-12-03T16:28:34Z

client/config/fingerprint.go

+
+// Fingerprint is an optional configuration block for environment fingerprinters
+// that can control retry behavior and failure handling.
+type Fingerprint struct {


In non-cloud environments we'll typically see something like the following to disable all the cloud fingerprinters. This uses the deprecated options syntax. Should we add an enabled flag to this struct to implement the same behavior?

client {
  options = {
    "fingerprint.denylist" = "env_aws,env_gce,env_azure,env_digitalocean"
  }
}

tgross · 2025-12-03T16:32:43Z

client/fingerprint/fingerprint_retry.go

+// Fingerprint executes the underlying fingerprinter with retry logic based
+// on the client configuration and implements the Fingerprinter interface.
+//
+// If the fingerprinter fails after all retry attempts, the error from the last
+// attempt is returned, unless the configuration indicates that failures should
+// be skipped for this fingerprinter and the error is of the type that indicates
+// an initial probe failure.
+func (rw *RetryWrapper) Fingerprint(req *FingerprintRequest, resp *FingerprintResponse) error {


I'm not sure I understand how we're differentiating between failures we need to retry (and block initial fingerprinting for) and failures because that's not the environment we're in. Does this end up blocking the first fingerprint?

jrasell self-assigned this Nov 26, 2025

vercel bot deployed to Preview – nomad-ui November 26, 2025 14:29 View deployment

jrasell force-pushed the f-NMD-1061 branch from dbee56d to f645969 Compare November 26, 2025 14:35

vercel bot deployed to Preview – nomad-ui November 26, 2025 14:36 View deployment

jrasell force-pushed the f-NMD-1061 branch from f645969 to f993c71 Compare November 26, 2025 15:15

vercel bot deployed to Preview – nomad-ui November 26, 2025 15:16 View deployment

jrasell force-pushed the f-NMD-1061 branch from f993c71 to cf44232 Compare November 26, 2025 15:34

vercel bot deployed to Preview – nomad-ui November 26, 2025 15:35 View deployment

jrasell mentioned this pull request Nov 27, 2025

nomad: Add client documentation for new fingerprint block. hashicorp/web-unified-docs#1399

Open

15 tasks

jrasell added the backport/1.11.x backport to 1.11.x release line label Nov 27, 2025

changelog: Add entry for #27161

806dd3c

vercel bot deployed to Preview – nomad-ui November 27, 2025 09:06 View deployment

jrasell requested review from pkazmierczak, tehut and tgross November 27, 2025 09:06

jrasell marked this pull request as ready for review November 27, 2025 09:07

jrasell requested review from a team as code owners November 27, 2025 09:07

pkazmierczak previously approved these changes Nov 28, 2025

client/config/fingerprint.go (outdated, resolved)

client/fingerprint/env_aws.go (outdated, resolved)

command/agent/agent.go (outdated, resolved)

Apply suggestions from code review

4f30534

Co-authored-by: Piotr Kazmierczak <[email protected]>

jrasell dismissed pkazmierczak’s stale review via 4f30534 December 1, 2025 09:00

vercel bot deployed to Preview – nomad-ui December 1, 2025 09:01 View deployment

jrasell requested a review from pkazmierczak December 1, 2025 12:48

tgross reviewed Dec 3, 2025
