Skip to content

Conversation

@tgross
Copy link
Member

@tgross tgross commented Dec 1, 2025

In #12583 we changed the serialization code for CSI volumes so that we were using the extension method we use for topology and nodes. This reduces an enormous amount of boilerplate code, but we introduced a state store corruption bug in the process. The extension method sanitizes the volume without copying it, so a read of the volume (such as getting an event from the event stream) can cause the volume's secrets to be redacted in subsequent requests to publish or mount the volume.

Move the sanitization code into a testable method on the volume, and add a copy to the method.

Ref: #12583
Fixes: #26766

Testing & Reproduction steps

In addition to the unit test here, you can reproduce this behavior with the hostpath plugin demo. Comment out the steps to claim the volume. Run the run.sh script and examine the allocation logs for the plugin. You'll see the unredacted secrets there. Then read from the event stream via nomad operator api "/v1/event/stream?topic=CSIVolume". Uncomment the steps to claim the volume and run the workload. Examine the allocation logs for the plugin again and you'll see the secret is being redacted. With this patch it works as expected.

Contributor Checklist

  • Changelog Entry If this PR changes user-facing behavior, please generate and add a
    changelog entry using the make cl command.
  • Testing Please add tests to cover any new functionality or to demonstrate bug fixes and
    ensure regressions will be caught.
  • Documentation If the change impacts user-facing functionality such as the CLI, API, UI,
    and job configuration, please update the Nomad product documentation, which is stored in the
    web-unified-docs repo. Refer to the web-unified-docs contributor guide for docs guidelines.
    Please also consider whether the change requires notes within the upgrade
    guide
    . If you would like help with the docs, tag the nomad-docs team in this PR.

Reviewer Checklist

  • Backport Labels Please add the correct backport labels as described by the internal
    backporting document.
  • Commit Type Ensure the correct merge method is selected which should be "squash and merge"
    in the majority of situations. The main exceptions are long-lived feature branches or merges where
    history should be preserved.
  • Enterprise PRs If this is an enterprise only PR, please add any required changelog entry
    within the public repository.
  • If a change needs to be reverted, we will roll out an update to the code within 7 days.

Changes to Security Controls

Are there any changes to security controls (access controls, encryption, logging) in this pull request? If so, explain.

In #12583 we changed the serialization code for CSI volumes so that we were
using the extension method we use for topology and nodes. This reduces an
enormous amount of boilerplate code, but we introduced a state store corruption
bug in the process. The extension method sanitizes the volume without copying
it, so a read of the volume (such as getting an event from the event stream) can
cause the volume's secrets to be redacted in subsequent requests to publish or
mount the volume.

Move the sanitization code into a testable method on the volume, and add a copy
to the method.

Ref: #12583
Fixes: #26766
@tgross tgross force-pushed the csi-copy-for-sanitize branch from 479167f to c68297d Compare December 1, 2025 15:28
@tgross tgross marked this pull request as ready for review December 1, 2025 15:29
@tgross tgross requested review from a team as code owners December 1, 2025 15:29
@tgross tgross requested review from gulducat and jrasell December 1, 2025 15:30
@tgross tgross added backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/ent/1.10.x+ent backport to 1.10.x+ent release line backport/1.11.x backport to 1.11.x release line labels Dec 1, 2025
@tgross tgross added this to the 1.11.x milestone Dec 1, 2025
Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tgross tgross merged commit f9ce228 into main Dec 2, 2025
55 checks passed
@tgross tgross deleted the csi-copy-for-sanitize branch December 2, 2025 16:58
tgross added a commit that referenced this pull request Dec 2, 2025
In #12583 we changed the serialization code for CSI volumes so that we were
using the extension method we use for topology and nodes. This reduces an
enormous amount of boilerplate code, but we introduced a state store corruption
bug in the process. The extension method sanitizes the volume without copying
it, so a read of the volume (such as getting an event from the event stream) can
cause the volume's secrets to be redacted in subsequent requests to publish or
mount the volume.

Move the sanitization code into a testable method on the volume, and add a copy
to the method.

Ref: #12583
Fixes: #26766
tgross added a commit that referenced this pull request Dec 2, 2025
…ruption (#27176) (#27183)

In #12583 we changed the serialization code for CSI volumes so that we were
using the extension method we use for topology and nodes. This reduces an
enormous amount of boilerplate code, but we introduced a state store corruption
bug in the process. The extension method sanitizes the volume without copying
it, so a read of the volume (such as getting an event from the event stream) can
cause the volume's secrets to be redacted in subsequent requests to publish or
mount the volume.

Move the sanitization code into a testable method on the volume, and add a copy
to the method.

Ref: #12583
Fixes: #26766

Co-authored-by: Tim Gross <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport/ent/1.8.x+ent Changes are backported to 1.8.x+ent backport/ent/1.10.x+ent backport to 1.10.x+ent release line backport/1.11.x backport to 1.11.x release line theme/storage type/bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CSI not sending secrets as part of NodeStageVolume call

2 participants