Conversation

@henrybear327
Contributor

@henrybear327 henrybear327 commented Sep 13, 2025

Based on the discovery from Marek #20234 #20221 (comment).

Please see the discussion starting from #20349 (comment)

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: henrybear327
Once this PR has been reviewed and has the lgtm label, please assign ivanvc for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@henrybear327 henrybear327 self-assigned this Sep 13, 2025
@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch 2 times, most recently from d3481ed to 3566e71 Compare September 13, 2025 23:34
@codecov

codecov bot commented Sep 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.05%. Comparing base (4601818) to head (7587cfc).

Additional details and impacted files

see 34 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #20663      +/-   ##
==========================================
- Coverage   69.14%   69.05%   -0.10%     
==========================================
  Files         420      417       -3     
  Lines       34817    34682     -135     
==========================================
- Hits        24074    23948     -126     
+ Misses       9338     9324      -14     
- Partials     1405     1410       +5     


@serathius
Member

The robustness tests flaked with main_test.go:175: rpc error: code = Canceled desc = grpc: the client connection is closing. Do we know why?

@serathius
Member

There is also a flake on TestRobustnessRegression/Issue15271. Maybe we should consider making the additional watch requests optional and disabling them for regression tests that don't need them.

@henrybear327
Contributor Author

The robustness tests flaked with main_test.go:175: rpc error: code = Canceled desc = grpc: the client connection is closing. Do we know why?

Not yet; I have only just started looking into it. I was actually puzzled when I saw that the watch validation passed on CI but the test still failed.

@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch 8 times, most recently from c7112ac to 8de83cd Compare September 15, 2025 20:41
@henrybear327
Contributor Author

The target is created! Reproduction is now consistent.

@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch 8 times, most recently from 06f3ee9 to 9e80f44 Compare September 15, 2025 21:44
@henrybear327
Contributor Author

/retest pull-etcd-robustness-amd64

@k8s-ci-robot

@henrybear327: The /retest command does not accept any targets.
The following commands are available to trigger required jobs:

/test pull-etcd-build
/test pull-etcd-contrib-mixin
/test pull-etcd-coverage-report
/test pull-etcd-e2e-386
/test pull-etcd-e2e-amd64
/test pull-etcd-e2e-arm64
/test pull-etcd-fuzzing-v3rpc
/test pull-etcd-govulncheck
/test pull-etcd-grpcproxy-e2e-amd64
/test pull-etcd-grpcproxy-e2e-arm64
/test pull-etcd-grpcproxy-integration-amd64
/test pull-etcd-grpcproxy-integration-arm64
/test pull-etcd-integration-1-cpu-amd64
/test pull-etcd-integration-1-cpu-arm64
/test pull-etcd-integration-2-cpu-amd64
/test pull-etcd-integration-2-cpu-arm64
/test pull-etcd-integration-4-cpu-amd64
/test pull-etcd-integration-4-cpu-arm64
/test pull-etcd-markdown-lint
/test pull-etcd-release-tests
/test pull-etcd-robustness-amd64
/test pull-etcd-robustness-arm64
/test pull-etcd-unit-test-386
/test pull-etcd-unit-test-amd64
/test pull-etcd-unit-test-arm64
/test pull-etcd-verify

Use /test all to run the following jobs that were automatically triggered:

pull-etcd-build
pull-etcd-contrib-mixin
pull-etcd-coverage-report
pull-etcd-e2e-386
pull-etcd-e2e-amd64
pull-etcd-e2e-arm64
pull-etcd-fuzzing-v3rpc
pull-etcd-govulncheck
pull-etcd-grpcproxy-e2e-amd64
pull-etcd-grpcproxy-e2e-arm64
pull-etcd-grpcproxy-integration-amd64
pull-etcd-grpcproxy-integration-arm64
pull-etcd-integration-1-cpu-amd64
pull-etcd-integration-1-cpu-arm64
pull-etcd-integration-2-cpu-amd64
pull-etcd-integration-2-cpu-arm64
pull-etcd-integration-4-cpu-amd64
pull-etcd-integration-4-cpu-arm64
pull-etcd-release-tests
pull-etcd-robustness-amd64
pull-etcd-robustness-arm64
pull-etcd-unit-test-386
pull-etcd-unit-test-amd64
pull-etcd-unit-test-arm64
pull-etcd-verify

In response to this:

/retest pull-etcd-robustness-amd64

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@henrybear327
Contributor Author

/test pull-etcd-robustness-amd64

@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch 3 times, most recently from c11049c to a0744cb Compare September 19, 2025 20:06
@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch from a0744cb to 70a3d98 Compare September 21, 2025 14:54
@henrybear327 henrybear327 changed the title Add robustness test regression target for 20573 Add robustness test regression target for 20693 Sep 21, 2025
@henrybear327
Contributor Author

@serathius I am renaming the target to 20693, as this is probably a better reference to the issue you discovered and fixed!

@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch 2 times, most recently from 8387c9e to fc5b2c7 Compare September 21, 2025 18:38
@henrybear327
Contributor Author

Merging this PR depends on #20689.

@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch from fc5b2c7 to adb5c97 Compare September 21, 2025 18:49
@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch from adb5c97 to 926f4c1 Compare September 21, 2025 19:09
@henrybear327
Contributor Author

/retest

@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch 2 times, most recently from 7a2a7b3 to e5cd591 Compare September 22, 2025 08:02

scenarios = append(scenarios, TestScenario{
	Name:    "issue20693",
	Profile: traffic.HighTrafficProfile.WithoutCompaction().WithBackgroundWatchConfigInterval(10 * time.Millisecond).WithBackgroundWatchConfigRevisionOffset(-10),
Member

This is not the best way to reproduce the issue. The current reproduction rate (once per 30 minutes) is too low. As the root cause is known (a watch request with revision -1), please consider adding a more targeted test to ensure higher reproducibility.
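
A rough sketch of why an offset-based profile may hit a negative watch revision only occasionally, while a targeted test would hit it every time. It assumes the offset is simply added to the latest revision the client has observed. That is an assumption, since the PR does not show how `WithBackgroundWatchConfigRevisionOffset` is applied, and `watchStartRevision` is a hypothetical helper.

```go
package main

import "fmt"

// watchStartRevision is a hypothetical helper. It ASSUMES the configured
// offset is added to the latest revision observed by the client.
func watchStartRevision(latest, offset int64) int64 {
	return latest + offset
}

func main() {
	// Offset-based: under the assumption above, the start revision is
	// negative only while fewer than 10 revisions have been observed.
	for _, latest := range []int64{1, 9, 10, 500} {
		rev := watchStartRevision(latest, -10)
		fmt.Printf("latest=%d start=%d negative=%t\n", latest, rev, rev < 0)
	}

	// Targeted: always request the revision named in the review comment.
	const targeted int64 = -1
	fmt.Println("targeted start:", targeted)
}
```

Under that assumption, the offset-based scenario depends on when the background watch fires relative to the write traffic, whereas a test that always requests revision -1 does not.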

Contributor Author

Got you.

I am still puzzled as to why I can reproduce the problem almost every time in under 2 minutes on my MacBook, but not elsewhere.

@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch from e5cd591 to 6782a10 Compare September 22, 2025 08:42
Comment on lines +301 to +307
e2e.WithClusterSize(3),
e2e.WithCompactionBatchLimit(10),
e2e.WithSnapshotCount(50),
e2e.WithSnapshotCatchUpEntries(100),
e2e.WithGoFailEnabled(true),
e2e.WithPeerProxy(true),
e2e.WithIsPeerTLS(true),
Member

With the current root cause known, we can just disable most of those config changes.

Reference:
- etcd-io#20693

Signed-off-by: Chun-Hung Tseng <[email protected]>
@henrybear327 henrybear327 force-pushed the robustness-test-reproduce-20573 branch from 6782a10 to 7587cfc Compare September 26, 2025 04:47
@k8s-ci-robot

@henrybear327: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-etcd-e2e-amd64 | 7587cfc | link | true | /test pull-etcd-e2e-amd64 |
| pull-etcd-e2e-386 | 7587cfc | link | true | /test pull-etcd-e2e-386 |
| pull-etcd-robustness-amd64 | 7587cfc | link | true | /test pull-etcd-robustness-amd64 |
| pull-etcd-robustness-arm64 | 7587cfc | link | true | /test pull-etcd-robustness-arm64 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
