Conversation

@weshayutin
Contributor

Why the changes were made

Too many test failures w/ mongo csi

How to test the changes made

@openshift-ci openshift-ci bot requested review from hhpatel14 and sseago October 29, 2025 20:28
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2025
kaovilai previously approved these changes Oct 29, 2025
Member

@kaovilai kaovilai left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2025
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 513180a and 2 for PR HEAD 4356982 in total

@kaovilai
Member

[FAIL] Backup and restore tests Backup and restore applications [It] Mongo application DATAMOVER
is also still failing, which I guess is also technically CSI.

@kaovilai
Member

/retest

@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2025
@openshift-ci

openshift-ci bot commented Oct 30, 2025

New changes are detected. LGTM label has been removed.

@kaovilai
Member

/retest

1 similar comment
@weshayutin
Contributor Author

/retest

@coderabbitai

coderabbitai bot commented Oct 31, 2025

Walkthrough

Three Mongo-related test entries are disabled via commenting and marked "DOWN FOR MAINTENANCE" across two test suite files. The modifications do not alter test logic or scaffolding; remaining tests continue unchanged.

Changes

Cohort / File(s) | Summary
Test suite Mongo test disabling: tests/e2e/backup_restore_cli_suite_test.go, tests/e2e/backup_restore_suite_test.go | Commented out three Mongo-related test cases (CSI, DATAMOVER, BlockDevice DATAMOVER) with "DOWN FOR MAINTENANCE" annotations, effectively disabling them from execution while preserving their definitions.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

  • Changes are straightforward test case disabling via comments with consistent annotations across both files
  • No logic modifications or structural changes to test framework
  • Pattern is homogeneous and repetitive across the affected files

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name | Status | Explanation | Resolution
Description Check | ⚠️ Warning | The pull request description is significantly incomplete against the required template. While the "Why the changes were made" section is present, it provides only a minimal one-line explanation ("Too many test failures w/ mongo csi") without context or detail. More critically, the "How to test the changes made" section is entirely empty—it is present as a header but contains no content whatsoever. The description fails to meet the "mostly complete" standard because one of the two required sections has no substantive information, making it largely incomplete. | The "How to test the changes made" section should be completed with clear instructions on how to verify the changes, such as running the test suite and confirming that the specified Mongo tests are skipped or commented out as intended. Additionally, expand the "Why the changes were made" section to provide more context, such as linking to related issues or test failure reports that motivated disabling these tests, and explain why these particular tests need maintenance rather than just stating that "too many test failures" exist.
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name | Status | Explanation
Title Check | ✅ Passed | The pull request title "Mongo application CSI DOWN FOR MAINTENANCE" is clearly related to the main changes in the pull request, which involve commenting out and marking Mongo-related test cases as down for maintenance. The title accurately reflects the primary action taken (disabling Mongo tests due to failures) and uses the "DOWN FOR MAINTENANCE" annotation that appears directly in the code changes. While the PR also disables Mongo DATAMOVER and BlockDevice DATAMOVER tests in addition to CSI tests, the root cause and context (CSI-related test failures) make the title appropriately focused on the core issue being addressed.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/e2e/backup_restore_suite_test.go (1)

411-422: Consider tracking this maintenance work with a GitHub issue.

The test is appropriately disabled with a clear marker. However, to ensure these tests are re-enabled once the underlying issues are resolved, consider:

  • Creating a GitHub issue to track the Mongo CSI test failures
  • Referencing the issue number in the comment (e.g., // DOWN FOR MAINTENANCE - See issue #XXXX)
  • Setting a timeline or milestone for re-enabling the tests

Do you want me to help draft a GitHub issue description to track the re-enablement of these Mongo-related tests?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 513180a and dbf1bec.

📒 Files selected for processing (2)
  • tests/e2e/backup_restore_cli_suite_test.go (3 hunks)
  • tests/e2e/backup_restore_suite_test.go (3 hunks)
🔇 Additional comments (5)
tests/e2e/backup_restore_suite_test.go (2)

445-456: Consistent with the maintenance approach for Mongo tests.

This DATAMOVER test is appropriately disabled alongside the CSI test. Ensure all disabled Mongo tests are tracked together in a single GitHub issue for easier management and re-enablement.


468-480: Third Mongo test appropriately disabled.

This BlockDevice DATAMOVER test completes the set of disabled Mongo tests. All three disabled tests (CSI, DATAMOVER, BlockDevice DATAMOVER) should be tracked in a single GitHub issue for coordinated re-enablement.

tests/e2e/backup_restore_cli_suite_test.go (3)

273-284: CLI tests consistently disabled with non-CLI tests.

This mirrors the Mongo CSI test disabled in backup_restore_suite_test.go. Ensure both the CLI and non-CLI Mongo tests are tracked together in the same GitHub issue for coordinated re-enablement.


307-318: DATAMOVER CLI test appropriately disabled.

This mirrors the DATAMOVER test disabled in the non-CLI suite. The consistency across both test files is good.


330-342: BlockDevice DATAMOVER CLI test appropriately disabled.

This completes the set of disabled Mongo CLI tests, maintaining consistency with the non-CLI test suite. All six disabled tests (3 non-CLI + 3 CLI) should be tracked together.

@kaovilai
Member

/retest

@mpryc
Contributor

mpryc commented Nov 5, 2025

@kaovilai
Member

kaovilai commented Nov 5, 2025

@mpryc for known failures we can use PEntry instead of Entry, which is an easier one-line change.

@kaovilai
Member

kaovilai commented Nov 5, 2025

Also just merged the CRD update; expect FSB backups to work now.

/retest

@kaovilai
Member

kaovilai commented Nov 5, 2025

/retest

ai-retester: The e2e-test-aws-e2e step failed because the MySQL application KOPIA test timed out, and other tests were skipped due to its failure. The specific error indicates the todolist container was unable to start, hanging in the PodInitializing state. This likely caused the liveness probes of the mysql container to fail continually, eventually marking the pod as NotReady. As a result, the AfterEach step's Eventually condition for the pods was likely never met, so that step also failed and the test suite and build failed.

The OADP e2e tests failed because the MySQL application KOPIA test timed out, and the test container e2e-test-aws-e2e exited with a non-zero code. It appears a pod was not ready during the test, specifically a todolist pod was in PodInitializing state.

The e2e test e2e-test-cli-aws failed because the Mongo application Native-Snapshots via CLI test timed out, and the todolist pod was stuck in PodInitializing state. This suggests a failure in the backup and restore process for MongoDB using native snapshots via the OADP CLI tool.

@kaovilai
Member

kaovilai commented Nov 5, 2025

/retest

ai-retester: The e2e tests failed because the "MySQL application KOPIA" test timed out after 540 seconds, and the mysql container repeatedly failed its liveness probe, causing restarts. The todolist container was also stuck in PodInitializing state. This prevented the e2e tests from completing successfully.

The e2e-test-aws-e2e step failed because the MySQL application KOPIA test timed out after 540 seconds, indicating a performance issue or an unhandled exception. The pod todolist-1-2wnll was also waiting to start. There were also warnings and skips involving the DeploymentConfig, the volume, the route, and the liveness probes.

@kaovilai
Member

kaovilai commented Nov 5, 2025

/retest

ai-retester: The e2e-test-aws-e2e step failed because the "MySQL application KOPIA" test timed out and the todolist container was waiting to start due to PodInitializing. This indicates an issue with the MySQL application's readiness during the restoration phase when using KOPIA for backup/restore.

The e2e-test-aws-e2e step failed because the "MySQL application KOPIA" test timed out and failed after the pod entered a ContainersNotReady state because a liveness probe failed on the mysql container. Specifically, something went wrong with pod mysql-656595cfcb-cbcb5/, causing its status to flip to False.

The e2e tests failed specifically during the "Mongo application Native-Snapshots via CLI" test, because the todolist pod never reached a successful state, and after many retries it eventually timed out waiting for the Pod. It exited with code 2.

@kaovilai
Member

kaovilai commented Nov 6, 2025

/retest

ai-retester: The e2e-test-aws step failed because the MySQL application KOPIA test timed out after 540 seconds, resulting in a "ContainersNotReady" state for containers running the MySQL database. The pod failed 2 separate tests.

The "MySQL application two Vol CSI" e2e test failed because it couldn't connect to the todolist service after the restore. The container test exited with code 2.

The e2e "Mongo application Native-Snapshots via CLI" test timed out and failed because the todolist pod never reached a successful state during the test.

@kaovilai
Member

kaovilai commented Nov 6, 2025

/retest

ai-retester: The e2e tests failed, specifically the "MySQL application two Vol CSI" test in the oadp-operator repository. The pod was in PodInitializing status, preventing the container from starting.

The e2e-test-cli-aws-e2e step failed because the Mongo application Native-Snapshots via CLI test timed out after 540 seconds, and a container "todolist" in pod "todolist-6d7bb9554c-zj468" is waiting to start: PodInitializing. This indicates a problem with the application deployment or dependencies within the test environment causing a timeout during the execution of the end-to-end test.

@kaovilai
Member

kaovilai commented Nov 6, 2025

/retest

ai-retester: The e2e tests failed. Specifically, the "MySQL application two Vol CSI" test failed because the restore process encountered an error when accessing the application's endpoint via the route and subsequently timed out trying to communicate with the service directly. The container exited with code 7 during this phase.

The e2e test e2e-test-cli-aws failed because the Mongo application Native-Snapshots via CLI test timed out after 540 seconds and because container "todolist" in pod "todolist-6d7bb9554c-grxws" is waiting to start: PodInitializing. This indicates a problem with the test application or the test environment during the e2e run.

@kaovilai
Member

kaovilai commented Nov 6, 2025

/retest

ai-retester: The e2e test failed because the "Mongo application Native-Snapshots" test timed out after 540 seconds while waiting for a Pod to succeed. The container "test" in pod "e2e-test-aws-e2e" also exited with an error code of 2 indicating a test failure.

The only test that hit a hard failure was the “MySQL application – two-volume CSI via CLI” test. After the backup succeeded and the restore was declared Completed, the restore job tried to bring up the MySQL/Todolist application in the mysql-persistent namespace. The pod that should have started the mysql-todolist container (todolist-1-f8kzm) was stuck in the “PodInitializing” state, and the test timed out after ~4 min.

A few clues from the logs explain why the pod never reached a running state:

  • Volume provisioning & snapshotting – The CSI snapshots were created and marked “Ready” by the AWS CSI provisioner, but the PVC for the applog volume was only provisioned after the pod launch. The pod had to wait for the data-source snapshot to be online.
    failed to provision volume with StorageClass "gp2-csi": error… snapshot … is not Ready
    
  • DeploymentConfig deprecation – The pod was created from a DeploymentConfig (which is still enabled in this cluster) and the logs printed warnings about its deprecation at every phase.
    apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
    
    The test framework continued, but the pod was never marked Ready.
  • Init-container timing – The init-myservice container waited until the MySQL service became reachable. It finished quickly (init-myservice exited cleanly) but the main todolist container had to wait for the application image to be pulled and for the applog volume to be attached.
    container "todolist" in pod "todolist-1-f8kzm" is waiting to start: PodInitializing
    
  • Timing out of the restore – Because the pod never transitioned to Ready, the after‑restore verification (curl to /todo) never finished, and the Ginkgo test timed out, flagging a failure.

In short, the test failed because the restore step created a pod that could not finish initialization: the CSI snapshots were not yet fully available, the old DeploymentConfig style was still used (causing additional delays), and the init‑container finished too late, so the main container never started before the test timed out. Fixing the volume claim handling or switching the pod deployment to a modern Deployment (or ensuring the CSI snapshots are ready before the restore starts) would eliminate the failure.

@kaovilai
Member

kaovilai commented Nov 6, 2025

/retest

ai-retester: The e2e tests failed because the "test" container in pod "e2e-test-cli-aws-e2e" exited with a non-zero exit code (2), indicating an error during the OADP e2e tests. More specifically, the test "MySQL application KOPIA via CLI" timed out after 540 seconds and the pod "todolist-1-md9jw" was stuck in PodInitializing state, which typically means the pod couldn't start correctly, likely due to underlying infrastructure or configuration issues.

The automated test e2e-test-aws failed because the Mongo application Native-Snapshots test case timed out after 540 seconds while waiting for a pod to succeed. The pod's test container exited with code 2, indicating a failure in the tests it was running.

@kaovilai
Member

kaovilai commented Nov 6, 2025

/retest

ai-retester: The e2e tests failed because the "MySQL application two Vol CSI" test failed. Specifically, the restore process failed in that test with a "PodInitializing" error and network connectivity issues.

The test e2e-test-aws failed because the Mongo application Native-Snapshots test case timed out after 540 seconds, and the subsequent cleanup also encountered errors. Specifically, the todolist container in the todolist-6d7bb9554c-wwlwc pod remained in a PodInitializing state, indicating a problem with the application deployment or environment setup.

@shubham-pampattiwar
Member

/retest

@openshift-ci

openshift-ci bot commented Nov 6, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai, shubham-pampattiwar, weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kaovilai
Member

kaovilai commented Nov 6, 2025

/retest

ai-retester: The e2e-test-aws test failed because the Mongo application Native-Snapshots test timed out while waiting for a pod to become ready, and the container test exited with an error.

The e2e-test-cli-aws-e2e step failed because the Mongo application Native-Snapshots via CLI test timed out. The pod todolist-6d7bb9554c-sfw8k failed as a result.

@kaovilai
Member

kaovilai commented Nov 6, 2025

/retest

ai-retester: The e2e test failed because the Mongo application Native-Snapshots via CLI test timed out after 540 seconds. The underlying issue was that a "todolist" container in a pod failed to start, remaining in the PodInitializing state.

The e2e test failed. Specifically, the "MySQL application two Vol CSI" test failed because the application endpoint was not accessible after the restore. The todolist container might not be starting properly. PodInitializing and Multi-Attach errors were also observed in the app namespace events.

@kaovilai
Member

kaovilai commented Nov 7, 2025

/retest

ai-retester: The "Mongo application Native-Snapshots" e2e test timed out and failed after 1 hour, 1 minute, and 11 seconds. The test pod got stuck in PodInitializing.

The e2e test Mongo application Native-Snapshots via CLI timed out after 540 seconds, causing the e2e-test-cli-aws test step to fail. The todolist pod did not become ready in the allotted time.

@openshift-ci

openshift-ci bot commented Nov 7, 2025

@weshayutin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/4.20-e2e-test-aws | dbf1bec | link | false | /test 4.20-e2e-test-aws
ci/prow/4.19-e2e-test-cli-aws | dbf1bec | link | true | /test 4.19-e2e-test-cli-aws
ci/prow/4.19-e2e-test-aws | dbf1bec | link | true | /test 4.19-e2e-test-aws

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@kaovilai
Member

kaovilai commented Nov 7, 2025

Let's merge #2013 #2014 instead.
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 7, 2025
@kaovilai
Member

kaovilai commented Nov 7, 2025

no need: #2013

@kaovilai kaovilai closed this Nov 7, 2025

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

5 participants