Mongo application CSI DOWN FOR MAINTENANCE #2002

weshayutin · 2025-10-29T20:27:20Z

Why the changes were made

Too many test failures w/ mongo csi

How to test the changes made

kaovilai

/lgtm

openshift-ci-robot · 2025-10-29T22:56:51Z

/retest-required

Remaining retests: 0 against base HEAD 513180a and 2 for PR HEAD 4356982 in total

kaovilai · 2025-10-30T01:12:09Z

[FAIL] Backup and restore tests Backup and restore applications [It] Mongo application DATAMOVER
is also still failing which I guess is also technically CSI

kaovilai · 2025-10-30T01:18:57Z

/retest

openshift-ci · 2025-10-30T14:10:27Z

New changes are detected. LGTM label has been removed.

kaovilai · 2025-10-30T16:24:30Z

/retest

weshayutin · 2025-10-31T14:58:11Z

/retest

coderabbitai · 2025-10-31T17:33:58Z

Walkthrough

Three Mongo-related test entries are disabled via commenting and marked "DOWN FOR MAINTENANCE" across two test suite files. The modifications do not alter test logic or scaffolding; remaining tests continue unchanged.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test suite Mongo test disabling: `tests/e2e/backup_restore_cli_suite_test.go`, `tests/e2e/backup_restore_suite_test.go` | Commented out three Mongo-related test cases (CSI, DATAMOVER, BlockDevice DATAMOVER) with "DOWN FOR MAINTENANCE" annotations, effectively disabling them from execution while preserving their definitions. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

- Changes are straightforward test case disabling via comments with consistent annotations across both files
- No logic modifications or structural changes to test framework
- Pattern is homogeneous and repetitive across the affected files

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The pull request description is significantly incomplete against the required template. While the "Why the changes were made" section is present, it provides only a minimal one-line explanation ("Too many test failures w/ mongo csi") without context or detail. More critically, the "How to test the changes made" section is entirely empty: it is present as a header but contains no content whatsoever. The description fails to meet the "mostly complete" standard because one of the two required sections has no substantive information, making it largely incomplete. | The "How to test the changes made" section should be completed with clear instructions on how to verify the changes, such as running the test suite and confirming that the specified Mongo tests are skipped or commented out as intended. Additionally, expand the "Why the changes were made" section to provide more context, such as linking to related issues or test failure reports that motivated disabling these tests, and explain why these particular tests need maintenance rather than just stating that "too many test failures" exist. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The pull request title "Mongo application CSI DOWN FOR MAINTENANCE" is clearly related to the main changes in the pull request, which involve commenting out and marking Mongo-related test cases as down for maintenance. The title accurately reflects the primary action taken (disabling Mongo tests due to failures) and uses the "DOWN FOR MAINTENANCE" annotation that appears directly in the code changes. While the PR also disables Mongo DATAMOVER and BlockDevice DATAMOVER tests in addition to CSI tests, the root cause and context (CSI-related test failures) make the title appropriately focused on the core issue being addressed. |

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/e2e/backup_restore_suite_test.go (1)

411-422: Consider tracking this maintenance work with a GitHub issue.

The test is appropriately disabled with a clear marker. However, to ensure these tests are re-enabled once the underlying issues are resolved, consider:

- Creating a GitHub issue to track the Mongo CSI test failures
- Referencing the issue number in the comment (e.g., `// DOWN FOR MAINTENANCE - See issue #XXXX`)
- Setting a timeline or milestone for re-enabling the tests

Do you want me to help draft a GitHub issue description to track the re-enablement of these Mongo-related tests?
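One way to keep the suggestion above enforceable is a small check that flags maintenance markers with no issue reference. The following is only a sketch under stated assumptions: the `untrackedMarkers` helper, the `#\d+` pattern, and the sample input (including the issue number `#1234`) are hypothetical and not part of the OADP test suite.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// maintenanceMarker is the annotation text used for the disabled tests.
const maintenanceMarker = "DOWN FOR MAINTENANCE"

// issueRef matches a GitHub-style issue reference such as "#1234".
// A placeholder like "#XXXX" deliberately does not match.
var issueRef = regexp.MustCompile(`#\d+`)

// untrackedMarkers returns the 1-based line numbers of lines that carry the
// maintenance marker but no issue reference on the same line.
func untrackedMarkers(src string) []int {
	var lines []int
	scanner := bufio.NewScanner(strings.NewReader(src))
	for n := 1; scanner.Scan(); n++ {
		line := scanner.Text()
		if strings.Contains(line, maintenanceMarker) && !issueRef.MatchString(line) {
			lines = append(lines, n)
		}
	}
	return lines
}

func main() {
	// Hypothetical excerpt; the real test files are not reproduced here.
	src := "// DOWN FOR MAINTENANCE\n" +
		"// Entry(\"Mongo application CSI\", ...)\n" +
		"// DOWN FOR MAINTENANCE - See issue #1234\n"
	fmt.Println(untrackedMarkers(src)) // prints [1]
}
```

Run against the two suite files, a check like this would list every disabled entry that still lacks a tracking issue.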

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 513180a and dbf1bec.

📒 Files selected for processing (2)

tests/e2e/backup_restore_cli_suite_test.go (3 hunks)
tests/e2e/backup_restore_suite_test.go (3 hunks)

🔇 Additional comments (5)

tests/e2e/backup_restore_suite_test.go (2)

445-456: Consistent with the maintenance approach for Mongo tests.

This DATAMOVER test is appropriately disabled alongside the CSI test. Ensure all disabled Mongo tests are tracked together in a single GitHub issue for easier management and re-enablement.

468-480: Third Mongo test appropriately disabled.

This BlockDevice DATAMOVER test completes the set of disabled Mongo tests. All three disabled tests (CSI, DATAMOVER, BlockDevice DATAMOVER) should be tracked in a single GitHub issue for coordinated re-enablement.

tests/e2e/backup_restore_cli_suite_test.go (3)

273-284: CLI tests consistently disabled with non-CLI tests.

This mirrors the Mongo CSI test disabled in backup_restore_suite_test.go. Ensure both the CLI and non-CLI Mongo tests are tracked together in the same GitHub issue for coordinated re-enablement.

307-318: DATAMOVER CLI test appropriately disabled.

This mirrors the DATAMOVER test disabled in the non-CLI suite. The consistency across both test files is good.

330-342: BlockDevice DATAMOVER CLI test appropriately disabled.

This completes the set of disabled Mongo CLI tests, maintaining consistency with the non-CLI test suite. All six disabled tests (3 non-CLI + 3 CLI) should be tracked together.

kaovilai · 2025-10-31T20:40:00Z

/retest

mpryc · 2025-11-05T10:52:13Z

@weshayutin wdyt ? https://github.com/openshift/oadp-operator/pull/2009/files#diff-fb6f4195a3b0f6bf8e0bdf1506a517b980bf8d5dc5c2fa767e5210277251bb4cR21

kaovilai · 2025-11-05T14:15:02Z

@mpryc for known failures we can use PEntry instead of Entry, which is an easier one-line change.

kaovilai · 2025-11-05T14:15:25Z

also just merged crd update, expect FSB backups to work now.

/retest

kaovilai · 2025-11-05T17:23:56Z

/retest

ai-retester: The e2e-test-aws-e2e step failed because the MySQL application KOPIA test timed out, and other tests were skipped due to its failure. The specific error indicates the todolist container was unable to start, hanging in the PodInitializing state. This likely caused the liveness probes of the mysql container to fail continually, eventually marking the pod as NotReady. As a result, the Eventually condition on the pods in the AfterEach step was likely never met, so that step also failed and the test suite and build failed.

The OADP e2e tests failed because the MySQL application KOPIA test timed out, and the test container e2e-test-aws-e2e exited with a non-zero code. It appears a pod was not ready during the test, specifically a todolist pod was in PodInitializing state.

The e2e test e2e-test-cli-aws failed because the Mongo application Native-Snapshots via CLI test timed out, and the todolist pod was stuck in PodInitializing state. This suggests a failure in the backup and restore process for MongoDB using native snapshots via the OADP CLI tool.

kaovilai · 2025-11-05T19:08:54Z

/retest

ai-retester: The e2e tests failed because the "MySQL application KOPIA" test timed out after 540 seconds, and the mysql container repeatedly failed its liveness probe, causing restarts. The todolist container was also stuck in PodInitializing state. This prevented the e2e tests from completing successfully.

The e2e-test-aws-e2e step failed because the MySQL application KOPIA test timed out after 540 seconds, indicating a performance issue or some unhandled exception. The pod todolist-1-2wnll was also waiting to start. There were also some warnings and skips involving the deploymentconfig, the volume, the route, and the liveness probes.

kaovilai · 2025-11-05T22:36:57Z

/retest

ai-retester: The e2e-test-aws-e2e step failed because the "MySQL application KOPIA" test timed out and the todolist container was waiting to start due to PodInitializing. This indicates an issue with the MySQL application's readiness during the restoration phase when using KOPIA for backup/restore.

The e2e-test-aws-e2e step failed because a "MySQL application KOPIA" test timed out and failed after the pod entered a ContainersNotReady state due to a liveness probe failing on the mysql container. Specifically, something went wrong with pod mysql-656595cfcb-cbcb5, causing its status to change to False.

The e2e tests failed specifically during the "Mongo application Native-Snapshots via CLI" test, because the todolist pod never reached a successful state, and after many retries it eventually timed out waiting for the Pod. It exited with code 2.

kaovilai · 2025-11-06T01:36:56Z

/retest

ai-retester: The e2e-test-aws step failed because the MySQL application KOPIA test timed out after 540 seconds, resulting in a "ContainersNotReady" state for containers running the MySQL database. The pod failed 2 separate tests.

The "MySQL application two Vol CSI" e2e test failed because it couldn't connect to the todolist service after the restore. The container test exited with code 2.

The e2e "Mongo application Native-Snapshots via CLI" test timed out and failed because the todolist pod never reached a successful state during the test.

kaovilai · 2025-11-06T04:09:52Z

/retest

ai-retester: The e2e tests failed, specifically the "MySQL application two Vol CSI" test in the oadp-operator repository. The pod was stuck in PodInitializing status, which prevented the container from starting.

The e2e-test-cli-aws-e2e step failed because the Mongo application Native-Snapshots via CLI test timed out after 540 seconds, and a container "todolist" in pod "todolist-6d7bb9554c-zj468" is waiting to start: PodInitializing. This indicates a problem with the application deployment or dependencies within the test environment causing a timeout during the execution of the end-to-end test.

kaovilai · 2025-11-06T06:39:53Z

/retest

ai-retester: The e2e tests failed. Specifically, the "MySQL application two Vol CSI" test failed because the restore process encountered an error when accessing the application's endpoint via the route and subsequently timed out trying to communicate with the service directly. The container exited with code 7 during this phase.

The e2e test e2e-test-cli-aws failed because the Mongo application Native-Snapshots via CLI test timed out after 540 seconds and because container "todolist" in pod "todolist-6d7bb9554c-grxws" is waiting to start: PodInitializing. This indicates a problem with the test application or the test environment during the e2e run.

kaovilai · 2025-11-06T09:24:02Z

/retest

ai-retester: The e2e test failed because the "Mongo application Native-Snapshots" test timed out after 540 seconds while waiting for a Pod to succeed. The container "test" in pod "e2e-test-aws-e2e" also exited with an error code of 2 indicating a test failure.

The only test that hit a hard failure was the “MySQL application – two-volume CSI via CLI” test. After the backup succeeded and the restore was declared Completed, the restore job tried to bring up the MySQL/Todolist application in the mysql-persistent namespace. The pod that should have started the mysql-todolist container (todolist-1-f8kzm) stayed stuck in the “PodInitializing” state, and the test timed out after ~4 min.

A few clues from the logs explain why the pod never reached a running state:

1. Volume provisioning and snapshotting: The CSI snapshots were created and marked “Ready” by the AWS CSI provisioner, but the PVC for the applog volume was only provisioned after the pod launch. The pod had to wait for the data-source snapshot to be online.
   ```
   failed to provision volume with StorageClass "gp2-csi": error… snapshot … is not Ready
   ```
2. DeploymentConfig deprecation: The pod was created from a DeploymentConfig (which is still enabled in this cluster) and the logs printed warnings about its deprecation at every phase. The test framework continued, but the pod was never marked Ready.
   ```
   apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
   ```
3. Init-container timing: The init-myservice container waited until the MySQL service became reachable. It finished quickly (init-myservice exited cleanly) but the main todolist container had to wait for the application image to be pulled and for the applog volume to be attached.
   ```
   container "todolist" in pod "todolist-1-f8kzm" is waiting to start: PodInitializing
   ```
4. Timing out of the restore: Because the pod never transitioned to Ready, the after-restore verification (curl to /todo) never finished, and the Ginkgo test timed out, flagging a failure.

In short, the test failed because the restore step created a pod that could not finish initialization: the CSI snapshots were not yet fully available, the old DeploymentConfig style was still used (causing additional delays), and the init‑container finished too late, so the main container never started before the test timed out. Fixing the volume claim handling or switching the pod deployment to a modern Deployment (or ensuring the CSI snapshots are ready before the restore starts) would eliminate the failure.

kaovilai · 2025-11-06T12:22:56Z

/retest

ai-retester: The e2e tests failed because the "test" container in pod "e2e-test-cli-aws-e2e" exited with a non-zero exit code (2), indicating an error during the OADP e2e tests. More specifically, the test "MySQL application KOPIA via CLI" timed out after 540 seconds and the pod "todolist-1-md9jw" was stuck in PodInitializing state, which typically means the pod couldn't start correctly, likely due to underlying infrastructure or configuration issues.

The automated test e2e-test-aws failed because the Mongo application Native-Snapshots test case timed out after 540 seconds while waiting for a pod to succeed. The pod's test container exited with code 2, indicating a failure in the tests it was running.

kaovilai · 2025-11-06T15:16:58Z

/retest

ai-retester: The e2e tests failed because the "MySQL application two Vol CSI" test failed. Specifically, the restore process failed in that test with a "PodInitializing" error and network connectivity issues.

The test e2e-test-aws failed because the Mongo application Native-Snapshots test case timed out after 540 seconds, and the subsequent cleanup also encountered errors. Specifically, the todolist container in the todolist-6d7bb9554c-wwlwc pod remained in a PodInitializing state, indicating a problem with the application deployment or environment setup.

shubham-pampattiwar · 2025-11-06T18:04:54Z

/retest

openshift-ci · 2025-11-06T18:05:21Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai, shubham-pampattiwar, weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~tests/e2e/OWNERS~~ [kaovilai,shubham-pampattiwar,weshayutin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kaovilai · 2025-11-06T20:53:56Z

/retest

ai-retester: The e2e-test-aws test failed because the Mongo application Native-Snapshots test timed out while waiting for a pod to become ready, and the container test exited with an error.

The e2e-test-cli-aws-e2e step failed because the Mongo application Native-Snapshots via CLI test timed out. The pod todolist-6d7bb9554c-sfw8k failed as a result.

kaovilai · 2025-11-06T23:32:54Z

/retest

ai-retester: The e2e test failed because the Mongo application Native-Snapshots via CLI test timed out after 540 seconds. The underlying issue was that a "todolist" container in a pod failed to start, remaining in the PodInitializing state.

The e2e test failed. Specifically, the "MySQL application two Vol CSI" test failed because the application endpoint was not accessible after the restore. The todolist container might not be starting properly. PodInitializing errors and multi-attach errors were also observed in the app namespace events.

kaovilai · 2025-11-07T02:47:55Z

/retest

ai-retester: The "Mongo application Native-Snapshots" e2e test timed out and failed after 1 hour, 1 minute, and 11 seconds. The test pod got stuck in PodInitializing.

The e2e test Mongo application Native-Snapshots via CLI timed out after 540 seconds, causing the e2e-test-cli-aws test step to fail. The Todolist pod did not become ready in the allotted time.

openshift-ci · 2025-11-07T05:07:06Z

@weshayutin: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/4.20-e2e-test-aws | `dbf1bec` | link | false | `/test 4.20-e2e-test-aws` |
| ci/prow/4.19-e2e-test-cli-aws | `dbf1bec` | link | true | `/test 4.19-e2e-test-cli-aws` |
| ci/prow/4.19-e2e-test-aws | `dbf1bec` | link | true | `/test 4.19-e2e-test-aws` |

kaovilai · 2025-11-07T06:25:10Z

Let's merge #2013 #2014 instead.
/hold

kaovilai · 2025-11-07T15:25:45Z

no need: #2013

Mongo application CSI DOWN FOR MAINTENANCE

4356982

weshayutin mentioned this pull request Oct 29, 2025

Merge https://github.com/openshift/oadp-operator:oadp-dev (513180a) into oadp-dev #1990

Merged

openshift-ci bot requested review from hhpatel14 and sseago October 29, 2025 20:28

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2025

shubham-pampattiwar previously approved these changes Oct 29, 2025

kaovilai previously approved these changes Oct 29, 2025

openshift-ci bot assigned kaovilai Oct 29, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 29, 2025

add mongo datamover to maint

c0e61fd

weshayutin dismissed stale reviews from kaovilai and shubham-pampattiwar via c0e61fd October 30, 2025 14:10

openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2025

mongo blockdevice removed

dbf1bec

coderabbitai bot reviewed Oct 31, 2025

mpryc mentioned this pull request Nov 5, 2025

UPSTREAM: <carry>: Skip few Mongo tests. #2009

Closed

shubham-pampattiwar approved these changes Nov 6, 2025

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 7, 2025

kaovilai closed this Nov 7, 2025
