Skip to content

Conversation

@rich7420
Copy link
Contributor

@rich7420 rich7420 commented Oct 29, 2025

What changes were proposed in this pull request?

This PR fixes two issues in the FSO repair tool:

  1. Fixed the DB path handling in getStoreFromPath() to use the actual directory name from the provided path instead of hardcoding "om.db". Now the tool supports any OM DB directory name (e.g., om-db-backup, om-copy).

  2. Renamed internal method and report references for better clarity:
    unreachable → pendingToDelete
    unreferenced → orphaned

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-13825

How was this patch tested?

build-branch

Copy link
Contributor

@sarvekshayr sarvekshayr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @rich7420.

How did you test this change?

@rich7420
Copy link
Contributor Author

rich7420 commented Oct 29, 2025

I tested with mvn test -Dtest=TestFSORepairTool -pl hadoop-ozone/integration-test.
should I add a function to test the part of "any OM DB directory name"?

@sarvekshayr
Copy link
Contributor

should I add a function to test the part of "any OM DB directory name"?

Yes, it would be good to add a test in TestFSORepairTool to verify different om db names.

@rich7420
Copy link
Contributor Author

@sarvekshayr thanks for your feedback! I've added a test and ci test passed!
https://github.com/rich7420/ozone/actions/runs/18930068530

@sarvekshayr
Copy link
Contributor

Tested the fix manually.

  • Before the fix: When the DB was not named om.db, the tool returned a null report, which was unclear and did not indicate the actual issue.
$ bin/ozone repair om fso-tree --db=/Users/sarvekshayr/OMDB/om.db --dry-run
[dry run] No running OM service detected. Proceeding with repair.
[dry run] Creating database with reachable and unreachable tables at /Users/sarvekshayr/OMDB/temp.db
[dry run] Processing volume: /benchmarking
[dry run] Processing bucket: benchmarking/e043069
...
$ bin/ozone repair om fso-tree --db=/Users/sarvekshayr/OMDB/copy/om-db-backup --dry-run
[dry run] No running OM service detected. Proceeding with repair.
[dry run] Creating database with reachable and unreachable tables at /Users/sarvekshayr/OMDB/copy/temp.db
[dry run] 
Reachable:
        Directories: 0
        Files: 0
        Bytes: 0
Unreachable (Pending to delete):
        Directories: 0
        Files: 0
        Bytes: 0
Unreferenced (Orphaned):
        Directories: 0
        Files: 0
        Bytes: 0
  • After the fix: The tool now accepts any DB name, and the report is generated correctly with no ambiguity.
$ bin/ozone repair om fso-tree --db=/Users/sarvekshayr/OMDB/om.db --dry-run
[dry run] No running OM service detected. Proceeding with repair.
[dry run] Creating database with reachable and unreachable tables at /Users/sarvekshayr/OMDB/temp.db
[dry run] Processing volume: /benchmarking
[dry run] Processing bucket: benchmarking/e043069
...
$ bin/ozone repair om fso-tree --db=/Users/sarvekshayr/OMDB/copy/om-db-backup --dry-run
[dry run] No running OM service detected. Proceeding with repair.
[dry run] Creating database with reachable and unreachable tables at /Users/sarvekshayr/OMDB/copy/temp.db
[dry run] Processing volume: /benchmarking
[dry run] Processing bucket: benchmarking/e043069
...


@Order(2)
@Test
public void testAlternateOmDbDirNameDryRun() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test currently passes with or without the fix because the previous implementation always expected om.db. Since om.db happens to exist in the same directory as om-db-backup, the tool silently picks up om.db and generates a valid report instead of returning a null report, which hides the bug. You can verify this behaviour once.

To properly validate the fix, please update the test so that om.db resides in a different path than om-db-backup.

Before the fix: the test should produce a null report.
After the fix: the test should correctly generate the report for the provided DB path.

@rich7420
Copy link
Contributor Author

@sarvekshayr thanks for the feeback! I'll try to fix it as soon as possible.

@sarvekshayr
Copy link
Contributor

@sarvekshayr thanks for the feeback! I'll try to fix it as soon as possible.

Sure, it's a simple test fix.

Comment on lines 393 to 395
@Order(2)
@Test
public void testAlternateOmDbDirNameDryRun() throws Exception {
Copy link
Contributor

@sarvekshayr sarvekshayr Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should assert the actual report values to ensure the report is not null, just verifying the presence of report fields is not enough.

Update the test to:

  • change the order
  • rename the method
  • place it after all dry run tests
  • validate the report content instead of only checking for field existence.
  @Order(ORDER_DRY_RUN)
  @Test
  public void testAlternateOmDbDirName() throws Exception {
    File original = new File(OMStorage.getOmDbDir(cluster.getConf()), OM_DB_NAME);
    // Place backup under a different parent directory to ensure we don't
    // accidentally open the original om.db due to path handling bugs.
    File backupParent = new File(OMStorage.getOmDbDir(cluster.getConf()), "copy");
    File backup = new File(backupParent, "om-db-backup");

    if (backup.exists()) {
      FileUtils.deleteDirectory(backup);
    }
    if (backupParent.exists()) {
      FileUtils.deleteDirectory(backupParent);
    }
    boolean created = backupParent.mkdirs();
    if (!created && !backupParent.exists()) {
      throw new IOException("Failed to create backup parent directory: " + backupParent);
    }
    FileUtils.copyDirectory(original, backup);

    out.reset();
    String expectedOutput = serializeReport(fullReport);

    int exitCode = executeWithDb(true, backup.getPath());
    assertEquals(0, exitCode, err.getOutput());

    String cliOutput = out.getOutput();
    String reportOutput = extractRelevantSection(cliOutput);
    assertEquals(expectedOutput, reportOutput);

    FileUtils.deleteDirectory(backupParent);
  }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use fullReport for validation.

@rich7420
Copy link
Contributor Author

@sarvekshayr thanks a lot! I've tried to modify function like you said.

Copy link
Contributor

@sarvekshayr sarvekshayr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rich7420 for patiently iterating on the patch. The changes look good to me now.

@rich7420
Copy link
Contributor Author

rich7420 commented Oct 31, 2025

@sarvekshayr thanks the feedback! you helped me a lot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants