Skip to content

Conversation

@alphaprinz
Copy link
Contributor

@alphaprinz alphaprinz commented Oct 7, 2025

Describe the Problem

Use new 'read only' pool (connected to cnpg replicated read-only cluster) in order to alleviate load from main, writable pg cluster.

Explain the Changes

  1. Move some background queries to read-only pool:
    -db cleancer
    -object reclaimer's first find
    -scrubber's first find

Issues: Fixed #xxx / Gap #xxx

Testing Instructions:

  • Doc added/updated
  • Tests added

Summary by CodeRabbit

  • New Features

    • executeSQL now accepts an optional preferred pool parameter to route queries to a specified database pool.
  • Improvements

    • Read operations now prefer read-only pool replicas for better query distribution.
    • Queries can automatically retry on the default pool if a preferred pool is unavailable, improving resilience and reducing failures.
  • Observability

    • Added debug logging for query host and retry behavior to aid troubleshooting.

@coderabbitai
Copy link

coderabbitai bot commented Oct 7, 2025

Walkthrough

Adds per-query pool selection and read-only pool usage: DBCollection.executeSQL accepts preferred_pool; MDStore read operations request preferred_pool: 'read_only'; Postgres client routes queries to the requested pool, falls back to default, and can retry on default after errors.

Changes

Cohort / File(s) Summary
TypeScript interface
src/sdk/nb.d.ts
DBCollection.executeSQL options now include optional preferred_pool?: string.
MDStore read-query sites
src/server/object_services/md_store.js
Read operations updated to pass { preferred_pool: 'read_only' } to underlying queries for multiple methods (e.g., find_unreclaimed_objects, find_deleted_objects, iterate_all_chunks, has_any_blocks_for_chunk, has_any_parts_for_chunk, find_deleted_chunks).
Postgres client & pool routing
src/util/postgres_client.js
executeSQL and internal query paths accept options.preferred_pool; get_pool supports selecting a named pool with fallback to default; pools carry retry_with_default_pool flag; _do_query logs host/retry and will retry on default pool when configured.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant MDStore
    participant PostgresTable
    participant PoolMgr as Pool Manager
    participant DefaultPool as default
    participant ReadPool as read_only

    Caller->>MDStore: read operation
    MDStore->>PostgresTable: executeSQL(query, params, {preferred_pool:'read_only'})
    PostgresTable->>PoolMgr: get_pool('read_only')
    alt read_only exists
        PoolMgr-->>PostgresTable: ReadPool
        PostgresTable->>ReadPool: run query
        alt success
            ReadPool-->>PostgresTable: result
            PostgresTable-->>MDStore: result
        else error and retry_with_default_pool
            PostgresTable->>DefaultPool: retry query on default (logged)
            DefaultPool-->>PostgresTable: result/error
            PostgresTable-->>MDStore: result/error
        end
    else read_only missing
        PoolMgr->>PoolMgr: fallback to default
        PoolMgr-->>PostgresTable: DefaultPool
        PostgresTable->>DefaultPool: run query
        DefaultPool-->>PostgresTable: result
        PostgresTable-->>MDStore: result
    end
    MDStore-->>Caller: return result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Potential attention areas:

  • src/util/postgres_client.js — verify fallback and retry semantics, logging, and propagation of retry_with_default_pool.
  • src/server/object_services/md_store.js — ensure all read paths consistently pass options and do not affect write operations.
  • src/sdk/nb.d.ts — ensure the declaration change aligns with runtime usage.

Possibly related PRs

Suggested labels

size/L

Suggested reviewers

  • dannyzaken
  • jackyalbo
  • liranmauda

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title Check ❓ Inconclusive The PR title "Read only uses" is vague and incomplete as a phrase, making it unclear what the actual change accomplishes. While the title does reference "read only," which is tangentially related to the changeset's focus on implementing a read-only pool mechanism, the title fails to clearly communicate the core purpose: implementing a read-only pool option to offload queries from the main PostgreSQL cluster. The incomplete phrasing lacks sufficient specificity to convey meaningful information about the changes to someone reviewing commit history. The changes span multiple files (type definitions, pool client implementation, and data store operations) but the title provides no clear indication of these modifications or their collective purpose.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@alphaprinz alphaprinz marked this pull request as ready for review October 17, 2025 22:47
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/util/postgres_client.js (1)

632-642: Review the fallback behavior in get_pool.

The fallback logic on lines 636-637 silently falls back to the default pool when a requested pool key is invalid:

if (key && key !== this.pool_key) {
    return this.get_pool();
}

Concerns:

  1. Silent fallback masks configuration issues: If code requests a non-existent pool (e.g., typo like 'read_onyl'), it silently uses the default pool instead of failing fast.
  2. Misleading error message: After the fallback attempt, line 639 throws an error mentioning the requested key, but the fallback might have already succeeded with this.pool_key.

Recommendation:
Consider removing the fallback or adding a warning log when falling back, so configuration issues are visible:

 get_pool(key = this.pool_key) {
     const pool = this.client.get_pool(key);
     if (!pool) {
-        //if original get_pool was no for the default this.pool_key, try also this.pool_key
+        // If original get_pool was not for the default this.pool_key, try also this.pool_key
         if (key && key !== this.pool_key) {
+            dbg.warn(`Pool ${key} not found, falling back to default pool ${this.pool_key}`);
             return this.get_pool();
         }
         throw new Error(`The postgres clients pool ${key} disconnected`);
     }
     return pool;
 }

Alternatively, remove the fallback entirely to fail fast on invalid pool names.

src/server/object_services/md_store.js (1)

814-835: Route mapReduce operations to read-only pool for consistent replica distribution.

The list_objects and list_object_versions methods use mapReduce for queries with delimiters (lines 814-824 and 864-874), but these calls don't pass preferred_pool to route to the read-only pool. While the find() operations support routing via preferred_pool: 'read_only' (as seen elsewhere in md_store.js), the mapReduce calls should be consistent.

To fix:

  1. Update mapReduceListObjects() and mapReduceAggregate() in postgres_client.js to extract options.preferred_pool and pass it to this.single_query() (currently they call this.single_query(mr_q) without the pool parameter)
  2. Pass preferred_pool: 'read_only' in the options object when calling this._objects.mapReduce() at lines 814-824 and 864-874 in md_store.js
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2367740 and d699655.

📒 Files selected for processing (4)
  • src/sdk/nb.d.ts (1 hunks)
  • src/server/bg_services/db_cleaner.js (1 hunks)
  • src/server/object_services/md_store.js (5 hunks)
  • src/util/postgres_client.js (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
PR: noobaa/noobaa-core#9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.

Applied to files:

  • src/server/object_services/md_store.js
🧬 Code graph analysis (1)
src/server/bg_services/db_cleaner.js (1)
src/server/object_services/md_store.js (3)
  • dbg (14-14)
  • P (13-13)
  • config (28-28)
🔇 Additional comments (10)
src/server/object_services/md_store.js (6)

779-779: LGTM! Read-only pool routing for unreclaimed objects.

Correctly routes the find operation to the read-only pool, which aligns with the PR objective of offloading background read queries from the main cluster.


1127-1127: LGTM! Read-only pool routing for deleted objects query.

The SQL query correctly uses the read-only pool via the options parameter, consistent with the pattern established in this PR.


1603-1603: LGTM! Read-only pool routing for chunk iteration.

Appropriately routes the chunk iteration query to the read-only pool for the scrubber's background operations.


1778-1779: LGTM! Read-only pool routing for deleted chunks.

Correctly adds preferred_pool: 'read_only' to the find options for deleted chunks cleanup query.


1787-1788: LGTM! Read-only pool routing for chunk existence checks.

Both has_any_blocks_for_chunk and has_any_parts_for_chunk correctly route to the read-only pool. These existence checks are read-only operations suitable for read replicas.

Also applies to: 1796-1797


2039-2040: LGTM! Read-only pool routing for deleted blocks.

Correctly routes the deleted blocks query to the read-only pool for background cleanup operations.

src/util/postgres_client.js (3)

252-252: Helpful debug logging for pool routing.

The added debug log that includes pg_client.options?.host will be useful for verifying which pool (and host) is being used for each query, aiding in troubleshooting read-only pool routing.


726-733: LGTM! Clean implementation of preferred_pool in executeSQL.

The JSDoc and implementation correctly support routing queries to a preferred pool. The pattern of passing options.preferred_pool to get_pool() is clean and extensible.


937-937: LGTM! Consistent preferred_pool support in find and findOne.

Both methods correctly pass options.preferred_pool to get_pool(), enabling read-only pool routing for these common query methods. The implementation is consistent with the executeSQL pattern.

Also applies to: 954-954

src/sdk/nb.d.ts (1)

769-769: LGTM! TypeScript interface correctly updated.

The addition of preferred_pool?: string to the executeSQL options parameter aligns with the implementation in postgres_client.js and properly documents the new functionality for TypeScript consumers.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/util/postgres_client.js (1)

635-638: Add warning log when falling back to default pool.

The silent fallback could mask misconfigurations. If the read_only pool is expected but unavailable, queries will silently route to the default pool, defeating the load distribution purpose.

Apply this diff to add visibility:

         //if original get_pool was not for the default this.pool_key, try also this.pool_key
         if (key && key !== this.pool_key) {
+            dbg.warn(`Pool ${key} not available for table ${this.name}, falling back to default pool ${this.pool_key}`);
             return this.get_pool();
         }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d699655 and af555fb.

📒 Files selected for processing (1)
  • src/util/postgres_client.js (5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-package-lock-validation
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (4)
src/util/postgres_client.js (4)

251-252: LGTM! Helpful debug logging for pool routing verification.

The host logging will be useful for confirming that queries are being routed to the correct pool (read-only vs. default).


954-954: LGTM! Consistent with find implementation.

The preferred_pool support in findOne is consistent with the find implementation. Same eventual consistency considerations apply.


937-937: LGTM. Routing find queries to read replicas is appropriate for background maintenance operations.

The callers using preferred_pool: 'read_only' are background operations (find_unreclaimed_objects, find_deleted_objects, chunk/block existence checks) where eventual consistency is acceptable. These align with the stated use cases (object reclaimer, scrubber) and do not impact critical read paths.


726-733: Review comment is incorrect—executeSQL is designed for standalone queries and not used within transactions.

The code shows that executeSQL and PgTransaction have separate execution paths:

  • PgTransaction (lines 335-378) uses its own query() method with a dedicated pg_client connection for transactional queries
  • executeSQL (line 731) gets its own pool connection via pool.connect() for standalone queries
  • All executeSQL call sites in md_store.js are standalone queries, never invoked within transaction contexts
  • The preferred_pool option is appropriate for directing standalone queries to specific pools (e.g., read-only)

There is no mixing of transaction and standalone query execution paths, and no risk of routing queries to incorrect pools during transactions.

Likely an incorrect or invalid review comment.

Copy link
Member

@dannyzaken dannyzaken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, this looks good. A few general comments\questions

  • We should consider the case where the secondary is down. Maybe we should fallback to the primary?
  • We should make sure that the rest of the code behaves nicely if the secondary returns stale data (e.g., object_mds that were marked as reclaimed but were not reflected yet in the secondary). From a quick look at the code, it looks ok, but I didn't look too deeply into it.

@dannyzaken
Copy link
Member

@alphaprinz I also think we should have a config to disable this feature and revert to the old behavior. It can be done in a different PR, so approving this in the meantime

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/util/postgres_client.js (2)

639-642: Fallback logic is correct but comment could be clearer.

The fallback mechanism works as intended, but the comment "try also this.pool_key" could be more explicit that it's falling back to the table's default pool key.

Consider this clearer comment:

-            //if original get_pool was not for the default this.pool_key, try also this.pool_key
+            // If requested pool doesn't exist and wasn't the table's default pool, fallback to table's default pool
             if (key && key !== this.pool_key) {
                 return this.get_pool();
             }

1510-1514: Consider implementing the requested configuration option.

Based on the PR comment from dannyzaken, there's a request for a configuration option to disable the new read-only behavior and revert to previous behavior. While approved for a separate PR, consider whether adding a simple config flag now (e.g., config.ENABLE_READ_ONLY_POOL) would prevent future breaking changes.

Would you like me to generate a script to check how the preferred_pool: 'read_only' option is being used across the codebase?

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af555fb and 2f4a70a.

📒 Files selected for processing (3)
  • src/sdk/nb.d.ts (1 hunks)
  • src/server/object_services/md_store.js (5 hunks)
  • src/util/postgres_client.js (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/sdk/nb.d.ts
  • src/server/object_services/md_store.js
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-08-08T13:12:46.728Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:9-17
Timestamp: 2025-08-08T13:12:46.728Z
Learning: In upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js for noobaa-core, rely on structural detection (e.g., pool.mongo_info, and resource_type === 'INTERNAL') with name-prefix fallback for removing legacy mongo/internal pools, instead of depending solely on config.INTERNAL_STORAGE_POOL_NAME or config.DEFAULT_POOL_NAME. Handle multi-system stores and remove all matching pools in one change.

Applied to files:

  • src/util/postgres_client.js
📚 Learning: 2025-08-11T06:12:12.318Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js:6-22
Timestamp: 2025-08-11T06:12:12.318Z
Learning: In the noobaa-core upgrade script src/upgrade/upgrade_scripts/5.20.0/remove_mongo_pool.js, bucket migration from the internal mongo pool to a new default pool is planned to be handled in separate future PRs with comprehensive testing, rather than being included directly in the pool removal script.

Applied to files:

  • src/util/postgres_client.js
📚 Learning: 2025-08-08T13:10:36.141Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1314-1317
Timestamp: 2025-08-08T13:10:36.141Z
Learning: In src/server/system_services/pool_server.js (and config usage), the constant config.INTERNAL_STORAGE_POOL_NAME has been removed from the system. Future logic should not depend on this constant and should instead use config.DEFAULT_POOL_NAME or structural markers (e.g., pool.resource_type === 'INTERNAL' or pool.mongo_info) to identify internal/mongo pools.

Applied to files:

  • src/util/postgres_client.js
📚 Learning: 2025-08-08T13:13:42.192Z
Learnt from: naveenpaul1
Repo: noobaa/noobaa-core PR: 9182
File: src/server/system_services/pool_server.js:1304-1307
Timestamp: 2025-08-08T13:13:42.192Z
Learning: In noobaa-core, do not rely on config.INTERNAL_STORAGE_POOL_NAME to detect backingstore pools. In src/server/system_services/pool_server.js, prefer checking pool.hosts_pool_info.backingstore or pool.cloud_pool_info.backingstore; fallback to prefix-matching the pool name against config.DEFAULT_POOL_NAME ('backingstores') to handle potential suffixes.

Applied to files:

  • src/util/postgres_client.js
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: run-package-lock-validation
  • GitHub Check: Build Noobaa Image
  • GitHub Check: run-jest-unit-tests
🔇 Additional comments (6)
src/util/postgres_client.js (6)

252-252: LGTM: Debug logging added for pool routing.

The debug log provides useful information for troubleshooting pool selection and retry behavior.


730-730: LGTM: Type definition updated correctly.

The preferred_pool option is properly documented as optional in the typedef.


737-737: LGTM: Pool selection correctly uses preferred_pool option.

The implementation properly routes queries to the preferred pool when specified.


941-941: LGTM: Pool selection propagated correctly in find operations.

Both find() and findOne() correctly pass the preferred_pool option to route queries appropriately.

Also applies to: 958-958


1753-1754: LGTM: Retry flag correctly propagated to pool instance.

The flag is properly propagated from pool configuration to the pool instance, making it available in _do_query().


1510-1514: Configuration is correct—no infinite recursion risk.

The retry_with_default_pool: true setting for the read_only pool is properly configured. The default pool (lines 1502-1505) does not have this flag, preventing infinite recursion. The read_only pool will safely retry failed queries using the default pool as intended.

Comment on lines +271 to +274
if (pg_client.retry_with_default_pool) {
dbg.warn("retrying with default pool. q = ", q);
return _do_query(PostgresClient.instance().get_pool('default'), q, transaction_counter);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Risk of infinite recursion if default pool also has retry flag set.

If the default pool's retry_with_default_pool is set to true, this creates infinite recursion. The code should either:

  1. Explicitly pass retry_with_default_pool: false when getting the default pool, or
  2. Add a guard to prevent retrying if already using the default pool.

Additionally, the original error details are lost before the retry. Consider logging the original error with more context.

Apply this diff to prevent infinite recursion:

-        if (pg_client.retry_with_default_pool) {
+        if (pg_client.retry_with_default_pool && pg_client !== PostgresClient.instance().get_pool('default')) {
             dbg.warn("retrying with default pool. q = ", q);
             return _do_query(PostgresClient.instance().get_pool('default'), q, transaction_counter);
         }

Or alternatively, ensure the default pool instance never has retry_with_default_pool set:

         if (pg_client.retry_with_default_pool) {
-            dbg.warn("retrying with default pool. q = ", q);
+            dbg.warn("retrying with default pool. original error = ", err.message, ", q = ", q);
-            return _do_query(PostgresClient.instance().get_pool('default'), q, transaction_counter);
+            const default_pool = PostgresClient.instance().get_pool('default');
+            // Ensure we don't retry infinitely if default pool also has retry flag
+            const safe_pool = { ...default_pool, retry_with_default_pool: false };
+            return _do_query(safe_pool, q, transaction_counter);
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (pg_client.retry_with_default_pool) {
dbg.warn("retrying with default pool. q = ", q);
return _do_query(PostgresClient.instance().get_pool('default'), q, transaction_counter);
}
if (pg_client.retry_with_default_pool && pg_client !== PostgresClient.instance().get_pool('default')) {
dbg.warn("retrying with default pool. q = ", q);
return _do_query(PostgresClient.instance().get_pool('default'), q, transaction_counter);
}
🤖 Prompt for AI Agents
In src/util/postgres_client.js around lines 271 to 274, the retry branch
unconditionally calls _do_query against the default pool which can cause
infinite recursion if the default pool also has retry_with_default_pool=true and
loses the original error context; change the retry to (a) call
get_pool('default') with retry_with_default_pool explicitly set to false or (b)
add a guard that detects when the current pool is already the default and skip
retrying, and also log the original error before attempting the retry so the
original error details are preserved for debugging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants