[wip]feat(backend): postgres integration #12379

kaikaila · 2025-10-19T04:47:24Z

Summary

This PR adds full PostgreSQL (pgx driver) support to Kubeflow Pipelines backend, enabling users to choose between MySQL and PostgreSQL as the metadata database. The implementation introduces a clean dialect abstraction layer and includes a major query optimization that benefits both database backends.

Key achievements
✅ Complete PostgreSQL integration for API Server and Cache Server, addressing #7512, #9813
✅ All CI tests passing (MySQL + PostgreSQL).
✅ Reduced ListRuns query cost by removing GROUP BY over LONGTEXT columns (formal load testing still pending). This PR is expected to address the root causes behind #10778, #10230, #9780, #9701
✅ No breaking changes: backward compatible with existing MySQL deployments

What Changed

Storage Layer Refactoring: Dialect Abstraction (backend/src/apiserver/common/sql/dialect)

Problem
SQL syntax was tightly coupled to MySQL.
Solution
Introduced a DBDialect interface that encapsulates database-specific behavior:
- Identifier quoting (MySQL backticks vs PostgreSQL double quotes)
- Placeholder styles (? vs $1, $2, ...)
- Aggregation functions (GROUP_CONCAT vs string_agg)
- Concatenation syntax (CONCAT() vs ||)
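A minimal Go sketch of what such an interface could look like, covering the four differences listed above. Only QuoteIdentifier is named in this PR; the other method names (Placeholder, GroupConcat, Concat) and the mysqlDialect/postgresDialect types are hypothetical, and the actual interface in dialect.go may differ. The separator passed to GroupConcat is assumed to be a trusted literal and is not escaped.

```go
package main

import (
	"fmt"
	"strings"
)

// DBDialect is a hypothetical sketch of the database-specific SQL fragments.
type DBDialect interface {
	QuoteIdentifier(name string) string
	Placeholder(n int) string            // n is the 1-based argument position
	GroupConcat(expr, sep string) string // sep is assumed to be a trusted literal
	Concat(parts ...string) string
}

type mysqlDialect struct{}

// QuoteIdentifier wraps the name in backticks, doubling any embedded backtick.
func (mysqlDialect) QuoteIdentifier(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// Placeholder is always "?" in MySQL, regardless of position.
func (mysqlDialect) Placeholder(int) string { return "?" }

func (mysqlDialect) GroupConcat(expr, sep string) string {
	return fmt.Sprintf("GROUP_CONCAT(%s SEPARATOR '%s')", expr, sep)
}

func (mysqlDialect) Concat(parts ...string) string {
	return "CONCAT(" + strings.Join(parts, ", ") + ")"
}

type postgresDialect struct{}

// QuoteIdentifier wraps the name in double quotes, doubling any embedded quote.
func (postgresDialect) QuoteIdentifier(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// Placeholder uses PostgreSQL's numbered $N form.
func (postgresDialect) Placeholder(n int) string { return fmt.Sprintf("$%d", n) }

func (postgresDialect) GroupConcat(expr, sep string) string {
	return fmt.Sprintf("string_agg(%s, '%s')", expr, sep)
}

func (postgresDialect) Concat(parts ...string) string {
	return strings.Join(parts, " || ")
}

func main() {
	// The same query-building code produces dialect-specific SQL.
	for _, d := range []DBDialect{mysqlDialect{}, postgresDialect{}} {
		q := d.QuoteIdentifier
		fmt.Printf("SELECT %s FROM %s WHERE %s = %s\n",
			d.GroupConcat(q("Name"), ","), q("run_details"), q("UUID"), d.Placeholder(1))
	}
}
```

Keeping each difference behind one method lets store code build a single query and leave the syntax decisions to the dialect.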
Files
- Core dialect implementation → backend/src/apiserver/common/sql/dialect/dialect.go
- Dialect-aware utility functions → backend/src/apiserver/storage/sql_dialect_util.go
- Reusable filter builders with proper quoting → backend/src/apiserver/storage/list_filters.go

All storage layer code now uses

q := s.dbDialect.QuoteIdentifier
qb := s.dbDialect.QueryBuilder()

This ensures queries work correctly across MySQL, PostgreSQL, and SQLite (for tests).
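Of these differences, placeholder numbering is the easiest to get subtly wrong. If a query is written with ? placeholders and then rebound for PostgreSQL, the rewrite has to number placeholders in order and skip question marks inside quoted literals. The hypothetical rebindDollar helper below is a sketch of one way to do this, not the PR's code; the PR may instead rely on the query builder's own placeholder format. It does not handle SQL comments or PostgreSQL's jsonb ? operators.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rebindDollar rewrites "?" placeholders into PostgreSQL's "$1, $2, ..." form.
// Question marks inside single-quoted literals or double-quoted identifiers
// are left untouched. SQL comments and jsonb "?" operators are NOT handled.
func rebindDollar(query string) string {
	var b strings.Builder
	n := 0
	var inQuote byte // 0 outside a quoted region, else the opening quote char
	for i := 0; i < len(query); i++ {
		c := query[i]
		switch {
		case inQuote != 0:
			// A doubled quote ('') closes and immediately reopens the
			// literal, so the net effect is still "inside the literal".
			if c == inQuote {
				inQuote = 0
			}
		case c == '\'' || c == '"':
			inQuote = c
		case c == '?':
			n++
			b.WriteString("$" + strconv.Itoa(n))
			continue
		}
		b.WriteByte(c)
	}
	return b.String()
}

func main() {
	q := `SELECT * FROM "run_details" WHERE "Namespace" = ? AND "DisplayName" <> 'what?' LIMIT ?`
	fmt.Println(rebindDollar(q))
}
```

Scanning byte by byte is safe for UTF-8 input because quote characters and ? are ASCII and never appear inside multi-byte sequences.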

ListRuns Query Performance Optimization

Problem
The original ListRuns query called addMetricsResourceReferencesAndTasks, which performed a 3-layer LEFT JOIN with GROUP BY on all columns, including LONGTEXT fields such as PipelineSpecManifest and WorkflowSpecManifest. This caused slow response times for large datasets.
Solution
Layers 1-3: LEFT JOIN carrying only PrimaryKeyUUID plus the aggregated columns (refs, tasks, metrics), so GROUP BY touches only the key
Final layer: INNER JOIN back to run_details to fetch the LONGTEXT columns
Performance impact
Eliminates GROUP BY on LONGTEXT columns entirely. Expected substantial performance improvements for deployments with large pipeline specifications, though formal load testing has not yet been conducted.

Deployment Configurations

Production-ready PostgreSQL kustomization → manifests/kustomize/env/platform-agnostic-postgresql/
Local development setup → manifests/kustomize/env/dev-kind-postgresql/
PostgreSQL StatefulSet → manifests/kustomize/third-party/postgresql/

Configuration is symmetric to existing MySQL manifests for consistency.

CI Manifest Overlays

Created CI-specific Kustomize overlays to ensure tests use locally built images from the Kind registry instead of pulling official images from ghcr.io:

Added PostgreSQL CI overlay → .github/resources/manifests/standalone/postgresql/
Added kfp-cache-server image override to .github/resources/manifests/standalone/base/kustomization.yaml

Added 2 PostgreSQL-specific CI workflows

V2 API and integration tests (cache enabled/disabled matrix) → api-server-test-Postgres.yml
V1 integration tests → integration-tests-v1-postgres.yml

PostgreSQL tests cover the core cache enabled/disabled matrix.

Local development support

make dev-kind-cluster-pg - Provision Kind cluster with PostgreSQL
Updated README for PostgreSQL setup and debugging, achieving parity with MySQL documentation.

Testing

Unit Tests

23 test files modified/added
New test coverage: dialect_test.go, list_filters_test.go, sql_dialect_util_test.go
All existing tests updated to use dialect abstraction

Integration Tests

✅ V1 API integration tests (PostgreSQL)
✅ V2 API integration tests (PostgreSQL, cache enabled/disabled)
✅ Existing MySQL tests remain green

Migration Guide

For new deployments:
kubectl apply -k manifests/kustomize/env/platform-agnostic-postgresql
For existing MySQL deployments:
No action required. This PR is fully backward compatible.
For local development (Kind cluster with PostgreSQL):
make -C backend dev-kind-cluster-pg

This PR continues from #12063.

google-oss-prow · 2025-10-19T04:47:34Z

Hi @kaikaila. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

github-actions · 2025-10-19T04:47:45Z

🚫 This command cannot be processed. Only organization members or owners can use the commands.

kaikaila · 2025-10-22T05:10:42Z

Currently, both MySQL and PGX setups use the DB superuser for all KFP operations, which is why client_manager.go contains a “create database if not exists” step here.

From a security standpoint, would it be preferable to:

Move DB creation out of the client manager and into the deployment/init phase (i.e. add a manifests/kustomize/third-party/postgresql/base/pg-init-configmap.yaml) and
Introduce a dedicated restricted user for KFP components, limited to the mlpipeline database?

If the team agrees, I can propose a follow-up PR to refactor accordingly.

HumairAK · 2025-10-22T18:56:27Z

I'm fine with this; I don't think it's great that KFP tries to create a database (or a bucket, frankly).

fyi @mprahl / @droctothorpe

kaikaila · 2025-10-22T19:47:05Z

Thanks, @HumairAK — totally agree on the security point.
Since this PR is already getting quite heavy, would you be okay if I leave the user permission changes for a separate follow-up PR?

HumairAK · 2025-10-23T14:08:31Z

yes that is fine

kaikaila · 2025-10-24T03:05:32Z

Question about the PostgreSQL test workflow organization

Current situation

The V2 integration tests for PostgreSQL logically belong in a "PostgreSQL counterpart" to legacy-v2-api-integration-tests.yml
However, I didn't want to create a new workflow with "legacy" in the name from day one.
As a temporary solution, I merged them into api-server-test-Postgres.yml
This causes asymmetry with api-server-tests.yml and the workflow has mixed responsibilities.

Question: What's the recommended workflow organization for PostgreSQL tests?

Should I:

a. Create legacy-v2-api-integration-tests-postgres.yml for consistency (even though it's new)?
b. Keep current structure and accept the asymmetry?
c. Refactor both MySQL and PostgreSQL to a unified structure?

Would love guidance on the long-term vision for test workflow organization, especially from @nsingla

nsingla · 2025-11-03T15:56:42Z

I actually would like to get rid of these legacy tests asap, but there are still a few tests that need to be migrated first, so my suggestion is to not add more work to the legacy workflows.
Rather, can we add "database" as a workflow parameter, similar to "pipeline_store", and run tests against mysql as well as postgres in the same workflow?

nsingla

One big difference between MySQL and Postgres is that MySQL is case-insensitive (with its default collations) whereas Postgres is case-sensitive, so can we add test cases to verify searches with case sensitivity and insensitivity? We can probably implement these test cases in https://github.com/kubeflow/pipelines/blob/master/backend/test/v2/api/pipeline_api_test.go#L125, filtering by a name containing an uppercase letter.

…iveExperiment
- Extract repeated subquery SQL into resourceReferenceSubquery variable
- Unify code style: consistently use SetMap() throughout
- Add detailed comments explaining PostgreSQL $N placeholder handling
- Simplify error messages per sanchesoon's suggestion
Signed-off-by: kaikaila <[email protected]>

1. dialect.go: use fmt.Sprintf for SQL strings
2. Merge review diff from Humair:
   - *_store.go: replace SQL string concatenation with fmt.Sprintf
   - reuse escapeSQLString from dialect.go
   - replace == with errors.Is()
   - replace t.Errorf with require or assert; replace t.Fatalf with require.FailNow in integration tests
   - hardcode expectations
   - rename QuoteFunction to QualifyIdentifier in storage package
   - unit tests for qualifyIdentifier
   - parameterized timeout as a constant
   - make target dev-kind-cluster with DB parameter
   - revert dev-kind-cluster bridge to 172.17.0.1 for Linux
   - clean redundant postgres config in configmap
   - add database parameter to api-server-test (standalone only)
   - add CI job for postgres with argo 3.6.7
   - change database "" to "mysql"
Signed-off-by: kaikaila <[email protected]>

google-oss-prow bot added the do-not-merge/work-in-progress label Oct 19, 2025

google-oss-prow bot requested review from HumairAK, gmfrasca, hbelmiro and mprahl October 19, 2025 04:47

google-oss-prow bot added the needs-ok-to-test label Oct 19, 2025

google-oss-prow bot added the size/XXL label Oct 19, 2025

kaikaila force-pushed the feature/postgres-integration branch 7 times, most recently from cd1d08b to 85498ed October 22, 2025 05:03

kaikaila force-pushed the feature/postgres-integration branch 3 times, most recently from 09fd370 to 1e0caa8 October 23, 2025 07:10

kaikaila force-pushed the feature/postgres-integration branch 6 times, most recently from 4d33821 to e6c943c October 24, 2025 02:47

kaikaila force-pushed the feature/postgres-integration branch 2 times, most recently from e038969 to c81e91a November 3, 2025 09:20

nsingla reviewed Nov 3, 2025

kaikaila force-pushed the feature/postgres-integration branch 4 times, most recently from eb68f89 to 1ded0c7 November 7, 2025 03:29

kaikaila changed the title ~~feat(backend): postgres integration~~ [wip]feat(backend): postgres integration Nov 7, 2025

google-oss-prow bot added the do-not-merge/work-in-progress label Nov 7, 2025

kaikaila force-pushed the feature/postgres-integration branch 2 times, most recently from 6fe55fe to a6cce78 November 13, 2025 00:18

kaikaila force-pushed the feature/postgres-integration branch 5 times, most recently from e318f11 to afe5e6a November 19, 2025 04:40

kaikaila added 11 commits November 20, 2025 01:22

refactor(backend): integrate postgresql

5b00c13

Signed-off-by: kaikaila <[email protected]>

ci(pg): add PostgreSQL CI workflows and manifests

5311971

Signed-off-by: kaikaila <[email protected]>

agents.md and readme.md

141134b

Signed-off-by: kaikaila <[email protected]>

add unit test for placeholder numbering

a6928c2

Signed-off-by: kaikaila <[email protected]>

update api-server-test to include postgres

24b6076

Signed-off-by: kaikaila <[email protected]>

document for kind-cluster-agnostic

86cf32d

Signed-off-by: kaikaila <[email protected]>

test(integration): use consistent naming for test files to ensure dat…

266d8b5

…abase-agnostic sorting Signed-off-by: kaikaila <[email protected]>

fix(test): handle binary files in pipeline spec replacement and use f…

ffdc994

…irst yaml file for smoke tests Signed-off-by: kaikaila <[email protected]>

fix branch path

129ae6a

Signed-off-by: kaikaila <[email protected]>

kaikaila force-pushed the feature/postgres-integration branch from af3a6cd to 129ae6a November 20, 2025 09:23

fixup first commit

00dda6a

Signed-off-by: kaikaila <[email protected]>