Conversation

Contributor

@zhuxr11 zhuxr11 commented Oct 12, 2025

Close #4560

This PR implements the PCA sign-flip algorithm so that the entry with the maximum absolute value in each row of components is always positive, leaving trans_input unchanged, just like sklearn.decomposition.PCA.

In the _mg version of PCA, components is not chunked, so the _mg version can reuse the sign flipping from the single-GPU version (by setting stream = streams[0]). After this PR, cuml/decomposition/sign_flip_mg.hpp and cpp/src/pca/sign_flip_mg.cu are no longer in use (but the files are not removed).
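The target convention can be sketched in NumPy (the function name here is hypothetical; the actual implementation lives in CUDA C++):

```python
import numpy as np

def sign_flip_components(components):
    # For each row, find the index of the entry with maximum absolute value.
    max_abs_idx = np.argmax(np.abs(components), axis=1)
    rows = np.arange(components.shape[0])
    # Sign of that entry; treat an exact zero as positive.
    signs = np.sign(components[rows, max_abs_idx])
    signs[signs == 0] = 1.0
    # Multiply each row by its sign so the max-abs entry becomes positive.
    return components * signs[:, None]
```

Because each row is multiplied by ±1, the spanned subspace and explained variance are unchanged; only the sign convention is normalized.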

@zhuxr11 zhuxr11 requested review from a team as code owners October 12, 2025 05:57
@zhuxr11 zhuxr11 requested a review from dantegd October 12, 2025 05:57

copy-pr-bot bot commented Oct 12, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


Contributor

@lowener lowener left a comment

This looks pretty good, thanks for the PR.
Can you also modify the assertion in test_pca.py to see if we can re-establish with_sign=True in the comparison tests?
https://github.com/rapidsai/cuml/blob/branch-25.12/python/cuml/tests/test_pca.py#L187

@csadorf csadorf added the improvement, non-breaking, and bug labels and removed the improvement label on Oct 20, 2025
Contributor Author

zhuxr11 commented Oct 22, 2025

This looks pretty good, thanks for the PR. Can you also modify the assertion in test_pca.py to see if we can re-establish with_sign=True in the comparison tests? https://github.com/rapidsai/cuml/blob/branch-25.12/python/cuml/tests/test_pca.py#L187

Ok, working on it.

Contributor Author

zhuxr11 commented Oct 22, 2025

This looks pretty good, thanks for the PR. Can you also modify the assertion in test_pca.py to see if we can re-establish with_sign=True in the comparison tests? https://github.com/rapidsai/cuml/blob/branch-25.12/python/cuml/tests/test_pca.py#L187

Done. Now the tests pass with with_sign=True on my computer.

@zhuxr11 zhuxr11 requested a review from lowener October 22, 2025 16:44
@divyegala
Member

/ok to test 711ac27

raft::handle_t handle{stream};
raft::linalg::map_offset(
handle,
raft::make_device_matrix_view<math_t, std::size_t>(components, n_rows, n_cols),
Member

Could you please use the device_matrix_view based API above in raft::linalg::reduce as well?

Contributor Author

Could you please use the device_matrix_view based API above in raft::linalg::reduce as well?

Resolved in #4992cef

Contributor Author

Could you please use the device_matrix_view based API above in raft::linalg::reduce as well?

In #43660b8, I found a possible bug in the device_matrix_view overload of raft::linalg::reduce and filed an issue. For now, I added raft::linalg::reduce2 to compute the max-absolute-value entry per row, listed here:

https://github.com/zhuxr11/cuml/blob/43660b8e6b45b32acb566f087d50bf929bf9bc33/cpp/src/tsvd/tsvd.cuh#L46-L92

This should fix the error in computing max_vals in #4992cef.

Contributor

Looks like the bug was resolved?

Contributor

@divyegala do you want this to be changed prior to merge?

@divyegala
Member

@zhuxr11 hi, the check-style CI job is failing. Please run pre-commit to automatically fix.


@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR corrects PCA's sign-flip algorithm to match scikit-learn's behavior: ensuring the maximum absolute value in each component row is positive, while leaving transformed data unchanged. Previously, cuML flipped signs on transformed outputs, causing discrepancies with sklearn. The fix introduces signFlipComponents in cpp/src/tsvd/tsvd.cuh, calls it after computing components in both single-GPU (pca.cuh) and multi-GPU (pca_mg.cu) code paths, and removes incorrect sign flipping from transformed data. Tests now enforce exact sign matching (with_sign=True) instead of ignoring signs. The multi-GPU version reuses the single-GPU implementation because components are not partitioned.

Potential Issues

Critical: Non-deterministic sign behavior in reduction lambda (cpp/src/tsvd/tsvd.cuh:155-159)
The reduction lambda returns the element with max absolute value by comparing abs_a > abs_b ? a : b. When multiple elements share the same max absolute value with opposite signs (e.g., [3.0, -3.0]), which element is selected depends on evaluation order. This creates non-deterministic signs across runs.
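The tie case flagged above can be reproduced on the CPU. This sketch emulates the pairwise reduction `abs_a > abs_b ? a : b` under two different evaluation orders (names are illustrative, not the PR's code):

```python
from functools import reduce

def maxabs_pick(values, order):
    # Emulate the reduction lambda: keep a if |a| > |b|, else keep b.
    # On a tie in absolute value, the second operand wins, so the result
    # depends on the order in which the reduction visits the elements.
    return reduce(lambda a, b: a if abs(a) > abs(b) else b,
                  (values[i] for i in order))

row = [3.0, -3.0]
left_to_right = maxabs_pick(row, [0, 1])  # tie resolved to -3.0
right_to_left = maxabs_pick(row, [1, 0])  # tie resolved to 3.0
```

A parallel GPU reduction has no fixed visitation order, which is why the selected sign can vary across runs; breaking ties by column index would make it deterministic.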

Critical: Inconsistent test expectation (python/cuml/tests/test_pca.py:81)
Line 81 explicitly sets with_sign=False for components_ comparisons: with_sign = False if attr in ["components_"] else True. This contradicts the PR's goal of matching sklearn's sign convention and will allow incorrect sign flipping to pass undetected. This appears to be leftover from the old implementation.

High: Pointer aliasing in map_offset lambda (cpp/src/tsvd/tsvd.cuh:167-170)
The lambda captures both the raw components pointer and a device_matrix_view wrapping the same data. Reading from max_vals[row] and writing to components[idx] creates potential aliasing issues, though CUDA's memory model may make this safe in practice.

Medium: Handle construction from stream only (cpp/src/tsvd/tsvd.cuh:163)
Constructing a new raft::handle_t from just a stream (raft::handle_t handle{stream}) may lose synchronization context or resource manager state from the original handle passed to parent functions.

Low: Matrix dimension semantics unclear (cpp/src/tsvd/tsvd.cuh:148-149)
The code passes n_rows, n_cols to reduce, but whether this matches the actual component matrix layout (column-major vs row-major) should be verified against the rest of the PCA/TSVD implementation.

Confidence: 3/5

The core concept is sound, but the reduction lambda's non-determinism and the test inconsistency at line 81 could cause intermittent failures or mask bugs. Verify the reduction logic handles ties correctly and update the test to enforce with_sign=True for components.

Additional Comments (1)

  1. python/cuml/tests/test_pca.py, line 81 (link)

    logic: test assertion was changed from with_sign=False to with_sign=True but line 81 still sets it to False - this contradicts the PR's intent to enable sign checking

4 files reviewed, 3 comments


The behavior in standard cuML is no longer dependent on the installed scikit-learn version. For cuml.accel we adjust behavior based on the emulated scikit-learn version.
@csadorf
Contributor

csadorf commented Nov 7, 2025

/ok to test 40e4746

@jcrist
Member

jcrist commented Nov 7, 2025

/ok to test c444788

# Exposed to support sklearn's `get_feature_names_out`
return self.components_.shape[0]

def _flip_sign_u_based(self, components, X):
Contributor Author

@zhuxr11 zhuxr11 Nov 8, 2025

This logic is implemented in C++, so you may not need to re-implement it in Python, except for sparse matrices, where the SVD is computed via CuPy. Here, you can control whether to flip based on U or V (components) by setting flip_signs_based_on_U in pcaFit().
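For context, the two conventions (flipping based on U columns versus V rows) can be sketched in NumPy, along the lines of sklearn.utils.extmath.svd_flip; this is an illustration of the idea, not the cuML C++ code:

```python
import numpy as np

def svd_flip_sketch(u, vt, u_based=True):
    if u_based:
        # Sign of the max-abs entry in each column of U.
        idx = np.argmax(np.abs(u), axis=0)
        signs = np.sign(u[idx, np.arange(u.shape[1])])
    else:
        # Sign of the max-abs entry in each row of Vt.
        idx = np.argmax(np.abs(vt), axis=1)
        signs = np.sign(vt[np.arange(vt.shape[0]), idx])
    signs[signs == 0] = 1.0
    # Flipping matching U columns and Vt rows leaves U @ diag(S) @ Vt unchanged.
    return u * signs, vt * signs[:, None]
```

Either choice yields a deterministic sign convention; the two differ only in which factor anchors the decision, and flipping both factors together preserves the reconstruction.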

@csadorf
Contributor

csadorf commented Nov 13, 2025

/ok to test afb6c2c

@csadorf
Contributor

csadorf commented Nov 14, 2025

/ok to test ed7a860

Contributor

@csadorf csadorf left a comment

This is good to go IMO, assuming that tests pass.

@csadorf
Contributor

csadorf commented Nov 14, 2025

/ok to test 1916077

@csadorf
Contributor

csadorf commented Nov 14, 2025

/ok to test 23d3801

@csadorf
Contributor

csadorf commented Nov 14, 2025

/merge

@rapids-bot rapids-bot bot merged commit ccf23dc into rapidsai:main Nov 14, 2025
106 checks passed
@csadorf
Contributor

csadorf commented Nov 14, 2025

@zhuxr11 Thanks a lot for pulling through on this one! This has been a long-standing issue. Very much appreciated!!

@zhuxr11
Contributor Author

zhuxr11 commented Nov 14, 2025

@zhuxr11 Thanks a lot for pulling through on this one! This has been a long-standing issue. Very much appreciated!!

Thanks for all who have helped me along the way. Cheers!

rapids-bot bot pushed a commit that referenced this pull request Nov 17, 2025
The sign normalization behavior in `PCA`/`TruncatedSVD` now matches that of sklearn (#7331), we no longer need this callout in the `cuml.accel` limitations docs page.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: #7492
rapids-bot bot pushed a commit that referenced this pull request Dec 3, 2025
- Don't branch the sign flipping behavior based on the version of sklearn installed. This somehow slipped through in #7331. We always want `cuml` behavior to be the same regardless of sklearn version - the only thing we branch on is the testing where we don't assert sign matches for sklearn < 1.5 (this matches the single-gpu testing strategy as well).
- Adds a sync point in multi-gpu PCA before calling `signFlipComponents`. The multi-gpu implementation makes use of multiple streams, but before only the first stream was passed to `signFlipComponents` (without any sync beforehand) leading to potential stream ordering issues. It's hard to prove a negative, but with this change I can no longer reproduce an issue reported in `rapids_singlecell`. A similar fix isn't needed for `TruncatedSVD` since that implementation only uses one stream.

Authors:
  - Jim Crist-Harif (https://github.com/jcrist)

Approvers:
  - Simon Adorf (https://github.com/csadorf)

URL: #7560

Labels

bug Something isn't working CUDA/C++ Cython / Python Cython or Python issue non-breaking Non-breaking change

Development

Successfully merging this pull request may close these issues.

[BUG] PCA / TSVD sign flipping is unused and not correct

5 participants