SpectralClustering estimator #7372

aamijar · 2025-10-21T23:12:02Z

Resolves #7071

This branch should be retargeted to 25.12

This PR introduces the SpectralClustering estimator, with Python bindings similar to scikit-learn's. The spectral clustering implementation from cuVS is called under the hood.

Here is a plot comparison of sklearn vs cuml.

copy-pr-bot · 2025-10-21T23:12:06Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

csadorf · 2025-11-24T19:45:12Z

python/cuml/cuml/cluster/spectral_clustering.pyx

+    eigen_tol : float, default=0.0
+        Tolerance for the eigensolver. 0.0 uses the default solver tolerance.


Please use default: "auto"

Addressed in d8aeb06

csadorf · 2025-11-24T19:46:57Z

python/cuml/cuml/cluster/spectral_clustering.pyx

+    affinity : {'nearest_neighbors', 'precomputed'}, default='precomputed'
+        How to construct the affinity matrix.
+         - 'nearest_neighbors' : construct the affinity matrix by computing a
+           graph of nearest neighbors from the input data.
+         - 'precomputed' : interpret X as a precomputed affinity matrix,
+           where larger values indicate greater similarity between instances.


I think the default for the affinity parameter should be nearest_neighbors. That matches the typical user expectation that no additional computation is needed before training an estimator, and it more closely matches the scikit-learn API.

That would also be in line with the estimator docs:

When calling ``fit``, an affinity matrix is constructed using a k-nearest neighbors connectivity matrix. Alternatively, a user-provided affinity matrix can be specified by setting ``affinity='precomputed'``.

Good catch! Changing the default to nearest_neighbors fixed the test_input_estimators CI failure. Addressed in 99f7268

csadorf · 2025-11-24T19:58:09Z

python/cuml/tests/test_spectral_clustering.py

+        sk_spectral = skSpectralClustering(
+            n_clusters=n_clusters,
+            affinity="precomputed",
+            random_state=42,
+        )
+        y_sklearn = sk_spectral.fit_predict(knn_graph)
+
+        cuml_spectral = SpectralClustering(
+            n_clusters=n_clusters,
+            affinity="precomputed",
+            random_state=42,
+        )


I'd suggest we place shared parameters into a params dict. That way it's a bit easier to see where sklearn's and cuml's APIs differ.

Good idea, also removes repeated code. Addressed in 78e74e6
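A minimal sketch of the shared-parameter pattern suggested above. The two constructor functions are hypothetical stand-ins, not the real sklearn or cuml classes, so the snippet runs without either library. The extra `n_neighbors` argument is only there to illustrate a library-specific deviation.

```python
# Hypothetical stand-ins for the two estimator constructors; they only
# record the keyword arguments they receive.
def make_sklearn_estimator(**kwargs):
    return {"library": "sklearn", **kwargs}


def make_cuml_estimator(**kwargs):
    return {"library": "cuml", **kwargs}


# Parameters that both estimators receive unchanged.
shared_params = {
    "n_clusters": 3,
    "affinity": "precomputed",
    "random_state": 42,
}

sk_spectral = make_sklearn_estimator(**shared_params)
# Any library-specific argument is now visible at the call site.
cuml_spectral = make_cuml_estimator(**shared_params, n_neighbors=10)

# Keys present on only one side are the API differences.
differences = set(cuml_spectral) ^ set(sk_spectral)
print(differences)
```

With the shared arguments factored out, anything passed to only one constructor stands out immediately when reading the test.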

csadorf · 2025-11-24T19:59:21Z

python/cuml/tests/test_spectral_clustering.py

+    knn_graph = kneighbors_graph(
+        X_np,
+        n_neighbors=30,
+        mode="connectivity",
+        include_self=True,
+    )
+    knn_graph = 0.5 * (knn_graph + knn_graph.T)


Wouldn't it be much easier to run these tests with affinity="nearest_neighbors"? Or would the output_type handling somehow be affected by the affinity mode? If that's the case, then we should run this for both affinity="nearest_neighbors" and affinity="precomputed".

Addressed in 309e006
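For context on the quoted test code: a k-nearest-neighbors relation is in general not symmetric, so the test averages the graph with its transpose before passing it as a precomputed affinity. The sketch below reproduces that step with NumPy only. `knn_connectivity` is a hypothetical dense stand-in for `kneighbors_graph(..., mode="connectivity", include_self=True)`, written here just for illustration.

```python
import numpy as np


def knn_connectivity(X, n_neighbors):
    """Dense 0/1 matrix: entry (i, j) is 1 if j is among the
    n_neighbors nearest points to i (the point itself included)."""
    # Pairwise squared Euclidean distances.
    diffs = X[:, None, :] - X[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diffs, diffs)
    # Column indices of the n_neighbors smallest distances in each row.
    idx = np.argsort(dist2, axis=1, kind="stable")[:, :n_neighbors]
    graph = np.zeros_like(dist2)
    np.put_along_axis(graph, idx, 1.0, axis=1)
    return graph


rng = np.random.default_rng(0)
X = rng.random((20, 3))

knn_graph = knn_connectivity(X, n_neighbors=5)
# Same symmetrization as in the test: mutual neighbors get weight 1.0,
# one-directional neighbors get weight 0.5.
affinity = 0.5 * (knn_graph + knn_graph.T)
```

The result is symmetric by construction, which is what a precomputed affinity matrix is expected to be.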

csadorf · 2025-11-24T20:15:57Z

python/cuml/cuml/cluster/spectral_clustering.pyx

+    cdef int affinity_nnz = 0
+
+    if affinity == "nearest_neighbors":
+        from cuml.internals.input_utils import input_to_cupy_array


No need for a delayed import IMO.

Addressed in 05dce17

csadorf · 2025-11-24T20:21:48Z

python/cuml/cuml/cluster/spectral_clustering.pyx

+    >>> from sklearn.neighbors import kneighbors_graph
+    >>> from cuml.cluster import spectral_clustering
+    >>> X = cp.random.rand(100, 10, dtype=cp.float32)
+    >>> A = kneighbors_graph(cp.asnumpy(X), n_neighbors=10, include_self=True)


I'd recommend having an example that does not require scikit-learn and that perhaps avoids the need for a precomputed graph.

Addressed in 86a3cb1

csadorf · 2025-11-24T20:23:29Z

python/cuml/cuml/cluster/spectral_clustering.pyx

+            if A.dtype != np.float32:
+                A = A.astype("float32")


I'd recommend issuing a UserWarning when we trigger the conversion. Users can avoid it by converting prior to the call.

Addressed in 30db43a

I think we need to issue that warning before the if/else block, since we might be coercing on lines 171 and 173 as well.
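A minimal sketch of the warn-on-conversion behavior discussed here, using NumPy only. The helper name `as_float32` and the warning text are assumptions for illustration, not what the PR actually emits. The check runs once up front, so it applies regardless of which later branch handles the array.

```python
import warnings

import numpy as np


def as_float32(A):
    """Return A as float32, warning if a dtype conversion is triggered."""
    A = np.asarray(A)
    if A.dtype != np.float32:
        warnings.warn(
            f"Input has dtype {A.dtype}; converting to float32. "
            "Convert before the call to avoid this copy.",
            UserWarning,
            stacklevel=2,
        )
        A = A.astype(np.float32)
    return A


# Record warnings so the behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    converted = as_float32(np.eye(3, dtype=np.float64))  # warns once
    untouched = as_float32(np.eye(3, dtype=np.float32))  # no warning
```

Callers that pass float32 data never see the warning, which matches the suggestion that users can avoid it by converting beforehand.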

aamijar · 2025-11-25T06:58:19Z

/ok to test

copy-pr-bot · 2025-11-25T06:58:21Z

/ok to test

@aamijar, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

aamijar · 2025-11-25T06:59:08Z

/ok to test 78e89dd

csadorf · 2025-11-25T17:42:29Z

/ok to test 3b07729

csadorf · 2025-11-25T15:11:01Z

python/cuml/cuml/cluster/spectral_clustering.pyx

+            if A.dtype != np.float32:
+                A = A.astype("float32")


I think we need to issue that warning before the if/else block, since we might be coercing on lines 171 and 173 as well.

csadorf · 2025-11-25T16:14:28Z

python/cuml/cuml/cluster/spectral_clustering.pyx

+    labels : cupy.ndarray of shape (n_samples,)
+        Cluster labels for each sample.


Is this actually true? Or are we returning numpy arrays etc. if X is provided as a numpy array?

Addressed in 241e2ff

What about the return value of the spectral_clustering() function?

Addressed in 3c9316a

jinsolp

Thanks @aamijar, the C++ side looks good! Small comments on the Python side.

aamijar · 2025-11-25T21:51:40Z

/ok to test

copy-pr-bot · 2025-11-25T21:51:45Z

/ok to test

@aamijar, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/

aamijar · 2025-11-25T21:52:10Z

/ok to test 9a2818e

aamijar · 2025-11-25T23:42:35Z

/ok to test f74949d

aamijar · 2025-11-26T02:23:32Z

/ok to test f74949d

csadorf · 2025-11-26T03:39:04Z

/ok to test 8f69c01

csadorf

LGTM! Nice work!

jinsolp

Thanks @aamijar 👍

csadorf · 2025-11-26T15:35:09Z

/merge

github-actions bot assigned aamijar Oct 21, 2025

github-actions bot added Cython / Python Cython or Python issue CMake CUDA/C++ labels Oct 21, 2025

aamijar added algo: spectral-embedding non-breaking Non-breaking change feature request New feature or request labels Oct 21, 2025

aamijar marked this pull request as ready for review November 21, 2025 10:42

aamijar requested review from a team as code owners November 21, 2025 10:42

aamijar requested review from KyleFromNVIDIA, betatim and jinsolp and removed request for KyleFromNVIDIA, betatim and jinsolp November 21, 2025 10:42

csadorf requested changes Nov 24, 2025

aamijar and others added 11 commits November 25, 2025 06:31

SpectralClustering estimator

602ada2

rng_state instead of seed

287a009

Update to 26.02 (rapidsai#7493)

81c54b4

This PR updates the repository to version 26.02. This is part of the 25.12 release burndown process.

Revert "Forward-merge release/25.12 into main" (rapidsai#7495)

a6d4721

Reverts rapidsai#7494

refactor

65bba07

remove unused

6db0615

input dataset api

37a5077

add pytests

f2c8d51

refactor

1f4a8e0

add docstrings

8600479

docs and readme

340bf45

aamijar removed request for a team and gforsyth November 25, 2025 06:57

aamijar added the New Algorithm For tracking new algorithms that will be added to our existing collection label Nov 25, 2025

csadorf added 2 commits November 25, 2025 11:39

Improve the hypothesis property-based testing.

e1a3d50

Add test for spectral clustering convergence failure

3b07729

csadorf requested changes Nov 25, 2025

jinsolp requested changes Nov 25, 2025

aamijar and others added 5 commits November 25, 2025 21:34

add notes docstring

4c8e4ce

Update python/cuml/cuml/cluster/spectral_clustering.pyx

ed594f3

Co-authored-by: Simon Adorf

Update python/cuml/cuml/cluster/spectral_clustering.pyx

bfbd5bb

Co-authored-by: Simon Adorf

refactor

241e2ff

rename

9a2818e

aamijar and others added 3 commits November 25, 2025 22:56

return type docstring

3c9316a

compare y_sklearn and y_cuml score

222a029

Merge branch 'release/25.12' into spectral-clustering

f74949d

Raise a nicer exception on conversion issues.

8f69c01

csadorf approved these changes Nov 26, 2025

jinsolp approved these changes Nov 26, 2025

rapids-bot bot merged commit 63e26e7 into rapidsai:release/25.12 Nov 26, 2025
107 checks passed
