Update compatibility bounds, fixed tolerances and issues in quadratic regularization #194
Conversation
I can't run the GPU tests locally but wanted to see if this fixed the CI.

ping @devmotion if you can allow CI :)
I was about to fix the tolerance in the simple GPU quadratic regularization, but this might actually be an algorithmic issue more than a numerical one: both the CPU and GPU versions issue the warning that the semi-smooth Newton didn't converge. This also occurs if I increase the maxiter to 1000, which is already too high for a Newton-type method.
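For context, a minimal sketch of reproducing this with a larger iteration budget. The problem setup is made up, and the assumption that `quadreg` forwards a `maxiter` keyword to the semi-smooth Newton solver should be checked against the docstring:

```julia
using OptimalTransport
using Distances

# Made-up problem setup (not the actual test data).
M, N = 250, 200
μ = fill(1 / M, M)   # uniform source marginal
ν = fill(1 / N, N)   # uniform target marginal
C = pairwise(SqEuclidean(), rand(1, M), rand(1, N); dims=2)

ε = 0.25  # quadratic regularization parameter

# Assumption: `maxiter` is forwarded to the semi-smooth Newton solver.
# Even with a generous budget the convergence warning persists, which
# points to a stall rather than merely slow convergence.
γ = quadreg(μ, ν, C, ε; maxiter=1000)
```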
TIL sources in Project.toml 😮
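For anyone else discovering this: Pkg (Julia 1.11+) supports a `[sources]` section in Project.toml for pinning a dependency to a URL/revision or a local path during development. A hypothetical example, package names made up:

```toml
[sources]
SomeDep = {url = "https://github.com/someuser/SomeDep.jl", rev = "main"}
LocalDep = {path = "../LocalDep"}
```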
The semi-smooth Newton is stalling at this value:

alright, slowly getting there
```diff
  G[(M + 1):end, 1:M] .= σ'
- # G[diagind(G)] .+= δ # regularise cg
- G += δ * I
+ view(G, diagind(G)) .+= δ # regularise cg
```
@matbesancon this shouldn't matter but here G wasn't updated in-place.
So G wasn't regularized at all in the current master then?
It was; it's only used in this function here, so it doesn't matter whether it's updated in-place or not. It just unintentionally caused additional allocations in every iteration, which one would like to get rid of completely by caching G upfront.
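To spell out the distinction (a standalone sketch, not the package's code): `G += δ * I` is sugar for `G = G + δ * I`, which allocates a new matrix and rebinds the name, while the view-based form mutates the existing array:

```julia
using LinearAlgebra

δ = 1e-5
G = zeros(3, 3)
H = G                      # H and G are the same array

G += δ * I                 # allocates G + δ*I and rebinds the name G;
@assert H[1, 1] == 0       # the buffer H still points to is untouched

G = zeros(3, 3); H = G
view(G, diagind(G)) .+= δ  # broadcasts into the diagonal in place
@assert H[1, 1] == δ       # H sees the change: same array, no new matrix
```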
still no convergence of the semi-smooth Newton in the GPU example but well, might just be inherent to the method? 🤷

Yeah, I'm not completely sure. My impression is that the tests are quite bad - other methods also show convergence warnings, and even POT fails/warns on some test examples.
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##           master     #194      +/-   ##
==========================================
- Coverage   95.30%   94.28%   -1.02%
==========================================
  Files          14       14
  Lines         681      683       +2
==========================================
- Hits          649      644       -5
- Misses         32       39       +7
```
I'm a bit puzzled that this is still not converging (from the buildkite logs); I got it to work fine locally with the latest version.

But at least CI is green and we can update to the latest packages 🥳
Did you run the GPU test locally (without GPUs)? Or the non-GPU test? The GPU and non-GPU tests are different, e.g., the former uses random histograms whereas the latter uses uniform marginal measures, and they use different regularization parameters.
I ran the CPU part of the GPU test. Since two warnings are produced, it means the algorithm doesn't converge, regardless of the hardware it runs on.
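A rough sketch of the difference described above; the size and variable names are made up:

```julia
M = 100

# non-GPU test style: uniform marginal measures
μ_uniform = fill(1 / M, M)

# GPU test style: random histograms, normalized to sum to one
w = rand(Float32, M)
μ_random = w ./ sum(w)
```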
I just checked out the PR locally. I don't see any convergence issues in the GPU tests locally; everything works and converges as expected 😕 Only if I decrease the regularization parameter to very small values (e.g. 1f-5) do I see convergence issues.
I would be surprised if buildkite is caching anything. For now we can merge this and see later whether this is still an issue?
No, I'm not worried about caches; most likely it's just due to the different hardware architecture (I'm running it on an Apple M2). But it shows that most likely there isn't a general problem with the algorithm and/or the test; at most it's a slightly brittle test.

I went through the tests more generally and fixed and improved a few things. I think generally the regularization parameters were unnecessarily small (which caused convergence failures in a few cases, both in OptimalTransport and POT). Additionally, some comparisons with POT were failing due to a too high threshold/tolerance in POT. Moreover, the comparisons of the unbalanced Sinkhorn algorithm with POT were actually failing since POT by default uses a different regularization; so instead of adjusting the tolerances, it was sufficient to tell POT to also use entropy regularization.
@devmotion I think that's good to go?
Merged af842b5 into JuliaOptimalTransport:master