Add lazy EMD solver with O(n) memory requirement#788
Conversation
- Implement emd_c_lazy in C++ network simplex for memory-efficient OT - Add lazy mode to emd2() accepting coordinates (X_a, X_b) instead of cost matrix - Support sqeuclidean, euclidean, and cityblock metrics - Add __restrict__ for SIMD optimization - Remove debug output from network_simplex_simple.h - Add tests for lazy solver and metric variants
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #788 +/- ##
===========================================
- Coverage 97.07% 84.97% -12.10%
===========================================
Files 107 107
Lines 22156 22249 +93
===========================================
- Hits 21507 18906 -2601
- Misses 649 3343 +2694 🚀 New features to boost your workflow:
|
rflamary
left a comment
There was a problem hiding this comment.
Nearly there, this is a wonderful PR that will allows solvers on very large data (at the cost of computing time)
| metric="sqeuclidean", | ||
| numItermax=100000, | ||
| log=False, | ||
| return_matrix=False, |
There was a problem hiding this comment.
since it is sparse, it could be true by default
There was a problem hiding this comment.
The solver here is only for dense ? So not sure why it should be true by defautl
@nathanneike Could you detail where exactly this vectorization is applied? Is it in the wrapper? |
|
@rflamary I think the implementation has a small bug that prevents one from reaping the memory savings. When Edit: It is not just a matter of removing the |
I did not add explicit SIMD. on my local optimized build, the compiler auto-vectorizes parts of those inner loops, I think that is quite common on most modern compilers. Hope that makes sense :) |
I understand. I'm just trying to see in which part exactly this is important, since the algorithm is not amenable to vectorization. Or do you mean the initial data copying in the wrapper? |
The part I had in mind is the on-the-fly distance computation in the C++ lazy-cost path, specifically the inner loop over coordinates in. Maybe that wasn't clear from my description. |
Hey, yes thats a really good point I guess we are saving on the matrix cost D but the graph representation itself is still not O(n + m). That would require a bigger refactoring. |
Add lazy EMD solver with on-the-fly distance computation
Types of changes
Motivation and context / Related issue
Addresses memory limitations when computing OT with large point clouds. Instead of pre-computing and storing the full n×m cost matrix, the lazy solver computes distances on-the-fly during the network simplex algorithm. This reduces memory from O(nm) to O(n+m) while maintaining exact EMD solutions.
How has this been tested
test_solvers.py:test_solve_sample_lazyandtest_solve_sample_lazy_emdPR checklist