Skip to content

Add lazy EMD solver with O(n) memory requirement#788

Merged
rflamary merged 8 commits into
PythonOT:masterfrom
nathanneike:dense_emd_lazy_solver
Jan 22, 2026
Merged

Add lazy EMD solver with O(n) memory requirement#788
rflamary merged 8 commits into
PythonOT:masterfrom
nathanneike:dense_emd_lazy_solver

Conversation

@nathanneike

@nathanneike nathanneike commented Jan 19, 2026

Copy link
Copy Markdown
Contributor

Add lazy EMD solver with on-the-fly distance computation

  • Implement emd_c_lazy in C++ network simplex for memory-efficient OT
  • Add lazy mode to emd2() accepting coordinates (X_a, X_b) instead of cost matrix
  • Support sqeuclidean, euclidean, and cityblock metrics
  • Use restrict-style aliasing hints to help compiler optimization in the lazy distance kernel
  • Remove debug output from network_simplex_simple.h
  • Add tests for lazy solver and metric variants

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Motivation and context / Related issue

Addresses memory limitations when computing OT with large point clouds. Instead of pre-computing and storing the full n×m cost matrix, the lazy solver computes distances on-the-fly during the network simplex algorithm. This reduces memory from O(nm) to O(n+m) while maintaining exact EMD solutions.

How has this been tested

  • All existing tests pass (1031 passed, 69 skipped)
  • Added new tests in test_solvers.py: test_solve_sample_lazy and test_solve_sample_lazy_emd
  • Verified correctness against standard EMD solver
  • Tested all three metrics (sqeuclidean, euclidean, cityblock)
  • Confirmed SIMD optimization with compiler output analysis

PR checklist

  • I have read the CONTRIBUTING document.
  • The documentation is up-to-date with the changes I made (check build artifacts).
  • All tests passed, and additional code has been covered with new tests.
  • I have added the PR and Issue fix to the RELEASES.md file.

- Implement emd_c_lazy in C++ network simplex for memory-efficient OT
- Add lazy mode to emd2() accepting coordinates (X_a, X_b) instead of cost matrix
- Support sqeuclidean, euclidean, and cityblock metrics
- Add __restrict__ for SIMD optimization
- Remove debug output from network_simplex_simple.h
- Add tests for lazy solver and metric variants
@rflamary rflamary changed the title Add lazy EMD solver with on-the-fly distance computation Add lazy EMD solver with O(n) memory requirement Jan 20, 2026
Comment thread test/test_solvers.py Outdated
Comment thread ot/lp/_network_simplex.py
@codecov

codecov Bot commented Jan 20, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 91.45299% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.97%. Comparing base (4c49769) to head (58c520c).
⚠️ Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (4c49769) and HEAD (58c520c). Click for more details.

HEAD has 19 uploads less than BASE
Flag BASE (4c49769) HEAD (58c520c)
20 1
Additional details and impacted files
@@             Coverage Diff             @@
##           master     #788       +/-   ##
===========================================
- Coverage   97.07%   84.97%   -12.10%     
===========================================
  Files         107      107               
  Lines       22156    22249       +93     
===========================================
- Hits        21507    18906     -2601     
- Misses        649     3343     +2694     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@rflamary rflamary left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nearly there, this is a wonderful PR that will allows solvers on very large data (at the cost of computing time)

Comment thread ot/lp/_network_simplex.py Outdated
Comment thread ot/lp/_network_simplex.py Outdated
Comment thread ot/lp/_network_simplex.py Outdated
metric="sqeuclidean",
numItermax=100000,
log=False,
return_matrix=False,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since it is sparse, it could be true by default

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solver here is only for dense ? So not sure why it should be true by defautl

Comment thread ot/lp/emd_wrap.pyx Outdated
Comment thread ot/lp/EMD_wrapper.cpp
Comment thread test/test_solvers.py Outdated
@rflamary rflamary merged commit 26f1380 into PythonOT:master Jan 22, 2026
17 of 18 checks passed
@EduardoGRocha

EduardoGRocha commented Mar 18, 2026

Copy link
Copy Markdown
  • Add restrict for SIMD optimization
    Confirmed SIMD optimization with compiler output analysis

@nathanneike Could you detail where exactly this vectorization is applied? Is it in the wrapper?

@EduardoGRocha

EduardoGRocha commented Mar 18, 2026

Copy link
Copy Markdown

@rflamary I think the implementation has a small bug that prevents one from reaping the memory savings. When NetworkSimplexSimple is initialized, reset is called, which leads to resizing the vectors to the number of arcs (m*n)

Edit: It is not just a matter of removing the resize because some of these vectors are still being accessed

@nathanneike

nathanneike commented Mar 19, 2026

Copy link
Copy Markdown
Contributor Author
  • Add restrict for SIMD optimization
    Confirmed SIMD optimization with compiler output analysis

@nathanneike Could you detail where exactly this vectorization is applied? Is it in the wrapper?

I did not add explicit SIMD. on my local optimized build, the compiler auto-vectorizes parts of those inner loops, I think that is quite common on most modern compilers. Hope that makes sense :)

@EduardoGoulart1

Copy link
Copy Markdown

I did not add explicit SIMD. on my local optimized build, the compiler auto-vectorizes parts of those inner loops, I think that is quite common on most modern compilers. Hope that makes sense :)

I understand. I'm just trying to see in which part exactly this is important, since the algorithm is not amenable to vectorization. Or do you mean the initial data copying in the wrapper?

@nathanneike

Copy link
Copy Markdown
Contributor Author

I did not add explicit SIMD. on my local optimized build, the compiler auto-vectorizes parts of those inner loops, I think that is quite common on most modern compilers. Hope that makes sense :)

I understand. I'm just trying to see in which part exactly this is important, since the algorithm is not amenable to vectorization. Or do you mean the initial data copying in the wrapper?

The part I had in mind is the on-the-fly distance computation in the C++ lazy-cost path, specifically the inner loop over coordinates in. Maybe that wasn't clear from my description.

@nathanneike

Copy link
Copy Markdown
Contributor Author

@rflamary I think the implementation has a small bug that prevents one from reaping the memory savings. When NetworkSimplexSimple is initialized, reset is called, which leads to resizing the vectors to the number of arcs (m*n)

Edit: It is not just a matter of removing the resize because some of these vectors are still being accessed

Hey, yes thats a really good point I guess we are saving on the matrix cost D but the graph representation itself is still not O(n + m). That would require a bigger refactoring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants