[WIP] PERF Optimize weighted percentile (from n log n to n) by cakedev0 · Pull Request #32288 · scikit-learn/scikit-learn

cakedev0 · 2025-09-28T12:55:24Z

WIP

Reference Issues/PRs

Follow-up from the proof section of this PR: #32285

What does this implement/fix? Explain your changes.

The algorithm implemented here is essentially the one described in "Find a weighted median for unsorted array in linear time", adapted to handle several quantiles in the same recursive call.

Complexity is O(n) for one quantile, and approaches O(n log n) for many quantiles (in practice, 10 seems to be many already).
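For readers unfamiliar with the approach, here is a minimal single-quantile sketch of weighted selection by recursive partitioning. It is not the code in this PR: the function name, the random pivot choice, and the quantile definition (smallest value whose cumulative weight reaches `q * sum(w)`, with strictly positive weights and `0 < q <= 1`) are all assumptions made for illustration.

```python
import numpy as np


def weighted_quantile_select(x, w, q, rng=None):
    """Hypothetical helper: smallest v in x with sum(w[x <= v]) >= q * sum(w).

    Assumes strictly positive weights and 0 < q <= 1.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    target = q * w.sum()
    while True:
        # Pick a random pivot and split the weights around it.
        pivot = x[rng.integers(x.size)]
        below = x < pivot
        above = x > pivot
        w_below = w[below].sum()
        w_equal = w[x == pivot].sum()
        if target <= w_below:
            # The quantile lies strictly below the pivot.
            x, w = x[below], w[below]
        elif target <= w_below + w_equal or not above.any():
            # The pivot itself reaches the target weight (the second
            # condition guards against floating-point rounding).
            return pivot
        else:
            # Discard everything <= pivot and shift the target accordingly.
            x, w = x[above], w[above]
            target -= w_below + w_equal
```

Each pass discards part of the array, so the expected work is linear for a single quantile. Handling several quantiles in one call means some targets fall on each side of the pivot, which is where the cost drifts toward O(n log n).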

Benchmarks

In numpy:
For one quantile, it's ~3x faster than the unstable-sort version of the current code.
The current code doesn't handle multiple quantiles, so you have to loop over each quantile. That is much slower than my function, which computes the 10 deciles in less time than the current code takes to compute a single quantile.

github-actions · 2025-09-28T12:56:30Z

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here

`cython-lint`

cython-lint detected issues. Please fix them locally and push the changes. Here you can see the detected issues. Note that the installed cython-lint version is cython-lint=0.18.0.

Details


/home/runner/work/scikit-learn/scikit-learn/sklearn/cluster/_hdbscan/_tree.pyx:786:19: unnecessary set + generator (just use a set comprehension)

_{Generated for commit: d59abf5. Link to the linter CI: here}


betatim · 2025-09-30T15:28:47Z

Even in draft mode, could you fill in a few details in the top comment already? In particular referencing the relevant other PRs/issues. I think it helps keep track of things. No full explanation, proof, etc needed.

ogrisel · 2025-10-02T14:54:40Z

@cakedev0 if you want to run the tests on CUDA interactively, you can use https://gist.github.com/EdAbati/ff3bdc06bafeb92452b3740686cc8d7c

ogrisel · 2025-10-02T15:48:09Z

+            x = x[mask_nz]
+        # Recursively compute weighted percentiles using partitioning
+        w_sorted = False
+        if not hasattr(xp, "argpartition"):


Suggested change

-        if not hasattr(xp, "argpartition"):
+        # XXX: update this once argpartition or equivalent is officially part of the
+        # array API spec:
+        # https://github.com/data-apis/array-api/issues/629
+        if not hasattr(xp, "argpartition"):
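To illustrate the kind of fallback this check implies, here is a hedged sketch. The helper name `argpartition_or_argsort` is hypothetical and not taken from the PR; it only assumes that the namespace `xp` exposes `argsort`, and optionally an `argpartition` with NumPy's `(array, kth)` signature.

```python
from types import SimpleNamespace

import numpy as np


def argpartition_or_argsort(xp, x, kth):
    """Hypothetical helper: indices that place the kth smallest element at kth."""
    if hasattr(xp, "argpartition"):
        # Fast path: partial ordering only.
        return xp.argpartition(x, kth)
    # Fallback: a full sort is also a valid (if more expensive) partition.
    return xp.argsort(x)


x = np.array([5.0, 1.0, 4.0, 2.0, 3.0])
# NumPy provides argpartition, so the fast path is taken.
idx_fast = argpartition_or_argsort(np, x, 2)
# A stand-in namespace without argpartition exercises the fallback.
fake_xp = SimpleNamespace(argsort=np.argsort)
idx_slow = argpartition_or_argsort(fake_xp, x, 2)
```

In both cases `x[idx[2]]` is the third-smallest value; only the cost differs.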

About that: see my PR data-apis/array-api-extra#449

(I surely should read carefully all the discussions in the issue you've linked and the related PRs, but that sounds a bit daunting for today 😅)

Once array-api-extra 0.9.1 is merged with data-apis/array-api-extra#449 we need to revendor it in scikit-learn and update this draft PR to leverage it.

Is it worth waiting for that before making this PR ready for review?

If not, I'll finish polishing it this week (TODO: write tests, publish some benchmarks, fix CUDA).

I think the CUDA problem can be fixed independently (see below), but we sure need benchmarks, both with numpy and torch CPU and with torch on CUDA. Ideally both for the current state of this and the future code path with the xpx.argpartition.

> Ideally both for the current state of this and the future code path with the xpx.argpartition

I guess I'll just wait for this future to be the present before doing the benchmarks then. Would that be ok? This means not touching this PR before array-api-extra 0.9.1 is out and revendored.

Or maybe it's still better to move forward here now? As this PR is kinda blocking data-apis/array-api-extra#340 (comment) and the equivalent one in scipy.

I'll let you decide, both options work for me.

Let's benchmark the current state. If this is good, we can proceed with the review now that the tests are green.

ogrisel · 2025-10-09T08:11:50Z

Benchmarks

> For one quantile, it's ~3x faster than the unstable-sort version of the current code.

It would be great to do benchmarks for 1, 5, 10 quantiles for int(1e5), int(1e6), int(1e7) data points with uniform and heavy tailed data and weights, both on numpy CPU and torch CUDA.
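A benchmark grid along these lines could be sketched as follows. This is only an illustration for NumPy on CPU: the function names, the Pareto shape parameter used for the heavy-tailed case, and the sort-based baseline are assumptions, and a torch CUDA run would additionally need device synchronization around the timers.

```python
import time

import numpy as np


def sorted_weighted_quantiles(x, w, qs):
    """Sort-based baseline used here as the function under test (assumption)."""
    order = np.argsort(x)
    cum_w = np.cumsum(w[order])
    idx = np.searchsorted(cum_w, np.asarray(qs) * cum_w[-1], side="left")
    return x[order][np.clip(idx, 0, x.size - 1)]


def make_data(n, kind, rng):
    """Return (values, weights), either uniform or heavy-tailed (Pareto)."""
    if kind == "uniform":
        return rng.uniform(size=n), rng.uniform(size=n)
    return rng.pareto(1.5, size=n), rng.pareto(1.5, size=n)


def run_benchmark(func, sizes=(10**5, 10**6), n_quantiles=(1, 5, 10),
                  repeats=3, seed=0):
    """Best-of-`repeats` wall time for each (kind, n, nq) combination."""
    rng = np.random.default_rng(seed)
    results = {}
    for kind in ("uniform", "heavy_tailed"):
        for n in sizes:
            x, w = make_data(n, kind, rng)
            for nq in n_quantiles:
                # Evenly spaced ranks strictly inside (0, 1).
                qs = np.linspace(0, 1, nq + 2)[1:-1]
                best = float("inf")
                for _ in range(repeats):
                    t0 = time.perf_counter()
                    func(x, w, qs)
                    best = min(best, time.perf_counter() - t0)
                results[(kind, n, nq)] = best
    return results
```

Calling `run_benchmark(sorted_weighted_quantiles)` returns a dict keyed by `(kind, n, nq)`; passing a different `func` with the same `(x, w, qs)` signature allows a side-by-side comparison.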


cakedev0 · 2025-10-13T14:54:53Z

I made some benchmarks with varying shapes (n, d) for the input array and with nq quantiles (1, 3, 9).

For d=1 and nq=1, the gain is clear for numpy (3x) and the loss is very limited for torch.
For nq > 1, the gain is clear in most cases, but significant gains could also be reached by just looping over quantiles inside the function (once the sort is done), and not outside.
For d > 1, and especially d >> 1 (like 100), it's not great. I think it's because of the overhead of looping over the dimensions and making a lot of xp.some_func(...) calls.

Conclusion: maybe let's just go with the simple loop over quantiles inside the current implementation? This would be a clear and easy gain for some of the current use-cases in sklearn.
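The "loop inside the function" idea can be sketched as below: sort once, take one cumulative sum, then look up every requested rank. This is an illustrative 1D NumPy version, not scikit-learn's `_weighted_percentile`; the function name and the quantile definition (smallest value whose cumulative weight reaches `q * sum(w)`) are assumptions.

```python
import numpy as np


def weighted_quantiles_one_sort(x, w, qs):
    """Hypothetical helper: several weighted quantiles from one sort/cumsum."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)          # the only O(n log n) step
    x_sorted = x[order]
    cum_w = np.cumsum(w[order])    # shared by all quantiles
    targets = np.asarray(qs, dtype=float) * cum_w[-1]
    # Each extra quantile costs only one binary search.
    idx = np.searchsorted(cum_w, targets, side="left")
    idx = np.clip(idx, 0, x.size - 1)
    return x_sorted[idx]
```

With this layout, adding quantiles is cheap compared with calling a single-quantile function repeatedly, since the sort and the cumulative sum are not redone.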

The algorithm I propose here might be better suited to a Cython/C/C++ implementation somewhere in numpy and/or scipy. Such an implementation would make the gain significant for any d and nq=1 (or nq=2, 3 too). Since nq=1 is a common use-case, that might be interesting.
I'm not familiar with those ecosystems so any advice/insight will be appreciated!


cakedev0 · 2025-10-13T15:07:23Z

Note: I found a way to mitigate the perf loss with d>>1 (I just pushed it), I'll give proper detailed benchmark results if you think it's still worth pursuing.


cakedev0 · 2025-10-17T16:52:34Z

I give up for now on the O(n) algorithm, so I'm closing this PR.

ogrisel · 2025-10-20T13:49:07Z

> Conclusion: maybe let's just go with the simple loop over quantiles inside the current implementation? This would be a clear and easy gain for some of the current use-cases in sklearn.

I agree, this is a good first step with a net improvement. Once merged, we can always re-explore later and compare against this new, stronger baseline. But maybe it's not worth investing too much effort if this function is not reported as the computational bottleneck of any user's actual workload.

cakedev0 added 5 commits September 28, 2025 14:09

implem done; clean-up & comments todo

b0f8bdf

conform to array-API

2140c82

cleanup

84c0240

comments; docstring; cleanups

7822673

swap functions order for easier diff

c82c75f

github-actions Bot added the module:utils label Sep 28, 2025

cakedev0 added 7 commits September 28, 2025 15:09

use new signature where useful

f6f877c

update docstring for new signature

cad8614

adapt fully to array-API

7f5d47f

fix array API compat

649b271

another array API fix: TypeError: object of type 'Array' has no len()

cda231c

more array-API fixes; tested locally; but I cant test everything I do…

9c4a5ad

…nt have a nvidia-GPU

tmp: old for benchmark

4ea221e

cakedev0 mentioned this pull request Sep 30, 2025

PERF: don't use stable sort in _weighted_percentile #32285

Merged

cakedev0 mentioned this pull request Sep 30, 2025

ENH: Adding partition and argpartition? data-apis/array-api-extra#448

Closed

Merge branch 'main' into optim-weighted-percentile

837c287

ogrisel added the CUDA CI label Oct 2, 2025

github-actions Bot removed the CUDA CI label Oct 2, 2025

ogrisel reviewed Oct 2, 2025

View reviewed changes

ogrisel mentioned this pull request Oct 3, 2025

ENH: add quantile data-apis/array-api-extra#340

Open

ogrisel reviewed Oct 6, 2025

View reviewed changes

Comment thread sklearn/utils/stats.py Outdated

Comment thread sklearn/utils/stats.py Outdated

cakedev0 and others added 4 commits October 9, 2025 12:39

Merge branch 'optim-weighted-percentile' of github.com:cakedev0/sciki…

6194d15

…t-learn into optim-weighted-percentile

Merge remote-tracking branch 'upstream/main' into optim-weighted-perc…

f14eb58

…entile

remove comment about floating dtype

f96d334

Fix device error

013725d

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ogrisel added the CUDA CI label Oct 10, 2025

github-actions Bot removed the CUDA CI label Oct 10, 2025

ogrisel mentioned this pull request Oct 10, 2025

MAINT remove XFAIL marker in test_weighted_percentile_array_api_consistency for array-api-strict #32470

Merged

cakedev0 added 4 commits October 13, 2025 17:00

mitigate perf loss with d>>1

7b9e50b

Merge branch 'optim-weighted-percentile' of github.com:cakedev0/sciki…

1407248

…t-learn into optim-weighted-percentile

Merge remote-tracking branch 'upstream/main' into optim-weighted-perc…

4a8b9df

…entile

minor fix for average

35f6d8d

cakedev0 added 2 commits October 17, 2025 18:16

WIP: inner func handles 2D but only 1 quantile

fd8b2c9

restore back prev implem. and loop to compute multiple percentiles wi…

d59abf5

…th only one sort/cumsum

cakedev0 closed this Oct 17, 2025

cakedev0 mentioned this pull request Oct 19, 2025

PERF: support multiple percentile ranks in input of _weighted_percentile #32538

Merged
