
ENH: Improve np.kron performance #21354

Merged
mattip merged 3 commits into numpy:main from ganesh-k13:perf_kron_21257 on Apr 19, 2022

Conversation

@ganesh-k13

@ganesh-k13 ganesh-k13 commented Apr 17, 2022

Member

Improve np.kron performance

  • Use broadcasting in the multiply step to improve performance.
  • Remove the transpose logic to reduce the total number of operations.
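To make the two bullets concrete, here is a minimal 2-D sketch comparing a transpose-based formulation with a broadcasting one. Both function names are hypothetical, and the transpose variant is only an illustration of what "transpose logic" can look like; it is not claimed to be the exact code this PR removes.

```python
import numpy as np


def kron2d_transpose(a, b):
    """Kronecker product of two 2-D arrays via outer product + transpose."""
    (m, n), (p, q) = a.shape, b.shape
    # np.outer flattens its inputs, giving shape (m*n, p*q);
    # view that as (m, n, p, q).
    blocks = np.outer(a, b).reshape(m, n, p, q)
    # Interleave the axes to (m, p, n, q), then merge pairs of axes.
    # The reshape after a transpose generally forces a copy.
    return blocks.transpose(0, 2, 1, 3).reshape(m * p, n * q)


def kron2d_broadcast(a, b):
    """Kronecker product of two 2-D arrays via broadcasting only."""
    (m, n), (p, q) = a.shape, b.shape
    # a -> (m, 1, n, 1), b -> (1, p, 1, q); the product broadcasts
    # directly to (m, p, n, q), so no transpose is needed.
    prod = a[:, None, :, None] * b[None, :, None, :]
    return prod.reshape(m * p, n * q)


a = np.arange(6).reshape(2, 3)
b = np.arange(4).reshape(2, 2) + 1
print(np.array_equal(kron2d_transpose(a, b), np.kron(a, b)))
print(np.array_equal(kron2d_broadcast(a, b), np.kron(a, b)))
```

In the broadcasting variant the product already comes out in the axis order the final reshape needs, which is the operation the second bullet saves.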

Boost amount

Compare with bb811f4 (main)
~/os/numpy (perf_kron_21257) » python3 runtests.py --bench-compare main bench_shape_base.Kron
· Creating environments
· Discovering benchmarks
·· Uninstalling from virtualenv-py3.9-Cython
·· Installing 06d34945 <perf_kron_21257> into virtualenv-py3.9-Cython.
· Running 6 total benchmarks (2 commits * 1 environments * 3 benchmarks)
[  0.00%] · For numpy commit bb811f45 <main> (round 1/2):
[  0.00%] ·· Building for virtualenv-py3.9-Cython.
[  0.00%] ·· Benchmarking virtualenv-py3.9-Cython
[  0.00%] ··· Importing benchmark suite produced output:
[  0.00%] ···· NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
[  8.33%] ··· Running (bench_shape_base.Kron.time_arr_kron--)...
[ 25.00%] · For numpy commit 06d34945 <perf_kron_21257> (round 1/2):
[ 25.00%] ·· Building for virtualenv-py3.9-Cython.
[ 25.00%] ·· Benchmarking virtualenv-py3.9-Cython
[ 33.33%] ··· Running (bench_shape_base.Kron.time_arr_kron--)...
[ 50.00%] · For numpy commit 06d34945 <perf_kron_21257> (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.9-Cython
[ 58.33%] ··· bench_shape_base.Kron.time_arr_kron                                                                                                                                          274±2ms
[ 66.67%] ··· bench_shape_base.Kron.time_mat_kron                                                                                                                                          215±2ms
[ 75.00%] ··· bench_shape_base.Kron.time_scalar_kron                                                                                                                                   3.96±0.05μs
[ 75.00%] · For numpy commit bb811f45 <main> (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.9-Cython.
[ 75.00%] ·· Benchmarking virtualenv-py3.9-Cython
[ 83.33%] ··· bench_shape_base.Kron.time_arr_kron                                                                                                                                          438±2ms
[ 91.67%] ··· bench_shape_base.Kron.time_mat_kron                                                                                                                                          413±7ms
[100.00%] ··· bench_shape_base.Kron.time_scalar_kron                                                                                                                                   3.87±0.07μs
       before           after         ratio
     [bb811f45]       [06d34945]
     <main>           <perf_kron_21257>
-         438±2ms          274±2ms     0.63  bench_shape_base.Kron.time_arr_kron
-         413±7ms          215±2ms     0.52  bench_shape_base.Kron.time_mat_kron

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Compare with latest release (v1.22.3)
~/os/numpy (perf_kron_21257) » python3 runtests.py --bench-compare v1.22.3 bench_shape_base.Kron
· Creating environments
· Discovering benchmarks
·· Uninstalling from virtualenv-py3.9-Cython
·· Installing 06d34945 <perf_kron_21257> into virtualenv-py3.9-Cython.
· Running 6 total benchmarks (2 commits * 1 environments * 3 benchmarks)
[  0.00%] · For numpy commit 7d4349e3 <v1.22.3^0> (round 1/2):
[  0.00%] ·· Building for virtualenv-py3.9-Cython.......................................
[  0.00%] ·· Benchmarking virtualenv-py3.9-Cython
[  0.00%] ··· Importing benchmark suite produced output:
[  0.00%] ···· NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F? AVX512CD? AVX512_KNL? AVX512_KNM? AVX512_SKX? AVX512_CLX? AVX512_CNL? AVX512_ICL?
[  8.33%] ··· Running (bench_shape_base.Kron.time_arr_kron--)...
[ 25.00%] · For numpy commit 06d34945 <perf_kron_21257> (round 1/2):
[ 25.00%] ·· Building for virtualenv-py3.9-Cython.
[ 25.00%] ·· Benchmarking virtualenv-py3.9-Cython
[ 33.33%] ··· Running (bench_shape_base.Kron.time_arr_kron--)...
[ 50.00%] · For numpy commit 06d34945 <perf_kron_21257> (round 2/2):
[ 50.00%] ·· Benchmarking virtualenv-py3.9-Cython
[ 58.33%] ··· bench_shape_base.Kron.time_arr_kron                                                                                                                                          276±3ms
[ 66.67%] ··· bench_shape_base.Kron.time_mat_kron                                                                                                                                          220±4ms
[ 75.00%] ··· bench_shape_base.Kron.time_scalar_kron                                                                                                                                   3.94±0.05μs
[ 75.00%] · For numpy commit 7d4349e3 <v1.22.3^0> (round 2/2):
[ 75.00%] ·· Building for virtualenv-py3.9-Cython..
[ 75.00%] ·· Benchmarking virtualenv-py3.9-Cython
[ 83.33%] ··· bench_shape_base.Kron.time_arr_kron                                                                                                                                       1.32±0.02s
[ 91.67%] ··· bench_shape_base.Kron.time_mat_kron                                                                                                                                         733±20ms
[100.00%] ··· bench_shape_base.Kron.time_scalar_kron                                                                                                                                    3.78±0.1μs
       before           after         ratio
     [7d4349e3]       [06d34945]
     <v1.22.3^0>       <perf_kron_21257>
-        733±20ms          220±4ms     0.30  bench_shape_base.Kron.time_mat_kron
-      1.32±0.02s          276±3ms     0.21  bench_shape_base.Kron.time_arr_kron

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Runtime drops by about 70-80% compared to the current release (v1.22.3), i.e. roughly 3.3x to 4.8x faster.
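For reference, the `ratio` column in the asv tables is after/before, so a ratio of 0.30 means about 70% less runtime. A small sketch of that arithmetic, using the timings copied from the v1.22.3 comparison above (the helper name `summarize` is just for illustration):

```python
# Timings in milliseconds, copied from the asv output for the
# v1.22.3 comparison: (before, after).
timings_ms = {
    "time_mat_kron": (733.0, 220.0),
    "time_arr_kron": (1320.0, 276.0),
}


def summarize(before, after):
    """Return (asv-style ratio, fraction of runtime saved, speedup factor)."""
    ratio = after / before
    return ratio, 1.0 - ratio, before / after


for name, (before, after) in timings_ms.items():
    ratio, saved, factor = summarize(before, after)
    print(f"{name}: ratio={ratio:.2f}, saved={saved:.0%}, {factor:.1f}x faster")
```

The same helper applied to the comparison against main reproduces the 0.63 and 0.52 ratios reported there.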

Explanation

Ok let me try my best to explain the current flow:

  1. Let's take two arrays a and b such that
a = np.ones((2,0,2))
b = np.ones((2,2))
  2. Pad the shape of the lower-dimensional array (b in this case) with 1s so that both arrays have the same ndim: a's shape stays (2,0,2) while b's becomes (1,2,2). We prepend the 1s; from my searching this choice is arbitrary, as a few people prefer to append instead.
  3. Now insert length-1 dimensions into both arrays, at the odd axes for a and at the even axes for b, so that the product is computed between the required sub-parts. The product itself uses broadcasting, which is where the performance gain comes from.
  4. The shape of a is now (2, 1, 0, 1, 2, 1) and that of b is (1, 1, 1, 2, 1, 2).
  5. After computing the product, we reshape the result to kron's output shape, (2, 0, 4). We get this shape by multiplying the equalised shapes of a and b elementwise.
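The flow above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the code in numpy/lib/shape_base.py: the names `expand_for_kron` and `kron_sketch` are hypothetical, and `np.matrix` handling is left out.

```python
import numpy as np


def expand_for_kron(a, b):
    """Return the expanded operands and the final output shape."""
    a = np.asarray(a)
    b = np.asarray(b)
    nd = max(a.ndim, b.ndim)
    # Equalise ndim by prepending 1s to the shorter shape.
    a_shape = (1,) * (nd - a.ndim) + a.shape
    b_shape = (1,) * (nd - b.ndim) + b.shape
    # Insert length-1 axes: odd positions for a, even positions for b.
    a_exp = np.expand_dims(a.reshape(a_shape), axis=tuple(range(1, 2 * nd, 2)))
    b_exp = np.expand_dims(b.reshape(b_shape), axis=tuple(range(0, 2 * nd, 2)))
    # Output shape is the elementwise product of the equalised shapes.
    out_shape = tuple(s * t for s, t in zip(a_shape, b_shape))
    return a_exp, b_exp, out_shape


def kron_sketch(a, b):
    """Kronecker product via a single broadcast multiply and a reshape."""
    a_exp, b_exp, out_shape = expand_for_kron(a, b)
    return (a_exp * b_exp).reshape(out_shape)


a = np.ones((2, 0, 2))
b = np.ones((2, 2))
a_exp, b_exp, out_shape = expand_for_kron(a, b)
print(a_exp.shape, b_exp.shape, out_shape)
print(kron_sketch(a, b).shape)
```

With the arrays from step 1, the printed shapes should match the ones listed in the steps above; for non-empty inputs the result can be checked against `np.kron` directly.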

TODO

  • One release note for performance e18e312
  • Add inline comments 8e447c8

Part of #21257

Comment thread numpy/lib/shape_base.py Outdated
@mattip

mattip commented Apr 17, 2022

Member

Nice speed up. Could you add a comment about what is going on into the code?

@ganesh-k13

ganesh-k13 commented Apr 17, 2022

Member Author

Yeah sure, will add comments in the code 👍. Added a TODO.

@mattip mattip left a comment

Member

LGTM. I will wait to merge in case anyone else wants to take a look.

@mattip mattip merged commit 0ebde37 into numpy:main Apr 19, 2022
@mattip

mattip commented Apr 19, 2022

Member

Thanks @ganesh-k13
