Skip to content

Commit 12a1227

Browse files
etagwerkerclaude
andcommitted
Add array subset-check benchmark across Ruby 3.3, 3.4, 4.0
Revisits the comparison from the closed #125 (gabteles), fixing the reversed Set#subset? arguments so every approach returns the same result (guarded by an equivalence check), and benchmarks across modern Ruby versions. Findings: - (a1 - a2).empty? is the consistent winner for true-subset inputs. - a1.all? { include? } only wins when a1 is NOT a subset (short-circuits) and is O(n*m) on large true subsets. - Set#subset? (incl. to_set) went from ~6.8x slower on 3.3/3.4 to ~1.7x slower on 4.0, where Set got much faster. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1 parent 2342b8a commit 12a1227

2 files changed

Lines changed: 97 additions & 0 deletions

File tree

README.md

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -492,6 +492,61 @@ Comparison:
492492
Array#sort_by &:-@: 229323.6 i/s - 2.44x slower
493493
```
494494

495+
##### Subset check: `(a1 - a2).empty?` vs alternatives [code](code/array/subset-check.rb)
496+
497+
> To check whether every element of `a1` is also in `a2`, `(a1 - a2).empty?` is
498+
> consistently the fastest across modern Ruby versions for the common case where
499+
> `a1` is actually a subset. <br>
500+
> **Caveat:** the winner is highly data-dependent. `a1.all? { |e| a2.include?(e) }`
501+
> short-circuits on the first miss, so it wins when `a1` is *not* a subset, but it
502+
> is O(n*m) and degrades badly on large true subsets. <br>
503+
> Note also that `Set#subset?` (including the `to_set` conversion) was ~6.8x slower
504+
> on Ruby 3.3/3.4 but only ~1.7x slower on Ruby 4.0, where `Set` became much
505+
> faster. If you already hold `Set`s (or check repeatedly), `Set#subset?` scales
506+
> best.
507+
508+
```
509+
$ ruby -v code/array/subset-check.rb
510+
ruby 3.3.10 (2025-10-23 revision 343ea05002) [arm64-darwin25]
511+
Warming up --------------------------------------
512+
(a1 - a2).empty? 86.499k i/100ms
513+
(a1 & a2) == a1 73.860k i/100ms
514+
(a1 & a2).size == n 74.102k i/100ms
515+
a1.all? { include? } 67.703k i/100ms
516+
a1.to_set.subset? 12.068k i/100ms
517+
Calculating -------------------------------------
518+
(a1 - a2).empty? 849.546k (± 2.7%) i/s (1.18 μs/i) - 4.325M in 5.090893s
519+
(a1 & a2) == a1 746.103k (± 2.0%) i/s (1.34 μs/i) - 3.767M in 5.048711s
520+
(a1 & a2).size == n 780.447k (± 1.9%) i/s (1.28 μs/i) - 3.927M in 5.032254s
521+
a1.all? { include? } 715.301k (± 2.4%) i/s (1.40 μs/i) - 3.588M in 5.016435s
522+
a1.to_set.subset? 124.189k (± 0.8%) i/s (8.05 μs/i) - 627.536k in 5.053052s
523+
524+
Comparison:
525+
(a1 - a2).empty?: 849546.4 i/s
526+
(a1 & a2).size == n: 780446.7 i/s - 1.09x slower
527+
(a1 & a2) == a1: 746103.3 i/s - 1.14x slower
528+
a1.all? { include? }: 715300.6 i/s - 1.19x slower
529+
a1.to_set.subset?: 124189.5 i/s - 6.84x slower
530+
531+
$ ruby -v code/array/subset-check.rb
532+
ruby 3.4.7 (2025-10-08 revision 7a5688e2a2) +PRISM [arm64-darwin25]
533+
Comparison:
534+
(a1 - a2).empty?: 807172.5 i/s
535+
(a1 & a2).size == n: 719880.2 i/s - 1.12x slower
536+
a1.all? { include? }: 713213.0 i/s - 1.13x slower
537+
(a1 & a2) == a1: 687629.1 i/s - 1.17x slower
538+
a1.to_set.subset?: 119834.6 i/s - 6.74x slower
539+
540+
$ ruby -v code/array/subset-check.rb
541+
ruby 4.0.0 (2025-12-25 revision 553f1675f3) +PRISM [arm64-darwin25]
542+
Comparison:
543+
(a1 - a2).empty?: 784706.1 i/s
544+
a1.all? { include? }: 727520.9 i/s - 1.08x slower
545+
(a1 & a2).size == n: 701969.3 i/s - 1.12x slower
546+
(a1 & a2) == a1: 670252.7 i/s - 1.17x slower
547+
a1.to_set.subset?: 467206.3 i/s - 1.68x slower
548+
```
549+
495550
### Enumerable
496551

497552
##### `Enumerable#each + push` vs `Enumerable#map` [code](code/enumerable/each-push-vs-map.rb)

code/array/subset-check.rb

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
require "benchmark/ips"
2+
require "set"
3+
4+
# Check whether ARRAY1 is a subset of ARRAY2 (every element of ARRAY1 is in
5+
# ARRAY2). The fastest approach is highly dependent on the input: `all?` +
6+
# `include?` short-circuits on the first miss (great for non-subsets) but is
7+
# O(n*m) when ARRAY1 really is a subset, while Set-based lookups stay O(n).
8+
ARRAY1 = [*1..25]
9+
ARRAY2 = [*1..100]
10+
11+
def minus_empty
12+
(ARRAY1 - ARRAY2).empty?
13+
end
14+
15+
def intersection_equal
16+
(ARRAY1 & ARRAY2) == ARRAY1
17+
end
18+
19+
def intersection_size
20+
(ARRAY1 & ARRAY2).size == ARRAY1.size
21+
end
22+
23+
def all_include
24+
ARRAY1.all? { |element| ARRAY2.include?(element) }
25+
end
26+
27+
def set_subset
28+
ARRAY1.to_set.subset?(ARRAY2.to_set)
29+
end
30+
31+
# Sanity check: every approach must return the same answer.
32+
results = [minus_empty, intersection_equal, intersection_size, all_include, set_subset]
33+
raise "not equivalent: #{results.inspect}" unless results.uniq.size == 1
34+
35+
Benchmark.ips do |x|
36+
x.report("(a1 - a2).empty?") { minus_empty }
37+
x.report("(a1 & a2) == a1") { intersection_equal }
38+
x.report("(a1 & a2).size == n") { intersection_size }
39+
x.report("a1.all? { include? }") { all_include }
40+
x.report("a1.to_set.subset?") { set_subset }
41+
x.compare!
42+
end

0 commit comments

Comments
 (0)