Comparing array subset checks by gabteles · Pull Request #125 · fastruby/fast-ruby

gabteles · 2017-05-14T15:41:13Z

No description provided.

mblumtritt · 2017-05-16T08:59:09Z

Your code above is massively dependent to used sample arrays. Using this samples

ARRAY1 = [*(400..500)].freeze
ARRAY2 = [*(1..500)].freeze

will change your results completely. I would prefer to use a more real-world sample for the test to avoid misleadings…

Arcovion · 2017-05-16T09:25:45Z

Maybe (ARRAY1 & ARRAY2).size == ARRAY1.size is faster - also try the set library; convert both #to_set and check with #subset?

gabteles · 2017-05-31T12:12:37Z

@mblumtritt What you suggest as a real world example? It would be nice if I generate a random-filled array of integers and let it static in the test or something like that?

@Arcovion Nice, I'll add both.

Thank you guys. (:

mblumtritt · 2017-05-31T20:04:03Z

@gabteles Do you mean something like this?

ARRAY1 = [*(400..500)].shuffle.freeze
ARRAY2 = [*(1..500)].shuffle.freeze

Then we have to other problem: how is the weight between the set and the subset 🤔 Or do you prefer a complete random set to compare?…

In "real world" I would check my chances to be faster with a complete different algorithm. For sample what about sorting first descending? This might be faster if both sets already nearly sorted…

I'm unsure to give a "general advice" for this subset problem. In "real world" it always depends… ;)

gabteles · 2017-06-01T12:28:29Z

@mblumtritt Yeah, I agree that the weight between set/subset is a big problem to benchmark. I'm not sure how to handle it.

Before I meant something like this:

Generate random arrays (generated with small size to better readability in this example, test could have bigger arrays):

$ irb
2.4.1 :001 > 10.times.map { rand(100) }
 => [20, 83, 86, 53, 63, 52, 50, 43, 18, 94] 
2.4.1 :002 > 10.times.map { rand(100) }
 => [27, 93, 84, 86, 71, 89, 38, 48, 37, 22]

Statically use it in code:

ARRAY1 = [20, 83, 86, 53, 63, 52, 50, 43, 18, 94].freeze
ARRAY2 = [27, 93, 84, 86, 71, 89, 38, 48, 37, 22].freeze

Using it statically grants that we won't have performance improved/degraded by randomness effect of shuffle or rand in runtime. But it also suffers of the weight problem.

Copilot

Pull request overview

Adds a new benchmark to compare different ways of checking whether one Ruby array is a subset of another, and documents the results in the main README alongside existing performance comparisons.

Changes:

Added a new benchmark script for array subset detection methods (code/array/subset-checking.rb).
Added a README section documenting the benchmark command and sample results (README.md).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
README.md	Adds a new documented benchmark section and sample output for array subset detection.
code/array/subset-checking.rb	Introduces the benchmark implementation comparing multiple subset-check approaches.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+ARRAY2 = [*1..100]
+
+def slow_set
+  ARRAY2.to_set.subset?(ARRAY1.to_set)


+def slow_interception
+  (ARRAY1 & ARRAY2) == ARRAY1
+end
+
+def slow_interception_size
+  (ARRAY1 & ARRAY2).size == ARRAY1.size
+end
+
+def slow_minus_empty
+  (ARRAY1 - ARRAY2).empty?
+end
+
+def fast
+  ARRAY1.all?{|element| ARRAY2.include?(element) }
+end
+
+Benchmark.ips do |x|
+  x.report("(a1 - a2).empty?") { slow_minus_empty }
+  x.report("(a1 & a2) == a1") { slow_interception }
+  x.report("Array#all?#include?") { fast }
+  x.report("Array#&#size") { slow_interception_size }


+end
+
+def fast
+  ARRAY1.all?{|element| ARRAY2.include?(element) }


+##### Subset detection with various methods [code](code/array/subset-checking.rb)
+
+```
+$ ruby -v code/array/subset-checking.rb
+ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-linux]


etagwerker · 2026-06-28T02:35:46Z

Thanks for this, @gabteles, and thanks to @mblumtritt and @Arcovion for the thoughtful review back in 2017. I revisited this on Ruby 4.0.0 to see whether it still holds up, and unfortunately the concern @mblumtritt raised is exactly what sinks it. Going to close it, but I want to lay out the reasoning so it's useful for anyone who finds this later.

1. The conclusion no longer holds on Ruby 4.0

Running the benchmark verbatim, the ranking has flipped and the gaps have collapsed:

Method	Ruby 2.4.1 (this PR)	Ruby 4.0.0
`(a1 - a2).empty?`	1.41x slower	fastest
`Array#&#size`	1.66x slower	1.08x slower
`Array#all?#include?`	fastest	1.09x slower
`(a1 & a2) == a1`	1.72x slower	1.15x slower
`Set#subset?`	6.73x slower	1.54x slower

Array#all?#include? went from clear winner to mid-pack, and the spread shrank from ~6.7x to ~1.5x.

2. The `Set#subset?` case has a bug

The arguments are reversed:

ARRAY2.to_set.subset?(ARRAY1.to_set)   # checks if the BIG array is a subset of the small one

It returns false while every other method returns true, so that row was never measuring the same thing. It also builds the sets inside the benchmark loop, so it's really timing to_set conversion rather than the subset check.

3. The real problem: there is no stable winner (the 2017 concern, confirmed)

@mblumtritt's point that the result is "massively dependent on the used sample arrays" is fully borne out on Ruby 4.0. The fastest method changes completely with the input:

True subset, 25/100 (this PR's case): (a1 - a2).empty? wins; all?#include? is mid-pack.
Not a subset, first element missing: all?#include? is ~6x faster than everything else, because all? short-circuits on the first miss.
Larger true subset, 250/1000: all?#include? is 44.9x slower. It's O(n·m), a Ruby-level block calling an O(m) include? per element, so it falls off a cliff as the arrays grow.
Set#subset? with the sets built once (outside the loop, args correct) is the fastest approach in every scenario, 6.5x–42x faster, because it's O(n) hash lookups and scales.

fast-ruby works best when it can give a clear "do X, not Y." This comparison has no robust winner, has a correctness bug, and as written penalizes the approach that's actually fastest at scale, so I'd rather not enshrine it as an idiom.

The genuinely useful takeaway: if you're doing subset checks (especially repeatedly or on large arrays), convert to Set once and use Set#subset?. (a1 - a2).empty? is the most consistent array-only option; all?#include? only wins when it can bail out early on a non-subset.

Closing for those reasons, but thank you all for the contribution and the discussion. It aged into a nice case study of why benchmarks need realistic, varied inputs.

comparing array subset checks

f6ba8de

more subset checking methods

9103850

etagwerker requested a review from Copilot June 28, 2026 02:30

Copilot started reviewing on behalf of etagwerker June 28, 2026 02:31 View session

Copilot AI reviewed Jun 28, 2026

View reviewed changes

etagwerker closed this Jun 28, 2026

etagwerker mentioned this pull request Jun 28, 2026

Add array subset-check benchmark across Ruby 3.3, 3.4, 4.0 #237

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Comparing array subset checks#125

Comparing array subset checks#125
gabteles wants to merge 2 commits into
fastruby:mainfrom
gabteles:master

gabteles commented May 14, 2017

Uh oh!

mblumtritt commented May 16, 2017 •

edited

Loading

Uh oh!

Arcovion commented May 16, 2017 •

edited

Loading

Uh oh!

gabteles commented May 31, 2017

Uh oh!

mblumtritt commented May 31, 2017

Uh oh!

gabteles commented Jun 1, 2017 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

etagwerker commented Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Uh oh!

Conversation

gabteles commented May 14, 2017

Uh oh!

mblumtritt commented May 16, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Arcovion commented May 16, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gabteles commented May 31, 2017

Uh oh!

mblumtritt commented May 31, 2017

Uh oh!

gabteles commented Jun 1, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

etagwerker commented Jun 28, 2026

1. The conclusion no longer holds on Ruby 4.0

2. The Set#subset? case has a bug

3. The real problem: there is no stable winner (the 2017 concern, confirmed)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

mblumtritt commented May 16, 2017 •

edited

Loading

Arcovion commented May 16, 2017 •

edited

Loading

gabteles commented Jun 1, 2017 •

edited

Loading

2. The `Set#subset?` case has a bug