Skip to content

Use a Schwartzian transform with custom sorting#6342

Merged
jekyllbot merged 7 commits into
masterfrom
schwartzian-transform
Sep 2, 2017
Merged

Use a Schwartzian transform with custom sorting#6342
jekyllbot merged 7 commits into
masterfrom
schwartzian-transform

Conversation

@parkr

@parkr parkr commented Sep 1, 2017

Copy link
Copy Markdown
Member

Hey!

I was reading through the Enumerable#sort_by documentation and came across an interesting passage about how it gets its speed-up:

A more efficient technique is to cache the sort keys (modification times in this case)
before the sort. Perl users often call this approach a Schwartzian transform, after
Randal Schwartz. We construct a temporary array, where each element is an array
containing our sort key along with the filename. We sort this array, and then extract
the filename from the result.

This is exactly what sort_by does internally.

Whoa! A hidden gem: a recommendation about getting the most out of #sort by reducing object allocations. I'm in!

I tested it on the _docs collection in our jekyllrb.com site and it is quite impressive:

Yeah, ok, correctness all checks out for property "redirect_from"
Yeah, ok, correctness all checks out for property "title"
Warming up --------------------------------------
sort_by_property_directly with sparse property
                        11.000  i/100ms
schwartzian_transform with sparse property
                        76.000  i/100ms
Calculating -------------------------------------
sort_by_property_directly with sparse property
                        114.388  (± 8.7%) i/s -      1.144k in  10.087658s
schwartzian_transform with sparse property
                        824.113  (± 5.2%) i/s -      8.284k in  10.083133s

Comparison:
schwartzian_transform with sparse property:      824.1 i/s
sort_by_property_directly with sparse property:      114.4 i/s - 7.20x  slower

Warming up --------------------------------------
sort_by_property_directly with non-sparse property
                       932.000  i/100ms
schwartzian_transform with non-sparse property
                         1.772k i/100ms
Calculating -------------------------------------
sort_by_property_directly with non-sparse property
                          9.885k (± 6.3%) i/s -     98.792k in  10.040320s
schwartzian_transform with non-sparse property
                         17.436k (± 7.5%) i/s -    173.656k in  10.023797s

Comparison:
schwartzian_transform with non-sparse property:    17435.8 i/s
sort_by_property_directly with non-sparse property:     9885.2 i/s - 1.76x  slower

Based on these numbers, I switched Jekyll::Filters#sort_input to use this method.

🎉 🐎

@parkr parkr requested review from a team September 1, 2017 15:28
@parkr parkr force-pushed the schwartzian-transform branch from ccee9be to dae8e24 Compare September 1, 2017 16:16
@ashmaroli

Copy link
Copy Markdown
Member

nice implementation Parker 👍 I did come across this transformation while reading up for another PR, but never bothered to bench an implementation..

If you have the time can you please benchmark using apples[0] and oranges[0] too?
Array[0] is a tad faster than Array.first

@parkr

parkr commented Sep 1, 2017

Copy link
Copy Markdown
Member Author

If you have the time can you please benchmark using apples[0] and oranges[0] too?
Array[0] is a tad faster than Array.first

Updated. I had to ensure I was copying the docs array for each new try (otherwise everything would be faster after the first iteration which sorted in place). Now showing much higher gains: 7x faster and 1.75x faster.

@ashmaroli

Copy link
Copy Markdown
Member

Now showing much higher gains: 7x faster and 1.75x faster

WOW!! Très cool!!!

@ashmaroli

Copy link
Copy Markdown
Member

@parkr IMO, this change should ideally be made into a public Utility method which can then be adapted for your suggestion to #5904

# lib/jekyll/utils.rb

def schwartzian_transform(input, property, order)
  [...]
end

@parkr

parkr commented Sep 1, 2017

Copy link
Copy Markdown
Member Author

this change should ideally be made into a public Utility method

I don't think we can do that, because each implementation will pull the property from input through different means each time, e.g. using input.data[property] or input[property] or item_property(input, property) in the filters.

Comment thread lib/jekyll/filters.rb
apple_property <=> orange_property
input.map { |item| [item_property(item, property), item] }
.sort! do |apple_info, orange_info|
apple_property = apple_info.first

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the benchmark results, Array#[] was faster than Array#first correct? Should we switch to that or did I interpret the benchmark results incorrectly?

@mattr-

mattr- commented Sep 2, 2017

Copy link
Copy Markdown
Member

What happens if we switch to Enumerable#sort_by directly? What's the reason that I'm missing that keeps us from doing:

input.sort_by { |item| item_property(item, property) }

@Crunch09

Crunch09 commented Sep 2, 2017

Copy link
Copy Markdown
Member

What happens if we switch to Enumerable#sort_by directly?

@mattr- : @parkr added a comment about this in the benchmark script, sort_by doesn't like nils

["foo", nil, "bar"].sort_by{|el| el}
# => ArgumentError: comparison of String with nil failed

@Crunch09 Crunch09 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 Really nice change!

@mattr-

mattr- commented Sep 2, 2017

Copy link
Copy Markdown
Member

@parkr added a comment about this in the benchmark script

Always good to know that I can't read very well. 🤣 Thanks @Crunch09!

@mattr-

mattr- commented Sep 2, 2017

Copy link
Copy Markdown
Member

@jekyllbot: merge +minor

@jekyllbot jekyllbot merged commit 6ce912e into master Sep 2, 2017
@jekyllbot jekyllbot deleted the schwartzian-transform branch September 2, 2017 14:38
jekyllbot added a commit that referenced this pull request Sep 2, 2017
@jekyll jekyll locked and limited conversation to collaborators Jul 12, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants