Context
The fl::function<Sig> Option B refactor (under #3235 Tier 1B, branch feat/fl-function-option-b-single-invoker-3235) collapses the 5-alternative variant storage to a single SBO + invoker + manager design. To preserve support for lambdas larger than the SBO (FASTLED_INLINE_LAMBDA_SIZE, currently 64 B), the implementation falls back to a HeapHolder<F> struct that wraps fl::shared_ptr<F> inside the SBO. This issue tracks the tradeoff investigation requested at PR review time.
The concern
Wrapping over-SBO callables in shared_ptr<F> has good properties (refcount-based copy is cheap; shared_ptr already in tree; copy/move/destroy multiplex through the existing non_trivial_manager template). But it changes the cost model in two ways:
- Every over-SBO callable lives on the heap, even when the user constructs a single fl::function and never copies it. A unique_ptr-style holder would keep the same SBO surface but skip the refcount overhead and bump-allocator behavior.
- Every copy of an fl::function that holds a HeapHolder goes through the non_trivial_manager, which calls shared_ptr<F>'s copy constructor (refcount increment), even though the HeapHolder itself sits in the SBO. Callables that fit in the SBO directly never take this path: trivially-copyable in-SBO lambdas use the existing trivial_manager<sizeof(F)> path (memcpy only). The cost is therefore paid only for actual over-SBO captures, but the surface is wider than necessary.
The user-visible question: do copy / move of fl::function "do the right thing" — meaning no heap allocation or refcount touch when the callable fits in the SBO, and minimal-overhead semantics when it doesn't?
What the implementation does today
| Scenario | Storage path | Copy ctor cost | Move ctor cost | Destroy cost |
| --- | --- | --- | --- | --- |
| Empty fl::function | n/a | memset SBO + pointer copy | same | nothing |
| Free function pointer | SBO (R(*)(Args...), 4-8 B) | trivial_manager memcpy | trivial_manager memcpy | nothing |
| Trivially-copyable lambda ≤ SBO | SBO | trivial_manager memcpy | trivial_manager memcpy | nothing |
| Non-trivially-copyable lambda ≤ SBO | SBO | non_trivial_manager placement-new copy | non_trivial_manager placement-new move | non_trivial_manager ~F() call |
| Lambda > SBO (e.g. UIButton::onClicked at 80 B) | HeapHolder<F> in SBO; F on heap | non_trivial_manager → shared_ptr<F>::ctor(const&) → refcount inc | non_trivial_manager → shared_ptr<F>::ctor(&&) → pointer-swap | non_trivial_manager → ~shared_ptr<F>() → refcount dec, possibly free |
| Member-fn pointer (R(C::*)(Args...), C*) | NonConstMemberWrapper<C> in SBO (~12-16 B) | trivial_manager memcpy | trivial_manager memcpy | nothing |
The in-SBO paths are correct (zero heap touch). The heap-fallback path uses shared_ptr for the refcount, which means copies share the same heap-allocated F. That's intentional and matches the legacy variant-based behavior (which also used shared_ptr<CallableBase> for the heap fallback).
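To make the heap-fallback row concrete, here is a minimal sketch of how an over-SBO callable could be routed through a shared_ptr-backed holder. It is not the actual FastLED code: std::shared_ptr stands in for fl::shared_ptr, kSboSize stands in for FASTLED_INLINE_LAMBDA_SIZE, and StoredType, BigCallable, SmallCallable and use_count_after_copy are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Assumption: stand-in for FASTLED_INLINE_LAMBDA_SIZE (64 B per the issue text).
constexpr std::size_t kSboSize = 64;

// Hypothetical holder: a shared_ptr to a heap-allocated F, small enough to sit in the SBO.
// std::shared_ptr is used here in place of fl::shared_ptr.
template <typename F>
struct HeapHolder {
    std::shared_ptr<F> ptr;

    explicit HeapHolder(F f) : ptr(std::make_shared<F>(std::move(f))) {}

    template <typename... Args>
    decltype(auto) operator()(Args&&... args) const {
        return (*ptr)(std::forward<Args>(args)...);
    }
};

// Callables that fit are stored as-is; larger ones are wrapped in HeapHolder<F>.
template <typename F>
using StoredType =
    typename std::conditional<(sizeof(F) <= kSboSize), F, HeapHolder<F>>::type;

struct SmallCallable {
    int value;
    int operator()() const { return value; }
};

struct BigCallable {
    char payload[80];  // over the 64 B SBO, like the 80 B example in the table
    int operator()() const { return payload[0]; }
};

static_assert(std::is_same<StoredType<SmallCallable>, SmallCallable>::value,
              "small callable stays inline");
static_assert(std::is_same<StoredType<BigCallable>, HeapHolder<BigCallable>>::value,
              "big callable goes through HeapHolder");
static_assert(sizeof(HeapHolder<BigCallable>) <= kSboSize,
              "the holder itself fits in the SBO");

// Copies the holder once and returns the resulting shared_ptr use count.
inline long use_count_after_copy() {
    BigCallable big{};
    big.payload[0] = 7;
    HeapHolder<BigCallable> a(big);
    HeapHolder<BigCallable> b = a;       // refcount increment; F is not re-allocated
    assert(a.ptr.get() == b.ptr.get());  // both holders point at the same F
    assert(b() == 7);
    return a.ptr.use_count();
}
```

In this sketch the copy touches only the control block, which is the sharing behavior the paragraph above describes.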
The investigation we want
Q1: Is shared_ptr the right primitive for the heap fallback?
Alternatives to evaluate:
- unique_ptr<F> semantics: copy ctor would need to deep-copy F (heap-allocate a new F and copy-construct it). Pro: no refcount overhead. Con: copy is now O(sizeof(F)); for a 256 B closure being copied a few times that's hundreds of bytes of memcpy work and several new/delete cycles.
- Refcounted-but-no-atomics holder: shared_ptr uses atomic refcount operations. On single-threaded MCUs (everything except ESP32 dual-core), atomics are wasted overhead. A fl::single_threaded_shared<F> with a plain int refcount would skip the atomic, but matters only on AVR / Cortex-M0 (no atomics → libcalls); on M3+ with native LDREX/STREX the cost is negligible.
- Intrusive refcount: place the refcount inside F's allocation. Same complexity as shared_ptr, slightly smaller per-object overhead. Probably not worth the diff.
- Hybrid (unique_ptr for non-copyable fl::functions, shared_ptr for copyable): would require splitting fl::function into fl::move_only_function<Sig> (C++23 style) and the existing fl::function<Sig>, a bigger API churn.
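The first two alternatives can be compared with a rough sketch. Everything here is hypothetical: DeepCopyHolder models the unique_ptr-style deep copy, LocalRcHolder models the plain-int refcount idea (the fl::single_threaded_shared<F> named above does not exist yet), and Closure is a 256 B stand-in for a large capture.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical unique_ptr-style holder: every copy heap-allocates and copy-constructs a new F.
template <typename F>
class DeepCopyHolder {
public:
    explicit DeepCopyHolder(F f) : ptr_(new F(std::move(f))) {}
    DeepCopyHolder(const DeepCopyHolder& other) : ptr_(new F(*other.ptr_)) {}
    DeepCopyHolder(DeepCopyHolder&& other) noexcept = default;  // pointer steal only
    const F* get() const { return ptr_.get(); }

private:
    std::unique_ptr<F> ptr_;
};

// Hypothetical non-atomic refcounted holder: copies share one F and bump a plain int.
// Only safe when all copies live on a single thread.
template <typename F>
class LocalRcHolder {
    struct Block {
        F callable;
        int refs;
    };

public:
    explicit LocalRcHolder(F f) : block_(new Block{std::move(f), 1}) {}
    LocalRcHolder(const LocalRcHolder& other) : block_(other.block_) { ++block_->refs; }
    LocalRcHolder(LocalRcHolder&& other) noexcept : block_(other.block_) {
        other.block_ = nullptr;
    }
    LocalRcHolder& operator=(const LocalRcHolder&) = delete;
    ~LocalRcHolder() {
        if (block_ && --block_->refs == 0) delete block_;
    }
    const F* get() const { return &block_->callable; }
    int refs() const { return block_->refs; }

private:
    Block* block_;
};

// Stand-in for a 256 B closure.
struct Closure {
    char bytes[256];
    int operator()() const { return bytes[0]; }
};

// True when the copy owns a distinct heap object (the O(sizeof(F)) cost noted above).
inline bool deep_copy_allocates_new_object() {
    DeepCopyHolder<Closure> a{Closure{}};
    DeepCopyHolder<Closure> b = a;
    return a.get() != b.get();
}

// True when the copy shares the original object and the plain counter reads 2.
inline bool local_rc_shares_object() {
    LocalRcHolder<Closure> a{Closure{}};
    LocalRcHolder<Closure> b = a;
    return a.get() == b.get() && a.refs() == 2;
}
```

The sketch only shows the ownership behavior of each option; it says nothing about relative cost on real targets, which is what the investigation would need to measure.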
Q2: Can we avoid the heap entirely for "typical" closures?
Audit: how many fl::function instantiations in the codebase actually capture > 64 B of state? The UIButton::onClicked case captures a fl::function<void()> by value (~80 B). Are there others? If the answer is "very few", the heap path may be cold enough that the shared_ptr cost is fine.
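One possible way to support the audit is a small compile-time size check that can be dropped next to suspicious call sites. This is only a sketch: kSboSize stands in for FASTLED_INLINE_LAMBDA_SIZE, and exceeds_sbo plus the two demo functions are hypothetical helpers, not existing FastLED APIs.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: stand-in for FASTLED_INLINE_LAMBDA_SIZE (64 B per the issue text).
constexpr std::size_t kSboSize = 64;

// Hypothetical audit helper: true when a callable of type F would not fit in the SBO.
template <typename F>
constexpr bool exceeds_sbo(const F&) {
    return sizeof(F) > kSboSize;
}

// A lambda capturing two ints is far below the threshold.
inline bool small_capture_exceeds() {
    int a = 1, b = 2;
    auto small = [a, b]() { return a + b; };
    return exceeds_sbo(small);
}

// A lambda capturing 80 B of state by value is over the threshold.
inline bool large_capture_exceeds() {
    struct State {
        char bytes[80];
    } state{};
    auto large = [state]() { return state.bytes[0]; };
    return exceeds_sbo(large);
}
```

A grep-based pass over src/ and examples/ would still be needed to find the call sites; the helper only answers the size question once a candidate is located.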
Q3: Does the copy ctor "do the right thing" today?
For the user-stated goal — "not copy heap memory whenever possible":
- ✅ Empty, free-fn, trivially-copyable-lambda-in-SBO, member-fn: zero heap touch.
- ✅ Non-trivial-lambda-in-SBO: zero heap touch.
- ⚠️ Over-SBO lambda: copy increments a refcount (1 atomic op on multicore, 0 cost otherwise). NO new heap allocation. Two fl::function copies share one heap-allocated F.
The "right thing" semantically — copying an fl::function should not allocate — IS happening even for the heap-fallback case. What's at issue is whether the initial construction of an over-SBO fl::function should heap-allocate, or whether we should aggressively force users to keep captures under 64 B.
Q4: Should we expose an opt-out for users who can't tolerate heap?
A FL_FUNCTION_NO_HEAP_FALLBACK macro could make over-SBO callables a static_assert failure. Users could then choose per-target whether to allow heap fallback or fail fast.
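A minimal sketch of what such an opt-out could look like, assuming the macro gates a static_assert at the point where the storage path is chosen. The names kSboSize, Storage, choose_storage, Small and Large are hypothetical; only FL_FUNCTION_NO_HEAP_FALLBACK comes from the proposal above.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: stand-in for FASTLED_INLINE_LAMBDA_SIZE (64 B per the issue text).
constexpr std::size_t kSboSize = 64;

// Define this (e.g. -DFL_FUNCTION_NO_HEAP_FALLBACK) to reject over-SBO callables at compile time.
// #define FL_FUNCTION_NO_HEAP_FALLBACK

enum class Storage { Inline, Heap };

// Hypothetical selection point: picks the storage path for a callable type F.
template <typename F>
constexpr Storage choose_storage() {
#ifdef FL_FUNCTION_NO_HEAP_FALLBACK
    static_assert(sizeof(F) <= kSboSize,
                  "callable exceeds the SBO and heap fallback is disabled");
#endif
    return sizeof(F) <= kSboSize ? Storage::Inline : Storage::Heap;
}

struct Small {
    int x;
    void operator()() const {}
};

struct Large {
    char bytes[80];
    void operator()() const {}
};

inline Storage storage_for_small() { return choose_storage<Small>(); }

#ifndef FL_FUNCTION_NO_HEAP_FALLBACK
// With the macro defined, instantiating choose_storage<Large>() would fail to compile.
inline Storage storage_for_large() { return choose_storage<Large>(); }
#endif
```

With the macro left undefined, as here, the large callable simply reports the heap path; defining it per target turns the same instantiation into a build failure.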
Acceptance criteria for closing this issue
- […] src/ + examples/ for fl::function<Sig> instantiations whose captured state exceeds 64 B. Document the count and the largest example.
- […] UIButton::onClicked heavily) before/after the Option B refactor.
- […] shared_ptr, switch to unique_ptr-style deep-copy, or split into move-only vs copyable variants.
- […] shared_ptr: document why (covered in this issue body if the audit shows refcount cost is negligible).
- […] HeapHolder<F> with the chosen alternative; verify copy/move semantics on the host test suite.
- […] FL_FUNCTION_NO_HEAP_FALLBACK macro (or document why not) so memory-constrained users can opt out.
Related
- fl::function's copy/move is currently "matches std::function shallowly"; this issue may end up tightening that contract toward "matches std::function-with-explicit-allocator semantics" or splitting into separate types.