
fl::function: investigate heap-allocation tradeoffs of the HeapHolder<F> + shared_ptr fallback (Option B follow-on) #3237

Description

@zackees

Context

The fl::function<Sig> Option B refactor (under #3235 Tier 1B, branch feat/fl-function-option-b-single-invoker-3235) collapses the 5-alternative variant storage to a single SBO + invoker + manager design. To preserve support for lambdas larger than the SBO (FASTLED_INLINE_LAMBDA_SIZE, currently 64 B), the implementation falls back to a HeapHolder<F> struct that wraps fl::shared_ptr<F> inside the SBO. This issue tracks the tradeoff investigation requested at PR review time.

The concern

Wrapping over-SBO callables in shared_ptr<F> has good properties (refcount-based copy is cheap; shared_ptr already in tree; copy/move/destroy multiplex through the existing non_trivial_manager template). But it changes the cost model in two ways:

  1. Every over-SBO callable lives on the heap behind a refcount, even when the user constructs a single fl::function and never copies it. A unique_ptr-style holder would keep the same SBO surface and still heap-allocate F, but would skip the refcount overhead and bump-allocator behavior.

  2. When the lambda does NOT fit in the SBO, every copy of the fl::function calls the non_trivial_manager for HeapHolder, which calls shared_ptr<F>'s copy constructor (refcount increment). For trivially-copyable in-SBO lambdas the existing trivial_manager<sizeof(F)> path is used (memcpy only), so the cost is only paid for actual over-SBO captures, but the surface is wider than necessary.

The user-visible question: do copy / move of fl::function "do the right thing" — meaning no heap allocation or refcount touch when the callable fits in the SBO, and minimal-overhead semantics when it doesn't?

What the implementation does today

| Scenario | Storage path | Copy ctor cost | Move ctor cost | Destroy cost |
|---|---|---|---|---|
| Empty fl::function | n/a | memset SBO + pointer copy | same | nothing |
| Free function pointer | SBO (R(*)(Args...), 4-8 B) | trivial_manager memcpy | trivial_manager memcpy | nothing |
| Trivially-copyable lambda ≤ SBO | SBO | trivial_manager memcpy | trivial_manager memcpy | nothing |
| Non-trivially-copyable lambda ≤ SBO | SBO | non_trivial_manager placement-new copy | non_trivial_manager placement-new move | non_trivial_manager ~F() call |
| Lambda > SBO (e.g. UIButton::onClicked at 80 B) | HeapHolder<F> in SBO; F on heap | non_trivial_manager → shared_ptr<F>::ctor(const&) → refcount inc | non_trivial_manager → shared_ptr<F>::ctor(&&) → pointer-swap | non_trivial_manager → ~shared_ptr<F>() → refcount dec, possibly free |
| Member-fn pointer (R(C::*)(Args...), C*) | NonConstMemberWrapper<C> in SBO (~12-16 B) | trivial_manager memcpy | trivial_manager memcpy | nothing |

The in-SBO paths are correct (zero heap touch). The heap-fallback path uses shared_ptr for the refcount, which means copies share the same heap-allocated F. That's intentional and matches the legacy variant-based behavior (which also used shared_ptr<CallableBase> for the heap fallback).
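
The path selection in the table can be sketched as a compile-time classification. This is only an illustration under stated assumptions: `kInlineSize`, `StoragePath`, `classify`, and the three example structs are hypothetical names, the standard library traits stand in for whatever fl:: equivalents the branch uses, and the real implementation may also consider alignment or other conditions.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>

// Hypothetical stand-in for FASTLED_INLINE_LAMBDA_SIZE (64 B per this issue).
constexpr std::size_t kInlineSize = 64;

// The three storage paths named in the table (names are illustrative).
enum class StoragePath { TrivialInline, NonTrivialInline, HeapHolder };

// Picks a path from the size and trivial-copyability of F only.
template <typename F>
constexpr StoragePath classify() {
    return sizeof(F) > kInlineSize
               ? StoragePath::HeapHolder
               : (std::is_trivially_copyable<F>::value
                      ? StoragePath::TrivialInline
                      : StoragePath::NonTrivialInline);
}

// Example callables standing in for closures of different shapes.
struct SmallCapture {            // e.g. a lambda capturing one int
    int value;
    void operator()() const {}
};
struct OwningCapture {           // small, but its copy must run a constructor
    std::shared_ptr<int> owner;
    void operator()() const {}
};
struct LargeCapture {            // 80 bytes of state, over the inline budget
    char bytes[80];
    void operator()() const {}
};

static_assert(classify<void (*)()>() == StoragePath::TrivialInline, "free fn");
static_assert(classify<SmallCapture>() == StoragePath::TrivialInline, "small");
static_assert(classify<OwningCapture>() == StoragePath::NonTrivialInline, "owning");
static_assert(classify<LargeCapture>() == StoragePath::HeapHolder, "large");
```

Only the last case would touch the heap under this model; the first three stay entirely inside the inline buffer.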

The investigation we want

Q1: Is shared_ptr the right primitive for the heap fallback?

Alternatives to evaluate:

  • unique_ptr<F> semantics: copy ctor would need to deep-copy F (heap-allocate a new F and copy-construct it). Pro: no refcount overhead. Con: copy is now O(sizeof(F)) — for a 256 B closure being copied a few times that's hundreds of bytes of memcpy work and several new/delete cycles.

  • Refcounted-but-no-atomics holder: shared_ptr uses atomic refcount operations. On single-threaded MCUs (everything except ESP32 dual-core), atomics are wasted overhead. A fl::single_threaded_shared<F> with a plain int refcount would skip the atomic, but this matters only on AVR / Cortex-M0 (no native atomics, so they become libcalls); on M3+ with native LDREX/STREX the cost is negligible.

  • Intrusive refcount: place the refcount inside F's allocation. Same complexity as shared_ptr, slightly smaller per-object overhead. Probably not worth the diff.

  • Hybrid (unique_ptr for non-copyable fl::functions, shared_ptr for copyable): would require splitting fl::function into fl::move_only_function<Sig> (C++23 style) and the existing fl::function<Sig> — a bigger API churn.
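
To make the refcounted-but-no-atomics option concrete, here is a minimal sketch of such a holder. `SingleThreadedShared` is a hypothetical class written for this illustration, not an existing FastLED type; it keeps the count in the same allocation as F, so it also shows roughly what the intrusive variant would look like. It is not safe to copy from multiple threads or from an ISR.

```cpp
#include <cassert>
#include <utility>

// Hypothetical holder: plain int refcount stored next to the callable.
template <typename F>
class SingleThreadedShared {
    struct Block {
        int refs;
        F value;
        explicit Block(F v) : refs(1), value(std::move(v)) {}
    };
    Block* block_;

    void release() {
        // Non-atomic decrement; frees the block when the last owner goes away.
        if (block_ && --block_->refs == 0) delete block_;
        block_ = nullptr;
    }

public:
    explicit SingleThreadedShared(F f) : block_(new Block(std::move(f))) {}

    // Copy: bump the plain int, no allocation.
    SingleThreadedShared(const SingleThreadedShared& other) : block_(other.block_) {
        if (block_) ++block_->refs;
    }
    // Move: steal the pointer, leave the source empty.
    SingleThreadedShared(SingleThreadedShared&& other) noexcept : block_(other.block_) {
        other.block_ = nullptr;
    }
    // Copy-and-swap assignment covers both copy and move assignment.
    SingleThreadedShared& operator=(SingleThreadedShared other) noexcept {
        std::swap(block_, other.block_);
        return *this;
    }
    ~SingleThreadedShared() { release(); }

    F& get() const { return block_->value; }
    int use_count() const { return block_ ? block_->refs : 0; }
};

// Copies share one heap block; returns the owner count seen after one copy.
inline int use_count_after_copy() {
    SingleThreadedShared<int> first(42);
    SingleThreadedShared<int> second = first;
    assert(&first.get() == &second.get());
    return first.use_count();
}

// A moved-from holder owns nothing; returns the source's count after a move.
inline int use_count_after_move() {
    SingleThreadedShared<int> first(42);
    SingleThreadedShared<int> second = std::move(first);
    assert(second.use_count() == 1);
    return first.use_count();
}
```

Whether this is worth carrying depends on the point made in the bullet above: the saving is only meaningful on targets where atomic operations turn into library calls.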

Q2: Can we avoid the heap entirely for "typical" closures?

Audit: how many fl::function instantiations in the codebase actually capture > 64 B of state? The UIButton::onClicked case captures a fl::function<void()> by value (~80 B). Are there others? If the answer is "very few", the heap path may be cold enough that the shared_ptr cost is fine.
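
As a rough aid for that audit, the sketch below shows why capturing a function object by value tends to exceed the budget. Everything here is an assumption made for illustration: `FakeFunction` is a hypothetical layout (a 64-byte buffer plus two pointers), not the actual fl::function layout, and `kInlineBudget` and `exceeds_inline_budget` are made-up names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for FASTLED_INLINE_LAMBDA_SIZE.
constexpr std::size_t kInlineBudget = 64;

// Hypothetical function-object layout: inline buffer + invoker + manager.
struct FakeFunction {
    alignas(void*) unsigned char sbo[64];
    void (*invoke)(const void*);
    void (*manage)(void*, const void*);
};

// Reports whether a closure object is larger than the inline budget.
template <typename F>
bool exceeds_inline_budget(const F&) {
    return sizeof(F) > kInlineBudget;
}

// A closure that captures a whole function object by value carries the
// 64-byte buffer plus the two pointers, so it lands over the budget.
inline bool nested_capture_exceeds_budget() {
    FakeFunction inner{};
    auto on_click = [inner]() { (void)inner; };
    return exceeds_inline_budget(on_click);
}

// A closure capturing a single int stays well inside the budget.
inline bool small_capture_exceeds_budget() {
    int counter = 1;
    auto tick = [counter]() { (void)counter; };
    return exceeds_inline_budget(tick);
}
```

A check like this could be dropped next to suspicious call sites during the audit; it reports closure size only and says nothing about how often the path is actually hit at runtime.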

Q3: Does the copy ctor "do the right thing" today?

For the user-stated goal — "not copy heap memory whenever possible":

  • ✅ Empty, free-fn, trivially-copyable-lambda-in-SBO, member-fn: zero heap touch.
  • ✅ Non-trivial-lambda-in-SBO: zero heap touch.
  • ⚠️ Over-SBO lambda: copy increments a refcount (one atomic op; negligible on cores with native atomics, a libcall on AVR / Cortex-M0 as noted under Q1). NO new heap allocation. Two fl::function copies share one heap-allocated F.

The "right thing" semantically — copying an fl::function should not allocate — IS happening even for the heap-fallback case. What's at issue is whether the initial construction of an over-SBO fl::function should heap-allocate, or whether we should aggressively force users to keep captures under 64 B.
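
The difference between the shared and deep-copy behaviors can be checked with a small counting experiment. This is a sketch using std::shared_ptr and std::unique_ptr as stand-ins for the fl:: types; `Capture`, `SharedHolder`, and `DeepCopyHolder` are hypothetical names, and the construction counter is a proxy for heap allocations of F.

```cpp
#include <cassert>
#include <cstring>
#include <memory>

// Stand-in for an 80-byte closure; counts how many instances get constructed.
struct Capture {
    static int constructions;
    char payload[80];
    Capture() : payload{} { ++constructions; }
    Capture(const Capture& other) {
        std::memcpy(payload, other.payload, sizeof(payload));
        ++constructions;
    }
};
int Capture::constructions = 0;

// shared_ptr-style holder: a copy shares the one heap-allocated Capture.
struct SharedHolder {
    std::shared_ptr<Capture> ptr;
};

// unique_ptr-style holder: a copy must allocate and copy-construct a new one.
struct DeepCopyHolder {
    std::unique_ptr<Capture> ptr;
    DeepCopyHolder() : ptr(new Capture()) {}
    DeepCopyHolder(const DeepCopyHolder& other) : ptr(new Capture(*other.ptr)) {}
};

// Number of Capture objects built when a shared holder is copied once.
inline int constructions_for_shared_copy() {
    Capture::constructions = 0;
    SharedHolder first{std::make_shared<Capture>()};
    SharedHolder second = first;  // refcount increment only
    assert(first.ptr.get() == second.ptr.get());
    return Capture::constructions;
}

// Number of Capture objects built when a deep-copy holder is copied once.
inline int constructions_for_deep_copy() {
    Capture::constructions = 0;
    DeepCopyHolder first;
    DeepCopyHolder second = first;  // second allocation plus an 80-byte copy
    assert(first.ptr.get() != second.ptr.get());
    return Capture::constructions;
}
```

With the shared holder one Capture is built for two holders; with the deep-copy holder each copy builds another. Note the semantic side effect as well: shared copies observe each other's mutations of captured state, deep copies do not.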

Q4: Should we expose an opt-out for users who can't tolerate heap?

A FL_FUNCTION_NO_HEAP_FALLBACK macro could make over-SBO callables a static_assert failure. Users could then choose per-target whether to allow heap fallback or fail fast.
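
One way the opt-out might look is sketched below. The macro name comes from this issue; everything else (`kInlineLambdaSize`, `fits_inline`, `store_callable`) is a hypothetical placeholder, and the real constructor would do the actual placement-new or HeapHolder wrapping where the comment indicates.

```cpp
#include <cstddef>
#include <type_traits>

// Proposed opt-out switch; enabled here so the sketch exercises the check.
#ifndef FL_FUNCTION_NO_HEAP_FALLBACK
#define FL_FUNCTION_NO_HEAP_FALLBACK 1
#endif

// Hypothetical stand-in for FASTLED_INLINE_LAMBDA_SIZE.
constexpr std::size_t kInlineLambdaSize = 64;

// True when F can live in the inline buffer (size check only in this sketch).
template <typename F>
struct fits_inline
    : std::integral_constant<bool, sizeof(F) <= kInlineLambdaSize> {};

// Hypothetical stand-in for the fl::function constructor taking a callable.
template <typename F>
void store_callable(F&&) {
    using Callable = typename std::decay<F>::type;
#if FL_FUNCTION_NO_HEAP_FALLBACK
    static_assert(fits_inline<Callable>::value,
                  "callable exceeds the inline buffer and heap fallback is disabled");
#endif
    // Real code would placement-new Callable into the SBO here, or wrap it
    // in HeapHolder<Callable> when the fallback is allowed.
}

// Compiles: a one-int capture fits in the inline buffer.
inline void store_small_example() {
    int brightness = 128;
    store_callable([brightness]() { (void)brightness; });
}

// Would fail to compile with the macro enabled (80 bytes of captured state):
//   char big[80] = {};
//   store_callable([big]() { (void)big; });
```

Failing at compile time keeps the decision per-target, as described above, and points the user at the offending call site instead of surfacing as unexpected heap use at runtime.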

Acceptance criteria for closing this issue

  • Audit src/ + examples/ for fl::function<Sig> instantiations whose captured state exceeds 64 B. Document the count and the largest example.
  • Measure heap-allocation count + RAM for a known-large-capture sketch (e.g. one that uses UIButton::onClicked heavily) before/after the Option B refactor.
  • Decide: keep shared_ptr, switch to unique_ptr-style deep-copy, or split into move-only vs copyable variants.
  • If keeping shared_ptr: document why (covered in this issue body if the audit shows refcount cost is negligible).
  • If switching: file a follow-on PR replacing HeapHolder<F> with the chosen alternative; verify copy/move semantics on the host test suite.
  • Add a FL_FUNCTION_NO_HEAP_FALLBACK macro (or document why not) so memory-constrained users can opt out.

Related
