Samply by blp · Pull Request #6344 · feldera/feldera

blp · 2026-05-29T00:25:19Z

This adds a bunch of useful features in CPU profiles. It fixes a bug that made multihost profiling panic. It also adds node ids to CPU profiling data related to spines, so that it is clear what operators cause exchanges, merges, and other work in spines.

Describe Manual Test Plan

I ran a pipeline and captured CPU profiles a few times and looked at the output.

mythical-fred

Two blockers: a variable-name swap that produces wrong profiling output, and behavior changes without tests. See inline.

mythical-fred · 2026-05-29T02:01:43Z

+                    // Record when marker space was exhausted.  The combination
+                    // of `load` and `store` is not an atomic transaction, but
+                    // it's good enough.
+                    if MARKERS_EXHAUSTED.load().is_none() {
+                        MARKERS_EXHAUSTED.store(Some(Timestamp::now()));
+                    }


The load-then-store race is acknowledged and is fine for this use case. However, this marker-overflow path has no test. A unit test that fills the block capacity and verifies MARKERS_EXHAUSTED gets set (and that the resulting Capture::finish() annotations contain the overflow marker) would be valuable.

mihaibudiu

I think Leonid is indeed the right person to review this.

mihaibudiu · 2026-05-29T04:59:42Z

+/// An operator doesn't initially know its global node ID, but the global node
+/// ID is useful for debugging and profiling.  This type allows the name to
+/// initially omit the ID but adds it when it becomes available.
+pub struct OperatorName(ArcSwap<String>);


why not keep the name and the id separate, and just concatenate them on Display?

I think that would work too.

There are a few reasons I went with the current approach:

I wanted to minimize the space and time overheads (this is just a single pointer and a single atomic increment/decrement) because the names end up getting copied around a bit and most of the time they won't get used.

From memory (I didn't go back and recheck), the names also end up getting used in places where there's already some other string type and it's not necessarily convenient to do a conversion at that point.

Really I wish that the global node id was available at construction instead of sometime later, it would simplify a lot here.

ryzhyk

Haven't read every line. Adding operator names to batches is a neat improvement.

mythical-fred

Re-APPROVE after the blocker fix in 8df3120 / 2b83ea4.

spine_async.rs backpressure_wait: the swapped (initial_batches, name) destructuring is gone. New shape is much cleaner — single let (name, initial_batches); initialised under the lock with an early-return for the already-drained case, then a tight wait loop, then unconditional Span recording. tooltip format string now reads correctly.
exchange.rs / sharded_accumulator.rs: receiver_global_node_id.get().map_or("Exchange"/"ShardedAccumulator", ...) replaces the panicking .unwrap() in fn name() — addresses the multihost CPU profiling crash directly.
samply: marker-overflow recording lands as in-profile markers + const fn new cleanup; AtomicOptionTimestamp tightened.
sharded_accumulator tests: test_config helper + with_streaming_exchange(true) threaded through single-host and multihost test_circuit paths.

Non-blocking: still no unit test directly asserting the new fn name() fallback string for the unset-OnceCell branch, and the samply marker-overflow in-profile recording is exercised only at integration time. Both are reasonable to defer.

mythical-fred

Re-review of 636fee3: rebased onto a new base; the only substantive change vs the previously approved aef0805 is one line in js-packages/web-console/vite.config.ts — adding 'fancy-ansi > escape-html' to testOptimizeDepsInclude. The remaining 111 files are byte-identical to the prior tip. LGTM.

Sometimes `receiver_global_node_id` can be uninitialized on multihost pipelines. The reason isn't entirely clear to me. Signed-off-by: Ben Pfaff <blp@feldera.com>

DBSP doesn't use `arcstr` so there is no need to use the `size-of` feature for it. feldera-sqllib does use it, so enable it there. Signed-off-by: Ben Pfaff <blp@feldera.com>

Signed-off-by: Ben Pfaff <blp@feldera.com>

… out. Before, we just logged a message. Signed-off-by: Ben Pfaff <blp@feldera.com>

Signed-off-by: feldera-bot <feldera-bot@feldera.com>

blp requested a review from ryzhyk May 29, 2026 00:25

blp self-assigned this May 29, 2026

blp added bug Something isn't working DBSP core Related to the core DBSP library rust Pull requests that update Rust code profiler Issues related to the profiler and its APIs metrics Metrics about feldera pipelines multihost Related to multihost or distributed pipelines labels May 29, 2026

mythical-fred suggested changes May 29, 2026

View reviewed changes

mihaibudiu reviewed May 29, 2026

View reviewed changes

blp force-pushed the samply branch from fb1c218 to 8df3120 Compare May 29, 2026 17:03

ryzhyk approved these changes May 29, 2026

View reviewed changes

mythical-fred approved these changes May 29, 2026

View reviewed changes

blp force-pushed the samply branch from 701aaa2 to 0177c52 Compare June 2, 2026 17:02

blp enabled auto-merge June 2, 2026 17:02

blp added this pull request to the merge queue Jun 2, 2026

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 2, 2026

blp added this pull request to the merge queue Jun 2, 2026

mythical-fred approved these changes Jun 2, 2026

View reviewed changes

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Jun 2, 2026

blp added 4 commits June 2, 2026 15:34

[dbsp] Fix panic in CPU profiling multihost pipelines.

ee41d29

Sometimes `receiver_global_node_id` can be uninitialized on multihost pipelines. The reason isn't entirely clear to me. Signed-off-by: Ben Pfaff <blp@feldera.com>

[dbsp] Drop unused arcstr feature from size-of dependency.

c093228

DBSP doesn't use `arcstr` so there is no need to use the `size-of` feature for it. feldera-sqllib does use it, so enable it there. Signed-off-by: Ben Pfaff <blp@feldera.com>

[dbsp] Add operator names and global node IDs to spines.

8c96e23

Signed-off-by: Ben Pfaff <blp@feldera.com>

[samply] Record in the profiles themselves when the marker space runs…

9d9038b

… out. Before, we just logged a message. Signed-off-by: Ben Pfaff <blp@feldera.com>

blp force-pushed the samply branch from 0c702e4 to 9d9038b Compare June 2, 2026 22:34

blp enabled auto-merge June 2, 2026 22:35

[ci] apply automatic fixes

4e8d156

Signed-off-by: feldera-bot <feldera-bot@feldera.com>

blp added this pull request to the merge queue Jun 2, 2026

github-merge-queue Bot removed this pull request from the merge queue due to no response for status checks Jun 3, 2026

blp added this pull request to the merge queue Jun 3, 2026

Merged via the queue into main with commit 07ce4ee Jun 3, 2026
2 checks passed

blp deleted the samply branch June 3, 2026 17:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Samply#6344

Samply#6344
blp merged 5 commits into
mainfrom
samply

blp commented May 29, 2026

Uh oh!

mythical-fred left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mythical-fred May 29, 2026

Uh oh!

mihaibudiu left a comment

Uh oh!

Uh oh!

mihaibudiu May 29, 2026

Uh oh!

blp May 29, 2026

Uh oh!

ryzhyk left a comment

Uh oh!

mythical-fred left a comment

Uh oh!

Uh oh!

mythical-fred left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

blp commented May 29, 2026

Describe Manual Test Plan

Uh oh!

mythical-fred left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mythical-fred May 29, 2026

Choose a reason for hiding this comment

Uh oh!

mihaibudiu left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mihaibudiu May 29, 2026

Choose a reason for hiding this comment

Uh oh!

blp May 29, 2026

Choose a reason for hiding this comment

Uh oh!

ryzhyk left a comment

Choose a reason for hiding this comment

Uh oh!

mythical-fred left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mythical-fred left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants