
POC: Memory profiler allocation labels#62649

Draft
rudolf wants to merge 10 commits into nodejs:main from rudolf:poc-allocation-profiler-tags-v2

Conversation

@rudolf
Contributor

@rudolf rudolf commented Apr 9, 2026

This is a POC for initial feedback. If we can get alignment within Node.js I could try to contribute the v8 changes upstream.

Summary

Adds the ability to tag sampling heap profiler allocations with string labels that propagate through async context (via CPED). This enables attributing memory usage to specific HTTP routes, tenants, or operations — something no JS runtime currently supports.

V8 changes

  • HeapProfileSampleLabelsCallback — embedder callback invoked on sampled allocations to retrieve labels from the current async context
  • AllocationProfile::Sample::labels — key-value pairs on each sample (behind V8_HEAP_PROFILER_SAMPLE_LABELS compile flag)

Datadog's Attila Szegedi proposed a similar label mechanism for CPU profiling on v8-dev (July 2025). V8 team (Leszek Swirski) indicated they would review non-invasive patches behind #ifdefs. This PR applies the same approach to heap profiling, which is simpler. Everything runs on the allocation thread with no signal-safety concerns.

Node.js changes

  • v8.withHeapProfileLabels(labels, fn) — runs fn with labels that propagate across await
  • v8.setHeapProfileLabels(labels) — sets labels for current async scope (for framework middleware patterns)
  • v8.getAllocationProfile() returns samples[].labels and per-label externalBytes (Buffer/ArrayBuffer)
  • ProfilingArrayBufferAllocator tracks external allocations per label (single atomic load overhead when disabled)

#62273 landed the SyncHeapProfileHandle API with Symbol.dispose support. The labels API proposed here is complementary: it adds context (which route/tenant) to the samples that SyncHeapProfileHandle already collects. A follow-up could integrate withHeapProfileLabels as a method on the handle.

Motivation

In multi-tenant or multi-route Node.js servers, a memory spike today tells you how much memory grew but not what caused it. Operators resort to code inspection or heap snapshots but these don't scale to collecting data over long timespans for large deployments. With labeled heap/external memory profiling, you can answer "route /api/search accounts for 400MB of the 1.2GB heap" directly from production telemetry (e.g. via OTel).

This mirrors Go's pprof.Labels capability.

Overhead

20-run benchmark (two-server realistic HTTP workload):

  • Sampling profiler alone: 0.6% (not statistically significant)
  • Sampling + labels: 2.2% total (p<0.01)
  • When disabled: zero overhead (no code path changes)

Test plan

  • V8 cctests for label callback
  • JS tests: label propagation across await, concurrent contexts, setHeapProfileLabels, external memory tracking, GC cleanup
  • Micro and macro benchmarks in benchmark/v8/ and benchmark/http/

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/performance
  • @nodejs/security-wg
  • @nodejs/v8-update

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Apr 9, 2026
Comment thread src/node_v8.cc Outdated
// This happens when --experimental-async-context-frame is not set on
// Node.js 22, causing all contexts to map to Smi::zero() (address 0).
if (cped.IsEmpty() || cped->IsUndefined()) return;
uintptr_t addr = node::GetLocalAddress(cped);
Member

Storing in binding data keyed by the CPED address won't work, because all AsyncLocalStorage contexts are combined into a single AsyncContextFrame map: any change to any context changes this value, even if the particular store you are interested in has not changed at all within that map frame.

You would need to have V8 capture the CPED value at the time of the sample and store it on the heap profile itself alongside the samples, then use that actual AsyncContextFrame instance to look up the corresponding data in that frame for the label store.

@rudolf rudolf force-pushed the poc-allocation-profiler-tags-v2 branch from 8743634 to 302bebe on April 10, 2026 09:22
Contributor

@szegedi szegedi left a comment

Very interesting, I had to take a look since I was mentioned in the PR description itself 😄. I generally like the direction this is going in, long term we can probably replace Datadog's heap profiler that directly wraps V8 heap profiler with this and reduce our maintenance surface. This solution indeed looks like it can only be implemented in Node.js itself, and not as an add-on since the v8::ArrayBuffer::Allocator instance is a global Isolate::CreateParams setting so it's controlled by the embedder, that is, Node.js.

v8::Local<v8::Value> context =
    v8_isolate->GetContinuationPreservedEmbedderData();
if (!context.IsEmpty() && !context->IsUndefined()) {
  sample->cped.Reset(v8_isolate, context);
Contributor

Can't you get the value associated with the AsyncLocalStorage from the AsyncContextFrame here and only store that? Essentially the additional step from BuildSamples. Otherwise you'll be retaining in memory all ALSes this ACF references as keys, and some might have large retained sets themselves. If you can safely call GetContinuationPreservedEmbedderData (which creates a v8::Local), I'd think you can also safely get a local to the ALS key from its global and call v8::Map::Get on context too? (Or direct V8 hashmap reading, like I suggested in that other comment in ProfilingArrayBufferAllocator::FindCurrentLabels.)

Contributor Author

Great point about memory retention. You're right that storing the entire CPED keeps all ALS stores alive as long as the sample exists, not just the labels store 💣 💥

OrderedHashMap::FindEntry is a great suggestion!

Comment thread src/api/environment.cc Outdated
// BackingStore::Allocate inside the ArrayBuffer constructor).
// Use AsArray() which reads the internal backing store directly without
// calling JS builtins, then iterate entries by identity comparison.
v8::Local<v8::Array> entries = frame->AsArray();
Contributor

What's the memory requirement of this AsArray call? It sounds like it'd have to construct a whole new array.
You're lucky that you have a whole embedded copy of V8, so you can use existing internals; something roughly like this might work:

#include "src/objects/js-collection.h"
#include "src/objects/ordered-hash-table.h"

// Given a v8::Local<v8::Map>, get to the internal table:
i::Tagged<i::JSMap> js_map = *Utils::OpenDirectHandle(*frame);
i::Tagged<i::OrderedHashMap> table = i::Cast<i::OrderedHashMap>(js_map->table());

// no-JS lookup in the table:
i::InternalIndex entry = table->FindEntry(isolate, *Utils::OpenDirectHandle(*als_key));
if (entry.is_found()) {
  i::Tagged<i::Object> value = table->ValueAt(entry);
  // go back from Tagged to Local:
  v8::Local<v8::Value> val = Utils::ToLocal(i::direct_handle(value, i_isolate));

  // use val as before...
}

Contributor Author

Yeah, AsArray() allocates a full JS Array and copies the Map backing store. I moved away from AsArray() to Map::Get() in an unpushed revision, but your OrderedHashMap::FindEntry approach would be even better.

Since we're already modifying V8 source in deps/v8/src/profiler/ and have access to internals from src/ as well, I can adopt this pattern in both locations:

  1. SampleObject (allocation time) — extract ALS value, store as Global on Sample
  2. ProfilingArrayBufferAllocator::TrackAllocate — same pattern for ArrayBuffer tracking

The read-time callback in GetAllocationProfile then receives the already-extracted flat array, making it trivial (just string conversion).
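
The "pre-flattened at set time" idea (also mentioned in the commit messages) can be sketched in JS: the labels object is converted once, when set, into an interleaved [key, value, ...] array, so the profiler's read-time callback only walks plain array slots and never performs V8 property access on a user object. The helper names here are illustrative, not the PR's actual functions.

```javascript
// Hypothetical helper: flatten a labels object once, at set time, into an
// interleaved [key1, value1, key2, value2, ...] array of strings. Reading
// plain array elements later avoids property lookups (and any getters or
// proxy traps on the user's object) during sample resolution.
function flattenLabels(labels) {
  const flat = [];
  for (const [key, value] of Object.entries(labels)) {
    flat.push(String(key), String(value));
  }
  return flat;
}

// Reverse direction, as a read-time consumer would reconstruct the pairs:
function unflattenLabels(flat) {
  const labels = {};
  for (let i = 0; i + 1 < flat.length; i += 2) {
    labels[flat[i]] = flat[i + 1];
  }
  return labels;
}

const flat = flattenLabels({ route: '/api/search', tenant: 'acme' });
```

Stringifying at set time also fixes the cost up front: a label set with surprising values (numbers, objects with toString) is normalized once rather than on every sample.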

@@ -11,6 +11,11 @@
#include <unordered_set>
#include <vector>

#ifdef V8_HEAP_PROFILER_SAMPLE_LABELS
Contributor

So I presume this'll need upstreaming, right?

Contributor Author

Yeah, I'm hoping that if we can show Node.js would definitely use this feature, they'd be more open to accepting it.

Contributor Author

@rudolf rudolf left a comment

> Very interesting, I had to take a look since I was mentioned in the PR description itself 😄. I generally like the direction this is going in, long term we can probably replace Datadog's heap profiler that directly wraps V8 heap profiler with this and reduce our maintenance surface.

@szegedi Thanks for popping by! Your thread sparked this idea and made me think maybe it's not all that hard (at least for memory profiling, sounds like CPU profiles might be a different beast).


rudolf added 10 commits April 15, 2026 16:15
Add V8 API for attaching embedder-defined labels to sampling heap
profiler samples. Labels propagate through async context via CPED
and are resolved at profile-read time.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
Add Node.js C++ bindings that wire up the V8 sample labels API.
Handles callback registration, ALS key setup, and cleanup on worker
termination.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
Add JS API for labeling heap profiler samples via AsyncLocalStorage.
Labels are pre-flattened at set time to avoid V8 property access
during resolution.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
Cover basic labeling, multi-key labels, async propagation, worker
cleanup, and C++ callback tests.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
Document the new labels API on startSamplingHeapProfiler,
getAllocationProfile, withHeapProfileLabels, and
setHeapProfileLabels.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
Measure per-allocation overhead and HTTP throughput impact across
profiling modes.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
Expose a public API to look up ALS values from the CPED map, so the
allocator can resolve labels without duplicating V8 internal logic.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
Track per-label Buffer/ArrayBuffer backing store allocations. The
allocator is installed as a delegate when profiling is active, with
zero overhead otherwise.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
Cover per-label attribution, GC cleanup, multi-key labels, and JSON
serialization for externalBytes.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
Document the externalBytes field in getAllocationProfile output.

Signed-off-by: Rudolf Meijering <skaapgif@gmail.com>
@rudolf rudolf force-pushed the poc-allocation-profiler-tags-v2 branch from 302bebe to da6af28 on April 17, 2026 20:12

4 participants