stream: rewrite WHATWG Stream in C++ #63872

Draft
mcollina wants to merge 45 commits into nodejs:main from mcollina:webstreams-cpp

mcollina commented Jun 12, 2026

This PR is an experiment to see if moving WHATWG Stream to C++ is viable. This was done by a combination of Claude Opus 4.8 and Fable 5. I will add a full review guide later on.

This is not ready for review, but I'm opening it to gauge the feeling around this sort of change and to consider whether we want to move it forward. My C++ is not great, so I am likely to miss a lot of things.

These are the benchmarks on my machine against main:

| Benchmark | Configuration | Improvement vs JS impl | Confidence |
| --- | --- | --- | --- |
| pipe-to | 16 HWM configs (512…4096 × 512…4096) | +175.7…+184.5% | *** (16/16) |
| read-buffered | bufferSize=1 | +61.2% | *** |
| read-buffered | bufferSize=10 | +86.2% | *** |
| read-buffered | bufferSize=100 | +84.5% | *** |
| read-buffered | bufferSize=1000 | +92.1% | *** |
| async-iterator | n=100k | +77.0% | *** |
| readable-read | byob | +57.9% | *** (±3.0%) |
| readable-read | normal | +38.2% | *** |
| js_transfer | ReadableStream | +3.6% | *** |
| js_transfer | TransformStream | +3.0% | ** |
| js_transfer | WritableStream | +0.7% (parity) | n.s. |
| creation | WritableStream | +108.5% (n=500k) | *** |
| creation | TransformStream | +96.4% (n=500k) | *** |
| creation | ReadableStream | +6.9% (n=500k) | *** |
| creation | ReadableStream.tee | +22.8% (n=500k) | *** |
| creation | ReadableStreamDefaultReader | +24.9% (n=500k) | *** |
| creation | ReadableStreamBYOBReader | +25.0% (n=500k) | *** |

mcollina and others added 30 commits June 8, 2026 20:00
Phase 1 + Phase 2 (C++ object model) of the WHATWG Streams rewrite from JS to
C++.

Phase 1: register a new `webstreams` internal binding
(src/streams/streams_binding.{h,cc}) as a SnapshotableObject, wired into all
binding registries and node.gyp. Builds, snapshots, and loads with no behavior
change.

Phase 2 (object model): src/streams/readable_stream.{h,cc} implements
ReadableStream, ReadableStreamDefaultController and ReadableStreamDefaultReader
in C++:
- value queue is a single JS Array (no per-chunk v8::Global) plus a C++
  std::deque<double> for sizes (no Number boxing);
- pending reads are a C++ deque of Promise::Resolver, resolved directly with no
  .then chains or read-request objects;
- stream<->controller<->reader relationships are stored in GC-traced internal
  fields, with all objects MakeWeak'd (applied after MakeBaseObject) so they
  live and die with their JS wrappers, mirroring the JS object graph's lifetime;
- a SizeMode enum recognizes the built-in CountQueuingStrategy /
  ByteLengthQueuingStrategy so the per-chunk size() crossing into JS is skipped.

The model is exercised in isolation via createReadableStream /
acquireReadableStreamDefaultReader binding entry points (pull/enqueue/close/read,
parked-read-then-fulfill, cancel, desiredSize backpressure, error propagation).
It is not yet wired to the public API, so existing tests are unaffected.
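
To make the queue layout above concrete, here is a minimal standalone sketch of a chunk queue that keeps sizes in a parallel `std::deque<double>` and uses a `SizeMode` tag to skip the size callback for the built-in strategies. It is an illustration only: `SizedQueue`, `Chunk`, and the enumerator names are hypothetical, and a plain C++ deque stands in for the JS Array that holds the values in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical names; the real classes live in src/streams/ and differ.
enum class SizeMode { kCount, kByteLength, kCustom };

struct Chunk {
  std::string bytes;  // stand-in for a JS value
};

class SizedQueue {
 public:
  SizedQueue(SizeMode mode, double high_water_mark,
             std::function<double(const Chunk&)> custom_size = nullptr)
      : mode_(mode), hwm_(high_water_mark), custom_size_(std::move(custom_size)) {
    assert(mode_ != SizeMode::kCustom || custom_size_);
  }

  void Enqueue(Chunk chunk) {
    const double size = SizeOf(chunk);
    chunks_.push_back(std::move(chunk));
    sizes_.push_back(size);  // raw double, no boxed Number
    total_ += size;
  }

  Chunk Dequeue() {
    assert(!chunks_.empty());
    Chunk chunk = std::move(chunks_.front());
    chunks_.pop_front();
    total_ -= sizes_.front();
    sizes_.pop_front();
    if (total_ < 0) total_ = 0;  // the spec clamps rounding errors to zero
    return chunk;
  }

  double DesiredSize() const { return hwm_ - total_; }
  int custom_calls() const { return custom_calls_; }

 private:
  double SizeOf(const Chunk& chunk) {
    switch (mode_) {
      case SizeMode::kCount:
        return 1.0;  // CountQueuingStrategy: no callback needed
      case SizeMode::kByteLength:
        return static_cast<double>(chunk.bytes.size());
      case SizeMode::kCustom:
        ++custom_calls_;  // only this path would cross into JS
        return custom_size_(chunk);
    }
    return 1.0;
  }

  SizeMode mode_;
  double hwm_;
  std::function<double(const Chunk&)> custom_size_;
  std::deque<Chunk> chunks_;
  std::deque<double> sizes_;
  double total_ = 0;
  int custom_calls_ = 0;
};
```

With `SizeMode::kByteLength` and a high-water mark of 10, enqueueing a 4-byte and a 2-byte chunk leaves a desired size of 4 without ever invoking the custom size function.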

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the byte-stream half of the WHATWG ReadableStream rewrite in C++,
compiled-but-unwired (the public JS API still uses the JS implementation; the
unified flip is the next step).

Adds a shared StreamBaseObject base carrying a 1-byte Kind tag so a controller
or reader recovered from a GC-traced internal field can be safely downcast
(BaseObject::FromJSObject is an unchecked static_cast and RTTI is disabled).
ReadableStream now holds either a default or a byte controller, and either a
default or a BYOB reader, dispatched by tag.
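
The tag-checked downcast described above can be sketched without V8 as follows. This is a hedged illustration, not the PR's code: `TryCast`, `kKind`, the enumerator names, and the empty controller classes are assumptions; only `StreamBaseObject` and the 1-byte `Kind` tag come from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// One byte per object, checked before any static_cast (no RTTI required).
enum class Kind : std::uint8_t { kDefaultController, kByteController };

class StreamBaseObject {
 public:
  explicit StreamBaseObject(Kind kind) : kind_(kind) {}
  Kind kind() const { return kind_; }

 private:
  const Kind kind_;
};

// Hypothetical stand-ins for the two controller classes.
class DefaultController : public StreamBaseObject {
 public:
  static constexpr Kind kKind = Kind::kDefaultController;
  DefaultController() : StreamBaseObject(kKind) {}
};

class ByteController : public StreamBaseObject {
 public:
  static constexpr Kind kKind = Kind::kByteController;
  ByteController() : StreamBaseObject(kKind) {}
};

// Returns nullptr on a tag mismatch instead of performing an unchecked cast.
template <typename T>
T* TryCast(StreamBaseObject* base) {
  if (base == nullptr || base->kind() != T::kKind) return nullptr;
  return static_cast<T*>(base);
}
```

A caller that recovered a `StreamBaseObject*` from an internal field would try `TryCast<ByteController>(ptr)` first and fall back to `TryCast<DefaultController>(ptr)`, which is one way to dispatch by tag.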

New classes in src/streams/readable_stream.{h,cc}:
- ReadableByteStreamController: zero-copy byte queue (deque of owned
  BackingStores; enqueue transfers the view's ArrayBuffer via GetBackingStore +
  Detach) and pending pull-into descriptors filled/committed with pure-C++
  memcpy. The user-visible view is materialized once at fulfillment.
- ReadableStreamBYOBReader and ReadableStreamBYOBRequest.

Binding entries createReadableByteStream / acquireReadableStreamBYOBReader.
Default readers can read byte streams (PullSteps + autoAllocateChunkSize).

Validated by an isolation test (default-reader-on-bytes, BYOB respond /
respondWithNewView, autoAllocate, BYOB-on-closed, enqueue-fills-pending-BYOB,
minimumFill, partial-fill remainder, cancel, error, desiredSize) plus the
existing public suite (test-whatwg-readablebytestream*, test-webstream*,
20 files) — all green; snapshot builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the WritableStream half of the WHATWG streams rewrite in C++,
compiled-but-unwired (the public JS API still uses the JS implementation; the
unified readable+writable+transform flip is the next step).

New src/streams/writable_stream.{h,cc} with WritableStream,
WritableStreamDefaultWriter, and WritableStreamDefaultController. The full
erroring/abort/in-flight state machine is ported faithfully:
- writeRequests as a deque of promise resolvers; close / inFlightWrite /
  inFlightClose as resolver slots (empty == the spec's "undefined" request);
  pendingAbort tracked on the stream.
- A PromiseSlot type models the writer's ready/closed {promise,resolve,reject},
  including the settled-without-resolver case.
- The close sentinel is modeled by a close_queued_ flag (close is always the
  last queue entry).
- The controller's AbortController is passed in from JS at setup; the signal
  getter and abort() delegate to it. size_mode_ keeps the built-in queuing
  strategy fast path.

The shared StreamBaseObject base, Kind tag, and SizeMode move into
streams_binding.h so readable and writable share them.

Binding entries createWritableStream / acquireWritableStreamDefaultWriter.
Validated by an isolation test (write+close, backpressure/desiredSize, size
strategy, close-flushes-queued-writes, abort, controller.error,
write-algorithm-rejects, signal, releaseLock, getWriter-locked) plus the
existing public suite (38 files incl. writablestream/transformstream/adapters/
transfer/compression) — all green; snapshot builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the TransformStream half of the WHATWG streams rewrite in C++,
compiled-but-unwired (the unified readable+writable+transform flip is next).

New src/streams/transform_stream.{h,cc} with TransformStream and
TransformStreamDefaultController. TransformStream is pure orchestration over the
C++ readable+writable halves: it builds a C++ readable and writable via the
extracted helpers NewReadableStream / NewWritableStream (refactored out of the
readable/writable binding entries) and wires their controller algorithms with
small JS trampolines (created once at setup, carrying the transform stream as
their Data) that call back into C++ sink/source algorithms. Continuations that
must capture chunk/reason use a 2-element holder Array as the reaction Data.

Ported faithfully: backpressure coordination (backpressureChange promise,
SetBackpressure / UnblockWrite), SinkWrite (await backpressure then transform),
SinkClose / SinkAbort / SourceCancel (finishPromise + flush/cancel algorithms),
SourcePull, controller enqueue/error/terminate/performTransform, and the shared
start promise. The transform controller drives the readable/writable through
their already-public C++ operations.

Binding entry createTransformStream. Validated by an isolation test (identity,
mapping transform, flush-enqueue + close, terminate, controller.error,
transform-throws, desiredSize) plus the public suite (38 files) and all four
stream isolation suites — green; snapshot builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds ExposeReadableStreamConstructors / ExposeWritableStreamConstructors /
ExposeTransformStreamConstructors, called from CreatePerContextProperties, so the
JS layer can obtain the 11 native constructor functions (and their prototypes)
from the `webstreams` binding. The native prototypes are already spec-shaped:
methods and accessor getters are enumerable + configurable and the constructor
.name is correct, so the JS layer only needs to add Symbol.toStringTag, the
JS-only methods, and construction wrappers when the flip wires them up.

Groundwork only: the exposed constructors are not yet used by the public API
(still the JS implementation), so behavior is unchanged. Builds + snapshot +
full public suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lands the small, additive prerequisites the unified JS flip needs, kept
compiled-but-unwired so the tree stays green:

- C++: expose internal introspection over the existing (already-settled)
  stream state + closed-promise infrastructure as binding functions
  (readableStreamStateField/Disturbed/ClosedPromise,
  writableStreamStateField/ClosedPromise) rather than prototype properties,
  so the public WebIDL surface is unchanged. These back the node:stream
  interop hooks (kIsDisturbed/kIsErrored/kIsReadable/kIsWritable/
  kIsClosedPromise) once the JS layer is flipped.
- C++: fix ReadableStream/WritableStream closed_promise() to settle
  immediately when the stream is already in a terminal state, so a late
  requester (e.g. node:stream finished() on an already-closed stream) does
  not hang.
- JS: add util.extractSizeMode() to classify a strategy.size into the C++
  SizeMode enum, and export the built-in queuing-strategy size functions so
  they can be recognized by identity (skipping the per-chunk size() cross).

WPT streams: 69 passed / 1403 subtests / 0 unexpected. Public suite green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewrites lib/internal/webstreams/{readablestream,writablestream,transformstream}
.js as thin wrappers whose instances ARE the native C++ objects, plus
transfer.js deferred-port support. The build is green and the public classes
work; WPT streams went from ~250+ failures to 24 unexpected + 1 hang.

This is an intentional checkpoint commit to preserve in-progress work — the
conformance bar (full WPT + parallel suite) is NOT yet met, so this must not be
treated as complete. Remaining failures are catalogued in the plan file
(pipeTo error/abort priority, byte tee hang, byte read detach, transform size
propagation, then-interception, tee error propagation, patched-global).

Highlights of fixes already in this commit:
- transformstream requires readable/writable so C++ transform halves get the
  grafted public prototypes
- ClassifyView distinguishes real Buffer from Uint8Array via prototype
- TypeError (not Error/RangeError) for released-reader/writer, byte brand,
  min<=0, non-ArrayBufferView, non-detachable transfer
- EnqueueInternal no longer loses size() throws to a TryCatch destructor
- WritableStream::Abort re-reads state after signalling (no assert)
- closed_promise settles immediately when already terminal
- pipeTo no longer drops the last read chunk or masks the real error; uses
  new writableStreamStateField / writableStreamCloseQueuedOrInFlight accessors
- subclassing honored via new.target in all three constructors

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drives the C++ flip to full WPT conformance: `node test/wpt/test-streams.js`
is green (1404 subtests, 0 unexpected), as are the WPT compression and
encoding suites. Down from ~45 unexpected failures + a crash + a hang.

Web IDL binding conformance (idlharness): accessor getters are named
"get <prop>"; operations carry correct `.length`; Promise-returning
operations and Promise-typed attribute getters REJECT (not throw) on a
foreign receiver via a HasInstance brand check + IllegalInvocationRejection
(removing the V8 receiver signature, which would crash FromJSObject on an
arbitrary object); respond() validates its required argument.

Byte streams: default reader Release now runs the byte controller's release
steps (re-tags the first pending pull-into "none") — fixes a crash and the
autoAllocate+releaseLock cases; InvalidateBYOBRequest detaches the handed-out
view's buffer (spec transfers the descriptor buffer on respond/enqueue);
enqueue() throws when the BYOB request buffer was detached;
respondWithNewView buffer-length mismatch -> ERR_INVALID_ARG_VALUE.

tee: byte-tee no longer hangs (BYOB read-into close steps surface the empty
view so the branch read settles); readableStreamCancel uses a lock-bypassing
native helper; default tee uses guarded internal enqueue/close/error and the
internal constructor (no WebIDL parse / global lookup -> survives a patched
Object.prototype); error propagation deferred one tick so the final chunk is
enqueued before branches error.

pipeTo: spec priority-ordered shutdown checks (isOrBecomesErrored/Closed)
backed by native stored-error accessors; abort actions guarded by state;
close-with-error-propagation no-ops on an already-closed dest.

Primordial safety: internal (pipeTo/tee) reads produce null-prototype results
(forAuthorCode) so a patched Object.prototype.then can't observe piping;
async iterator uses captured original read/cancel/releaseLock.

Transfer: a BaseObject with no native transfer mode but implementing the JS
transferable protocol (markTransferMode + @@ktransfer) is driven through that
protocol in node_messaging.cc — makes the C++ stream objects structuredClone/
worker-transferable.

Transform: size() throw propagates out of enqueue (TryCatch-cancel fix);
readable/writable brand-check with ERR_INVALID_THIS.

Parallel suite: 76/85 webstream/whatwg tests pass. The 9 remaining are
internal-coupled oracle tests probing the old `kState` structure via
--expose-internals (to be rewritten against the public API) plus the
systematic native brand-check parity (native "Illegal invocation" vs
ERR_INVALID_THIS) for controller methods.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the V8 receiver signatures from every synchronous webstreams
operation/getter and brand-check internally, throwing an
ERR_INVALID_THIS-coded TypeError (via the new CheckReceiverInvalidThis
helper) instead of V8's bare "Illegal invocation". Promise-returning
operations already brand-checked and rejected; route their rejection
through ERR_INVALID_THIS too (IllegalInvocationRejection).

This matches Node's public surface, which reports ERR_INVALID_THIS for a
foreign receiver. idlharness still accepts the (still-TypeError)
behavior, so WPT stays green; the parallel test-whatwg-transformstream
controller brand-check assertions now pass.

WPT streams: 1404 subtests, 0 unexpected failures.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The flipped controller's custom inspect passed an empty data object, so
`inspect(controller, { depth: 0 })` rendered `... {}` instead of the
upstream `... [Object]`. Add an internal-only
`transformStreamControllerStream` binding and have controllerInspect
report `{ stream }`, matching the original JS implementation.

test-whatwg-transformstream now passes; WPT streams stays green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewrite the white-box assertions to the public surface: capture the
controller via start(c) for the instanceof check, and use stream's
isErrored() for the post-abort state check. Add a
writableStreamControllerStream introspection binding so the controller's
custom inspect can report `{ stream }` (matching the original JS), fixing
the depth-0 `[Object]` rendering.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two changes:

1. node_messaging: JSTransferable::NestedTransferables registered a nested
   C++ stream half (a BaseObject with no native transfer mode that
   implements the JS transferable protocol) as the raw BaseObject, while
   the serializer later writes it through its idempotent JSTransferable
   wrapper. The two never matched, so transferring a TransformStream threw
   "Object that needs transfer was found in message but not listed in
   transferList". Bridge the nested BaseObject to its JSTransferable
   wrapper, mirroring the serializer delegate. TransformStream transfer
   (postMessage/worker) now works.

2. Introspection: add a readableStreamController(stream) binding and
   readableStreamState/readableStreamStoredError +
   writableStreamState/writableStreamStoredError lib helpers, then rewrite
   test-whatwg-readablebytestream, test-whatwg-webstreams-transfer and the
   four adapters-to-* tests off the removed stream[kState] structure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pipeTo args

Rewrite the last white-box test off the removed stream[kState] structure:
- state/storedError checks -> readableStreamState/readableStreamStoredError
- post-cancel "algorithms cleared" -> observable closed state
- reader<->stream identity + readRequests.length -> the locking/rejection
  behavior already asserted alongside
- releasing the async iterator's internal reader -> new
  readableStreamReleaseReader(stream) introspection binding
- drop the two blocks that drove now-removed internal byte-controller
  helpers; replace with the equivalent public post-cancel behavior

This block was never reached before (the test used to crash on kState at
line 326), which exposed a real bug: the exported readableStreamPipeTo
called the native reader-acquire (a CHECK on receiver type) without first
validating its arguments, so readableStreamPipeTo(1) aborted instead of
rejecting. Validate source/dest up front.

All 39 webstream parallel tests pass; WPT streams/compression/encoding green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The flip's JS-transferable bridge in the message serializer dereferenced
`host_object->GetTransferMode()` unconditionally. For a transfer-list
entry whose native object was already freed — e.g. a detached
MessagePort, where `BaseObject::Unwrap` returns nullptr — this is a null
dereference and crashed (SIGSEGV in Message::Serialize). main never
dereferences there: its detached-port check leads with `!host_object ||`.

Guard the bridge sites with a `host_object`/`base` null check (transfer-list
processing and NestedTransferables; the serializer delegate path is already
null-safe via its own deref) so a freed entry falls through to the existing
detached-port error path.

Regression test test-worker-message-port-transfer-closed now passes
(was a 100% SIGSEGV); full worker/messaging suite (148) green, webstreams
parallel suite green, WPT streams 1404 subtests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two `make test` failures surfaced:

1. test-webapi-sharedarraybuffer-rejection: the C++ byte path lost the
   SharedArrayBuffer guard the JS implementation had. A SAB cannot be
   transferred/detached, so ReadableByteStreamController.enqueue() and
   ReadableStreamBYOBReader.read() must reject a SAB-backed view with
   ERR_INVALID_ARG_VALUE. Re-add the check (throw for enqueue, reject for
   read) before the buffer is otherwise used.

2. test-blob: read the blob stream's controller.desiredSize via the
   removed stream[kState].controller; switch it to the
   readableStreamController introspection binding.

WPT streams green; both tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The per-pull hot path allocated two V8 Functions on every CallPullIfNeeded
(the fulfil/reject reactions for the user pull() promise), via Function::New
per chunk. Cache them per-controller (Data == the controller wrapper),
created on first pull and reused thereafter; reset in ClearAlgorithms to
break the controller<->wrapper cycle on terminal states.

Effect (vs JS baseline, 30 runs): readable-read byob -42% -> -5%,
normal -77% -> -49%, async-iterator -81% -> -64%, pipe-to -81% -> -75%.
WPT streams (1404) + 39 webstream parallel tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Profiling the value-stream read loop showed ~30% of cycles outside V8
internals went to recovering related stream objects from GC-traced
internal fields (GetInternalField + FromJSObject per traversal, several
times per chunk) and to assembling { value, done } read results
property-by-property (re-interning the property names in the string
table and walking the map-transition machinery on every read).

- Mirror the GC-traced relationship fields (stream<->controller,
  stream<->reader/writer, controller<->byobRequest, transform halves)
  into raw C++ pointer caches updated at the exact sites the traced
  fields are written. The traced field keeps the target's wrapper alive
  while set, so a cache can never dangle; accessors become inline
  pointer reads.
- Build read results by cloning per-realm boilerplate objects (one per
  done x prototype combination, cached on the webstreams BindingData):
  Object::Clone is a flat heap copy preserving the boilerplate's map, so
  the hot path skips interning and transitions; only a non-undefined
  `value` is written, as an own-property overwrite. Note done=true
  results may carry a value (a BYOB read's partially-filled view on
  close), so the write is keyed off the value, not `done`. Boilerplates
  are built with CreateDataProperty so a patched Object.prototype
  accessor cannot corrupt them.

Value-stream read throughput improves ~70% (680k -> 1.16M reads/s on
the readable-read benchmark loop). WPT: streams 1404, compression 338;
parallel test-whatwg-*/test-webstream* and worker message tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pipeTo's per-chunk cost was dominated by allocations: a read promise,
a { value, done } result object and a microtask per chunk on the read
side, and two freshly allocated promise-reaction Functions per chunk on
the write side (the same per-pull allocation already eliminated for the
readable controller).

- Add a readableStreamFastDequeue binding that synchronously dequeues a
  buffered chunk from a default (value) stream, mirroring the buffered
  branch of the default controller's PullSteps (dequeue, then
  close-if-drained or pull). pipeTo drains buffered chunks through it
  in a tight loop - re-checking writer backpressure before each write -
  and only falls back to reader.read() when the queue is empty, so the
  promise/result machinery runs once per drain boundary instead of once
  per chunk. How chunks move through a pipe is unobservable per spec;
  this matches the previous JS implementation's internal fast path.
- Cache the writable controller's write promise-reaction functions on
  the controller (created on first write, Data == controller wrapper),
  mirroring the readable pull-reaction cache; the cycle is broken in
  ClearAlgorithms.

pipe-to throughput improves 2.2x (266k -> 592k chunks/s on the pipe-to
benchmark loop). WPT: streams 1404 green; parallel test-whatwg-*,
test-webstream*, test-stream-* green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stream construction paid two per-creation reaction Function allocations
plus a wrapped-start call for the common case of a start algorithm that
returns undefined, and acquiring a reader eagerly allocated its closed
promise resolver even though most readers never touch `closed`.

- Start fast path: when the start algorithm returns a non-object (no
  promise to await, no thenable to chase), resolve the start promise
  with the controller wrapper itself and attach a single per-realm
  cached reaction that recovers the controller from the fulfilment
  value and dispatches by kind tag. Preserves the "started in a
  microtask" timing; an object return value keeps the existing
  ThenReact slow path so a thenable's `then` is only read once.
- Readers' closed promise is now lazily materialized: acquiring a
  reader records only the settlement state (pending/resolved/rejected +
  reason, including the always-rejected post-release state); the
  resolver is created and settled on first access.

new ReadableStream() improves ~80% (200k -> 362k/s) and getReader()
~17% on the creation microbenchmarks. WPT streams 1404 and parallel
test-whatwg-*/test-webstream* green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pull and write algorithms were wrapped per stream in a JS `async`
adapter so the C++ controllers always received a promise; for the
common synchronous algorithm that meant a wrapper frame, an implicit
promise and two reaction jobs on every pull/write.

Pass the user's raw pull/write to the native controllers instead,
invoked with the underlying source/sink as receiver. The controller now
handles the four return shapes itself:
- promise: reacted to with the per-controller cached reactions
  (unchanged);
- non-object (the common synchronous case): FinishPull/OnWriteFulfilled
  is enqueued directly as a microtask via the per-controller cached
  reaction function - one API call, no promise; ordering is identical
  since CallableTask and PromiseReactionJob share the microtask queue
  (a promise-resolution round trip benchmarked slower than the JS async
  wrapper it replaced);
- thenable object: full promise resolution so its `then` is honored
  exactly once;
- synchronous throw: behaves as a rejected pull/write promise.
An absent pull/write skips the call entirely. close/cancel/abort remain
wrapped (cold paths). Callability is still validated at construction.

readable-read +12%, pipe-to +14%, creation +8% on the benchmark loops;
the value-read and pipe-to gaps vs the JS implementation shrink to
single digits. WPT: streams 1404, compression 338, encoding 3822;
parallel whatwg/webstream/stream-interop/worker suites green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- ReadableStream async iterator: hoist the per-chunk read reactions to
  per-iterator closures, return the read result (already a fresh
  { value, done } object) as the iterator result instead of re-wrapping
  it, and skip the serialization chaining promise when the previous
  request has already settled. An outstanding-request flag is set at
  call time (not when the steps run) so un-awaited next()/return()
  sequences still resolve in strict call order; return() always chains.
- TransformStream: the write/close/abort/pull/cancel trampolines are now
  shared per realm, recovering the transform stream from their receiver
  (the controllers invoke algorithms with algo_receiver, set to the
  transform's wrapper), instead of five per-creation Function
  allocations; only start keeps a per-instance Data binding since it is
  invoked with an undefined receiver. The close/abort/cancel algorithm
  invocations now pass algo_receiver as `this` (the public streams'
  wrapped algorithms ignore their receiver).

async-iterator throughput +13%, TransformStream creation gap vs the JS
implementation narrows from -44% to -25%. WPT streams 1404 (including
the async-iterator [no awaiting] ordering tests) and parallel
whatwg/webstream/stream suites green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the per-chunk fast-dequeue + writer.write() sequence with a C++
pipe pump. pipeTo's step now calls a single pipePump binding that moves
buffered chunks straight from the readable's value queue into the
writable's write queue: no read promise, no { value, done } result, and
no per-write request resolver (each write occupies an untracked request
slot). On destination backpressure the pump arms itself and every write
completion re-enters the transfer loop from C++, so a steady-state pipe
moves chunks without touching JS at all; pipeTo's step parks on a stall
promise that settles when the pump stops for any reason. While the pump
is armed, the writer.ready pending/resolve churn is suppressed (nothing
can observe pipeTo's internal writer) and re-synced on disarm.

Shutdown waits on a new flush promise (resolved once every queued and
in-flight write has settled or the stream errored) instead of per-write
promises, and disarms the pump so no chunks move after shutdown begins,
including when shutdown starts from user code running synchronously
inside the pump. The JS closeWithErrorPropagation shim gains the spec's
dest-errored branch, reachable now that fast-path write failures no
longer surface through a tracked write promise.

pipe-to benchmark: ~860k vs ~755k chunks/s (+13%) at highWaterMark 1
(previously -22% vs the JS implementation); parity at large HWMs where
user callbacks dominate.
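
The arm-and-re-enter behaviour of the pump can be modelled in a few lines. The sketch below is a deliberately simplified, single-threaded model under stated assumptions: `PipePump`, its members, and the string chunks are hypothetical, the stall and flush promises are omitted, and backpressure is reduced to a queue-length check against a high-water mark.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the pump; not the PR's actual class.
class PipePump {
 public:
  PipePump(std::deque<std::string>* source, std::size_t dest_high_water_mark)
      : source_(source), dest_hwm_(dest_high_water_mark) {}

  // Moves buffered chunks into the destination's write queue until the source
  // runs dry (the pump stops) or the destination is full (the pump arms itself).
  void Run() {
    while (!source_->empty()) {
      if (dest_queue_.size() >= dest_hwm_) {
        armed_ = true;  // backpressure: wait for a write completion
        return;
      }
      dest_queue_.push_back(std::move(source_->front()));
      source_->pop_front();
    }
    armed_ = false;  // nothing left buffered: the pump stops
  }

  // Models the sink finishing its oldest queued write.
  void OnWriteComplete() {
    assert(!dest_queue_.empty());
    written_.push_back(std::move(dest_queue_.front()));
    dest_queue_.pop_front();
    if (armed_) Run();  // re-enter the transfer loop from the completion
  }

  bool armed() const { return armed_; }
  const std::vector<std::string>& written() const { return written_; }

 private:
  std::deque<std::string>* source_;
  std::size_t dest_hwm_;
  std::deque<std::string> dest_queue_;
  std::vector<std::string> written_;
  bool armed_ = false;
};
```

With three buffered chunks and a high-water mark of 1, a single `Run()` call followed by three write completions drains the source in order, and the pump disarms once the last chunk has been queued.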

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…read

reader.read() on a default reader is now a thin JS wrapper over a
readerFastRead binding that completes every read in one JS->C++
crossing. When a chunk is buffered, the binding returns the raw chunk
and the wrapper builds the { value, done } result and already-resolved
promise in JIT code — both substantially cheaper than their C++ API
equivalents (Promise::Resolver::New + Resolve + boilerplate clone).
Every other case (parked read, closed/errored stream, byte controller,
tee's internal non-author reader, which needs null-prototype results)
performs the full native read inline in the same crossing and returns
its promise.

The two return shapes are disambiguated by a Promise.prototype check:
chunks whose prototype chain contains the realm's original
%Promise.prototype% (cached primordial-safely from a freshly created
promise) are never returned raw — the binding builds their result
promise natively — so the wrapper's test cannot misclassify. Only a
foreign receiver falls back to the native prototype method, for its
Web IDL brand-check rejection. The async iterator's nextSteps uses the
same binding, turning a buffered chunk directly into the iterator
result with no read promise or reaction chain.

readable-read-buffered: -33% -> ~-4% vs the JS implementation;
readable-async-iterator: -40% -> ~-13%; readable-read parity preserved.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ReadableStreamReaderGenericRelease constructed the "reader was
released" TypeError eagerly on every releaseLock(), and constructing a
JS error captures a stack trace — over half of all cycles in a
getReader()/releaseLock() loop. Record the rejection as a new
ClosedState::kRejectedReleased instead and build the error only when a
materialized closed promise (or an outstanding read request) can
actually observe it; closed_promise() materializes it on first access.
Applies to both the default and BYOB readers.
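
The deferral can be illustrated with a small V8-free model: releasing only records a state, and the costly error object is built the first time something observes it. Only `ClosedState::kRejectedReleased` is taken from the commit message; `Reader`, `FakeError`, the other enumerators, and the message string are hypothetical stand-ins.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class ClosedState { kPending, kResolved, kRejected, kRejectedReleased };

// Stand-in for a JS TypeError, whose construction captures a stack trace.
struct FakeError {
  std::string message;
};

class Reader {
 public:
  // Cheap: records the rejection state without building any error.
  void ReleaseLock() { state_ = ClosedState::kRejectedReleased; }

  // Builds the error on first observation only; later calls reuse it.
  // Returns nullptr when there is no released-rejection to observe.
  const FakeError* ObserveClosedRejection() {
    if (state_ != ClosedState::kRejectedReleased) return nullptr;
    if (!released_error_) {
      released_error_.emplace(FakeError{"reader was released"});
      ++errors_built_;
    }
    return &*released_error_;
  }

  int errors_built() const { return errors_built_; }

 private:
  ClosedState state_ = ClosedState::kPending;
  std::optional<FakeError> released_error_;
  int errors_built_ = 0;
};
```

A reader that is released and then dropped never pays for the error; one whose rejection is observed twice still builds it only once.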

getReader()+releaseLock() churn: 140k -> 1.30M ops/s (the JS
implementation does 815k).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
std::deque allocates its map plus a ~512-byte node block in its
constructor, and every reader, writer-side stream and controller
embedded one or more of them (read/read-into/write request queues and
the chunk-size queues) — the single largest cost in reader creation
profiles after the object allocation itself. Replace the four pure-FIFO
members with a vector-backed FifoQueue that allocates nothing until the
first push (most of these queues are never pushed to), pops via a head
index, and compacts the dead prefix once it dominates.
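
A vector-backed FIFO along these lines might look like the sketch below. It is not the PR's `FifoQueue`: the member names, the compaction threshold `kCompactMin`, and the "dead prefix is at least half the storage" rule are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
class FifoQueue {
 public:
  bool empty() const { return head_ == items_.size(); }
  std::size_t size() const { return items_.size() - head_; }
  std::size_t capacity() const { return items_.capacity(); }

  void push(T value) { items_.push_back(std::move(value)); }

  T pop() {
    assert(!empty());
    T value = std::move(items_[head_++]);  // pop by advancing the head index
    if (head_ == items_.size()) {
      // Fully drained: drop everything and reuse the storage.
      items_.clear();
      head_ = 0;
    } else if (head_ >= kCompactMin && head_ * 2 >= items_.size()) {
      // The dead prefix dominates: shift the live tail to the front.
      items_.erase(items_.begin(),
                   items_.begin() + static_cast<std::ptrdiff_t>(head_));
      head_ = 0;
    }
    return value;
  }

 private:
  static constexpr std::size_t kCompactMin = 16;  // assumed threshold
  std::vector<T> items_;  // typically allocates nothing until the first push
  std::size_t head_ = 0;
};
```

Because compaction runs only once the dead prefix is at least as large as the live part, each element is moved a bounded number of times on average, so pops stay amortized constant time in this sketch.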

reader creation: 1.05M -> 1.83M/s (JS impl: 2.06M);
getReader()+releaseLock() churn: 1.30M -> 1.68M/s (JS impl: 834k);
ReadableStream and WritableStream creation also improve (one fewer
512-byte malloc per embedded queue).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PromiseSlot materialized a V8 promise (and, on release, the
"writer was released" TypeError with its stack capture) eagerly, even
though most writers never have their ready/closed observed: writer
setup allocated up to two resolvers and releaseLock() built rejected
replacement promises plus the error unconditionally. The slot now
records its state (pending / resolved / rejected-with / released)
without touching V8 and materializes on first promise() access, where
rejected promises are also marked handled; once observed it is
sticky-materialized and behaves as before. The released error is built
lazily and shared between ready and closed via the writer so both
reject with the same error object, per spec.
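
The lazy slot can be sketched without V8 as follows. FakePromise and
FakeError are stand-ins for the V8 promise and the TypeError, and
WriterSketch with its counters is a hypothetical name used only for
illustration; the sketch covers the released state and leaves out
settling after materialization.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct FakeError { std::string message; };  // stand-in for a JS TypeError
struct FakePromise {                        // stand-in for a V8 promise
  bool rejected = false;
  std::shared_ptr<FakeError> reason;
};

class WriterSketch {
 public:
  // releaseLock(): record the state only; build no promise and no error.
  void ReleaseLock() {
    ready_.state = State::kReleased;
    closed_.state = State::kReleased;
  }
  // Getters materialize on first access and return the same object after.
  const FakePromise& ready() { return Materialize(ready_); }
  const FakePromise& closed() { return Materialize(closed_); }
  int promises_built() const { return promises_built_; }
  int errors_built() const { return errors_built_; }

 private:
  enum class State { kPending, kResolved, kReleased };
  struct Slot {
    State state = State::kPending;
    std::unique_ptr<FakePromise> promise;  // empty until observed
  };

  const FakePromise& Materialize(Slot& slot) {
    if (!slot.promise) {
      ++promises_built_;
      slot.promise = std::make_unique<FakePromise>();
      if (slot.state == State::kReleased) {
        slot.promise->rejected = true;
        slot.promise->reason = ReleasedError();
      }
    }
    assert(slot.promise != nullptr);
    return *slot.promise;
  }

  // Built lazily and shared, so ready and closed reject with one object.
  std::shared_ptr<FakeError> ReleasedError() {
    if (!released_error_) {
      ++errors_built_;
      released_error_ =
          std::make_shared<FakeError>(FakeError{"writer was released"});
    }
    return released_error_;
  }

  Slot ready_;
  Slot closed_;
  std::shared_ptr<FakeError> released_error_;
  int promises_built_ = 0;
  int errors_built_ = 0;
};
```

A writer whose ready and closed are never read builds nothing at all;
one that reads both builds two promises and a single error.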

getWriter()+releaseLock() churn: 91k -> 2.0M ops/s (the JS
implementation does 417k); writer creation drops two resolver
allocations.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Convert ReadableStreamDefaultReader and ReadableStreamBYOBReader from
BaseObject to cppgc-managed wrappers (CPPGC_MIXIN). Readers are the
stream objects created and discarded in bulk (getReader()/releaseLock()
churn, tee, pipeTo, async iteration), and the BaseObject model costs a
malloc, a persistent global handle and a weak-callback pass per
instance; cppgc replaces those with a bump allocation traced from the
wrapper, and dead readers are reclaimed by the sweeper with no
per-object callback.

- V8 references (the pending read/read-into request resolvers and the
  lazily-materialized closed promise) move from v8::Global to
  v8::TracedReference: a v8::Global member would dispose global handles
  in the sweep-time destructor (unsafe) or require a pre-finalizer
  (which taxes every GC). Settled requests Reset() their cells eagerly
  since ~TracedReference deliberately does not free them.
- Parked requests are written with emplace-empty + Reset() — an
  assigning store — never the TracedReference(isolate, local)
  constructor: the constructor is an initializing store, which during
  incremental marking gets neither a markbit nor a write barrier, so a
  request parked on an already-traced reader would be zapped by
  ResetDeadNodes at the end of marking (caught as an intermittent
  segfault in the compression WPT; reproduces deterministically under
  --stress-incremental-marking).
- The pending-request FifoQueue mutates as reads park and settle, so
  the readers' Trace() defers concurrent marking to the mutator thread
  via DeferTraceToMutatorThreadIfConcurrent.
- ReadableStream's reader cache becomes a void* discriminated by an
  explicit kind byte (the readers no longer carry the StreamBaseObject
  kind tag).
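
The last point, a type-erased pointer paired with an explicit kind
byte, might look roughly like this. The struct and enum names are
hypothetical; the real readers are cppgc wrappers, which this sketch
replaces with plain structs.

```cpp
#include <cassert>
#include <cstdint>

struct DefaultReaderSketch { int pending_reads = 0; };
struct ByobReaderSketch { int pending_read_intos = 0; };

enum class ReaderKind : std::uint8_t { kNone, kDefault, kByob };

// The pointer carries no type information, so every store records the
// kind next to it and every load checks the kind before casting.
class ReaderCacheSketch {
 public:
  void Set(DefaultReaderSketch* reader) {
    assert(reader != nullptr);
    ptr_ = reader;
    kind_ = ReaderKind::kDefault;
  }
  void Set(ByobReaderSketch* reader) {
    assert(reader != nullptr);
    ptr_ = reader;
    kind_ = ReaderKind::kByob;
  }
  void Clear() {
    ptr_ = nullptr;
    kind_ = ReaderKind::kNone;
  }

  ReaderKind kind() const { return kind_; }
  DefaultReaderSketch* AsDefault() const {
    return kind_ == ReaderKind::kDefault
               ? static_cast<DefaultReaderSketch*>(ptr_)
               : nullptr;
  }
  ByobReaderSketch* AsByob() const {
    return kind_ == ReaderKind::kByob
               ? static_cast<ByobReaderSketch*>(ptr_)
               : nullptr;
  }

 private:
  void* ptr_ = nullptr;
  ReaderKind kind_ = ReaderKind::kNone;
};
```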

getReader()+releaseLock() churn: 1.59M -> 1.82M/s (+14%), BYOB
1.76M -> 1.98M/s (+12%). Official creation benchmark at n=500k:
DefaultReader +6.4% (*), BYOBReader +4.9% (***). Read paths at parity
except readable-read byob -2.0% (**): a parked read-into request's
TracedReference create/Reset is slightly dearer than the Global it
replaces (byob reads remain ~+29% over the JS implementation).

WPT streams/compression/encoding (including 20x compression and
--stress-incremental-marking runs) and the parallel
whatwg/webstream/stream suites stay green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Realm::TrackCppgcWrapper allocates a CppgcWrapperListNode (malloc) plus
a cppgc::WeakPersistent per wrapper. The list exists to run Clean() on
still-live wrappers at Realm shutdown, to include them in the realm's
memory tracking, and to purge its own nodes after GC — none of which
applies to wrapper classes with trivial cleanup (default destructor, no
Clean() override).

Add CppgcMixin::Tracking::kUntracked so such classes can skip the list
entirely. Untracked wrappers must never touch realm_ during
destruction — a wrapper swept after Realm teardown outlives it — so the
destructor's purge-flag write is gated on tracked_ (untracked wrappers
have no list node to purge anyway).

Opt the webstreams readers out: they are created in bulk on hot paths
and their destructor is the default one. Interleaved A/B vs tracked:
getReader()+releaseLock() 1.91M -> 2.31M/s (+21%, +45% over the
BaseObject model), BYOB churn +21%, reader creation 1.90M -> 2.08M/s —
matching the all-JS implementation. Official creation benchmark
(n=500k, vs the BaseObject readers): DefaultReader +16.8% (***),
BYOBReader +16.1% (***); read paths at parity.

Validated: WPT streams/compression/encoding green, including 20x
compression repeats, --stress-incremental-marking runs, and worker
teardown with live untracked wrappers (test-worker*).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pipeTo and tee called read/cancel/releaseLock/write/close/abort and the
closed/ready getters on their internal readers and writers through the
live prototypes, which author code may patch. Per spec these are
internal abstract operations and must not be observable.

Caught by WPT readable-streams/patched-global under
--stress-incremental-marking: a pipe's finalize drifted past its own
test and called a patched releaseLock() installed by the async-iterator
test, throwing inside the finalizer (intermittent in plain runs, ~1 in
2 under stress). The async iterator already used captured originals;
pipeTo and tee now do too — all reader/writer methods and the
closed/ready accessor getters are captured at module load and invoked
via FunctionPrototypeCall/ReflectApply, with byte tee passing the
kind-matching closed getter when it switches reader kinds.

Validated with 10x patched-global under --stress-incremental-marking
(previously failing ~1 in 2), 3x full WPT streams under stress, and the
usual gates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A TransformStream always handed its halves a start trampoline returning
the shared start promise — an object — so the readable/writable
controllers never took the non-object start fast path and each paid
per-creation reaction Functions in their Setup. v8::Function::New
allocates a fresh SharedFunctionInfo per call; profiling a default
`new TransformStream({ transform })` loop showed ~21% of all cycles in
those three Function::New sites (two ThenReacts + the start trampoline).

When the transformer has no start() — the common case — the JS shim now
passes no start algorithm at all. The C++ side then skips the start
trampoline, the shared start resolver and the final start call, and each
half marks its controller started via the shared per-realm reaction.

The halves still reproduce the spec's observable timing: a transform
half's start promise adopts the inner start promise
(promiseResolvedWith(thenable)), settling `started` three microtask
jobs after construction — WPT transform-streams/errors observes that
depth against an abort queued right after construction. The fast path
adopts a fulfilled promise carrying the controller wrapper instead, so
the shared reaction fires at exactly the same depth with no per-creation
Functions.

TransformStream creation: 79k -> ~140k/s in a spot loop; official
benchmark (n=500k) -19.3% -> +83.1% (***) vs the JS implementation;
js_transfer TransformStream -10.6% -> -4.0%. No change for transformers
with a start() (rare), nor for plain Readable/WritableStream
construction.

WPT streams/compression/encoding (plain and under
--stress-incremental-marking) and the parallel whatwg/webstream/stream
suites stay green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mcollina and others added 15 commits June 11, 2026 11:37

The default controllers kept their value queues as a JS Array held in a
single v8::Global, with chunks stored through the C++ API. Once read()'s
fast path became a binding, the queue was only ever touched from C++, so
every enqueue/dequeue paid a full API element access (LookupIterator,
AddDataElement) — and at high-water-mark 1 the queue emptied on every
read, dropping the array and degenerating to an Array::New per chunk.
This was ~24% of all cycles in the read-buffered bufferSize=1 benchmark
and dominated the C++ pipe pump's per-chunk cost (~6 API element ops
across the source and destination queues).

Replace both queues with a FifoQueue of {v8::Global<v8::Value>, size}
entries: enqueue is a global-handle Reset plus vector emplace, dequeue a
Get plus eager Reset (pop_front alone would pin the consumed chunk until
prefix compaction). This also folds the separate sizes queue and the
head/size bookkeeping into the entry queue. Lifetime semantics are
unchanged: buffered chunks remain strong roots owned by the queue, and
the controllers are plain BaseObjects, so per-chunk Globals are disposed
deterministically.
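
The "Get plus eager Reset" dequeue and the folded-in size bookkeeping
can be illustrated with std::shared_ptr standing in for
v8::Global<v8::Value>. ValueQueueSketch and its members are
hypothetical names, and prefix compaction is left out to keep the
sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Chunk = std::shared_ptr<int>;  // stand-in for v8::Global<v8::Value>

struct ValueQueueEntrySketch {
  Chunk value;
  double size;
};

class ValueQueueSketch {
 public:
  void Enqueue(Chunk value, double size) {
    entries_.push_back({std::move(value), size});
    total_size_ += size;
  }

  Chunk Dequeue() {
    assert(head_ < entries_.size());
    ValueQueueEntrySketch& entry = entries_[head_++];
    total_size_ -= entry.size;
    Chunk out = entry.value;  // "Get": copy the handle out
    // Eager "Reset": without this the consumed slot would keep the chunk
    // alive until the dead prefix is eventually compacted away.
    entry.value.reset();
    return out;
  }

  double total_size() const { return total_size_; }
  std::size_t size() const { return entries_.size() - head_; }

 private:
  std::vector<ValueQueueEntrySketch> entries_;
  std::size_t head_ = 0;
  double total_size_ = 0;
};
```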

Benchmarks vs the pre-flip JS implementation (compare.R, 10 runs, all
***): read-buffered bufferSize=1 -19% -> +39.6% (the former pathological
case), bufferSize 10/100/1000 parity -> +46..59%, pipe-to +11..15% ->
+183..195% across all 16 configs, async-iterator -5% -> +60.2%;
readable-read normal/byob unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

CallPullIfNeeded modelled a missing pull algorithm as a pull returning
an already-resolved promise, paying a per-controller reaction Function
(a fresh SharedFunctionInfo via v8::Function::New) plus a microtask to
reset [[pulling]]/[[pullAgain]] — bookkeeping nothing can observe when
there is no pull to coalesce. Pull-less streams now return immediately.
This was the largest microtask cost in stream transfers, where pipeTo
parks a read on the (sourceless) transferred stream and every stream is
freshly created, so the per-controller reaction cache never amortized.

Also cache the noop reaction used by MarkHandled/CancelInternal on the
realm's BindingData instead of building a Function per call, and create
all internal reaction functions with ConstructorBehavior::kThrow so V8
skips the prototype/initial-map setup they never use.

js_transfer vs the pre-flip JS implementation (compare.R, 12 runs):
ReadableStream -1.4% -> +1.5% (n.s.), WritableStream -3.5% -> +0.1%
(n.s., parity), TransformStream -4.0% -> -2.6%. Microtask share of the
transfer loop drops from 3.7% to 1.4% of main-thread cycles.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Convert the writer from BaseObject to a cppgc-managed wrapper with
realm-tracking opted out, following the readers' conversion: a bump
allocation traced from the wrapper replaces malloc + persistent handle +
weak callback + realm list node. Writers are created and discarded in
bulk — every pipeTo (and so every stream transfer) acquires one.

The writer's lazy ready/closed PromiseSlots move from v8::Global to
v8::TracedReference (a cppgc object's sweep-time destructor must not
dispose persistent handles); all slot writes already go through
Reset(isolate, value), the assigning store that is safe during
incremental marking. The slots' handles mutate as promises settle, so
the writer defers concurrent tracing to the mutator thread, like the
readers' request queues.

getWriter()+releaseLock() churn: 2.0M/s -> 2.62M/s (+31%). js_transfer
WritableStream +0.97% / ReadableStream +1.42% vs the pre-flip JS
implementation (parity). All gates green: WPT streams 1404 / compression
338 / encoding 3822, plain and --stress-incremental-marking, 10x
compression repeats, parallel whatwg/webstream/stream/worker suites, and
worker teardown with 50k live untracked writers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Convert WritableStream and WritableStreamDefaultController from
BaseObject to cppgc-managed wrappers with realm tracking opted out,
completing the writable side (the writer converted previously). A
stream transfer allocates stream+controller pairs in bulk — the user
stream, the cross-realm peer, and the received stream — and each
BaseObject paid malloc + template-instance persistent handle + weak
callback + realm list node; the cppgc model is a bump allocation traced
from the wrapper.

All v8::Global members move to v8::TracedReference, including the
write-request queue and the chunk queue (whose entries get a traced
flavor of ValueQueueEntry until the readable controller converts too).
Both classes defer concurrent tracing to the mutator thread since their
queues mutate as writes park and chunks move. AddWriteRequest switches
from the TracedReference(isolate, local) emplace — an initializing
store with no write barrier during incremental marking, the exact race
previously found in the readers — to emplace-empty + Reset. The
write-request rejection drain Reset()s each popped entry so the dead
prefix cannot pin rejected resolvers.

The shared start-fulfilled reaction is split per wrapper model: the
cppgc-managed writable controller gets its own per-realm reaction
instead of adding a brand check in front of the readable controllers'
kind dispatch (which cost ~2.7% on ReadableStream creation in an
earlier draft of this change).

Benchmarks vs the pre-flip JS implementation: creation WritableStream
+91.3% -> +100.8% (***, n=500k); pipe-to unchanged (+172..187%, all 16
configs ***); js_transfer WritableStream/ReadableStream at parity
(n.s.). All gates green: WPT streams 1404 / compression 338 / encoding
3822, plain and --stress-incremental-marking, 15x compression repeats,
parallel whatwg/webstream/stream/worker suites, worker teardown with
50k live streams.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Convert ReadableStream, ReadableStreamDefaultController,
ReadableByteStreamController and ReadableStreamBYOBRequest from
BaseObject to cppgc-managed wrappers with realm tracking opted out,
completing the conversion of all per-stream objects (readers, writer,
and the writable side converted previously). Every stream construction
and every transfer allocates these in bulk; the cppgc model replaces
malloc + template persistent handle + weak callback + realm list node
with a bump allocation traced from the wrapper.

All v8::Global members move to v8::TracedReference; the value queue's
per-chunk entries become the traced flavor (TracedValueQueueEntry, now
shared with the writable controller). Tracing defers to the mutator
thread wherever the traced members mutate. The kind-tag dispatch that
relied on StreamBaseObject is replaced by per-type reaction callbacks —
each controller type binds its own pull/start/reject reactions and gets
its own per-realm shared start reaction, which is also cheaper than the
old switch. The stream's controller cache becomes a type-erased pointer
with an explicit kind byte, mirroring the reader cache.

Benchmarks vs the pre-flip JS implementation: js_transfer reaches
parity across all three payloads (ReadableStream +1.4%, WritableStream
-0.7%, TransformStream -1.1%, all n.s.); creation ReadableStream +4.2%
(n=500k, ***), DefaultReader +14.3%, tee +17.8%, WritableStream
+104.6%, TransformStream +77.8%; read-buffered +39..68%, byob +23.7%,
async-iterator +63.3%, pipe-to +164..174% (all ***). readable-read
'normal' pays ~4% for the conversion (traced-handle churn on the
parked-read path) — the one negative, against double-digit wins
elsewhere. All gates green: WPT streams 1404 / compression 338 /
encoding 3822, plain and --stress-incremental-marking (streams twice),
20x compression repeats, parallel whatwg/webstream/stream/worker
suites, worker teardown with 60k live streams holding buffered chunks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Convert TransformStream and TransformStreamDefaultController from
BaseObject to cppgc-managed wrappers with realm tracking opted out —
the last two stream classes on the BaseObject model. All v8::Global
members (algorithms, finish/backpressure/start promise slots) move to
v8::TracedReference with mutator-thread tracing, and creation becomes a
bump allocation traced from the wrapper.

With every stream class now cppgc-managed, the StreamBaseObject kind-tag
base class has no users left and is removed; type discrimination is done
entirely by per-type reaction callbacks, brand checks at the boundary,
and explicit kind bytes next to the type-erased raw-pointer caches.

js_transfer TransformStream lands at exact parity with the pre-flip JS
implementation (-0.2% ±0.9, n.s.; ReadableStream +0.8%, WritableStream
-1.0%, both n.s.), closing the transfer family. TransformStream creation
+77.2% (n=500k, ***), spot keep-alive creation 151k -> 165k/s over the
previous commit. All gates green: WPT streams 1404 / compression 338 /
encoding 3822, plain and --stress-incremental-marking, 15x compression
repeats, parallel whatwg/webstream/stream/worker suites, worker teardown
with 40k live TransformStreams.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

readable-read normal sat at -10.9% vs the JS implementation (and -26%
at a 10x window), previously concluded to be a V8-level floor. Two
measurements reopened it: perf stat showed the C++ path executing the
same instruction count as the JS baseline but at IPC 1.90 vs 2.67 (the
API path is dependency-chained where JIT code is flat), and profiles
showed the steady state is the parked path - the await continuation is
enqueued before the pull-bookkeeping microtask, so every read parks and
the buffered fast path never engages.

Re-adopt the JS implementation's own structure for that path: the
read() wrapper gets a park sentinel back from readerFastRead, creates
the read promise (PromiseWithResolvers) plus one settle closure
`(value, done, isErr)` in JIT code, and parks it through a new trusted
readerParkRead binding. C++ settles the read with a single
Function::Call into the closure: the result object is built by JIT
StoreICs and the resolution runs through the builtin with an inline-
cached then-lookup - replacing Resolver::New, a boilerplate clone whose
value write re-ran a kConst->kMutable descriptor migration per clone,
and Resolver::Resolve, whose C++ then-lookup walks Object.prototype
with no inline cache.

A parked read request is now either a promise resolver (pipeTo, tee,
byte streams, the async iterator) or the settle closure, discriminated
by IsFunction(); the front request lives in a kParkedRead internal
field on the reader wrapper (a plain tagged store, traced with the
wrapper) with the TracedReference queue as overflow for concurrent
reads.
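
A rough model of that layout, assuming std::variant in place of the
IsFunction() check and std::deque in place of the TracedReference
overflow queue. All names here are hypothetical, and the real front
slot is an internal field on the wrapper rather than a C++ member.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>
#include <variant>

struct ResolverSketch { int id; };       // stand-in for a promise resolver
struct SettleClosureSketch { int id; };  // stand-in for the settle closure
using ReadRequest = std::variant<ResolverSketch, SettleClosureSketch>;

// Plays the role of the IsFunction() discriminator.
inline bool IsClosure(const ReadRequest& request) {
  return std::holds_alternative<SettleClosureSketch>(request);
}

class ParkedReadsSketch {
 public:
  void Park(ReadRequest request) {
    if (!front_.has_value()) {
      // Invariant: an empty front slot implies an empty overflow queue.
      assert(overflow_.empty());
      front_ = std::move(request);
    } else {
      overflow_.push_back(std::move(request));  // concurrent reads only
    }
  }

  // Removes and returns the oldest request, promoting the next one.
  std::optional<ReadRequest> TakeFront() {
    std::optional<ReadRequest> out = std::move(front_);
    front_.reset();
    if (!overflow_.empty()) {
      front_ = std::move(overflow_.front());
      overflow_.pop_front();
    }
    return out;
  }

  bool empty() const { return !front_.has_value(); }
  std::size_t overflow_size() const { return overflow_.size(); }

 private:
  std::optional<ReadRequest> front_;
  std::deque<ReadRequest> overflow_;
};
```

With one read outstanding at a time, which is the common case, the
overflow queue is never touched.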

Also: pre-generalize the read-result boilerplates' value descriptor;
inline BindingData::Get as env->principal_realm() (the context walk +
embedder-data bounds check was ~4% of the read loop); take the binding
data store slot by reference in Realm::GetBindingData (the
BaseObjectPtr copy churned the refcount per lookup); skip the redundant
InternalFieldCount check when unwrapping behind a brand check or on
trusted reaction Data.

Two negative results are recorded in comments: a native
MicrotaskCallback for the sync-pull continuation loses (rooting the
cppgc controller wrapper requires a per-pull v8::Global arm/disarm pair
that costs more than the JS-function task it saves), and a
fulfill/reject closure pair loses half the win to two young-into-old
internal-field stores each paying the remembered-set barrier.

Benchmark results vs the pre-flip JS implementation (compare.R):

  readable-read normal        -10.9%  ->  +22.2% ***
  readable-read byob          +25.7%  ->  +32.7% ***
  read-buffered bufferSize=1  +39.6%  ->  +58.6% ***
  read-buffered 10/100/1000   +46..59 ->  +92.8..93.5% ***
  async-iterator              +63.2%  ->  +78.9% ***
  pipe-to (16 configs)        held at +181..194% ***
  js_transfer RS/WS/TS        parity held (all n.s.)
  creation (n=500k)           DefaultReader +28.5%, RS +9.5% ***

WPT: streams 1404 (plain and --stress-incremental-marking), compression
338 x15 plus 3 stress runs, encoding 3822; parallel whatwg/webstream/
stream/worker suites green; worker teardown with 80k live parked
closure reads (plain and stress).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

In the parked steady state (the spec-mandated regime for tight
read()/pull loops: the await continuation always precedes FinishPull,
so every read parks while pulling_ is set), each enqueued chunk
settled the pending read through a C++->JS Function::Call into the
read() wrapper's settle closure - a full Execution::Call/JSEntry
transition per chunk.

controller.enqueue is now a thin JS wrapper over a new
controllerEnqueueOrSettle binding. When the settle is safely
reorderable past EnqueueInternal's tail CallPullIfNeeded - i.e. when
pulling_ is set (an enqueue from inside a running pull, where it only
records pull_again_) or there is no pull algorithm (a no-op) - the
binding pops the closure and returns it, and the wrapper performs the
settle as a JIT call. All other cases (no pending read, resolver-kind
requests from pipeTo/tee/async-iterator/byte paths, throwing states)
complete inside the binding unchanged, which returns undefined.

A sync-settle park experiment (park a pure-C++ marker, run pull inside
the read crossing, answer the read directly) was implemented and
reverted: the parked regime is self-sustaining even at highWaterMark 0
because ShouldCallPull's read-request clause keeps the pull cycle one
read behind, so the marker only ever settled on cold reads.

readable-read normal: +22.2% -> +28.6% vs the JS baseline (spot n=1e6:
1.85M -> 2.06M reads/s vs the prior HEAD); async-iterator +83.2%;
read-buffered, byob and pipe-to (16/16) held. WPT streams 1404 (plain
and --stress-incremental-marking), compression 338 x10 plus x3 stress,
encoding 3822, parallel whatwg/webstream/stream/worker suites and the
worker-teardown stress all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two structural costs dominated the respond-driven BYOB loop (the
fs/fetch-style pattern: pull writes into byobRequest.view and calls
respond(n)), which ran well behind the JS implementation even though
the official benchmark's enqueue-shaped byob row did not show it:

- Every parked BYOB read settled through Promise::Resolver::Resolve
  plus a result clone, with the C++ then-lookup walking
  Object.prototype with no inline cache. The byob read() is now a thin
  JS wrapper over byobReaderFastRead/byobReaderParkRead, mirroring the
  default reader's protocol: a synchronously-filled view returns raw
  (the wrapper builds { done, value } and the resolved promise in JIT
  code), and parked reads park a settle closure that C++ invokes with
  one Function::Call. Read-into requests are resolver OR closure,
  discriminated by IsFunction(); a done settle still passes the view
  (a closing BYOB stream hands back the partially-filled view). The
  byte tee keeps the raw native read (captured before the wrapper is
  installed): its internal reads need null-prototype results.

- Every pull created a cppgc-backed BYOBRequest: MakeGarbageCollected,
  wrapper traced refs and a template->function instantiation lookup
  per request. A BYOBRequest is now a JS-only wrapper - the GC-traced
  kController/kView internal fields carry everything, methods recover
  the controller by unwrapping kController after the brand check (the
  cleared field doubles as the invalidated brand), and the instantiated
  constructor is cached per realm.

Respond-driven loop: 0.44M -> 0.53M reads/s (+21%; the gap to the JS
baseline narrows from -34% to -18% - the remainder is per-read
ArrayBuffer materialization and crossing overhead, see the benchmarks
doc). Official readable-read byob: +38.8% vs the JS baseline (was
+32.7%); enqueue-shaped byob spot +6%; readable-read normal and
read-buffered held. WPT streams 1404 (plain and
--stress-incremental-marking), compression 338 x10 plus x3 stress,
encoding 3822, parallel whatwg/webstream/stream/worker suites, and
worker teardown with 4x20k live parked BYOB closure reads (plain and
stress) all green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A default (non-BYOB) reader on a byte stream - the fetch-body shape,
body.getReader() + read() - was excluded from the read() wrapper's
parked-closure protocol: ReaderFastRead punted every byte-controller
read to the native path, so each read paid Resolver::New, a result
clone and an un-IC'd C++ Resolver::Resolve, and each enqueue settled
it through a C++->JS Function::Call. The shape measured -17% vs the
JS baseline while the official suite (which has no such row) read
green everywhere.

ReaderFastRead now extends both fast paths to byte controllers behind
the same author-code gate: an empty queue returns the park sentinel
(readerParkRead dispatches to the byte park steps - auto-allocate
descriptor push, closure park, pull), and a buffered queue returns the
dequeued chunk's view raw for the wrapper to wrap in JIT code. The
take-then-settle order in ProcessReadRequestsUsingQueue is preserved
by splitting FulfillFront into TakeFrontReadRequest + SettleReadRequest
(the dequeue's drain bookkeeping can run pull reentrantly).

The byte controller's enqueue follows the default controller's
enqueue-or-settle inversion: a thin JS wrapper over a
byteControllerEnqueueOrSettle binding that, when the final settle
targets a wrapper closure and is reorderable past the tail
CallPullIfNeeded (pulling_ set or no pull algorithm), hands back a
[closure, view] pair - the settled value here is the C++-materialized
transferred view, not the caller's chunk, so the view crosses back
with the closure. This also covers the byob-reader commit path, so
the enqueue-shaped official byob row gains too.

The same inversion applied to byobRequest.respond() was implemented,
measured performance-neutral in an interleaved binary A/B (the pair
Array::New consumed the saved JSEntry transition), and dropped.

byte default-read spot loop: 0.63 -> 0.74 M reads/s (-17% -> -8% vs
the JS baseline; closure park +13%, enqueue defer +3%). Official rows
(compare.R, 10 runs): readable-read byob +23.5% -> +38.8%***,
readable-read normal +35.7%***, read-buffered +45...+92%***,
async-iterator +65.8%*** (all held). WPT streams 1404 (plain and
--stress-incremental-marking), compression 338 plus x5 stress,
encoding 3822, parallel whatwg/webstream suites and worker teardown
with 40k parked byte closure reads all green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Every write arriving under backpressure paid a fresh
v8::Function::New (a SharedFunctionInfo allocation per chunk) plus a
2-element holder array to carry (stream, chunk) into the
backpressure-change reaction.

The writable dispatches at most one write at a time - the next write
starts only after the previous sink-write promise settles, and that
promise settles only after the continuation has run - so the awaiting
chunk lives in a single TracedReference slot on the stream and the
continuation is created once per stream (data = the stream wrapper)
and reused for every chunk. The continuation takes the chunk
unconditionally on entry so the slot is free even when the wait ends
in an erroring writable.
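
The single-slot scheme can be sketched as below, with std::function
standing in for the cached V8 function and std::optional for the
TracedReference slot. WritableSketch and its method names are
assumptions made for the example.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class WritableSketch {
 public:
  WritableSketch() {
    // Created once per stream and reused for every chunk.
    ++continuations_built_;
    continuation_ = [this] {
      // Take the chunk unconditionally on entry so the slot is free
      // even when the wait ended in an erroring writable.
      std::optional<std::string> chunk = std::move(pending_chunk_);
      pending_chunk_.reset();
      if (erroring_ || !chunk.has_value()) return;
      written_.push_back(std::move(*chunk));
    };
  }
  // The continuation captures `this`, so the object must not be copied.
  WritableSketch(const WritableSketch&) = delete;
  WritableSketch& operator=(const WritableSketch&) = delete;

  void ParkChunk(std::string chunk) {
    // At most one write is dispatched at a time, so one slot suffices.
    assert(!pending_chunk_.has_value());
    pending_chunk_ = std::move(chunk);
  }
  void OnBackpressureRelieved() { continuation_(); }
  void StartErroring() { erroring_ = true; }

  bool slot_free() const { return !pending_chunk_.has_value(); }
  const std::vector<std::string>& written() const { return written_; }
  int continuations_built() const { return continuations_built_; }

 private:
  std::function<void()> continuation_;
  std::optional<std::string> pending_chunk_;
  std::vector<std::string> written_;
  bool erroring_ = false;
  int continuations_built_ = 0;
};
```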

Throughput-neutral in a direct binary A/B on a pipeThrough
passthrough loop (the loop is dominated by the spec's
backpressure-change promise machinery - two Resolver::New and one
API Then per chunk - which is the remaining lever there); this
removes the two per-chunk allocations and the holder indirection.
WPT streams 1404 (plain and --stress-incremental-marking),
compression 338 plus x3 stress, encoding 3822, transform/compose/
pipeline/transfer parallel suites green; js_transfer parity held
(RS +0.6 / TS -0.0 / WS -0.2, all n.s.).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The byte path carried pull-into destination buffers as owned
BackingStores, paying per read: a GetBackingStore + Detach round trip
at transfer-in (a shared_ptr control-block malloc per call), an
ArrayBuffer::New at commit, another ArrayBuffer::New + view per
byobRequest, and C++-Factory view materialization - the
respond-driven loop profiled ~18% Factory object construction and
~9% BackingStore shared_ptr churn against the JS baseline's all-JIT
allocation.

Descriptors now hold a real JS ArrayBuffer (a TracedReference while
parked in the pending deque, moved to a handle-scope Local when
shifted out for committing - a stack TracedReference is invisible to
marking and commits run JS) plus a cached data pointer for the fill
memcpys. The byob read() wrapper becomes a staged two-crossing
protocol: byobReaderReadStage validates and stages the geometry,
returning (viewTag << 1) | syncBit as a Smi; the wrapper transfers
the view's buffer with %ArrayBuffer.prototype.transferToFixedLength%
(V8-internal, no BackingStore API) and re-enters byobReaderParkRead
or byobReaderFillStaged - the sync flavor returns just a filled
length (0 <=> done), so a buffered read allocates nothing in the
binding. Parked settles hand (buffer, done, isErr, byteOffset,
length, viewTag, transfer) to the settle closure, which constructs
the result view via a JIT `new ctor(buffer, off, len)`; both read()
wrappers and the enqueue defer tuple share the protocol.
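
The staged return value, (viewTag << 1) | syncBit, is ordinary bit
packing. A small sketch follows; the ViewTag enumerators are
hypothetical, since the actual tag values are not given here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view-kind tags; only the packing scheme is from the text.
enum class ViewTag : std::uint32_t {
  kUint8Array = 0,
  kDataView = 1,
  kFloat64Array = 2,
};

struct StageResult {
  ViewTag tag;
  bool sync;  // true: the read can be filled synchronously
};

// Pack the tag into the upper bits and the sync flag into bit 0.
inline std::int32_t EncodeStage(ViewTag tag, bool sync) {
  const std::uint32_t packed =
      (static_cast<std::uint32_t>(tag) << 1) | (sync ? 1u : 0u);
  return static_cast<std::int32_t>(packed);
}

// Reverse of EncodeStage: bit 0 is the flag, the rest is the tag.
inline StageResult DecodeStage(std::int32_t packed) {
  assert(packed >= 0);
  const std::uint32_t bits = static_cast<std::uint32_t>(packed);
  return StageResult{static_cast<ViewTag>(bits >> 1), (bits & 1u) != 0};
}
```

Returning one small integer keeps the first crossing allocation-free;
the wrapper recovers both fields with a shift and a mask.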

The byobRequest view now wraps the descriptor's own held buffer (no
fresh ArrayBuffer per pull) and marks it exposed. The spec's
respond/enqueue-entry re-transfers reduce to: nothing on the hot path
(the commit's transfer detaches the exposed view for free), a native
re-transfer where the descriptor stays pending (below-minimum
responds, enqueues with pending descriptors), and a plain detach
where its data is dropped - and commits of never-exposed buffers
skip the transfer entirely, handing the private buffer straight to
the result view (the transfer would be unobservable; this also keeps
the enqueue-shaped byob row from paying a new per-chunk transfer).
Resolver-kind settles (the byte tee, post-release paths) keep fully
native materialization.

Respond-driven byob spot loop: 0.52 -> 0.55 M reads/s (-16% -> -13%
vs the JS baseline); the profile's GetBackingStore and shared_ptr
release clusters are gone and result views build in
Builtins_CreateTypedArray. Official rows (10 runs): readable-read
byob +37.7%***, normal +34.4%***, read-buffered +46...+101%*** all
held; js_transfer parity held. What remains in the gap is the
per-pull BYOBRequest + request-view API materialization, the respond
settle's C++->JS call, and the V8-internal transfer machinery both
implementations pay. WPT streams 1404 (plain and
--stress-incremental-marking), compression 338 plus x5 stress,
encoding 3822, parallel whatwg/webstream suites, worker teardown
with 40k parked descriptor reads and stress-marking respond churn
all green.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Every pull in a respond-driven byte loop paid an API Factory
Uint8Array materialization inside the byobRequest getter, and every
byobRequest.view access after that paid a full API crossing (brand
check + internal-field read) just to return it - the request-view
machinery was ~5% of the loop after the byte-path redesign.

controller.byobRequest is now a JS getter over a
byobRequestGetOrCreate binding that returns the existing request,
null, or - on first access for a descriptor - a fresh
[request, buffer, byteOffset, byteLength] tuple. The wrapper
constructs the view with a JIT `new Uint8Array(...)` over the
descriptor's held buffer and caches it on the request as a JS own
property under a module-private symbol. byobRequest.view becomes a
plain property load while the request is valid - zero binding
crossings; possessing the symbol doubles as the Web IDL brand
(matching the pre-flip implementation's per-realm symbol-keyed
state). A detached cached view takes one cold crossing
(byobRequestIsInvalidated) to distinguish invalidation - the getter
returns null from then on - from a user-detached buffer, where the
spec keeps handing back the same view object; descriptor buffers
detach on exactly those two paths, so detachment is a sound cache
probe. The kView internal field is gone, invalidation is one field
write cheaper, and respond()'s detached pre-check inspects the front
descriptor's buffer directly (the same object the view wraps).

Respond-driven loop: 0.52 -> 0.54 M reads/s (+4.4% in a direct
interleaved binary A/B, 7/8 runs ahead); shapes that touch .view
several times per pull (fs.promises readableWebStream) save an API
crossing per access. Official rows: readable-read byob +40.4%***,
normal +36.1%***, both held. WPT streams 1404 (plain and
--stress-incremental-marking), compression plus x5 stress, encoding,
parallel whatwg/webstream/blob and filehandle-readablestream suites
all green; no new lint findings.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The transform's backpressure coordination ran on an internal
[[backpressureChangePromise]]: every toggle paid a Resolver::New and
an (un-IC'd) API Resolve, the readable's pull held the promise
through a ThenReact reaction pair, and each write arriving under
backpressure chained a derived promise off it - two resolvers, two
resolves, two reaction sets and a derived promise per chunk in a
passthrough loop, all to deliver two internal wakeups whose only
observable is their microtask position.

The change promise is gone. SourcePull flips backpressure and
returns a per-realm direct-pull marker: the readable controller
leaves the pull in flight with no promise and no reaction, and
SetBackpressure(true) delivers FinishPull by enqueueing the
controller's cached pull-fulfilled reaction as one microtask - the
exact FIFO position of the promise reaction it replaces (the same
EnqueueMicrotask equivalence the sync-pull fast path already relies
on). On the write side, SinkWrite parks the chunk and hands the
writable a resolver-backed promise; SetBackpressure(false) enqueues
the per-stream cached continuation as one microtask, which runs the
transform and resolves that promise with PerformTransform's result -
promise adoption keeps the writable's settle depth identical to the
old change-promise.Then chain, and an erroring writable rejects
within the continuation's microtask exactly as the old
reaction-throw did. The marker test lives at the head of
CallPullIfNeeded's thenable fallback (its coldest branch), so
non-transform pulls pay nothing.
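
The FIFO claim above can be checked in plain JavaScript. This is an illustrative sketch, not the C++ path: `viaChangePromise`, `viaDirectMicrotask` and `compareOrders` are hypothetical names, and `queueMicrotask` stands in for the native EnqueueMicrotask call. It assumes the promise already has its reaction attached when it is resolved with a non-thenable value.

```javascript
'use strict';

// Old shape: the reaction is attached up front, the promise resolves later.
function viaChangePromise(log) {
  let resolve;
  const changePromise = new Promise((r) => { resolve = r; });
  changePromise.then(() => log.push('FinishPull'));
  Promise.resolve().then(() => log.push('earlier job'));
  resolve(); // the reaction job is queued at this point
  Promise.resolve().then(() => log.push('later job'));
}

// New shape: no promise, the cached reaction is queued directly.
function viaDirectMicrotask(log) {
  const pullFulfilled = () => log.push('FinishPull'); // stands in for the cached reaction
  Promise.resolve().then(() => log.push('earlier job'));
  queueMicrotask(pullFulfilled); // same point where resolve() was called
  Promise.resolve().then(() => log.push('later job'));
}

// Runs both shapes and waits for the microtask queue to drain.
async function compareOrders() {
  const promiseLog = [];
  const microtaskLog = [];
  viaChangePromise(promiseLog);
  viaDirectMicrotask(microtaskLog);
  await new Promise((resolve) => setTimeout(resolve, 0));
  return { promiseLog, microtaskLog };
}
```

Both logs record FinishPull between the earlier and the later job, so the direct microtask lands where the promise reaction would have.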

pipeThrough(new TransformStream()) passthrough: 0.43 -> 0.56 M
chunks/s - what was parity with the JS baseline becomes +35%. js_transfer
held (RS +1.8 / TS +1.7 / WS +0.4, parity band). WPT streams 1404
(plain and --stress-incremental-marking), compression 338 plus x5
stress, encoding 3822 and the parallel whatwg/webstream suites all
green; smokes cover ordering, async transformers, transform throws
and writer aborts with a write parked on backpressure.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

clang-format over src/streams (clearing 42 of the 45 cpplint findings;
the remaining three were util.h/util-inl.h double includes) and a full
eslint pass over lib/internal/webstreams: ReflectApply with
array-literal arguments becomes FunctionPrototypeCall, read-result
literals go done-first, tee branches and from()'s stream become const,
JSDoc @returns added to the public constructors, non-ASCII punctuation
swept from comments, and unused/missing primordials imports fixed.
Zero eslint and cpplint findings remain in the flipped files.
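
For readers outside the codebase, a small sketch of the ReflectApply to FunctionPrototypeCall rewrite. The real primordials are internal to Node.js; the two definitions below are userland stand-ins built the same way, and `describe` and `receiver` are hypothetical.

```javascript
'use strict';

// Userland stand-ins for the internal primordials of the same names.
const uncurryThis = Function.prototype.bind.bind(Function.prototype.call);
const FunctionPrototypeCall = uncurryThis(Function.prototype.call);
const ReflectApply = Reflect.apply;

// Hypothetical callee that depends on its receiver.
function describe(prefix, suffix) {
  return `${prefix}${this.name}${suffix}`;
}

const receiver = { name: 'chunk' };

// Before: the arguments are wrapped in an array literal on every call.
const viaApply = ReflectApply(describe, receiver, ['<', '>']);

// After: the same call with the arguments passed positionally.
const viaCall = FunctionPrototypeCall(describe, receiver, '<', '>');
```

Both calls invoke `describe` with the same receiver and arguments; the second form just avoids building the array.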

No behavior change intended. WPT streams 1404 (plain and
--stress-incremental-marking), compression plus x3 stress, encoding,
and the parallel whatwg/webstream suites green; readable-read official
rows and the byte/transform spot shapes re-measured unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@nodejs-github-bot

nodejs-github-bot commented Jun 12, 2026

Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/realm
  • @nodejs/startup
  • @nodejs/streams

nodejs-github-bot added the lib / src and needs-ci labels Jun 12, 2026
mcollina added the tsc-agenda label Jun 12, 2026
@mcollina

Member Author

Adding this to the @nodejs/tsc agenda as a discussion point, as there is a relatively large amount of C++ involved.

@aduh95

aduh95 commented Jun 12, 2026

Contributor

Can you remove the log files to make the line counter more representative?
