stream: rewrite WHATWG Stream in C++ #63872
Draft
mcollina wants to merge 45 commits into
Phase 1 + Phase 2 (C++ object model) of the WHATWG Streams rewrite from JS to
C++.
Phase 1: register a new `webstreams` internal binding
(src/streams/streams_binding.{h,cc}) as a SnapshotableObject, wired into all
binding registries and node.gyp. Builds, snapshots, and loads with no behavior
change.
Phase 2 (object model): src/streams/readable_stream.{h,cc} implements
ReadableStream, ReadableStreamDefaultController and ReadableStreamDefaultReader
in C++:
- value queue is a single JS Array (no per-chunk v8::Global) plus a C++
std::deque<double> for sizes (no Number boxing);
- pending reads are a C++ deque of Promise::Resolver, resolved directly with no
.then chains or read-request objects;
- stream<->controller<->reader relationships are stored in GC-traced internal
fields, with all objects MakeWeak'd (applied after MakeBaseObject) so they
live and die with their JS wrappers, mirroring the JS object graph's lifetime;
- a SizeMode enum recognizes the built-in CountQueuingStrategy /
ByteLengthQueuingStrategy so the per-chunk size() crossing into JS is skipped.
The model is exercised in isolation via createReadableStream /
acquireReadableStreamDefaultReader binding entry points (pull/enqueue/close/read,
parked-read-then-fulfill, cancel, desiredSize backpressure, error propagation).
It is not yet wired to the public API, so existing tests are unaffected.
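The size bookkeeping described in the list above can be sketched in isolation. This is a minimal sketch under stated assumptions, not the PR's code: only the `SizeMode` name and the `std::deque<double>` of sizes come from the description, while `SizeQueue`, `Chunk`, and the enumerator names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical sketch: every name except SizeMode is illustrative.
enum class SizeMode { kCount, kByteLength, kCustom };

struct Chunk {
  std::size_t byte_length;
};

class SizeQueue {
 public:
  using SizeFn = std::function<double(const Chunk&)>;

  SizeQueue(SizeMode mode, double high_water_mark, SizeFn custom = nullptr)
      : mode_(mode),
        high_water_mark_(high_water_mark),
        custom_(std::move(custom)) {}

  void Enqueue(const Chunk& chunk) {
    double size = 0;
    switch (mode_) {
      case SizeMode::kCount:
        size = 1;  // built-in count strategy: no callback needed
        break;
      case SizeMode::kByteLength:
        size = static_cast<double>(chunk.byte_length);  // no callback either
        break;
      case SizeMode::kCustom:
        assert(static_cast<bool>(custom_));
        size = custom_(chunk);  // stands in for the per-chunk size() crossing
        ++custom_calls_;
        break;
    }
    sizes_.push_back(size);  // plain doubles, no boxed numbers
    total_ += size;
  }

  void Dequeue() {
    assert(!sizes_.empty());
    total_ -= sizes_.front();
    sizes_.pop_front();
    if (total_ < 0) total_ = 0;  // guard against rounding drift
  }

  double DesiredSize() const { return high_water_mark_ - total_; }
  int custom_calls() const { return custom_calls_; }

 private:
  SizeMode mode_;
  double high_water_mark_;
  SizeFn custom_;
  std::deque<double> sizes_;
  double total_ = 0;
  int custom_calls_ = 0;
};
```

With `SizeMode::kByteLength` the user callback is never invoked, which is the crossing the fast path is meant to skip.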
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the byte-stream half of the WHATWG ReadableStream rewrite in C++,
compiled-but-unwired (the public JS API still uses the JS implementation; the
unified flip is the next step).
Adds a shared StreamBaseObject base carrying a 1-byte Kind tag so a controller
or reader recovered from a GC-traced internal field can be safely downcast
(BaseObject::FromJSObject is an unchecked static_cast and RTTI is disabled).
ReadableStream now holds either a default or a byte controller, and either a
default or a BYOB reader, dispatched by tag.
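A tag-checked downcast of this kind might look like the following. It is a sketch only: `StreamBaseObject` and `Kind` are named in the description, but the enumerators, the `kKind` constant, and the `TryCast` / `Cast` helpers are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a 1-byte kind tag replacing RTTI for downcasts.
enum class Kind : std::uint8_t {
  kDefaultController,
  kByteController,
};

class StreamBaseObject {
 public:
  explicit StreamBaseObject(Kind kind) : kind_(kind) {}
  Kind kind() const { return kind_; }

 private:
  Kind kind_;
};

class DefaultController : public StreamBaseObject {
 public:
  static constexpr Kind kKind = Kind::kDefaultController;
  DefaultController() : StreamBaseObject(kKind) {}
};

class ByteController : public StreamBaseObject {
 public:
  static constexpr Kind kKind = Kind::kByteController;
  ByteController() : StreamBaseObject(kKind) {}
};

// Returns nullptr instead of an invalid pointer when the tag does not match.
template <typename T>
T* TryCast(StreamBaseObject* base) {
  if (base == nullptr || base->kind() != T::kKind) return nullptr;
  return static_cast<T*>(base);
}

// Use when the caller has already dispatched on the tag.
template <typename T>
T* Cast(StreamBaseObject* base) {
  assert(base != nullptr && base->kind() == T::kKind);
  return static_cast<T*>(base);
}
```

The tag check costs one byte comparison and keeps the `static_cast` from being applied to the wrong concrete type.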
New classes in src/streams/readable_stream.{h,cc}:
- ReadableByteStreamController: zero-copy byte queue (deque of owned
BackingStores; enqueue transfers the view's ArrayBuffer via GetBackingStore +
Detach) and pending pull-into descriptors filled/committed with pure-C++
memcpy. The user-visible view is materialized once at fulfillment.
- ReadableStreamBYOBReader and ReadableStreamBYOBRequest.
Binding entries createReadableByteStream / acquireReadableStreamBYOBReader.
Default readers can read byte streams (PullSteps + autoAllocateChunkSize).
Validated by an isolation test (default-reader-on-bytes, BYOB respond /
respondWithNewView, autoAllocate, BYOB-on-closed, enqueue-fills-pending-BYOB,
minimumFill, partial-fill remainder, cancel, error, desiredSize) plus the
existing public suite (test-whatwg-readablebytestream*, test-webstream*,
20 files) — all green; snapshot builds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the WritableStream half of the WHATWG streams rewrite in C++,
compiled-but-unwired (the public JS API still uses the JS implementation; the
unified readable+writable+transform flip is the next step).
New src/streams/writable_stream.{h,cc} with WritableStream,
WritableStreamDefaultWriter, and WritableStreamDefaultController. The full
erroring/abort/in-flight state machine is ported faithfully:
- writeRequests as a deque of promise resolvers; close / inFlightWrite /
inFlightClose as resolver slots (empty == the spec's "undefined" request);
pendingAbort tracked on the stream.
- A PromiseSlot type models the writer's ready/closed {promise,resolve,reject},
including the settled-without-resolver case.
- The close sentinel is modeled by a close_queued_ flag (close is always the
last queue entry).
- The controller's AbortController is passed in from JS at setup; the signal
getter and abort() delegate to it. size_mode_ keeps the built-in queuing
strategy fast path.
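The close-sentinel idea in the list above can be illustrated with a small queue. This is a sketch under assumptions: `close_queued_` is the flag named in the description, while `WriteQueue`, `Next`, and the method names are hypothetical and the chunks are plain strings rather than JS values.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>

// Hypothetical sketch: close is tracked by a flag instead of a queue entry,
// which is valid because close can only ever be the last thing queued.
class WriteQueue {
 public:
  enum class Next { kIdle, kWrite, kClose };

  bool EnqueueWrite(std::string chunk) {
    if (close_queued_) return false;  // nothing may follow a close
    chunks_.push_back(std::move(chunk));
    return true;
  }

  bool QueueClose() {
    if (close_queued_) return false;
    close_queued_ = true;
    return true;
  }

  // Writes drain first; the close becomes visible only once they are gone.
  Next Peek() const {
    if (!chunks_.empty()) return Next::kWrite;
    return close_queued_ ? Next::kClose : Next::kIdle;
  }

  std::string TakeWrite() {
    assert(!chunks_.empty());
    std::string chunk = std::move(chunks_.front());
    chunks_.pop_front();
    return chunk;
  }

  void TakeClose() {
    assert(Peek() == Next::kClose);
    close_queued_ = false;
  }

 private:
  std::deque<std::string> chunks_;
  bool close_queued_ = false;
};
```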
The shared StreamBaseObject base, Kind tag, and SizeMode move into
streams_binding.h so readable and writable share them.
Binding entries createWritableStream / acquireWritableStreamDefaultWriter.
Validated by an isolation test (write+close, backpressure/desiredSize, size
strategy, close-flushes-queued-writes, abort, controller.error,
write-algorithm-rejects, signal, releaseLock, getWriter-locked) plus the
existing public suite (38 files incl. writablestream/transformstream/adapters/
transfer/compression) — all green; snapshot builds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implements the TransformStream half of the WHATWG streams rewrite in C++,
compiled-but-unwired (the unified readable+writable+transform flip is next).
New src/streams/transform_stream.{h,cc} with TransformStream and
TransformStreamDefaultController. TransformStream is pure orchestration over the
C++ readable+writable halves: it builds a C++ readable and writable via the
extracted helpers NewReadableStream / NewWritableStream (refactored out of the
readable/writable binding entries) and wires their controller algorithms with
small JS trampolines (created once at setup, carrying the transform stream as
their Data) that call back into C++ sink/source algorithms. Continuations that
must capture chunk/reason use a 2-element holder Array as the reaction Data.
Ported faithfully: backpressure coordination (backpressureChange promise,
SetBackpressure / UnblockWrite), SinkWrite (await backpressure then transform),
SinkClose / SinkAbort / SourceCancel (finishPromise + flush/cancel algorithms),
SourcePull, controller enqueue/error/terminate/performTransform, and the shared
start promise. The transform controller drives the readable/writable through
their already-public C++ operations.
Binding entry createTransformStream. Validated by an isolation test (identity,
mapping transform, flush-enqueue + close, terminate, controller.error,
transform-throws, desiredSize) plus the public suite (38 files) and all four
stream isolation suites — green; snapshot builds.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds ExposeReadableStreamConstructors / ExposeWritableStreamConstructors /
ExposeTransformStreamConstructors, called from CreatePerContextProperties, so
the JS layer can obtain the 11 native constructor functions (and their
prototypes) from the `webstreams` binding.
The native prototypes are already spec-shaped: methods and accessor getters are
enumerable + configurable and the constructor .name is correct, so the JS layer
only needs to add Symbol.toStringTag, the JS-only methods, and construction
wrappers when the flip wires them up.
Groundwork only: the exposed constructors are not yet used by the public API
(still the JS implementation), so behavior is unchanged.
Builds + snapshot + full public suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lands the small, additive prerequisites the unified JS flip needs, kept
compiled-but-unwired so the tree stays green:
- C++: expose internal introspection over the existing (already-settled) stream
state + closed-promise infrastructure as binding functions
(readableStreamStateField/Disturbed/ClosedPromise,
writableStreamStateField/ClosedPromise) rather than prototype properties, so
the public WebIDL surface is unchanged. These back the node:stream interop
hooks (kIsDisturbed/kIsErrored/kIsReadable/kIsWritable/kIsClosedPromise) once
the JS layer is flipped.
- C++: fix ReadableStream/WritableStream closed_promise() to settle immediately
when the stream is already in a terminal state, so a late requester (e.g.
node:stream finished() on an already-closed stream) does not hang.
- JS: add util.extractSizeMode() to classify a strategy.size into the C++
SizeMode enum, and export the built-in queuing-strategy size functions so they
can be recognized by identity (skipping the per-chunk size() cross).
WPT streams: 69 passed / 1403 subtests / 0 unexpected. Public suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewrites
lib/internal/webstreams/{readablestream,writablestream,transformstream}.js
as thin wrappers whose instances ARE the native C++ objects, plus
transfer.js deferred-port support. The build is green and the public classes
work; WPT streams went from ~250+ failures to 24 unexpected + 1 hang.
This is an intentional checkpoint commit to preserve in-progress work — the
conformance bar (full WPT + parallel suite) is NOT yet met, so this must not be
treated as complete. Remaining failures are catalogued in the plan file
(pipeTo error/abort priority, byte tee hang, byte read detach, transform size
propagation, then-interception, tee error propagation, patched-global).
Highlights of fixes already in this commit:
- transformstream requires readable/writable so C++ transform halves get the
grafted public prototypes
- ClassifyView distinguishes real Buffer from Uint8Array via prototype
- TypeError (not Error/RangeError) for released-reader/writer, byte brand,
min<=0, non-ArrayBufferView, non-detachable transfer
- EnqueueInternal no longer loses size() throws to a TryCatch destructor
- WritableStream::Abort re-reads state after signalling (no assert)
- closed_promise settles immediately when already terminal
- pipeTo no longer drops the last read chunk or masks the real error; uses
new writableStreamStateField / writableStreamCloseQueuedOrInFlight accessors
- subclassing honored via new.target in all three constructors
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drives the C++ flip to full WPT conformance: `node test/wpt/test-streams.js` is
green (1404 subtests, 0 unexpected), as are the WPT compression and encoding
suites. Down from ~45 unexpected failures + a crash + a hang.
Web IDL binding conformance (idlharness): accessor getters are named
"get <prop>"; operations carry correct `.length`; Promise-returning operations
and Promise-typed attribute getters REJECT (not throw) on a foreign receiver via
a HasInstance brand check + IllegalInvocationRejection (removing the V8 receiver
signature, which would crash FromJSObject on an arbitrary object); respond()
validates its required argument.
Byte streams: default reader Release now runs the byte controller's release
steps (re-tags the first pending pull-into "none"), which fixes a crash and the
autoAllocate+releaseLock cases; InvalidateBYOBRequest detaches the handed-out
view's buffer (spec transfers the descriptor buffer on respond/enqueue);
enqueue() throws when the BYOB request buffer was detached; respondWithNewView
buffer-length mismatch -> ERR_INVALID_ARG_VALUE.
tee: byte-tee no longer hangs (BYOB read-into close steps surface the empty view
so the branch read settles); readableStreamCancel uses a lock-bypassing native
helper; default tee uses guarded internal enqueue/close/error and the internal
constructor (no WebIDL parse / global lookup -> survives a patched
Object.prototype); error propagation deferred one tick so the final chunk is
enqueued before branches error.
pipeTo: spec priority-ordered shutdown checks (isOrBecomesErrored/Closed) backed
by native stored-error accessors; abort actions guarded by state;
close-with-error-propagation no-ops on an already-closed dest.
Primordial safety: internal (pipeTo/tee) reads produce null-prototype results
(forAuthorCode) so a patched Object.prototype.then can't observe piping; async
iterator uses captured original read/cancel/releaseLock.
Transfer: a BaseObject with no native transfer mode but implementing the JS
transferable protocol (markTransferMode + @@ktransfer) is driven through that
protocol in node_messaging.cc, which makes the C++ stream objects
structuredClone/worker-transferable.
Transform: size() throw propagates out of enqueue (TryCatch-cancel fix);
readable/writable brand-check with ERR_INVALID_THIS.
Parallel suite: 76/85 webstream/whatwg tests pass. The 9 remaining are
internal-coupled oracle tests probing the old `kState` structure via
--expose-internals (to be rewritten against the public API) plus the systematic
native brand-check parity (native "Illegal invocation" vs ERR_INVALID_THIS) for
controller methods.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the V8 receiver signatures from every synchronous webstreams
operation/getter and brand-check internally, throwing an ERR_INVALID_THIS-coded
TypeError (via the new CheckReceiverInvalidThis helper) instead of V8's bare
"Illegal invocation". Promise-returning operations already brand-checked and
rejected; route their rejection through ERR_INVALID_THIS too
(IllegalInvocationRejection).
This matches Node's public surface, which reports ERR_INVALID_THIS for a foreign
receiver. idlharness still accepts the (still-TypeError) behavior, so WPT stays
green; the parallel test-whatwg-transformstream controller brand-check
assertions now pass.
WPT streams: 1404 subtests, 0 unexpected failures.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The flipped controller's custom inspect passed an empty data object, so
`inspect(controller, { depth: 0 })` rendered `... {}` instead of the
upstream `... [Object]`. Add an internal-only
`transformStreamControllerStream` binding and have controllerInspect
report `{ stream }`, matching the original JS implementation.
test-whatwg-transformstream now passes; WPT streams stays green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Rewrite the white-box assertions to the public surface: capture the
controller via start(c) for the instanceof check, and use stream's
isErrored() for the post-abort state check. Add a
writableStreamControllerStream introspection binding so the controller's
custom inspect can report `{ stream }` (matching the original JS), fixing
the depth-0 `[Object]` rendering.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two changes:
1. node_messaging: JSTransferable::NestedTransferables registered a nested C++
stream half (a BaseObject with no native transfer mode that implements the JS
transferable protocol) as the raw BaseObject, while the serializer later writes
it through its idempotent JSTransferable wrapper. The two never matched, so
transferring a TransformStream threw "Object that needs transfer was found in
message but not listed in transferList". Bridge the nested BaseObject to its
JSTransferable wrapper, mirroring the serializer delegate. TransformStream
transfer (postMessage/worker) now works.
2. Introspection: add a readableStreamController(stream) binding and
readableStreamState/readableStreamStoredError +
writableStreamState/writableStreamStoredError lib helpers, then rewrite
test-whatwg-readablebytestream, test-whatwg-webstreams-transfer and the four
adapters-to-* tests off the removed stream[kState] structure.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…pipeTo args
Rewrite the last white-box test off the removed stream[kState] structure:
- state/storedError checks -> readableStreamState/readableStreamStoredError
- post-cancel "algorithms cleared" -> observable closed state
- reader<->stream identity + readRequests.length -> the locking/rejection
behavior already asserted alongside
- releasing the async iterator's internal reader -> new
readableStreamReleaseReader(stream) introspection binding
- drop the two blocks that drove now-removed internal byte-controller helpers;
replace with the equivalent public post-cancel behavior
This block was never reached before (the test used to crash on kState at line
326), which exposed a real bug: the exported readableStreamPipeTo called the
native reader-acquire (a CHECK on receiver type) without first validating its
arguments, so readableStreamPipeTo(1) aborted instead of rejecting. Validate
source/dest up front.
All 39 webstream parallel tests pass; WPT streams/compression/encoding green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The flip's JS-transferable bridge in the message serializer dereferenced
`host_object->GetTransferMode()` unconditionally. For a transfer-list entry
whose native object was already freed (e.g. a detached MessagePort, where
`BaseObject::Unwrap` returns nullptr), this is a null dereference and crashed
(SIGSEGV in Message::Serialize). main never dereferences there: its
detached-port check leads with `!host_object ||`.
Of the three bridge sites, guard transfer-list processing and
NestedTransferables with a `host_object`/`base` null check so a freed entry
falls through to the existing detached-port error path; the serializer delegate
path is already null-safe via its own deref.
Regression test test-worker-message-port-transfer-closed now passes (was a 100%
SIGSEGV); full worker/messaging suite (148) green, webstreams parallel suite
green, WPT streams 1404 subtests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two `make test` failures surfaced:
1. test-webapi-sharedarraybuffer-rejection: the C++ byte path lost the
SharedArrayBuffer guard the JS implementation had. A SAB cannot be
transferred/detached, so ReadableByteStreamController.enqueue() and
ReadableStreamBYOBReader.read() must reject a SAB-backed view with
ERR_INVALID_ARG_VALUE. Re-add the check (throw for enqueue, reject for read)
before the buffer is otherwise used.
2. test-blob: read the blob stream's controller.desiredSize via the removed
stream[kState].controller; switch it to the readableStreamController
introspection binding.
WPT streams green; both tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The per-pull hot path allocated two V8 Functions on every CallPullIfNeeded (the
fulfil/reject reactions for the user pull() promise), via Function::New per
chunk. Cache them per-controller (Data == the controller wrapper), created on
first pull and reused thereafter; reset in ClearAlgorithms to break the
controller<->wrapper cycle on terminal states.
Effect (vs JS baseline, 30 runs):
- readable-read byob: -42% -> -5%
- readable-read normal: -77% -> -49%
- async-iterator: -81% -> -64%
- pipe-to: -81% -> -75%
WPT streams (1404) + 39 webstream parallel tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Profiling the value-stream read loop showed ~30% of cycles outside V8
internals went to recovering related stream objects from GC-traced
internal fields (GetInternalField + FromJSObject per traversal, several
times per chunk) and to assembling { value, done } read results
property-by-property (re-interning the property names in the string
table and walking the map-transition machinery on every read).
- Mirror the GC-traced relationship fields (stream<->controller,
stream<->reader/writer, controller<->byobRequest, transform halves)
into raw C++ pointer caches updated at the exact sites the traced
fields are written. The traced field keeps the target's wrapper alive
while set, so a cache can never dangle; accessors become inline
pointer reads.
- Build read results by cloning per-realm boilerplate objects (one per
done x prototype combination, cached on the webstreams BindingData):
Object::Clone is a flat heap copy preserving the boilerplate's map, so
the hot path skips interning and transitions; only a non-undefined
`value` is written, as an own-property overwrite. Note done=true
results may carry a value (a BYOB read's partially-filled view on
close), so the write is keyed off the value, not `done`. Boilerplates
are built with CreateDataProperty so a patched Object.prototype
accessor cannot corrupt them.
Value-stream read throughput improves ~70% (680k -> 1.16M reads/s on
the readable-read benchmark loop). WPT: streams 1404, compression 338;
parallel test-whatwg-*/test-webstream* and worker message tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pipeTo's per-chunk cost was dominated by allocations: a read promise,
a { value, done } result object and a microtask per chunk on the read
side, and two freshly allocated promise-reaction Functions per chunk on
the write side (the same per-pull allocation already eliminated for the
readable controller).
- Add a readableStreamFastDequeue binding that synchronously dequeues a
buffered chunk from a default (value) stream, mirroring the buffered
branch of the default controller's PullSteps (dequeue, then
close-if-drained or pull). pipeTo drains buffered chunks through it
in a tight loop - re-checking writer backpressure before each write -
and only falls back to reader.read() when the queue is empty, so the
promise/result machinery runs once per drain boundary instead of once
per chunk. How chunks move through a pipe is unobservable per spec;
this matches the previous JS implementation's internal fast path.
- Cache the writable controller's write promise-reaction functions on
the controller (created on first write, Data == controller wrapper),
mirroring the readable pull-reaction cache; the cycle is broken in
ClearAlgorithms.
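The drain loop from the first bullet can be sketched without any V8 machinery. This is an illustrative model, not the binding itself: `Sink`, `DrainBuffered`, and `CompleteOne` are hypothetical names, chunks are plain ints, and backpressure is reduced to a count of in-flight writes against a high-water mark.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for the writable side of a pipe.
struct Sink {
  explicit Sink(double hwm) : high_water_mark(hwm) {}

  double DesiredSize() const {
    return high_water_mark - static_cast<double>(in_flight);
  }
  void Write(int chunk) {
    received.push_back(chunk);
    ++in_flight;
  }
  void CompleteOne() {
    assert(in_flight > 0);
    --in_flight;
  }

  double high_water_mark;
  std::size_t in_flight = 0;
  std::vector<int> received;
};

// Moves buffered chunks synchronously, re-checking backpressure before each
// write. The caller falls back to the promise-based read only when this
// returns with the source empty.
std::size_t DrainBuffered(std::deque<int>& source, Sink& sink) {
  std::size_t moved = 0;
  while (!source.empty() && sink.DesiredSize() > 0) {
    sink.Write(source.front());
    source.pop_front();
    ++moved;
  }
  return moved;
}
```

In this model the per-chunk promise and result object are needed only at a drain boundary, when the loop exits because the source is empty or the sink is full.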
pipe-to throughput improves 2.2x (266k -> 592k chunks/s on the pipe-to
benchmark loop). WPT: streams 1404 green; parallel test-whatwg-*,
test-webstream*, test-stream-* green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stream construction paid two per-creation reaction Function allocations plus a
wrapped-start call for the common case of a start algorithm that returns
undefined, and acquiring a reader eagerly allocated its closed promise resolver
even though most readers never touch `closed`.
- Start fast path: when the start algorithm returns a non-object (no promise to
await, no thenable to chase), resolve the start promise with the controller
wrapper itself and attach a single per-realm cached reaction that recovers the
controller from the fulfilment value and dispatches by kind tag. Preserves the
"started in a microtask" timing; an object return value keeps the existing
ThenReact slow path so a thenable's `then` is only read once.
- Readers' closed promise is now lazily materialized: acquiring a reader records
only the settlement state (pending/resolved/rejected + reason, including the
always-rejected post-release state); the resolver is created and settled on
first access.
new ReadableStream() improves ~80% (200k -> 362k/s) and getReader() ~17% on the
creation microbenchmarks. WPT streams 1404 and parallel
test-whatwg-*/test-webstream* green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pull and write algorithms were wrapped per stream in a JS `async` adapter so
the C++ controllers always received a promise; for the common synchronous
algorithm that meant a wrapper frame, an implicit promise and two reaction jobs
on every pull/write. Pass the user's raw pull/write to the native controllers
instead, invoked with the underlying source/sink as receiver. The controller now
handles the four return shapes itself:
- promise: reacted to with the per-controller cached reactions (unchanged);
- non-object (the common synchronous case): FinishPull/OnWriteFulfilled is
enqueued directly as a microtask via the per-controller cached reaction
function: one API call, no promise; ordering is identical since CallableTask
and PromiseReactionJob share the microtask queue (a promise-resolution round
trip benchmarked slower than the JS async wrapper it replaced);
- thenable object: full promise resolution so its `then` is honored exactly
once;
- synchronous throw: behaves as a rejected pull/write promise.
An absent pull/write skips the call entirely. close/cancel/abort remain wrapped
(cold paths). Callability is still validated at construction.
readable-read +12%, pipe-to +14%, creation +8% on the benchmark loops; the
value-read and pipe-to gaps vs the JS implementation shrink to single digits.
WPT: streams 1404, compression 338, encoding 3822; parallel
whatwg/webstream/stream-interop/worker suites green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- ReadableStream async iterator: hoist the per-chunk read reactions to
per-iterator closures, return the read result (already a fresh
{ value, done } object) as the iterator result instead of re-wrapping
it, and skip the serialization chaining promise when the previous
request has already settled. An outstanding-request flag is set at
call time (not when the steps run) so un-awaited next()/return()
sequences still resolve in strict call order; return() always chains.
- TransformStream: the write/close/abort/pull/cancel trampolines are now
shared per realm, recovering the transform stream from their receiver
(the controllers invoke algorithms with algo_receiver, set to the
transform's wrapper), instead of five per-creation Function
allocations; only start keeps a per-instance Data binding since it is
invoked with an undefined receiver. The close/abort/cancel algorithm
invocations now pass algo_receiver as `this` (the public streams'
wrapped algorithms ignore their receiver).
async-iterator throughput +13%, TransformStream creation gap vs the JS
implementation narrows from -44% to -25%. WPT streams 1404 (including
the async-iterator [no awaiting] ordering tests) and parallel
whatwg/webstream/stream suites green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the per-chunk fast-dequeue + writer.write() sequence with a C++
pipe pump. pipeTo's step now calls a single pipePump binding that moves
buffered chunks straight from the readable's value queue into the
writable's write queue: no read promise, no { value, done } result, and
no per-write request resolver (each write occupies an untracked request
slot). On destination backpressure the pump arms itself and every write
completion re-enters the transfer loop from C++, so a steady-state pipe
moves chunks without touching JS at all; pipeTo's step parks on a stall
promise that settles when the pump stops for any reason. While the pump
is armed, the writer.ready pending/resolve churn is suppressed (nothing
can observe pipeTo's internal writer) and re-synced on disarm.
Shutdown waits on a new flush promise (resolved once every queued and
in-flight write has settled or the stream errored) instead of per-write
promises, and disarms the pump so no chunks move after shutdown begins,
including when shutdown starts from user code running synchronously
inside the pump. The JS closeWithErrorPropagation shim gains the spec's
dest-errored branch, reachable now that fast-path write failures no
longer surface through a tracked write promise.
pipe-to benchmark: ~860k vs ~755k chunks/s (+13%) at highWaterMark 1
(previously -22% vs the JS implementation); parity at large HWMs where
user callbacks dominate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…read
reader.read() on a default reader is now a thin JS wrapper over a
readerFastRead binding that completes every read in one JS->C++
crossing. When a chunk is buffered, the binding returns the raw chunk
and the wrapper builds the { value, done } result and already-resolved
promise in JIT code — both substantially cheaper than their C++ API
equivalents (Promise::Resolver::New + Resolve + boilerplate clone).
Every other case (parked read, closed/errored stream, byte controller,
tee's internal non-author reader, which needs null-prototype results)
performs the full native read inline in the same crossing and returns
its promise.
The two return shapes are disambiguated by a Promise.prototype check:
chunks whose prototype chain contains the realm's original
%Promise.prototype% (cached primordial-safely from a freshly created
promise) are never returned raw — the binding builds their result
promise natively — so the wrapper's test cannot misclassify. Only a
foreign receiver falls back to the native prototype method, for its
Web IDL brand-check rejection. The async iterator's nextSteps uses the
same binding, turning a buffered chunk directly into the iterator
result with no read promise or reaction chain.
readable-read-buffered: -33% -> ~-4% vs the JS implementation;
readable-async-iterator: -40% -> ~-13%; readable-read parity preserved.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ReadableStreamReaderGenericRelease constructed the "reader was released"
TypeError eagerly on every releaseLock(), and constructing a JS error captures a
stack trace: over half of all cycles in a getReader()/releaseLock() loop.
Record the rejection as a new ClosedState::kRejectedReleased instead and build
the error only when a materialized closed promise (or an outstanding read
request) can actually observe it; closed_promise() materializes it on first
access. Applies to both the default and BYOB readers.
getReader()+releaseLock() churn: 140k -> 1.30M ops/s (the JS implementation
does 815k).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
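The deferral described in this commit can be modeled without V8. The sketch below is an assumption-laden illustration: `ClosedState::kRejectedReleased` is named in the message, but the other enumerators, the `Reader` and `Error` types, and the build counter are hypothetical, and a `std::string` stands in for a JS error with a captured stack.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch: record the rejection as state, build the error lazily.
enum class ClosedState { kPending, kResolved, kRejectedReleased };

struct Error {
  std::string message;  // stands in for a JS TypeError plus stack capture
};

class Reader {
 public:
  // Cheap: only flips the state, no error object is constructed.
  void ReleaseLock() { state_ = ClosedState::kRejectedReleased; }

  // Called only when something can observe the rejection.
  const Error& ClosedRejection() {
    assert(state_ == ClosedState::kRejectedReleased);
    if (!error_.has_value()) {
      error_ = Error{"reader was released"};
      ++errors_built_;
    }
    return *error_;
  }

  ClosedState state() const { return state_; }
  int errors_built() const { return errors_built_; }

 private:
  ClosedState state_ = ClosedState::kPending;
  std::optional<Error> error_;
  int errors_built_ = 0;
};
```

A reader that is released and never observed pays for the state write only; the error is built at most once, on first observation.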
std::deque allocates its map plus a ~512-byte node block in its constructor, and
every reader, writer-side stream and controller embedded one or more of them
(read/read-into/write request queues and the chunk-size queues), the single
largest cost in reader creation profiles after the object allocation itself.
Replace the four pure-FIFO members with a vector-backed FifoQueue that allocates
nothing until the first push (most of these queues are never pushed to), pops
via a head index, and compacts the dead prefix once it dominates.
reader creation: 1.05M -> 1.83M/s (JS impl: 2.06M); getReader()+releaseLock()
churn: 1.30M -> 1.68M/s (JS impl: 834k); ReadableStream and WritableStream
creation also improve (one fewer 512-byte malloc per embedded queue).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
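A vector-backed FIFO with a head index and prefix compaction could look roughly like this. It is a minimal sketch, not the PR's `FifoQueue`: the method names, the `kMinCompact` threshold, and the "dead prefix is at least half the storage" rule are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of a FIFO that owns no storage until the first push.
template <typename T>
class FifoQueue {
 public:
  bool empty() const { return head_ == items_.size(); }
  std::size_t size() const { return items_.size() - head_; }

  void push(T value) { items_.push_back(std::move(value)); }

  T pop() {
    assert(!empty());
    T value = std::move(items_[head_]);
    ++head_;
    if (head_ == items_.size()) {
      // Fully drained: drop everything and start over at index 0.
      items_.clear();
      head_ = 0;
    } else if (head_ >= kMinCompact && head_ * 2 >= items_.size()) {
      // The dead prefix dominates: erase it so the vector stops growing.
      items_.erase(items_.begin(),
                   items_.begin() + static_cast<std::ptrdiff_t>(head_));
      head_ = 0;
    }
    return value;
  }

 private:
  static constexpr std::size_t kMinCompact = 16;  // illustrative threshold
  std::vector<T> items_;
  std::size_t head_ = 0;
};
```

Popping is an index bump in the common case, and compaction is amortized because each element is moved at most once per erase of a prefix at least as large as the live tail.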
PromiseSlot materialized a V8 promise (and, on release, the "writer was
released" TypeError with its stack capture) eagerly, even though most writers
never have their ready/closed observed: writer setup allocated up to two
resolvers and releaseLock() built rejected replacement promises plus the error
unconditionally.
The slot now records its state (pending / resolved / rejected-with / released)
without touching V8 and materializes on first promise() access, where rejected
promises are also marked handled; once observed it is sticky-materialized and
behaves as before. The released error is built lazily and shared between ready
and closed via the writer so both reject with the same error object, per spec.
getWriter()+releaseLock() churn: 91k -> 2.0M ops/s (the JS implementation does
417k); writer creation drops two resolver allocations.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Convert ReadableStreamDefaultReader and ReadableStreamBYOBReader from BaseObject to cppgc-managed wrappers (CPPGC_MIXIN). Readers are the stream objects created and discarded in bulk (getReader()/releaseLock() churn, tee, pipeTo, async iteration), and the BaseObject model costs a malloc, a persistent global handle and a weak-callback pass per instance; cppgc replaces those with a bump allocation traced from the wrapper, and dead readers are reclaimed by the sweeper with no per-object callback.
- V8 references (the pending read/read-into request resolvers and the lazily-materialized closed promise) move from v8::Global to v8::TracedReference: a v8::Global member would dispose global handles in the sweep-time destructor (unsafe) or require a pre-finalizer (which taxes every GC). Settled requests Reset() their cells eagerly since ~TracedReference deliberately does not free them.
- Parked requests are written with emplace-empty + Reset() (an assigning store), never the TracedReference(isolate, local) constructor: the constructor is an initializing store, which during incremental marking gets neither a markbit nor a write barrier, so a request parked on an already-traced reader would be zapped by ResetDeadNodes at the end of marking (caught as an intermittent segfault in the compression WPT; reproduces deterministically under --stress-incremental-marking).
- The pending-request FifoQueue mutates as reads park and settle, so the readers' Trace() defers concurrent marking to the mutator thread via DeferTraceToMutatorThreadIfConcurrent.
- ReadableStream's reader cache becomes a void* discriminated by an explicit kind byte (the readers no longer carry the StreamBaseObject kind tag).
getReader()+releaseLock() churn: 1.59M -> 1.82M/s (+14%), BYOB 1.76M -> 1.98M/s (+12%). Official creation benchmark at n=500k: DefaultReader +6.4% (*), BYOBReader +4.9% (***). Read paths at parity except readable-read byob -2.0% (**): a parked read-into request's TracedReference create/Reset is slightly dearer than the Global it replaces (byob reads remain ~+29% over the JS implementation).
WPT streams/compression/encoding (including 20x compression and --stress-incremental-marking runs) and the parallel whatwg/webstream/stream suites stay green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
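The "void* discriminated by an explicit kind byte" reader cache can be sketched as follows. This is a minimal illustration under stated assumptions: `DefaultReader`, `ByobReader`, `ReaderKind` and `ReaderCache` are hypothetical names, not the PR's classes, and the real cache points at cppgc-managed objects rather than plain structs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reader types; unrelated classes with no common base and no RTTI.
struct DefaultReader { int pending_reads = 0; };
struct ByobReader { int pending_read_intos = 0; };

// One byte stored next to the pointer says what the pointer really is.
enum class ReaderKind : std::uint8_t { kNone, kDefault, kByob };

class ReaderCache {
 public:
  void Set(DefaultReader* reader) {
    ptr_ = reader;
    kind_ = ReaderKind::kDefault;
  }
  void Set(ByobReader* reader) {
    ptr_ = reader;
    kind_ = ReaderKind::kByob;
  }
  void Clear() {
    ptr_ = nullptr;
    kind_ = ReaderKind::kNone;
  }

  ReaderKind kind() const { return kind_; }

  // The downcasts check the kind byte first, so a mismatched request yields
  // nullptr instead of an unchecked cast to the wrong type.
  DefaultReader* AsDefault() const {
    return kind_ == ReaderKind::kDefault ? static_cast<DefaultReader*>(ptr_)
                                         : nullptr;
  }
  ByobReader* AsByob() const {
    return kind_ == ReaderKind::kByob ? static_cast<ByobReader*>(ptr_)
                                      : nullptr;
  }

 private:
  void* ptr_ = nullptr;
  ReaderKind kind_ = ReaderKind::kNone;
};
```

The pointer and the kind are always written together, which is what makes the checked casts sound in this sketch.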
Realm::TrackCppgcWrapper allocates a CppgcWrapperListNode (malloc) plus a cppgc::WeakPersistent per wrapper. The list exists to run Clean() on still-live wrappers at Realm shutdown, to include them in the realm's memory tracking, and to purge its own nodes after GC — none of which applies to wrapper classes with trivial cleanup (default destructor, no Clean() override). Add CppgcMixin::Tracking::kUntracked so such classes can skip the list entirely. Untracked wrappers must never touch realm_ during destruction — a wrapper swept after Realm teardown outlives it — so the destructor's purge-flag write is gated on tracked_ (untracked wrappers have no list node to purge anyway). Opt the webstreams readers out: they are created in bulk on hot paths and their destructor is the default one. Interleaved A/B vs tracked: getReader()+releaseLock() 1.91M -> 2.31M/s (+21%, +45% over the BaseObject model), BYOB churn +21%, reader creation 1.90M -> 2.08M/s — matching the all-JS implementation. Official creation benchmark (n=500k, vs the BaseObject readers): DefaultReader +16.8% (***), BYOBReader +16.1% (***); read paths at parity. Validated: WPT streams/compression/encoding green, including 20x compression repeats, --stress-incremental-marking runs, and worker teardown with live untracked wrappers (test-worker*). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pipeTo and tee called read/cancel/releaseLock/write/close/abort and the closed/ready getters on their internal readers and writers through the live prototypes, which author code may patch. Per spec these are internal abstract operations and must not be observable. Caught by WPT readable-streams/patched-global under --stress-incremental-marking: a pipe's finalize drifted past its own test and called a patched releaseLock() installed by the async-iterator test, throwing inside the finalizer (intermittent in plain runs, ~1 in 2 under stress). The async iterator already used captured originals; pipeTo and tee now do too — all reader/writer methods and the closed/ready accessor getters are captured at module load and invoked via FunctionPrototypeCall/ReflectApply, with byte tee passing the kind-matching closed getter when it switches reader kinds. Validated with 10x patched-global under --stress-incremental-marking (previously failing ~1 in 2), 3x full WPT streams under stress, and the usual gates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A TransformStream always handed its halves a start trampoline returning
the shared start promise — an object — so the readable/writable
controllers never took the non-object start fast path and each paid
per-creation reaction Functions in their Setup. v8::Function::New
allocates a fresh SharedFunctionInfo per call; profiling a default
`new TransformStream({ transform })` loop showed ~21% of all cycles in
those three Function::New sites (two ThenReacts + the start trampoline).
When the transformer has no start() — the common case — the JS shim now
passes no start algorithm at all. The C++ side then skips the start
trampoline, the shared start resolver and the final start call, and each
half marks its controller started via the shared per-realm reaction.
The halves still reproduce the spec's observable timing: a transform
half's start promise adopts the inner start promise
(promiseResolvedWith(thenable)), settling `started` three microtask
jobs after construction — WPT transform-streams/errors observes that
depth against an abort queued right after construction. The fast path
adopts a fulfilled promise carrying the controller wrapper instead, so
the shared reaction fires at exactly the same depth with no per-creation
Functions.
TransformStream creation: 79k -> ~140k/s in a spot loop; official
benchmark (n=500k) -19.3% -> +83.1% (***) vs the JS implementation;
js_transfer TransformStream -10.6% -> -4.0%. No change for transformers
with a start() (rare), nor for plain Readable/WritableStream
construction.
WPT streams/compression/encoding (plain and under
--stress-incremental-marking) and the parallel whatwg/webstream/stream
suites stay green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The default controllers kept their value queues as a JS Array held in a
single v8::Global, with chunks stored through the C++ API. Once read()'s
fast path became a binding, the queue was only ever touched from C++, so
every enqueue/dequeue paid a full API element access (LookupIterator,
AddDataElement) — and at high-water-mark 1 the queue emptied on every
read, dropping the array and degenerating to an Array::New per chunk.
This was ~24% of all cycles in the read-buffered bufferSize=1 benchmark
and dominated the C++ pipe pump's per-chunk cost (~6 API element ops
across the source and destination queues).
Replace both queues with a FifoQueue of {v8::Global<v8::Value>, size}
entries: enqueue is a global-handle Reset plus vector emplace, dequeue a
Get plus eager Reset (pop_front alone would pin the consumed chunk until
prefix compaction). This also folds the separate sizes queue and the
head/size bookkeeping into the entry queue. Lifetime semantics are
unchanged: buffered chunks remain strong roots owned by the queue, and
the controllers are plain BaseObjects, so per-chunk Globals are disposed
deterministically.
Benchmarks vs the pre-flip JS implementation (compare.R, 10 runs, all
***): read-buffered bufferSize=1 -19% -> +39.6% (the former pathological
case), bufferSize 10/100/1000 parity -> +46..59%, pipe-to +11..15% ->
+183..195% across all 16 configs, async-iterator -5% -> +60.2%;
readable-read normal/byob unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
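The "dequeue is a Get plus eager Reset" point can be illustrated with a vector-backed FIFO that advances a head index. This is a sketch, not the PR's FifoQueue: `std::shared_ptr<int>` stands in for `v8::Global<v8::Value>`, the names are hypothetical, and the dead prefix is only dropped once the queue drains.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// One queue entry: the chunk plus its size, so no separate sizes queue.
struct Entry {
  std::shared_ptr<int> value;  // stand-in for a strong V8 handle
  double size = 0;
};

class EntryQueue {
 public:
  void Enqueue(std::shared_ptr<int> value, double size) {
    entries_.push_back(Entry{std::move(value), size});
    total_size_ += size;
  }

  bool empty() const { return head_ == entries_.size(); }
  double total_size() const { return total_size_; }

  // Precondition: !empty().
  std::shared_ptr<int> Dequeue() {
    assert(!empty());
    Entry& front = entries_[head_];
    std::shared_ptr<int> out = std::move(front.value);
    // Eager release: the consumed slot stays in the vector as dead prefix,
    // so it must not keep the chunk alive until the prefix is dropped.
    front.value.reset();
    total_size_ -= front.size;
    ++head_;
    if (head_ == entries_.size()) {
      // Fully drained: drop the dead prefix in one go.
      entries_.clear();
      head_ = 0;
    }
    return out;
  }

 private:
  std::vector<Entry> entries_;
  std::size_t head_ = 0;
  double total_size_ = 0;
};
```

If the slot were left untouched and only `head_` advanced, a handle type that does not clear itself on move would keep the chunk reachable from the dead prefix.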
CallPullIfNeeded modelled a missing pull algorithm as a pull returning an already-resolved promise, paying a per-controller reaction Function (a fresh SharedFunctionInfo via v8::Function::New) plus a microtask to reset [[pulling]]/[[pullAgain]] — bookkeeping nothing can observe when there is no pull to coalesce. Pull-less streams now return immediately. This was the largest microtask cost in stream transfers, where pipeTo parks a read on the (sourceless) transferred stream and every stream is freshly created, so the per-controller reaction cache never amortized. Also cache the noop reaction used by MarkHandled/CancelInternal on the realm's BindingData instead of building a Function per call, and create all internal reaction functions with ConstructorBehavior::kThrow so V8 skips the prototype/initial-map setup they never use. js_transfer vs the pre-flip JS implementation (compare.R, 12 runs): ReadableStream -1.4% -> +1.5% (n.s.), WritableStream -3.5% -> +0.1% (n.s., parity), TransformStream -4.0% -> -2.6%. Microtask share of the transfer loop drops from 3.7% to 1.4% of main-thread cycles. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Convert the writer from BaseObject to a cppgc-managed wrapper with realm-tracking opted out, following the readers' conversion: a bump allocation traced from the wrapper replaces malloc + persistent handle + weak callback + realm list node. Writers are created and discarded in bulk — every pipeTo (and so every stream transfer) acquires one. The writer's lazy ready/closed PromiseSlots move from v8::Global to v8::TracedReference (a cppgc object's sweep-time destructor must not dispose persistent handles); all slot writes already go through Reset(isolate, value), the assigning store that is safe during incremental marking. The slots' handles mutate as promises settle, so the writer defers concurrent tracing to the mutator thread, like the readers' request queues. getWriter()+releaseLock() churn: 2.0M/s -> 2.62M/s (+31%). js_transfer WritableStream +0.97% / ReadableStream +1.42% vs the pre-flip JS implementation (parity). All gates green: WPT streams 1404 / compression 338 / encoding 3822, plain and --stress-incremental-marking, 10x compression repeats, parallel whatwg/webstream/stream/worker suites, and worker teardown with 50k live untracked writers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Convert WritableStream and WritableStreamDefaultController from BaseObject to cppgc-managed wrappers with realm tracking opted out, completing the writable side (the writer converted previously). A stream transfer allocates stream+controller pairs in bulk — the user stream, the cross-realm peer, and the received stream — and each BaseObject paid malloc + template-instance persistent handle + weak callback + realm list node; the cppgc model is a bump allocation traced from the wrapper. All v8::Global members move to v8::TracedReference, including the write-request queue and the chunk queue (whose entries get a traced flavor of ValueQueueEntry until the readable controller converts too). Both classes defer concurrent tracing to the mutator thread since their queues mutate as writes park and chunks move. AddWriteRequest switches from the TracedReference(isolate, local) emplace — an initializing store with no write barrier during incremental marking, the exact race previously found in the readers — to emplace-empty + Reset. The write-request rejection drain Reset()s each popped entry so the dead prefix cannot pin rejected resolvers. The shared start-fulfilled reaction is split per wrapper model: the cppgc-managed writable controller gets its own per-realm reaction instead of adding a brand check in front of the readable controllers' kind dispatch (which cost ~2.7% on ReadableStream creation in an earlier draft of this change). Benchmarks vs the pre-flip JS implementation: creation WritableStream +91.3% -> +100.8% (***, n=500k); pipe-to unchanged (+172..187%, all 16 configs ***); js_transfer WritableStream/ReadableStream at parity (n.s.). All gates green: WPT streams 1404 / compression 338 / encoding 3822, plain and --stress-incremental-marking, 15x compression repeats, parallel whatwg/webstream/stream/worker suites, worker teardown with 50k live streams. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Convert ReadableStream, ReadableStreamDefaultController, ReadableByteStreamController and ReadableStreamBYOBRequest from BaseObject to cppgc-managed wrappers with realm tracking opted out, completing the conversion of all per-stream objects (readers, writer, and the writable side converted previously). Every stream construction and every transfer allocates these in bulk; the cppgc model replaces malloc + template persistent handle + weak callback + realm list node with a bump allocation traced from the wrapper. All v8::Global members move to v8::TracedReference; the value queue's per-chunk entries become the traced flavor (TracedValueQueueEntry, now shared with the writable controller). Tracing defers to the mutator thread wherever the traced members mutate. The kind-tag dispatch that relied on StreamBaseObject is replaced by per-type reaction callbacks — each controller type binds its own pull/start/reject reactions and gets its own per-realm shared start reaction, which is also cheaper than the old switch. The stream's controller cache becomes a type-erased pointer with an explicit kind byte, mirroring the reader cache. Benchmarks vs the pre-flip JS implementation: js_transfer reaches parity across all three payloads (ReadableStream +1.4%, WritableStream -0.7%, TransformStream -1.1%, all n.s.); creation ReadableStream +4.2% (n=500k, ***), DefaultReader +14.3%, tee +17.8%, WritableStream +104.6%, TransformStream +77.8%; read-buffered +39..68%, byob +23.7%, async-iterator +63.3%, pipe-to +164..174% (all ***). readable-read 'normal' pays ~4% for the conversion (traced-handle churn on the parked-read path) — the one negative, against double-digit wins elsewhere. All gates green: WPT streams 1404 / compression 338 / encoding 3822, plain and --stress-incremental-marking (streams twice), 20x compression repeats, parallel whatwg/webstream/stream/worker suites, worker teardown with 60k live streams holding buffered chunks. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Convert TransformStream and TransformStreamDefaultController from BaseObject to cppgc-managed wrappers with realm tracking opted out — the last two stream classes on the BaseObject model. All v8::Global members (algorithms, finish/backpressure/start promise slots) move to v8::TracedReference with mutator-thread tracing, and creation becomes a bump allocation traced from the wrapper. With every stream class now cppgc-managed, the StreamBaseObject kind-tag base class has no users left and is removed; type discrimination is done entirely by per-type reaction callbacks, brand checks at the boundary, and explicit kind bytes next to the type-erased raw-pointer caches. js_transfer TransformStream lands at exact parity with the pre-flip JS implementation (-0.2% +-0.9, n.s.; ReadableStream +0.8%, WritableStream -1.0%, both n.s.), closing the transfer family. TransformStream creation +77.2% (n=500k, ***), spot keep-alive creation 151k -> 165k/s over the previous commit. All gates green: WPT streams 1404 / compression 338 / encoding 3822, plain and --stress-incremental-marking, 15x compression repeats, parallel whatwg/webstream/stream/worker suites, worker teardown with 40k live TransformStreams. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
readable-read normal sat at -10.9% vs the JS implementation (and -26% at a 10x window), previously concluded to be a V8-level floor. Two measurements reopened it: perf stat showed the C++ path executing the same instruction count as the JS baseline but at IPC 1.90 vs 2.67 (the API path is dependency-chained where JIT code is flat), and profiles showed the steady state is the parked path - the await continuation is enqueued before the pull-bookkeeping microtask, so every read parks and the buffered fast path never engages.
Re-adopt the JS implementation's own structure for that path: the read() wrapper gets a park sentinel back from readerFastRead, creates the read promise (PromiseWithResolvers) plus one settle closure `(value, done, isErr)` in JIT code, and parks it through a new trusted readerParkRead binding. C++ settles the read with a single Function::Call into the closure: the result object is built by JIT StoreICs and the resolution runs through the builtin with an inline-cached then-lookup - replacing Resolver::New, a boilerplate clone whose value write re-ran a kConst->kMutable descriptor migration per clone, and Resolver::Resolve, whose C++ then-lookup walks Object.prototype with no inline cache.
A parked read request is now either a promise resolver (pipeTo, tee, byte streams, the async iterator) or the settle closure, discriminated by IsFunction(); the front request lives in a kParkedRead internal field on the reader wrapper (a plain tagged store, traced with the wrapper) with the TracedReference queue as overflow for concurrent reads.
Also: pre-generalize the read-result boilerplates' value descriptor; inline BindingData::Get as env->principal_realm() (the context walk + embedder-data bounds check was ~4% of the read loop); take the binding data store slot by reference in Realm::GetBindingData (the BaseObjectPtr copy churned the refcount per lookup); skip the redundant InternalFieldCount check when unwrapping behind a brand check or on trusted reaction Data.
Two negative results are recorded in comments: a native MicrotaskCallback for the sync-pull continuation loses (rooting the cppgc controller wrapper requires a per-pull v8::Global arm/disarm pair that costs more than the JS-function task it saves), and a fulfill/reject closure pair loses half the win to two young-into-old internal-field stores each paying the remembered-set barrier.
Benchmark results vs the pre-flip JS implementation (compare.R):
  readable-read normal         -10.9% -> +22.2%          ***
  readable-read byob           +25.7% -> +32.7%          ***
  read-buffered bufferSize=1   +39.6% -> +58.6%          ***
  read-buffered 10/100/1000    +46..59 -> +92.8..93.5%   ***
  async-iterator               +63.2% -> +78.9%          ***
  pipe-to (16 configs)         held at +181..194%        ***
  js_transfer RS/WS/TS         parity held (all n.s.)
  creation (n=500k)            DefaultReader +28.5%, RS +9.5%  ***
WPT: streams 1404 (plain and --stress-incremental-marking), compression 338 x15 plus 3 stress runs, encoding 3822; parallel whatwg/webstream/stream/worker suites green; worker teardown with 80k live parked closure reads (plain and stress).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
In the parked steady state (the spec-mandated regime for tight read()/pull loops: the await continuation always precedes FinishPull, so every read parks while pulling_ is set), each enqueued chunk settled the pending read through a C++->JS Function::Call into the read() wrapper's settle closure - a full Execution::Call/JSEntry transition per chunk. controller.enqueue is now a thin JS wrapper over a new controllerEnqueueOrSettle binding. When the settle is safely reorderable past EnqueueInternal's tail CallPullIfNeeded - i.e. when pulling_ is set (an enqueue from inside a running pull, where it only records pull_again_) or there is no pull algorithm (a no-op) - the binding pops the closure and returns it, and the wrapper performs the settle as a JIT call. All other cases (no pending read, resolver-kind requests from pipeTo/tee/async-iterator/byte paths, throwing states) complete inside the binding unchanged, which returns undefined. A sync-settle park experiment (park a pure-C++ marker, run pull inside the read crossing, answer the read directly) was implemented and reverted: the parked regime is self-sustaining even at highWaterMark 0 because ShouldCallPull's read-request clause keeps the pull cycle one read behind, so the marker only ever settled on cold reads. readable-read normal: +22.2% -> +28.6% vs the JS baseline (spot n=1e6: 1.85M -> 2.06M reads/s vs the prior HEAD); async-iterator +83.2%; read-buffered, byob and pipe-to (16/16) held. WPT streams 1404 (plain and --stress-incremental-marking), compression 338 x10 plus x3 stress, encoding 3822, parallel whatwg/webstream/stream/worker suites and the worker-teardown stress all green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two structural costs dominated the respond-driven BYOB loop (the
fs/fetch-style pattern: pull writes into byobRequest.view and calls
respond(n)), which ran well behind the JS implementation even though
the official benchmark's enqueue-shaped byob row did not show it:
- Every parked BYOB read settled through Promise::Resolver::Resolve
plus a result clone, with the C++ then-lookup walking
Object.prototype with no inline cache. The byob read() is now a thin
JS wrapper over byobReaderFastRead/byobReaderParkRead, mirroring the
default reader's protocol: a synchronously-filled view returns raw
(the wrapper builds { done, value } and the resolved promise in JIT
code), and parked reads park a settle closure that C++ invokes with
one Function::Call. Read-into requests are resolver OR closure,
discriminated by IsFunction(); a done settle still passes the view
(a closing BYOB stream hands back the partially-filled view). The
byte tee keeps the raw native read (captured before the wrapper is
installed): its internal reads need null-prototype results.
- Every pull created a cppgc-backed BYOBRequest: MakeGarbageCollected,
wrapper traced refs and a template->function instantiation lookup
per request. A BYOBRequest is now a JS-only wrapper - the GC-traced
kController/kView internal fields carry everything, methods recover
the controller by unwrapping kController after the brand check (the
cleared field doubles as the invalidated brand), and the instantiated
constructor is cached per realm.
Respond-driven loop: 0.44M -> 0.53M reads/s (+21%; the gap to the JS
baseline narrows from -34% to -18% - the remainder is per-read
ArrayBuffer materialization and crossing overhead, see the benchmarks
doc). Official readable-read byob: +38.8% vs the JS baseline (was
+32.7%); enqueue-shaped byob spot +6%; readable-read normal and
read-buffered held. WPT streams 1404 (plain and
--stress-incremental-marking), compression 338 x10 plus x3 stress,
encoding 3822, parallel whatwg/webstream/stream/worker suites, and
worker teardown with 4x20k live parked BYOB closure reads (plain and
stress) all green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A default (non-BYOB) reader on a byte stream - the fetch-body shape, body.getReader() + read() - was excluded from the read() wrapper's parked-closure protocol: ReaderFastRead punted every byte-controller read to the native path, so each read paid Resolver::New, a result clone and an un-IC'd C++ Resolver::Resolve, and each enqueue settled it through a C++->JS Function::Call. The shape measured -17% vs the JS baseline while the official suite (which has no such row) read green everywhere. ReaderFastRead now extends both fast paths to byte controllers behind the same author-code gate: an empty queue returns the park sentinel (readerParkRead dispatches to the byte park steps - auto-allocate descriptor push, closure park, pull), and a buffered queue returns the dequeued chunk's view raw for the wrapper to wrap in JIT code. The take-then-settle order in ProcessReadRequestsUsingQueue is preserved by splitting FulfillFront into TakeFrontReadRequest + SettleReadRequest (the dequeue's drain bookkeeping can run pull reentrantly). The byte controller's enqueue follows the default controller's enqueue-or-settle inversion: a thin JS wrapper over a byteControllerEnqueueOrSettle binding that, when the final settle targets a wrapper closure and is reorderable past the tail CallPullIfNeeded (pulling_ set or no pull algorithm), hands back a [closure, view] pair - the settled value here is the C++-materialized transferred view, not the caller's chunk, so the view crosses back with the closure. This also covers the byob-reader commit path, so the enqueue-shaped official byob row gains too. The same inversion applied to byobRequest.respond() was implemented, measured performance-neutral in an interleaved binary A/B (the pair Array::New consumed the saved JSEntry transition), and dropped. byte default-read spot loop: 0.63 -> 0.74 M reads/s (-17% -> -8% vs the JS baseline; closure park +13%, enqueue defer +3%). 
Official rows (compare.R, 10 runs): readable-read byob +23.5% -> +38.8%***, readable-read normal +35.7%***, read-buffered +45...+92%***, async-iterator +65.8%*** (all held). WPT streams 1404 (plain and --stress-incremental-marking), compression 338 plus x5 stress, encoding 3822, parallel whatwg/webstream suites and worker teardown with 40k parked byte closure reads all green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Every write arriving under backpressure paid a fresh v8::Function::New (a SharedFunctionInfo allocation per chunk) plus a 2-element holder array to carry (stream, chunk) into the backpressure-change reaction. The writable dispatches at most one write at a time - the next write starts only after the previous sink-write promise settles, and that promise settles only after the continuation has run - so the awaiting chunk lives in a single TracedReference slot on the stream and the continuation is created once per stream (data = the stream wrapper) and reused for every chunk. The continuation takes the chunk unconditionally on entry so the slot is free even when the wait ends in an erroring writable.
Throughput-neutral in a direct binary A/B on a pipeThrough passthrough loop (the loop is dominated by the spec's backpressure-change promise machinery - two Resolver::New and one API Then per chunk - which is the remaining lever there); this removes the two per-chunk allocations and the holder indirection.
WPT streams 1404 (plain and --stress-incremental-marking), compression 338 plus x3 stress, encoding 3822, transform/compose/pipeline/transfer parallel suites green; js_transfer parity held (RS +0.6 / TS -0.0 / WS -0.2, all n.s.).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
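The single-slot design above can be sketched without V8. The names below (`BackpressureWriter`, `ParkWrite`, `OnBackpressureRelieved`) are hypothetical, `std::optional<std::string>` stands in for the TracedReference slot, and a `std::function` stands in for the cached continuation. The sketch assumes, as the commit message states, that at most one write is in flight at a time.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class BackpressureWriter {
 public:
  BackpressureWriter() {
    // Built once per stream and reused for every chunk.
    continuation_ = [this] {
      // Take the chunk unconditionally on entry so the slot is free even
      // when the wait ends in an erroring writable.
      std::optional<std::string> chunk = std::move(parked_);
      parked_.reset();  // a moved-from optional is still engaged
      if (erroring_ || !chunk) return;
      written_.push_back(std::move(*chunk));
    };
    ++continuations_created_;
  }

  // The continuation captures `this`, so the object must not be copied.
  BackpressureWriter(const BackpressureWriter&) = delete;
  BackpressureWriter& operator=(const BackpressureWriter&) = delete;

  // Parks the one write that is waiting for backpressure to clear.
  void ParkWrite(std::string chunk) {
    assert(!parked_.has_value());  // at most one write awaits at a time
    parked_ = std::move(chunk);
  }

  void OnBackpressureRelieved() { continuation_(); }

  void set_erroring(bool erroring) { erroring_ = erroring; }
  bool has_parked() const { return parked_.has_value(); }
  const std::vector<std::string>& written() const { return written_; }
  int continuations_created() const { return continuations_created_; }

 private:
  std::optional<std::string> parked_;
  std::function<void()> continuation_;
  std::vector<std::string> written_;
  bool erroring_ = false;
  int continuations_created_ = 0;
};
```

However many chunks pass through, `continuations_created()` stays at one, which mirrors the removal of the per-chunk function allocation.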
The byte path carried pull-into destination buffers as owned BackingStores, paying per read: a GetBackingStore + Detach round trip at transfer-in (a shared_ptr control-block malloc per call), an ArrayBuffer::New at commit, another ArrayBuffer::New + view per byobRequest, and C++-Factory view materialization - the respond-driven loop profiled ~18% Factory object construction and ~9% BackingStore shared_ptr churn against the JS baseline's all-JIT allocation. Descriptors now hold a real JS ArrayBuffer (a TracedReference while parked in the pending deque, moved to a handle-scope Local when shifted out for committing - a stack TracedReference is invisible to marking and commits run JS) plus a cached data pointer for the fill memcpys. The byob read() wrapper becomes a staged two-crossing protocol: byobReaderReadStage validates and stages the geometry, returning (viewTag << 1) | syncBit as a Smi; the wrapper transfers the view's buffer with %ArrayBuffer.prototype.transferToFixedLength% (V8-internal, no BackingStore API) and re-enters byobReaderParkRead or byobReaderFillStaged - the sync flavor returns just a filled length (0 <=> done), so a buffered read allocates nothing in the binding. Parked settles hand (buffer, done, isErr, byteOffset, length, viewTag, transfer) to the settle closure, which constructs the result view via a JIT `new ctor(buffer, off, len)`; both read() wrappers and the enqueue defer tuple share the protocol. The byobRequest view now wraps the descriptor's own held buffer (no fresh ArrayBuffer per pull) and marks it exposed. 
The spec's respond/enqueue-entry re-transfers reduce to: nothing on the hot path (the commit's transfer detaches the exposed view for free), a native re-transfer where the descriptor stays pending (below-minimum responds, enqueues with pending descriptors), and a plain detach where its data is dropped - and commits of never-exposed buffers skip the transfer entirely, handing the private buffer straight to the result view (the transfer would be unobservable; this also keeps the enqueue-shaped byob row from paying a new per-chunk transfer). Resolver-kind settles (the byte tee, post-release paths) keep fully native materialization. Respond-driven byob spot loop: 0.52 -> 0.55 M reads/s (-16% -> -13% vs the JS baseline); the profile's GetBackingStore and shared_ptr release clusters are gone and result views build in Builtins_CreateTypedArray. Official rows (10 runs): readable-read byob +37.7%***, normal +34.4%***, read-buffered +46...+101%*** all held; js_transfer parity held. What remains in the gap is the per-pull BYOBRequest + request-view API materialization, the respond settle's C++->JS call, and the V8-internal transfer machinery both implementations pay. WPT streams 1404 (plain and --stress-incremental-marking), compression 338 plus x5 stress, encoding 3822, parallel whatwg/webstream suites, worker teardown with 40k parked descriptor reads and stress-marking respond churn all green. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
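The `(viewTag << 1) | syncBit` packing returned by the staging call can be shown in isolation. The helper names and the tag value used in the usage note are assumptions for illustration; the source does not say which numeric tags map to which view types, and the 30-bit bound is simply a conservative choice that keeps the packed value a small non-negative integer.

```cpp
#include <cassert>
#include <cstdint>

// Decoded form of the packed staging result.
struct StagedRead {
  std::uint32_t view_tag;  // which view constructor to use (hypothetical ids)
  bool sync;               // true when the read can be filled synchronously
};

// Packs the tag into the upper bits and the sync flag into bit 0.
inline std::int32_t EncodeStage(std::uint32_t view_tag, bool sync) {
  assert(view_tag < (1u << 30));  // keep the result a small positive integer
  return static_cast<std::int32_t>((view_tag << 1) | (sync ? 1u : 0u));
}

// Reverses EncodeStage for non-negative packed values.
inline StagedRead DecodeStage(std::int32_t packed) {
  assert(packed >= 0);
  return StagedRead{static_cast<std::uint32_t>(packed) >> 1,
                    (packed & 1) != 0};
}
```

With a hypothetical tag of 5, `EncodeStage(5, true)` yields 11 and `EncodeStage(5, false)` yields 10, so the caller can branch on the low bit before looking at the tag.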
Every pull in a respond-driven byte loop paid an API Factory Uint8Array materialization inside the byobRequest getter, and every byobRequest.view access after that paid a full API crossing (brand check + internal-field read) just to return it - the request-view machinery was ~5% of the loop after the byte-path redesign. controller.byobRequest is now a JS getter over a byobRequestGetOrCreate binding that returns the existing request, null, or - on first access for a descriptor - a fresh [request, buffer, byteOffset, byteLength] tuple. The wrapper constructs the view with a JIT `new Uint8Array(...)` over the descriptor's held buffer and caches it on the request as a JS own property under a module-private symbol. byobRequest.view becomes a plain property load while the request is valid - zero binding crossings; possessing the symbol doubles as the Web IDL brand (matching the pre-flip implementation's per-realm symbol-keyed state). A detached cached view takes one cold crossing (byobRequestIsInvalidated) to distinguish invalidation - the getter returns null from then on - from a user-detached buffer, where the spec keeps handing back the same view object; descriptor buffers detach on exactly those two paths, so detachment is a sound cache probe. The kView internal field is gone, invalidation is one field write cheaper, and respond()'s detached pre-check inspects the front descriptor's buffer directly (the same object the view wraps). Respond-driven loop: 0.52 -> 0.54 M reads/s (+4.4% in a direct interleaved binary A/B, 7/8 runs ahead); shapes that touch .view several times per pull (fs.promises readableWebStream) save an API crossing per access. Official rows: readable-read byob +40.4%***, normal +36.1%***, both held. WPT streams 1404 (plain and --stress-incremental-marking), compression plus x5 stress, encoding, parallel whatwg/webstream/blob and filehandle-readablestream suites all green; no new lint findings. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The transform's backpressure coordination ran on an internal [[backpressureChangePromise]]: every toggle paid a Resolver::New and an (un-IC'd) API Resolve, the readable's pull held the promise through a ThenReact reaction pair, and each write arriving under backpressure chained a derived promise off it - two resolvers, two resolves, two reaction sets and a derived promise per chunk in a passthrough loop, all to deliver two internal wakeups whose only observable is their microtask position. The change promise is gone. SourcePull flips backpressure and returns a per-realm direct-pull marker: the readable controller leaves the pull in flight with no promise and no reaction, and SetBackpressure(true) delivers FinishPull by enqueueing the controller's cached pull-fulfilled reaction as one microtask - the exact FIFO position of the promise reaction it replaces (the same EnqueueMicrotask equivalence the sync-pull fast path already relies on). On the write side, SinkWrite parks the chunk and hands the writable a resolver-backed promise; SetBackpressure(false) enqueues the per-stream cached continuation as one microtask, which runs the transform and resolves that promise with PerformTransform's result - promise adoption keeps the writable's settle depth identical to the old change-promise.Then chain, and an erroring writable rejects within the continuation's microtask exactly as the old reaction-throw did. The marker test lives at the head of CallPullIfNeeded's thenable fallback (its coldest branch), so non-transform pulls pay nothing. pipeThrough(new TransformStream()) passthrough: 0.43 -> 0.56 M chunks/s - parity with the JS baseline becomes +35%. js_transfer held (RS +1.8 / TS +1.7 / WS +0.4, parity band). 
WPT streams 1404 (plain and --stress-incremental-marking), compression 338 plus x5 stress, encoding 3822 and the parallel whatwg/webstream suites all green; smokes cover ordering, async transformers, transform throws and writer aborts with a write parked on backpressure. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
clang-format over src/streams (clearing 42 of the 45 cpplint findings; the remaining three were util.h/util-inl.h double includes) and a full eslint pass over lib/internal/webstreams: ReflectApply with array-literal arguments becomes FunctionPrototypeCall, read-result literals go done-first, tee branches and from()'s stream become const, JSDoc @returns added to the public constructors, non-ASCII punctuation swept from comments, and unused/missing primordials imports fixed. Zero eslint and cpplint findings remain in the flipped files. No behavior change intended. WPT streams 1404 (plain and --stress-incremental-marking), compression plus x3 stress, encoding, and the parallel whatwg/webstream suites green; readable-read official rows and the byte/transform spot shapes re-measured unchanged. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Member
Author
Adding @nodejs/tsc agenda as a discussion point, as there is a relatively large amount of C++ involved.
Contributor
Can you remove the log files to make the line counter more representative?
This PR is an experiment to see if moving WHATWG Stream to C++ is viable. This was done by a combination of Claude Opus 4.8 and Fable 5. I will add a full review guide later on.
This is not ready for review, but I'm opening it to gauge the feeling around this sort of change and to consider whether we want to move it forward. My C++ is not great, so I would miss a lot of things.
These are the benchmarks on my machine against main: