buffer: optimize Buffer.copy#62491
Conversation
|
Review requested:
|
This removes some of the overhead by extending the V8 api. PR-URL: nodejs#62491
This removes some of the overhead by extending the V8 api. PR-URL: nodejs#62491
This removes some of the overhead by extending the V8 api. PR-URL: nodejs#62491
|
@nodejs/performance |
This removes some of the overhead by extending the V8 api. PR-URL: nodejs#62491
This removes some of the overhead by extending the V8 api. PR-URL: nodejs#62491
Failed to start CI⚠ Commits were pushed since the last approving review: ⚠ - buffer: optimize Buffer.copy ⚠ - Update v8-array-buffer.h ✘ Refusing to run CI on potentially unsafe PRhttps://github.com/nodejs/node/actions/runs/23705961064 |
|
Either way I don't see how this is worse than it was before. Feels like scope creep: |
|
If you feel strongly about the check please open a pr and fix it properly across all the other methods as well. As you wrote in the issue. Messing around with prototypes is not supported. |
|
Can you upstream the V8 changes first? In general floating patches is reserved for toolchains/platforms V8 doesn't support, that adds maintenance burden to rebase the patches during V8 updates especially when V8 does any architectural changes (e.g. they are constantly changing how handles should be used internally, which can easily turns patches like this an upgrade blocker), so it's always better to merge them first in the upstream. |
I don't think it's likely to land and I don't know how to upstream to V8. If that's a blocker for you then this PR is moot. I appreciate the maintenance burden and if those that have to deal with it object then I have no objection to closing this PR. |
joyeecheung
left a comment
There was a problem hiding this comment.
I think this needs to be upstreamed first. I could catch the pointer issue from a glance. But I don't think I know enough to gauge the internal API uses are safe and along the lines of where V8 is going without review from people in the V8 team. Floating patches can both add maintenance burden to our V8 upgrades as well as V8's integration CI (they build Node.js with different configurations there e.g. with pointer compression and could get hit by a different set of problems). They are generally reserved for things that V8 defer to others (MSVC support or niche platform support - even in those cases we still try to upstream to minimize the burden, like the big endian patches), not things like features or performance optimizations.
|
Understood. Unfortunate. Was a significant perf boost. |
This removes some of the overhead by extending the V8 api. PR-URL: nodejs#62491
This removes some of the overhead by extending the V8 api. PR-URL: nodejs#62491
|
Benchmark results: |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #62491 +/- ##
==========================================
- Coverage 91.55% 89.70% -1.85%
==========================================
Files 351 692 +341
Lines 147653 213981 +66328
Branches 23224 41050 +17826
==========================================
+ Hits 135179 191948 +56769
- Misses 12217 14110 +1893
- Partials 257 7923 +7666
🚀 New features to boost your workflow:
|
|
I like it, but yeah, I agree with @joyeecheung ... an attempt to upstream this into v8 should be made first. |
It isn't robust (as I mentioned above).
You can crash the process (again, as mentioned), which is better than reading/writing arbitrary process memory. |
|
I've funded work to upstream this to V8. Hopefully this will be able to land in the future without blocking nits. |
|
@joyeecheung @jasnell I funded work to get this upstreamed, see https://chromium-review.googlesource.com/c/v8/v8/+/7735151 However, in order to land it in V8 we would probably need to use Is floating a patch still off the table? |
|
Alternatively, if there is some way to "automagically" replace |
Route the native backing of Buffer.prototype.copy (CopyImpl, the `_copy`
binding) through V8's new v8::ArrayBuffer::CopyArrayBufferBytes API
instead of materializing an ArrayBufferViewContents and doing a manual
memmove. This speeds up partial copies (sourceStart > 0); whole-buffer
copies still use the %TypedArray%.prototype.set intrinsic, which is
faster than any native binding for that case.
V8 clamps the byte range and handles detached and immutable buffers.
When both sides are backed by a SharedArrayBuffer the relaxed-atomic
overload is used, which honors the SharedArrayBuffer memory model.
Mixed copies (one SharedArrayBuffer, one ArrayBuffer) use the regular
overload.
copy() now reports the number of bytes actually copied, as returned by
V8: 0 when the target is backed by a detached or immutable ArrayBuffer.
The copy is then a no-op rather than a write to read-only memory.
buffer-copy.js vs node v26.3.0 (x64, 30 runs):
partial=false bytes=1024: +7.91% (***)
partial=false bytes=128: +0.17%
partial=false bytes=8: -0.89%
partial=true bytes=1024: +22.33% (***)
partial=true bytes=128: +19.58% (***)
partial=true bytes=8: +18.70% (***)
The V8 API is carried as a floating patch cherry-picked from:
https://chromium-review.googlesource.com/c/v8/v8/+/7735151
That CL is authored by a Node.js collaborator (Ben Noordhuis) and is
expected to land upstream, so the floating patch should be easy to
maintain: it touches nothing but the public ArrayBuffer API and can be
dropped wholesale on the next V8 upgrade that includes it.
v8_embedder_string is bumped to -node.21 accordingly.
This supersedes the prototype in
nodejs#62491, which added a bespoke
ArrayBufferView::FastCopy instead of using the upstream-friendly
CopyArrayBufferBytes API.
Adds SharedArrayBuffer and immutable-ArrayBuffer coverage to the buffer
copy tests.
Signed-off-by: Robert Nagy <ronagy@icloud.com>
Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Route the native backing of Buffer.prototype.copy (CopyImpl, the `_copy`
binding) through V8's new v8::ArrayBuffer::CopyArrayBufferBytes API
instead of materializing an ArrayBufferViewContents and doing a manual
memmove. This speeds up partial copies (sourceStart > 0); whole-buffer
copies still use the %TypedArray%.prototype.set intrinsic, which is
faster than any native binding for that case.
V8 clamps the byte range and handles detached and immutable buffers.
When both sides are backed by a SharedArrayBuffer the relaxed-atomic
overload is used, which honors the SharedArrayBuffer memory model.
Mixed copies (one SharedArrayBuffer, one ArrayBuffer) use the regular
overload.
copy() now reports the number of bytes actually copied, as returned by
V8: 0 when the target is backed by a detached or immutable ArrayBuffer.
The copy is then a no-op rather than a write to read-only memory.
buffer-copy.js vs node v26.3.0 (x64, 30 runs):
partial=false bytes=1024: +7.91% (***)
partial=false bytes=128: +0.17%
partial=false bytes=8: -0.89%
partial=true bytes=1024: +22.33% (***)
partial=true bytes=128: +19.58% (***)
partial=true bytes=8: +18.70% (***)
The V8 API is carried as a floating patch cherry-picked from:
https://chromium-review.googlesource.com/c/v8/v8/+/7735151
That CL is authored by a Node.js collaborator (Ben Noordhuis) and is
expected to land upstream, so the floating patch should be easy to
maintain: it touches nothing but the public ArrayBuffer API and can be
dropped wholesale on the next V8 upgrade that includes it.
v8_embedder_string is bumped to -node.21 accordingly.
This supersedes the prototype in
nodejs#62491, which added a bespoke
ArrayBufferView::FastCopy instead of using the upstream-friendly
CopyArrayBufferBytes API.
Adds SharedArrayBuffer and immutable-ArrayBuffer coverage to the buffer
copy tests.
Signed-off-by: Robert Nagy <ronagy@icloud.com>
Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Route the native backing of Buffer.prototype.copy (CopyImpl, the `_copy`
binding) through V8's new v8::ArrayBuffer::CopyArrayBufferBytes API
instead of materializing an ArrayBufferViewContents and doing a manual
memmove. This speeds up partial copies (sourceStart > 0). All copies now
go through this binding: the previous %TypedArray%.prototype.set
fast-path for whole-buffer copies is dropped, since it would throw on a
detached or immutable target rather than report a 0-byte no-op, and the
benchmarks below show the native path is comparable for that case.
V8 clamps the byte range and handles detached and immutable buffers.
When both sides are backed by a SharedArrayBuffer the relaxed-atomic
overload is used, which honors the SharedArrayBuffer memory model.
Mixed copies (one SharedArrayBuffer, one ArrayBuffer) use the regular
overload.
copy() now reports the number of bytes actually copied, as returned by
V8: 0 when the target is backed by a detached or immutable ArrayBuffer.
The copy is then a no-op rather than a write to read-only memory.
buffer-copy.js vs node v26.3.0 (x64, 30 runs):
partial=false bytes=1024: +7.91% (***)
partial=false bytes=128: +0.17%
partial=false bytes=8: -0.89%
partial=true bytes=1024: +22.33% (***)
partial=true bytes=128: +19.58% (***)
partial=true bytes=8: +18.70% (***)
The V8 API is carried as a floating patch cherry-picked from:
https://chromium-review.googlesource.com/c/v8/v8/+/7735151
That CL is authored by a Node.js collaborator (Ben Noordhuis) and is
expected to land upstream, so the floating patch should be easy to
maintain: it touches nothing but the public ArrayBuffer API and can be
dropped wholesale on the next V8 upgrade that includes it.
v8_embedder_string is bumped to -node.21 accordingly.
This supersedes the prototype in
nodejs#62491, which added a bespoke
ArrayBufferView::FastCopy instead of using the upstream-friendly
CopyArrayBufferBytes API.
Adds SharedArrayBuffer and immutable-ArrayBuffer coverage to the buffer
copy tests.
Signed-off-by: Robert Nagy <ronagy@icloud.com>
Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Route the native backing of Buffer.prototype.copy (CopyImpl, the `_copy` binding) through V8's new v8::ArrayBuffer::CopyArrayBufferBytes API (added in the preceding commit) instead of materializing an ArrayBufferViewContents and doing a manual memmove. This speeds up partial copies (sourceStart > 0). All copies now go through this binding: the previous %TypedArray%.prototype.set fast-path for whole-buffer copies is dropped, since it would throw on a detached or immutable target rather than report a 0-byte no-op, and the benchmarks below show the native path is comparable for that case. When both sides are backed by a SharedArrayBuffer the relaxed-atomic overload is used, which honors the SharedArrayBuffer memory model. Mixed copies (one SharedArrayBuffer, one ArrayBuffer) use the regular overload. copy() now reports the number of bytes actually copied, as returned by V8: 0 when the target is backed by a detached or immutable ArrayBuffer. The copy is then a no-op rather than a write to read-only memory. buffer-copy.js vs node v26.3.0 (x64, 30 runs): partial=false bytes=1024: +7.91% (***) partial=false bytes=128: +0.17% partial=false bytes=8: -0.89% partial=true bytes=1024: +22.33% (***) partial=true bytes=128: +19.58% (***) partial=true bytes=8: +18.70% (***) This supersedes the prototype in nodejs#62491, which added a bespoke ArrayBufferView::FastCopy instead of using the upstream-friendly CopyArrayBufferBytes API. Adds SharedArrayBuffer and immutable-ArrayBuffer coverage to the buffer copy tests. Signed-off-by: Robert Nagy <ronagy@icloud.com> Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Route the native backing of Buffer.prototype.copy (CopyImpl, the `_copy` binding) through V8's new v8::ArrayBuffer::CopyArrayBufferBytes API (added in the preceding commit) instead of materializing an ArrayBufferViewContents and doing a manual memmove. This speeds up partial copies (sourceStart > 0). All copies now go through this binding: the previous %TypedArray%.prototype.set fast-path for whole-buffer copies is dropped, since it would throw on a detached or immutable target rather than report a 0-byte no-op, and the benchmarks below show the native path is comparable for that case. When both sides are backed by a SharedArrayBuffer the relaxed-atomic overload is used, which honors the SharedArrayBuffer memory model. Mixed copies (one SharedArrayBuffer, one ArrayBuffer) use the regular overload. copy() now reports the number of bytes actually copied, as returned by V8: 0 when the target is backed by a detached or immutable ArrayBuffer. The copy is then a no-op rather than a write to read-only memory. The native binding now plumbs byte offsets and the copied-byte count through as size_t, passed across the fast/slow API boundary as doubles (exact for integer values below 2^53), so copies that cross the 4 GiB boundary are no longer truncated to 32 bits. buffer-copy.js vs node v26.3.0 (x64, 30 runs): partial=false bytes=1024: +7.91% (***) partial=false bytes=128: +0.17% partial=false bytes=8: -0.89% partial=true bytes=1024: +22.33% (***) partial=true bytes=128: +19.58% (***) partial=true bytes=8: +18.70% (***) This supersedes the prototype in nodejs#62491, which added a bespoke ArrayBufferView::FastCopy instead of using the upstream-friendly CopyArrayBufferBytes API. Adds SharedArrayBuffer and immutable-ArrayBuffer coverage to the buffer copy tests, plus a pummel regression test for copies larger than 2^32 bytes. Refs: nodejs#55422 Signed-off-by: Robert Nagy <ronagy@icloud.com> Assisted-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This removes some of the overhead by extending the V8 api.