worker: initial implementation (large/base PR)#58
worker: initial implementation (large/base PR)#58addaleax wants to merge 15 commits intoayojs:latestfrom
Conversation
|
Hi,
I have been able to compile clean version of Node.js without any issues on same machine before on a side note, not as a requirement or anything but as a mere suggestion for help :) |
47e8463 to
70921b6
Compare
@YafimK Hm – I can’t quite figure out what the problem here is :/ Does it help if you just remove line 44 in
I think this isn’t really about this branch, but about Ayo in general – we don’t have much CI support yet, so talking about support for certain platforms is somewhat hard (but also see #25)
If you can make that happen, that would be absolutely awesome 💙 (Also, just fyi, I rebased this against the current |
70921b6 to
1aa0013
Compare
|
@benjamingr @Qard I’ve moved everything new off the |
1aa0013 to
00bb418
Compare
| // { globalVar: 1024 } | ||
| ``` | ||
|
|
||
| ## vm.moveMessagePortToContext(port, context) |
|
|
||
| Sends a JavaScript value to the receiving side of this channel. | ||
| `value` will be transferred in a way | ||
| that is compatible with the [HTML structured clone algorithm][]. In particular, |
There was a problem hiding this comment.
structured cloning isn't a thing any more (https://html.spec.whatwg.org/multipage/structured-data.html#structuredclone).
There was a problem hiding this comment.
Heh. Right. I think this is still what it’s known as, though? (That it’s implemented as a serialize + deserialize step should be more or less transparent to users)
There was a problem hiding this comment.
@addaleax That article is pretty outdated though (it has no mention of MessagePort or SharedArrayBuffer)... HTML structured serialization/deserialization? maybe a link to v8.Serializer/Deserializer?
There was a problem hiding this comment.
(it has no mention of MessagePort or SharedArrayBuffer)
That’s why these are mentioned explicitly here ;)
maybe a link to v8.Serializer/Deserializer?
Yeah, that’s a good idea … that should give people an idea of how to play around with the algorithm without spinning up message channels every time. I’ve added:
+For more information on the serialization and deserialization mechanisms
+behind this API, see the [serialization API of the `v8` module][v8.serdes].| --> | ||
|
|
||
| The `Worker` class represents an independent JavaScript execution thread. | ||
| Most Node APIs are available inside of it. |
There was a problem hiding this comment.
Ayo or Ayo.js.
Is Worker available in a worker?
There was a problem hiding this comment.
Is
Workeravailable in a worker?
Yes. That’s also tested and the code is generally laid out to support such a structure, in part because there’s not really anything standing in the way of it
| * Returns: {undefined} | ||
|
|
||
| Starts receiving messages on this `MessagePort`. When using this port | ||
| as an event emitter, this will be called automatically once `message` listeners |
| <!-- YAML | ||
| added: REPLACEME | ||
| --> | ||
|
|
There was a problem hiding this comment.
What arguments does this event get? (value, transferList)?
There was a problem hiding this comment.
Only value. We could try to add the transferList too, if that seems useful, but all of the relevant objects should be reachable through value anyway.
I don’t know if we have some standard format for saying “this events has one parameter that can be any JS value”, so I’m adding prose for it.
There was a problem hiding this comment.
I think something like the following is used. Basically the same format as function parameters.
### Event: 'event'
* `value` {any} The value
This event is fired when ...There was a problem hiding this comment.
Ah, thanks – done!
| MessagePort* port; | ||
| ASSIGN_OR_RETURN_UNWRAP(&port, args.This()); | ||
| if (!port->data_) { | ||
| env->ThrowError("Can not send data on closed MessagePort"); |
|
|
||
| The `MessageChannel` has no methods of its own. `new MessageChannel()` | ||
| yields an object with `port1` and `port2` properties, which refer to linked | ||
| [`MessagePort`][] instances. |
There was a problem hiding this comment.
Something I've been thinking is that the messaging API could well be applicable to VM contexts too. Is that a correct understanding?
There was a problem hiding this comment.
Yes, that is a correct understanding, because the ports in a channel can be moved to different contexts
| // { globalVar: 1024 } | ||
| ``` | ||
|
|
||
| ## vm.moveMessagePortToContext(port, context) |
There was a problem hiding this comment.
Maybe just vm.transferMessagePort()? The "to context" part is already conveyed by the fact that this method is in vm module, and "transfer" is a more Proper(R) name IMO than "move".
There was a problem hiding this comment.
What I like about move rather than transfer is that it conveys better that the original message port object is now rendered unusable … I don’t feel strongly about it, though
|
|
||
| ExternalSABReference ReferenceCountedSAB::ForIncomingSharedArrayBuffer( | ||
| Environment* env, Local<Context> context, Local<SharedArrayBuffer> source) { | ||
| Local<Value> lifetime_partner; |
| Local<FunctionTemplate> templ; | ||
| templ = env->message_port_constructor_template(); | ||
| if (!templ.IsEmpty()) | ||
| return templ->GetFunction(context); |
There was a problem hiding this comment.
Is GetFunction() cached by context?
There was a problem hiding this comment.
Yup – I have to admit I never got the point of {Function,Object}Templates before making this PR. However, in retrospect it’s a lot clearer: It’s a Template in the sense that it can be used to create functionally equivalent functions/objects in different contexts.
| void AddSharedArrayBuffer(ExternalSABReference ref); | ||
| // Internal method of Message that is called once serialization finishes | ||
| // and that transfers ownership of `data` to this message. | ||
| void AddMessagePort(std::unique_ptr<MessagePortData>&& data); |
There was a problem hiding this comment.
I think that would require giving the SerializerDelegate that accesses these a publicly visibly identifier, right?
00bb418 to
9d3a05f
Compare
5d36c5d to
5b0aca3
Compare
|
@TimothyGu I think I got everything you commented on so far? :) |
43b4725 to
5d793a4
Compare
TimothyGu
left a comment
There was a problem hiding this comment.
Some more feedback. Still haven't read much code yet.
| added: REPLACEME | ||
| --> | ||
|
|
||
| * Returns: {undefined} |
There was a problem hiding this comment.
* `value` {any}
* `transferList` {Object[]}| <!-- YAML | ||
| added: REPLACEME | ||
| --> | ||
|
|
There was a problem hiding this comment.
You mean, refer to cluster here?
There was a problem hiding this comment.
No, I intended to refer to the comment of adding parameter types. Seems like github messed up the order :/
There was a problem hiding this comment.
Ah – in that case, done :)
| } | ||
|
|
||
| void MessagePortData::Entangle(MessagePortData* a, MessagePortData* b) { | ||
| a->sibling_ = b; |
There was a problem hiding this comment.
I'd add some CHECKs or even mutexes here making sure it's thread-safe, and only works when either port is not entangled.
There was a problem hiding this comment.
I think CHECKs for making sure that the ports aren’t entangled are fine. The method isn’t really required or supposed to be thread-safe, I’ve noted that in the documentation. (It’s only called by the thread that created both message ports.)
| use them for I/O, since Ayo’s built-in mechanisms for performing operations | ||
| asynchronously already treat it more efficiently than Worker threads can. | ||
|
|
||
| Workers can also, unlike child processes, share memory efficiently by |
There was a problem hiding this comment.
You can even explicitly call out clusters here.
| --> | ||
|
|
||
| The `Worker` class represents an independent JavaScript execution thread. | ||
| Most Ayo APIs are available inside of it. |
There was a problem hiding this comment.
I would mention message passing as the primary means of communication between the worker thread and the thread that spawned the worker. Maybe something like:
Like [Web Workers] and the [
clustermodule], two-way communication can be achieved through inter-thread message passing. Internally, aWorkerhas a built-in pair of [MessagePort]s that are already associated with each other when theWorkeris created. While theMessagePortobjects are not directly exposed, their functionalities are exposed through [worker.postMessage()] and the ['message'event] on theWorkerobject for the parent thread, and [require('worker').postMessage()] and the ['workerMessage'event] onrequire('worker')for the child thread.To create custom messaging channels (which is strongly encouraged over using the default global channel for more complex tasks), users can create a
MessageChannelobject on either thread and pass one of theMessagePorts on thatMessageChannelto the other thread through a pre-existing channel, such as the global one.[insert example]
See [
port.postMessage()] for more information on how messages are passed, and what kind of JavaScript values can be successfully transported through the thread barrier.
|
|
||
| The `MessageChannel` has no methods of its own. `new MessageChannel()` | ||
| yields an object with `port1` and `port2` properties, which refer to linked | ||
| [`MessagePort`][] instances. |
There was a problem hiding this comment.
I'd add a description of what MessageChannel is before saying what it has. Maybe:
The
MessageChannelclass represents an asynchronous messaging channel.
| context. | ||
|
|
||
| Note that the return instance is *not* an `EventEmitter`; for receiving | ||
| messages, the `.onmessage` property can be used. |
There was a problem hiding this comment.
It seems rather crude to have some MessagePorts being EventEmitters, and some not. I understand you are still working on this part, and that the EventEmitter is implemented in JS complicates things. But I'd rather have it all one way or the other.
There was a problem hiding this comment.
But I'd rather have it all one way or the other.
Me too. :) But like with the Buffers, it’s just a problem that not all contexts have a concept of EventEmitters … this would require making some internal modules multi-context ready, which I would like to consider out of scope for this PR. :) Also, this is already quite an advanced feature on its own.
| * Extends: {EventEmitter} | ||
|
|
||
| Instances of the `worker.MessagePort` class represent an asynchronous | ||
| communications channel. It can be used to transfer structured data, |
There was a problem hiding this comment.
s/an asynchronous communications channel/an end(or port?) of a two-way asynchronous communications channel/
|
|
||
| `transferList` may be a list of `ArrayBuffer` and `MessagePort` objects. | ||
| After transferring, they will not be usable on the sending side of the channel | ||
| anymore. |
There was a problem hiding this comment.
I guess one of the main thing that confused me was how transferList was used. So now I understand that objects present in value (no matter how deep) that are also in transferList are transferred, while ordinarily they are cloned. I wasn't aware of the "ordinarily" part.
However, what if transferList has an object that is not present in value? Will it get neutered as well?
There was a problem hiding this comment.
I’m adding clarifications for both points, thanks
| communications channel. It can be used to transfer structured data, | ||
| memory regions and other `MessagePort`s between different [`Worker`][]s | ||
| or [`vm` context][vm]s. | ||
|
|
There was a problem hiding this comment.
Also, please document if/when open() and close() need to be called. An example would be best.
There was a problem hiding this comment.
It’s hard to come up with examples for these … I’ve added to the start() documentation that it doesn’t need to be used unless moving-between-VM-contexts is used. (And, as you pointed out below, ideally it wouldn’t be necessary at all).
Regarding close() … I’m not entirely sure whether MessagePorts should be unref()ed by default or not. Right now, any MessagePort keeps an event loop open unless explicitly unref()ed or close()ed …
8b087d0 to
a2095eb
Compare
|
@TimothyGu I think I got more or less everything … the |
a2095eb to
0693217
Compare
| void Finish() { | ||
| for (MessagePort* port : ports_) { | ||
| port->Close(); | ||
| msg_->AddMessagePort(std::move(port->Detach())); |
There was a problem hiding this comment.
Is the std::move redundant if MessagePort::Detach already returns a std::move'd smart pointer?
There was a problem hiding this comment.
Clang complains:
../src/node_messaging.cc:325:28: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
msg_->AddMessagePort(std::move(port->Detach()));
^
../src/node_messaging.cc:325:28: note: remove std::move call here
msg_->AddMessagePort(std::move(port->Detach()));
^~~~~~~~~~ ~
Will fix.
| } | ||
| Context::Scope context_scope(context); | ||
| MessagePort* target = | ||
| MessagePort::New(env, context, nullptr, std::move(port->data_)); |
There was a problem hiding this comment.
This std::move(port->data_) should be port->Detach().
0693217 to
ef85bd3
Compare
|
@addaleax I've completed making MessagePort consistently EventEmitter using the V8 Extras approach I talked about on Discord. See bb85e11 and 62a0263. The same approach could be taken to make Buffer available everywhere as well. While I agree that it isn't the most in scope for this PR, it shows something pretty promising IMO. |
ef85bd3 to
d76b9ec
Compare
|
@TimothyGu that is, indeed, very nice. 👏 The commits generally LGTM, I’ll pull them in here |
|
With the changes pulled this mostly LGTM - hopefully I'll be able to get some interesting benchmarks going. |
d76b9ec to
f715000
Compare
Qard
left a comment
There was a problem hiding this comment.
de0d718 LGTM other than double std::move(...) mentioned by @TimothyGu.
| // Create a SharedArrayBuffer object for a specific Environment and Context. | ||
| // The created SharedArrayBuffer will be in externalized mode and has | ||
| // a hidden object attached to it, during whose lifetime the reference | ||
| // count if increased by 1. |
There was a problem hiding this comment.
Actually the whole wording of this is a bit weird. Not sure what would be a better way to explain that. 🤔
There was a problem hiding this comment.
Not sure what would be a better way to explain that. 🤔
Hm, yeah… was this comment helpful in explaining what’s happening?
There was a problem hiding this comment.
It makes sense for the most part. The wording of during whose lifetime the reference count if increased by 1 was a bit hard for me to parse though. I suspect someone even less familiar with threading stuff would be even more confused.
@Qard Upps, yes – I fixed that but it ended up being in a different commit. I’ve rebased & added you as a reviewer :) |
| var stdout; | ||
| var stderr; | ||
|
|
||
| const isMainThread = process.binding('worker').threadId === 0; |
There was a problem hiding this comment.
This threadId === 0 check exists in a bunch of places. Could just have a getter at process.binding('worker').isMainThread to make it clearer what the checks are doing.
There was a problem hiding this comment.
If nodejs/node#16218 lands I’d probably just tack it onto the internal module function wrapper 😼
(edit: corrected link)
| debug(`[${threadId}] terminates Worker with ID ${this.threadId}`); | ||
|
|
||
| if (typeof callback !== 'undefined') | ||
| this.once('exit', (exitCode) => callback(null, exitCode)); |
There was a problem hiding this comment.
Why the null first argument if there's no error path?
There was a problem hiding this comment.
I guess just considering the error-first pattern as a standard in Node land? E.g. tools like util.promisify (or similar userland tools) rely on it
There was a problem hiding this comment.
There's plenty of similar examples of terminator functions binding a "callback" to an event which don't use the error-first style, like net.close(cb) and stream.end(cb). IMO, not having the impossible error argument slot makes more sense, but I think it'd be better to hear what others think on this.
There was a problem hiding this comment.
Hm, yeah, but those don’t carry data like an exit code, right?
There was a problem hiding this comment.
Oh, yep. You're right. I was remembering the close event on net.Socket having an argument to indicate if it closed due to an exception, but that's only on client sockets, which don't have a close(cb), only a destroy(exception) and end(data, encoding).
|
I’ll rebase this later, there shouldn’t be any actually significant conflicts except maybe with nodejs/node@e5ad545 |
|
Fwiw I “fixed” the V8 extras + events + internal errors issue by manually editing the error messages and tacking on |
Taken from petkaantonov/io.js@ea143f7 and modified to fit current linter rules and coding style.
Native addons need to use flags to indicate that they are capable of being loaded by worker threads. Native addons are unloaded if all Environments referring to it have been cleaned up, except if it also loaded by the main Environment.
This should help a lot with actual sandboxing of JS code.
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Only allow `.js` and `.mjs` extensions to provide future-proofing for file type detection.
$ ./ayo benchmark/cluster/echo.js cluster/echo.js n=100000 sendsPerBroadcast=1 payload="string" workers=1: 26,709.154114687004 cluster/echo.js n=100000 sendsPerBroadcast=10 payload="string" workers=1: 15,936.350422945854 cluster/echo.js n=100000 sendsPerBroadcast=1 payload="object" workers=1: 20,778.85550744996 cluster/echo.js n=100000 sendsPerBroadcast=10 payload="object" workers=1: 10,912.260027807712 $ ./ayo benchmark/worker/echo.js worker/echo.js n=100000 sendsPerBroadcast=1 payload="string" workers=1: 69,787.63926344117 worker/echo.js n=100000 sendsPerBroadcast=10 payload="string" workers=1: 32,544.630210444844 worker/echo.js n=100000 sendsPerBroadcast=1 payload="object" workers=1: 48,706.90345844702 worker/echo.js n=100000 sendsPerBroadcast=10 payload="object" workers=1: 18,088.639282621873
|
does it mean that we are not use thread in nodejs? or jsut this branch is deleted? |
|
@p3x-robot A newer version of this PR was merged into Node.js' master branch. |
(status: currently ready for review, by anyone, including you: #40 (comment))
(If that gives the wrong impression: No, this does not conform the browser WebWorker API, but that should be rather easily implementable on top of this, and I’m okay with that.)
Preliminary benchmarks: https://gist.github.com/TimothyGu/7fd2fbe15537a84963c36a3e0a03bcce
Fixes: #31
Checklist
make -j4 test(UNIX), orvcbuild test(Windows) passesTODO
.start()manually forMessagePortsperformance(currently pending the V8 6.1 update in upstream Node.js)MessagePorts configurablev8::Platformimplementation with multi-isolate support(@petkaantonov please feel free to indicate whether attributing you for code that comes from your original PR is not enough/too much/just right :) )