Skip to content

New: Allow parallel linting in child processes (fixes #3565)#12191

Closed
Gaafar wants to merge 70 commits intoeslint:masterfrom
Gaafar:parallel
Closed

New: Allow parallel linting in child processes (fixes #3565)#12191
Gaafar wants to merge 70 commits intoeslint:masterfrom
Gaafar:parallel

Conversation

@Gaafar
Copy link
Copy Markdown

@Gaafar Gaafar commented Aug 30, 2019

WIP: implement parallel linting

@jsf-clabot
Copy link
Copy Markdown

jsf-clabot commented Aug 30, 2019

CLA assistant check
All committers have signed the CLA.

@eslint-deprecated eslint-deprecated Bot added the triage An ESLint team member will look at this issue soon label Aug 30, 2019
@ilyavolodin
Copy link
Copy Markdown
Member

Very nice PoC! Thanks for doing this! There is one big problem with this approach however. Functions in CLIEngine are part of the public API and we can't just change them without creating a major breaking change (which we always try to avoid, even with major releases). We would have to leave existing functions as is, and create another async version of them. We also wanted to put parallel linting behind a flag, because on some CI VPCs with a single processor, it might actually slow things down quite a bit.

@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Aug 31, 2019

@ilyavolodin Sure! I just wanted to see how if it would work. We need to find a way to introduce an async API without breaking the contract, and with minimal code duplication.

Let me know if you think of any other points to consider. Also I'm not quite sure how to test this functionality.

@ilyavolodin
Copy link
Copy Markdown
Member

Yes, preserving API is very important. We also want to put parallel linting behind CLI flag for now (it might change in the future, but initial implementation should leave single-threaded linting as a default for now. I would suggest -p/--parallel flag). One other thing to watch out for is order of reports in formatter. That needs to stay as it currently is (as in, file order in the output).
In terms of testing, I'm not sure how to test it properly. We need tests to verify that async linting is still completing correctly. We also need some tests to verify that there are multiple processes doing the linting.

@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Aug 31, 2019

Regarding the flag, I thought that maybe they can be 2 flags:

  1. -p: where the value is the number of parallel processes to run. If the flag is provided without a value, a default number will be used
  2. --parallel-files: the number of files that are linted concurrently, per process. This will be the concurrency limit passed to bluebird.map

@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Sep 1, 2019

one other thing that I don't know how to tackle is passing ConfigArray. The hack I came up with is to pass it to the child process as a serialized array, but then it loses the properties stored in internalSlotsMap, and then I needed to add the method deserialize to rebuild the plugin definitions. I don't know if there is a better way to do it, or whether it would break something else.

https://github.com/eslint/eslint/pull/12191/files#diff-660ea0590a55a93f96e9f6979144e554

@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Sep 2, 2019

@ilyavolodin I cleaned up the implementation a lot and found a way to implement parallel without breaking the current sync API. Check it now

@Gaafar Gaafar changed the title parallel PoC New: Allow parallel linting in child processes (fixes #3565) Sep 2, 2019
@mysticatea
Copy link
Copy Markdown
Member

Hi. Thank you for working on this.

I have a concern -- I don't think that serializing ConfigArray is a good idea. Config arrays have a ton of stuff we cannot serialize. It has:

  • Plugin implementations
  • Parser implementations
  • Error objects about loading errors; stack trace is important to debug.
  • Specific class instances such as Minimatch.
  • etc...

Instead, I think that to load configs in workers is better. The config factory has the cache, so it will be fast than serializing the config of every file individually. It's a common case that some hundreds of files use the same config.

@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Sep 2, 2019

Actually I'm not happy either with serializing the config but I don't know how to load the config in the workers. Can you suggest how to do that?

@mysticatea
Copy link
Copy Markdown
Member

Yes, use CascadingConfigArrayFactory.

@mysticatea
Copy link
Copy Markdown
Member

The instantiation example of the factory is here: lib/cli-engine/cli-engine.js#L552-L561. The constructor arguments are the options of CLIEngine. Then call factory.getConfigArrayForFile(filePath).

Ah.
I have noticed that we cannot support parallel linting if a user used cliEngine.addPlugin(). We cannot pass the plugin implementation to workers. This should be written in the documentation as a known limitation.

@mysticatea mysticatea added core Relates to ESLint's core APIs and features do not merge This pull request should not be merged yet feature This change adds a new feature to ESLint needs design Important details about this change need to be discussed and removed triage An ESLint team member will look at this issue soon labels Sep 2, 2019
@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Sep 2, 2019

Done. Much cleaner now. Thanks for the feedback! https://github.com/eslint/eslint/pull/12191/files#diff-c50f7c7af8213ec603fb51a7ee2b3cf8R907

Comment thread lib/cli-engine/cli-engine.js Outdated
@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Sep 2, 2019

Added an example for tests, the idea is to do each assertion once for sync API and once for async

@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Nov 13, 2019

@kaicataldo Can you review the testing approach in tests/lib/cli-engine/cli-engine.js before I add the remaining cases?

@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Nov 14, 2019

@ilyavolodin @mysticatea is anyone available to review the testing approach in tests/lib/cli-engine/cli-engine.js? I'm trying to get your opinion before I manually add another 100+ test cases

@mysticatea
Copy link
Copy Markdown
Member

Wow, I apologize that I had not noticed that you are continuing to working on this after the PoC. Please don't advance to implement the unapproved design. PoC is worthful to confirm feasibility, but the advance of work likely goes in a different direction.

Currently, we have been discussing the brand new API for asynchronous on RFC40. We will migrate our APIs to asynchronous entirely and existing CLIEngine will be deprecated. Because we have been meeting several big events:

  • The worker_threads module has arrived stable on Node.js 12.11.0.
  • ES modules have arrived stable on 13.2.0
  • Async Iteration syntax has arrived stable on 10.0.0, and 8.x will be EOL in November 2019.

Therefore, since we will have the first-class asynchronous APIs and thread supports, (and existing RFCs for parallel are inactive while a long time), so I have written the new design based on the first-class asynchronous APIs and threads: RFC42.

RFC42 is different from RFC4 that it implements parallel linting into the executeOnFiles() method directly with the worker_threads module and we will enable parallel linting by default in the future.

So we must implement parallel linting based on RFC42. And RFC42 depends on RFC40. And we are in the middle of discussing the design of RFC40 and RFC42. The design is still changeable.

Please participate in the design discussion and wait for the approval.

Thank you!

@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Nov 14, 2019

Thanks for the update. That's bit of a bummer honestly. I finally have some free time to pick this up and I was hoping to finish the PR by next week.

I had a quick look at RFC42 and it seems very similar to this PR. I can still make the necessary adjustments to this PR without needing to wait for RFC40. Does this make sense?

@mysticatea
Copy link
Copy Markdown
Member

RFC42 has many different points from RFC4 and RFC42 has its own implementation that I'm working to aim at 7.0.0 (see the RFC). If you are interested, I hope you to contribute to that implementation. But please mind the design is still changeable.

@Gaafar
Copy link
Copy Markdown
Author

Gaafar commented Dec 4, 2019

I'm traveling for most of this month. If you can ping me once the design is finalized, I'll try to reimplement this accordingly. Thanks

@nzakas
Copy link
Copy Markdown
Member

nzakas commented Feb 25, 2020

Closing as we have a new design that will make this PR obsolete. Thanks for all of the work, it really pushed the team to make some big changes to enable parallel linting.

@nzakas nzakas closed this Feb 25, 2020
@choheekim
Copy link
Copy Markdown

choheekim commented Feb 26, 2020

@nzakas It'd be nice if you can share a link of rfc or pr that is being worked on to enable parallel liniting

@nzakas
Copy link
Copy Markdown
Member

nzakas commented Feb 26, 2020

Please see the previous comments where the RFCs are discussed.

@eslint-deprecated eslint-deprecated Bot locked and limited conversation to collaborators Aug 25, 2020
@eslint-deprecated eslint-deprecated Bot added the archived due to age This issue has been archived; please open a new issue for any further discussion label Aug 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

archived due to age This issue has been archived; please open a new issue for any further discussion core Relates to ESLint's core APIs and features do not merge This pull request should not be merged yet feature This change adds a new feature to ESLint needs design Important details about this change need to be discussed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants