Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
doc: initial cut at support tiers for diag tools
  • Loading branch information
mhdawson committed Aug 27, 2018
commit 19ee126668d5e24d500e45af425251a402689596
115 changes: 115 additions & 0 deletions doc/guides/diagnostic-tooling-support-tiers.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
# Diagnostic Tooling Support Tiers

Diagnostic tooling is important to the consumers of Node.js. It is used both
in development and in production in order to investigate problems. The failure
of one of these tools may be as big a problem for an end user as a bug within
the runtime itself.

The Node.js project has assessed the tools and the APIs which support those
tools. Each of the tools and APIs has been put them into one of
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is them unnecessary here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

the following tiers.

* Tier 1 - Mmust always be working for all
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/Mmust/Must

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Current and LTS Node.js releases. A release will not be shipped if the test
suite for the tool/API is not green. To be considered for inclusion
in this tier it must have a good test suite and that test suite and a job
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we include entries similar to those required by citgm:

The maintainers of the module remain responsive when there are problems
The module must be actively used by the community
The module must be heavily depended on

Given that tier-one tools block our commits and releases.

must exist in the Node.js CI so that it can be run as part of the release
process. No commit to master should break this tool/API if the next
Copy link
Copy Markdown
Member

@joyeecheung joyeecheung Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should be more specific about what No commit to master should break this tool/API means? We probably will not run the tests of these tools for every PR to master, so this will be hard to enforce as-is. We may either:

  1. Run the tests of these tools with master every day/week and revert any recent commit that breaks the tests
  2. Or, acknowledge that we may have commits on master that break the tools without being noticed (because we may not run the tooling tests for those PRs), and limit this to no commit to the current and LTS release branches should break..., which means we will run the tooling tests in the release process and exclude commits that break the tests (optionally, run the tests when someone suspects a PR may break the tools, i.g. similar to how CITGM works). This would be more practical.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with 2, given you running for every commit is problematic. In addition to the release process we should still aim to run them nightly if possible so that we get earlier warning if possible.

release is within 1 month.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to defined what does it mean to "work". In some cases it's a true-false decision (llnode), but in others it is a matter of accuracy (Linux perf in 8.x for example). I assume you mean the latter, but I think it should be written down.

cc @yunong @mmarchini

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we'll require a test suite on Node.js CI for Tier 1 tools, maybe we could define "work" as "successfully running a CI for the tool's test suite"?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, updated to try to clarify that.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove the condition. Current releases usually happen at least once a month.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better: keep the condition but change it to "if the next major release is within 1 month"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


* Tier 2 - Must be working for all
LTS releases. An LTS release will not be shipped if the test
suite for the tool/API is not green. To be considered for inclusion
in this tier it must have a good test suite and that test suite and a job
must exist in the Node.js CI so that it can be run as part of the release
process.

* Tier 3 - If possible its test suite
will be run at least nightly in the Node.js CI and issues opened for
failures. Does not block shipping a release.

* Tier 4 - Does not block shipping a release.

* Unclassified - tool/API is new or does not have the required testing in the
Node.js CI in order to qualify for a higher tier.

The choice of which tier a particular tool will be assigned to, will be a
collaborative decision between Diagnostics WG and Release WG. Some of the
criteria considered might be:

* If the tool fits into a key cateogry as listed below.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cateogry -> category

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

* Whether the tool is actively used by the community.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you define "the community"? Is this...

  • Open source developers?
  • The Node.js core collaborators?
  • Developers deploying to FaaS infrastructure?
  • Massive enterprise deployments?

We tend to use "the community" as a catch-all where I think we may be more well served by saying "the ecosystem" – but even then, we need to be explicit when we're codifying expectations around what that means.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to the ecosystem. I don't think its any one of the above at the exclusion of the others. It is that its actively used by enough of the overall ecosystem which includes all of the categories you listed.

* The Availability of alternatives.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Availability -> availability?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* Impact to the overall ecosystem if the tool is not working.
* The availability of reliable test suite that can be integrated into our CI.
* The availability of maintainer or community collaborator who will help
resolve issues when there are CI failures.
* If the tool is maintained the Node.js foundation GitHub organization.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maintained -> maintained by?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: foundation -> Foundation

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


The current categories of tools/APIs that fall under these Tiers are:

* FDDC (F) - First failure data capture, Easy to consume initial diagnostic
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FDDC -> FFDC?
Easy -> easy?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

information.
* Tracing (T) - use of logging to provide information about execution flow.
* Memory (M) - tools that provide additional information about memory
used in the Heap or by native code.
* Profiling (P) - tools that provide additional information about where
cpu cycles are being spent.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpu -> CPU?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

* AsyncFlow (A) - tools that provide additional insight into asynchronous
execution flow.


The following tools are currently assigned to Tiers as follows
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Colon in the end?

Copy link
Copy Markdown
Contributor

@Alhadis Alhadis Aug 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tautology: the word "follow" is used here twice:

The following tools are ... as follows

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed


## Tier 1

| Tool Type | Tool Name | Regular Testing In Node.js CI | Integrated with Node.js | Target Tier |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ->in? (here and below)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

|-----------|---------------------------|-------------------------------|-------------------------|-------------|
| | | | | |

## Tier 2

| Tool Type | Tool Name | Regular Testing In Node.js CI | Integrated with Node.js | Target Tier |
|-----------|---------------------------|-------------------------------|-------------------------|-------------|
| | | | | |


## Tier 3

| Tool Type | Tool Name | Regular Testing In Node.js CI | Integrated with Node.js | Target Tier |
|-----------|---------------------------|-------------------------------|-------------------------|-------------|
| | | | | |

## Tier 4

| Tool Type | Tool Name | Regular Testing In Node.js CI | Integrated with Node.js | Target Tier |
|-----------|---------------------------|-------------------------------|-------------------------|-------------|
| | | | | |

## Not yet classified

| Tool Type | Tool/API Name | Regular Testing In Node.js CI | Integrated with Node.js | Target Tier |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the second column header be the same in all tables (Tool Name vs Tool/API Name)?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fixed

|-----------|---------------------------|-------------------------------|-------------------------|-------------|
| FFDC | node-report | No | No | 1 |
| Memory | llnode | ? | No | 2 |
| Memory | mdb_V8 | No | No | 4 |
| Memory | node-heapdump | No | No | 2 |
| Memory | V8 heap profiler | No | Yes | 1 |
| Memory | V8 sampling heap profiler | No | Yes | 1 |
| AsyncFlow | Async Hooks(API) | ? | Yes | 1 |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space before (API) here and below?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

| Debugger | V8 Debug protocol(API) | No | Yes | 1 |
| Debugger | Command line Debug Client | ? | Yes | 1 |
| Debugger | Chrome Dev tools | ? | No | 3 |
| Debugger | Chakracore - time-travel | No | data source only | to early |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data -> Data (for consistency)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but was to early intended to be Too early?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes and yes

| Tracing | trace_events(API) | No | Yes | 1 |
| Tracing | DTrace | No | Partial | 3 |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • DTrace is also used as a profiling tool and is widely used to generate FlameGraphs
  • Systemtap should also be included since its situation is similar to DTrace (for tracing): we have a file specifying Systemtap tracepoints (https://github.com/nodejs/node/blob/master/src/node.stp). I would add it as Tracing | Systemtap | No | Partial | ?

| Tracing | LTTng | No | Removed ? | N/A |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unneeded space in Removed ??

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

| Tracing | ETW | No | Partial | 3 |
| Profiling | V8 cpu profiler | No | Yes | 1 |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpu -> CPU?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we include --prof as well? Is that included in V8 CPU profiler?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is the V8 cpu profiler ?

| Profiling | Linux perf | No | Partial | ? |
| Profiling | Windows xperf | No | ? | ? |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xperf -> Xperf?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

| Profiling | Ox | No | No | to early |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would classify 0x being "Tier 4" for now.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok will make that the target, but I'd like to handle tool promotion from unclassified to specified tiers with separate PRs.

| Profiling | node-clinic | No | No | to early |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to -> too

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

| F/P/T | appmetrics | No | No | ? |
| M/T | eBPF tracing tool | No | No | ? |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we convert this list to links to the various tools?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I'd prefer to do that in a separate PR. I will commit to doing that once this PR lands.