
Fix some async hangs in the functional tests#5007

Merged
rchiodo merged 16 commits into master from rchiodo/async_hang_debugging
Mar 29, 2019


@rchiodo rchiodo commented Mar 28, 2019

For #4992

Two root causes:

  1. tree-kill spawns a process on Windows that it never cleans up.
  2. The Live Share guest shutdown was not using a promise-based method, so it would start shutting down and return immediately, meaning the test would exit before the shutdown finished.

With these fixed, the tests should run without hanging, so this removes '--exit' from the mocha options.
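
To illustrate the second root cause, here is a minimal sketch of the difference between a fire-and-forget shutdown and a promise-based one. `FakeGuestServer`, `disposeWithoutWaiting`, and `disposeAndWait` are hypothetical names made up for this example; they are not the extension's actual classes or methods.

```typescript
// Hypothetical stand-in for a guest-side server; not the extension's real class.
class FakeGuestServer {
    public shutdownFinished = false;

    // Promise-based shutdown: the promise resolves only after cleanup is done.
    public async shutdown(): Promise<void> {
        // Simulate asynchronous cleanup work with a short timer.
        await new Promise<void>(resolve => setTimeout(resolve, 10));
        this.shutdownFinished = true;
    }
}

// Fire-and-forget: returns before the shutdown has completed, so a caller
// (such as a test teardown) can move on while cleanup is still pending.
function disposeWithoutWaiting(server: FakeGuestServer): void {
    void server.shutdown();
}

// Awaited: returns only after the shutdown has completed.
async function disposeAndWait(server: FakeGuestServer): Promise<void> {
    await server.shutdown();
}
```

With the awaited form, a test's teardown can return the promise to mocha, which waits for it to settle before moving on.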

  • Pull request represents a single change (i.e. not fixing disparate/unrelated things in a single PR)
  • Title summarizes what is changing
  • Has a news entry file (remember to thank yourself!)
  • Has sufficient logging.
  • Has telemetry for enhancements.
  • Unit tests & system/integration tests are added/updated
  • Test plan is updated as appropriate
  • package-lock.json has been regenerated by running npm install (if dependencies have changed)
  • The wiki is updated with any design decisions/details.

@rchiodo rchiodo self-assigned this Mar 28, 2019
@rchiodo rchiodo requested a review from IanMatthewHuff March 28, 2019 22:59
service.onRequest(LiveShareCommands.getSysInfo, (_args: any[], cancellation: CancellationToken) => this.onGetSysInfoRequest(cancellation));
service.onRequest(LiveShareCommands.restart, (_args: any[], cancellation: CancellationToken) => this.onRestartRequest(cancellation));
service.onRequest(LiveShareCommands.interrupt, (args: any[], cancellation: CancellationToken) => this.onInterruptRequest(args.length > 0 ? args[0] as number : LiveShare.InterruptDefaultTimeout, cancellation));
service.onRequest(LiveShareCommands.disposeServer, (_args: any[], _cancellation: CancellationToken) => this.dispose());
Author

@rchiodo rchiodo Mar 28, 2019


onRequest

Yay, test actually found a legitimate bug. #Resolved

@rchiodo rchiodo requested a review from DonJayamanne March 28, 2019 23:02
out: output,
dispose: () => {
if (proc && !proc.killed) {
tk(proc.pid);
Author

@rchiodo rchiodo Mar 28, 2019


tk

This causes a hang on Windows because tk uses 'exec' instead of 'execSync' to kill a process. #Resolved

Comment thread package.json
"webpack-fix-default-import-plugin": "^1.0.3",
"webpack-merge": "^4.1.4",
"webpack-node-externals": "^1.7.2",
"why-is-node-running": "^2.0.3",
Member

@IanMatthewHuff IanMatthewHuff Mar 28, 2019


love the name. #Resolved

tk(pid);
if (process.platform === 'win32') {
// Windows doesn't support SIGTERM, so execute taskkill to kill the process
execSync(`taskkill /pid ${pid} /T /F`);
Member

@IanMatthewHuff IanMatthewHuff Mar 28, 2019


Is taskkill available on all Windows systems that we care about? I didn't see docs that specified that offhand, though it looks like it has been around a while. #Resolved

Author

@rchiodo rchiodo Mar 28, 2019


This is what the 'tree-kill' function was doing under the covers, so it would not have worked before either. #Resolved
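
The change discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the merged code: `buildTaskkillCommand` and `killProcess` are hypothetical helper names, and the non-Windows branch (a plain `process.kill`, which sends SIGTERM by default and only signals the one process) is an assumption, since that part of the diff is not shown here.

```typescript
import { execSync } from 'child_process';

// Hypothetical helper: builds the Windows command line for killing a process tree.
function buildTaskkillCommand(pid: number): string {
    // /T also terminates child processes, /F forces termination.
    return `taskkill /pid ${pid} /T /F`;
}

// Hypothetical helper: kills a process, blocking on Windows until taskkill exits.
function killProcess(pid: number, platform: string = process.platform): void {
    if (platform === 'win32') {
        // execSync waits for taskkill to finish, so no helper process is left
        // running after this call returns.
        execSync(buildTaskkillCommand(pid));
    } else {
        // Assumption: elsewhere, signal the process directly (SIGTERM by default).
        process.kill(pid);
    }
}
```

Note that `execSync` throws if the command exits with a non-zero code (for example, when the process has already exited), so a caller may want to wrap `killProcess` in a try/catch.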

}
if (this.session) {
if (this.session && !this.session.isDisposed) {
this.session.dispose();
Member

@IanMatthewHuff IanMatthewHuff Mar 28, 2019


Shouldn't the dispose functions already be checking the isDisposed flag? #Resolved

Author

@rchiodo rchiodo Mar 28, 2019


Not sure. Not my code, so just wanted to be sure. #Resolved

IanMatthewHuff previously approved these changes Mar 28, 2019
Member

@IanMatthewHuff IanMatthewHuff left a comment


:shipit:

await sleep(0);
// This function seems to cause CI builds to timeout randomly on
// different tests. Waiting for status to go idle doesn't seem to work and
// in the past, waiting on the ready promise doesn't work either. Check status with a maximum of 5 seconds
Member


maximum of 5 seconds

Looks like you changed the time.

Author


Yeah made it 10 seconds. Seems long enough? What do you think?



Member


10 is fine, just saying to update the comment.



Author


Okay will do if there's another change.



Member


LGTM, not sure if code flow reported it, but I flipped to reviewing then back to approved.


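
The approach the code comment in this thread describes, checking status repeatedly up to a maximum wait, might look something like the sketch below. `waitUntil` and its parameters are hypothetical names for this example, and the 10-second default only mirrors the value mentioned in the discussion; the actual implementation in the PR is not shown here.

```typescript
// Hypothetical helper: polls `isReady` until it returns true or `timeoutMs` elapses.
// Resolves to true if the condition was met, false if the wait timed out.
async function waitUntil(
    isReady: () => boolean,
    timeoutMs: number = 10000,
    intervalMs: number = 100
): Promise<boolean> {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        if (isReady()) {
            return true;
        }
        // Yield to the event loop between checks so the status can change.
        await new Promise<void>(resolve => setTimeout(resolve, intervalMs));
    }
    // One final check once the deadline has passed.
    return isReady();
}
```

Returning a boolean rather than throwing leaves it to the caller to decide whether a timeout should fail the operation or just be logged.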

}

// If we don't have a kernel spec yet, check using our current connection
if (!kernelSpec && connection.localLaunch) {
Author


localLaunch

Also fixed this bug with remote getting kernel specs when it doesn't need to.

@IanMatthewHuff IanMatthewHuff dismissed their stale review March 29, 2019 22:48

revoking review

Member

@IanMatthewHuff IanMatthewHuff left a comment


:shipit:

@rchiodo rchiodo merged commit ca08e17 into master Mar 29, 2019
@rchiodo rchiodo deleted the rchiodo/async_hang_debugging branch April 10, 2019 16:24
@lock lock Bot locked as resolved and limited conversation to collaborators Jul 30, 2019

2 participants