Skip to content

win: wait for full timeout duration#385

Closed
orangemocha wants to merge 1 commit intolibuv:v1.xfrom
JaneaSystems:joaocgreis-early-timer
Closed

win: wait for full timeout duration#385
orangemocha wants to merge 1 commit intolibuv:v1.xfrom
JaneaSystems:joaocgreis-early-timer

Conversation

@orangemocha
Copy link
Copy Markdown
Contributor

uv_poll should wait for at least the full timeout duration when there
is nothing else to do. This was not happening because
GetQueuedCompletionStatus can occasionally return early up to 15ms.
This is similar to what happens on unix, where the poll function loops
until the desired timeout is reached.

The added test reproduces https://github.com/joyent/node/blob/d13d7f74d794340ac5e126cfb4ce507fe0f803d5/test/simple/test-timers-first-fire.js
on libuv, being flaky before this fix.

Fix: nodejs/node-v0.x-archive#8960

@saghul saghul added the windows label Jun 3, 2015
@saghul
Copy link
Copy Markdown
Member

saghul commented Jun 3, 2015

/cc @piscisaureus

@piscisaureus
Copy link
Copy Markdown
Contributor

uv_poll should wait for at least the full timeout duration when there
is nothing else to do

Agreed that this needs to be fixed. But doesn't this cause libuv to busyloop for 15ms when a scenario like that occurs?

@orangemocha
Copy link
Copy Markdown
Contributor Author

Good point. This didn't happen in our testing but we are doing some further testing on a variety of hardware/OS configurations. We'll also refactor this to a loop, and assert that at most 2 calls to GetQueuedCompletionStatus are necessary, so that should this not hold true on some machine out there, we can catch it.

@piscisaureus
Copy link
Copy Markdown
Contributor

A solution may also be to set a minimum timeout when GetQueuedCompletionStatus is re-run.

You could make it auto-tuning by starting with a minimum timeout and increase it every time GetQueuedCompletionStatus returns too early. e.g.

  int fudge = 0;
retry:
  result = GetQueuedCompletionStatus(.., due - now + fudge)

  if (result == STATUS_TIMEOUT && uv_now() < next_timer_due) {
    // GetQueuedCompletionStatus returned too early.
    fudge = fudge ? fudge << 1 : 1;
    goto retry;    
  }

What surprises me though is that this issue pops up now. When I created the original uv_poll implementation (granted, that was years ago, using windows 7 and XP) I tested quite extensively for early timers, and it never happened. Is this a regresion in windows? Any idea in at which version it this behavior was introduced?

@saghul saghul added this to the next milestone Jun 15, 2015
@piscisaureus
Copy link
Copy Markdown
Contributor

This patch needs further refinement before it can be landed. @orangemocha just confirmed on the node.js TSC call that it is still on his todo list.

@saghul saghul removed this from the 1.7.0 milestone Jul 8, 2015
@orangemocha
Copy link
Copy Markdown
Contributor Author

Yep, sorry for dropping the ball on this one. Here's a status update.

We implemented the retry logic in a loop, including your suggestion for progressively increasing the timeout to ensure that we don't end up busy waiting. We tested it on a variety of hardware/OS combinations. Busy waiting doesn't seem to be an issue. However in rare cases it takes up to 3 calls to GetQueuedCompletionStatus to reach the target timeout. The last one is at most 1ms away from the target, so the theory is that this is due to rounding errors when converting hrtime results to integers. I wanted to see if we could fix that.

The other issue is that target timeouts are computed relative to the loop time when the timer is started. In the event that the loop time was stale when the timer is started, and given that GetQueuedCompletionStatus can return early, there is no way to avoid occasional failures in test-timers-first-fire, unless we update the loop time when starting the timer. I wanted to consider making that change and assess the impact on performance and correctness.

@joaocgreis
Copy link
Copy Markdown
Contributor

@piscisaureus Updated.

This runs GetQueuedCompletionStatus in a loop until the target time is reached. We observed it running at most 2 or at most 3 times depending on the machine used, but never more than that. Since the documentation does not guarantee that GQCS does not return immediately, the target time is increased exponetially after the second round, but we never observed this in practice.

The target time is still calculated based on the loop time and not the real time, to avoid the overhead of updating the time. Because of this, the timers first fire test that was in node would still fail, as it did in other platforms, but much more rarely. The test was deleted in nodejs/node#2458 .

@orangemocha
Copy link
Copy Markdown
Contributor Author

LGTM. It's been a while, but this is the fix we had agreed with @piscisaureus at the collaborator summit in August. Looks solid to me.

@saghul
Copy link
Copy Markdown
Member

saghul commented Dec 15, 2015

Can you rebase it so I can run the CI on last time?

uv_poll should wait for at least the full timeout duration when there
is nothing else to do. This was not happening because
GetQueuedCompletionStatus can occasionally return up to 15ms early.

The added test reproduces https://github.com/joyent/node/blob/d13d7f74d794340ac5e126cfb4ce507fe0f803d5/test/simple/test-timers-first-fire.js
on libuv, being flaky before this fix.

Fix: nodejs/node-v0.x-archive#8960
@orangemocha orangemocha force-pushed the joaocgreis-early-timer branch from 57c9876 to be8b3f5 Compare December 16, 2015 10:35
@orangemocha
Copy link
Copy Markdown
Contributor Author

Rebased. Thank you!

@saghul
Copy link
Copy Markdown
Member

saghul commented Dec 16, 2015

@joaocgreis joaocgreis force-pushed the joaocgreis-early-timer branch from be8b3f5 to 69941c8 Compare December 16, 2015 11:00
Comment thread src/win/core.c
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this is harder to read than necessary. What about:

if (timeout > 1)
  timeout += 1 << (repeat - 1)

I also wonder why we start increasing the timeout at the 3rd round, that seems rather arbitrary, although I'm fine with it and I can't think of a good justification for any value.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on the readability issue, though it's landed now :/

It starts increasing on the third round, because on the second round it usually gets it right. The increase is just a safety net to avoid busy polling. Thank you!

@piscisaureus
Copy link
Copy Markdown
Contributor

LGTM - I had one minimal nit, but my feelings won't be hurt if you totally ignore it.

saghul pushed a commit that referenced this pull request Dec 16, 2015
uv_poll should wait for at least the full timeout duration when there
is nothing else to do. This was not happening because
GetQueuedCompletionStatus can occasionally return up to 15ms early.

The added test reproduces https://github.com/joyent/node/blob/d13d7f74d794340ac5e126cfb4ce507fe0f803d5/test/simple/test-timers-first-fire.js
on libuv, being flaky before this fix.

Fixes: nodejs/node-v0.x-archive#8960
PR-URL: #385
Reviewed-By: Alexis Campailla <alexis@janeasystems.com>
Reviewed-by: Bert Belder <bertbelder@gmail.com>
@saghul
Copy link
Copy Markdown
Member

saghul commented Dec 16, 2015

Landed in 427e4c9, thank you all!

@piscisaureus I was in the middle of committing the patch, so I skipped your nit but added you to the reviewers list :-)

@saghul saghul closed this Dec 16, 2015
@piscisaureus
Copy link
Copy Markdown
Contributor

@piscisaureus I was in the middle of committing the patch, so I skipped your nit but added you to the reviewers list :-)

Haha alright, thanks for landing it (and @joaocgreis, @orangemocha of course for fixing the bug)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Timers sometimes fire early on Windows

4 participants