Conversation
|
/cc @piscisaureus |
Agreed that this needs to be fixed. But doesn't this cause libuv to busyloop for 15ms when a scenario like that occurs? |
|
Good point. This didn't happen in our testing but we are doing some further testing on a variety of hardware/OS configurations. We'll also refactor this to a loop, and assert that at most 2 calls to GetQueuedCompletionStatus are necessary, so that should this not hold true on some machine out there, we can catch it. |
|
A solution may also be to set a minimum timeout when You could make it auto-tuning by starting with a minimum timeout and increase it every time GetQueuedCompletionStatus returns too early. e.g. int fudge = 0;
retry:
result = GetQueuedCompletionStatus(.., due - now + fudge)
if (result == STATUS_TIMEOUT && uv_now() < next_timer_due) {
// GetQueuedCompletionStatus returned too early.
fudge = fudge ? fudge << 1 : 1;
goto retry;
}What surprises me though is that this issue pops up now. When I created the original uv_poll implementation (granted, that was years ago, using windows 7 and XP) I tested quite extensively for early timers, and it never happened. Is this a regresion in windows? Any idea in at which version it this behavior was introduced? |
|
This patch needs further refinement before it can be landed. @orangemocha just confirmed on the node.js TSC call that it is still on his todo list. |
|
Yep, sorry for dropping the ball on this one. Here's a status update. We implemented the retry logic in a loop, including your suggestion for progressively increasing the timeout to ensure that we don't end up busy waiting. We tested it on a variety of hardware/OS combinations. Busy waiting doesn't seem to be an issue. However in rare cases it takes up to 3 calls to GetQueuedCompletionStatus to reach the target timeout. The last one is at most 1ms away from the target, so the theory is that this is due to rounding errors when converting hrtime results to integers. I wanted to see if we could fix that. The other issue is that target timeouts are computed relative to the loop time when the timer is started. In the event that the loop time was stale when the timer is started, and given that GetQueuedCompletionStatus can return early, there is no way to avoid occasional failures in test-timers-first-fire, unless we update the loop time when starting the timer. I wanted to consider making that change and assess the impact on performance and correctness. |
1f677bd to
57c9876
Compare
|
@piscisaureus Updated. This runs GetQueuedCompletionStatus in a loop until the target time is reached. We observed it running at most 2 or at most 3 times depending on the machine used, but never more than that. Since the documentation does not guarantee that GQCS does not return immediately, the target time is increased exponetially after the second round, but we never observed this in practice. The target time is still calculated based on the loop time and not the real time, to avoid the overhead of updating the time. Because of this, the timers first fire test that was in node would still fail, as it did in other platforms, but much more rarely. The test was deleted in nodejs/node#2458 . |
|
LGTM. It's been a while, but this is the fix we had agreed with @piscisaureus at the collaborator summit in August. Looks solid to me. |
|
Can you rebase it so I can run the CI on last time? |
uv_poll should wait for at least the full timeout duration when there is nothing else to do. This was not happening because GetQueuedCompletionStatus can occasionally return up to 15ms early. The added test reproduces https://github.com/joyent/node/blob/d13d7f74d794340ac5e126cfb4ce507fe0f803d5/test/simple/test-timers-first-fire.js on libuv, being flaky before this fix. Fix: nodejs/node-v0.x-archive#8960
57c9876 to
be8b3f5
Compare
|
Rebased. Thank you! |
be8b3f5 to
69941c8
Compare
There was a problem hiding this comment.
nit: I think this is harder to read than necessary. What about:
if (timeout > 1)
timeout += 1 << (repeat - 1)I also wonder why we start increasing the timeout at the 3rd round, that seems rather arbitrary, although I'm fine with it and I can't think of a good justification for any value.
There was a problem hiding this comment.
+1 on the readability issue, though it's landed now :/
It starts increasing on the third round, because on the second round it usually gets it right. The increase is just a safety net to avoid busy polling. Thank you!
|
LGTM - I had one minimal nit, but my feelings won't be hurt if you totally ignore it. |
uv_poll should wait for at least the full timeout duration when there is nothing else to do. This was not happening because GetQueuedCompletionStatus can occasionally return up to 15ms early. The added test reproduces https://github.com/joyent/node/blob/d13d7f74d794340ac5e126cfb4ce507fe0f803d5/test/simple/test-timers-first-fire.js on libuv, being flaky before this fix. Fixes: nodejs/node-v0.x-archive#8960 PR-URL: #385 Reviewed-By: Alexis Campailla <alexis@janeasystems.com> Reviewed-by: Bert Belder <bertbelder@gmail.com>
|
Landed in 427e4c9, thank you all! @piscisaureus I was in the middle of committing the patch, so I skipped your nit but added you to the reviewers list :-) |
Haha alright, thanks for landing it (and @joaocgreis, @orangemocha of course for fixing the bug) |
uv_poll should wait for at least the full timeout duration when there
is nothing else to do. This was not happening because
GetQueuedCompletionStatus can occasionally return early up to 15ms.
This is similar to what happens on unix, where the poll function loops
until the desired timeout is reached.
The added test reproduces https://github.com/joyent/node/blob/d13d7f74d794340ac5e126cfb4ce507fe0f803d5/test/simple/test-timers-first-fire.js
on libuv, being flaky before this fix.
Fix: nodejs/node-v0.x-archive#8960