Increase default keepAliveTimeout to 65s (#59203)
Conversation
mcollina
left a comment
Agreed, this should have been raised a long time ago.
What are the http keep-alive timeouts in browsers, smartphones etc? Can you provide references?
I did a quick check and Chrome would default to 300s. @nodejs/tsc can you weigh in if you want to increase it up to 350s?
Fastify uses 72 seconds to avoid any issues with ALB/ELBs and other load balancers that expect 60s (having the same timeout is slightly problematic).
Increasing this exposes users to the possibility of DoS attacks via file descriptor exhaustion - just keeping the sockets open would prevent new connections from arriving.
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files:
@@ Coverage Diff @@
## main #59203 +/- ##
==========================================
+ Coverage 90.04% 90.06% +0.02%
==========================================
Files 648 648
Lines 191041 191041
Branches 37448 37455 +7
==========================================
+ Hits 172026 172065 +39
+ Misses 11651 11601 -50
- Partials 7364 7375 +11
|
Hey @mcollina, thanks for your feedback on the keep-alive timeout. I chose 60 seconds to align with defaults used by major cloud load balancers like AWS ELB/ALB, ensuring broad compatibility and preventing unexpected connection drops. Increasing the timeout to 72 seconds, as seen in Fastify and NGINX, could slightly boost connection reuse, but it also raises the risk of file descriptor exhaustion under DoS scenarios. Overall, 60 seconds provides a safer balance between interoperability and security for most environments. I'm open to discussing further if there's a strong case for a higher value. |
|
@mcollina , I've resolved this issue and removed the breaking change alert regarding http.Server.keepAliveTimeout. The documentation now reflects the correct status, in line with the release team's guidance. Please review my pull request when you have a chance. Thank you! |
|
I've seen 60 seconds cause problems due to network delays - I've actually seen this happen. Going over 60 seconds is necessary to account for that. |
|
Okay, sure—that makes sense. Is 72 seconds sufficient to account for those network delays, or do you recommend going even higher? Happy to adjust as needed based on your experience. |
|
I literally picked 72 with no reasoning. We could do 65. |
|
Thanks for clarifying. I’m fine with 65 seconds if that works better—happy to update the default to 65 to provide a bit more buffer over 60. Let me know if you have any concerns or if you think another value might be preferable. |
|
@pras529 there are 4 failing tests and linting issues to be fixed, could you address them? |
|
@mcollina Yes, I’ll address the failing tests and linting issues—just need a bit of time to work through them. |
pimterry
left a comment
I'm very happy to increase this as well, but it's worth noting that the description in the linked issue doesn't quite match the reality of what's going on. The example reproduction in #59193 specifically isn't correct - in fact the response there is delivered just fine AFAICT. The keepAliveTimeout we're talking about here is how long the server will wait between requests, to keep a connection alive, when there's no response pending. In cases like that repro, it's not relevant: the response will be delivered to the client, and then the connection will close 5 seconds later (11 seconds after the initial connection started, in that example).
I've done some quick research, there's a wide split in defaults elsewhere:
- Reverse proxies & standalone server tools generally all have long multi-minute keep-alive timeouts, e.g.:
- Nginx uses 75 seconds
- Caddy uses 300 seconds
- Traefik uses 180 seconds
- AWS ELBs use 60 seconds
- Google's Cloud uses 620 seconds
- Cloudflare uses 900 seconds
- Meanwhile most other programming languages and backends tend to use fairly short keep-alive timeouts:
- Uvicorn defaults to 5 seconds
- Gunicorn defaults to 2 seconds
- Puma defaults to 20 seconds
- Apache defaults to 5 seconds
- Jetty defaults to 30 seconds
I did a quick check and Chrome would default to 300s.
I don't think browsers' configuration matters as much as LBs & reverse proxies. Given a RST when reusing an idle connection, they'll retry transparently without exposing the error (I've just tested Chrome & Firefox and confirmed), and I've seen them apply heuristics and various connection-pooling games to manage keep-alive and idle sockets in the past - it's not a fixed value.
Of course, it's still nice for performance to keep the connection open, but in browser-facing performance-sensitive cases you'll probably want HTTP/2 anyway, which this PR doesn't touch.
The only case where this is going to cause an actual problem (e.g. those 502 errors) is when a) the client doesn't gracefully handle this (it's not a browser) and b) you hit the race condition where the server closes the socket, but the client sends the next request before it processes the FIN packet.
Unless the Node timeout is greater than the LB timeout (i.e. many minutes) then that race condition is always possible, but any increase will exponentially decrease the probability of this race happening (particularly on production sites, where I'd strongly expect most LB->backend connections are rarely idle for more than a few seconds anyway).
Anyway, that's a long way of saying: given the above I think boosting this to 30+ seconds is a good idea, happy to go with 65 seconds as a balanced option 👍 but I don't think we should worry about going beyond that - we're balancing protection against resource usage vs race condition resistance, and there's diminishing returns.
|
Thanks for referencing this and working on a fix! |
Force-pushed: e59f1f3 to 695d5d0
pimterry
left a comment
Code all looks good to me 👍
The first commit message is failing the linting - you'll need to use git to modify that. It's probably best to just squash all your commits and change the resulting commit message to describe the end result of the whole change here, because that message is what will be shown in the CHANGELOG when this is released.
http: Requested changes addressed and resolved
http: Increase default keepAliveTimeout to 65s
http: explicitly set keepAliveTimeout to 65s in tests
http: increase keepAliveTimeout default

Increase the default keepAliveTimeout to 65 seconds in the http module. Tests have been updated to explicitly set the new timeout value. All requested changes have been addressed and resolved.

PR-URL: nodejs#59203
Force-pushed: 695d5d0 to 33c1556
|
Hello @mcollina, Thank you for your review. I’ve addressed the changes you requested and updated the commit message to fully comply with our guidelines. I’m still a bit uncertain about the Linux and macOS test results—could you please take a look and let me know if anything else needs adjustment? Thanks for your help! |
|
I'm on the fence about whether to consider this a breaking change or not. If so, this change will probably go out in v25. @pimterry wdyt? |
|
@pras529 can you adjust the first commit message? (You can also squash all commits too) |
marco-ippolito
left a comment
LGTM but I think this is semver major
|
I'm content with 65 seconds. Appreciate this change greatly. |
RaisinTen
left a comment
LGTM but the failing test assertions need to be updated
|
What's the reason for it be |
If we think there's a real risk that this could break anything in existing code, or if people would just prefer to be cautious, I'm not opposed to making this major, but personally I'd be quite surprised it caused any issues in practice. For users it should really just be a bug fix & performance improvement. Do we have any specific likely scenario where this might break things? |
|
It's hard to say its real impact, but by experience, whenever we change a default option on HTTP, it tends to break people. It's also hard to have those actions covered by our test suite. I feel that we might want to wait for it to land on Node.js 25 and then backport after assessing its impact. I think CITGM wouldn't cover this scenario too. |
|
@RafaelGSS my assumption is that the risk in shipping on Node 24 before it goes LTS is close to zero, but very high afterwards. |
|
Hi folks! If you plan to increase |
|
A very good question @ex1st! From what I can see though, that issue was resolved by #32329, which changed how headerTimeout was measured entirely, so it shouldn't overlap with keepAliveTimeout as it used to. From my reading of the code there, nowadays the header timer doesn't start until a new request starts, and it's disabled when it finishes, so it's not measuring idle time at all. Does that make sense? |
|
Is this change still something we're looking to land on Node.js 25? It seemed to get a lot of approvals/support for changing the default at the time. |
|
It would be great to get this merged - @BethGriggs mentioned yesterday that it's still causing issues, and it would be a clear improvement for lots of use cases.

It looks like there's a pending issue on the commit message and some debate about whether it's semver-major or not. I'm happy to go for semver-major if there's uncertainty, just in case. Any objections to that? I'll update it now, but shout if anybody feels that's an issue. That'll delay releasing this significantly, unfortunately, but it's better than leaving it stuck.

@pras529 can you update the commit message & force push, so the checks pass? Since this was done the requirements have also changed, so you'll now also need to include a 'Signed-Off-By' footer - see the docs at https://github.com/nodejs/node/blob/main/doc/contributing/pull-requests.md#commit-message-guidelines for details & examples. En route, a rebase would be helpful since this is now a long way back from |
RafaelGSS
left a comment
I can land it on v26 as semver-major - if we land that in the next couple of days. If we don't see any major breakage, we could consider backporting it to other release lines
Summary
This PR increases the default value of http.Server.keepAliveTimeout from 5000ms (5s) to 65000ms (65s). This change aligns Node.js with the keep-alive timeout expectations of most browsers, proxies, and mobile clients, reducing the risk of subtle long-lived connection failures due to premature socket closure.
Details
The keepAliveTimeout default is now set to 65 seconds (65000ms) instead of the previous 5 seconds (5000ms). All relevant documentation and code comments have been updated to reflect the new default.
Motivation
The previous default of 5 seconds is shorter than the defaults used by common infrastructure, which can lead to unexpected connection terminations or failures. Setting the default to 65 seconds brings Node.js in line with industry standards and improves interoperability.
References
Please review carefully, as this changes default server behavior for all applications that do not explicitly set keepAliveTimeout.