Revert "Revert "unix,fs: fix for potential partial reads/writes" (and fix it)#1742
Revert "Revert "unix,fs: fix for potential partial reads/writes" (and fix it)#1742vtjnash wants to merge 5 commits intolibuv:v1.xfrom
Conversation
|
bump. Can I get a CI run here? |
|
New CI since the last one was cancelled - https://ci.nodejs.org/view/libuv/job/libuv-test-commit/770/ |
|
Sorry, I forgot to push my attempted fix for that test (for machines where iovmax > "alot of bufs") after the last CI run. Hopefully this will test correctly on Windows now. |
|
Looks like the zOS |
|
Yes, but I don't have z/OS to determine why |
/cc @libuv/zos |
|
Will look into this in a bit |
|
Use this line from #include <limits.h> /* INT_MAX, PATH_MAX, IOV_MAX */ |
|
bump. |
|
Request reviewer's attention to this PR: a node.js unit test case revealed silent partial write issue. Here is a minimal reproduce and some relevant info: #cat 16601.js const fs = require('fs');
const ksm = process.binding('buffer').kStringMaxLength;
const stream = fs.createWriteStream('./big.txt')
const a = Buffer.alloc(ksm, 'a');
stream.write(a, () => {
stream.end();
})
stream.on('finish', () => {
const len = fs.readFileSync('./big.txt').length
console.log(`attempt: ${ksm}, real: ${len}`)
})#ulimit -f 1024 strace showed that only one attempt was made to write, the failure was neither noticed nor propagated. [pid 9464] write(11, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 1073741799) = 1048576
[pid 9464] write(8, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 9464] futex(0x21b5704, FUTEX_WAIT_PRIVATE, 8, NULL <unfinished ...>
[pid 9457] <... epoll_wait resumed> {{EPOLLIN, {u32=8, u64=8}}}, 1024, 7999) = 1
[pid 9457] read(8, "\1\0\0\0\0\0\0\0", 1024) = 8
[pid 9457] futex(0x21b5704, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x21b5700, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
[pid 9465] <... futex resumed> ) = 0
[pid 9465] futex(0x21b56c0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 9465] close(11 <unfinished ...> |
|
libuv in Node CI: https://ci.nodejs.org/view/libuv/job/libuv-in-node/19/ @gireeshpunathil have you verified that this PR fixes the issue? |
|
|
||
| #ifdef _WIN32 | ||
| int uv_test_getiovmax(void) { | ||
| return INT32_MAX; /* emulated by libuv, so no real limit */ |
| @@ -316,6 +317,11 @@ static ssize_t uv__fs_read(uv_fs_t* req) { | |||
| return -1; | |||
There was a problem hiding this comment.
I think we should make these paths goto done as well so req->bufs is cleaned-up too?
|
|
||
| TEST_IMPL(fs_write_alotof_bufs) { | ||
| const size_t iovcount = 54321; | ||
| size_t iovcount = 54321; |
There was a problem hiding this comment.
Nit: separate declaration from assignment
| static void thread_main(void* arg) { | ||
| const struct thread_ctx *ctx; | ||
| int size; | ||
| char *data; |
There was a problem hiding this comment.
style: star aligned to the left: thread_ctx* and char*
|
yes, I have verified, it works - i.e, throws error (which is expected when writing too big data under too low ulimit). |
test-fs-readfile-tostring-fail is failing frequently on OSX machines. There's a PR to fix this issue in libuv, but while the fix don't land on Node.js this test should be marked as flaky. Ref: nodejs#16601 Ref: libuv/libuv#1742
test-fs-readfile-tostring-fail is failing frequently on OSX machines. There's a PR to fix this issue in libuv, but while the fix don't land on Node.js this test should be marked as flaky. Ref: #16601 Ref: libuv/libuv#1742 PR-URL: #21013 Refs: #16601 Refs: libuv/libuv#1742 Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Jon Moss <me@jonathanmoss.me> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
test-fs-readfile-tostring-fail is failing frequently on OSX machines. There's a PR to fix this issue in libuv, but while the fix don't land on Node.js this test should be marked as flaky. Ref: #16601 Ref: libuv/libuv#1742 PR-URL: #21013 Refs: #16601 Refs: libuv/libuv#1742 Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de> Reviewed-By: Jon Moss <me@jonathanmoss.me> Reviewed-By: Luigi Pinca <luigipinca@gmail.com> Reviewed-By: Anatoli Papirovski <apapirovski@mac.com> Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com> Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
|
@santigimeno If it's the difference between approving this PR and not, would you be willing to fix up the nits yourself, push directly to the branch, and land this? It would help us close a bug in Node.js. |
test-fs-readfile-tostring-fail is unreliable until a libuv fix lands. It had previously been marked flaky on macOS, but it has been observed to also fail on AIX. Mark it flaky everywhere. Refs: libuv/libuv#1742
test-fs-readfile-tostring-fail is unreliable until a libuv fix lands. It had previously been marked flaky on macOS, but it has been observed to also fail on AIX. Mark it flaky everywhere. Refs: libuv/libuv#1742 PR-URL: nodejs#21177 Reviewed-By: Matheus Marchini <matheus@sthima.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Jon Moss <me@jonathanmoss.me>
test-fs-readfile-tostring-fail is unreliable until a libuv fix lands. It had previously been marked flaky on macOS, but it has been observed to also fail on AIX. Mark it flaky everywhere. Refs: libuv/libuv#1742 PR-URL: #21177 Reviewed-By: Matheus Marchini <matheus@sthima.com> Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com> Reviewed-By: Jon Moss <me@jonathanmoss.me>
The PR LGTM and I can fix up the nits, but would like some other @libuv/collaborators to look at it first. |
|
bump. |
|
Bump again. |
|
bump bump bump |
This reverts commit b0f3310. (but not the test deletion)
For simplicity and predictability (since the user must handle the retry anyways), always emit exactly one readv/pread/preadv syscall and return that result to the user. By contrast, write needs to preserve order, so it needs to keep retrying the operation until it finishes before retiring the req from the queue. Ref: libuv#640 Ref: libuv#1720
96df300 to
cabf522
Compare
cabf522 to
26834a8
Compare
|
bump x4. I've rebased and updated it to address all comments. |
|
Desperately trying to get some reviews here.../ping @bnoordhuis @saghul @ronkorving @santigimeno @tniessen @cjihrig |
| } while (index < req->nbufs && result > 0); | ||
| if (nread > 0) | ||
| result = nread; | ||
| result = pread(req->file, req->bufs[0].base, req->bufs[0].len, req->off); |
There was a problem hiding this comment.
Shouldn't we go through each of the buffers here?
There was a problem hiding this comment.
If the caller wants all of the data, they are required to implement that loop, so I opted here for the simpler solution of having this function do exactly one syscall on all code-paths, and returning that one result. I reference this behavior in the PR commit message:
For simplicity and predictability (since the user must handle the retry
anyways), always emit exactly one readv/pread/preadv syscall and return
that result to the user.
The alternative would be that we be that we loop here until we encounter an error, then discard the error and return the result up to that point. That's valid behavior (it's what the kernel internally is expected to do), but it also seemed to me like more complexity than necessary. It's also not clear to me whether it is more correct to fill each buffer only as much as one read call returns (but to fill as many buffers as possible—as it does not), or to fill each buffer fully (as it would do in all other cases in this code). Handling that here seems like it would add yet further complexity.
There was a problem hiding this comment.
It makes sense to me. Thanks!
|
CI looks good. The failures are known. I'm landing |
This reverts commit b0f3310. (but not the test deletion) Fixes: nodejs/node#16601 PR-URL: #1742 Refs: #640 Refs: #1720 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
For simplicity and predictability (since the user must handle the retry anyways), always emit exactly one readv/pread/preadv syscall and return that result to the user. By contrast, write needs to preserve order, so it needs to keep retrying the operation until it finishes before retiring the req from the queue. Fixes: nodejs/node#16601 PR-URL: #1742 Refs: #640 Refs: #1720 Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
|
Thanks! I'll need to review my patch list now and see what this resolves :) Hopefully this attempt doesn't trigger more nodejs issues—ping me if it does. |
|
This might have introduced a minor bug in AIX, please check #2023 for a possible fix. |
This is a replacement for #640, which needed to be reverted (#640 (comment)) due to incorrect handling of partial-reads. It's very tempting to think that read and write should be symmetric with respect to signals and error handling, but they aren't (and fortunately the nodejs test suite noticed that before getting too far).
The key distinction is that writes get placed in a queue, while reads do not. And, as a corollary to that, partial reads can be useful for interactive usage, while omitting the middle of a write was simply a bug.
Note: I tried to approximately split my commits into a revert of the revert and then my fixes, but I wanted to keep test-eintr-handling.c (which was deleted in the original PR), so it's not a complete revert. These should probably be squashed in the final commit. It would also be nice to recognize Ben Wijen's contributions and authorship (14bfc27) in the merge commit.