Skip to content

Revert "Revert "unix,fs: fix for potential partial reads/writes" (and fix it)#1742

Closed
vtjnash wants to merge 5 commits intolibuv:v1.xfrom
vtjnash:jn/bwijen-partialoffset-take2
Closed

Revert "Revert "unix,fs: fix for potential partial reads/writes" (and fix it)#1742
vtjnash wants to merge 5 commits intolibuv:v1.xfrom
vtjnash:jn/bwijen-partialoffset-take2

Conversation

@vtjnash
Copy link
Copy Markdown
Member

@vtjnash vtjnash commented Feb 13, 2018

This is a replacement for #640, which needed to be reverted (#640 (comment)) due to incorrect handling of partial-reads. It's very tempting to think that read and write should be symmetric with respect to signals and error handling, but they aren't (and fortunately the nodejs test suite noticed that before getting too far).

The key distinction is that writes get placed in a queue, while reads do not. And, as a corollary to that, partial reads can be useful for interactive usage, while omitting the middle of a write was simply a bug.

Note: I tried to approximately split my commits into a revert of the revert and then my fixes, but I wanted to keep test-eintr-handling.c (which was deleted in the original PR), so it's not a complete revert. These should probably be squashed in the final commit. It would also be nice to recognize Ben Wijen's contributions and authorship (14bfc27) in the merge commit.

@vtjnash
Copy link
Copy Markdown
Member Author

vtjnash commented Mar 27, 2018

bump. Can I get a CI run here?

@cjihrig
Copy link
Copy Markdown
Contributor

cjihrig commented Mar 27, 2018

@cjihrig
Copy link
Copy Markdown
Contributor

cjihrig commented Mar 30, 2018

New CI since the last one was cancelled - https://ci.nodejs.org/view/libuv/job/libuv-test-commit/770/

@vtjnash
Copy link
Copy Markdown
Member Author

vtjnash commented Mar 30, 2018

Sorry, I forgot to push my attempted fix for that test (for machines where iovmax > "alot of bufs") after the last CI run. Hopefully this will test correctly on Windows now.

@cjihrig
Copy link
Copy Markdown
Contributor

cjihrig commented Mar 30, 2018

@cjihrig
Copy link
Copy Markdown
Contributor

cjihrig commented Apr 2, 2018

Looks like the zOS fs_write_alotof_bufs failure could be legit.

@vtjnash
Copy link
Copy Markdown
Member Author

vtjnash commented Apr 2, 2018

Yes, but I don't have z/OS to determine why readv is not returning the expected amount of data.

@santigimeno
Copy link
Copy Markdown
Member

Yes, but I don't have z/OS to determine why readv is not returning the expected amount of data.

/cc @libuv/zos

@jBarz
Copy link
Copy Markdown
Contributor

jBarz commented Apr 2, 2018

Will look into this in a bit

@jBarz
Copy link
Copy Markdown
Contributor

jBarz commented Apr 3, 2018

Use this line from src/unix/core.c in test/test-fs.c

#include <limits.h> /* INT_MAX, PATH_MAX, IOV_MAX */

@vtjnash
Copy link
Copy Markdown
Member Author

vtjnash commented May 5, 2018

bump.

@gireeshpunathil
Copy link
Copy Markdown
Contributor

Request reviewer's attention to this PR: a node.js unit test case revealed silent partial write issue. Here is a minimal reproduce and some relevant info:

#cat 16601.js

const fs = require('fs');
const ksm = process.binding('buffer').kStringMaxLength;
const stream = fs.createWriteStream('./big.txt')
const a = Buffer.alloc(ksm, 'a');

stream.write(a, () => {
  stream.end();
})
stream.on('finish', () => {
  const len = fs.readFileSync('./big.txt').length
  console.log(`attempt: ${ksm}, real: ${len}`)
})

#ulimit -f 1024
#node 16601.js
attempt: 1073741799, real: 1048576

strace showed that only one attempt was made to write, the failure was neither noticed nor propagated.

[pid  9464] write(11, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"..., 1073741799) = 1048576
[pid  9464] write(8, "\1\0\0\0\0\0\0\0", 8) = 8
[pid  9464] futex(0x21b5704, FUTEX_WAIT_PRIVATE, 8, NULL <unfinished ...>
[pid  9457] <... epoll_wait resumed> {{EPOLLIN, {u32=8, u64=8}}}, 1024, 7999) = 1
[pid  9457] read(8, "\1\0\0\0\0\0\0\0", 1024) = 8
[pid  9457] futex(0x21b5704, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x21b5700, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
[pid  9465] <... futex resumed> )       = 0
[pid  9465] futex(0x21b56c0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  9465] close(11 <unfinished ...>

@cjihrig
Copy link
Copy Markdown
Contributor

cjihrig commented May 23, 2018

libuv in Node CI: https://ci.nodejs.org/view/libuv/job/libuv-in-node/19/

@gireeshpunathil have you verified that this PR fixes the issue?

Comment thread test/test-fs.c Outdated

#ifdef _WIN32
int uv_test_getiovmax(void) {
return INT32_MAX; /* emulated by libuv, so no real limit */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation is a bit off

Comment thread src/unix/fs.c Outdated
@@ -316,6 +317,11 @@ static ssize_t uv__fs_read(uv_fs_t* req) {
return -1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make these paths goto done as well so req->bufs is cleaned-up too?

Comment thread test/test-fs.c Outdated

TEST_IMPL(fs_write_alotof_bufs) {
const size_t iovcount = 54321;
size_t iovcount = 54321;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: separate declaration from assignment

Comment thread test/test-fs.c Outdated
static void thread_main(void* arg) {
const struct thread_ctx *ctx;
int size;
char *data;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: star aligned to the left: thread_ctx* and char*

@gireeshpunathil
Copy link
Copy Markdown
Contributor

yes, I have verified, it works - i.e, throws error (which is expected when writing too big data under too low ulimit).

mmarchini pushed a commit to mmarchini/node that referenced this pull request May 29, 2018
test-fs-readfile-tostring-fail is failing frequently on OSX machines.
There's a PR to fix this issue in libuv, but while the fix don't land on
Node.js this test should be marked as flaky.

Ref: nodejs#16601
Ref: libuv/libuv#1742
mmarchini pushed a commit to nodejs/node that referenced this pull request May 30, 2018
test-fs-readfile-tostring-fail is failing frequently on OSX machines.
There's a PR to fix this issue in libuv, but while the fix don't land on
Node.js this test should be marked as flaky.

Ref: #16601
Ref: libuv/libuv#1742

PR-URL: #21013
Refs: #16601
Refs: libuv/libuv#1742
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Jon Moss <me@jonathanmoss.me>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
addaleax pushed a commit to nodejs/node that referenced this pull request May 31, 2018
test-fs-readfile-tostring-fail is failing frequently on OSX machines.
There's a PR to fix this issue in libuv, but while the fix don't land on
Node.js this test should be marked as flaky.

Ref: #16601
Ref: libuv/libuv#1742

PR-URL: #21013
Refs: #16601
Refs: libuv/libuv#1742
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Jon Moss <me@jonathanmoss.me>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Anatoli Papirovski <apapirovski@mac.com>
Reviewed-By: Michael Dawson <michael_dawson@ca.ibm.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Trivikram Kamat <trivikr.dev@gmail.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
@Trott
Copy link
Copy Markdown
Contributor

Trott commented Jun 3, 2018

@santigimeno If it's the difference between approving this PR and not, would you be willing to fix up the nits yourself, push directly to the branch, and land this? It would help us close a bug in Node.js.

Trott added a commit to Trott/io.js that referenced this pull request Jun 6, 2018
test-fs-readfile-tostring-fail is unreliable until a libuv fix lands.

It had previously been marked flaky on macOS, but it has been observed
to also fail on AIX. Mark it flaky everywhere.

Refs: libuv/libuv#1742
Trott added a commit to Trott/io.js that referenced this pull request Jun 8, 2018
test-fs-readfile-tostring-fail is unreliable until a libuv fix lands.

It had previously been marked flaky on macOS, but it has been observed
to also fail on AIX. Mark it flaky everywhere.

Refs: libuv/libuv#1742

PR-URL: nodejs#21177
Reviewed-By: Matheus Marchini <matheus@sthima.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Jon Moss <me@jonathanmoss.me>
targos pushed a commit to nodejs/node that referenced this pull request Jun 10, 2018
test-fs-readfile-tostring-fail is unreliable until a libuv fix lands.

It had previously been marked flaky on macOS, but it has been observed
to also fail on AIX. Mark it flaky everywhere.

Refs: libuv/libuv#1742

PR-URL: #21177
Reviewed-By: Matheus Marchini <matheus@sthima.com>
Reviewed-By: Gireesh Punathil <gpunathi@in.ibm.com>
Reviewed-By: Jon Moss <me@jonathanmoss.me>
@santigimeno
Copy link
Copy Markdown
Member

If it's the difference between approving this PR and not, would you be willing to fix up the nits yourself, push directly to the branch, and land this? It would help us close a bug in Node.js.

The PR LGTM and I can fix up the nits, but would like some other @libuv/collaborators to look at it first.

@vtjnash
Copy link
Copy Markdown
Member Author

vtjnash commented Aug 27, 2018

bump.

@vtjnash
Copy link
Copy Markdown
Member Author

vtjnash commented Sep 4, 2018

Bump again.

@vtjnash
Copy link
Copy Markdown
Member Author

vtjnash commented Sep 12, 2018

bump bump bump

This reverts commit b0f3310.
(but not the test deletion)
For simplicity and predictability (since the user must handle the retry
anyways), always emit exactly one readv/pread/preadv syscall and return
that result to the user.

By contrast, write needs to preserve order, so it needs to keep retrying
the operation until it finishes before retiring the req from the queue.

Ref: libuv#640
Ref: libuv#1720
@vtjnash vtjnash force-pushed the jn/bwijen-partialoffset-take2 branch from 96df300 to cabf522 Compare September 18, 2018 19:30
@vtjnash vtjnash force-pushed the jn/bwijen-partialoffset-take2 branch from cabf522 to 26834a8 Compare September 18, 2018 19:31
@vtjnash
Copy link
Copy Markdown
Member Author

vtjnash commented Sep 18, 2018

bump x4. I've rebased and updated it to address all comments.

@Trott
Copy link
Copy Markdown
Contributor

Trott commented Sep 19, 2018

Desperately trying to get some reviews here.../ping @bnoordhuis @saghul @ronkorving @santigimeno @tniessen @cjihrig

Comment thread src/unix/fs.c
} while (index < req->nbufs && result > 0);
if (nread > 0)
result = nread;
result = pread(req->file, req->bufs[0].base, req->bufs[0].len, req->off);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we go through each of the buffers here?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the caller wants all of the data, they are required to implement that loop, so I opted here for the simpler solution of having this function do exactly one syscall on all code-paths, and returning that one result. I reference this behavior in the PR commit message:

For simplicity and predictability (since the user must handle the retry
anyways), always emit exactly one readv/pread/preadv syscall and return
that result to the user.

The alternative would be that we be that we loop here until we encounter an error, then discard the error and return the result up to that point. That's valid behavior (it's what the kernel internally is expected to do), but it also seemed to me like more complexity than necessary. It's also not clear to me whether it is more correct to fill each buffer only as much as one read call returns (but to fill as many buffers as possible—as it does not), or to fill each buffer fully (as it would do in all other cases in this code). Handling that here seems like it would add yet further complexity.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me. Thanks!

@santigimeno
Copy link
Copy Markdown
Member

@santigimeno
Copy link
Copy Markdown
Member

CI looks good. The failures are known. I'm landing

santigimeno pushed a commit that referenced this pull request Sep 28, 2018
This reverts commit b0f3310.
(but not the test deletion)

Fixes: nodejs/node#16601
PR-URL: #1742
Refs: #640
Refs: #1720
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
santigimeno pushed a commit that referenced this pull request Sep 28, 2018
For simplicity and predictability (since the user must handle the retry
anyways), always emit exactly one readv/pread/preadv syscall and return
that result to the user.

By contrast, write needs to preserve order, so it needs to keep retrying
the operation until it finishes before retiring the req from the queue.

Fixes: nodejs/node#16601
PR-URL: #1742
Refs: #640
Refs: #1720
Reviewed-By: Santiago Gimeno <santiago.gimeno@gmail.com>
@santigimeno
Copy link
Copy Markdown
Member

Landed in 19a3419 and 60abdba. Thanks @vtjnash and sorry it took so long.

@vtjnash vtjnash deleted the jn/bwijen-partialoffset-take2 branch September 29, 2018 02:29
@vtjnash
Copy link
Copy Markdown
Member Author

vtjnash commented Sep 29, 2018

Thanks! I'll need to review my patch list now and see what this resolves :) Hopefully this attempt doesn't trigger more nodejs issues—ping me if it does.

@thefourtheye
Copy link
Copy Markdown
Contributor

This might have introduced a minor bug in AIX, please check #2023 for a possible fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants