Skip to content

unix,win: add uv_fs_{open,read,close}_dir()#2057

Merged
cjihrig merged 1 commit intolibuv:v1.xfrom
cjihrig:readdir
Mar 26, 2019
Merged

unix,win: add uv_fs_{open,read,close}_dir()#2057
cjihrig merged 1 commit intolibuv:v1.xfrom
cjihrig:readdir

Conversation

@cjihrig
Copy link
Copy Markdown
Contributor

@cjihrig cjihrig commented Oct 29, 2018

WIP continuation of #416

Details

Some questions about the current API design:

  • uv_fs_opendir() currently allocates a uv_dir_t that is used when reading and closing the directory. It's not clear to me where the matching free() should live (uv_fs_req_cleanup(), uv_fs_close_dir(), both, other). It might be better to pass the uv_dir_t to uv_fs_opendir() to avoid the issue.
  • How should errors in uv__fs_readdir() be handled? Memory is allocated via uv__strdup(). It can be freed via uv_fs_req_cleanup(). But, what condition should the outputs be left in if one of the readdir() calls fails in the middle of the call to uv__fs_readdir()?
  • In general, I think this attaches too much new information to uv_fs_s, which impacts all other fs operations.

Comment thread src/unix/fs.c Outdated
@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Nov 10, 2018

Anyone have any thoughts on the points in the OP?

@Fishrock123
Copy link
Copy Markdown
Contributor

@cjihrig I think it would be more libuv-style to pass uv_dir_t to uv_fs_opendir()?

which impacts all other fs operations.

Impacts in what way?

@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Dec 4, 2018

Impacts in what way?

I didn't mean anything too drastic. Adding fields to a struct that is shared between all fs operations means increased memory on all operations. If we initialize the new fields to some value, then it's also a (very minor) performance hit.

@rijnhard
Copy link
Copy Markdown

Not to nag but any update?

@Fishrock123
Copy link
Copy Markdown
Contributor

@cjihrig Can uv_dir_t "extend" (not that C has "extend" functionality) uv_fs_t? (Or have the new struct have it as a field.)

It may be worthwhile to have this return/use a non-uv_fs_t struct, based on what you said.

@Fishrock123
Copy link
Copy Markdown
Contributor

Fishrock123 commented Feb 11, 2019

Not 100% sure this would work but this is what I had in mind: Fishrock123/libuv@91e5a09...06fa01b

(Compiles but tests have some failures.)

@Fishrock123
Copy link
Copy Markdown
Contributor

@cjihrig Any thoughts?

@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Mar 18, 2019

@Fishrock123 sorry for the delay. Looking at this again I'm thinking...

Instead of adding the fields dir, dirents, and nentries to the uv_fs_t, maybe we could just add the uv_dir_t* to uv_fs_t, and add everything else to the uv_dir_t. That would prevent the problem of adding a bunch of things to uv_fs_t, while keeping a similar API to the rest of the fs functions.

Maybe the uv_dir_t* could even go in the UV_FS_PRIVATE_FIELDS to avoid breaking ABI.

I'll take another look at this tonight with fresh eyes.

@guymguym
Copy link
Copy Markdown

Hey @cjihrig I still don't see mutex around the call to linux readdir()...

@cjihrig cjihrig force-pushed the readdir branch 3 times, most recently from 99c1f24 to 2fb2d05 Compare March 20, 2019 16:06
@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Mar 20, 2019

I've refactored the unix implementation (including adding the mutex), and I think I've got this to a place where it can land on the v1.x branch. I still want to do a bit more cleanup, and I haven't updated the Windows implementation yet.

@cjihrig cjihrig force-pushed the readdir branch 11 times, most recently from 9050b07 to 18fc857 Compare March 24, 2019 03:10
@cjihrig cjihrig force-pushed the readdir branch 6 times, most recently from 9a15f8a to f3f34f4 Compare March 25, 2019 04:25
@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Mar 25, 2019

Updated status:

Copy link
Copy Markdown
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's been discussed already but just in case, we're okay with it being a bit of an odd duck? It's a streaming API whereas everything else in fs.c is one-shot.

Comment thread include/uv.h
Comment thread src/unix/fs.c Outdated
Comment thread src/unix/fs.c Outdated
Comment thread src/uv-common.c Outdated
Comment thread src/win/fs.c Outdated
Comment thread src/win/fs.c Outdated
Comment thread src/win/fs.c Outdated
Comment thread src/win/fs.c Outdated
Comment thread src/win/fs.c
@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Mar 25, 2019

Thanks for the review Ben. I'll follow up on the code comments later today.

I think it's been discussed already but just in case, we're okay with it being a bit of an odd duck? It's a streaming API whereas everything else in fs.c is one-shot.

I'm OK with the odd duck status. I think it makes the most sense to include this functionality with the other fs functions instead of making it a lone wolf odd duck. I guess I just assumed that detail had been ironed out at some point in the past 5 years 😄

@Fishrock123
Copy link
Copy Markdown
Contributor

I think it's been discussed already but just in case, we're okay with it being a bit of an odd duck? It's a streaming API whereas everything else in fs.c is one-shot.

You mean something more in the shape of uv_fs_scandir*...?

@cjihrig cjihrig force-pushed the readdir branch 3 times, most recently from a7ea9e3 to 440ab0c Compare March 26, 2019 03:14
@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Mar 26, 2019

Docs added. Removing the WIP label.

@cjihrig cjihrig changed the title WIP: unix,win: add uv_fs_{open,read,close}_dir() unix,win: add uv_fs_{open,read,close}_dir() Mar 26, 2019
@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Mar 26, 2019

CI: https://ci.nodejs.org/view/libuv/job/libuv-test-commit/1324/. Just unrelated failures on centos6.

@guymguym
Copy link
Copy Markdown

guymguym commented Mar 26, 2019

Hey @bnoordhuis @cjihrig

Thanks for your replies!

That isn't so different from multiple uv_fs_write() requests for the same file descriptor, for example. We don't make any promises about correctness or order there, either.

IMHO there is quite a difference with concurrent uv_fs_write() calls - the worst outcome is that the caller will not know which write won - which is expected for any unsynchronized write operations in any storage API known to man, and also the case between different processes. With concurrent uv_fs_readdir(), which is an unsynchronized read operation, the effect is unspecified and will likely include memory corruptions and crash depending on the underlying implementation.

Note that the reason that readdir_r() was deprecated by POSIX is that struct dirent is poorly defined in a cross platform way and the caller cannot reliably send a length for the provided d_name buffer. I think that this POSIX behavior is unfortunate to have been created in the first place (just needed to standardize d_reclen?), and a modern library like libuv should not expose these kind of unneeded POSIX complexities.

Anyway, if you only care for how Node.js uses this API then I am sure it can handle synchronization on its own, but as a general library this is a TMI API :)

Because the code currently isn't equipped to deal with that. The mutex guards the dir->dir handle but not the dir->dirents array. Two concurrent threads will overwrite each other's entries in that array.

Oh I just noticed that dirents are passed on the dir structure itself. I would actually expect that dirents and ndirents be passed to the uv_fs_readdir call as arguments, and store it in req->ptr as a request scope struct. And actually, since the underlying impl is producing entries one by one anyway, isn't it best to design the API to return a single entry at a time? that would simplify it even further for the caller.

Thanks!

Copy link
Copy Markdown
Member

@bnoordhuis bnoordhuis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, LGTM. Since Colin wrote it, I'll defer to him on whether or not to make allowances for thread-safety, but I think it's fine the way it is.

Comment thread docs/src/fs.rst Outdated
@bnoordhuis
Copy link
Copy Markdown
Member

You mean something more in the shape of uv_fs_scandir*...?

uv_fs_scandir() has an iterator interface for pulling out the results but the actual work is done in a single round-trip through the thread pool. uv_readdir() is different because it segments the work (i.e., does multiple round-trips.)

After thinking about it some more, I think the uv_readdir() interface is fine. I can live with it being slightly different from everything else.

@guymguym
Copy link
Copy Markdown

Hey @bnoordhuis @cjihrig

I had a second thought about what I wrote above.

I still think it is safer and simpler to the caller to implement with a mutex internally, but I also understood that concurrent calls to uv_fs_readdir() would probably not make sense from a usage perspective because it's a streaming API that maintains the dir position inside the DIR structure (there is no POSIX API for a "stateless" call to readdir(dir, position) like we have for files read(file, offset)). So the caller will inevitably have to serialize the readdir calls to collect all the entries, and therefore the mutex overhead is redundant.

Anyway, I also think the PR looks good.
Too bad we can't just protect the calling code from making that mistake.

Thanks!

cjihrig added a commit to cjihrig/libuv that referenced this pull request Mar 26, 2019
Co-authored-by: Julien Gilli <jgilli@nodejs.org>
Co-authored-by: Jeremy Whitlock <jwhitlock@apache.org>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
PR-URL: libuv#2057
Refs: joyent/libuv#1430
Refs: joyent/libuv#1521
Refs: joyent/libuv#1574
Refs: libuv#175
Refs: nodejs/node#583
Refs: libuv#416
Refs: libuv#170
Co-authored-by: Julien Gilli <jgilli@nodejs.org>
Co-authored-by: Jeremy Whitlock <jwhitlock@apache.org>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
PR-URL: libuv#2057
Refs: joyent/libuv#1430
Refs: joyent/libuv#1521
Refs: joyent/libuv#1574
Refs: libuv#175
Refs: nodejs/node#583
Refs: libuv#416
Refs: libuv#170
@cjihrig
Copy link
Copy Markdown
Contributor Author

cjihrig commented Mar 26, 2019

Squashed the commits down. Final CI before landing with no related failures: https://ci.nodejs.org/view/libuv/job/libuv-test-commit/1327/

Landing. Finally 🎉. Thanks everyone involved in this PR and all of its previous iterations.

@cjihrig cjihrig merged commit 99440bb into libuv:v1.x Mar 26, 2019
@cjihrig cjihrig deleted the readdir branch March 26, 2019 23:08
njlr pushed a commit to buckaroo-pm/libuv that referenced this pull request Apr 5, 2019
Co-authored-by: Julien Gilli <jgilli@nodejs.org>
Co-authored-by: Jeremy Whitlock <jwhitlock@apache.org>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
PR-URL: libuv#2057
Refs: joyent/libuv#1430
Refs: joyent/libuv#1521
Refs: joyent/libuv#1574
Refs: libuv#175
Refs: nodejs/node#583
Refs: libuv#416
Refs: libuv#170
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants