config_file: implement stat cache to avoid repeated rehashing#5132
config_file: implement stat cache to avoid repeated rehashing#5132pks-t merged 8 commits intolibgit2:masterfrom
Conversation
|
Fixes #5126? At least parts of it, I guess. |
cf43e99 to
3e55114
Compare
|
So this in fact breaks Windows-based systems when reading and writing files at nearly the same time due to Windows usually not using sub-second file stats and thus we're racy. To be honest, I don't think this matters and I'd still like to get this merged. If you have two processes reading and writing files at nearly the same time, then I find it totally reasonable to say "Yup, this is racy without any external synchronization mechanism." |
|
@pks-t: It seems to avoid the reading of the files (which is great!), but their opening or most other calls---those are still extremely slow. The change was only a handful of FS calls. An easy (although conservative) way to avoid this in many situations on Windows is to do the following: the very first time the For There's also a harder way to accomplish something similar, which is to keep the config files actually open, then use them directly (e.g. The above can be in a separate PR; doesn't have to be this one. Just wanted to explain how to do it, since at least the change notification method is quite easy. (You don't even need to implement it on non-Windows systems if you don't want, since just stat'ing like usual on those is probably already probably fast enough.) |
|
This is the general and non-Windows-specific approach. Yours just wouldn't work in a cross-platform way, at least as far as I know. If there is a cross-platform way then I'd certainly be happy to hear about it
Yeah, that'd be a huge problem. Many IDEs or editors just keep repos open all the time while the editor is alive, and we just cannot lock any files like that. |
|
So what if it's a platform specific approach though? Just |
Yeah, maybe in another increment. I'm not a Windows guy and this here already improves runtime performance by a lot on all platforms. |
|
Oh yeah, like I said, it doesn't need to be on this PR or anything. I just mentioned it as a TODO so we don't forget. I might be able to help with the Windows parts too later. |
By the way: this isn't quite true. Yes, it only removes a handful of calls, but in fact it removes the most hurting ones. Reads are always a lot less efficient compared to stats, but most importantly we now completly avoid re-hashing the file contents to see whether they're still the same. Previously, we would've always hashed the contents. |
|
At a later point we might decide to roll this into our filebuffers. It seems like a good match to have stuff like that as part of them, as there's multiple locations where we use this sort of caching. Furthermore, that'd allow us to more easily implement platform-specific heuristics similar to the one proposed by @mehrdadn |
@pks-t: No, it's not wrong. I've been trying to tell this to all of you on both threads, but it seems you still don't realize the sheer magnitude of the numbers. So let me put it in terms of actual numbers. Here's what it looks like on a real benchmark:
Here's what you should be observing:
So you can see Now maybe try benchmarking this yourself your Mac and letting me know what you see: |
I don’t understand why you make any assumption about what technologies any of us use. |
@ethomson well because he said "I'm not a Windows guy". As to why Mac and not Linux, though, I thought he was the one who said he's the MacOS developer but turns out it was tiennou who said that on the other thread, so I misremembered that bit. But I'm not "making assumptions", I'm just going by what you yourselves have told me! |
Indeed it does seem to. We could add a 100 ns sleep on win32 to that config test, which should ensure that we actually update the filetime. |
Sleeping for 100ns on Windows is not as easy as it sounds. You'll have to spin loop and remeasure. Also, I think he meant |
Our |
|
Oh I see, cool. |
|
On Fri, Jun 21, 2019 at 02:35:47PM -0700, mehrdadn wrote:
> Reads are always a lot less efficient compared to stats,
@pks-t: No, it's *not* wrong. I've been trying to tell this to
all of you on both threads, but it seems you still don't
realize the sheer magnitude of the numbers. So let me put it in
terms of actual numbers for.
"Not quite true" and "wrong" are different things. Furthermore I
claimed that the most important part is that we stop re-hashing
contents.
|
26ba376 to
3ce5844
Compare
|
@ethomson: rebased. I've introduced a 1ms sleep, which should probably fix the test |
3ce5844 to
1725582
Compare
|
Changed to 10ms due to it still being flaky |
I'd like to hit pause here for a second. ⏸ Things over here in the libgit2 community have seemed a bit contentious recently, and I'd like to nip this in the bud and make sure that we all get on the same page. Obviously, it's very difficult to convey things like tone online. So things that are meant as jokes may not come across that way, and readers may miss the intent and instead hear attacks. Things that are meant as emphasis may be interpreted as rudeness. So a quick reminder about our code of conduct, which reminds us that this aims to be a positive environment, and that we should all be:
And a reminder that our goals are to build a Git repository management library that is first correct and second performant, and that we need to work together to do that. This is an all-volunteer project, and we're all here because we want to be. We don't need to argue about degrees of incorrectness, and we don't need comments that can be perceived as rude. We need collaboration and inclusiveness. Thanks in advance! |
|
@ethomson: are you good with these changes and opening up the potential for races? |
|
I think so. For the filesystem-based config system, it's hard for me to imagine that But we could add some tuning to the FS-based config system to disable stat in favor of a more heavy handed approach (to minimize that NTFS attribute update lag, or if they're trying to use something worse even than NTFS) or to disable reloading entirely. But I think that this is good for the defaults. 👍 |
Well, the more heavy handed approach is the one we had before, right? :) We could use a global option to switch behaviour here, of course. Did you mean to imply that? |
I got confused and didn't really follow what's happening here, but just a note: as far as I know (although I'm not 100% sure about this) there shouldn't be a "lag" per se in NTFS file time updating. Rather, the situation is like this:
So to update the file time I'd expect the only thing to do is to just close all file descriptors used for writing. I think you shouldn't need to change how you actually read the file time (whether it's via And here's a demo that illustrates this: #!/usr/bin/env python3
import os
def stat_via_dir(filename):
basename = os.path.basename(filename)
with os.scandir(os.path.dirname(filename) or '.') as it:
for entry in it:
if entry.name == basename:
return entry.stat()
def main(program, filename='testfile'):
print(os.stat(filename).st_mtime)
with open(filename, 'r+b') as f:
print(stat_via_dir(filename).st_mtime)
f.write(b'\n')
f.flush()
print(os.stat(filename).st_mtime)
print(stat_via_dir(filename).st_mtime)
print(os.stat(filename).st_mtime)
print(stat_via_dir(filename).st_mtime)
if __name__ == '__main__':
import sys
raise SystemExit(main(*sys.argv)) |
e99d9c4 to
caeb80f
Compare
|
Updated. I've optimized how we handle refreshes after calling |
|
Awesome :) is the 1ms sleep due to the 100ns (and the fact that it's hard to go below 1ms)? Or do you need it to be actually 1ms long? Because if you only need the 100ns resolution I could write you a faster sleep... |
The first reason was that implementing The second reason was that we also need to consider systems built with "-DUSE_NSEC=NO", where we might not even have that level of granularity for So thanks for your offer, but I don't really think it's necessary for now :) |
5b208e7 to
f0a2653
Compare
|
|
||
| static int config_refresh(git_config_backend *cfg) | ||
| { | ||
| diskfile_backend *b = (diskfile_backend *)cfg; |
There was a problem hiding this comment.
Wouldn't that warrant one of those GIT_CONTAINER_OF things, just in case ?
There was a problem hiding this comment.
Definitely. Embarassing that I myself didn't think about it :P
Depending on the platform and on build options, we may or may not build libgit2 with support for nanoseconds when using `stat` calls. It's currently unclear though whether sub-second stat information is used at all. Add feature info for this to tell at configure time whether it's being used or not.
Implement a new example that resembles git-config(1). Right now, this example can both read and set configuration keys, only.
To decide whether a config file has changed, we always hash its
complete contents. This is unnecessarily expensive, as
well-behaved filesystems will always update stat information for
files which have changed. So before computing the hash, we should
first check whether the stat info has actually changed for either
the configuration file or any of its includes. This avoids having
to re-read the configuration file and its includes every time
when we check whether it's been modified.
Tracing the for-each-ref example previous to this commit, one can
see that we repeatedly re-open both the repo configuration as
well as the global configuration:
$ strace lg2 for-each-ref |& grep config
access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory)
access("/home/pks/.config/git/config", F_OK) = 0
access("/etc/gitconfig", F_OK) = -1 ENOENT (No such file or directory)
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
access("/tmp/repo/.git/config", F_OK) = 0
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/home/pks/.gitconfig", 0x7ffd15c05290) = -1 ENOENT (No such file or directory)
access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory)
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
access("/home/pks/.config/git/config", F_OK) = 0
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/home/pks/.gitconfig", 0x7ffd15c051f0) = -1 ENOENT (No such file or directory)
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory)
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory)
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/home/pks/.gitconfig", 0x7ffd15c05090) = -1 ENOENT (No such file or directory)
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
With the change, we only do stats for those files and open them a
single time, only:
$ strace lg2 for-each-ref |& grep config
access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory)
access("/home/pks/.config/git/config", F_OK) = 0
access("/etc/gitconfig", F_OK) = -1 ENOENT (No such file or directory)
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
access("/tmp/repo/.git/config", F_OK) = 0
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/home/pks/.gitconfig", 0x7ffe70540d20) = -1 ENOENT (No such file or directory)
access("/home/pks/.gitconfig", F_OK) = -1 ENOENT (No such file or directory)
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
access("/home/pks/.config/git/config", F_OK) = 0
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
stat("/home/pks/.gitconfig", 0x7ffe70540ca0) = -1 ENOENT (No such file or directory)
stat("/home/pks/.gitconfig", 0x7ffe70540c80) = -1 ENOENT (No such file or directory)
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory)
stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory)
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory)
stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory)
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
stat("/tmp/repo/.git/config", {st_mode=S_IFREG|0644, st_size=92, ...}) = 0
stat("/home/pks/.gitconfig", 0x7ffe70540b40) = -1 ENOENT (No such file or directory)
stat("/home/pks/.gitconfig", 0x7ffe70540b20) = -1 ENOENT (No such file or directory)
stat("/home/pks/.config/git/config", {st_mode=S_IFREG|0644, st_size=1154, ...}) = 0
The following benchmark has been performed with and without the
stat cache in a best-of-ten run:
```
int lg2_repro(git_repository *repo, int argc, char **argv)
{
git_config *cfg;
int32_t dummy;
int i;
UNUSED(argc);
UNUSED(argv);
check_lg2(git_repository_config(&cfg, repo),
"Could not obtain config", NULL);
for (i = 1; i < 100000; ++i)
git_config_get_int32(&dummy, cfg, "foo.bar");
git_config_free(cfg);
return 0;
}
```
Without stat cache:
$ time lg2 repro
real 0m1.528s
user 0m0.568s
sys 0m0.944s
With stat cache:
$ time lg2 repro
real 0m0.526s
user 0m0.268s
sys 0m0.258s
This benchmark shows a nearly three-fold performance improvement.
This change requires that we check our configuration stress tests
as we're now in fact becoming more racy. If somebody is writing a
configuration file at nearly the same time (there is a window of
100ns on Windows-based systems), then it might be that we realize
that this file has actually changed and thus may not re-read it.
This will only happen if either an external process is rewriting
the configuration file or if the same process has multiple
`git_config` structures pointing to the same time, where one of
both is being used to write and the other one is used to read
values.
We are quite lazy in how we refresh our config file backend when updating any of its keys: instead of just updating our in-memory representation of the keys, we just discard the old set of keys and then re-read the config file contents from disk. This refresh currently happens separately at every callsite of `config_write`, but it is clear that we _always_ want to refresh if we have written the config file to disk. If we didn't, then we'd run around with an outdated config file backend that does not represent what we have on disk. By moving the refresh into `config_write`, we are also able to optimize the case where the config file is currently locked. Before, we would've tried to re-read the file even if we have only updated its cached contents without touching the on-disk file. Thus we'd have unnecessarily stat'd the file, even though we know that it shouldn't have been modified in the meantime due to its lock.
The `config_read` function currently performs both reading the on-disk config file as well as parsing the retrieved buffer contents. To optimize how we refresh our config entries from an in-memory buffer, we need to be able to directly parse buffers, though, without involving any on-disk files at all. Extract a new function `config_read_buffer` that sets up the parsing logic and then parses config entries from a buffer, only. Have `config_read` use it to avoid duplicated logic.
Updating a config file backend's config entries is a bit more involved, as it requires clearing of the old config entries as well as handling locking correctly. As we will need this functionality in a future patch to refresh config entries from a buffer, let's extract this into its own function `config_set_entries`.
When we rewrite the configuration file due to any of its values
being modified, we call `config_refresh` to update the in-memory
representation of our config file backend. This is needlessly
wasteful though, as `config_refresh` will always open the on-disk
representation to reads the file contents while we already know
the complete file contents at this point in time as we have just
written it to disk.
Implement a new function `config_refresh_from_buffer` that will
refresh the backend's config entries from a buffer instead of
from the config file itself. Note that this will thus _not_
update the backend's timestamp, which will cause us to re-read
the buffer when performing a read operation on it. But this is
still an improvement as we now lazily re-read the contents, and
most importantly we will avoid constantly re-reading the contents
if we perform multiple write operations.
The following strace demonstrates this if we're re-writing a key
multiple times. It uses our config example with `config_set`
changed to update the file 10 times with different keys:
$ strace lg2 config x.x z |& grep '^open.*config'
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
And now with the optimization of `config_refresh_from_buffer`:
$ strace lg2 config x.x z |& grep '^open.*config'
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 3
open("/home/pks/.config/git/config", O_RDONLY|O_CLOEXEC) = 3
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
open("/tmp/repo/.git/config.lock", O_WRONLY|O_CREAT|O_EXCL|O_CLOEXEC, 0666) = 3
open("/tmp/repo/.git/config", O_RDONLY|O_CLOEXEC) = 4
As can be seen, this is quite a lot of `open` calls less.
f0a2653 to
2ba7020
Compare
|
Thanks for your review, @tiennou! All comments should be addressed now |
2b50aac to
2ba7020
Compare
| if ((result = git_filebuf_commit(&file)) < 0) | ||
| goto done; | ||
|
|
||
| if ((result = config_refresh_from_buffer(&cfg->header.parent, buf.ptr, buf.size)) < 0) |
There was a problem hiding this comment.
Build is still red, because Windows fails and/or leaks. I've looked around for the leak and pinpointed one possible occurrence here, but since there's also timing issues at hand, maybe not ?
buf was "attached" in the other code path, but not here?
There was a problem hiding this comment.
Nah, that was just a quick test of mine :P Reverted it again to the previous state, which passes
|
Thanks for your reviews, @tiennou! |
To decide whether a config file has changed, we always hash its
complete contents. This is unnecessarily expensive, as
well-behaved filesystems will always update stat information for
files which have changed. So before computing the hash, we should
first check whether the stat info has actually changed for either
the configuration file or any of its includes. This avoids having
to re-read the configuration file and its includes every time
when we check whether it's been modified.
Tracing the for-each-ref example previous to this commit, one can
see that we repeatedly re-open both the repo configuration as
well as the global configuration:
With the change, we only do stats for those files and open them a
single time, only:
The following benchmark has been performed with and without the
stat cache in a best-of-ten run:
Without stat cache:
With stat cache:
This benchmark shows a nearly three-fold performance improvement.