Conversation
| #else | ||
| return strncmp(a, b, sz); | ||
| #endif | ||
| } |
There was a problem hiding this comment.
I'd be surprised if the libc git_strcmp would actually be any faster. Also, our git__strncmp implementation is nearly the same as that of musl libc, so there's no performance gain to be expected compared to that. That being said, glibc uses an unrolled loop for strncmp while n > 4 to enable the compiler to vectorize code, which is probably why you do see performance differences.
So because of this I'm fine with your proposal in general. Personally, I'd vote for having defines though instead of inlined code. E.g.
#ifdef GIT_WIN32
GIT_INLINE(int) git__strcmp(const char *a, const char *b)
{
....
}
#else
# define git__strcmp strcmp
#endif
By the way, why can't we use strcmp and strncmp on Win32 platforms?
There was a problem hiding this comment.
According to @ethomson, strcmp does locale-sensitive comparision on windows: #4230 (comment)
There was a problem hiding this comment.
I need to validate that this is still happening... this started in a prerelease of the 2015 (IIRC) C runtime. It's very possible that the change got reverted before it shipped and we've been unnecessarily shipping our own strcmp for years.
I haven't looked at the assembler output but yeah, I assume that there's vectorization opportunities that we've not taken. In any case, let me look into this in case we can ⚡️ it on win32 also. I'll 👀 this weekend (sorry, busy day today).
There was a problem hiding this comment.
I'd be surprised if the libc
git_strcmpwould actually be any faster.
It speeds up my benchmark by 0.85%.
So because of this I'm fine with your proposal in general. Personally, I'd vote for having defines though instead of inlined code.
I usually avoid having the same symbol being a function in one configuration and a macro in another. It can cause trouble when signatures diverge, or when there is hiding involved. And in general fewer macros is better.
However, I'll do anything you ask me to. Shall I change the code the way you suggested?
I'll eyes this weekend (sorry, busy day today).
@ethomson I'm waiting for you to come back. It's not urgent. Just so you know that the ball is in your court.
There was a problem hiding this comment.
I'd be surprised if the libc git_strcmp would actually be any faster.
It speeds up my benchmark by 0.85%.
Huh, I wonder why that is. I guess you didn't take a lot at the generated assembly, right? No need to do so if you didn't, I'm simply curious.
However, I'll do anything you ask me to. Shall I change the code the way you suggested?
I'd say we should first wait for @ethomson. No need to make you perform additional work when the requirements aren't quite clear yet :)
There was a problem hiding this comment.
Should I remove git__strcmp and git__strncmp and replace all calls to them with the standard equivalents?
There was a problem hiding this comment.
Mhh. I don't think the code churn would be worth it. Instead, I'd just remove the self-rolled implementations and unconditionally #define git__strcmp strcmp
There was a problem hiding this comment.
Done. And after much trial and error I managed to merge upstream changes and squash my commits into one.
There was a problem hiding this comment.
I think the #defines are okay. IIRC, in the past we could not simply do this since we take function pointers to git__strcmp (in status.c for example) so that we can switch easily between git__strcmp and git__strcasecmp. The problem is that we run into calling convention problems, depending on how we're built vs how the MSVCRT runtime is built.
It looks like since we removed the ability to build as stdcall that we're 👍 here now.
Right, I haven't looked at the assembly. This PR was produced with the following methodology.
|
|
Thanks for the fix @romkatv! |
See #4230 (comment) for context. This is diff optimization PR 1/3. The smallest of the three both in terms of code delta and impact.
Three hand-rolled string comparison functions -- git__prefixcmp, git__strcmp and git__strncmp -- are showing up on the CPU profile when scanning working directory of chromium repository on Linux. This PR optimizes git__prefixcmp and replaces the other two functions with standard equivalents (unless
GIT_WIN32is defined).Test:
Benchmark:
A modest 4% improvement. Working directory traversal increases more than that. The benchmark spends a lot of time doing other things, so the apparent impact of the PR is diluted.
Note: