9643584 to a9c4bf6
Benchmark results in the environment below show a performance improvement of up to 83%.
a9c4bf6 to ff4d295
When searching backward, we need to find the last occurrence of the needle in the long batch, which basically means working with the opposite endianness. I don't see any mention of this, not even in a comment.
@franz1981
I added the comment.
if (result != 0) {
    // Use the opposite endianness since we are looking for the last index.
    return offset - 1 - SWARUtil.getIndex(result, !isNative);
}
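For readers following the endianness point, here is a minimal standalone sketch of why the last match needs the opposite bit scan. It does not use Netty's `SWARUtil`; the class and method names are hypothetical, and it assumes the 8 bytes were read as a big-endian `long`, so the first byte sits in the most significant lane.

```java
final class SwarLastIndexSketch {
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;

    // Broadcast the needle byte into all 8 lanes of a long.
    static long compilePattern(byte needle) {
        return (needle & 0xFFL) * 0x0101010101010101L;
    }

    // Produces 0x80 in every lane where the word equals the needle, 0x00 elsewhere.
    static long applyPattern(long word, long pattern) {
        long input = word ^ pattern;           // matching lanes become 0x00
        long tmp = (input & LOW7) + LOW7;      // high bit set if low 7 bits are non-zero
        return ~(tmp | input | LOW7);          // high bit survives only for zero lanes
    }

    // First match in a big-endian word: scan from the most significant lane.
    static int firstIndex(long bigEndianWord, byte needle) {
        long result = applyPattern(bigEndianWord, compilePattern(needle));
        return result == 0 ? -1 : Long.numberOfLeadingZeros(result) >>> 3;
    }

    // Last match in the same word: scan from the least significant lane instead,
    // which is what a little-endian forward search would do.
    static int lastIndex(long bigEndianWord, byte needle) {
        long result = applyPattern(bigEndianWord, compilePattern(needle));
        return result == 0 ? -1 : 7 - (Long.numberOfTrailingZeros(result) >>> 3);
    }

    public static void main(String[] args) {
        long word = 0x0102030205060208L; // bytes 1, 2, 3, 2, 5, 6, 2, 8
        System.out.println(firstIndex(word, (byte) 2)); // 1
        System.out.println(lastIndex(word, (byte) 2));  // 6
    }
}
```

The number of lanes counted from the far end is the distance of the last match from the end of the batch, which is the role the `offset - 1 - ...` subtraction appears to play in the snippet above.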
54fefbb to f857129
chrisvest
left a comment
AbstractByteBufTest.testSWARIndexOf only covers forward searching. Please add test coverage for backward searching as well.
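One possible shape for such coverage is to cross-check the SWAR backward path against a plain linear scan on random input. The sketch below is only an illustration on a `byte[]`, not the actual `AbstractByteBufTest` code; every name in it is hypothetical, and it assumes the range convention "from `fromIndex` (exclusive) down to `toIndex` (inclusive)".

```java
import java.nio.ByteBuffer;
import java.util.Random;

final class BackwardSearchCheck {
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;

    // Reference: scan from fromIndex - 1 down to toIndex (inclusive).
    static int linearLastIndexOf(byte[] data, int fromIndex, int toIndex, byte value) {
        for (int i = fromIndex - 1; i >= toIndex; i--) {
            if (data[i] == value) {
                return i;
            }
        }
        return -1;
    }

    // SWAR variant: consume 8 bytes at a time from the end, then finish linearly.
    static int swarLastIndexOf(byte[] data, int fromIndex, int toIndex, byte value) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        long pattern = (value & 0xFFL) * 0x0101010101010101L;
        int offset = fromIndex;
        while (offset - toIndex >= Long.BYTES) {
            long input = buf.getLong(offset - Long.BYTES) ^ pattern;
            long tmp = (input & LOW7) + LOW7;
            long result = ~(tmp | input | LOW7); // 0x80 in each matching lane
            if (result != 0) {
                // Lanes counted from the least significant end = distance from offset - 1.
                return offset - 1 - (Long.numberOfTrailingZeros(result) >>> 3);
            }
            offset -= Long.BYTES;
        }
        return linearLastIndexOf(data, offset, toIndex, value);
    }

    // Runs random ranges over random data and counts disagreements with the reference.
    static int countMismatches(long seed, int rounds) {
        Random rnd = new Random(seed);
        int mismatches = 0;
        for (int r = 0; r < rounds; r++) {
            byte[] data = new byte[rnd.nextInt(64)];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) rnd.nextInt(4); // small alphabet so matches are frequent
            }
            int to = rnd.nextInt(data.length + 1);
            int from = to + rnd.nextInt(data.length - to + 1);
            byte value = (byte) rnd.nextInt(4);
            if (swarLastIndexOf(data, from, to, value) != linearLastIndexOf(data, from, to, value)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(42L, 10_000));
    }
}
```

A real test would exercise the buffer's own backward `indexOf` path across the buffer implementations covered by `AbstractByteBufTest` rather than a raw array.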
    return -1;
}

private static int unrolledLastIndexOf(final AbstractByteBuf buffer, final int fromIndex, final int byteCount,
Did you compare this unrolledLastIndexOf to calling linearLastIndexOf with an adjusted range?
Earlier benchmarking showed that manually unrolled loops improve performance for the size=7 benchmark case (#10737 (comment)).
I will add an updated comparison.
Manual unrolling performs better than the linear approach for size > 1.
1X10X2, Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz, openjdk 17.0.8 2023-07-18, Ubuntu 22.04.3 LTS, tuned network low-latency, no turbo boost.
benchmark
linear benchmark source code
manual unroll benchmark source code
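To make the two variants being compared concrete, here is a rough sketch of a linear and a manually unrolled backward search over fewer than 8 remaining bytes. It works on a `byte[]` instead of `AbstractByteBuf`, and the bodies are an assumption about the general shape, not the code that was benchmarked in this PR.

```java
final class TailSearchSketch {
    // Loop form: checks byteCount bytes ending just before fromIndex.
    static int linearLastIndexOf(byte[] data, int fromIndex, int byteCount, byte value) {
        for (int i = fromIndex - 1; i >= fromIndex - byteCount; i--) {
            if (data[i] == value) {
                return i;
            }
        }
        return -1;
    }

    // Unrolled form: the same checks written out, valid for byteCount in 1..7.
    static int unrolledLastIndexOf(byte[] data, int fromIndex, int byteCount, byte value) {
        if (byteCount <= 0 || byteCount >= 8) {
            throw new IllegalArgumentException("byteCount must be in 1..7");
        }
        if (data[fromIndex - 1] == value) return fromIndex - 1;
        if (byteCount == 1) return -1;
        if (data[fromIndex - 2] == value) return fromIndex - 2;
        if (byteCount == 2) return -1;
        if (data[fromIndex - 3] == value) return fromIndex - 3;
        if (byteCount == 3) return -1;
        if (data[fromIndex - 4] == value) return fromIndex - 4;
        if (byteCount == 4) return -1;
        if (data[fromIndex - 5] == value) return fromIndex - 5;
        if (byteCount == 5) return -1;
        if (data[fromIndex - 6] == value) return fromIndex - 6;
        if (byteCount == 6) return -1;
        if (data[fromIndex - 7] == value) return fromIndex - 7;
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = {9, 4, 9, 1, 2, 3, 9};
        System.out.println(linearLastIndexOf(data, 6, 6, (byte) 9));   // 2
        System.out.println(unrolledLastIndexOf(data, 6, 6, (byte) 9)); // 2
    }
}
```

Both forms return the same index; any speed difference between them depends on the JIT and hardware, so it has to be measured with a benchmark rather than inferred from the source.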
Motivation: The performance of `#lastIndexOf` could be enhanced by applying SWAR. Modification: Utilized `SWARUtil` for byte search. Result: Enhanced performance.
98a2c2a to b39fa54
Motivation:
The performance of `#lastIndexOf` could be enhanced by applying SWAR.
Modification:
Utilized `SWARUtil` for byte search.
Result:
Enhanced performance.