HttpObjectEncoder scalability issue due to instanceof checks #12708

@franz1981

Description

Due to https://bugs.openjdk.org/browse/JDK-8180450, the chain of instanceof checks in HttpObjectEncoder competes to update the Klass::secondary_super_cache field, causing a scalability issue (very relevant on machines with many cores and NUMA topologies).

Detecting this issue isn't easy, for three reasons:

  1. the performance hit doesn't map to any Java code (i.e. there is no BCI - byte code index - to attribute it to)
  2. updates to the Klass::secondary_super_cache field cause false sharing with other fields on the same cache line that are used by other parts of the instanceof logic (e.g. the Klass::secondary_supers length read performed by MacroAssembler::check_klass_subtype_slow_path), so other instructions pay a performance penalty without directly causing the issue themselves
  3. benchmarks (and micro-benchmarks) tend to use just a single HTTP message type, which lets the JIT skip the instanceof entirely and instead place an uncommon trap to guard against new types (i.e. the issue never shows up)

Reasons 1. and 2. make profilers very confused (see http://psy-lob-saw.blogspot.com/2018/07/how-inlined-code-confusing-profiles.html): if other Java methods are inlined near (forward/backward - the direction really depends on the profiler) the guilty JIT-emitted checks, those methods get all the blame. Only assembly-level analysis with tools like https://builds.shipilev.net/perfasm/ and perf c2c makes it evident which instructions to blame.
The last reason means that validation benchmarks should deliberately pollute the JIT profiling data at the instanceof call-sites (see https://wiki.openjdk.org/display/HotSpot/MethodData) in order to have a synthetic way to provoke/detect the issue.
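As a minimal sketch of what such pollution could look like (the Msg/FullMsg/PartialMsg types here are hypothetical stand-ins, not Netty types): feeding more than one concrete class through the same call-site during warm-up forces the type profile at the instanceof to become polymorphic, so the JIT must emit a real subtype check instead of an exact-class guard plus uncommon trap.

```java
// Hypothetical stand-ins for the message hierarchy: two concrete classes
// sharing an interface are enough to make the call-site's type profile
// polymorphic, so the JIT cannot replace the instanceof with a
// single-type guard + uncommon trap.
interface Msg { }
final class FullMsg implements Msg { }
final class PartialMsg implements Msg { }

public class ProfilePollution {
    static int fullSeen;

    // The call-site under test: if only FullMsg were ever passed, the JIT
    // could compile the instanceof into an exact-class check; alternating
    // types forces the real (cache-backed) subtype check to be emitted.
    static void write(Object o) {
        if (o instanceof FullMsg) {
            fullSeen++;
        }
    }

    public static void main(String[] args) {
        Msg[] mixed = { new FullMsg(), new PartialMsg() };
        // Warm-up with both types so the MethodData type profile at the
        // instanceof records more than one receiver class.
        for (int i = 0; i < 100_000; i++) {
            write(mixed[i & 1]);
        }
        System.out.println(fullSeen); // half the iterations saw a FullMsg
    }
}
```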

Sadly, Netty 4.1 cannot rewrite the existing HTTP types (which are public) so that the encoder no longer relies on type information for its state changes, but Netty 5 can consider a different approach to solve this performance issue.
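One conceivable direction for such a redesign (purely a sketch under assumed names, not Netty 5's actual API) is to let the message hierarchy itself report its encoder-relevant shape, so the encoder switches on a plain enum instead of walking a chain of interface instanceof checks; virtual dispatch and an enum comparison never touch Klass::secondary_super_cache.

```java
// Illustrative sketch only: MsgKind, EncodableMsg, FullMessage and
// ContentChunk are hypothetical names, not Netty types.
enum MsgKind { FULL, HEADERS_ONLY, CONTENT }

interface EncodableMsg {
    MsgKind kind();
}

final class FullMessage implements EncodableMsg {
    @Override public MsgKind kind() { return MsgKind.FULL; }
}

final class ContentChunk implements EncodableMsg {
    @Override public MsgKind kind() { return MsgKind.CONTENT; }
}

public class TagDispatchEncoder {
    static String encode(EncodableMsg msg) {
        // switch on the enum: compiles to a tableswitch on the ordinal,
        // no subtype checks involved
        switch (msg.kind()) {
            case FULL:    return "headers+content";
            case CONTENT: return "content";
            default:      return "headers";
        }
    }

    public static void main(String[] args) {
        System.out.println(encode(new FullMessage()));
        System.out.println(encode(new ContentChunk()));
    }
}
```

The trade-off is an extra virtual call per message, but that call is cheap, monomorphic per concrete type, and free of shared mutable Klass state.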

In an ideal world with a single implementor per interface type (each implementing just one interface), the issue shouldn't happen.

To better understand (one flavour of) this issue, consider the example below.
DefaultFullHttpRequest implements FullHttpRequest and (transitively) ReferenceCounted too.
In code like this:

    void write(Object o) {
        try {
            if (o instanceof FullHttpMessage) {
                foo(o);
            }
            // ... other logic
        } finally {
            if (o instanceof ReferenceCounted) {
                ((ReferenceCounted) o).release();
            }
        }
    }

if both instanceof checks observe more than one concrete type, e.g. DefaultFullHttpRequest and DefaultHttpRequest,
then the checks will use the mechanism explained in https://bugs.openjdk.org/browse/JDK-8180450.

For example, if a single thread calls write(o) and o is a DefaultFullHttpRequest, the two checks keep invalidating the mentioned field on DefaultFullHttpRequest's Klass (the first check sets it to FullHttpMessage, the second to ReferenceCounted), which is merely inefficient; but if many threads do it, they compete over the same Klass field, invalidating each other's updates and causing false sharing for the surrounding Klass fields, e.g. the length of the Klass array used to perform the slow-path check.

In the existing code base this is the simplest form of Klass::secondary_super_cache invalidation, but the logic of the encoder can make it happen too.
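For the write method shown above, one possible mitigation (a sketch only, using minimal stand-ins for the Netty types) is to test the dominant concrete class first: an exact-class comparison is a single pointer compare and never updates Klass::secondary_super_cache, so only rare, unknown types fall back to the interface instanceof checks.

```java
// Hypothetical stand-ins for the Netty types discussed above.
interface ReferenceCounted { void release(); }
interface FullHttpMessage extends ReferenceCounted { }
final class DefaultFullHttpRequest implements FullHttpMessage {
    @Override public void release() { }
}

public class WriteFastPath {
    static int fooCalls;

    static void foo(Object o) { fooCalls++; }

    static void write(Object o) {
        // Fast path: exact-class check, a plain pointer comparison that
        // does not touch Klass::secondary_super_cache.
        if (o.getClass() == DefaultFullHttpRequest.class) {
            DefaultFullHttpRequest req = (DefaultFullHttpRequest) o;
            try {
                foo(req);      // statically known to be a FullHttpMessage
            } finally {
                req.release(); // statically known to be ReferenceCounted
            }
            return;
        }
        // Slow path for other types, as in the original code: these
        // interface checks may still thrash the secondary super cache,
        // but only for the uncommon message types.
        try {
            if (o instanceof FullHttpMessage) {
                foo(o);
            }
        } finally {
            if (o instanceof ReferenceCounted) {
                ((ReferenceCounted) o).release();
            }
        }
    }

    public static void main(String[] args) {
        write(new DefaultFullHttpRequest());
        System.out.println(fooCalls);
    }
}
```

Whether this helps in practice depends on the workload being dominated by one concrete type; it narrows the contention rather than eliminating it.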
