improve performance for Java "block-to-interface" conversion #9401

Open
kares wants to merge 10 commits into jruby:jruby-10.0 from kares:proc-to-iface-slow-10.0

@kares (Member) commented Apr 28, 2026

Passing a Ruby block to a Java method that expects a SAM (single-abstract-method, i.e. "functional interface") type has come to perform very badly, despite being one of the most useful Java integration (JI) features in JRuby. The current behavior:

  1. allocates a fresh RubyProc (to wrap the block)
  2. materialises a singleton class (MetaClass) for the proc object
  3. singletonClass.include(<interfaceModule>) - synchronized(runtime.hierarchyLock)
  4. the extended callback re-invokes singletonClass.include(<interfaceModule>): a no-op, but includeModule still calls invalidateCacheDescendants unconditionally, taking the same lock again
  5. addMethod("method_missing") - synchronized(hierarchyLock)
  6. addMethod("<sam-method>") - synchronized(hierarchyLock)

That is 4 acquisitions per task on a single monitor.
Under multi-threaded load the lack of caching becomes very noticeable and prevents proper concurrent execution:

  • per-call latency is dominated by singleton-class setup, even when the call site is otherwise trivial
  • multi-thread throughput is capped - adding more workers doesn't make things "faster"
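
To picture the locking pattern in the numbered steps above, here is a minimal sketch in plain Java. It is not JRuby code: `HierarchyLockSketch`, `locked` and `convert` are made-up names, and the empty lambdas only stand in for the real work done under `hierarchyLock`. It merely counts how often the single shared monitor is taken when several threads each convert many blocks.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the per-call setup described above; not JRuby code.
final class HierarchyLockSketch {
    private final Object hierarchyLock = new Object(); // one monitor shared by every thread
    final AtomicLong acquisitions = new AtomicLong();

    // Runs one step while holding the shared monitor and records the acquisition.
    private void locked(Runnable step) {
        synchronized (hierarchyLock) {
            acquisitions.incrementAndGet();
            step.run();
        }
    }

    // Mirrors the four locking steps of the list: include, repeated include, two addMethod calls.
    Runnable convert(Runnable block) {
        locked(() -> {}); // singletonClass.include(<interfaceModule>)
        locked(() -> {}); // extended callback: no-op include that still invalidates caches
        locked(() -> {}); // addMethod("method_missing")
        locked(() -> {}); // addMethod("<sam-method>")
        return block;
    }

    public static void main(String[] args) throws InterruptedException {
        HierarchyLockSketch runtime = new HierarchyLockSketch();
        int threads = 4, tasksPerThread = 1_000;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int t = 0; t < tasksPerThread; t++) runtime.convert(() -> {}).run();
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        // 4 threads * 1000 tasks * 4 acquisitions each
        System.out.println(runtime.acquisitions.get()); // prints 16000
    }
}
```

Every worker funnels through the same monitor four times per task, which is why adding workers in this pattern does not add throughput.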

The fix introduces a specialised, lock-free conversion path for blocks that originate at the Java-integration layer and are known to be used only for block-to-interface execution. Existing semantics for user-land RubyProcs are preserved.
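
As a rough illustration of the idea (per-interface setup done once and cached, with only a small wrapper allocated per block), here is a sketch under stated assumptions. It is not the PR's implementation: `java.lang.reflect.Proxy` stands in for the generated implementation class, `Function<Object[], Object>` stands in for a Ruby block, and `BlockToInterfaceSketch`, `convert` and `samLookups` are invented names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.lang.reflect.Proxy;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Illustrative only: Proxy stands in for a generated class, Function for a Ruby block.
final class BlockToInterfaceSketch {
    static final AtomicInteger samLookups = new AtomicInteger(); // counts per-interface setup work

    // Per-interface setup is computed on first use and cached by the JVM per Class;
    // later conversions read the cached value without any shared application-level lock.
    private static final ClassValue<Method> SAM = new ClassValue<Method>() {
        @Override
        protected Method computeValue(Class<?> iface) {
            samLookups.incrementAndGet();
            Method sam = null;
            for (Method m : iface.getMethods()) {
                if (!Modifier.isAbstract(m.getModifiers()) || isObjectMethod(m)) continue;
                if (sam != null) throw new IllegalArgumentException(iface + " has several abstract methods");
                sam = m;
            }
            if (sam == null) throw new IllegalArgumentException(iface + " has no abstract method");
            return sam;
        }
    };

    // Interfaces such as Comparator redeclare equals(Object); that does not count towards the SAM.
    private static boolean isObjectMethod(Method m) {
        try {
            Object.class.getMethod(m.getName(), m.getParameterTypes());
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    static <T> T convert(Class<T> iface, Function<Object[], Object> block) {
        Method sam = SAM.get(iface); // cached after the first conversion for this interface
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.equals(sam)) return block.apply(args == null ? new Object[0] : args);
            switch (method.getName()) { // minimal Object method support for the sketch
                case "toString": return iface.getSimpleName() + " block";
                case "hashCode": return System.identityHashCode(proxy);
                case "equals":   return proxy == args[0];
                default: throw new UnsupportedOperationException(method.toString());
            }
        };
        return iface.cast(Proxy.newProxyInstance(
            BlockToInterfaceSketch.class.getClassLoader(), new Class<?>[] { iface }, handler));
    }

    public static void main(String[] args) throws Exception {
        StringBuilder log = new StringBuilder();
        for (int i = 0; i < 3; i++) {
            final int n = i;
            Runnable task = convert(Runnable.class, a -> { log.append(n); return null; });
            task.run();
        }
        Callable<?> answer = convert(Callable.class, a -> 42);
        // Three Runnable conversions and one Callable conversion, but only two SAM lookups.
        System.out.println(log + " " + answer.call() + " " + samLookups.get()); // prints "012 42 2"
    }
}
```

`ClassValue` is used here because it ties the cached entry to the interface class itself; whether the PR caches this way is not stated in the description.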

Performance

Numbers for the included micro benchmark: https://gist.github.com/kares/c4616956570fd58515ad6f0ffd822f8f

The improvement seems to be on the order of 20-30x, and far beyond that with multi-threaded execution (no more contention between threads).

Comment thread test/jruby/test_instantiating_interfaces.rb
@headius (Member) left a comment

This is basically ok but there's a lot of duplicated code from elsewhere and we could consider using MethodHandle instead of reflection objects in some of these places. The logic seems sound; generate a class for Interface### to Proc dispatch and reuse it. I have some concerns about how these are being cached, for situations where many classes are encountered rarely or only once.

Overall I approve but we can chat through some improvements before merging.

I'd also like to see the benchmark code somewhere so we can continue to audit and profile those cases.

implClass = defineImplClass(loader, interfaceType, implClassName);
}

constructor = (Constructor<? extends BlockInterfaceTemplate>) implClass.getConstructor(RubyProc.class);
Member

Of course this stuff uses java.lang.reflect all over the place (both before and after this change) but we could be using MethodHandles for all of these and skip the overhead of a reflective invocation.

Member Author

Tried unreflecting the constructor into a handle; no difference in the benchmarks mentioned in the PR.

Member

The handle needs a static root of some kind: either a static final field (not possible here) or an interface implementation generated with LambdaMetafactory. Without one of those it is usually not much faster than reflection, which also uses unrooted handles internally.
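
For readers following along, the two kinds of "root" mentioned here can be sketched roughly as follows. This is an illustration with invented names (`HandleRootSketch`, `Impl`, `factoryFor`), not code from the PR; `Impl` stands in for a generated implementation class whose constructor takes the wrapped proc, and a `String` stands in for that proc. A handle obtained via `Lookup.unreflectConstructor` could be passed to `factoryFor` in the same way.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

final class HandleRootSketch {
    // Hypothetical stand-in for a generated implementation class with a one-argument constructor.
    public static final class Impl {
        final String block;
        public Impl(String block) { this.block = block; }
    }

    // Variant 1: the handle lives in a static final field, so it is a constant for the JIT.
    static final MethodHandle ROOTED_CTOR;
    static {
        try {
            ROOTED_CTOR = MethodHandles.lookup()
                .findConstructor(Impl.class, MethodType.methodType(void.class, String.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Variant 2: let LambdaMetafactory spin a Function implementation around the handle.
    @SuppressWarnings("unchecked")
    static Function<String, Impl> factoryFor(MethodHandle ctor) throws Throwable {
        CallSite site = LambdaMetafactory.metafactory(
            MethodHandles.lookup(),
            "apply",                                           // name of Function's abstract method
            MethodType.methodType(Function.class),             // call site shape: () -> Function
            MethodType.methodType(Object.class, Object.class), // erased signature of apply
            ctor,                                              // must be a direct method handle
            ctor.type());                                      // specialised signature: (String) -> Impl
        return (Function<String, Impl>) site.getTarget().invokeExact();
    }

    public static void main(String[] args) throws Throwable {
        Impl viaHandle = (Impl) ROOTED_CTOR.invokeExact("a");
        Function<String, Impl> factory = factoryFor(ROOTED_CTOR);
        Impl viaFactory = factory.apply("b");
        System.out.println(viaHandle.block + viaFactory.block); // prints "ab"
    }
}
```

The sketch only shows how each variant is wired up; it makes no claim about measured speed in the PR's benchmark.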

/**
* Loads {@code argIndex} from the Java arg slots and boxes it if primitive.
*/
private static void loadBoxedArg(GeneratorAdapter ga, int argIndex, Class<?> paramType) {
Member

These three utility methods already exist in various forms throughout JRuby. Look at RealClassGenerator for some examples. We don't need to reimplement primitive boxing and unboxing and class literal loading again.

@kares kares marked this pull request as draft May 6, 2026 15:04
@kares kares force-pushed the proc-to-iface-slow-10.0 branch from 128b86d to 7fc73e2 Compare May 7, 2026 09:53
@JIT
@SuppressWarnings("unused")
protected final IRubyObject __ruby_call(final Class<?> returnType) {
return block.call(runtime.getCurrentContext());
Member Author

While this seems a little abandoned (it could be part of the generated class), on occasion I wanted to be able to set a breakpoint around the "block-to-interface" execution in Java, and having the "template" superclass allows for that. No hard feelings if the preference is to simply get rid of this.
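
The arrangement described in this comment can be pictured with a small sketch. All names are hypothetical (`TemplateSketch`, `callBlock`, `RunnableImplSketch`), and a `Function<Object[], Object>` again stands in for the wrapped block; the point is only that the dispatch method is ordinary hand-written Java with source lines, while the per-interface subclass is reduced to forwarding.

```java
import java.util.function.Function;

// Hypothetical shape of a "template" superclass; not the PR's class.
abstract class TemplateSketch {
    protected final Function<Object[], Object> block; // stands in for the wrapped Ruby block

    protected TemplateSketch(Function<Object[], Object> block) {
        this.block = block;
    }

    // Hand-written Java with real source lines: a breakpoint set here is hit
    // for every interface call, whichever subclass dispatched it.
    protected final Object callBlock(Object... args) {
        return block.apply(args);
    }

    public static void main(String[] args) {
        Runnable task = new RunnableImplSketch(a -> {
            System.out.println("block called with " + a.length + " args");
            return null;
        });
        task.run(); // prints "block called with 0 args"
    }
}

// What a generated subclass for Runnable might reduce to: one forwarding method.
final class RunnableImplSketch extends TemplateSketch implements Runnable {
    RunnableImplSketch(Function<Object[], Object> block) {
        super(block);
    }

    @Override
    public void run() {
        callBlock();
    }
}
```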

Member

I get the desire and it's nice to be able to fall back on something debuggable. I wish there were a way to specify __FILE__ and __LINE__ in Java so you could provide the original generated code lines in the stack trace.

@kares kares force-pushed the proc-to-iface-slow-10.0 branch from 7fc73e2 to d1ff8e7 Compare May 7, 2026 10:35
@kares kares force-pushed the proc-to-iface-slow-10.0 branch from d1ff8e7 to bfb3c62 Compare May 7, 2026 11:00
@kares kares marked this pull request as ready for review May 8, 2026 13:40
Development

Successfully merging this pull request may close these issues.

excessive loading of same InterfaceImpl
Java interface using closure implementation is slower than mixin implementation

2 participants