Fix #335 Improve overloaded varargs resolution by jhubbardnso · Pull Request #425 · jython/jython

jhubbardnso · 2026-05-12T19:05:10Z

When multiple Java varargs overloads match a Python argument list, rank the candidates by conversion quality before falling back to signature ordering. This preserves the preference for non-varargs overloads added for #221 while avoiding last-match or first-match behavior among varargs methods.

Add regression coverage for mixed varargs overloads such as insert(String, String...) and insert(String, boolean...), plus boolean varargs cases that previously exposed unstable overload selection.

Generated with assistance from Codex.

When multiple Java varargs overloads match a Python argument list, rank the candidates by conversion quality before falling back to signature ordering. This preserves the preference for non-varargs overloads added for jython#221 while avoiding last-match or first-match behavior among varargs methods. Add regression coverage for mixed varargs overloads such as insert(String, String...) and insert(String, boolean...), plus boolean varargs cases that previously exposed unstable overload selection. Generated with assistance from Codex.

jhubbardnso · 2026-05-12T19:18:46Z

Here is another stab at this issue. I had Codex aid me in generating these changes but I was very validated that it implemented the same structure that I had been trying to wrap my head around. It implemented the scoring mechanism that @Stewori suggested in #336.

Given two candidate vararg matches this will compute cost to convert the arguments python types to java types. A PyBoolean to a Java Boolean or boolean is 0 cost. In contrast a PyBoolean to a java.lang.Object is very expensive so it would be a last resort.

The keys here are betterVarargsMatchThan() which is the entry point to the is Foo better than Bar check and than conversionCost() which assigns a cost to the conversion from a python type to candndate Java type. The only thing I wasn't entirely sure about the checks for serializable. That is unrelated to any of my use cases and if no one can think of a reason why targets arguments that are serializable would impact the ranking I'm happy to rip that portion out. (IIRC there was recently a discussion around serializable recently and maybe codex latched on to that and hallucinated the need for this.

There seems to be more activity than normal around JDK25 cleanup. My project has already switched to JDK 25 and while things still work for us with our patched version of 2.7.2 upgrading to an official version 2.7.5 would be a be awesome for us.

Stewori · 2026-05-12T20:48:58Z

Maybe serializable is included because it is the only interface implemented by PyObject. That means, all PyObject subclasses are assignable to Serializable. So, if Serializable is the target it should score somewhat better than just Object - it's more specific. However I don't see why e.g. integer against serializable should score better than integer against boolean. So, that might need some adjustment. Any idea about the performance impact of this PR? The new comparison is more expensive than before. Did you notice a significant difference when running regrtests with or without the PR?

Stewori · 2026-05-12T20:53:59Z

+                    || target == Double.class || target == Number.class) {
+                return 2;
+            } else if (target == Serializable.class) {
+                return 10;


I think letting serializable score better than a plain Object is justified as all PyObject subtypes implement this interface. So the fit is more specific than against Object. However, I doubt that it should score better than Boolean below. Maybe that's just my opinion. Thoughts?

I think that gets to the crux of my issue. The old approach was all to happy to convert everything far too many things an array of booleans. Here is an example of the old code misbehaving

# Script evaluated by Jython interpreter from atst.cs.data import AttributeTable table = AttributeTable() table.insert('dbl', 1.0, 2.0, 3.14) table.insert('str', 'foo', 'bar', 'waz') table.insert('int', 1, 2, 3) table.insert('bool', True, False) table.show('')

Previously produced

ScriptReader Interpreter.dbl: true, true, true ScriptReader Interpreter.str: foo, bar, waz ScriptReader Interpreter.int: true, true, true ScriptReader Interpreter.bool: true, false

After some back and forth with codex it was able to clarify that the issue is a bit of an edge case but given the class:

import java.io.Serializable; import java.util.Arrays; public class Example { public String f(Serializable... values) { return "Serializable: " + values[0].getClass().getName(); } public String f(boolean... values) { return "boolean: " + Arrays.toString(values); } }

Invoking Example().f(1) would trigger the boolean path which is less faithful than the serializable path.

I think that given that the non-varargs path doesn't include the special handling of serializable (meaning foo(Serializable s) and foo(boolean b) will prefer the latter it would probably be best to drop the serializable test here. That keep the fixed arg and vararg behavior more consistent and not risk breaking things.

I will remove this Serializable check.

But if you remove the serializable check, would f(1) in

public String f(Serializable... values) { return "Serializable: " + values[0].getClass().getName(); } public String f(Object... values) { return "boolean: " + Arrays.toString(values); }

still prefer serializable over object? Maybe yes, because non-primitives are checked elsewhere. Is that so?

› Back to the question about why Serializable was used as an explict test. In the case of the portion removed from line 312 in the last commit; If you remove the serializable check, would f(1) in the example still prefer serializable of object? Maybe yes, because non-primitives are checked elsewhere. Is that so? • Yes, for Serializable vs Object, f(1) would still prefer Serializable even without the explicit Serializable cost. If the special case is removed, PyInteger -> Serializable falls through to: return 50 + precedence(target); For Serializable.class, precedence() returns 2000, so the cost is 2050. For Object.class, conversionCost() has an explicit fallback: target == Object.class -> 3000 So among varargs overloads like: f(Serializable... s) f(Object... o) the integer call still ranks: Serializable... cost 2050 Object... cost 3000 and Serializable... wins. The explicit Serializable check matters for comparisons against other legal conversions with lower fallback cost, especially Boolean: Boolean 20 Serializable 2050 without special case So without the check, f(1) would prefer boolean... over Serializable.... It would not prefer Object... over Serializable.... One nuance: the “non-primitives are checked elsewhere” logic in compare()/compareTo() is mostly the fixed-signature ordering path. The new varargs cost path does not use assignability among arbitrary reference types except through getJavaProxy() and target.isInstance(pyArg). A plain PyInteger is not itself an instance of Serializable; it only converts to a Java Integer through __tojava__, which is why the explicit Serializable cost changes varargs ranking.

Alright, good to know. What do you think of scoring boolean much worse? Since in Python every object implements __nonzero__, it can be used as a boolean. So, it might be justified to score that somewhere closer to the magnitude of other ultra-generalists like Serializable and Object. Perhaps at around 1000 or 1500 or so. What do you think?

1000 or 1500 isn't sufficient. We'd have to bump it to >2050 for an actual change of ordering. A value like 2500 would change the behavior if you had an overload like foo(boolean) and foo(BigInteger) and you tried to invoke with a PyLong. Should a PyLong map directly to a BigInteger? Maybe, but I don't think that is how the non-varargs resolver will behave and I think that consistency is important.

In principle, I think PyLong should indeed preferably cast to BigInteger rather than boolean. However, boolean should be preferred over Serializable and Object. Consistency with non-varargs is a fair point though. Is the behavior consistent in the current form of this PR? For PyLong/BigInteger/boolean, it would be good to pinpoint the current behavior (this PR) in vararg vs non-vararg by an example code snippet, perhaps by a test. To avoid relying on guessing for the anticipated consistency.
I already wondered whether the scoring might not also be a better solution for non-vararg resolving, but such changes would probably be too breaking. Perhaps a thought for Jython 3.

I've added some additional test cases that are just concerned with whether the single arg and var args resolutions match. Those tests exposed one case where previously the varargs and fixed arity behavior didn't match. The fixed arity resolution prefers Object over boolean when the argument is PyFloat. This was addressed by adding a very high cost to PyFloat --> boolean. I kind of think that the fixed-arity decision is wrong but that feels out of scope for these changes. If you'd like I can open a ticket for the fixed arg case.

Stewori · 2026-05-12T20:55:37Z

+            if (target == String.class) {
+                return 0;
+            } else if (target == Serializable.class) {
+                return 10;


Similar critique as above. Here I'm more confident though. Taking a String for a Character should score way better than for a Serializable.

I will remove the serializable check as well.

Remove instanceof Serializable checks as they create a difference in behavior between the varargs and fixed args resolution.

jhubbardnso · 2026-05-12T22:07:49Z

Any idea about the performance impact of this PR? The new comparison is more expensive than before. Did you notice a significant difference when running regrtests with or without the PR?

I have very little experience running the regrtests and can't really say what the performance impact is. I'm not sure how consistent things the are but the Ubuntu and Windows action runs for this PR took 11:13 and 18:14 respectively. The latest run for master took 11:18 and 17:19. It kind of feels like these changes are probably in the noise but the impact would obviously be felt more on code bases that heavily utilize varargs overloads. The resolution would be most expensive in cases where there are a lot of arguments that all need to be costed and there are multiple candidate overloads.

One good thing about the current implementation is that it does not compute the cost of the varargs conversions unless it needs to see if the cost of the current best function is better than the cost of an alternative function. The bad thing about the current implementation is that it does not cache the cost of the current preferred method. So, if you have a function with many overloads and many many arguments and the first checked function was the best option, it will still re-compute the cost for that first function for each subsequent comparison.

Use new isSequenceVarargs in existing ensureBoxedVarargs() to ensure a single source of truth. Move original comments regarding why a long list of lcasses is being used instead of PySequence to the new method.

Add overload-resolution regression coverage comparing fixed-arity and varargs dispatch for PyInteger, PyLong, and PyFloat against boolean, BigInteger, BigDecimal, and Object overloads. Keep varargs scoring aligned with existing fixed-arity behavior by handling PyLong boolean conversion explicitly, making PyFloat boolean conversion more expensive than Object, and ensuring Object keeps its intended high fallback cost.

Stewori reviewed May 12, 2026

View reviewed changes

Comment thread src/org/python/core/ReflectedArgs.java

Stewori reviewed May 12, 2026

View reviewed changes

Implement PR feedback

27ed9ca

Remove instanceof Serializable checks as they create a difference in behavior between the varargs and fixed args resolution.

jhubbardnso added 2 commits May 13, 2026 12:56

PR Feedback

b0395fd

Use new isSequenceVarargs in existing ensureBoxedVarargs() to ensure a single source of truth. Move original comments regarding why a long list of lcasses is being used instead of PySequence to the new method.

Conversation

jhubbardnso commented May 12, 2026

Uh oh!

jhubbardnso commented May 12, 2026

Uh oh!

Stewori commented May 12, 2026

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Stewori May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Stewori May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jhubbardnso commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Stewori May 13, 2026 •

edited

Loading

Stewori May 17, 2026 •

edited

Loading