Skip to content

Fix #335 Improve overloaded varargs resolution#425

Open
jhubbardnso wants to merge 4 commits into
jython:masterfrom
jhubbardnso:vararg_method_resolution-v2
Open

Fix #335 Improve overloaded varargs resolution#425
jhubbardnso wants to merge 4 commits into
jython:masterfrom
jhubbardnso:vararg_method_resolution-v2

Conversation

@jhubbardnso
Copy link
Copy Markdown
Contributor

When multiple Java varargs overloads match a Python argument list, rank the candidates by conversion quality before falling back to signature ordering. This preserves the preference for non-varargs overloads added for #221 while avoiding last-match or first-match behavior among varargs methods.

Add regression coverage for mixed varargs overloads such as insert(String, String...) and insert(String, boolean...), plus boolean varargs cases that previously exposed unstable overload selection.

Generated with assistance from Codex.

When multiple Java varargs overloads match a Python argument list, rank
the candidates by conversion quality before falling back to signature
ordering. This preserves the preference for non-varargs overloads added
for jython#221 while avoiding last-match or first-match behavior among varargs
methods.

Add regression coverage for mixed varargs overloads such as
insert(String, String...) and insert(String, boolean...), plus boolean
varargs cases that previously exposed unstable overload selection.

Generated with assistance from Codex.
@jhubbardnso
Copy link
Copy Markdown
Contributor Author

Here is another stab at this issue. I had Codex aid me in generating these changes but I was very validated that it implemented the same structure that I had been trying to wrap my head around. It implemented the scoring mechanism that @Stewori suggested in #336.

Given two candidate vararg matches this will compute cost to convert the arguments python types to java types. A PyBoolean to a Java Boolean or boolean is 0 cost. In contrast a PyBoolean to a java.lang.Object is very expensive so it would be a last resort.

The keys here are betterVarargsMatchThan() which is the entry point to the is Foo better than Bar check and than conversionCost() which assigns a cost to the conversion from a python type to candndate Java type. The only thing I wasn't entirely sure about the checks for serializable. That is unrelated to any of my use cases and if no one can think of a reason why targets arguments that are serializable would impact the ranking I'm happy to rip that portion out. (IIRC there was recently a discussion around serializable recently and maybe codex latched on to that and hallucinated the need for this.


There seems to be more activity than normal around JDK25 cleanup. My project has already switched to JDK 25 and while things still work for us with our patched version of 2.7.2 upgrading to an official version 2.7.5 would be a be awesome for us.

@Stewori
Copy link
Copy Markdown
Member

Stewori commented May 12, 2026

Maybe serializable is included because it is the only interface implemented by PyObject. That means, all PyObject subclasses are assignable to Serializable. So, if Serializable is the target it should score somewhat better than just Object - it's more specific. However I don't see why e.g. integer against serializable should score better than integer against boolean. So, that might need some adjustment. Any idea about the performance impact of this PR? The new comparison is more expensive than before. Did you notice a significant difference when running regrtests with or without the PR?

Comment thread src/org/python/core/ReflectedArgs.java
Comment thread src/org/python/core/ReflectedArgs.java Outdated
|| target == Double.class || target == Number.class) {
return 2;
} else if (target == Serializable.class) {
return 10;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think letting serializable score better than a plain Object is justified as all PyObject subtypes implement this interface. So the fit is more specific than against Object. However, I doubt that it should score better than Boolean below. Maybe that's just my opinion. Thoughts?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that gets to the crux of my issue. The old approach was all to happy to convert everything far too many things an array of booleans. Here is an example of the old code misbehaving

# Script evaluated by Jython interpreter
from atst.cs.data import AttributeTable

table = AttributeTable()
table.insert('dbl', 1.0, 2.0, 3.14)
table.insert('str', 'foo', 'bar', 'waz')
table.insert('int', 1, 2, 3)
table.insert('bool', True, False)

table.show('')

Previously produced

ScriptReader Interpreter.dbl: true, true, true
ScriptReader Interpreter.str: foo, bar, waz
ScriptReader Interpreter.int: true, true, true
ScriptReader Interpreter.bool: true, false

After some back and forth with codex it was able to clarify that the issue is a bit of an edge case but given the class:

import java.io.Serializable;
import java.util.Arrays;

public class Example {
    public String f(Serializable... values) { 
        return "Serializable: " + values[0].getClass().getName(); 
    }

    public String f(boolean... values) { 
        return "boolean: " + Arrays.toString(values); 
    }
}

Invoking Example().f(1) would trigger the boolean path which is less faithful than the serializable path.

I think that given that the non-varargs path doesn't include the special handling of serializable (meaning foo(Serializable s) and foo(boolean b) will prefer the latter it would probably be best to drop the serializable test here. That keep the fixed arg and vararg behavior more consistent and not risk breaking things.

I will remove this Serializable check.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if you remove the serializable check, would f(1) in

    public String f(Serializable... values) { 
        return "Serializable: " + values[0].getClass().getName(); 
    }

    public String f(Object... values) { 
        return "boolean: " + Arrays.toString(values); 
    }

still prefer serializable over object? Maybe yes, because non-primitives are checked elsewhere. Is that so?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

› Back to the question about why Serializable was used as an explict test.  
  In the case of the portion removed from line 312 in the last commit; 
  If you remove the serializable check, would f(1) in the example still prefer
  serializable of object?  Maybe yes, because non-primitives are checked 
  elsewhere.  Is that so?


• Yes, for Serializable vs Object, f(1) would still prefer Serializable even without the 
  explicit Serializable cost.

  If the special case is removed, PyInteger -> Serializable falls through to:

  return 50 + precedence(target);

  For Serializable.class, precedence() returns 2000, so the cost is 2050.

  For Object.class, conversionCost() has an explicit fallback:

  target == Object.class -> 3000

  So among varargs overloads like:

  f(Serializable... s)
  f(Object... o)

  the integer call still ranks:

  Serializable... cost 2050
  Object...       cost 3000

  and Serializable... wins.

  The explicit Serializable check matters for comparisons against other legal 
  conversions with lower fallback cost, especially Boolean:

  Boolean      20
  Serializable 2050 without special case

  So without the check, f(1) would prefer boolean... over Serializable.... It would 
  not prefer Object... over Serializable....

  One nuance: the “non-primitives are checked elsewhere” logic in compare()/compareTo()
  is mostly the fixed-signature ordering path. The new varargs cost path does not use 
  assignability among arbitrary reference types except through getJavaProxy() and
  target.isInstance(pyArg). A plain PyInteger is not itself an instance of Serializable; it 
  only converts to a Java Integer through __tojava__, which is why the explicit 
  Serializable cost changes varargs ranking.

Copy link
Copy Markdown
Member

@Stewori Stewori May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, good to know. What do you think of scoring boolean much worse? Since in Python every object implements __nonzero__, it can be used as a boolean. So, it might be justified to score that somewhere closer to the magnitude of other ultra-generalists like Serializable and Object. Perhaps at around 1000 or 1500 or so. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1000 or 1500 isn't sufficient. We'd have to bump it to >2050 for an actual change of ordering. A value like 2500 would change the behavior if you had an overload like foo(boolean) and foo(BigInteger) and you tried to invoke with a PyLong. Should a PyLong map directly to a BigInteger? Maybe, but I don't think that is how the non-varargs resolver will behave and I think that consistency is important.

Copy link
Copy Markdown
Member

@Stewori Stewori May 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, I think PyLong should indeed preferably cast to BigInteger rather than boolean. However, boolean should be preferred over Serializable and Object. Consistency with non-varargs is a fair point though. Is the behavior consistent in the current form of this PR? For PyLong/BigInteger/boolean, it would be good to pinpoint the current behavior (this PR) in vararg vs non-vararg by an example code snippet, perhaps by a test. To avoid relying on guessing for the anticipated consistency.
I already wondered whether the scoring might not also be a better solution for non-vararg resolving, but such changes would probably be too breaking. Perhaps a thought for Jython 3.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some additional test cases that are just concerned with whether the single arg and var args resolutions match. Those tests exposed one case where previously the varargs and fixed arity behavior didn't match. The fixed arity resolution prefers Object over boolean when the argument is PyFloat. This was addressed by adding a very high cost to PyFloat --> boolean. I kind of think that the fixed-arity decision is wrong but that feels out of scope for these changes. If you'd like I can open a ticket for the fixed arg case.

Comment thread src/org/python/core/ReflectedArgs.java Outdated
if (target == String.class) {
return 0;
} else if (target == Serializable.class) {
return 10;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar critique as above. Here I'm more confident though. Taking a String for a Character should score way better than for a Serializable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the serializable check as well.

Remove instanceof Serializable checks as they create a difference in
behavior between the varargs and fixed args resolution.
@jhubbardnso
Copy link
Copy Markdown
Contributor Author

Any idea about the performance impact of this PR? The new comparison is more expensive than before. Did you notice a significant difference when running regrtests with or without the PR?

I have very little experience running the regrtests and can't really say what the performance impact is. I'm not sure how consistent things the are but the Ubuntu and Windows action runs for this PR took 11:13 and 18:14 respectively. The latest run for master took 11:18 and 17:19. It kind of feels like these changes are probably in the noise but the impact would obviously be felt more on code bases that heavily utilize varargs overloads. The resolution would be most expensive in cases where there are a lot of arguments that all need to be costed and there are multiple candidate overloads.

One good thing about the current implementation is that it does not compute the cost of the varargs conversions unless it needs to see if the cost of the current best function is better than the cost of an alternative function. The bad thing about the current implementation is that it does not cache the cost of the current preferred method. So, if you have a function with many overloads and many many arguments and the first checked function was the best option, it will still re-compute the cost for that first function for each subsequent comparison.

Use new isSequenceVarargs in existing ensureBoxedVarargs() to ensure a
single source of truth.

Move original comments regarding why a long list of lcasses is being
used instead of PySequence to the new method.
Add overload-resolution regression coverage comparing fixed-arity and
varargs dispatch for PyInteger, PyLong, and PyFloat against boolean,
BigInteger, BigDecimal, and Object overloads.

Keep varargs scoring aligned with existing fixed-arity behavior by
handling PyLong boolean conversion explicitly, making PyFloat boolean
conversion more expensive than Object, and ensuring Object keeps its
intended high fallback cost.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants