SONARPY-870 Use serialized version of TypeShed core modules (builtins and its dependencies)#940
Conversation
f7df02a to
3eb4633
Compare
1cd43be to
9738c85
Compare
… and its dependencies)
9738c85 to
a21dfde
Compare
guillaume-dequenne
left a comment
There was a problem hiding this comment.
LGTM overall.
Left a few questions/suggestions here and there.
| return null; | ||
| } | ||
|
|
||
| public Set<String> validFor() { |
There was a problem hiding this comment.
I've been using this name in the context of serialization of merged symbols but I realize it might not be explicit once the context is missing.
Do you think we should try to come up with a better name? (validForPythonVersions or something like that...)
| 340, | ||
| 341, | ||
| 342, | ||
| 343, | ||
| 344, |
There was a problem hiding this comment.
I think these issues are FPs. The LHS operator is indeed is supposed to be bytes and thus unsuitable for identity comparisons, but I think the rule should make exceptions for is not None cases.
I think it's not that urgent to fix because this is a problem only when there is a TP reported by S5727, but it's a bit noisy. (and if we ever have FPs on S5727, it would be extra noisy). Could you create a ticket for that? (and if we have time during the sprint I don't think it would take too much time to address).
| * This method sort ambiguous symbol by python version and returns the one which is valid for | ||
| * the most recent Python version. | ||
| */ | ||
| private static Symbol disambiguateWithLatestPythonSymbol(Set<Symbol> alternatives) { |
There was a problem hiding this comment.
nitpick: I think this could be done more simply in a single for loop where you only keep the symbol corresponding to the max validFor value encountered.
It's not like it will have any real performance impact so feel free to keep things this way if you think it's more readable.
| Arrays.asList(INT, FLOAT, COMPLEX, STR, BuiltinTypes.SET, DICT, LIST, TUPLE, NONE_TYPE, BOOL, "type", "super", "frozenset", "memoryview")); | ||
|
|
||
| static { | ||
| BUILTINS_TO_DISAMBIGUATE.addAll(BuiltinSymbols.EXCEPTIONS); |
There was a problem hiding this comment.
I'm surprised to see we still rely on this BuiltinSymbols class for quite some things.
It's out of the scope of this PR but do you think we should consider getting rid of it and possibly generate it from TypeShed (through separate serialization of a list of all encountered builtin names for example) ?
We're not really maintaining it so it might be out of sync with latest Python versions (not sure many builtins were added, if any, though - but still).
If you agree, maybe we can create some ticket in our backlog to address this at some point?
There was a problem hiding this comment.
Yes I think it makes sense to have a ticket to remove this class
| Map<String, Symbol> builtinSymbols = builtinSymbols(); | ||
| Symbol builtinSymbol = builtinSymbols.get(fullyQualifiedName); | ||
| if (builtinSymbol != null) { | ||
| return builtinSymbol; | ||
| } | ||
| String[] fqnSplittedByDot = fullyQualifiedName.split("\\."); | ||
| String localName = fqnSplittedByDot[fqnSplittedByDot.length - 1]; | ||
| if (fqnSplittedByDot.length == 2 && fqnSplittedByDot[0].equals("builtins") && builtins.containsKey(localName)) { | ||
| return builtins.get(localName); | ||
| if (fqnSplittedByDot.length == 2 && fqnSplittedByDot[0].equals(BUILTINS_FQN) && builtinSymbols.containsKey(localName)) { | ||
| return builtinSymbols.get(localName); | ||
| } |
There was a problem hiding this comment.
If I understand correctly, those two if are meant to try and retrieve the builtin symbol matching the given FQN.
Perhaps the two conditions can be simplified into one by calling normalizedFQN before. Wdyt?
|
|
||
| @Override | ||
| public Set<String> validFor() { | ||
| return alternatives().stream().flatMap(symbol -> ((SymbolImpl) symbol).validFor().stream()).collect(Collectors.toSet()); |
There was a problem hiding this comment.
With the current implementation, I think it's hard to understand whether an ambiguous symbol is ambiguous due to having different definitions depending on the Python version or if it's an overloaded function/method.
In the latter case, I think it would make sense to actually store validFor as a field for the whole ambiguous symbol. Maybe we could store an isOverloaded field as well instead. I don't think it's really needed, though.
There was a problem hiding this comment.
As you said, I don't think we really need to know whether an ambiguous symbol is due to multiple Python version or not. Do you have some usecase in mind?
| Symbol memoryview = TypeShed.builtinSymbols().get("memoryview"); | ||
| InferredType memoryView = InferredTypes.runtimeType(memoryview); |
There was a problem hiding this comment.
I don't understand this change. Above, we keep using runtimeBuiltinType for unicode and it seems to be doing the same thing. I don't have an opinion on which approach to keep but why have two?
There was a problem hiding this comment.
Indeed, I forgot to remove this change!
| this.supportsGenerics = supportsGenerics; | ||
| } | ||
|
|
||
| public boolean hasResolvedSuperClasses() { |
There was a problem hiding this comment.
Intuitively from its name, this method would return !hasUnresolvedTypeHierarchy in my mind.
I'm not sure I understand it very well because to me it might still return true if there are no super classes at all or if they are unresolved. Am I missing something?
There was a problem hiding this comment.
This method is used to know whether super classes have been already computed of if the symbol is still "lazy".
superClassesFqns.isEmpty() condition is used to know wheter the symbol is from typeshed or not. (i.e. if superClassesFqns is empty, we don't care too much if the symbol is lazy or not)
Maybe hasComputedSuperClasses would be clearer? or do you have better suggestions?
| } else if (superClass.kind() == Kind.CLASS) { | ||
| copiedClassSymbol.superClasses.add(((ClassSymbolImpl) superClass).copyWithoutUsages()); | ||
| } else if (superClass.is(Kind.AMBIGUOUS)) { | ||
| copiedClassSymbol.superClasses.add(((AmbiguousSymbolImpl) superClass).copyWithoutUsages()); |
There was a problem hiding this comment.
Actually why are we keeping two branches here? I know this was done this way before but we could merge them and cast to SymbolImpl instead I think?
| this(functionSymbolProto, insideClass, functionSymbolProto.getValidForList()); | ||
| } | ||
|
|
||
| public FunctionSymbolImpl(SymbolsProtos.FunctionSymbol functionSymbolProto, boolean insideClass, List<String> validFor) { |
There was a problem hiding this comment.
It's a bit weird that we need this validFor parameter, to me the SymbolsProtos.FunctionSymbol should probably contain it, even in case of overloaded method/function.
It might make the protobuf serialization a bit heavier for no gain though, so I'm not entirely sure it's worth doing. Wdyt?
There was a problem hiding this comment.
I would prefer it to be done in the serialization too.
I think for the time being we can keep it in the deserialization, and maybe change this in another PR
|
SonarQube Quality Gate |
… and its dependencies) (#940)








No description provided.