Skip to content

SONARPY-870 Use serialized version of TypeShed core modules (builtins and its dependencies)#940

Merged
guillaume-dequenne merged 4 commits into
mmf-2435from
SONARPY-870
Jul 21, 2021
Merged

SONARPY-870 Use serialized version of TypeShed core modules (builtins and its dependencies)#940
guillaume-dequenne merged 4 commits into
mmf-2435from
SONARPY-870

Conversation

@andrea-guarino-sonarsource

Copy link
Copy Markdown
Contributor

No description provided.

@andrea-guarino-sonarsource andrea-guarino-sonarsource force-pushed the SONARPY-870 branch 3 times, most recently from f7df02a to 3eb4633 Compare July 15, 2021 22:29
@andrea-guarino-sonarsource andrea-guarino-sonarsource force-pushed the SONARPY-870 branch 2 times, most recently from 1cd43be to 9738c85 Compare July 19, 2021 10:30
@andrea-guarino-sonarsource andrea-guarino-sonarsource marked this pull request as ready for review July 19, 2021 12:41
@andrea-guarino-sonarsource andrea-guarino-sonarsource changed the title Use serialized version of TypeShed core modules (builtins and its dependencies) SONARPY-870 Use serialized version of TypeShed core modules (builtins and its dependencies) Jul 19, 2021

@guillaume-dequenne guillaume-dequenne left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall.
Left a few questions/suggestions here and there.

return null;
}

public Set<String> validFor() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been using this name in the context of serialization of merged symbols but I realize it might not be explicit once the context is missing.
Do you think we should try to come up with a better name? (validForPythonVersions or something like that...)

Comment on lines +7 to +11
340,
341,
342,
343,
344,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these issues are FPs. The LHS operator is indeed is supposed to be bytes and thus unsuitable for identity comparisons, but I think the rule should make exceptions for is not None cases.

I think it's not that urgent to fix because this is a problem only when there is a TP reported by S5727, but it's a bit noisy. (and if we ever have FPs on S5727, it would be extra noisy). Could you create a ticket for that? (and if we have time during the sprint I don't think it would take too much time to address).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticket created

* This method sort ambiguous symbol by python version and returns the one which is valid for
* the most recent Python version.
*/
private static Symbol disambiguateWithLatestPythonSymbol(Set<Symbol> alternatives) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I think this could be done more simply in a single for loop where you only keep the symbol corresponding to the max validFor value encountered.
It's not like it will have any real performance impact so feel free to keep things this way if you think it's more readable.

Arrays.asList(INT, FLOAT, COMPLEX, STR, BuiltinTypes.SET, DICT, LIST, TUPLE, NONE_TYPE, BOOL, "type", "super", "frozenset", "memoryview"));

static {
BUILTINS_TO_DISAMBIGUATE.addAll(BuiltinSymbols.EXCEPTIONS);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised to see we still rely on this BuiltinSymbols class for quite some things.

It's out of the scope of this PR but do you think we should consider getting rid of it and possibly generate it from TypeShed (through separate serialization of a list of all encountered builtin names for example) ?

We're not really maintaining it so it might be out of sync with latest Python versions (not sure many builtins were added, if any, though - but still).

If you agree, maybe we can create some ticket in our backlog to address this at some point?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think it makes sense to have a ticket to remove this class

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticket created.

Comment on lines 429 to 438
Map<String, Symbol> builtinSymbols = builtinSymbols();
Symbol builtinSymbol = builtinSymbols.get(fullyQualifiedName);
if (builtinSymbol != null) {
return builtinSymbol;
}
String[] fqnSplittedByDot = fullyQualifiedName.split("\\.");
String localName = fqnSplittedByDot[fqnSplittedByDot.length - 1];
if (fqnSplittedByDot.length == 2 && fqnSplittedByDot[0].equals("builtins") && builtins.containsKey(localName)) {
return builtins.get(localName);
if (fqnSplittedByDot.length == 2 && fqnSplittedByDot[0].equals(BUILTINS_FQN) && builtinSymbols.containsKey(localName)) {
return builtinSymbols.get(localName);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, those two if are meant to try and retrieve the builtin symbol matching the given FQN.
Perhaps the two conditions can be simplified into one by calling normalizedFQN before. Wdyt?


@Override
public Set<String> validFor() {
return alternatives().stream().flatMap(symbol -> ((SymbolImpl) symbol).validFor().stream()).collect(Collectors.toSet());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the current implementation, I think it's hard to understand whether an ambiguous symbol is ambiguous due to having different definitions depending on the Python version or if it's an overloaded function/method.

In the latter case, I think it would make sense to actually store validFor as a field for the whole ambiguous symbol. Maybe we could store an isOverloaded field as well instead. I don't think it's really needed, though.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you said, I don't think we really need to know whether an ambiguous symbol is due to multiple Python version or not. Do you have some usecase in mind?

Comment on lines +264 to +265
Symbol memoryview = TypeShed.builtinSymbols().get("memoryview");
InferredType memoryView = InferredTypes.runtimeType(memoryview);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change. Above, we keep using runtimeBuiltinType for unicode and it seems to be doing the same thing. I don't have an opinion on which approach to keep but why have two?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I forgot to remove this change!

this.supportsGenerics = supportsGenerics;
}

public boolean hasResolvedSuperClasses() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intuitively from its name, this method would return !hasUnresolvedTypeHierarchy in my mind.

I'm not sure I understand it very well because to me it might still return true if there are no super classes at all or if they are unresolved. Am I missing something?

@andrea-guarino-sonarsource andrea-guarino-sonarsource Jul 20, 2021

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is used to know whether super classes have been already computed of if the symbol is still "lazy".
superClassesFqns.isEmpty() condition is used to know wheter the symbol is from typeshed or not. (i.e. if superClassesFqns is empty, we don't care too much if the symbol is lazy or not)
Maybe hasComputedSuperClasses would be clearer? or do you have better suggestions?

Comment on lines +146 to +149
} else if (superClass.kind() == Kind.CLASS) {
copiedClassSymbol.superClasses.add(((ClassSymbolImpl) superClass).copyWithoutUsages());
} else if (superClass.is(Kind.AMBIGUOUS)) {
copiedClassSymbol.superClasses.add(((AmbiguousSymbolImpl) superClass).copyWithoutUsages());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually why are we keeping two branches here? I know this was done this way before but we could merge them and cast to SymbolImpl instead I think?

this(functionSymbolProto, insideClass, functionSymbolProto.getValidForList());
}

public FunctionSymbolImpl(SymbolsProtos.FunctionSymbol functionSymbolProto, boolean insideClass, List<String> validFor) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit weird that we need this validFor parameter, to me the SymbolsProtos.FunctionSymbol should probably contain it, even in case of overloaded method/function.
It might make the protobuf serialization a bit heavier for no gain though, so I'm not entirely sure it's worth doing. Wdyt?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer it to be done in the serialization too.
I think for the time being we can keep it in the deserialization, and maybe change this in another PR

@sonarsource-next

Copy link
Copy Markdown

@guillaume-dequenne guillaume-dequenne left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@guillaume-dequenne guillaume-dequenne merged commit a3003fb into mmf-2435 Jul 21, 2021
@guillaume-dequenne guillaume-dequenne deleted the SONARPY-870 branch July 21, 2021 12:29
andrea-guarino-sonarsource added a commit that referenced this pull request Jul 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants