Improve namespace handling performance in core schema building#10425
Improve namespace handling performance in core schema building#10425MarkusSintonen wants to merge 4 commits intopydantic:mainfrom
Conversation
CodSpeed Performance ReportMerging #10425 will not alter performanceComparing Summary
|
sydney-runkle
left a comment
There was a problem hiding this comment.
Great work here - very excited about this change. Addresses a lot of my concerns voiced in #10074.
Will give this a more thorough pass this afternoon :).
| source_module = sys.modules.get(cls.__module__) | ||
| if source_module is not None: | ||
| types_namespace = {**source_module.__dict__, **(types_namespace or {})} | ||
| types_namespace = NsResolver(ImmutableNs(source_module.__dict__), types_namespace) |
There was a problem hiding this comment.
We should probably be careful with precedence here - in dict updates, the latter dicts override duplicate keys in previous ones. We should follow the same pattern in NsResolver.
There was a problem hiding this comment.
Added tests now for this:
- latter overrides duplicate keys in previous ones
- "push" behaviour
|
|
||
| @contextmanager | ||
| def push(self, for_type: type[Any]): | ||
| """Push namespace of for_type as lowest priority in the resolved namespaces""" |
There was a problem hiding this comment.
See my comment above - I think push should make the new take the highest priority so that this is intuitively like a revertable dict update
There was a problem hiding this comment.
The logic actually requires it to be "lowest priority". Some tests fail if it's other way around. Should we rename this into something like "prepend_ns_from(type)"?
There was a problem hiding this comment.
Which tests fail? I recall in the types_ns_stack context management that we give priority to namespaces in the way that you've replicated, but I'm not sure why we did that (and I don't think we should). Seems much more intuitive for later ns entries to take priority. Wondering what the use case is here.
There was a problem hiding this comment.
These fail if we instead do self._namespaces.append(type_ns) + self._namespaces.pop() (which is different from original as the priority is other way around):
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ File ┃ Function ┃ Function Line ┃ Error Line ┃ Error ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│ tests/test_dataclasses.py │ test_base_dataclasses_annotations_resolving_with_override[pydantic] │ 1675 │ 1714 │ ValidationError │
│ tests/test_dataclasses.py │ test_base_dataclasses_annotations_resolving_with_override[stdlib] │ 1675 │ 1714 │ ValidationError │
│ tests/test_main.py │ test_extra_validator_named │ 2943 │ 2954 │ ValidationError │
│ tests/test_types_typeddict.py │ test_recursive_generic_typeddict_in_function_1 │ 612 │ 632 │ AssertionError │
│ tests/test_types_typeddict.py │ test_recursive_generic_typeddict_in_function_2 │ 648 │ 663 │ Failed │
│ tests/test_types_typeddict.py │ test_recursive_generic_typeddict_in_function_3 │ 681 │ 699 │ AssertionError │
└─────────────────────────────────┴───────────────────────────────────────────────────────────────────────┴─────────────────┴──────────────┴───────────────────┘
There was a problem hiding this comment.
Seems much more intuitive for later ns entries to take priority. Wondering what the use case is here.
Yeah agreed, it seems pretty weird why it needs to be like this 😅
There was a problem hiding this comment.
These fail if we instead do self._namespaces.append(type_ns) + self._namespaces.pop() (which is different from original as the priority is other way around)
Hmm, interesting.
For the dataclass annotations tests, I think we can fix this by reversing the order in which we add the dc bases to the types namespace stack (it's sort of hacky already).
I'm not sure without doing some debugging for the other tests, but I'd certainly be interested in using self._namespaces.append(type_ns) and trying to fix the breaking tests with that logic in place.
f328d81 to
5240537
Compare
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return value | ||
|
|
||
| # Only do resolve_namespace on demand as it can be expensive with big namespaces | ||
| globals_dict = globalns.resolve_namespace().unwrap_namespace() if isinstance(globalns, NsResolver) else globalns |
There was a problem hiding this comment.
It might be possible to optimize this for the ForwardRefs case also. So we would only resolve names needed from the namespace chain as required by the value. Taking a slow path of fully merging the namespaces when that fails.
5240537 to
ad58364
Compare
|
Thanks for the effort put into this. I haven't gone through all the changes yet, but wanted to communicate on a couple subjects we had with the team: Skipping namespace fetchingSydney had the idea a couple weeks ago if it was possible to skip namespace fetching logic entirely, if no stringified annotations are being used. If this is possible, I would really prefer going this way as it would most probably require less changes, and would improve performance even more. Not long ago, I started exploring some solutions. How Python evaluates stringified annotationsThe (internal) For a given type, it iterates over the arguments (similar to your To evaluate stringified annotations, Forward = int
class Model(BaseModel):
field: "Forward"
Using a lazy namespaceThe idea was to use a lazy namespace implementation: class LazyDict(dict[_KT, _VT]):
def __init__(self, fetcher: Callable[[], dict[_KT, _VT]]) -> None:
self.fetcher = fetcher
@cached_property
def data(self) -> dict[_KT, _VT]:
return self.fetcher()
def __getitem__(self, key: _KT) -> _VT:
try:
return self.data[key]
except KeyError:
# `eval` can inject `__builtins__`:
return super().__getitem__(key)In our case, This way, namespace logic is only run if necessary, and this also has the benefit of not letting any opportunity to mutate these namespaces dicts. However, for performance and probably historical reasons, One idea we had yesterday was to not provide any value for the
Or, if we want to be more generic, we could use a While this can feel a bit unusual (and I still have to confirm that the behavior is equivalent), this is probably going to be forward compatible as we will hopefully get python/cpython#121389 for Python 3.14. Performance improvementsI did some tests locally on your branch. I first have a question regarding load times: on my end, it takes around 14s to load the Regarding memory consumption, skipping namespace copies only reduces the number of temporary allocations, but we still ends up at ~600MiB when the module is fully loaded. While temporary allocations doesn't matter, I believe most of the time save is due to these reduced memory allocations, which is still a good thing. Regarding the patch, I have mixed feelings:
Footnotes
|
I like this idea! Let me check it out. For locals we can indeed pass any |
Great! I am pretty confident the |
|
Thanks for laying out our most recent thoughts here. Looking forward to iterating with you folks to find the right solution here. |
9b1e7f0 to
3890a68
Compare
|
@Viicos I now pushed some WIP refactoring where globalns dict is not given at all and the names are resolved on demand from the chain of namespaces via locals arg. Its a bit more complicated than originally. Did not yet do other refactoring to just validate the idea. It almost works! But... There is 3 pretty complex tests that fail. (The one was my local issue) Tried to dig a bit deeper but the tests are complex. Should try to isolate. But it might be that the change is revealing some hidden issue in the "experimental" feature or in generics handling. Those ForwardRefs handling in generics are super complex so its not obvious at all where its actually breaking. |
Its MacBook Pro Nov 2023 - M3 Max |
|
|
No this was not the case. Found a fix but not sure why this worked :D Doesnt seem to cause any regressions. Now only the doc example experimental thing fails for some reason. |
bed6360 to
61ebd98
Compare
Refactored it now to drop the "extra layer" here. But, I would like to keep the namespace specific resolving in its own logic instead of using some generic class like |
Following up with my commentary here. The original types namespace push logic was the following: @contextmanager
def push(self, for_type: type[Any]):
types_namespace = _typing_extra.get_module_ns_of(for_type).copy()
types_namespace.update(self.tail or {})
self._types_namespace_stack.append(types_namespace)
try:
yield
finally:
self._types_namespace_stack.pop()This, imo, is greatly flawed. Not to mention, this structure is already incredibly inefficient (which we're tackling in this PR, so not worried about that). We're effectively analyzing a type in a new namespace (getting the ns associated with the type), but giving priority to the previous ns. This results in the following flawed behavior: # test.py
from pydantic import BaseModel
from test1 import MyDataclass
from pydantic._internal._core_utils import pretty_print_core_schema
X = str
class Model(BaseModel):
dc: MyDataclass
# test1.py
from dataclasses import dataclass
X = int
@dataclass
class MyDataclass:
x: 'X'The type of Happy to help investigate why some tests fail when we make this change. |
Yes, I think I understand what the problem is. It has been deprioritizing the pushed namespace because it's coming from the module level. So it's related to global namespace. If it's prioritizing it then the global namespace may incorrectly override a local namespace already added previously. That's why tests start to fail if its changed. So how I think this should be made more sensible is to have 2 namespace list in "NsResolver" to track globals and locals separate. So in this "push" we would append to global ns list. Then on resolving we just prioritize first locals and then globals. That would make things more clear hopefully as things are not so mixed up together. |
10bf41d to
5250c37
Compare
This was indeed the issue. So separating the global and local namespace handling inside the resolver fixes the issue! So there was a old bug in the code. Added tests also against the revealed issue. Now the resolver separately manages the global namespace context and the chain of local namespaces. I think this makes the logic pretty clear as it no longer mixes and mangles the namespaces together but instead they are regarded as separate. cc @sydney-runkle nice finding! The commit fixing this: ba78bd5 |
3ea00d4 to
ba7832d
Compare
906d7b2 to
d0bb38c
Compare
|
Hi @MarkusSintonen, thanks for getting back at this. Over the next three days I'll try to come up with a solution based on your work (there are still some issues with API design and some edge cases not covered by tests that are failing now -- that's why I want to take the time and be careful with the introduced changes). I'll also document everything because with all these changes, I expect issues to be raised once we release a new version. I should be able to do it until Sunday evening. |
Fantastic - I think it makes sense to keep these compartmentalized. Much more intuitive, as well. I'll do a thorough review once @Viicos comes back with findings re the lazy namespace evals. Great work here folks. Really looking forward to having this new approach. |
4ee9bf0 to
5e4be29
Compare
248b59e to
d5e7080
Compare
…ority tests for context pushing
6616800 to
ba78bd5
Compare
8a79d25 to
d1264dd
Compare
|
Ok so I had the opportunity to take a deeper look at how namespaces are used throughout the code base. First, let's get back at the issue with the namespace push logic, as described in #10425 (comment): While it is indeed flawed as shown in the given example, it was necessary as ~50 tests are currently failing on We may be wondering: if the merging operation isn't necessary, is it necessary to keep this types namespace stack approach? The answer is "it seems to depend". Let's say we are generating a schema for a named tuple: from typing import NamedTuple
from pydantic import TypeAdapter
class NT(NamedTuple):
f: 'Forward'
TypeAdapter(NT)To do so, we push the diff --git a/pydantic/_internal/_generate_schema.py b/pydantic/_internal/_generate_schema.py
index c161818e3..15404c5df 100644
--- a/pydantic/_internal/_generate_schema.py
+++ b/pydantic/_internal/_generate_schema.py
@@ -1547,8 +1547,7 @@ class GenerateSchema:
if origin is not None:
namedtuple_cls = origin
- with self._types_namespace_stack.push(namedtuple_cls):
- annotations: dict[str, Any] = get_cls_type_hints_lenient(namedtuple_cls, self._types_namespace)
+ annotations: dict[str, Any] = get_cls_type_hints_lenient(namedtuple_cls, get_module_ns_of(namedtuple_cls))but the thing is for each field of the named tuple, class TypesStack:
_types: list[type[Any]]
@contextmanager
def push(self, typ: type[Any]):
self._types.append(typ)
try:
yield
finally:
self._types.pop()
class GenerateSchema:
...
@property
def _types_namespace(self):
return get_module_ns_of(self._types_stack.tail)The reason I'm mentioning this is you moved the stack logic in the For reference, here is a list of test cases I'm considering fixing (or documenting as a limitation if not possible). I'm going to work on it this week so no need to duplicate our efforts. |
|
Great digging. The failing tests are super helpful here. Let me know if I've missed anything:
Excited to make some more progress on this front in the coming weeks. Thanks folks! |
|
quick update on the work we've been doing. While I first thought we could be smart by only passing a local namespace to the eval function, it turns out it wasn't playing well with built-ins (you can try running the So I decided to try implementing things by always coupling the locals and globals as a named tuple, so that it can be easily unpacked when calling eval functions. In most cases, the globals will always be the module's But most importantly, we decided to completely specify how we handle namespaces for type evaluation, in particular when doing rebuilds. The document (which will be included in docs at some point) can be found here. My WIP implementation can be found on this branch: |
|
Going to close this in favor of #10530. Thanks so much @MarkusSintonen for kickstarting this process! |
Change Summary
Improves performance of namespace handling in core schema building. This avoids repeated namespace copying which happened massively previously. Eg every layer in model schema handling copied the namespace dicts again and again. This PR reduces amount of namespace copying by deferring the copying to happen only when needed when ForwardRefs are evaluated to actual types. Namespaces are now wrapped into immutable helpers to avoid copying needs from possible mutations.
Used big model file to run benchmarks https://gist.github.com/Viicos/bd9b3b0e3fbee62838626b07b311f3af
PR reduces the load time and memory usage during schema file loading:
Old
Load time: 4.32s
Total memory allocated: 2.925GB (memray)
New
Load time: 3.51s
Total memory allocated: 1.541GB (memray)
Related issue number
Partially related to #6768 and #10297
Checklist