🚨 Fix memory leaks caused by lru decorators in vision models #45922
Conversation
run-slow: beit, conditional_detr, d_fine, data2vec, deformable_detr, deimv2, detr, edgetam, edgetam_video, mask2former, maskformer, oneformer, pp_doclayout_v2, pp_doclayout_v3, rt_detr, rt_detr_v2, sam2, sam2_video, sam3, sam3_lite_text, sam3_tracker, sam3_tracker_video

This comment contains models: ["models/beit", "models/conditional_detr", "models/d_fine", "models/data2vec", "models/deformable_detr", "models/deimv2", "models/detr", "models/edgetam", "models/edgetam_video", "models/mask2former", "models/maskformer", "models/oneformer", "models/pp_doclayout_v2", "models/pp_doclayout_v3", "models/rt_detr", "models/rt_detr_v2", "models/sam2", "models/sam2_video", "models/sam3", "models/sam3_lite_text", "models/sam3_tracker", "models/sam3_tracker_video"]

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
CI Results · Commit Info
The test failure analysis could not be completed. Please check the workflow run for details.
vasqu
left a comment
Just did a sanity check and went through a few files. I think this is fine, but I would like your opinion on my comments; we currently have a few breaking spots, so we should be careful.
```python
@compile_compatible_method_lru_cache(maxsize=10)
def _cached_generate_relative_position_index(window_size: tuple[int, int]) -> torch.Tensor:
```
Just to be sure that I understood the root cause: the lru cache saved a reference to `self` because it's an instance method on the model?
Would it be possible to make these `staticmethod`s instead? That might be a more natural move than all the private module-level functions.
Anyway, we do need to add 🚨 because it changes the behavior slightly, and people might have relied on these internal functions.
> Just to be sure that I understood the root cause --> the lru cache saved a reference of self as it's a model's internal function?
Yes. I'm not sure about the exact inner workings of `lru_cache`, but as a general rule it's not good practice to apply it to instance methods: `self` becomes part of each cache key, so the cache keeps a strong reference to the instance. That was my mistake.
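A minimal sketch of the leak (class and method names are hypothetical, not the actual model code): applying `lru_cache` to an instance method keeps the instance alive for as long as the cache holds an entry for it.

```python
import gc
import weakref
from functools import lru_cache

class LeakyModule:
    """Hypothetical stand-in for a model layer."""

    # The cache key includes `self`, so the cache holds a strong
    # reference to every instance this method was called on.
    @lru_cache(maxsize=10)
    def _relative_position_index(self, window_size):
        return window_size * window_size

module = LeakyModule()
module._relative_position_index(7)

ref = weakref.ref(module)
del module
gc.collect()
print(ref() is not None)  # True: the lru_cache still references the instance

# Clearing the cache releases the reference, and the instance can be freed.
LeakyModule._relative_position_index.cache_clear()
gc.collect()
print(ref() is None)  # True: the instance has now been collected
```

This is exactly why deleting the model did not free its memory: the caches on its methods outlived the model itself.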
> Would it be possible to instead make these staticmethod? Might be a more natural move than all the private global fns
That's a good point! I'll try to refactor with `staticmethod` instead, and make sure `lru_cache` doesn't cause issues when applied to a static method (it should be fine).
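A sketch of the `staticmethod` variant (hypothetical names): with `@staticmethod` as the outer decorator, the descriptor strips `self` before the call reaches the cached function, so the cache keys only on the explicit arguments and never pins the instance.

```python
import gc
import weakref
from functools import lru_cache

class WindowModule:
    """Hypothetical stand-in for a model layer."""

    # `staticmethod` must be the outer decorator so that `self`
    # never reaches the cached function.
    @staticmethod
    @lru_cache(maxsize=10)
    def _relative_position_index(window_size):
        return window_size * window_size

    def forward(self, window_size):
        return self._relative_position_index(window_size)

module = WindowModule()
module.forward(7)

ref = weakref.ref(module)
del module
gc.collect()
print(ref() is None)  # True: only `window_size` is stored in the cache
```

The cache is still shared across instances (one cache per class), which is what you want for shape-dependent tensors like position indices.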
```python
@compile_compatible_method_lru_cache(maxsize=1)
def _cached_build_sine_position_embedding(*args, **kwargs) -> torch.Tensor:
```
Ah ok, here we keep BC. We really need to decide which approach is better, I guess, but we should be consistent either way.
```python
text_embeds = model.get_text_features(
    input_ids=inputs_dict["input_ids"], attention_mask=inputs_dict["attention_mask"], return_dict=True
).pooler_output
```
That is definitely breaking 👀 Any reason we could not keep the same behavior there?
I think it was broken before this fix, as the docs were already using the behavior of this PR; it's a niche feature, which may be why it was never flagged. I'll add a 🚨.
Thanks for the review @vasqu! Indeed it might be better to use `staticmethod`; I'll make the changes and ping you when it's ready :)
CI Results · Commit Info
Model CI Report: ❌ 29 new failed tests from this PR 😭
Waiting to see if all slow tests pass, but it should be good to go then @vasqu!

Sounds good, keeping an eye here to review later today 🫡
vasqu
left a comment
Just a few smaller comments 🤗 Can you also add the potentially breaking stuff to the PR description, e.g. the sam3 pooler behavior and the small argument renames (pos features)?
```python
@staticmethod
@compile_compatible_method_lru_cache(maxsize=32)
def _cached_build_2d_sinusoidal_position_embedding(*args, **kwargs) -> torch.Tensor:
    return build_2d_sinusoidal_position_embedding(*args, **kwargs)
```
This wrapper structure is a bit awkward; could we move things around a bit to make it use `build_2d_sinusoidal_position_embedding` directly, or is the inheritance too awkward?
Not sure we can do that and still have it standardized everywhere it's used, because some models call `build_2d_sinusoidal_position_embedding` at init and copy the result into their weights, so they don't need to `lru_cache` it at all.
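To illustrate the two usage patterns (a simplified sketch with hypothetical names, using plain lists instead of tensors): models that build the embedding once at init can call the module-level function directly, while models that rebuild it per forward pass go through the cached static wrapper.

```python
from functools import lru_cache

def build_position_embedding(size):
    # Stand-in for a module-level builder such as
    # build_2d_sinusoidal_position_embedding.
    return [i * 0.5 for i in range(size)]

class EmbeddingAtInit:
    # Built once and stored as a weight: no caching needed.
    def __init__(self, size):
        self.weight = build_position_embedding(size)

class EmbeddingPerForward:
    # Rebuilt on every forward for possibly varying sizes: the cached
    # static wrapper avoids recomputation without capturing `self`.
    @staticmethod
    @lru_cache(maxsize=32)
    def _cached_build(size):
        return build_position_embedding(size)

    def forward(self, size):
        return self._cached_build(size)

print(EmbeddingAtInit(3).weight)         # [0.0, 0.5, 1.0]
print(EmbeddingPerForward().forward(3))  # [0.0, 0.5, 1.0]
```

Keeping the builder as a module-level function serves both cases; the wrapper only exists where per-forward caching is actually wanted.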
```diff
  self.num_feature_levels = 3
- self.position_embedder = Mask2FormerSinePositionEmbedding(num_pos_feats=hidden_dim // 2, normalize=True)
+ self.position_embedder = Mask2FormerSinePositionEmbedding(
+     num_position_features=hidden_dim // 2, normalize=True
```
this is slightly breaking because of the rename
[For maintainers] Suggested jobs to run (before merge): run-slow: beit, conditional_detr, d_fine, dab_detr, data2vec, deformable_detr, deimv2, detr, edgetam, edgetam_video, got_ocr2, mask2former, maskformer, oneformer, pp_doclayout_v2, pp_doclayout_v3
lru_cache decorators were keeping references to entire models, causing memory not to be freed when deleting the models.
Fixes #45412
Breaking changes: