adds support for torch.Tensor input in inpaint pipeline#1128
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. |
| import torch | ||
|
|
||
| import PIL | ||
| from torchvision.transforms.functional import to_pil_image |
There was a problem hiding this comment.
Sorry we cannot use this here -> we don't have a dependency on torchvision . Could you instead use the pipe.numpy_to_pil function?
There was a problem hiding this comment.
Ah right, I thought it would be OK as it is listed as a dependency here (though maybe I have misunderstood this file. I will change this.
There was a problem hiding this comment.
There was a problem hiding this comment.
Maybe we could instead make prepare_mask_and_masked_image a method of the pipeline and instead use self.numpy_to_pil?
There was a problem hiding this comment.
I think it being in or out the pipeline would work, as numpy to pil is a static method so can be accessed without actually passing a reference to the pipe.
I prefer it as a method of the pipe I'll try and get something to you tomorrow ~
There was a problem hiding this comment.
We still cannot have a dependency on torchvision though ;-)
There was a problem hiding this comment.
I removed the torchvision dependency.
I noticed also the docstrings make reference to accepting batched inputs. So I'll add that too soon.
edit: I decided against using numpy_to_pil after experimenting with that approach.
There was a problem hiding this comment.
Actually, not sure if adding batching is easily doable here. Might be worth having a different issue for that. Can you take a look and see what you think?
I'll fix the conflicts for what I have so far.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. |
pcuenca
left a comment
There was a problem hiding this comment.
The approach looks fine, but I feel we need some clarification on the expected input shapes for the image and the mask - the docstrings mention "batches" and shapes like (B, H, W), but in the code we are assuming a single image and a single mask.
One option is to accept batches too. In this case, the batch dimension of both image and mask must match the cardinality of the prompts list. This would be useful for multiple in-painting tasks (for the same or different prompts), but it'd add some complexity.
If we keep the current behaviour (single image and mask, but optionally several prompts or predictions), then we need to clarify that in the docstrings :)
Thanks a lot!
| repainted, while black pixels will be preserved. If `mask_image` is a PIL image, it will be converted | ||
| to a single channel (luminance) before use. If it's a tensor, it should contain one color channel (L) | ||
| instead of 3, so the expected shape would be `(B, H, W, 1)`. | ||
| to a single channel (luminance) before use. If it's a tensor, it should be of shape `(B, H, W)`. |
There was a problem hiding this comment.
I'm not sure about this. The code in _prepare_mask_and_masked_image unconditionally prepends a batch dimension to the image. We also need to ensure that the mask has a single channel, so the shape in my opinion would have to be (H, W, 1) and then adapt the permutes accordingly.
We also need to document the expected shape of the image itself: (H, W, 3), I guess?
There was a problem hiding this comment.
I was confused too by the docstrings. Before I even began work it said it supported batches and tensor inputs, but I couldn't get either to work.
I disagree about the mask input, I feel it would be cleaner to implement with it being shape (H, W). But if there is some convention I am unaware of then I can change it.
I think for now I can update the doc strings for the single image case, then create a separate issue for multi batch. Would you agree? I have a local fork that adds batching to _prepare_mask_and_masked_image, but it fails further on in the code. It seems more involved and probably needs a separate issue.
|
@patrickvonplaten just noticed #1003 also handles this issue, what is the status on this? |
|
Yes totally fine! #1003 did what this does and then some, so it makes sense ~ |
I took the inpaint pipeline example from the README and used with a tensor input using the following code:
but this fails with an error when calling
image = np.array(image.convert("RGB")).I don't think this is expected as the docstrings make reference to a tensor input.
This PR changes it to check if the input is a
torch.Tensorand if it is, converts it to aPIL.Image.Image.