adds support for `torch.Tensor` input in inpaint pipeline by vvvm23 · Pull Request #1128 · huggingface/diffusers

vvvm23 · 2022-11-03T22:01:28Z

I took the inpaint pipeline example from the README and used it with a tensor input, using the following code:

import PIL
import requests
import torch
from io import BytesIO

from diffusers import StableDiffusionInpaintPipeline
from torchvision.transforms.functional import to_tensor # <<<<<< NEW

def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = download_image(img_url).resize((512, 512))
mask_image = download_image(mask_url).resize((512, 512))

init_image = to_tensor(init_image) # <<<<<< NEW
mask_image = to_tensor(mask_image)[0] # <<<<<< NEW

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "A cute rabbit, high resolution, sitting on a park bench"
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

but this fails with an error when calling image = np.array(image.convert("RGB")).
I don't think this is expected as the docstrings make reference to a tensor input.

This PR changes the pipeline to check whether the input is a torch.Tensor and, if so, convert it to a PIL.Image.Image.
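For illustration only, here is a minimal sketch of that kind of conversion done without torchvision. It is not the code from this PR: the helper name `chw_float_to_pil` is hypothetical, and it assumes the tensor has already been turned into a NumPy array (for example with `tensor.detach().cpu().numpy()`) in the channel-first `(3, H, W)` layout and `[0, 1]` range that `to_tensor` produces.

```python
import numpy as np
from PIL import Image


def chw_float_to_pil(array: np.ndarray) -> Image.Image:
    """Convert a (3, H, W) float array in [0, 1] to an RGB PIL image.

    Hypothetical helper for illustration; not part of diffusers.
    """
    if array.ndim != 3 or array.shape[0] != 3:
        raise ValueError(f"expected shape (3, H, W), got {array.shape}")
    # Channel-first -> channel-last, which is the layout PIL expects.
    hwc = np.transpose(array, (1, 2, 0))
    # Scale to 0..255 and cast to uint8; clip guards against out-of-range input.
    hwc = (np.clip(hwc, 0.0, 1.0) * 255).round().astype(np.uint8)
    # A contiguous (H, W, 3) uint8 array is interpreted by PIL as RGB.
    return Image.fromarray(np.ascontiguousarray(hwc))


# Usage: a 4x5 image that is pure red.
red = np.zeros((3, 4, 5), dtype=np.float32)
red[0] = 1.0
pil_image = chw_float_to_pil(red)
```

Note that PIL reports size as `(width, height)`, so the example above yields an image of size `(5, 4)`.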

HuggingFaceDocBuilderDev · 2022-11-03T22:04:51Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

patrickvonplaten · 2022-11-04T17:36:40Z

 import torch

 import PIL
+from torchvision.transforms.functional import to_pil_image


Sorry, we cannot use this here -> we don't have a dependency on torchvision. Could you instead use the pipe.numpy_to_pil function?

Ah right, I thought it would be OK as it is listed as a dependency here (though maybe I have misunderstood this file). I will change this.

Hey @vvvm23,

That's a good point. The list you linked is a list of "soft-dependencies", which are not required for core functionality but only for certain use cases. In the case of torchvision, that use case is training.

The "hard-dependencies" can be found here:

diffusers/setup.py

Line 201 in a480229

install_requires = [
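As a rough illustration of the soft/hard distinction, a package can guard optional imports at the point of use instead of importing them at module level. The sketch below uses only the standard library; `is_module_available` and `require_soft_dependency` are hypothetical names for this example, not diffusers functions.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_soft_dependency(module_name: str, feature: str) -> None:
    """Raise a clear error if an optional dependency is missing.

    Hypothetical helper: core code paths never call this, only the
    optional feature that actually needs the module.
    """
    if not is_module_available(module_name):
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'. "
            f"Install it to use this feature."
        )


# Usage: 'json' ships with Python, so this passes silently.
require_soft_dependency("json", "config loading")
```

Core code paths, such as a pipeline's `__call__`, would then avoid anything behind such a guard, which is why a torchvision import cannot sit at the top of the pipeline module.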

Maybe we could instead make prepare_mask_and_masked_image a method of the pipeline and instead use self.numpy_to_pil?

I think either would work, inside or outside the pipeline, as numpy_to_pil is a static method and so can be accessed without actually passing a reference to the pipe.

I prefer it as a method of the pipe. I'll try to get something to you tomorrow ~
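To make the static-method point concrete, here is a small sketch with a toy class. `ToyPipeline` and `floats_to_uint8` are hypothetical stand-ins, not the diffusers implementation; the only claim is the general Python behaviour that a `@staticmethod` can be called on the class itself.

```python
import numpy as np


class ToyPipeline:
    """Hypothetical stand-in for a pipeline class; not the diffusers code."""

    def __init__(self, name: str = "toy"):
        self.name = name  # instance state that the static helper never touches

    @staticmethod
    def floats_to_uint8(images: np.ndarray) -> np.ndarray:
        """Map floats in [0, 1] to uint8 in [0, 255]; needs no `self`."""
        return (np.clip(images, 0.0, 1.0) * 255).round().astype(np.uint8)


batch = np.array([[0.0, 1.0]])
via_class = ToyPipeline.floats_to_uint8(batch)       # no instance required
via_instance = ToyPipeline().floats_to_uint8(batch)  # also works on an instance
```

Either call style gives the same result, so a module-level helper could use the class directly without being handed a pipeline object.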

We still cannot have a dependency on torchvision though ;-)

I removed the torchvision dependency.

I noticed also the docstrings make reference to accepting batched inputs. So I'll add that too soon.

edit: I decided against using numpy_to_pil after experimenting with that approach.

Actually, not sure if adding batching is easily doable here. Might be worth having a different issue for that. Can you take a look and see what you think?

I'll fix the conflicts for what I have so far.

pcuenca

The approach looks fine, but I feel we need some clarification on the expected input shapes for the image and the mask - the docstrings mention "batches" and shapes like (B, H, W), but in the code we are assuming a single image and a single mask.

One option is to accept batches too. In this case, the batch dimension of both image and mask must match the cardinality of the prompts list. This would be useful for multiple in-painting tasks (for the same or different prompts), but it'd add some complexity.

If we keep the current behaviour (single image and mask, but optionally several prompts or predictions), then we need to clarify that in the docstrings :)

Thanks a lot!
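The batched option described in this review could be guarded by a shape check along the following lines. This is only a sketch under assumed layouts, image `(B, H, W, 3)` and mask `(B, H, W)`, which the thread itself has not settled; `check_inpaint_batch` is a hypothetical name, and NumPy arrays stand in for tensors.

```python
import numpy as np


def check_inpaint_batch(prompts, image: np.ndarray, mask: np.ndarray) -> int:
    """Return the batch size if prompts, image, and mask agree; raise otherwise.

    Assumed (hypothetical) layouts: image (B, H, W, 3), mask (B, H, W).
    """
    # A single string prompt counts as a batch of one.
    num_prompts = 1 if isinstance(prompts, str) else len(prompts)
    if image.ndim != 4 or image.shape[-1] != 3:
        raise ValueError(f"image must be (B, H, W, 3), got {image.shape}")
    if mask.ndim != 3:
        raise ValueError(f"mask must be (B, H, W), got {mask.shape}")
    # The batch dimension of both inputs must match the number of prompts.
    if image.shape[0] != num_prompts or mask.shape[0] != num_prompts:
        raise ValueError(
            f"batch mismatch: {num_prompts} prompts, "
            f"{image.shape[0]} images, {mask.shape[0]} masks"
        )
    # Each mask must cover its image exactly.
    if image.shape[1:3] != mask.shape[1:]:
        raise ValueError("image and mask spatial sizes differ")
    return num_prompts


# Usage: two prompts, two images, two masks.
batch_size = check_inpaint_batch(
    ["a rabbit", "a cat"], np.zeros((2, 8, 8, 3)), np.zeros((2, 8, 8))
)
```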

pcuenca · 2022-11-11T11:03:51Z

                repainted, while black pixels will be preserved. If `mask_image` is a PIL image, it will be converted
-                to a single channel (luminance) before use. If it's a tensor, it should contain one color channel (L)
-                instead of 3, so the expected shape would be `(B, H, W, 1)`.
+                to a single channel (luminance) before use. If it's a tensor, it should be of shape `(B, H, W)`.


I'm not sure about this. The code in _prepare_mask_and_masked_image unconditionally prepends a batch dimension to the image. We also need to ensure that the mask has a single channel, so the shape in my opinion would have to be (H, W, 1) and then adapt the permutes accordingly.

We also need to document the expected shape of the image itself: (H, W, 3), I guess?

I was confused by the docstrings too. Before I even began work, they said batches and tensor inputs were supported, but I couldn't get either to work.

I disagree about the mask input: I feel it would be cleaner to implement with it being shape (H, W). But if there is some convention I am unaware of then I can change it.

I think for now I can update the docstrings for the single-image case, then create a separate issue for multi-image batches. Would you agree? I have a local fork that adds batching to _prepare_mask_and_masked_image, but it fails further on in the code. It seems more involved and probably needs a separate issue.
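For the single-image case discussed here, one possible preparation step is sketched below with NumPy arrays standing in for tensors. It is not the code from this PR or from #1003. The layouts (image `(H, W, 3)` in `[0, 1]`, mask `(H, W)` in `[0, 1]`), the 0.5 threshold, and the name `prepare_single` are all assumptions for illustration; the only thing taken from the docstring is that white mask pixels are repainted and black ones preserved.

```python
import numpy as np


def prepare_single(image: np.ndarray, mask: np.ndarray):
    """Return (mask, masked_image) with batch and channel dims added.

    Assumed inputs: image (H, W, 3) floats in [0, 1]; mask (H, W) floats
    in [0, 1], where 1 marks pixels to repaint. Hypothetical helper.
    """
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError(f"image must be (H, W, 3), got {image.shape}")
    if mask.shape != image.shape[:2]:
        raise ValueError(f"mask must be {image.shape[:2]}, got {mask.shape}")
    # (H, W, 3) -> (1, 3, H, W), rescaled from [0, 1] to [-1, 1].
    image = np.transpose(image, (2, 0, 1))[None] * 2.0 - 1.0
    # Binarize and add batch and channel dims: (H, W) -> (1, 1, H, W).
    mask = (mask >= 0.5).astype(image.dtype)[None, None]
    # Zero out the region to repaint; preserved pixels keep their values.
    masked_image = image * (1.0 - mask)
    return mask, masked_image


# Usage: a 2x2 white image where only the top-left pixel is repainted.
demo_mask, demo_masked = prepare_single(
    np.ones((2, 2, 3)), np.array([[1.0, 0.0], [0.0, 0.0]])
)
```

With an `(H, W)` mask the channel dimension is added in one place, which is the simplification argued for above; an `(H, W, 1)` convention would instead need a permute.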

vvvm23 · 2022-11-14T23:51:28Z

@patrickvonplaten I just noticed that #1003 also handles this issue. What is its status?

patrickvonplaten · 2022-11-16T16:39:43Z

Hey @vvvm23,

Thanks for the PR!
Linking this to #1003 as well ->

We need a test for these changes, as they touch highly used pipelines.
Could you add a code snippet showcasing which use case is currently not possible, but will be enabled by this PR?

vvvm23 · 2022-11-17T10:10:26Z

I can handle that later today – but won't this just do the same thing as #1003? Maybe better to just use #1003 🤔 I'll defer to you though.

patrickvonplaten · 2022-11-20T19:41:17Z

Hey @vvvm23,

Thanks for your answer, we just merged #1003 - would it be ok to close this one? 😅

vvvm23 · 2022-11-21T11:08:26Z

Yes totally fine! #1003 did what this does and then some, so it makes sense ~

adds support for torch.Tensor input in inpaint pipeline

0928f49

run make clean/quality

d28069f

vvvm23 mentioned this pull request Nov 4, 2022

[Community Pipeline] image2image inpainting? #905

Closed

patrickvonplaten reviewed Nov 4, 2022

View reviewed changes

vvvm23 added 2 commits November 10, 2022 21:30

removes torchvision dependency and makes `prepare_mask_and_masked_i…

f2e2272

…mage` a class function

Merge branch 'main' into inpainting-pipeline-tensor-input-fix

28b373b

pcuenca reviewed Nov 11, 2022

View reviewed changes

patrickvonplaten mentioned this pull request Nov 16, 2022

Handle batches and Tensors in pipeline_stable_diffusion_inpaint.py:prepare_mask_and_masked_image #1003

Merged

vvvm23 closed this Nov 21, 2022

4 participants
