PyTorch

PyTorch tensor interface#

import machinevisiontoolbox
print(machinevisiontoolbox.__version__)

The toolbox provides interfaces to PyTorch – an important machine learning framework. The fundamental datatype in PyTorch is the tensor which is a multidimensional array, with shape (N, H, W, C) where N is the batch size, H and W are the image dimensions and C is the number of channels. For a single image the batch size is 1, so the shape is (1, H, W, C) or sometimes it is squeezed into the shape (H, W, C).

Image → tensor#

Use the tensor method to convert an Image to a tensor. For example:

>>> from machinevisiontoolbox import Image
>>> im = Image.Read("eiffel-1.png")
>>> tensor = im.tensor()
!! ImportError: PyTorch is required for to_tensor(). Install it with: pip install torch or pip install machinevision-toolbox-python[torch] [ERR unknown:30:unknown (source/tensor.rst)]
>>> print(tensor.shape)
!! NameError: name 'tensor' is not defined [ERR unknown:30:unknown (source/tensor.rst)]

By default, the resulting tensor is normalized to have the Imagenet mean and standard deviation, which is a common preprocessing step for feeding images into a neural network. If you want to get the raw pixel values as a tensor, you can set the normalized argument to False.

tensor → Image#

Use the constructor Tensor method to create an Image from a tensor. For example:

>>> from machinevisiontoolbox import Image
>>> from torch import rand
!! ModuleNotFoundError: No module named 'torch' [ERR unknown:47:unknown (source/tensor.rst)]
>>> tensor = rand(3, 480, 640)  # random 3-channel image
!! NameError: name 'rand' is not defined [ERR unknown:47:unknown (source/tensor.rst)]
>>> img = Image.Tensor(tensor)
!! NameError: name 'tensor' is not defined [ERR unknown:47:unknown (source/tensor.rst)]
>>> print(img)
!! NameError: name 'img' is not defined [ERR unknown:47:unknown (source/tensor.rst)]

An exception is thrown if the tensor has a batch dimension greater than 1.

If the tensor is from a segmentation model and contains logits, then the logits argument should be set to True. This will apply a softmax to the tensor and convert it to a color image where each pixel is colored according to the class with the highest probability. For example:

>>> outputs = model(img.tensor())
>>> classes = Image.Tensor(outputs, logits=True)
>>> classes.disp() # display the class labels as colors

Image source → batch tensor#

An image source (a concrete instance of the abstract class ImageSource that yields images) represents a set of images and can be converted to a batch tensor where N>1. All sources have a tensor method that creates a batch tensor containing all the images in the source.

For example, a video file is a set of images and a tensor can be created that contains all of its frames:

from machinevisiontoolbox import VideoFile
with VideoFile("traffic_sequence.mp4") as video:
    tensor = video.tensor()
print(tensor.shape)

Note the use of the context manager to ensure that the video file is properly closed after reading. The resulting tensor has shape (N, H, W, C) where N is the number of frames in the video.

We can similarly create a batch tensorfor a local file folder, a ROS bag, or a Zip archive.

batch tensor → Image iterator#

A batch tensor can be converted to a set of images. This is done using an Image iterator:

from machinevisiontoolbox import TensorStack
from torch import rand
tensor = rand(16, 480, 640, 3)  # batch of 16 random RGB images
for img in TensorStack(tensor):
    img.disp(fps=4)

This particular example could be achieved a little more concisely by using the ImageSource.disp method inherited by all image sources, including TensorStack. A single line of code iterates over a tensor and displays the frames as a video:

from machinevisiontoolbox import TensorStack
from torch import rand
tensor = rand(16, 480, 640, 3)  # batch of 16 random RGB images
TensorStack(tensor).disp(fps=4)