
[RFC] Autoload Device Extension #122468

@jgong5

Description


🚀 The feature, motivation and pitch

co-authors:
Intel: @jgong5 @bsochack @sujoysaraswati @gujinghui @xiaodonglin1
Huawei: @Yikun @hipudding

PyTorch has been continuously expanding its support for various hardware backends, extending its reach to out-of-the-tree devices, which are usually implemented as PyTorch extensions. However, a notable challenge arises with the current approach: users must explicitly load these out-of-the-tree device extensions to gain the new device support. This manual loading not only complicates the user experience but also introduces unnecessary cognitive overhead.

The core issue lies in the discrepancy between the handling of in-tree and out-of-the-tree devices. While in-tree devices integrate seamlessly into the PyTorch ecosystem, out-of-the-tree devices require additional loading steps on top of the standard PyTorch device programming model. Our goal with this proposal is to bridge this gap, ensuring that out-of-the-tree devices are as user-friendly and seamlessly integrated as their in-tree counterparts. (Note that we focus only on Python-based programs in this RFC. C++ applications can be linked with the native device extensions, and automatic loading is taken care of by the loader.)
Specifically, our objectives are twofold:

  1. Enable users to adhere to the familiar PyTorch device programming model without the need for explicit loading or importing of device-specific extensions.
  2. Facilitate effortless adoption of existing PyTorch applications with zero-code changes on out-of-the-tree devices.

Examples

In this section, we use Intel Gaudi and Huawei Ascend, two representative out-of-the-tree devices, to contrast the current usage with the expected one.

Intel Gaudi

Let’s take a look at the following PyTorch program that runs on Intel Gaudi via the PyTorch HPU device key. Users need to import “habana_frameworks.torch” to load the extension support for HPU.

import torch
import torchvision.models as models
import habana_frameworks.torch # <-- extra import
model = models.resnet50().eval().to("hpu")
input = torch.rand(128, 3, 224, 224).to("hpu")
output = model(input)

With the proposal, the additional import of “habana_frameworks.torch” is not necessary and the code follows the standard PyTorch device programming model, i.e., moving the input and model parameters to the device followed by the forward call etc.

import torch
import torchvision.models as models
model = models.resnet50().eval().to("hpu")
input = torch.rand(128, 3, 224, 224).to("hpu")
output = model(input)

Programming in the standard way brings the benefit that users do not even have to change the code that involves moving tensors to the target device in applications with good device abstraction, e.g., where the device name is passed as an argument to the program. This proposal allows users to adopt new devices without changing code.
Even in cases where applications have hard-coded calls to CUDA runtime functions, there are solutions for automatically migrating the code to the new devices. For example, the Habana PyTorch bridge supports "automatic GPU migration" so that PyTorch programs written for the NVIDIA CUDA device do not need to be changed to work with the Intel Gaudi HPU device. This is achieved by an additional import of the "gpu_migration" module.

import torch
import torchvision.models as models
import habana_frameworks.torch # <-- extra import
import habana_frameworks.torch.gpu_migration # <-- extra import
model = models.resnet50().eval().to("cuda")
input = torch.rand(128, 3, 224, 224).to("cuda")
output = model(input)

With the proposal, the device specific imports are not necessary and users can run the existing code without any code changes.

import torch
import torchvision.models as models
model = models.resnet50().eval().to("cuda")
input = torch.rand(128, 3, 224, 224).to("cuda")
output = model(input)

Huawei Ascend

Similar to Intel Gaudi, users need to import “torch_npu” to load the device extension support for Huawei Ascend. Internally, Huawei Ascend leverages the “PrivateUse1” device key and exposes the device name as “npu” to the end users. We use Ascend as an example for out-of-the-tree devices that don’t have in-tree device keys but leverage “PrivateUse1” device key for integration.

import torch
import torchvision.models as models
import torch_npu # <-- extra import (install npu to PrivateUse1 etc.)
model = models.resnet50().eval().to("npu")
input = torch.rand(128, 3, 224, 224).to("npu")
output = model(input)

With the proposal, the expected usage is similar – users don’t have to import “torch_npu” explicitly.

import torch
import torchvision.models as models
model = models.resnet50().eval().to("npu")
input = torch.rand(128, 3, 224, 224).to("npu")
output = model(input)

The comments on the zero-code change enabling also apply to Huawei Ascend.

Proposed Solution

We propose three options below. For the sake of simplicity, Option 3 is the most preferable.

Option 1: Module preloading

Inspired by the LD_PRELOAD mechanism in Unix-like operating systems, we propose to introduce a similar TORCH_PRELOAD environment variable to specify a list of Python modules to be loaded during the initialization of PyTorch core. The preloaded modules are specified as follows:
TORCH_PRELOAD=module1:module2:module3...
Taking Intel Gaudi as an example, one would specify:
TORCH_PRELOAD=habana_frameworks.torch
or
TORCH_PRELOAD=habana_frameworks.torch:habana_frameworks.torch.gpu_migration
For Huawei Ascend, one would do:
TORCH_PRELOAD=torch_npu
For situations where a configuration file is preferred over environment variables, we can have a JSON configuration file, torch_preload.json, like below:
["module1", "module2", "module3"]
The module preloading is triggered in torch/__init__.py. The configuration file can be placed in a pre-defined location (e.g., site-packages/torch_config/torch_preload.json), installed by the device extensions or overridden by users, for torch/__init__.py to discover.
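A minimal sketch of such a preloading hook is shown below. The function name is hypothetical; the real logic would live in torch/__init__.py, and swallowing ImportError is one possible policy choice so that a missing extension does not break `import torch`.

```python
import importlib
import os

def load_preload_modules(env_var="TORCH_PRELOAD"):
    """Import each module named in a colon-separated environment variable.

    Hypothetical helper sketching how torch/__init__.py could honor
    TORCH_PRELOAD; returns the list of module names that were requested.
    """
    names = [m for m in os.environ.get(env_var, "").split(":") if m]
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # A missing or broken extension should not break `import torch`.
            pass
    return names
```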

  • Pros: Intuitive for users experienced with Unix-like systems to understand and use. Allows loading multiple modules/extensions at once. Provides flexibility to users to load any desired module.
  • Cons: Users can use this mechanism for purposes other than loading device extensions, which might introduce conflicts.

Option 2: Device extension specific configuration

This option proposes a specialized configuration for loading the device extension via environment variable TORCH_DEVICE_EXT and configuration file torch_device_extension.json.
TORCH_DEVICE_EXT=device_key:device_name:module1:module2:module3...
torch_device_extension.json:

{
  "device_key": "device_name",
  "modules": ["module1", "module2", "module3"]
}

The module preloading is triggered in torch/__init__.py. The configuration file can be placed in a pre-defined location (e.g., site-packages/torch_config/torch_device_extension.json), installed by the device extensions or overridden by users, for torch/__init__.py to discover.
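For illustration, parsing the TORCH_DEVICE_EXT string into a dict analogous to the torch_device_extension.json layout might look like the following (the helper name is hypothetical):

```python
def parse_device_ext(spec):
    """Split a TORCH_DEVICE_EXT-style string into its parts.

    The format proposed above is device_key:device_name:module1:module2...;
    this hypothetical helper returns a dict in the same shape as the
    torch_device_extension.json file.
    """
    device_key, device_name, *modules = spec.split(":")
    return {
        "device_key": device_key,
        "device_name": device_name,
        "modules": modules,
    }
```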

  • Pros: A dedicated configuration for loading device extensions, without the potential conflicts of Option 1. Leaves flexibility for PyTorch core to take device-extension-specific actions when loading the modules.
  • Cons: More complex configuration compared to Option 1.

Option 3: Load from a pre-defined module as plugin

This option proposes using a pre-defined module, e.g., torch_plugins_device_extension, for loading out-of-the-tree device extensions. It leverages the Python plugin mechanism: https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-package-metadata.
An entrypoint is added to the PyTorch package like below:

[project.entry-points.'torch.plugins']
device_extension = 'torch_plugins_device_extension'

The module is loaded from torch/__init__.py as illustrated below:

from importlib.metadata import entry_points

# Note: entry_points(group=...) requires Python >= 3.10.
discovered_plugins = entry_points(group='torch.plugins')
for plugin in discovered_plugins:
    try:
        plugin.load()
    except ImportError:
        # A failing extension should not break `import torch`.
        pass

This module is installed by the device vendor's Python package, e.g., habana_frameworks for Intel Gaudi or torch_npu for Huawei Ascend, into site-packages/torch_plugins_device_extension, and torch_plugins_device_extension/__init__.py imports the corresponding vendor-specific device extension modules.
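The vendor-installed __init__.py can stay tiny. One possible defensive pattern it could use around its import is sketched below (the helper name is hypothetical; the module names are only the examples from this RFC):

```python
import importlib

def safe_import(name):
    """Return the imported module, or None if it is unavailable.

    Sketch of a guard that a vendor's torch_plugins_device_extension
    __init__.py could wrap around its import (e.g. of
    habana_frameworks.torch or torch_npu), so that a broken or
    partially installed wheel does not break `import torch`.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None
```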

  • Pros: Flexible import and configuration via Python. Simpler configuration and interface to PyTorch.
  • Cons: Less flexibility for users to customize the loading process.


Labels: module: PrivateUse1, triaged
