This subtree contains the Arm® Backend implementation for ExecuTorch. It supports multiple targets using a common infrastructure that lowers PyTorch models to a TOSA representation. This representation is used to deploy to the following targets:
- Arm® Ethos™-U55/65/85 - Compiled using the Ethos-U Vela compiler.
- VGF Format, for ML extensions for Vulkan® – a format containing SPIR-V™ ML operators for Vulkan-capable devices.
The backend provides an ahead-of-time (AOT) flow that produces a PTE file for your chosen target. The AOT flow supports the following development operating systems:
- Linux aarch64
- Linux x86_64
- macOS™ with Apple® Silicon
In addition, the following deployment paths are supported by this backend:
- Bare metal build of a reference runtime for Arm® Cortex®-M with Ethos-U acceleration:
- Full testing is available in-tree using Corstone™ Fixed Virtual Platforms (FVPs).
- Linux target support for VGF capable targets, using the executor_runner.
More information on TOSA can be found here: https://www.mlplatform.org/tosa/tosa_spec.html.
Below is an overview of the key folders and files in this directory:
backends/arm/
│
├── _passes/ # Graph transformation passes
│ ├── arm_pass_manager.py # Defines ordering of graph transformations
│ └── *_pass.py # Graph transformation implementation
│
├── common/ # Common functionality used across the backend
│
├── debug/ # Debugging schema and functionality
│
├── ethosu/ # Implementations of EthosUPartitioner and EthosUBackend
│
├── operator_support/ # Checks if operators can be partitioned
│
├── operators/ # ATen → TOSA serialization
│ ├── node_visitor.py # Defines base class for ATen → TOSA node visitors
│ └── op_*.py # Lowering implementations for individual operators
│
├── quantizer/ # Quantization-related logic
│ ├── arm_quantizer.py # EthosUQuantizer and VGFQuantizer definitions
│ └── quantization_annotator.py # Defines how operators are annotated for quantization
│
├── runtime/ # Backends for running inference on target devices
│ ├── EthosUBackend.cpp
│ └── VGFBackend.cpp
│
├── scripts/ # Auxiliary build, dependency installation and utility scripts
│
├── test/ # Unit tests for the backend
│ ├── ops/ # Operator level unit tests
│ ├── models/ # Model level unit tests
│ └── tester/ # Testing harnesses and utilities
│
├── third-party/ # External dependencies
│
├── tosa/ # Shared TOSA backend implementation and dialect
│
└── vgf/ # Implementations of VgfPartitioner and VgfBackend
The Arm backend can be built using the following command:
./install_executorch.sh
NOTE: While developing, it can be convenient to use ./install_executorch.sh --editable, which creates an editable installation of ExecuTorch.
Pick one of the target flows below. Each flow has a one-time setup step and a build command.
Builds ExecuTorch runtime libraries for Cortex-M with Ethos-U acceleration.
Setup:
./examples/arm/setup.sh --i-agree-to-the-contained-eula
Build:
./backends/arm/scripts/build_executorch.sh
Setup:
./examples/arm/setup.sh --disable-ethos-u-deps --enable-mlsdk-deps
This is the default setup path and installs the MLSDK components from pip. Developers who need local source builds can use:
./backends/arm/scripts/setup-mlsdk-from-source.sh
The current flow lowers to TOSA and converts to VGF for use in external projects,
so the executor_runner is not typically used here.
Direct Drive enables execution on Ethos-U85 via the Linux driver stack.
Driver stack (Linux) and API:
https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-linux-driver-stack
An FVP with Linux is available for Direct Drive, but it must be built and run manually. See:
https://corstone1000.docs.arm.com/en/corstone1000-2025.12/
Setup:
./examples/arm/setup.sh --i-agree-to-the-contained-eula --target-toolchain linux-musl
source ./examples/arm/arm-scratch/setup_path.sh
Build:
./backends/arm/scripts/build_executorch.sh \
--toolchain=aarch64-linux-musl-gcc \
--build_type=Debug
Note: setup selects the linux-musl toolchain; build uses the aarch64-linux-musl GCC toolchain name.
If your Yocto image enables the dropbear SSH server, you can copy the
executor_runner binary into the running FVP via scp:
scp -P 2222 arm_test/cmake-out/executor_runner root@127.0.0.1:/tmp/
Create a PTE file:
python3 -m backends.arm.scripts.aot_arm_compiler \
--model_name examples/arm/example_modules/add.py \
--delegate \
--quantize \
--target ethos-u85-256 \
--direct_drive
Copy the executor_runner binary and the generated PTE file to the running FVP:
scp -P 2222 arm_test/cmake-out/executor_runner add_arm_delegate_ethos-u85-256.pte root@127.0.0.1:/tmp/
Run the model on the FVP:
ssh -p 2222 root@127.0.0.1 -t "/tmp/executor_runner -model_path /tmp/add_arm_delegate_ethos-u85-256.pte -num_executions 1"
There are two approaches to running the tests for the Arm backend. This section explains both.
The backend provides a script, backends/arm/test/test_arm_baremetal.sh, which is used in the trunk CI workflow.
This approach is useful for checking your change against that workflow on your own machine.
The script also installs the necessary dependencies to run the tests.
Below is an overview of some of the testing options this script provides:
| Command | Description |
|---|---|
| `test_arm_baremetal.sh test_pytest_ops_no_target` | Runs operator unit tests for non-target-specific use cases. |
| `test_arm_baremetal.sh test_pytest_models_no_target` | Runs model unit tests for non-target-specific use cases. |
| `test_arm_baremetal.sh test_pytest_ops_tosa` | Runs operator unit tests for TOSA-specific use cases. |
| `test_arm_baremetal.sh test_pytest_models_tosa` | Runs model unit tests for TOSA-specific use cases. |
| `test_arm_baremetal.sh test_run_tosa` | Runs end-to-end unit tests for TOSA-specific use cases. |
| `test_arm_baremetal.sh test_pytest_ops_ethos_u55` | Runs operator unit tests for Ethos-U55-specific use cases. |
| `test_arm_baremetal.sh test_pytest_models_ethos_u55` | Runs model unit tests for Ethos-U55-specific use cases. |
| `test_arm_baremetal.sh test_run_ethos_u55` | Runs end-to-end unit tests for Ethos-U55-specific use cases. |
| `test_arm_baremetal.sh test_pytest_ops_ethos_u85` | Runs operator unit tests for Ethos-U85-specific use cases. |
| `test_arm_baremetal.sh test_pytest_models_ethos_u85` | Runs model unit tests for Ethos-U85-specific use cases. |
| `test_arm_baremetal.sh test_run_ethos_u85` | Runs end-to-end unit tests for Ethos-U85-specific use cases. |
| `test_arm_baremetal.sh test_pytest_ops_vkml` | Runs operator unit tests for VGF-specific use cases. |
| `test_arm_baremetal.sh test_pytest_models_vkml` | Runs model unit tests for VGF-specific use cases. |
| `test_arm_baremetal.sh test_run_vkml` | Runs end-to-end unit tests for VGF-specific use cases. |
| `test_arm_baremetal.sh test_model_smollm2-135M` | Runs some models with the Corstone FVP. |
| `test_arm_baremetal.sh test_smaller_stories_llama` | Runs end-to-end model tests on the Corstone FVP. |
| `test_arm_baremetal.sh test_memory_allocation` | Runs memory allocation tests for Ethos-U-specific targets. |
For more information, please refer to the backends/arm/test/test_arm_baremetal.sh script.
The Arm backend uses pytest to run the unit test suite in backends/arm/test.
This option offers flexibility, allowing a specific test or a particular subset of the test suite to be run.
Below are some examples of how to use it:
- To run all the unit tests:

  pytest -v -n auto backends/arm/test/

- To run a specific test in a file:

  pytest -v backends/arm/test/ops/test_add.py -k test_add_tensor_tosa_INT_3
Some tests, with `u55`, `u85` or `vgf` in the name, require external dependencies to run under pytest:

- When a test contains `u55` or `u85`, run the following to set up the executor_runner:

  ./backends/arm/scripts/build_executorch.sh
  ./backends/arm/test/setup_testing.sh

- When a test contains `vgf`, run the following to install the ML SDK:

  ./backends/arm/scripts/build_executorch.sh
  ./backends/arm/test/setup_testing_vkml.sh
In addition, some model tests in the Arm backend require third-party libraries or packages.
To run these tests, you need to install the required dependencies by running the script examples/arm/setup.sh with the flag --setup-test-dependency.
Please note that installing model test dependencies is a standalone process. When using the --setup-test-dependency flag,
the script will install only the necessary dependencies for model tests, skipping all other setup procedures.
The repo-wide pre-commit hook (lintrunner + torch_pin sync) is installed automatically
by ./install_executorch.sh. To install the Arm-specific pre-push hook (license checks,
commit message format, docgen):
cp backends/arm/scripts/pre-push .git/hooks/
The current TOSA version does not support int64. However, int64 is commonly used in many models. In order to lower the operators with int64 inputs and/or outputs to TOSA, a few passes have been developed to handle the int64-related issues. The main idea behind these passes is to replace the uses of int64 with int32 where feasible.
- For floating-point models, these passes need to run very early in the lowering process and can be passed in to the to_edge_transform_and_lower() function call as an optional parameter.
- For quantized models, these transformations will be automatically handled during annotation before the export stage.
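The common feasibility condition behind these rewrites is that every affected value must fit in the int32 range. A minimal sketch of that check in plain Python (the helper name is hypothetical and not part of the backend):

```python
# Hypothetical helper illustrating the int32-bounds check that the
# int64 -> int32 rewrites rely on; not part of the Arm backend itself.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def fits_in_int32(values):
    """Return True if every value can be represented as int32."""
    return all(INT32_MIN <= int(v) <= INT32_MAX for v in values)

print(fits_in_int32([0, 1, 2**31 - 1]))  # within range -> True
print(fits_in_int32([2**31]))            # overflows int32 -> False
```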
List of model specific and optional passes:
- `ConvertInt64ConstOpsToInt32Pass`
  - Functionalities:
    - Rewrites constant-producing ops that output int64 to instead output int32, when values are within int32 bounds.
    - Supported ops: `torch.full`, `torch.arange`, `torch.eye`, `torch.linspace`, `torch.tensor`
  - Example usage:
    - backends/arm/test/models/stable_diffusion/test_CLIPTextModelWithProjection.py
    - backends/arm/test/models/stable_diffusion/test_T5EncoderModel.py
- `ConvertInt64OutputOpsToInt32Pass`
  - Overview:
    - Rewrites or removes operations that produce int64 outputs, converting them to int32 where possible.
    - Overflow checks are applied selectively; for ops without such checks, users must ensure values fit within the int32 range.
  - Functionalities:
    1. Handles casts to int64:
       - int32 -> int64: removes the cast and redirects all uses of the int64 output to the int32 input.
       - other types -> int64: rewrites the cast to target int32 instead.
       - Supported ops:
         - `torch.ops.aten.to.[dtype|dtype_layout]`
         - `exir_ops.edge.dim_order_ops._to_dim_order_copy.default`
    2. Post-processes argmax outputs:
       - Inserts an int64 -> int32 cast after argmax operations that produce int64 outputs.
       - Supported ops:
         - `torch.ops.aten.argmax.default`
         - `exir_ops.edge.aten.argmax.default`
  - Example usage:
    - (Functionality 1) backends/arm/test/models/stable_diffusion/test_T5EncoderModel.py
    - (Functionality 2) backends/arm/test/models/stable_diffusion/test_CLIPTextModelWithProjection.py
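The cast-removal rewrite described above can be sketched with a toy graph representation (the `Node` class and op names here are illustrative only; the real pass operates on torch.fx graphs):

```python
# Toy sketch of removing an int32 -> int64 cast: every user of the cast
# is rewired to consume the int32 input directly. Illustrative only;
# the actual pass works on torch.fx nodes, not this stand-in class.
class Node:
    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = list(inputs)

def remove_int64_casts(nodes):
    """Drop 'cast_to_int64' nodes and redirect their users to the input."""
    redirect = {id(n): n.inputs[0] for n in nodes if n.op == "cast_to_int64"}
    kept = [n for n in nodes if n.op != "cast_to_int64"]
    for n in kept:
        n.inputs = [redirect.get(id(i), i) for i in n.inputs]
    return kept

src = Node("placeholder_int32")
cast = Node("cast_to_int64", [src])
user = Node("add", [cast, cast])
graph = remove_int64_casts([src, cast, user])
print([n.op for n in graph])  # the cast node is gone
print(user.inputs[0] is src)  # the user now reads the int32 source
```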
- `InsertInt32CastsAfterInt64PlaceholdersPass`
  - Functionalities:
    - Inserts an int64 -> int32 cast immediately after each int64 placeholder (graph input).
    - Redirects all uses of each int64 placeholder to its int32 cast output.
    - Inserts local int32 -> int64 casts at call sites where an operator requires int64 inputs, e.g. `torch.nn.functional.one_hot`.
  - Pass ordering:
    - When used with `ConvertInt64ConstOpsToInt32Pass` and `ConvertInt64OutputOpsToInt32Pass`, run this pass last.
    - Rationale: those passes may cause retracing to re-infer some int64 placeholders as int32. Running this pass last casts only the inputs that remain int64, minimizing the number of inserted casts.
  - Example usage:
    - backends/arm/test/models/test_llama.py
    - backends/arm/test/models/stable_diffusion/test_CLIPTextModelWithProjection.py
    - backends/arm/test/models/stable_diffusion/test_T5EncoderModel.py
- `ToDevicePass`
  - A utility for moving an already-quantized or already-decomposed GraphModule to another device. It is intended to be used immediately before rerunning, retracing, or calling `torch.export.export(...)`.
  - Functionalities:
    - Calls `.to(device)` on the GraphModule and rewrites explicit `device=` kwargs on `call_function` nodes to a user-specified device.
    - Useful because some constant-producing nodes may still carry an export-time device kwarg.
  - Example usage:
    - from executorch.exir.passes import ToDevicePass
    - graph_module = ToDevicePass("cpu")(graph_module).graph_module
    - backends/arm/test/misc/test_post_quant_device_switch.py
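The device-kwarg rewrite that such a pass performs can be illustrated with a toy node list (dict-based nodes are a stand-in for torch.fx nodes; the helper name is hypothetical):

```python
# Illustrative stand-in for the device-kwarg rewrite: any call_function
# node carrying an explicit device= kwarg is pointed at the new device.
# The real pass operates on a GraphModule, not plain dicts.
def rewrite_device_kwargs(nodes, device):
    for node in nodes:
        if node["op"] == "call_function" and "device" in node["kwargs"]:
            node["kwargs"]["device"] = device
    return nodes

nodes = [
    {"op": "call_function", "kwargs": {"device": "cuda"}},  # export-time device
    {"op": "call_function", "kwargs": {}},                  # no device kwarg
]
rewrite_device_kwargs(nodes, "cpu")
print(nodes[0]["kwargs"]["device"])  # rewritten to "cpu"
```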
If you have problems or questions, or have suggestions for ways to improve the Arm backend, please reach out to the Arm team developing this backend, or create an issue and add the "partner: arm" label.