Program Data Separation Examples

This directory provides examples of the program-data separation APIs in ExecuTorch:

Program data separation examples using a linear model with the portable operators and XNNPACK.
LoRA inference example with multiple LoRA models sharing a single foundation weight file.

Program Data Separation

The program-data separation APIs allow users to generate a separate data file when exporting and lowering a model. That is, the export produces a PTE file containing the model execution program and one or more PTD files containing only weights.

PTD files are used to store data outside of the PTE file. Some use-cases:

On-device training: checkpointing for model weights.
Deduplication: sharing model weights between multiple executable PTE files. This can significantly reduce binary file size and runtime memory usage.
Flexible deployment: allow the program and data to be updated independently, which is especially useful when they change at different cadences.

For more information on the PTD data format, please see the flat_tensor directory.
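The deduplication idea can be illustrated with a toy sketch. This is not the PTE or PTD format and does not use any ExecuTorch API: the file layouts, the function names (write_data_file, write_program_file, run) and the elementwise "linear" op are all hypothetical, chosen only to show two programs resolving weights by name from one shared data file.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_data_file(path, tensors):
    """Write named float32 lists to one data file (toy layout, NOT the PTD format)."""
    blob = {name: struct.pack(f"<{len(v)}f", *v).hex() for name, v in tensors.items()}
    Path(path).write_text(json.dumps(blob))


def write_program_file(path, ops):
    """Write a 'program' that refers to weights by name only (toy layout, NOT PTE)."""
    Path(path).write_text(json.dumps({"ops": ops}))


def run(program_path, data_path, x):
    """Resolve each weight name in the data file, then apply y = w * x + b per op."""
    data = json.loads(Path(data_path).read_text())

    def load(name):
        raw = bytes.fromhex(data[name])
        return struct.unpack(f"<{len(raw) // 4}f", raw)

    for op in json.loads(Path(program_path).read_text())["ops"]:
        w, b = load(op["weight"]), load(op["bias"])
        x = [wi * xi + bi for wi, xi, bi in zip(w, x, b)]
    return x


with tempfile.TemporaryDirectory() as d:
    data = Path(d) / "shared.data"
    prog_a = Path(d) / "a.prog"
    prog_b = Path(d) / "b.prog"
    # The weights are stored once; both programs reference them by name.
    write_data_file(data, {"w": [2.0, 3.0], "b": [1.0, 1.0]})
    write_program_file(prog_a, [{"weight": "w", "bias": "b"}])
    write_program_file(prog_b, [{"weight": "w", "bias": "b"}] * 2)
    out_a = run(prog_a, data, [1.0, 2.0])
    out_b = run(prog_b, data, [1.0, 2.0])

print(out_a, out_b)  # [3.0, 7.0] [7.0, 22.0]
```

Because neither program file contains weight bytes, replacing the data file changes the behavior of both programs without re-exporting them, which is the same property the flexible-deployment use-case relies on.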

Linear example

For a demo of the program-data separation APIs using a linear model, please see program-data-separation/cpp/linear_example. This example generates and runs a program-data separated linear model, with the program in a .pte file and the weights and bias in a separate .ptd file.

LoRA example

A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in the paper "LoRA: Low-Rank Adaptation of Large Language Models". LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of kilobytes to megabytes depending on the fine-tuning setup and model size.
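To see why adapters are small, recall that LoRA replaces the update to a d x k weight matrix with the product of two low-rank matrices, B (d x r) and A (r x k). The arithmetic below is a minimal sketch. The layer shape (4096 x 4096), the rank (8) and the fp16 storage are illustrative assumptions, not values taken from this example.

```python
def lora_param_counts(d, k, r):
    """Return (full, adapter) parameter counts for one d x k weight matrix.

    full:    parameters in the original matrix W (d * k)
    adapter: parameters in the LoRA factors B (d x r) and A (r x k)
    """
    full = d * k
    adapter = r * (d + k)
    return full, adapter


# Illustrative values only: one 4096 x 4096 projection, rank-8 adapter.
full, adapter = lora_param_counts(4096, 4096, 8)
ratio = full // adapter

# Size in KiB assuming 2 bytes per parameter (fp16).
full_kib = full * 2 // 1024
adapter_kib = adapter * 2 // 1024

print(full, adapter, ratio)   # 16777216 65536 256
print(full_kib, adapter_kib)  # 32768 128
```

Under these assumptions, the adapter for one layer takes 128 KiB against 32 MiB for the matrix it adapts. Summed over the adapted layers of a model, that is how adapter sizes land in the kilobyte-to-megabyte range.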

To enable LoRA, we generate:

PTE file/s: containing program and LoRA adapter weights.
PTD file: containing foundation weights.

Multiple LoRA-adapted PTE files can share the same foundation weights, so adding a model adapted to a new task incurs minimal binary size and runtime memory overhead.
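The size benefit can be estimated with simple arithmetic. The sketch below compares shipping N self-contained models against shipping N PTE files plus one shared PTD file. All sizes are hypothetical placeholders, not measurements from this example, and deployment_size is an illustrative helper rather than part of any API.

```python
def deployment_size(foundation, program, adapter, n_models, shared):
    """Total bytes on disk for n_models LoRA variants.

    shared=True:  one foundation file plus n_models (program + adapter) files.
    shared=False: every model carries its own copy of the foundation weights.
    """
    per_model = program + adapter
    if shared:
        return foundation + n_models * per_model
    return n_models * (foundation + per_model)


MB = 1024 * 1024
# Hypothetical sizes, for illustration only.
foundation = 2048 * MB  # foundation weights (PTD)
program = 1 * MB        # execution program inside each PTE
adapter = 8 * MB        # LoRA adapter weights inside each PTE

separate = deployment_size(foundation, program, adapter, 3, shared=False)
shared_total = deployment_size(foundation, program, adapter, 3, shared=True)
fourth_model_cost = (
    deployment_size(foundation, program, adapter, 4, shared=True) - shared_total
)

print(separate // MB, shared_total // MB, fourth_model_cost // MB)  # 6171 2075 9
```

With these placeholder numbers, adding a fourth adapted model costs 9 MB when the foundation weights are shared, instead of another full copy of the foundation weights.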

Please take a look at program-data-separation/cpp/lora_example for a demo of the program-data separation APIs with LoRA. This example shows how to generate and run multiple LoRA adapter PTEs with a shared foundation weight file.
