Program Data Separation Examples

This directory provides examples of the Program Data Separation APIs in ExecuTorch:

  1. Program data separation examples using a linear model with the portable operators and XNNPACK.
  2. LoRA inference example with multiple LoRA models sharing a single foundation weight file.

Program Data Separation

The program-data separation APIs allow users to generate a separate data file when exporting and lowering a model. That is, they generate a PTE file containing the model execution program and one or more PTD files containing only weights.

PTD files are used to store data outside of the PTE file. Some use-cases:

  • On-device training: checkpointing for model weights.
  • Deduplication: sharing model weights between multiple executable PTE files. This can significantly reduce binary file size and runtime memory usage.
  • Flexible deployment: allows asynchronous updates of program and data, especially when they are updated at different cadences.

For more information on the PTD data format, please see the flat_tensor directory.

Linear example

For a demo of the program-data separation APIs using a linear model, please see program-data-separation/cpp/linear_example. This example generates and runs a program-data separated linear model, with the program in a .pte file and the weights and bias in a separate .ptd file.

LoRA example

A major use-case that program-data separation enables is inference with multiple LoRA adapters. LoRA is a fine-tuning technique introduced in "LoRA: Low-Rank Adaptation of Large Language Models". LoRA fine-tuning produces lightweight 'adapter' weights that can be applied to an existing model to adapt it to a new task. LoRA adapters are typically small in comparison to LLM foundation weights, on the order of KB to MB depending on the fine-tuning setup and model size.
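To make the size claim concrete, here is a minimal pure-Python sketch of the low-rank update from the LoRA paper, where the adapted weight is W + (alpha / rank) * B A. It is not ExecuTorch code: the function names (`matmul`, `lora_effective_weight`, `lora_param_count`) and the toy matrices are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; names and values here are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_effective_weight(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), leaving W untouched."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def lora_param_count(d_out, d_in, rank):
    """Adapter parameters for one d_out x d_in weight: B is d_out x r, A is r x d_in."""
    return rank * (d_out + d_in)

# Foundation weight W (2 x 3) with a rank-1 adapter: B is 2 x 1, A is 1 x 3.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.0, 1.0]]

W_adapted = lora_effective_weight(W, B, A, alpha=2.0, rank=1)

# For a 4096 x 4096 layer at rank 8, the adapter is 256x smaller than W.
full_params = 4096 * 4096
adapter_params = lora_param_count(4096, 4096, rank=8)
ratio = full_params // adapter_params
```

Because the foundation weight `W` is never modified, the same copy can back any number of adapters, which is what makes it a natural candidate for a shared data file.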

To enable LoRA, we generate:

  • PTE file(s): containing program and LoRA adapter weights.
  • PTD file: containing foundation weights.

Multiple LoRA-adapted PTE files can share the same foundation weights, so adding a model adapted to a new task incurs minimal binary size and runtime memory overhead.
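The saving can be estimated with simple arithmetic. The sketch below compares total on-disk size when every PTE embeds its own copy of the foundation weights against the layout described above, with one shared PTD. The function names and all sizes are made-up assumptions for illustration, not measurements from the examples in this directory.

```python
# Back-of-the-envelope size comparison; all numbers are hypothetical.

def embedded_total(foundation, adapters, program):
    """Every PTE carries program + adapter + its own foundation copy."""
    return sum(program + adapter + foundation for adapter in adapters)

def separated_total(foundation, adapters, program):
    """One shared PTD with foundation weights; each PTE has program + adapter."""
    return foundation + sum(program + adapter for adapter in adapters)

# Sizes in MB: a 1000 MB foundation, three adapters, 1 MB of program per PTE.
foundation_mb = 1000
adapter_mb = [10, 20, 15]
program_mb = 1

embedded = embedded_total(foundation_mb, adapter_mb, program_mb)
separated = separated_total(foundation_mb, adapter_mb, program_mb)
saved = embedded - separated

print(f"embedded: {embedded} MB, separated: {separated} MB, saved: {saved} MB")
```

Under these assumptions, each additional adapted model costs only its program and adapter size, while the embedded layout pays for the foundation weights again every time.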

Please take a look at program-data-separation/cpp/lora_example for a demo of the program-data separation APIs with LoRA. This example shows how to generate and run multiple LoRA adapter PTEs with a shared foundation weight file.