This demo showcases ExecuTorch's JavaScript bindings: it loads an LLM and a tokenizer, then generates tokens.
Emscripten is necessary to compile ExecuTorch for Wasm.
Make sure you have the system requirements listed in the Getting Started Guide before continuing.
- Install ExecuTorch from PyPI.
pip3 install executorch
- Update the ExecuTorch submodule.
git submodule update --init --recursive executorch
- Generate the stories110M binary file and the tokenizer configuration file for this demo.
bash export.sh
It should output the files stories110M.pte and tokenizer.model.
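Before building, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of the demo: the helper name `missing_prerequisites` and its parameters are hypothetical, and it assumes the exported files sit in the directory you pass in and that Emscripten's `emcc` compiler driver is expected on `PATH`.

```python
import shutil
from pathlib import Path

# File names taken from the export step above.
EXPECTED_FILES = ("stories110M.pte", "tokenizer.model")


def missing_prerequisites(model_dir=".", required_tools=("emcc",)):
    """Return a list of problems; an empty list means nothing is missing.

    Hypothetical helper for illustration only. `model_dir` is assumed to be
    the directory where export.sh wrote its output.
    """
    problems = []
    # Check that each required command-line tool can be found on PATH.
    for tool in required_tools:
        if shutil.which(tool) is None:
            problems.append(f"tool not found on PATH: {tool}")
    # Check that each exported file exists in the given directory.
    for name in EXPECTED_FILES:
        if not (Path(model_dir) / name).is_file():
            problems.append(f"missing file: {name}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

Running the script from the directory that holds the exported files prints one line per missing item and prints nothing when everything is found.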
Once you have Emscripten installed, ExecuTorch set up, and the model and tokenizer files generated, you can build and run the demo. Building may take up to 9 minutes.
cd stories110M/wasm # The directory containing this README
# Build the demo
bash build.sh
# Run the demo
python3 -m http.server --directory build/
The page will be available at http://localhost:8000/demo.html.
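If you would rather start the server from a script, the same static file serving can be sketched with Python's standard library. The function name `make_server` is hypothetical. Unlike the command above, this sketch binds only to 127.0.0.1, which is an assumption made here to keep the demo local.

```python
import functools
import http.server


def make_server(directory="build", port=8000):
    """Create a static file server rooted at `directory`.

    Hypothetical helper; it binds to 127.0.0.1 only (an assumption of this
    sketch, not something the demo requires).
    """
    # SimpleHTTPRequestHandler accepts a `directory` keyword (Python 3.7+).
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


if __name__ == "__main__":
    server = make_server()
    print("Serving http://localhost:8000/demo.html (Ctrl+C to stop)")
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        server.server_close()
```

Passing `port=0` lets the operating system pick a free port, which is convenient when port 8000 is already in use; the chosen port is then available as `server.server_address[1]`.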
- Load a model and tokenizer configuration from a file.
- The demo is configured to load the stories110M model.
- Larger models may fail to upload or run out of memory.
- Temperature slider ranging from 0.0 to 2.0.
- Tokens-to-generate slider ranging from 1 to the maximum context length minus 1.
- Generate tokens in a text box to tell a short story.
- Display the generated tokens in a table.
- Prefill latency is ~22ms per token.
- Decode latency is ~23ms.
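To illustrate what a temperature setting typically controls, here is a minimal sketch of temperature-scaled sampling over a list of logits. It is not the demo's implementation: the function name `sample_token` is hypothetical, and treating a temperature of 0.0 as greedy (argmax) selection is a common convention assumed here rather than confirmed behavior of the demo.

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Pick a token index from `logits` using temperature-scaled softmax.

    Hypothetical helper for illustration. A temperature of 0.0 (or below) is
    assumed to mean greedy selection of the highest logit.
    """
    if not logits:
        raise ValueError("logits must not be empty")
    if temperature <= 0.0:
        # Greedy: return the index of the largest logit.
        return max(range(len(logits)), key=logits.__getitem__)
    # Divide by temperature: values above 1.0 flatten the distribution,
    # values below 1.0 sharpen it.
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

For example, `sample_token([1.0, 3.0, 2.0], 0.0)` always selects index 1, while a temperature near the top of the slider's range makes the lower-scoring indices noticeably more likely to be chosen.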