Tiny Recursion Model (TRM)

Tiny Recursion Model (TRM) is a compact recursive reasoning network that reaches 45% on ARC-AGI-1 and 8% on ARC-AGI-2 with only ~7M parameters. This repository contains the training pipeline, dataset builders, and evaluation utilities that power the work described in the paper Less is More: Recursive Reasoning with Tiny Networks (arXiv).


How TRM Differs from HRM

TRM distills recursive reasoning into a single streamlined module and omits every hierarchy-specific component that defined the Hierarchical Reasoning Model (HRM).

  • Philosophy — TRM keeps the focus on minimal recursion without relying on brain analogies, fixed-point theorems, or explicit hierarchies.
  • Architecture — HRM maintained dedicated H-level and L-level reasoning stacks; TRM reuses the L-level module for all updates and ignores any H-level configuration.
  • Forward Iteration — HRM alternated between separate H and L modules during a step. TRM applies the same reasoning module when updating both latent states, producing a flatter recursive loop.
  • Configuration — Default TRM configs set H_layers=0 with multiple L_layers, alongside deeper L-level cycle counts (H_cycles=3, L_cycles=6), reflecting the absence of an H-level pathway.
  • Adaptive Computation Time — TRM adds the no_ACT_continue option to simplify halting, using only the halt signal sigmoid instead of comparing halt/continue logits.
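The flat recursive loop and sigmoid-only halting described above can be sketched in a few lines of NumPy. Everything here is an illustrative stand-in, not the repository's actual API: the shared module is a single tanh layer, and the sizes, cycle counts, and halt threshold are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # hidden size (illustrative)

# One shared weight matrix stands in for the reused "L-level" module.
W = rng.normal(scale=0.1, size=(3 * D, D))

def reason(x, y, z):
    # One application of the shared module: mixes input x, answer y, latent z.
    return np.tanh(np.concatenate([x, y, z]) @ W)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def trm_step(x, y, z, H_cycles=3, L_cycles=6):
    # TRM reuses the same module for both updates -- no separate H-level stack.
    for _ in range(H_cycles):
        for _ in range(L_cycles):
            z = reason(x, y, z)   # refine the latent state
        y = reason(x, y, z)       # refine the answer with the same module
    return y, z

x = rng.normal(size=D)
y = np.zeros(D)
z = np.zeros(D)
w_halt = rng.normal(scale=0.1, size=D)

for step in range(8):
    y, z = trm_step(x, y, z)
    # no_ACT_continue-style halting: a single halt sigmoid, no continue logit.
    p_halt = sigmoid(y @ w_halt)
    if p_halt > 0.5:
        break
```

The point of the sketch is the shape of the loop: L_cycles latent refinements per answer update, H_cycles answer updates per step, and one scalar halt probability per step.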

All HRM code paths, configs, and documentation have been removed from this repository to keep the focus squarely on TRM and TRM-Text.


Quick Start

# 1. Set up a virtual environment
uv sync
source .venv/bin/activate

# 2. (Optional) authenticate TrackIO for experiment tracking
trackio login YOUR-LOGIN

ℹ️ Offline runs: TrackIO defaults to offline logging when no credentials are present—you can still inspect metrics locally.


Repository Tour

  • pretrain.py — Hydra-driven training entry point shared across ARC and text tasks
  • config/ — Hydra configuration tree (architectures, datasets, tasks)
  • dataset/ — Dataset builders and shared metadata helpers
  • datasets/ — Runtime dataset adapters (e.g., TextDataset)
  • models/ — Model components and loss heads, including the text transformer
  • evaluators/ — ARC scorer and TrackIO-friendly text evaluator
  • utils/ — Miscellaneous utilities (model loading, tokenisation helpers, etc.)
  • scripts/ — Automation helpers such as the TinyStories smoke-run script
  • tests/ — Pytest suite covering builders, tokeniser, datasets, models, configs, and the smoke CLI

Data Preparation

ARC-AGI builders

# ARC-AGI-1
python -m dataset.build_arc_dataset \
  --input-file-prefix kaggle/combined/arc-agi \
  --output-dir data/arc1concept-aug-1000 \
  --subsets training evaluation concept \
  --test-set-name evaluation

# ARC-AGI-2
python -m dataset.build_arc_dataset \
  --input-file-prefix kaggle/combined/arc-agi \
  --output-dir data/arc2concept-aug-1000 \
  --subsets training2 evaluation2 concept \
  --test-set-name evaluation2

Note: ARC-AGI-2’s training split overlaps ARC-AGI-1 evaluation data—do not train on both simultaneously if you plan to evaluate ARC-AGI-1.
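The overlap above can be caught with a quick set intersection over puzzle identifiers before combining builder outputs. The identifiers below are illustrative placeholders, not real puzzle names from the datasets:

```python
# Hypothetical pre-flight check: before evaluating on ARC-AGI-1, confirm no
# ARC-AGI-2 training puzzle ids leaked into the ARC-AGI-1 evaluation set.
arc2_train_ids = {"puzzle_001", "puzzle_002", "puzzle_777"}
arc1_eval_ids = {"puzzle_777", "puzzle_900"}

overlap = sorted(arc2_train_ids & arc1_eval_ids)
if overlap:
    print(f"Contaminated: {len(overlap)} puzzle(s) shared across splits")
```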

Tiny text builders (TinyStories/TinyChat)

  1. Assemble raw JSON/JSONL files (mixing text or messages records).
  2. Run the builder:
python -m dataset.build_tiny_text_dataset \
  --input-paths data/raw/tinystories.jsonl \
  --output-dir data/tiny-text-processed \
  --max-sequence-length 128 \
  --lowercase true

The processed directory includes train/ and validation/ splits with padded NumPy tensors, alongside dataset.json metadata and identifier mappings.
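The padded-tensor layout can be illustrated with a small NumPy sketch. The pad id, token ids, and function name here are illustrative assumptions; the real vocabulary lives in the builder's identifier mappings:

```python
import numpy as np

PAD_ID = 0    # illustrative pad token id
MAX_LEN = 8   # stands in for --max-sequence-length

def pad_batch(sequences, max_len=MAX_LEN, pad_id=PAD_ID):
    # Right-pad each token-id sequence to a fixed length (truncating overflow)
    # so the whole split can be stored as one dense NumPy tensor.
    batch = np.full((len(sequences), max_len), pad_id, dtype=np.int64)
    for i, seq in enumerate(sequences):
        ids = seq[:max_len]
        batch[i, : len(ids)] = ids
    return batch

batch = pad_batch([[5, 3, 9], [7, 1, 4, 4, 2]])
print(batch.shape)  # (2, 8)
```

Fixed-length dense tensors trade some wasted pad positions for simple, fast batched loading at training time.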


Tiny Text Workflow

  1. Smoke test (recommended) — ensures configs, dataset wiring, and TrackIO logging all resolve:
./scripts/run_tiny_text_smoke.sh --dry-run data/tiny-text-processed smoke_demo  # view command
./scripts/run_tiny_text_smoke.sh data/tiny-text-processed smoke_demo            # execute (~minutes on CPU)
  2. Custom training run — override any Hydra field inline; example single-GPU run:
python pretrain.py \
  tasks=text/tinystories \
  data_paths="[data/tiny-text-processed]" \
  global_batch_size=16 \
  epochs=5 \
  eval_interval=1 \
  run_name="tinystories_run01"

Hydra configs config/tasks/text/tinystories.yaml and config/tasks/text/tinychat.yaml define sensible defaults for architecture depth, sequence length, and logging.


ARC Training (Legacy)

torchrun --nproc-per-node 4 pretrain.py \
  arch=trm \
  data_paths="[data/arc1concept-aug-1000]" \
  arch.H_cycles=3 arch.L_cycles=4 \
  global_batch_size=768 \
  +run_name=pretrain_att_arc1 ema=True

Expect ~3 days on 4× H100 GPUs for the full ARC-AGI-1 run.


Testing & Tooling

pytest                                 # full test suite
pytest tests/utils/test_text_tokenizer.py
pytest tests/integration/test_smoke_text_training.py -k smoke

The integration test constructs a miniature TinyStories shard and checks that run_tiny_text_smoke.sh produces a valid command in --dry-run mode.


Experiment Tracking with TrackIO

  • Authenticate once via trackio login.
  • Runs default to the project name configured via the Hydra field project_name.
  • Code snapshots are archived automatically when checkpoint_path is provided.
  • To operate fully offline: omit credentials or pass a trackio mode=offline override when invoking pretrain.py.

Citation

If you build on this work, please cite the original paper:

@article{trm2025,
  title   = {Less is More: Recursive Reasoning with Tiny Networks},
  author  = {Alexia Jolicoeur-Martineau},
  journal = {arXiv preprint arXiv:2510.04871},
  year    = {2025}
}

For assistance or bug reports, open an issue or consult AGENTS.md for contributor guidelines. Happy reasoning!
