This repository contains an air hockey simulation environment powered by Box2D. It is fast (C++ back-end) and supports self-play, 1v1 play, and straightforward goal-conditioned reinforcement learning, making it a rich testbed for a variety of algorithms.
| Policy Trained for Upward Puck Velocity | Goal-Conditioned RL |
|---|---|
| ![]() | ![]() |
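Goal-conditioned training typically augments the observation with the commanded goal, so one policy can be steered toward different targets. A minimal illustration of that idea (the sizes and the actual observation layout in this environment are made up for the example):

```python
def goal_conditioned_obs(state, goal):
    """A goal-conditioned policy consumes the environment state
    concatenated with the goal it is asked to reach."""
    return list(state) + list(goal)

# e.g. an 8-dim state plus a 2-dim target puck position (illustrative sizes)
obs = goal_conditioned_obs([0.0] * 8, [0.5, 1.0])
print(len(obs))  # 10
```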
Using uv:

```shell
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create virtual environment and sync dependencies from lock file
uv sync

# For training dependencies
uv sync --extra train
```

For an editable (development) install with uv:

```shell
# Create uv virtual environment and activate it
uv venv
source .venv/bin/activate

# Install the package in development mode
uv pip install -e .

# Or if you need training too:
uv pip install -e ".[train]"
```

Or with plain pip:

```shell
# Install with training dependencies
pip install -e ".[train]"

# Or just the base package
pip install -e .
```

- Project notes and formal docs (architecture, Cursor rule mirrors): `notes/docs/index.md`
If rendering fails with:

```
AttributeError: 'MjRenderContextOffscreen' object has no attribute 'con'
```

set MuJoCo's rendering backend to GLX:

```shell
echo 'export MUJOCO_GL="glx"' >> ~/.bashrc
source ~/.bashrc
```
Most scripts accept a configuration file via the `--cfg` command-line argument, defaulting to one from `configs/`. See that directory to tune parameters for the various scripts.
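The `--cfg` pattern can be sketched as below; the default path and help text here are illustrative, not the repo's actual defaults:

```python
import argparse

def parse_cli(argv=None):
    # Each script exposes a --cfg flag; when omitted, a default config
    # from configs/ is used ("baseline.yaml" is a made-up name).
    parser = argparse.ArgumentParser()
    parser.add_argument("--cfg", default="configs/baseline.yaml",
                        help="path to a YAML configuration file")
    return parser.parse_args(argv)

args = parse_cli([])  # no flag given: fall back to the default config
print(args.cfg)
custom = parse_cli(["--cfg", "configs/baseline_configs/puck_vel_real.yaml"])
print(custom.cfg)
```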
- `airhockey2d.py`: base gym environment for air hockey
- `render.py`: renders the air hockey environment
- `train.py`: trains an agent via stable-baselines3 PPO
Legacy:
- `demonstrate.py`: user plays a self-play air hockey environment using the keyboard
- `play_trained_agent`: run after training; you can play against the trained agent
- Boot up the robot through the touchpad
  - Press the physical power button
  - Press the red power icon in the bottom-left corner of the touchpad
  - Power on the robot with the touch button in the middle
- Open the program "external_control.urp"
- Run the desired script in `scripts/real`
  - ex: `python scripts/real/teleoperate.py --cfg configs/baseline_configs/puck_vel_real.yaml`
- When prompted in the terminal, run the program using the play button in the bottom middle of the touchpad
- Follow the prompts in the terminal. Hold 'q' to end trajectories.
All commands below use `async_td3_real`, which handles collection, resets, and (optionally) training. Settings in `--args-file` are respected; CLI flags override them.
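The precedence rule (args-file settings overridden by explicitly passed CLI flags) follows the usual layered-config pattern. A minimal sketch, with hypothetical setting names rather than the script's actual internals:

```python
def merge_settings(args_file: dict, cli_flags: dict) -> dict:
    """Start from the args-file values, then let CLI flags that were
    actually provided (non-None) take precedence."""
    merged = dict(args_file)
    merged.update({k: v for k, v in cli_flags.items() if v is not None})
    return merged

file_settings = {"collector_device": "cpu", "learner_device": "cpu"}
cli = {"collector_device": None, "learner_device": "cuda:0"}
print(merge_settings(file_settings, cli))
# {'collector_device': 'cpu', 'learner_device': 'cuda:0'}
```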
Collect data without learning (the very large `--min-replay-size-before-learning` keeps the learner idle):

```shell
python -m scripts.smooth_policy.amp_history.amp_training.td3.extras.async_td3_real \
    --config configs/real_configs/rollout_td3_config.yaml \
    --model-path ex_model/new_td3_model/checkpoint_325000/training_state.pth \
    --args-file scripts/smooth_policy/amp_history/configs/td3_real_world/td3_online.yaml \
    --collector-device cpu \
    --learner-device cuda:0 \
    --episode-artifact-dir real_runs/online_run/episode_hdf5 \
    --episode-gif-dir real_runs/online_run/episode_gifs \
    --reset-artifact-dir real_runs/online_run/reset_hdf5 \
    --min-replay-size-before-learning 999999999 \
    --no-enable-periodic-checkpointing \
    --no-load-replay-from-checkpoint \
    --warm-start-hdf5-dirs
```

Online training from a pretrained checkpoint:

```shell
python -m scripts.smooth_policy.amp_history.amp_training.td3.extras.async_td3_real \
    --config configs/real_configs/rollout_td3_config.yaml \
    --model-path ex_model/td3_model/checkpoint_1515000/training_state.pth \
    --args-file scripts/smooth_policy/amp_history/configs/td3_real_world/td3_online.yaml \
    --collector-device cpu \
    --learner-device cuda:0 \
    --episode-artifact-dir real_runs/online_run/episode_hdf5 \
    --episode-gif-dir real_runs/online_run/episode_gifs \
    --reset-artifact-dir real_runs/online_run/reset_hdf5
```

Resume from a previous run's checkpoint, restoring the replay buffer:

```shell
python -m scripts.smooth_policy.amp_history.amp_training.td3.extras.async_td3_real \
    --config configs/real_configs/rollout_td3_config.yaml \
    --model-path real_runs/checkpoints/default/checkpoint_successeps_100_qupdates_1517000/training_state.pth \
    --args-file scripts/smooth_policy/amp_history/configs/td3_real_world/td3_online.yaml \
    --collector-device cpu \
    --learner-device cuda:0 \
    --episode-artifact-dir real_runs/online_run/episode_hdf5 \
    --episode-gif-dir real_runs/online_run/episode_gifs \
    --reset-artifact-dir real_runs/online_run/reset_hdf5 \
    --load-replay-from-checkpoint \
    --include-non-vital-training-state-fields
```
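Setting `--min-replay-size-before-learning 999999999` works because learner updates are typically gated on the replay buffer reaching a minimum size, so an unreachable threshold turns the run into collection-only. A minimal sketch of that gating logic (class and method names are illustrative, not the script's actual internals):

```python
class ReplayGate:
    """Gate learner updates on replay-buffer size, mirroring the effect
    of a --min-replay-size-before-learning style threshold."""

    def __init__(self, min_size: int):
        self.min_size = min_size
        self.transitions = []

    def add(self, transition) -> None:
        self.transitions.append(transition)

    def learning_enabled(self) -> bool:
        # With min_size set absurdly high (e.g. 999_999_999), this never
        # returns True, so the run only collects data.
        return len(self.transitions) >= self.min_size

gate = ReplayGate(min_size=3)
for t in range(3):
    gate.add(t)
print(gate.learning_enabled())  # True once 3 transitions are stored
```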
