A framework for adding natural language interfaces to CLI tools using locally-trained small language models. No cloud APIs, no subscriptions -- runs offline on CPU.
```bash
# Instead of memorizing flags
docker run -d -p 8080:80 --name web -e NODE_ENV=production nginx

# Just describe what you want
docker -w "run nginx on port 8080 with production env in background"
```

📢 Discussion: See the Reddit thread for technical discussion and community feedback.
Demo video: `nl-cli.mp4`
Trained on 594 Docker command examples across 8 categories (run, build, exec, compose, network, volume, system, ps/images).
| Metric | Gemma 3 1B | Gemma 3 4B |
|---|---|---|
| Accuracy | 73-76% (ceiling after 3 runs) | 94% (first try) |
| Model size | 810 MB | ~2.5 GB |
| Inference time (CPU) | ~5s | ~12s |
| Training time | 16 min on free Colab T4 | ~45 min on free Colab T4 |
| Trainable params | 13M / 1B (1.29%) | ~50M / 4B (~1.3%) |
The 1B model hits a capacity ceiling at 73-76% -- fixing one command category causes regressions in others (the "whack-a-mole effect"). The 4B model holds all flag patterns simultaneously without trading accuracy between categories. Full analysis in the Reddit discussion.
Per-category accuracy for the 4B model:

| Category | Accuracy | Category | Accuracy |
|---|---|---|---|
| run | 96.2% | network | 100% |
| build | 90.0% | volume | 100% |
| compose | 100% | system | 100% |
| exec | 84.6% | ps/images | 87.5% |
```bash
# Clone and install
git clone https://github.com/pranavkumaarofficial/nlcli-wizard.git
cd nlcli-wizard
pip install -e .
# Download the 4B GGUF model (~2.5GB) and place in models/
# (HuggingFace repo: pranavkumaarofficial/nlcli-gemma3-docker)
```
```bash
# Translate
python -m nlcli_wizard.cli translate --cli-tool docker "run nginx on port 8080 in background"
# Command: docker run -d -p 8080:80 nginx
# Confidence: 95%
# Runs nginx container in detached mode, mapping port 8080 to 80
python -m nlcli_wizard.cli translate --cli-tool docker "stop container web"
# Command: docker stop web
# Confidence: 95%
# Stops web container
```

The training notebook runs on free Colab T4 with step-by-step explanations. No ML experience required.

```bash
# 1. Generate training data for your CLI tool
python -m nlcli_wizard.dataset_docker # generates data/docker_training.jsonl
# 2. Open the Colab notebook and train (free T4 GPU)
# 3. Download the GGUF model and place in models/
# 4. Run evaluation
python test/evaluate_docker.py
```

How a request flows through the pipeline:

```
User: "scale web service to 3 instances"
|
v
Prompt: "<start_of_turn>user\nTranslate to docker command: ...<end_of_turn>\n<start_of_turn>model\n"
|
v
Gemma 3 4B (fine-tuned, quantized Q4_K_M, running on CPU via llama.cpp)
|
v
COMMAND: docker-compose up --scale web=3
CONFIDENCE: 0.92
EXPLANATION: Scales the web service to 3 replicas
|
v
Preview -> Confirm -> Execute
```
The model outputs a structured COMMAND / CONFIDENCE / EXPLANATION format. The agent parses this and asks for confirmation before executing.
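For illustration, here is a minimal sketch of that parse-and-confirm loop. The function name and confirmation flow are assumptions, not the project's actual `agent.py` code; only the COMMAND / CONFIDENCE / EXPLANATION format comes from the model output above.

```python
import re
import subprocess

def parse_model_output(text: str) -> dict:
    """Extract COMMAND / CONFIDENCE / EXPLANATION lines (illustrative sketch)."""
    fields = {}
    for key in ("COMMAND", "CONFIDENCE", "EXPLANATION"):
        match = re.search(rf"^{key}:\s*(.+)$", text, re.MULTILINE)
        fields[key.lower()] = match.group(1).strip() if match else None
    return fields

raw = (
    "COMMAND: docker-compose up --scale web=3\n"
    "CONFIDENCE: 0.92\n"
    "EXPLANATION: Scales the web service to 3 replicas"
)
parsed = parse_model_output(raw)

# Preview -> Confirm -> Execute
print(f"Command: {parsed['command']}")
print(f"Confidence: {float(parsed['confidence']):.0%}")
print(parsed["explanation"])
if input("Run this command? [y/N] ").strip().lower() == "y":
    subprocess.run(parsed["command"], shell=True)
```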
The framework is tool-agnostic. To add support for a new CLI tool:
- Write a dataset generator -- parse `--help` output, generate NL variations for each command
- Train on Colab -- swap the dataset file, run the notebook
- Drop in the GGUF -- place the quantized model in `models/`
- Register in MODEL_REGISTRY -- add an entry in `model.py` (see the sketch below)
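For the last step, a registry entry might look roughly like this. The actual schema in `model.py` may differ; every field name and the filename below are assumptions:

```python
# model.py -- hypothetical MODEL_REGISTRY entry; the real schema may differ.
MODEL_REGISTRY = {
    "docker": {
        "model_path": "models/nlcli-gemma3-docker-q4_k_m.gguf",  # quantized GGUF
        "base_model": "gemma-3-4b-it",
        "n_threads": 4,      # CPU inference threads
        "max_tokens": 128,   # room for COMMAND / CONFIDENCE / EXPLANATION
    },
}
```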
Project layout:

```
nlcli-wizard/
  nlcli_wizard/
    cli.py                  # CLI interface
    model.py                # Model loading, MODEL_REGISTRY
    agent.py                # Prompt formatting, output parsing
    dataset.py              # Venvy dataset generator
    dataset_docker.py       # Docker dataset generator (594 examples)
  training/
    nlcli_wizard_training_[PUBLIC].ipynb  # Colab training notebook
  test/
    evaluate_docker.py      # Per-category accuracy evaluation
  data/
    docker_training.jsonl   # Generated training data
  models/
    *.gguf                  # Quantized models (gitignored)
  scripts/
    docker-wizard.sh        # Shell wrapper
    docker-wizard.ps1       # PowerShell wrapper
    plot_comparison.py      # Generate comparison charts
```
- Base model: Gemma 3 4B-Instruct (via Unsloth)
- Training: QLoRA with Unsloth on free Colab T4
- Quantization: GGUF Q4_K_M with importance matrix via llama.cpp
- Inference: llama-cpp-python, CPU, 4 threads
- Output format: Structured COMMAND/CONFIDENCE/EXPLANATION
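As a sketch of how these pieces fit together, the following shows CPU inference with llama-cpp-python. The GGUF filename is an assumption, and the prompt mirrors the Gemma chat template from the pipeline diagram above:

```python
from llama_cpp import Llama

# Load the quantized GGUF on CPU (filename is an assumption).
llm = Llama(
    model_path="models/nlcli-gemma3-docker-q4_k_m.gguf",
    n_ctx=2048,      # context window
    n_threads=4,     # CPU threads, matching the setup above
    verbose=False,
)

# Gemma chat template, as in the pipeline diagram.
prompt = (
    "<start_of_turn>user\n"
    "Translate to docker command: run nginx on port 8080 in background"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)

result = llm(prompt, max_tokens=128, stop=["<end_of_turn>"])
print(result["choices"][0]["text"])
# Expected shape:
# COMMAND: docker run -d -p 8080:80 nginx
# CONFIDENCE: 0.95
# EXPLANATION: Runs nginx detached, mapping port 8080 to 80
```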
| Tool | Dataset | Model | Accuracy | Status |
|---|---|---|---|---|
| Docker | 594 examples | Gemma 3 4B | 94% | Available |
| Venvy | 1,500 examples | Gemma 3 1B | 83% | Available |
| Kubernetes | -- | -- | -- | Planned |
| Git | -- | -- | -- | Planned |
The first tool integrated was venvy, a fast Python virtual environment manager:
"show my environments sorted by size" -> venvy ls --sort size
"register this project as myenv" -> venvy register --name myenv
"clean up old venvs" -> venvy cleanup --days 90
Trained on Gemma 3 1B with 1,500 verified examples, reaching 83% accuracy. This was the proof of concept that validated the architecture before moving to Docker and the 4B model.
- Venvy proof-of-concept (Gemma 3 1B, 83% accuracy)
- Docker support (Gemma 3 4B, 94% accuracy)
- 1B vs 4B comparison with per-category analysis
- Training notebook with step-by-step explanations
- Auto-ingestion pipeline: `--help` docs in, training data out, weights packaged
- Error-correction feedback loop (command fails -> suggest fix)
- PyPI package release
- Kubernetes and Git datasets
The end goal: any CLI tool maintainer can point this at their docs, generate training data, fine-tune a model, and ship weights alongside their package. Their users get `tool -w "what I want to do"` for free.
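To make "docs in, training data out" concrete, here is a rough sketch of the kind of generator that could emit such data. The JSONL schema, field names, and phrasings are illustrative assumptions, not the actual `dataset_docker.py` format:

```python
import itertools
import json

# Hypothetical command templates paired with natural-language phrasings.
TEMPLATES = [
    {
        "command": "docker stop {name}",
        "phrasings": ["stop container {name}", "shut down {name}"],
        "explanation": "Stops the {name} container",
    },
]

def generate(names=("web", "db")):
    """Yield one training example per (template, name, phrasing) combination."""
    for template, name in itertools.product(TEMPLATES, names):
        for phrasing in template["phrasings"]:
            yield {
                "instruction": f"Translate to docker command: {phrasing.format(name=name)}",
                "output": (
                    f"COMMAND: {template['command'].format(name=name)}\n"
                    "CONFIDENCE: 0.95\n"
                    f"EXPLANATION: {template['explanation'].format(name=name)}"
                ),
            }

with open("docker_training_sample.jsonl", "w") as f:
    for example in generate():
        f.write(json.dumps(example) + "\n")
```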
See CONTRIBUTING.md for details on:
- Adding new CLI tool support
- Improving dataset quality
- Testing and evaluation
Built by Pranav Kumaar | nlcli-wizard | venvy
