Version: 0.3.0 | Last Updated: December 2025
This document contains comprehensive documentation for MLCLI - Machine Learning Command Line Interface.
- Installation
- Project Structure
- CLI Commands
- Configuration Files
- Hyperparameter Tuning
- Model Explainability
- Data Preprocessing
- Interactive TUI
- Experiment Tracking
- Extending MLCLI
- Troubleshooting
Install from PyPI:

```bash
pip install mlcli-toolkit
```

Or install from source:

```bash
git clone https://github.com/codeMaestro78/MLcli.git
cd MLcli
```

Create Virtual Environment:

```bash
# Windows (PowerShell)
python -m venv .venv
.\.venv\Scripts\Activate.ps1

# Linux/macOS
python -m venv .venv
source .venv/bin/activate
```

Install Dependencies:

```bash
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
```

Verify Installation:

```bash
mlcli --help
```

Project structure:

```text
mlcli/
├── mlcli/
│   ├── __init__.py
│   ├── __main__.py
│   ├── cli.py
│   ├── config/
│   │   ├── __init__.py
│   │   └── loader.py
│   ├── trainers/
│   │   ├── __init__.py
│   │   ├── base_trainer.py
│   │   ├── logistic_trainer.py
│   │   ├── svm_trainer.py
│   │   ├── rf_trainer.py
│   │   ├── xgb_trainer.py
│   │   ├── tf_dnn_trainer.py
│   │   ├── tf_cnn_trainer.py
│   │   └── tf_rnn_trainer.py
│   ├── tuner/
│   │   ├── __init__.py
│   │   ├── base_tuner.py
│   │   ├── grid_tuner.py
│   │   ├── random_tuner.py
│   │   └── optuna_tuner.py
│   ├── explainer/
│   │   ├── __init__.py
│   │   ├── base_explainer.py
│   │   ├── shap_explainer.py
│   │   ├── lime_explainer.py
│   │   └── explainer_factory.py
│   ├── preprocessor/
│   │   ├── __init__.py
│   │   ├── base_preprocessor.py
│   │   ├── scalers.py
│   │   ├── normalizers.py
│   │   ├── encoders.py
│   │   ├── feature_selectors.py
│   │   ├── preprocessor_factory.py
│   │   └── pipeline.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── io.py
│   │   ├── metrics.py
│   │   ├── logger.py
│   │   └── registry.py
│   ├── runner/
│   │   ├── __init__.py
│   │   └── experiment_tracker.py
│   └── ui/
│       ├── __init__.py
│       └── app.py
├── configs/
├── data/
├── artifacts/
├── logs/
├── runs/
├── docs/
├── README.md
├── pyproject.toml
└── requirements.txt
```

```bash
mlcli list-models
```

Output:

```text
Available Model Trainers:
================================================================================
logistic_regression Logistic Regression Classifier [sklearn]
svm Support Vector Machine Classifier [sklearn]
random_forest Random Forest Classifier [sklearn]
xgboost XGBoost Gradient Boosting Classifier [xgboost]
tf_dnn TensorFlow Dense Neural Network [tensorflow]
tf_cnn TensorFlow CNN for Image Classification [tensorflow]
tf_rnn TensorFlow RNN for Sequence Data [tensorflow]
================================================================================
```

```bash
# Train with configuration file
mlcli train --config <path-to-config.json>
# Examples
mlcli train --config configs/logistic_config.json
mlcli train --config configs/rf_config.json
mlcli train --config configs/xgb_config.json
mlcli train --config configs/tf_dnn_config.json
# Train with parameter overrides
mlcli train --config configs/tf_dnn_config.json --epochs 50 --batch-size 64
```

```bash
mlcli eval --model-path <path-to-model> --data-path <path-to-test-data> --model-type <model-type>
# Examples
mlcli eval --model-path artifacts/model.pkl --data-path data/test.csv --model-type logistic_regression
mlcli eval --model-path artifacts/model.joblib --data-path data/test.csv --model-type random_forest
mlcli eval --model-path artifacts/model.h5 --data-path data/test.csv --model-type tf_dnn
```

```bash
# List all experiment runs
mlcli list-runs
# Show details of a specific run
mlcli show-run <run-id>
# Export all runs to CSV
mlcli export-runs --output experiments.csv
```

Launch the interactive TUI:

```bash
mlcli ui
```

Configuration files share a common JSON structure:

```json
{
  "model": {
    "type": "<model-type>",
    "params": { ... }
  },
  "dataset": {
    "path": "<path-to-data>",
    "type": "csv",
    "target_column": "<target-column-name>"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["pickle", "joblib"]
  }
}
```
"model": {
"type": "logistic_regression",
"params": {
"penalty": "l2",
"C": 1.0,
"solver": "lbfgs",
"max_iter": 1000
}
},
"dataset": {
"path": "data/train.csv",
"type": "csv",
"target_column": "target"
},
"training": {
"test_size": 0.2,
"random_state": 42
},
"output": {
"model_dir": "artifacts",
"save_formats": ["pickle", "joblib"]
}
}{
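For intuition, here is a rough sketch of what a trainer does with such a config. It is illustrative only (paths and the `target` column are taken from the example above); the real logic lives in the trainer classes under `mlcli/trainers/`.

```python
# Illustrative sketch only -- not MLCLI's actual trainer code.
import json
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

with open("configs/logistic_config.json") as f:
    cfg = json.load(f)

# Load the dataset and split off the target column named in the config.
df = pd.read_csv(cfg["dataset"]["path"])
y = df[cfg["dataset"]["target_column"]]
X = df.drop(columns=[cfg["dataset"]["target_column"]])

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=cfg["training"]["test_size"],
    random_state=cfg["training"]["random_state"],
)

# Model params map directly onto the estimator's constructor.
model = LogisticRegression(**cfg["model"]["params"])
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```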
"model": {
"type": "random_forest",
"params": {
"n_estimators": 100,
"max_depth": null,
"min_samples_split": 2,
"min_samples_leaf": 1,
"random_state": 42
}
},
"dataset": {
"path": "data/train.csv",
"type": "csv",
"target_column": "target"
},
"training": {
"test_size": 0.2,
"random_state": 42
},
"output": {
"model_dir": "artifacts",
"save_formats": ["pickle", "joblib"]
}
}{
"model": {
"type": "xgboost",
"params": {
"n_estimators": 100,
"max_depth": 6,
"learning_rate": 0.1,
"subsample": 0.8,
"colsample_bytree": 0.8,
"early_stopping_rounds": 10,
"random_state": 42
}
},
"dataset": {
"path": "data/train.csv",
"type": "csv",
"target_column": "target"
},
"training": {
"test_size": 0.2,
"random_state": 42
},
"output": {
"model_dir": "artifacts",
"save_formats": ["pickle", "joblib"]
}
}{
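One detail worth noting: `early_stopping_rounds` only takes effect when a validation set is supplied at fit time. A rough standalone sketch with a recent XGBoost version (illustrative, not the trainer's actual code):

```python
# Illustrative use of the xgb_config params with early stopping (xgboost >= 1.6).
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("data/train.csv")
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = XGBClassifier(
    n_estimators=100, max_depth=6, learning_rate=0.1,
    subsample=0.8, colsample_bytree=0.8,
    early_stopping_rounds=10, random_state=42,
)
# Early stopping monitors the eval_set; without it the setting is ignored.
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
```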
"model": {
"type": "tf_dnn",
"params": {
"layers": [128, 64, 32],
"activation": "relu",
"dropout": 0.3,
"optimizer": "adam",
"learning_rate": 0.001,
"epochs": 20,
"batch_size": 32,
"early_stopping": true,
"patience": 5
}
},
"dataset": {
"path": "data/train.csv",
"type": "csv",
"target_column": "target"
},
"training": {
"test_size": 0.2,
"random_state": 42
},
"output": {
"model_dir": "artifacts",
"save_formats": ["h5", "savedmodel"]
}
}| Method | Name | Best For |
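The `tf_dnn` parameters above describe, roughly, a Keras model like the following (an illustrative sketch, not the trainer's actual implementation; the loss function is an assumption):

```python
# Illustrative sketch of the architecture the tf_dnn params describe.
import tensorflow as tf

def build_dnn(input_dim: int, num_classes: int) -> tf.keras.Model:
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(input_dim,)))
    for units in [128, 64, 32]:                      # "layers"
        model.add(tf.keras.layers.Dense(units, activation="relu"))   # "activation"
        model.add(tf.keras.layers.Dropout(0.3))      # "dropout"
    model.add(tf.keras.layers.Dense(num_classes, activation="softmax"))
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),     # "optimizer" / "learning_rate"
        loss="sparse_categorical_crossentropy",      # assumed loss for integer class labels
        metrics=["accuracy"],
    )
    return model

# "early_stopping" / "patience" map onto a Keras callback:
early_stop = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
# model.fit(X_train, y_train, epochs=20, batch_size=32,
#           validation_split=0.1, callbacks=[early_stop])
```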
MLCLI supports three hyperparameter tuning methods:

| Method | Name | Best For |
|---|---|---|
| `grid` | Grid Search | Small parameter spaces with discrete values |
| `random` | Random Search | Large parameter spaces, continuous params |
| `bayesian` | Bayesian Optimization (Optuna) | Expensive evaluations, complex param spaces |
```bash
# List available tuning methods
mlcli list-tuners
# Tune with Grid Search
mlcli tune --config configs/tune_rf_config.json --method grid --cv 5
# Tune with Random Search
mlcli tune --config configs/tune_rf_config.json --method random --n-trials 100 --cv 5
# Tune with Bayesian Optimization
mlcli tune --config configs/tune_xgb_config.json --method bayesian --n-trials 200 --scoring accuracy
# Tune and train best model
mlcli tune --config configs/tune_rf_config.json --method random --n-trials 50 --train-best
```

`mlcli tune` options:

| Option | Description |
|---|---|
| `--config, -c` | Path to tuning configuration file |
| `--method, -m` | Tuning method: `grid`, `random`, or `bayesian` |
| `--n-trials, -n` | Number of trials (for random/bayesian) |
| `--cv` | Number of cross-validation folds |
| `--scoring, -s` | Metric to optimize: accuracy, f1, roc_auc, precision, recall |
| `--output, -o` | Path to save tuning results (JSON) |
| `--train-best` | Train a model with the best params after tuning |
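Conceptually, the `grid` method corresponds to scikit-learn's `GridSearchCV` over a discrete `param_space` such as the one in the next example config. A minimal sketch under that assumption (not MLCLI's internal code):

```python
# Rough conceptual equivalent of `mlcli tune --method grid --cv 5`.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Subset of the discrete search space from tune_rf_config.json.
param_space = {
    "n_estimators": [50, 100, 200, 300],
    "max_depth": [5, 10, 15, 20, None],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_space,
    cv=5,
    scoring="accuracy",
)
# search.fit(X_train, y_train)
# print(search.best_params_, search.best_score_)
```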
Example `configs/tune_rf_config.json` (discrete search space for grid/random search):

```json
{
  "model": {
    "type": "random_forest",
    "params": {}
  },
  "dataset": {
    "path": "data/train.csv",
    "type": "csv",
    "target_column": "target"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "tuning": {
    "param_space": {
      "n_estimators": [50, 100, 200, 300],
      "max_depth": [5, 10, 15, 20, null],
      "min_samples_split": [2, 5, 10],
      "min_samples_leaf": [1, 2, 4],
      "max_features": ["sqrt", "log2"]
    }
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["pickle", "joblib"]
  }
}
```
"model": {
"type": "xgboost",
"params": {}
},
"dataset": {
"path": "data/train.csv",
"type": "csv",
"target_column": "target"
},
"training": {
"test_size": 0.2,
"random_state": 42
},
"tuning": {
"param_space": {
"n_estimators": { "type": "int", "low": 50, "high": 500 },
"max_depth": { "type": "int", "low": 3, "high": 15 },
"learning_rate": { "type": "loguniform", "low": 0.01, "high": 0.3 },
"subsample": { "type": "uniform", "low": 0.6, "high": 1.0 },
"colsample_bytree": { "type": "uniform", "low": 0.6, "high": 1.0 },
"min_child_weight": { "type": "int", "low": 1, "high": 10 }
}
},
"output": {
"model_dir": "artifacts",
"save_formats": ["pickle", "joblib"]
}
}
```

Supported `param_space` value types:

| Type | Description | Example |
|---|---|---|
| list/tuple | Discrete choices | `[50, 100, 200]` |
| `int` | Integer range | `{"type": "int", "low": 1, "high": 100}` |
| `uniform` | Uniform float | `{"type": "uniform", "low": 0.0, "high": 1.0}` |
| `loguniform` | Log-uniform float | `{"type": "loguniform", "low": 0.001, "high": 1.0}` |
| `categorical` | Choice | `{"type": "categorical", "choices": ["a", "b"]}` |
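The `bayesian` method is backed by Optuna, and specs like those above map onto Optuna trial suggestions roughly as follows (an illustrative sketch, not the actual `optuna_tuner.py` code):

```python
# Illustrative mapping from param_space specs to Optuna suggestions.
import optuna

def objective(trial: optuna.Trial) -> float:
    params = {
        # {"type": "int", "low": 50, "high": 500}
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        # {"type": "loguniform", "low": 0.01, "high": 0.3}
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        # {"type": "uniform", "low": 0.6, "high": 1.0}
        "subsample": trial.suggest_float("subsample", 0.6, 1.0),
        # {"type": "categorical", "choices": ["sqrt", "log2"]}
        "max_features": trial.suggest_categorical("max_features", ["sqrt", "log2"]),
    }
    # ... train a model with `params` and return the cross-validated score ...
    return 0.0  # placeholder

study = optuna.create_study(direction="maximize")
# study.optimize(objective, n_trials=200)
```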
| Method | Full Name | Best For |
|---|---|---|
| `shap` | SHapley Additive exPlanations | Tree-based models, global explanations |
| `lime` | Local Interpretable Model-agnostic Explanations | Any model, local explanations |
```bash
# List available explainers
mlcli list-explainers
# Explain model with SHAP
mlcli explain --model models/rf_model.pkl --data data/train.csv --type random_forest --method shap
# Explain model with LIME
mlcli explain --model models/xgb_model.pkl --data data/train.csv --type xgboost --method lime
# Explain with plot output
mlcli explain -m models/rf_model.pkl -d data/train.csv -t random_forest -e shap --plot-output feature_importance.png
# Explain single instance
mlcli explain-instance --model models/rf_model.pkl --data data/test.csv --type random_forest --instance 0
mlcli explain-instance -m models/xgb_model.pkl -d data/test.csv -t xgboost -i 5 -e lime
```

`mlcli explain` options:

| Option | Description |
|---|---|
| `--model, -m` | Path to saved model file |
| `--data, -d` | Path to data file |
| `--type, -t` | Model type (random_forest, xgboost, logistic_regression) |
| `--method, -e` | Explanation method: `shap` or `lime` |
| `--num-samples, -n` | Number of samples to explain (default: 100) |
| `--output, -o` | Path to save explanation results (JSON) |
| `--plot/--no-plot` | Generate explanation plot |
| `--plot-output, -p` | Path to save plot (PNG) |
| Feature | SHAP | LIME |
|---|---|---|
| Type | Global + Local | Local |
| Theory | Game Theory (Shapley Values) | Local Surrogate Models |
| Best For | Tree models (RF, XGBoost) | Any black-box model |
| Speed | Fast for trees | Slower (samples required) |
| Consistency | Mathematically consistent | Varies by sampling |
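For reference, the SHAP path is conceptually similar to calling the `shap` library directly on a fitted tree model. A minimal standalone sketch, assuming a `target` column as in the configs above (not MLCLI's `shap_explainer.py`):

```python
# Minimal standalone SHAP example for a tree-based model.
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("data/train.csv")
X, y = df.drop(columns=["target"]), df["target"]

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# TreeExplainer is fast for tree ensembles (RF, XGBoost).
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:100])   # explain the first 100 rows

# shap.summary_plot(shap_values, X.iloc[:100])      # global feature-importance plot
```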
| Category | Method | Description |
|---|---|---|
| Scaling | `standard_scaler` | Standardize to zero mean, unit variance |
| | `minmax_scaler` | Scale to range (default 0-1) |
| | `robust_scaler` | Scale using median/IQR (outlier-resistant) |
| Normalization | `normalizer` | Normalize samples to unit norm |
| | `l1_normalizer` | L1 norm normalization |
| | `l2_normalizer` | L2 norm normalization |
| Encoding | `label_encoder` | Encode labels to 0 to n_classes-1 |
| | `onehot_encoder` | One-hot encode categorical features |
| | `ordinal_encoder` | Ordinal encode categorical features |
| Feature Selection | `select_k_best` | Select top K features |
| | `rfe` | Recursive Feature Elimination |
| | `variance_threshold` | Remove low-variance features |
```bash
# List available preprocessors
mlcli list-preprocessors
# StandardScaler
mlcli preprocess --data data/train.csv --output data/train_scaled.csv --method standard_scaler
# MinMaxScaler
mlcli preprocess -d data/train.csv -o data/train_minmax.csv -m minmax_scaler --range-min 0 --range-max 1
# RobustScaler (outlier-resistant)
mlcli preprocess -d data/train.csv -o data/train_robust.csv -m robust_scaler
# Normalize Data (L2 norm)
mlcli preprocess -d data/train.csv -o data/train_norm.csv -m normalizer --norm l2
# Feature Selection with SelectKBest
mlcli preprocess -d data/train.csv -o data/train_selected.csv -m select_k_best --target label --k 10
# Feature Selection with RFE
mlcli preprocess -d data/train.csv -o data/train_rfe.csv -m rfe --target label --k 15
# Remove Low-Variance Features
mlcli preprocess -d data/train.csv -o data/train_var.csv -m variance_threshold --threshold 0.1
# Save Fitted Preprocessor
mlcli preprocess -d data/train.csv -o data/train_scaled.csv -m standard_scaler --save-preprocessor models/scaler.pkl
# Preprocessing Pipeline (Multiple Steps)
mlcli preprocess-pipeline --data data/train.csv --output data/processed.csv --steps "standard_scaler,select_k_best" --target label
```

`mlcli preprocess` options:

| Option | Description |
|---|---|
| `--data, -d` | Path to input CSV data |
| `--output, -o` | Path to save preprocessed data |
| `--method, -m` | Preprocessing method |
| `--target, -t` | Target column (for feature selection) |
| `--columns, -c` | Specific columns to preprocess |
| `--k` | Number of features to keep (SelectKBest/RFE) |
| `--threshold` | Variance threshold |
| `--norm` | Norm type (l1, l2, max) |
| `--range-min, --range-max` | MinMaxScaler range |
| `--save-preprocessor, -s` | Save the fitted preprocessor |
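The `standard_scaler,select_k_best` pipeline above is conceptually equivalent to a scikit-learn pipeline like the following (a rough sketch assuming a `label` target column, not MLCLI's internal code):

```python
# Rough scikit-learn equivalent of "standard_scaler,select_k_best".
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("data/train.csv")
X, y = df.drop(columns=["label"]), df["label"]

pipeline = Pipeline([
    ("scale", StandardScaler()),                           # standard_scaler
    ("select", SelectKBest(score_func=f_classif, k=10)),   # select_k_best with --k 10
])
X_processed = pipeline.fit_transform(X, y)
print(X_processed.shape)
```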
Launch the interactive terminal UI:
```bash
mlcli ui
```

The TUI provides the following screens:

- Train Model - Select config, model type, and override parameters
- Evaluate Model - Load and evaluate saved models
- View Experiments - Browse, filter, and export experiment history
- List Models - View all registered trainers with metadata

Keyboard shortcuts:

| Key | Action |
|---|---|
| `h` | Go to Home screen |
| `q` | Quit application |
| `Enter` | Select/Confirm |
| `↑`/`↓` | Navigate lists |
| `Tab` | Move between fields |
MLCLI includes a built-in experiment tracker that logs every training run. Each run records:
- Run ID (UUID)
- Model type
- Configuration parameters
- Training metrics (accuracy, precision, recall, F1, etc.)
- Training duration
- Timestamp
```bash
# List all runs
mlcli list-runs
# Show specific run details
mlcli show-run <run-id>
# Export to CSV
mlcli export-runs --output experiments.csv
```

All experiment data is stored in the `runs/` directory as JSON files.
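Because runs are plain JSON files, they are easy to analyze outside MLCLI as well. The sketch below loads them into a pandas DataFrame; the field names (`run_id`, `model_type`, `metrics`) are assumptions about the stored schema and may differ:

```python
# Load all experiment-run JSON files into one DataFrame for ad-hoc analysis.
# NOTE: the field names ("run_id", "model_type", "metrics") are assumptions
# about the stored schema and may differ in your version of MLCLI.
import json
from pathlib import Path
import pandas as pd

records = []
for path in Path("runs").glob("*.json"):
    with open(path) as f:
        run = json.load(f)
    records.append({
        "run_id": run.get("run_id"),
        "model_type": run.get("model_type"),
        **run.get("metrics", {}),   # accuracy, precision, recall, f1, ...
    })

df = pd.DataFrame(records)
print(df.head())
```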
To add a custom model trainer:

- Create a new file in `mlcli/trainers/`:

```python
from mlcli.trainers.base_trainer import BaseTrainer
from mlcli.utils.registry import register_model


@register_model(
    name="my_custom_model",
    description="My Custom Model Trainer",
    framework="custom",
    model_type="classification",
)
class MyCustomTrainer(BaseTrainer):
    def train(self, X_train, y_train, X_val=None, y_val=None):
        # Implementation
        pass

    def evaluate(self, X_test, y_test):
        # Implementation
        pass

    def predict(self, X):
        # Implementation
        pass

    @classmethod
    def get_default_params(cls):
        return {"param1": "value1"}
```

- Import it in `mlcli/trainers/__init__.py`:

```python
from mlcli.trainers.my_custom_trainer import MyCustomTrainer
```

The model will be automatically registered and available via the CLI.
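For context, `@register_model` plausibly works as a simple decorator-based registry along these lines (a hypothetical sketch, not the actual contents of `mlcli/utils/registry.py`):

```python
# Hypothetical sketch of a decorator-based model registry.
MODEL_REGISTRY: dict[str, dict] = {}

def register_model(name: str, description: str = "", framework: str = "", model_type: str = ""):
    def decorator(cls):
        # Importing the module runs this decorator, which is why step 2 above matters.
        MODEL_REGISTRY[name] = {
            "class": cls,
            "description": description,
            "framework": framework,
            "model_type": model_type,
        }
        return cls
    return decorator

def get_trainer(name: str):
    return MODEL_REGISTRY[name]["class"]
```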
Make sure the virtual environment is activated and mlcli is installed:

```bash
.\.venv\Scripts\Activate.ps1
pip install -e .
```

Install in development mode:

```bash
pip install -e .
```

Ensure your data file exists at the path specified in the config file.

Neural networks need standardized features. Use preprocessing:

```bash
mlcli preprocess -d data/train.csv -o data/train_scaled.csv -m standard_scaler
```

Install skl2onnx:

```bash
pip install skl2onnx
```

Install optuna for Bayesian optimization:

```bash
pip install optuna
```

Install SHAP and LIME:

```bash
pip install shap lime matplotlib
```

Quick command reference:

| Task | Command |
|---|---|
| Install | pip install mlcli-toolkit |
| Show help | mlcli --help |
| List models | mlcli list-models |
| List tuners | mlcli list-tuners |
| List explainers | mlcli list-explainers |
| List preprocessors | mlcli list-preprocessors |
| Train model | mlcli train --config <config.json> |
| Tune hyperparameters | mlcli tune -c <config> -m random -n 100 |
| Explain model (SHAP) | mlcli explain -m <model.pkl> -d <data.csv> -t <type> -e shap |
| Explain instance | mlcli explain-instance -m <model.pkl> -d <data.csv> -t <type> -i <idx> |
| Preprocess data | mlcli preprocess -d <data.csv> -o <output.csv> -m standard_scaler |
| Evaluate model | mlcli eval --model-path <path> --data-path <path> --model-type <type> |
| List runs | mlcli list-runs |
| Export runs | mlcli export-runs --output <file.csv> |
| Launch UI | mlcli ui |
This project is licensed under the MIT License.