This project provides a general-purpose Flask web application for serving simple classification or regression models trained using LightGBM (LGBMClassifier or LGBMRegressor).
The key feature is that the application is driven by a configuration file (config.json) which defines the model to use, the expected input features (names, types, valid ranges/options), and target variable details. This allows deploying different LightGBM models with varying features without changing the core application code.
- Web Interface: Simple HTML form generated dynamically based on the configuration.
- Configuration Driven: Model path, feature definitions (name, type, constraints), and target info are all loaded from `config.json`.
- Supports LightGBM: Specifically designed for models saved using LightGBM's `Booster.save_model()` method (typically `.txt` files).
- Classification & Regression: Handles both `LGBMClassifier` and `LGBMRegressor` based on the configuration.
- Dynamic UI Elements:
  - Dropdowns (`<select>`) for categorical features based on options in the config.
  - Number inputs (`<input type="number">`) for numerical features with min/max hints from the config.
- Input Validation: Basic checks for missing values and valid options for categorical features.
- Prediction Display: Shows the prediction result (class name/index/probability for classifiers, value for regressors) on the web page.
- Configuration Generation Helper: Includes a script (`generate_config.py`) to create a configuration template from a Pandas DataFrame.
- Example Training Script: Provides `train_example.py` demonstrating how to train a model, save it, and use the config generator.
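To make the configuration-driven form more concrete, here is a minimal sketch of how one feature entry could be turned into an HTML control. It is not taken from `app.py`: the helper name `render_field` is hypothetical, the feature dictionaries assume the `name`/`type`/`options`/`range` keys described for `config.json`, and expressing the range hints as `min`/`max` attributes is an assumption.

```python
from html import escape


def render_field(feature: dict) -> str:
    """Return an HTML form control for one feature entry (hypothetical helper)."""
    name = escape(feature["name"], quote=True)
    if feature["type"] == "categorical":
        # One <option> per allowed value listed in the config.
        options = "".join(
            f'<option value="{escape(str(opt), quote=True)}">{escape(str(opt))}</option>'
            for opt in feature.get("options", [])
        )
        return f'<select name="{name}" required>{options}</select>'
    # Numerical feature: expose the configured range as min/max attributes.
    low, high = feature.get("range", (None, None))
    attrs = f'type="number" step="any" name="{name}" required'
    if low is not None:
        attrs += f' min="{low}"'
    if high is not None:
        attrs += f' max="{high}"'
    return f"<input {attrs}>"


features = [
    {"name": "sepal length (cm)", "type": "numerical", "range": [4.3, 7.9]},
    {"name": "garden_location", "type": "categorical", "options": ["Mixed", "Shady", "Sunny"]},
]
form_fields = [render_field(f) for f in features]
```

In a Flask app the same decision would more likely live in the Jinja template; the point is only that the control type follows from the feature's `type`.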
.
├── models/ # Directory to store trained model files
│ └── iris_lgbm_classifier.txt # Example trained model
├── templates/ # Flask HTML templates
│ ├── index.html # Main prediction form template
│ └── error.html # Template for application errors
├── app.py # The main Flask application logic
├── config.json # Configuration file driving the app (NEEDS MANUAL EDITING)
├── generate_config.py # Helper script to generate config template from data
├── train_example.py # Example script to train a model and generate config template
├── requirements.txt # Python dependencies
└── README.md # This file
- Clone the Repository:

      git clone <your-repository-url>
      cd <repository-directory>

  Or download and extract the source code.
- Create a Virtual Environment (Recommended):

      python -m venv venv
      source venv/bin/activate  # On Windows use `venv\Scripts\activate`

- Install Dependencies:

      pip install -r requirements.txt

This is the most important file for customizing the application. You need to create/edit config.json in the root directory.
- Generation: You can generate a template using the `generate_config.py` script after preparing your training data in a Pandas DataFrame:

      # Example: Generate template from 'my_data.csv' where 'target_var' is the target
      # python generate_config.py target_var --data_path my_data.csv -o config_template.json
      # (The current generate_config.py uses dummy data if no path specified)
      python generate_config.py <your_target_column_name> -o config_template.json

  This creates `config_template.json`. You MUST review and edit this template.
- Manual Editing:
  - Rename `config_template.json` (or your generated file) to `config.json`.
  - Open `config.json` and edit the following fields:
    - `"model_type"`: Set to `"classifier"` or `"regressor"`.
    - `"model_path"`: Crucial: Set the relative path to your saved LightGBM model file (e.g., `"models/my_model.txt"`). The model should be saved using `booster.save_model()`.
    - `"target_name"`: Name of the variable being predicted.
    - `"target_type"`: `"categorical"` or `"numerical"`.
    - `"class_names"`: Required for classifiers if you want readable class names instead of indices. This must be a list of strings in the exact order of the classes used during model training (often determined by `LabelEncoder`'s `.classes_` attribute, or by the order of categories if using the Pandas `Categorical` type directly). Example: `["setosa", "versicolor", "virginica"]`.
    - `"features"`: Review the list of features. Ensure `type` (`"numerical"` or `"categorical"`), `range` (for numerical hints), `options` (for categorical dropdowns), and `dtype` are correct based on your training data.
- Example `config.json` structure (the `//` comments are annotations only; remove them in a real file, since JSON does not allow comments):

      {
        "model_type": "classifier",
        "model_path": "models/iris_lgbm_classifier.txt",
        "target_name": "species",
        "target_type": "categorical",
        "class_names": ["setosa", "versicolor", "virginica"],
        "features": [
          {
            "name": "sepal length (cm)",
            "type": "numerical",
            "range": [4.3, 7.9],
            "dtype": "float64"
          },
          {
            "name": "garden_location",
            "type": "categorical",
            "options": ["Mixed", "Shady", "Sunny"], // Order might matter depending on encoding
            "dtype": "category"
          }
          // ... other features
        ]
      }

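Because a hand-edited configuration is easy to get wrong, a quick sanity check before starting the app can save time. The sketch below is one possible check, not something the project is known to ship: `check_config` and `REQUIRED_KEYS` are hypothetical names, and the rules only cover the fields described above.

```python
import json

# Top-level keys described in this README (class_names is treated as optional).
REQUIRED_KEYS = {"model_type", "model_path", "target_name", "target_type", "features"}


def check_config(config: dict) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))]
    if config.get("model_type") not in ("classifier", "regressor"):
        problems.append('model_type must be "classifier" or "regressor"')
    for feature in config.get("features", []):
        name = feature.get("name", "<unnamed>")
        ftype = feature.get("type")
        if ftype == "categorical":
            if not feature.get("options"):
                problems.append(f"{name}: categorical feature needs non-empty options")
        elif ftype == "numerical":
            rng = feature.get("range")
            if rng is not None and (len(rng) != 2 or rng[0] > rng[1]):
                problems.append(f"{name}: range must be [min, max]")
        else:
            problems.append(f"{name}: unknown type {ftype!r}")
    return problems


# A config with one deliberate mistake: the categorical feature has no options.
example = json.loads("""
{
  "model_type": "classifier",
  "model_path": "models/iris_lgbm_classifier.txt",
  "target_name": "species",
  "target_type": "categorical",
  "class_names": ["setosa", "versicolor", "virginica"],
  "features": [
    {"name": "sepal length (cm)", "type": "numerical", "range": [4.3, 7.9], "dtype": "float64"},
    {"name": "garden_location", "type": "categorical", "options": [], "dtype": "category"}
  ]
}
""")
problems = check_config(example)
print(problems)
```

To check a real file, the same function can be applied to the result of `json.load` on an open `config.json`.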
- Prepare Data: Load your data into a Pandas DataFrame. Ensure categorical features are appropriately typed (e.g., `category` or `object`) and the target variable is correct.
- Train Model: Use LightGBM (`LGBMClassifier` or `LGBMRegressor`). Remember to handle categorical features correctly during training (e.g., using the `categorical_feature` parameter or `dtype='category'`). LightGBM does not accept `object` columns directly, so convert them to `category` before fitting.
- Save Model: Save the booster object, not the scikit-learn wrapper, using `model.booster_.save_model('path/to/your/model.txt')`. Place the saved model file (e.g., `my_model.txt`) inside the `models/` directory (or update `model_path` in `config.json` accordingly).
- Generate/Update Config: Use `generate_config.py` on your training data to get a template. Crucially, update the `model_path` and `class_names` (for classifiers) in the resulting `config.json` file. Verify all feature details.

See `train_example.py` for a practical demonstration.
- Make sure your `config.json` is correctly configured and points to a valid model file within the `models` directory.
- From the root directory of the project (where `app.py` is located), run:

      flask run

  Or, for more control (e.g., setting host/port):

      python app.py

  (The `app.py` script currently runs on `0.0.0.0:5000` with debug mode enabled.)
- Open your web browser and navigate to `http://127.0.0.1:5000/` (or the appropriate host/port if you changed it).
- The web page will display a form with input fields for each feature defined in `config.json`.
- For numerical features, enter a number. Hints about the typical range might be displayed.
- For categorical features, select an option from the dropdown menu.
- Click the "Predict" button.
- The application will process the input, run the prediction using the loaded LightGBM model, and display the result below the form.
- If there are input errors or prediction issues, an error message will be shown. Check the console logs where Flask is running for more details.
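The flow from submitted form to displayed result can be sketched roughly as below. Both helpers (`parse_inputs`, `describe_prediction`) are hypothetical and not taken from `app.py`; the number-parsing check goes slightly beyond the missing-value and valid-option checks listed earlier and is an assumption about what a reasonable handler would do.

```python
def parse_inputs(form: dict, features: list) -> tuple:
    """Convert raw form strings into typed values and collect validation errors."""
    values, errors = {}, []
    for feature in features:
        name = feature["name"]
        raw = (form.get(name) or "").strip()
        if not raw:
            errors.append(f"{name}: value is missing")
        elif feature["type"] == "categorical":
            if raw in feature.get("options", []):
                values[name] = raw
            else:
                errors.append(f"{name}: {raw!r} is not a valid option")
        else:
            try:
                values[name] = float(raw)
            except ValueError:
                errors.append(f"{name}: {raw!r} is not a number")
    return values, errors


def describe_prediction(probabilities, class_names=None) -> str:
    """Pick the most likely class; use its name when class_names is configured."""
    index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    label = class_names[index] if class_names else str(index)
    return f"{label} (probability {probabilities[index]:.2f})"


features = [
    {"name": "sepal length (cm)", "type": "numerical", "range": [4.3, 7.9]},
    {"name": "garden_location", "type": "categorical", "options": ["Mixed", "Shady", "Sunny"]},
]
form = {"sepal length (cm)": "5.1", "garden_location": "Rooftop"}
values, errors = parse_inputs(form, features)
summary = describe_prediction([0.1, 0.7, 0.2], ["setosa", "versicolor", "virginica"])
```

Here `errors` reports the unknown option `'Rooftop'`, while `summary` shows how a probability vector maps to a readable class name; without `class_names` the class index is shown instead.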
- Flask
- LightGBM
- Pandas
- NumPy
- Scikit-learn (primarily used in the example training script)
See requirements.txt for specific versions.
- Preprocessing: Add more robust preprocessing steps within `app.py` if needed (e.g., scaling, encoding) based on how the model was trained. This might require storing preprocessing objects (like scalers or encoders) alongside the model.
- Model Types: Extend to support other model libraries (e.g., Scikit-learn, XGBoost) by adding logic to `load_model` and potentially adjusting the prediction call.
- Input Methods: Allow file uploads (e.g., CSV) for batch predictions.
- Error Handling: Improve user feedback on errors.
- Styling: Enhance the web interface appearance.
- Dockerization: Package the application into a Docker container for easier deployment.