The Machine Learning Engineering Agent is an approach to building Machine Learning Engineering (MLE) agents that can train state-of-the-art machine learning models on various tasks (including classification and regression tasks), through a novel approach of leveraging web search and targeted code block refinement. Using the example of predicting California housing prices, we show how MLE-STAR can create a regression model based on factors like population, income, etc. that outperforms traditional approaches to training ML models. The experimental results show that MLE-STAR achieves medals in 63.6% of the Kaggle competitions on the MLE-bench-Lite, significantly outperforming the best alternative. The implementation is based on the Google Cloud AI Research paper "MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement" (https://www.arxiv.org/abs/2506.15692).
Performance of MLE agents on MLE-Bench-Lite datasets.
| MLE Agents | Base LLM | Any Medals | Gold Medals | Silver Medals | Bronze Medals |
|---|---|---|---|---|---|
| MLE-STAR | Gemini-2.5-Pro | 63.6% | 36.4% | 21.2% | 6.1% |
| MLE-STAR | Gemini-2.5-Flash | 43.9% | 30.3% | 4.5% | 9.1% |
The key features of the Machine Learning Agent include:
| Feature | Description |
|---|---|
| Interaction Type | Conversational |
| Complexity | Advanced |
| Agent Type | Multi Agent |
| Components | Tools: Code execution, Retrieval |
| Vertical | All |
This diagram shows the detailed architecture of the agents and tools used
to implement this workflow.
-
Initial Solution Generation: Uses a search engine to retrieve state-of-the-art models and their example codes, then merges the best-performing candidates into a consolidated initial solution.
-
Code Block Refinement: Iteratively improves the solution by identifying and targeting specific code blocks (ML pipeline components) that have the most significant impact on performance, determined through ablation studies. An inner loop refines the targeted block with various strategies.
-
Ensemble Strategies: Introduces a novel ensembling method where the Agent proposes and refines ensemble strategies to combine multiple solutions, aiming for superior performance than individual best solutions.
-
Robustness Modules: Includes a debugging agent for error correction, a data leakage checker to prevent improper data access during preprocessing, and a data usage checker to ensure all provided data sources are utilized.
- Python 3.12+
- uv for dependency management and packaging
- See the official uv website for installation.
curl -LsSf https://astral.sh/uv/install.sh | sh - Git
- Git can be downloaded from https://git-scm.com/. Then follow the installation guide.
- Google Cloud Account
- You need a Google Cloud account
- A project on Google Cloud Platform
- Google Cloud CLI
- For installation, please follow the instruction on the official Google Cloud website.
Use the Agent Starter Pack to scaffold a production-ready project and choose your deployment target (Vertex AI Agent Engine or Cloud Run), with CI/CD and other production features. The easiest way is with uv (one command, no venv or pip install needed):
uvx agent-starter-pack create my-mle-agent -a adk@machine-learning-engineeringIf you don't have uv yet: curl -LsSf https://astral.sh/uv/install.sh | sh
The starter pack will prompt you to select deployment options and set up your Google Cloud project.
Alternative: Using pip and a virtual environment
# Create and activate a virtual environment
python -m venv .venv && source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install the starter pack and create your project
pip install --upgrade agent-starter-pack
agent-starter-pack create my-mle-agent -a adk@machine-learning-engineeringFrom your newly created project directory (e.g. my-mle-agent), run:
cd my-mle-agent
uv sync --dev
uv run adk run machine_learning_engineeringFor the web UI:
uv run adk webThen select machine_learning_engineering from the dropdown menu.
Alternative: Local development (run from this sample repo)
- Clone the repository:
git clone https://github.com/google/adk-samples.git
cd adk-samples/python/agents/machine-learning-engineeringFor the rest of this tutorial ensure you remain in the python/agents/machine-learning-engineering directory.
2. Install the dependencies:
uv sync- Configure settings:
Set up Google Cloud credentials. You may set the following environment variables in your shell, or in a
.envfile instead. Authenticate your GCloud account.
Prepare your task
You should prepare the inputs for your task in the following way:
- Create a folder under
taskswith the name of your task. - In that folder, create a file containing the description of the task.
- Place the data files in this folder.
Using adk
ADK provides convenient ways to bring up agents locally and interact with them. You may talk to the agent using the CLI:
adk run machine_learning_engineeringOr on a web interface:
adk webThe command adk web will start a web server on your machine and print the URL.
uv sync --dev
uv run pytest tests
uv run pytest evalYou will need to have specified a GCS bucket in the environment variable GOOGLE_CLOUD_BUCKET as detailed in the Configuration section.
If the bucket does not exist, ADK will create one for you. This is the easiest option. If the bucket does exist, then you must provide permissions to the service account as described in this Troubleshooting article.
The Machine Learning Engineering Agent can be deployed to Vertex AI Agent Engine using the following commands:
uv sync --group deployment
uv run deployment/deploy.py --createWhen the deployment finishes, it will print a line like this:
Created remote agent: projects/<PROJECT_NUMBER>/locations/<PROJECT_LOCATION>/reasoningEngines/<AGENT_ENGINE_ID>
If you forget the AGENT_ENGINE_ID, you can list the existing agents using:
uv run deployment/deploy.py --listThe output will be like:
All remote agents:
123456789 ("machine_learning_engineering")
- Create time: 2025-07-11 09:46:07+00:00
- Update time: 2025-05-10 09:46:09+00:00
You may interact with the deployed agent using the test_deployment.py script
$ export USER_ID=<any string>
$ uv run deployment/test_deployment.py --resource_id=${AGENT_ENGINE_ID} --user_id=${USER_ID}
Found agent with resource ID: ...
Created session for user ID: ...
Type 'quit' to exit.
Input: Hello. What can you do for me?
Response: Hello! I'm a Machine Learning Engineer Assistant. I can help you achieve competition-level quality in solving machine learning tasks.
To get started, please provide the task description of the competition.To delete the deployed agent, you may run the following command:
uv run deployment/deploy.py --delete --resource_id=${AGENT_ENGINE_ID}You may open the URL, select "machine_learning_engineering" in the top-left drop-down menu, and a chatbot interface will appear on the right. The conversation is initially blank. Here are some example requests you may ask the Machine Learning Agent to identity itself:
[user]: who are you?
[mle_frontdoor_agent]: I am a machine learning engineer agent.
[user]: what can you do?
[mle_frontdoor_agent]: I am a machine learning engineer. My primary role is to engineer solutions for machine learning tasks, such as the California Housing Task. I can also describe the task if you'd like. I work by executing a sequence of sub-agents to solve the machine learning engineering task.
[user]: describe the task that you have
[mle_frontdoor_agent]: The task I have is the California Housing Task. This task involves predicting the median house value in California districts, given various features about those districts. It's a regression problem where the goal is to build a model that can accurately estimate house prices based on factors like population, median income, and housing age within a district.
[user]: execute the task
[mle_frontdoor_agent]: intermediate output snipped.
# Save the submission file to CSV without the index print(f"Submission file saved successfully to {submission_file_path}")
This document describes the required configuration parameters in the DefaultConfig dataclass.
- Description: Specifies the directory path where the machine learning tasks and their data are stored.
- Type:
str - Default:
"./machine_learning_engineering/tasks/"
- Description: The name of the specific task to be loaded and processed.
- Type:
str - Default:
"california-housing-prices"
- Description: Defines the type of machine learning problem.
- Type:
str - Default:
"Tabular Regression"
- Description: A boolean flag, indicating whether a lower value of the metric is better.
- Type:
bool - Default:
True
- Description: The directory path used for saving intermediate outputs, results, logs, or any other artifacts generated during the task execution.
- Type:
str - Default:
"./machine_learning_engineering/workspace/"
- Description: Specifies the identifier for the LLM model to be used by the agent. It defaults to the value of the environment variable
ROOT_AGENT_MODELor"gemini-2.0-flash-001"if the variable is not set. - Type:
str - Default:
os.environ.get("ROOT_AGENT_MODEL", "gemini-2.0-flash-001")