A comprehensive data engineering agent.
**Dataform Pipeline Development**
- Build and modify Dataform pipelines
- Create and update SQLx files
- Handle UDFs and stored procedures integration
- Manage table schemas and data types

**Dataform Troubleshooting**
- Diagnose pipeline issues
- Analyze execution logs
- Fix compilation errors
- Optimize pipeline performance

**Data Engineering**
- Design and implement data transformations
- Handle complex SQL queries
- Manage data dependencies
- Ensure data quality

Prerequisites:

- Python 3.10+
- uv for dependency management and packaging
- See the official uv website for installation, or run:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- A project on Google Cloud Platform
- Google Cloud CLI. For installation, follow the instructions on the official Google Cloud website.
Use the Agent Starter Pack to scaffold a production-ready project and choose your deployment target (Vertex AI Agent Engine or Cloud Run), with CI/CD and other production features. The easiest way is with uv (one command, no venv or pip install needed):
```bash
uvx agent-starter-pack create my-data-engineering-agent -a adk@data-engineering
```

If you don't have uv yet, install it first:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
The starter pack will prompt you to select deployment options and set up your Google Cloud project.
Alternative: Using pip and a virtual environment
```bash
# Create and activate a virtual environment
python -m venv .venv && source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install the starter pack and create your project
pip install --upgrade agent-starter-pack
agent-starter-pack create my-data-engineering-agent -a adk@data-engineering
```

From your newly created project directory (e.g. my-data-engineering-agent), configure your environment and run:
```bash
cd my-data-engineering-agent
cp data_engineering_agent/.env.example .env
# Edit .env with your Google Cloud project, Dataform repository, and workspace settings
uv sync --dev
uv run adk run data_engineering_agent
```

For the web UI:

```bash
uv run adk web
```

Then select data_engineering_agent from the dropdown menu.
Alternative: Local development (run from this sample repo)
- Clone the repository:

  ```bash
  git clone https://github.com/google/adk-samples.git
  cd adk-samples/python/agents/data-engineering
  ```

  For the rest of this tutorial, ensure you remain in the `python/agents/data-engineering` directory.

- Install the dependencies:

  ```bash
  uv sync
  ```
- Configuration

  Create a `.env` file from the provided template and fill in your settings:

  ```bash
  cp .env.example .env
  ```

  Edit the `.env` file with your configuration (see `.env.example` for all available options):

  - `GOOGLE_CLOUD_PROJECT`: Your Google Cloud Project ID (preferred), or use `GCP_PROJECT_ID`
  - `GOOGLE_CLOUD_LOCATION`: Location for Vertex AI, Dataform, and BigQuery (default: `us-central1`)
  - `GOOGLE_GENAI_USE_VERTEXAI`: Set to `1` to use Vertex AI (default: `0`)
  - `ROOT_AGENT_MODEL`: Model to use for the root agent (default: `gemini-2.5-pro`)
  - `DATAFORM_REPOSITORY_NAME`: Your Dataform repository name
  - `DATAFORM_WORKSPACE_NAME`: Your Dataform workspace name
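As an illustration, a filled-in `.env` might look like the following sketch; the project, repository, and workspace values below are placeholders, not real resources:

```
GOOGLE_CLOUD_PROJECT=my-gcp-project
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_GENAI_USE_VERTEXAI=1
ROOT_AGENT_MODEL=gemini-2.5-pro
DATAFORM_REPOSITORY_NAME=my-dataform-repo
DATAFORM_WORKSPACE_NAME=my-dataform-workspace
```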
- Run ADK from the parent folder:

  ```bash
  cd ..
  adk web
  ```

  or use `adk run`.
# Prompt
Enrich the `new_york_taxi_trips.tlc_green_trips_2022` table by generating date and time features from the `pickup_datetime` column. Use `new_york_taxi_trips` dataset for the enriched table.
# Agent Response
Of course. I will enrich the tlc_green_trips_2022 table by creating a new table with additional date and time features from the pickup_datetime column.
First, I need to get information about the existing table to ensure I have the correct column names and data types.
I have created a new table named enriched_trips in the new_york_taxi_trips dataset. This table includes the original data from tlc_green_trips_2022 plus the following new columns:
pickup_date: The date of the pickup.
pickup_day_of_week: The day of the week of the pickup (e.g., Sunday, Monday).
pickup_hour_of_day: The hour of the day of the pickup.
I have also added the declaration file for the source table.
I will now compile the project to view the DAG.
I have successfully compiled the Dataform project. The compilation was successful, and I have the pipeline's DAG.
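As an illustration, the SQLx definition the agent generates for such an enrichment might look like the following sketch. The table name `enriched_trips`, the dataset, and the new columns follow the transcript above, but the exact file the agent produces may differ:

```sql
config {
  type: "table",
  schema: "new_york_taxi_trips"
}

SELECT
  *,
  -- Date and time features derived from pickup_datetime
  DATE(pickup_datetime) AS pickup_date,
  FORMAT_DATE('%A', DATE(pickup_datetime)) AS pickup_day_of_week,
  EXTRACT(HOUR FROM pickup_datetime) AS pickup_hour_of_day
FROM
  ${ref("tlc_green_trips_2022")}
```

The `${ref(...)}` call lets Dataform resolve the source table from its declaration file and record the dependency in the pipeline's DAG.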
```bash
uv sync --dev
uv run pytest tests
uv run pytest eval
```

`tests` runs the agent on a sample request and makes sure that every component is functional. `eval` is a demonstration of how to evaluate the agent, using the `AgentEvaluator` in ADK. It sends a couple of requests to the agent and expects that the agent's responses match a pre-defined response reasonably well.
The Data Engineering Agent can be deployed to Vertex AI Agent Engine using the following commands:
```bash
uv sync --group deployment
uv run deployment/deploy.py --create
```

When the deployment finishes, it will print a line like this:

```
Created remote agent: projects/<PROJECT_NUMBER>/locations/<PROJECT_LOCATION>/reasoningEngines/<AGENT_ENGINE_ID>
```
If you forget the AGENT_ENGINE_ID, you can list the existing agents using:
```bash
uv run deployment/deploy.py --list
```

The output will be like:

```
All remote agents:
123456789 ("data_engineering_agent")
- Create time: 2025-08-05 12:35:34.245431+00:00
- Update time: 2025-08-05 12:36:01.421432+00:00
```
You may interact with the deployed agent using the `test_deployment.py` script:

```bash
$ export USER_ID=<any string>
$ uv run deployment/test_deployment.py --resource_id=${AGENT_ENGINE_ID} --user_id=${USER_ID}
Found agent with resource ID: ...
Created session for user ID: ...
Type 'quit' to exit.
Input: Hello, what can you do for me?
Response: ...
```

To delete the deployed agent, you may run the following command:
```bash
uv run deployment/deploy.py --delete --resource_id=${AGENT_ENGINE_ID}
```

Copyright 2025 Google LLC
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.