Data Engineering Agent

A comprehensive agent for building, troubleshooting, and maintaining Dataform data pipelines on Google Cloud.

Features

  • Dataform Pipeline Development

    • Build and modify Dataform pipelines
    • Create and update SQLx files
    • Handle UDFs and stored procedures integration
    • Manage table schemas and data types
  • Dataform Troubleshooting

    • Diagnose pipeline issues
    • Analyze execution logs
    • Fix compilation errors
    • Optimize pipeline performance
  • Data Engineering

    • Design and implement data transformations
    • Handle complex SQL queries
    • Manage data dependencies
    • Ensure data quality

Setup and Installation

Prerequisites

  • Python 3.10+
  • uv for dependency management and packaging
    curl -LsSf https://astral.sh/uv/install.sh | sh
  • A project on Google Cloud Platform
  • Google Cloud CLI

Agent Starter Pack (recommended)

Use the Agent Starter Pack to scaffold a production-ready project and choose your deployment target (Vertex AI Agent Engine or Cloud Run), with CI/CD and other production features. The easiest way is with uv (one command, no venv or pip install needed):

uvx agent-starter-pack create my-data-engineering-agent -a adk@data-engineering

If you don't have uv yet: curl -LsSf https://astral.sh/uv/install.sh | sh

The starter pack will prompt you to select deployment options and set up your Google Cloud project.

Alternative: Using pip and a virtual environment
# Create and activate a virtual environment
python -m venv .venv && source .venv/bin/activate # On Windows: .venv\Scripts\activate

# Install the starter pack and create your project
pip install --upgrade agent-starter-pack
agent-starter-pack create my-data-engineering-agent -a adk@data-engineering

From your newly created project directory (e.g. my-data-engineering-agent), configure your environment and run:

cd my-data-engineering-agent
cp data_engineering_agent/.env.example .env
# Edit .env with your Google Cloud project, Dataform repository, and workspace settings
uv sync --dev
uv run adk run data_engineering_agent

For the web UI:

uv run adk web

Then select data_engineering_agent from the dropdown menu.


Alternative: Local development (run from this sample repo)

Agent Setup

  1. Clone the repository:

    git clone https://github.com/google/adk-samples.git
    cd adk-samples/python/agents/data-engineering

    For the rest of this tutorial, ensure you remain in the python/agents/data-engineering directory.

  2. Install the dependencies:

    uv sync
  3. Configuration

    Create a .env file from the provided template and fill in your settings:

    cp .env.example .env

    Edit the .env file with your configuration (see .env.example for all available options):

Environment Variables

  • GOOGLE_CLOUD_PROJECT: Your Google Cloud project ID (preferred; GCP_PROJECT_ID is also accepted)
  • GOOGLE_CLOUD_LOCATION: Location for Vertex AI, Dataform, and BigQuery (default: us-central1)
  • GOOGLE_GENAI_USE_VERTEXAI: Set to 1 to use Vertex AI (default: 0)
  • ROOT_AGENT_MODEL: Model to use for the root agent (default: gemini-2.5-pro)
  • DATAFORM_REPOSITORY_NAME: Your Dataform repository name
  • DATAFORM_WORKSPACE_NAME: Your Dataform workspace name
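
As a minimal sketch, a filled-in .env file might look like the following (all values are placeholders, not real project settings):

```shell
# Hypothetical example values; replace each with your own settings.
GOOGLE_CLOUD_PROJECT=my-gcp-project
GOOGLE_CLOUD_LOCATION=us-central1
GOOGLE_GENAI_USE_VERTEXAI=1
ROOT_AGENT_MODEL=gemini-2.5-pro
DATAFORM_REPOSITORY_NAME=my-dataform-repo
DATAFORM_WORKSPACE_NAME=my-workspace
```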

Running the Agent Locally

Run ADK from the parent folder:

cd ..
adk web

or, for the command-line interface:

adk run

Example

# Prompt
Enrich the `new_york_taxi_trips.tlc_green_trips_2022` table by generating date and time features from the `pickup_datetime` column. Use `new_york_taxi_trips` dataset for the enriched table.

# Agent Response
Of course. I will enrich the tlc_green_trips_2022 table by creating a new table with additional date and time features from the pickup_datetime column.

First, I need to get information about the existing table to ensure I have the correct column names and data types.

I have created a new table named enriched_trips in the new_york_taxi_trips dataset. This table includes the original data from tlc_green_trips_2022 plus the following new columns:

pickup_date: The date of the pickup.
pickup_day_of_week: The day of the week of the pickup (e.g., Sunday, Monday).
pickup_hour_of_day: The hour of the day of the pickup.
I have also added the declaration file for the source table.

I will now compile the project to view the DAG.

I have successfully compiled the Dataform project. The compilation was successful, and I have the pipeline's DAG.
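
For illustration only, a SQLx definition producing those three enrichment columns might look like the sketch below. The file name, schema, and exact SQL are assumptions rather than the agent's actual output; the heredoc simply writes the sketch to a Dataform definitions file:

```shell
# Write a hypothetical SQLx table definition for the enriched table.
mkdir -p definitions
cat > definitions/enriched_trips.sqlx <<'EOF'
config { type: "table", schema: "new_york_taxi_trips" }

SELECT
  *,
  DATE(pickup_datetime) AS pickup_date,
  FORMAT_DATE('%A', DATE(pickup_datetime)) AS pickup_day_of_week,
  EXTRACT(HOUR FROM pickup_datetime) AS pickup_hour_of_day
FROM ${ref("tlc_green_trips_2022")}
EOF
```

The quoted heredoc delimiter ('EOF') keeps `${ref(...)}` from being expanded by the shell, so it reaches the file intact for Dataform to resolve at compile time.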

Development

uv sync --dev
uv run pytest tests
uv run pytest eval

tests runs the agent on a sample request and verifies that every component is functional. eval demonstrates how to evaluate the agent using the AgentEvaluator in ADK: it sends a couple of requests to the agent and expects the agent's responses to match pre-defined responses reasonably well.

Deployment

The Data Engineering Agent can be deployed to Vertex AI Agent Engine using the following commands:

uv sync --group deployment
uv run deployment/deploy.py --create

When the deployment finishes, it will print a line like this:

Created remote agent: projects/<PROJECT_NUMBER>/locations/<PROJECT_LOCATION>/reasoningEngines/<AGENT_ENGINE_ID>
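
The trailing path segment of that resource name is the AGENT_ENGINE_ID. As a small shell sketch (the resource name below is a made-up example), you can capture it for the later commands:

```shell
# Hypothetical example resource name printed by the deploy script.
RESOURCE="projects/123456/locations/us-central1/reasoningEngines/987654321"

# Strip everything up to and including the last "/" to get the ID.
export AGENT_ENGINE_ID="${RESOURCE##*/}"
echo "$AGENT_ENGINE_ID"
```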

If you forget the AGENT_ENGINE_ID, you can list the existing agents using:

uv run deployment/deploy.py --list

The output will be like:

All remote agents:

123456789 ("data_engineering_agent")
- Create time: 2025-08-05 12:35:34.245431+00:00
- Update time: 2025-08-05 12:36:01.421432+00:00

You may interact with the deployed agent using the test_deployment.py script:

$ export USER_ID=<any string>
$ uv run deployment/test_deployment.py --resource_id=${AGENT_ENGINE_ID} --user_id=${USER_ID}
Found agent with resource ID: ...
Created session for user ID: ...
Type 'quit' to exit.
Input: Hello, what can you do for me?
Response: 

To delete the deployed agent, you may run the following command:

uv run deployment/deploy.py --delete --resource_id=${AGENT_ENGINE_ID}

License

Copyright 2025 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.