
Local Virtual Environment (virtualenv)

The easiest way to run tests for Airflow is to use a local virtualenv. While Breeze is the recommended way to run tests - it provides a reproducible environment and is easy to set up - it is not always the best option, because your tests run inside a Docker container. This can make it harder to debug them and to run them from your IDE.

That's why we recommend using a local virtualenv for development and testing.

Use system-level package managers - yum or apt-get on Linux, Homebrew on macOS - to install the required software packages:

  • Python (One of: 3.10, 3.11, 3.12, 3.13, 3.14)
  • MySQL 5.7+
  • libxml
  • helm (only for helm chart tests)

Sometimes additional system-level packages are needed to install Python packages - especially those coming from providers. For example, you might need to install pkgconf to be able to install the mysqlclient package for the MySQL provider, or graphviz to be able to install the devel extra bundle.

Please refer to the Dockerfile.ci for a comprehensive list of required packages.
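For example, installing the system prerequisites might look like the following sketch (the package names are indicative - check Dockerfile.ci and your platform's package index for the authoritative names):

```shell
# macOS (Homebrew) - indicative package names, verify for your setup:
brew install libxml2 pkgconf graphviz helm

# Debian/Ubuntu equivalent (also indicative):
sudo apt-get install -y libxml2-dev pkgconf graphviz
```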

Note

  • The mysqlclient 2.2.0 package needs pkgconf as a prerequisite; refer here to install pkgconf
  • macOS on ARM architecture requires graphviz for the venv setup; refer here to install graphviz
  • The helm chart tests need helm installed as a prerequisite; refer here to install and set up helm

Note

As of version 2.8, Airflow follows PEP 517/518 and uses a pyproject.toml file to define its build dependencies and build process. Building Airflow from local sources or sdist packages requires relatively modern versions of the packaging tools, because PEP 517-compliant build hooks are used to determine dynamic build dependencies. For pip this means that at least version 22.1.0 (released in 2022) is needed to build or install Airflow from sources. This does not affect installing Airflow from released wheel packages.
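You can check whether your pip is recent enough like this (and upgrade it if it is older than 22.1.0):

```shell
# Must report 22.1.0 or newer to build Airflow from sources:
python3 -m pip --version
# If it is older, upgrade it with: python3 -m pip install --upgrade pip
```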

As of November 2024 we recommend using uv for local virtualenv management for Airflow development. The uv utility is a build-frontend tool designed to manage Python versions, virtualenvs and workspaces for developing and testing Python projects. It is a modern tool, designed to work with PEP 517/518-compliant projects, and it is much faster than the "reference" pip tool. Beyond creating development environments, it can also manage Python versions, workspaces and the Python tools used to develop Airflow (via the uv tool command) - such as prek and others. You can also use uv tool to install breeze - the containerized development environment for Airflow that we use to reproduce the CI environment locally and to run release-management and certain development tasks.

You can read more about uv in UV Getting started, but below you will find a few typical steps to get you started.

You can follow the installation instructions to install uv on your system. Once you have uv installed, you can do all the environment preparation tasks using uv commands.
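For example, with the standalone installer (see the uv documentation for other installation methods):

```shell
# Official standalone installer for uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
uv --version   # verify the installation
```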

Note

macOS has a low ulimit setting (256) for the number of open file descriptors, which does not work well when installing our workspace - you can hit a Too many open files error. Run the ulimit -n 2048 command to increase the limit of file descriptors to 2048 (for example). It's best to add the ulimit command to your shell profile (~/.bashrc, ~/.zshrc or similar) so that it is set automatically for all your terminal sessions. Other than a small increase in resource usage, it has no negative impact on your system.
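For example (a sketch - the profile file shown assumes zsh; adjust for your shell):

```shell
ulimit -n 2048                       # raise the limit for the current session
ulimit -n                            # print the current limit to verify
echo 'ulimit -n 2048' >> ~/.zshrc    # persist it for future zsh sessions
```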

Note

This step can be skipped - uv will automatically install the Python version you need when you create a virtualenv.

You can install Python versions using uv python install command. For example, to install Python 3.10.7, you can run:

uv python install 3.10.7


Note

This step can be skipped - uv will automatically create a virtualenv when you run uv sync.

To create a virtualenv explicitly, run:

uv venv

This will create a default venv in your project's .venv directory. You can also create a venv with a specific Python version by running:

uv venv --python 3.10.7

You can also create a venv with a different venv directory name by running:

uv venv .my-venv

However, uv creates and re-creates venvs so quickly that you can easily delete and recreate them as needed. So you usually do not need more than one venv - just recreate it when needed, for example when you have to change the Python version.
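For instance, recreating the venv with a different Python version is just (assuming the default .venv location):

```shell
rm -rf .venv              # drop the existing venv
uv venv --python 3.12     # recreate it with another Python version
```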

In a project like Airflow it's important to have a consistent set of dependencies across all developers. You can use uv sync to install dependencies from the pyproject.toml file in the current directory - including Airflow's devel dependencies and all provider dependencies.

uv sync

This will synchronize the core dependencies of Airflow, including all optional core dependencies, and install the sources of all preinstalled providers and their dependencies.

For example, this is how you install the dependencies of the amazon provider, its sources, the sources of all providers it depends on, and all of its development dependencies:

uv sync --package apache-airflow-providers-amazon

You can also synchronize all extras including development dependencies of all providers, task-sdk and other packages by running:

uv sync --all-packages

This will synchronize all development extras of Airflow and all packages (which might require some additional system dependencies, depending on your OS).

When you only want to work on airflow-core, you can run uv sync in the airflow-core folder. This will install all dependencies needed to run tests for airflow-core.

cd airflow-core
uv sync

TODO(potiuk): This will not work yet - until we move some remaining provider tests out of airflow-core. For now you need to add --all-packages to install all providers and their dependencies.

cd airflow-core
uv sync --all-packages

Sometimes you want to work only on a specific provider - installing just that provider's dependencies and running just its tests. This is easy with uv: go to the provider's folder and run uv sync there. For example, to install the dependencies of the mongo provider:

cd providers/mongo
uv sync

This will use the .venv environment in the root of your project and install your provider's dependencies, the dependencies of the providers it depends on, and its development dependencies.

Then running tests for the provider is as simple as activating the venv in the main repo and running the pytest command - or alternatively running uv run in the provider directory:

uv run pytest
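Alternatively, assuming the default .venv in the repository root, you can activate it and call pytest directly:

```shell
# Run from the repository root; paths assume the default layout.
source .venv/bin/activate
cd providers/mongo
pytest
```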

Note that the uv sync command will automatically synchronize all dependencies needed for your provider and its development dependencies.

While uv uses its workspace feature to synchronize Airflow and the providers with a single sync command, you can still use other frontend tools (such as pip) to install and develop Airflow and the providers without relying on uv's sync and workspace features. The chapters below describe how to do it with pip.

In Airflow 2.0 we split Apache Airflow into separate distributions - one main apache-airflow package with the core of Airflow, and 90+ distributions for all the providers (external services and software Airflow can communicate with).

In Airflow 3.0 we moved each provider to a separate sub-folder in the "providers" directory - and each provider is a separate distribution with its own pyproject.toml file. The uv workspace feature allows installing all these distributions together and working on all, or only selected, providers.

When you install Airflow from sources using an editable install, you only install Airflow - but, as described in the previous chapter, you can develop the main version of Airflow and the providers of your choice together, which is quite convenient because you can use the same environment for both.

You can install the dependencies of the provider you want to develop by installing the provider distribution in editable mode.

The dependencies for providers are configured in the providers/PROVIDER/pyproject.toml files - separately for each provider. You will find two types of dependencies there: production runtime dependencies and, sometimes, development dependencies (in the dev dependency group), which are needed to run tests and are installed automatically when you set up the environment with uv sync.

If you want to add another dependency to a provider, add it to the corresponding pyproject.toml file, stage the changes with git add, and run prek to update the generated dependencies. Note that in the future we will remove that step.
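The workflow might look like this sketch (PROVIDER is a placeholder; the exact prek invocation may differ in your setup):

```shell
# After editing providers/PROVIDER/pyproject.toml:
git add providers/PROVIDER/pyproject.toml
prek run   # runs the configured hooks that regenerate dependency metadata
```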

For uv it's simple: run uv sync in the provider's directory after you have modified its pyproject.toml file.

cd providers/PROVIDER
uv sync

This will install all dependencies of the provider in the virtualenv of airflow. Then running tests for the provider is as simple as running:

uv run pytest

The uv.lock file is committed to the Airflow repository and is used by uv sync to ensure consistent dependency versions across all developers. When you run uv sync, it uses the lock file to install exact dependency versions, so you don't need to pass constraint files manually.

The uv sync command prefers the locked versions of dependencies from uv.lock. It will only attempt to resolve new dependencies when pyproject.toml files change (e.g. when a new dependency is added or version bounds are modified). This means that day-to-day uv sync is fast and deterministic — it simply installs what the lock file specifies without re-resolving the dependency tree.

If you want to make sure that uv sync does not update your lock file at all (for example in CI or when running tests), you can pass the --frozen flag:

uv sync --frozen

This will fail if the lock file is out of date with respect to pyproject.toml, rather than silently updating it. This is useful when you want to guarantee fully reproducible environments.

The [tool.uv] section in the top-level pyproject.toml sets exclude-newer = "4 days". This acts as a cooldown period — when uv resolves new dependencies, it ignores package versions released in the last 4 days. This protects against broken or yanked releases that might otherwise immediately break the dependency resolution for all developers. When uv writes the lock file, it records the resolved exclude-newer timestamp so that subsequent uv sync calls use the same cutoff, ensuring consistency across machines.

Airflow also publishes traditional pip-style constraint files (see Airflow dependencies and extras for details). When installing Airflow from sources, these constraint files are generated directly from uv.lock using uv export --frozen, which converts the lock file into a flat list of pinned versions suitable for pip install --constraint. This ensures that both the uv sync workflow and the pip constraint workflow install the same dependency versions.
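Conceptually, deriving a constraints file from the lock file looks like this sketch (the exact flags used by the release tooling may differ):

```shell
# Export the lock file as a flat list of pinned versions:
uv export --frozen --no-hashes -o constraints.txt
# Use it as a pip constraint file:
pip install apache-airflow --constraint constraints.txt
```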

The lock file is updated regularly — whenever dependencies are changed via any pyproject.toml and when breeze ci upgrade is run. Make sure to use the latest main branch to get the most up-to-date uv.lock.

Running tests is described in Testing documentation.

While most of the tests are typical unit tests that do not require external components, there are a number of integration tests. You can use a local virtualenv to run those tests, but you also need to set up databases - and sometimes other external components (for integration tests).

So, generally, it is easier to use the Breeze development environment - especially for integration tests, and when you want to run tests with a database other than SQLite.

When analyzing a problem, it is helpful to be able to query the database directly. You can do it using the built-in Airflow command (however, you need the CLI client tool for your configured database to be installed):

airflow db shell

The command will explain what CLI tool is needed for the database you have configured.
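If you prefer to use a client directly, for the default SQLite setup it might look like this (the path assumes the default AIRFLOW_HOME; adjust it if you changed sql_alchemy_conn):

```shell
sqlite3 ~/airflow/airflow.db '.tables'   # list Airflow's tables
```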


As the next step, it is important to learn about Static code checks, which are used to automate code quality checks. Your code must pass the static code checks to get merged.