
Commit 0a03b4e

Docs: Add JWT authentication docs and strengthen security model (#64760)
* Docs: Add JWT authentication docs and strengthen security model

  Add comprehensive JWT token authentication documentation covering both the REST API and Execution API flows, including token structure, timings, refresh mechanisms, and the DFP/Triggerer in-process bypass.

  Update the security model to:

  - Document current isolation limitations (DFP/Triggerer DB access, shared Execution API resources, multi-team not guaranteeing task-level isolation)
  - Add deployment hardening guidance (per-component config, asymmetric JWT keys, env vars with PR_SET_DUMPABLE protection)
  - Add "What is NOT a security vulnerability" section covering all categories from the security team's response policies
  - Fix contradicting statements across docs that overstated isolation guarantees or recommended sharing all config across components

  Update AGENTS.md with security model awareness so AI agents performing security research distinguish intentional design choices from actual vulnerabilities.

* Fix spelling errors and use 'potentially' for DFP/Triggerer access

  - Add dumpable, sandboxing, unsanitized, XSS to spelling wordlist
  - Use 'potentially' consistently when describing Dag File Processor and Triggerer database access and JWT authentication bypass, since these are capabilities that Dag author code could exploit rather than guaranteed behaviors of normal operation

* Add prek hook to validate security doc constants against config.yml

  New hook `check-security-doc-constants` validates that:

  - [section] option references in security RST files match config.yml
  - AIRFLOW__X__Y env var references correspond to real config options
  - Default values in doc tables match config.yml defaults
  - Sensitive config variables are listed (warning, not error, since the list is documented as non-exhaustive)

  Loads both airflow-core config.yml and provider.yaml files to cover all config sections (including celery, sentry, workers, etc.). Runs automatically when config.yml or security RST docs are modified.

* Expand sensitive vars to full list with component mapping and auto-update

  Update security_model.rst sensitive config variables section:

  - List ALL sensitive vars from config.yml and provider.yaml files
  - Core vars organized in a table with "Needed by" column mapping each var to the components that require it (API Server, Scheduler, Workers, Dag File Processor, Triggerer)
  - Provider vars in a separate table noting they should only be set where the provider functionality is needed
  - Tables are auto-generated between AUTOGENERATED markers

  Update prek hook to auto-update the sensitive var tables:

  - Reads config.yml and all provider.yaml files
  - Generates RST list-table content for core and provider sensitive vars
  - Replaces content between markers on each run
  - Warns when new sensitive vars need component mapping added to the hook
  - Validates [section] option and AIRFLOW__X__Y references against config
  - Skips autogenerated sections when checking env var references

* Clarify software guards vs intentional access in DFP/Triggerer

  Address issues raised in security discussion about the gap between Airflow's isolation promises and reality:

  - Clearly distinguish software guards (prevent accidental DB access) from the inability to prevent intentional malicious access by code running as the same Unix user as the parent process
  - Document the specific mechanisms: /proc/PID/environ, config files, _CMD commands, secrets manager credential reuse
  - Clarify that worker isolation is genuine (no DB credentials at all) while DFP/Triggerer isolation is software-level only
  - Add Unix user impersonation as a deployment hardening measure
  - Document strategic (API-based DFP/Triggerer) and tactical (user impersonation) planned improvements
  - Add warning about sensitive config leakage through task logs
  - Add guidance to restrict task log access

* Docs: Improve security docs wording, extract workload isolation, recommend DagBundle

  - Reword DFP/Triggerer descriptions to clarify software guards vs intentional bypass
  - Extract workload isolation section from jwt_token_authentication into workload.rst
  - Recommend Dag Bundle mechanism (GitDagBundle) for DAG synchronization
  - Fix typo in public-airflow-interface.rst and broken backtick in jwt_token_authentication.rst
  - Update cross-references between security docs
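One of the leakage mechanisms the message lists, `/proc/PID/environ`, is easy to demonstrate: any process running as the same Unix user can read another process's environment, which is why software guards alone cannot protect secrets passed via env vars. A minimal stdlib-only sketch (not Airflow code; shown here against the current process):

```python
import os

def read_proc_environ(pid: str = "self") -> dict[str, str]:
    """Read /proc/<pid>/environ (Linux only); entries are NUL-separated KEY=VALUE pairs."""
    path = f"/proc/{pid}/environ"
    if not os.path.exists(path):  # non-Linux platforms: nothing to read
        return {}
    with open(path, "rb") as f:
        raw = f.read()
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env

# Any same-user process (e.g. Dag author code running in the DFP) could do the
# same against a parent process's PID, which motivates the PR_SET_DUMPABLE
# hardening and per-component secret scoping described in the commit message.
leaked = read_proc_environ()
```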
1 parent dd0ff5a commit 0a03b4e

16 files changed

Lines changed: 1477 additions & 53 deletions

File tree

.github/instructions/code-review.instructions.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Use these rules when reviewing pull requests to the Apache Airflow repository.
 - **Scheduler must never run user code.** It only processes serialized Dags. Flag any scheduler-path code that deserializes or executes Dag/task code.
 - **Flag any task execution code that accesses the metadata DB directly** instead of through the Execution API (`/execution` endpoints).
-- **Flag any code in Dag Processor or Triggerer that breaks process isolation** — these components run user code in isolated processes.
+- **Flag any code in Dag Processor or Triggerer that breaks process isolation** — these components run user code in separate processes from the Scheduler and API Server, but note that they potentially have direct metadata database access and potentially bypass JWT authentication via in-process Execution API transport. This is an intentional design choice documented in the security model, not a security vulnerability.
 - **Flag any provider importing core internals** like `SUPERVISOR_COMMS` or task-runner plumbing. Providers interact through the public SDK and execution API only.

 ## Database and Query Correctness

AGENTS.md

Lines changed: 26 additions & 3 deletions
@@ -67,15 +67,38 @@ UV workspace monorepo. Key paths:
 ## Architecture Boundaries

 1. Users author Dags with the Task SDK (`airflow.sdk`).
-2. Dag Processor parses Dag files in isolated processes and stores serialized Dags in the metadata DB.
+2. Dag File Processor parses Dag files in separate processes and stores serialized Dags in the metadata DB. Software guards prevent individual parsing processes from accessing the database directly and enforce use of the Execution API, but these guards do not protect against intentional bypassing by malicious or misconfigured code.
 3. Scheduler reads serialized Dags — **never runs user code** — and creates Dag runs / task instances.
-4. Workers execute tasks via Task SDK and communicate with the API server through the Execution API — **never access the metadata DB directly**.
+4. Workers execute tasks via Task SDK and communicate with the API server through the Execution API — **never access the metadata DB directly**. Each task receives a short-lived JWT token scoped to its task instance ID.
 5. API Server serves the React UI and handles all client-database interactions.
-6. Triggerer evaluates deferred tasks/sensors in isolated processes.
+6. Triggerer evaluates deferred tasks/sensors in separate processes. Like the Dag File Processor, software guards steer it through the Execution API rather than direct database access, but these guards do not protect against intentional bypassing by malicious or misconfigured code.
 7. Shared libraries that are symbolically linked to different Python distributions are in `shared` folder.
 8. Airflow uses `uv workspace` feature to keep all the distributions sharing dependencies and venv
 9. Each of the distributions should declare other needed distributions: `uv --project <FOLDER> sync` command acts on the selected project in the monorepo with only dependencies that it has

+## Security Model
+
+When reviewing code, writing security documentation, or performing security research, keep in
+mind the following aspects of Airflow's security model. The authoritative references are
+[`airflow-core/docs/security/security_model.rst`](airflow-core/docs/security/security_model.rst)
+and [`airflow-core/docs/security/jwt_token_authentication.rst`](airflow-core/docs/security/jwt_token_authentication.rst).
+
+In particular, the intentional design choices that are not security vulnerabilities, and should not be
+reported as such, are described in the "What is NOT considered a security vulnerability" chapter of the security model.
+
+**When flagging security concerns, distinguish between:**
+
+1. **Actual vulnerabilities** — code that violates the documented security model (e.g., a worker
+   gaining database access it shouldn't have, a Scheduler executing user code, an unauthenticated
+   user accessing protected endpoints).
+2. **Known limitations** — documented gaps where the current implementation doesn't provide full
+   isolation (e.g., DFP/Triggerer database access, shared Execution API resources, multi-team
+   not enforcing task-level isolation). These are tracked for improvement in future versions and
+   should not be reported as new findings.
+3. **Deployment hardening opportunities** — measures a Deployment Manager can take to improve
+   isolation beyond what Airflow enforces natively (e.g., per-component configuration, asymmetric
+   JWT keys, network policies). These belong in deployment guidance, not as code-level issues.
+
 # Shared libraries

 - shared libraries provide implementation of some common utilities like logging, configuration where the code should be reused in different distributions (potentially in different versions)

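The short-lived, task-scoped JWT mentioned for workers above can be illustrated with a stdlib-only HS256 token. This is a sketch of the general mechanism only: the claim names, lifetime, and symmetric-key signing used here are assumptions for illustration, not Airflow's actual implementation.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    """URL-safe base64 without padding, as JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_token(secret: bytes, task_instance_id: str, ttl_seconds: int = 600) -> str:
    """Mint an HS256 JWT scoped to one task instance (hypothetical claims)."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({
        "sub": task_instance_id,                # scope: exactly one task instance
        "exp": int(time.time()) + ttl_seconds,  # short-lived by design
    }).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(secret: bytes, token: str, task_instance_id: str) -> bool:
    """Check signature, expiry, and that the token is scoped to this task."""
    header, payload, sig = token.split(".")
    signing_input = f"{header}.{payload}".encode()
    expected = _b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return False
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["sub"] == task_instance_id and claims["exp"] > time.time()
```

With asymmetric keys (the hardening the commit recommends), only token-minting components would hold the private key, and validators would need only the public key.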
airflow-core/.pre-commit-config.yaml

Lines changed: 10 additions & 0 deletions
@@ -263,6 +263,16 @@ repos:
         require_serial: true
         pass_filenames: false
         files: ^src/airflow/config_templates/config\.yml$
+      - id: check-security-doc-constants
+        name: Check security docs match config.yml constants
+        entry: ../scripts/ci/prek/check_security_doc_constants.py
+        language: python
+        pass_filenames: false
+        files: >
+          (?x)
+          ^src/airflow/config_templates/config\.yml$|
+          ^docs/security/jwt_token_authentication\.rst$|
+          ^docs/security/security_model\.rst$
       - id: check-airflow-version-checks-in-core
         language: pygrep
         name: No AIRFLOW_V_* imports in airflow-core

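The core of a hook like `check-security-doc-constants` can be sketched in a few lines: scan the RST text for ``[section] option`` and ``AIRFLOW__X__Y`` references and check each against the known config options. The data below is a hypothetical stand-in for what would be parsed out of config.yml, and the real hook's logic may differ.

```python
import re

# Hypothetical stand-in for (section, option) pairs parsed from config.yml
KNOWN_OPTIONS = {
    ("api_auth", "jwt_secret"),
    ("api_auth", "jwt_private_key_path"),
    ("database", "sql_alchemy_conn"),
}

REF_RE = re.compile(r"\[(\w+)\]\s+(\w+)")              # matches "[section] option"
ENV_RE = re.compile(r"AIRFLOW__(\w+?)__([A-Z0-9_]+)")  # matches AIRFLOW__X__Y

def find_bad_references(rst_text: str) -> list[str]:
    """Return doc references that do not correspond to a real config option."""
    errors = []
    for section, option in REF_RE.findall(rst_text):
        if (section.lower(), option.lower()) not in KNOWN_OPTIONS:
            errors.append(f"[{section}] {option}")
    for section, option in ENV_RE.findall(rst_text):
        if (section.lower(), option.lower()) not in KNOWN_OPTIONS:
            errors.append(f"AIRFLOW__{section}__{option}")
    return errors
```

A pre-commit-style hook would exit non-zero when `find_bad_references` returns anything, which is how the check fails the build on a stale doc reference.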
airflow-core/docs/administration-and-deployment/production-deployment.rst

Lines changed: 6 additions & 3 deletions
@@ -62,9 +62,12 @@ the :doc:`Celery executor <apache-airflow-providers-celery:celery_executor>`.

 Once you have configured the executor, it is necessary to make sure that every node in the cluster contains
-the same configuration and Dags. Airflow sends simple instructions such as "execute task X of Dag Y", but
-does not send any Dag files or configuration. You can use a simple cronjob or any other mechanism to sync
-Dags and configs across your nodes, e.g., checkout Dags from git repo every 5 minutes on all nodes.
+the Dags and configuration appropriate for its role. Airflow sends simple instructions such as
+"execute task X of Dag Y", but does not send any Dag files or configuration. For synchronization of Dags
+we recommend the Dag Bundle mechanism (including ``GitDagBundle``), which allows you to make use of
+DAG versioning. For security-sensitive deployments, restrict sensitive configuration (JWT signing keys,
+database credentials, Fernet keys) to only the components that need them rather than sharing all
+configuration across all nodes — see :doc:`/security/security_model` for guidance.

 Logging

airflow-core/docs/best-practices.rst

Lines changed: 4 additions & 2 deletions
@@ -1098,8 +1098,10 @@ The benefits of using those operators are:
   environment is optimized for the case where you have multiple similar, but different environments.
 * The dependencies can be pre-vetted by the admins and your security team, no unexpected, new code will
   be added dynamically. This is good for both, security and stability.
-* Complete isolation between tasks. They cannot influence one another in other ways than using standard
-  Airflow XCom mechanisms.
+* Strong process-level isolation between tasks. Tasks run in separate containers/pods and cannot
+  influence one another at the process or filesystem level. They can still interact through standard
+  Airflow mechanisms (XComs, connections, variables) via the Execution API. See
+  :doc:`/security/security_model` for the full isolation model.

 The drawbacks:

airflow-core/docs/configurations-ref.rst

Lines changed: 16 additions & 9 deletions
@@ -22,15 +22,22 @@ Configuration Reference
 This page contains the list of all the available Airflow configurations that you
 can set in ``airflow.cfg`` file or using environment variables.

-Use the same configuration across all the Airflow components. While each component
-does not require all, some configurations need to be same otherwise they would not
-work as expected. A good example for that is :ref:`secret_key<config:api__secret_key>` which
-should be same on the Webserver and Worker to allow Webserver to fetch logs from Worker.
-
-The webserver key is also used to authorize requests to Celery workers when logs are retrieved. The token
-generated using the secret key has a short expiry time though - make sure that time on ALL the machines
-that you run Airflow components on is synchronized (for example using ntpd) otherwise you might get
-"forbidden" errors when the logs are accessed.
+Different Airflow components may require different configuration parameters, and for
+improved security, you should restrict sensitive configuration to only the components that
+need it. Some configuration values must be shared across specific components to work
+correctly — for example, the JWT signing key (``[api_auth] jwt_secret`` or
+``[api_auth] jwt_private_key_path``) must be consistent across all components that generate
+or validate JWT tokens (Scheduler, API Server). However, other sensitive parameters such as
+database connection strings or Fernet keys should only be provided to components that need them.
+
+For security-sensitive deployments, pass configuration values via environment variables
+scoped to individual components rather than sharing a single configuration file across all
+components. See :doc:`/security/security_model` for details on which configuration
+parameters should be restricted to which components.
+
+Make sure that time on ALL the machines that you run Airflow components on is synchronized
+(for example using ntpd) otherwise you might get "forbidden" errors when the logs are
+accessed or API calls are made.

 .. note::
     For more information see :doc:`/howto/set-config`.

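The environment-variable scoping recommended above builds on Airflow's ``AIRFLOW__<SECTION>__<OPTION>`` naming convention, which maps mechanically from ``[section] option`` pairs and makes per-component scoping easy to script. A sketch (option names are taken from this page; values are placeholders):

```python
def airflow_env_var(section: str, option: str) -> str:
    """Build the AIRFLOW__<SECTION>__<OPTION> env var name for a config option."""
    return f"AIRFLOW__{section.upper()}__{option.upper()}"

# Scope secrets per component: only token-generating/validating components get
# the JWT key, and only DB-facing components get the connection string.
scheduler_env = {
    airflow_env_var("api_auth", "jwt_secret"): "<placeholder>",
    airflow_env_var("database", "sql_alchemy_conn"): "<placeholder>",
}
worker_env = {
    # intentionally no database credentials or Fernet key here
}
```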
airflow-core/docs/core-concepts/multi-team.rst

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ Multi-Team mode is designed for medium to large organizations that typically hav
 **Use Multi-Team mode when:**

 - You have many teams that need to share Airflow infrastructure
-- You need resource isolation (Variables, Connections, Secrets, etc) between teams
+- You need resource isolation (Variables, Connections, Secrets, etc) between teams at the UI and API level (see :doc:`/security/security_model` for task-level isolation limitations)
 - You want separate execution environments per team
 - You want separate views per team in the Airflow UI
 - You want to minimize operational overhead or cost by sharing a single Airflow deployment

airflow-core/docs/howto/set-config.rst

Lines changed: 14 additions & 9 deletions
@@ -157,15 +157,20 @@ the example below.
 See :doc:`/administration-and-deployment/modules_management` for details on how Python and Airflow manage modules.

 .. note::
-    Use the same configuration across all the Airflow components. While each component
-    does not require all, some configurations need to be same otherwise they would not
-    work as expected. A good example for that is :ref:`secret_key<config:api__secret_key>` which
-    should be same on the Webserver and Worker to allow Webserver to fetch logs from Worker.
-
-    The webserver key is also used to authorize requests to Celery workers when logs are retrieved. The token
-    generated using the secret key has a short expiry time though - make sure that time on ALL the machines
-    that you run Airflow components on is synchronized (for example using ntpd) otherwise you might get
-    "forbidden" errors when the logs are accessed.
+    Different Airflow components may require different configuration parameters. For improved
+    security, restrict sensitive configuration to only the components that need it rather than
+    sharing all configuration across all components. Some values must be consistent across specific
+    components — for example, the JWT signing key must match between components that generate and
+    validate tokens. However, sensitive parameters such as database connection strings, Fernet keys,
+    and secrets backend credentials should only be provided to components that actually need them.
+
+    For security-sensitive deployments, pass configuration values via environment variables scoped
+    to individual components. See :doc:`/security/security_model` for detailed guidance on
+    restricting configuration parameters.
+
+    Make sure that time on ALL the machines that you run Airflow components on is synchronized
+    (for example using ntpd) otherwise you might get "forbidden" errors when the logs are
+    accessed or API calls are made.

 .. _set-config:configuring-local-settings:

airflow-core/docs/installation/upgrading_to_airflow3.rst

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ In Airflow 3, direct metadata database access from task code is now restricted.

 - **No Direct Database Access**: Task code can no longer directly import and use Airflow database sessions or models.
 - **API-Based Resource Access**: All runtime interactions (state transitions, heartbeats, XComs, and resource fetching) are handled through a dedicated Task Execution API.
-- **Enhanced Security**: This ensures isolation and security by preventing malicious task code from accessing or modifying the Airflow metadata database.
+- **Enhanced Security**: This improves isolation and security by preventing worker task code from directly accessing or modifying the Airflow metadata database. Note that Dag author code potentially still executes with direct database access in the Dag File Processor and Triggerer — see :doc:`/security/security_model` for details.
 - **Stable Interface**: The Task SDK provides a stable, forward-compatible interface for accessing Airflow resources without direct database dependencies.

 Step 1: Take care of prerequisites

airflow-core/docs/public-airflow-interface.rst

Lines changed: 4 additions & 3 deletions
@@ -548,9 +548,10 @@ but in Airflow they are not parts of the Public Interface and might change any t
 internal implementation detail and you should not assume they will be maintained
 in a backwards-compatible way.

-**Direct metadata database access from task code is no longer allowed**.
-Task code cannot directly access the metadata database to query Dag state, task history,
-or Dag runs. Instead, use one of the following alternatives:
+**Direct metadata database access from code authored by Dag Authors is no longer allowed**.
+The code authored by Dag Authors cannot directly access the metadata database to query Dag state, task history,
+or Dag runs — workers communicate exclusively through the Execution API. Instead, use one
+of the following alternatives:

 * **Task Context**: Use :func:`~airflow.sdk.get_current_context` to access task instance
   information and methods like :meth:`~airflow.sdk.types.RuntimeTaskInstanceProtocol.get_dr_count`,

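The recommended pattern (task code asking the runtime context for Dag-run information instead of opening a DB session) looks roughly like this. The ``get_current_context`` and ``get_dr_count`` names come from the doc excerpt above; the stub context object is hypothetical so the sketch runs standalone without Airflow installed.

```python
class _StubTaskInstance:
    """Stand-in for the Task SDK runtime task instance."""

    def get_dr_count(self, dag_id: str) -> int:
        # The real implementation calls the Execution API; stubbed here.
        return 3

def get_current_context() -> dict:
    """Stand-in for airflow.sdk.get_current_context."""
    return {"ti": _StubTaskInstance(), "dag_run": {"run_id": "manual__2024-01-01"}}

def my_task() -> int:
    ctx = get_current_context()  # no DB session, no metadata-DB imports
    return ctx["ti"].get_dr_count("example_dag")
```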