Docs: Add JWT authentication docs and strengthen security model (#64760)
* Docs: Add JWT authentication docs and strengthen security model
Add comprehensive JWT token authentication documentation covering both
the REST API and Execution API flows, including token structure, timings,
refresh mechanisms, and the DFP/Triggerer in-process bypass.
Update the security model to:
- Document current isolation limitations (DFP/Triggerer DB access,
shared Execution API resources, multi-team not guaranteeing task-level
isolation)
- Add deployment hardening guidance (per-component config, asymmetric
JWT keys, env vars with PR_SET_DUMPABLE protection)
- Add "What is NOT a security vulnerability" section covering all
categories from the security team's response policies
- Fix contradicting statements across docs that overstated isolation
guarantees or recommended sharing all config across components
Update AGENTS.md with security model awareness so AI agents performing
security research distinguish intentional design choices from actual
vulnerabilities.
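As a quick illustration of what "token structure" and expiry-based timings mean here, the sketch below decodes a JWT payload using only the standard library. This is not Airflow's implementation: the claim names (`sub`, `exp`) and values are illustrative stand-ins for whatever claim set the linked docs describe, and a toy token is built in place so the example is self-contained.

```python
import base64
import json
import time

def jwt_payload(token: str) -> dict:
    """Decode the (unverified) payload segment of a JWT for inspection.

    A JWT is three base64url segments: header.payload.signature.
    This does NOT verify the signature -- it only shows what the claims
    look like; real validation must check the signature and the
    exp/nbf timing claims.
    """
    payload_b64 = token.split(".")[1]
    # base64url decoding requires padding to a multiple of 4
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def b64url(obj: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

# Build a toy token (header + claims + fake signature) to illustrate.
now = int(time.time())
token = ".".join([
    b64url({"alg": "HS512", "typ": "JWT"}),        # header
    b64url({"sub": "ti-1234", "exp": now + 600}),  # claims: subject + expiry
    "fake-signature",
])
claims = jwt_payload(token)
print(claims["sub"])  # ti-1234
```

A short expiry window like the ten minutes above is what makes refresh mechanisms necessary for long-running tasks.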
* Fix spelling errors and use 'potentially' for DFP/Triggerer access
- Add dumpable, sandboxing, unsanitized, XSS to spelling wordlist
- Use 'potentially' consistently when describing Dag File Processor
and Triggerer database access and JWT authentication bypass, since
these are capabilities that Dag author code could exploit rather
than guaranteed behaviors of normal operation
* Add prek hook to validate security doc constants against config.yml
New hook `check-security-doc-constants` validates that:
- [section] option references in security RST files match config.yml
- AIRFLOW__X__Y env var references correspond to real config options
- Default values in doc tables match config.yml defaults
- Sensitive config variables are listed (warning, not error, since
the list is documented as non-exhaustive)
Loads both airflow-core config.yml and provider.yaml files to cover
all config sections (including celery, sentry, workers, etc.).
Runs automatically when config.yml or security RST docs are modified.
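A minimal sketch of the env-var reference check such a hook can perform. The parsed config.yml is represented here as a plain dict, and the section/option names in it are made up for illustration; the real hook's parsing and section disambiguation are more involved.

```python
import re

# Stand-in for the parsed config.yml: {section: {option: default}}.
# Section and option names here are illustrative only.
CONFIG = {
    "api_auth": {"jwt_secret": "", "jwt_expiration_time": "600"},
    "core": {"execution_api_server_url": ""},
}

# Greedy matching resolves the last "__" as the section/option separator;
# a real hook would disambiguate against the known section names instead.
ENV_VAR_RE = re.compile(r"AIRFLOW__([A-Z0-9_]+)__([A-Z0-9_]+)")

def check_env_var_refs(doc_text: str, config: dict) -> list[str]:
    """Return AIRFLOW__X__Y references that do not match a real option."""
    errors = []
    for match in ENV_VAR_RE.finditer(doc_text):
        section, option = match.group(1).lower(), match.group(2).lower()
        if option not in config.get(section, {}):
            errors.append(match.group(0))
    return errors

doc = """
Set ``AIRFLOW__API_AUTH__JWT_SECRET`` per component.
``AIRFLOW__API_AUTH__JWT_LIFETIME`` is wrong on purpose.
"""
print(check_env_var_refs(doc, CONFIG))  # ['AIRFLOW__API_AUTH__JWT_LIFETIME']
```

The `[section] option` check works the same way with a different regular expression over the RST source.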
* Expand sensitive vars to full list with component mapping and auto-update
Update security_model.rst sensitive config variables section:
- List ALL sensitive vars from config.yml and provider.yaml files
- Core vars organized in a table with "Needed by" column mapping each
var to the components that require it (API Server, Scheduler, Workers,
Dag File Processor, Triggerer)
- Provider vars in a separate table noting they should only be set where
the provider functionality is needed
- Tables are auto-generated between AUTOGENERATED markers
Update prek hook to auto-update the sensitive var tables:
- Reads config.yml and all provider.yaml files
- Generates RST list-table content for core and provider sensitive vars
- Replaces content between markers on each run
- Warns when new sensitive vars need component mapping added to the hook
- Validates [section] option and AIRFLOW__X__Y references against config
- Skips autogenerated sections when checking env var references
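The marker-based replacement step can be sketched as follows. The marker strings are illustrative, not the hook's exact text, and the sketch assumes both markers are present exactly once.

```python
# Illustrative marker strings; the real hook defines its own.
START = ".. AUTOGENERATED-START"
END = ".. AUTOGENERATED-END"

def replace_between_markers(text: str, generated: str) -> str:
    """Replace everything between START and END, keeping the markers
    themselves so the next run can find them again."""
    head, _, rest = text.partition(START)
    _, _, tail = rest.partition(END)
    return head + START + "\n" + generated + "\n" + END + tail

doc = "intro\n" + START + "\nstale table\n" + END + "\noutro\n"
print(replace_between_markers(doc, "fresh table"))
```

Because the markers survive each run, the operation is repeatable: re-running the hook simply swaps the previous generated content for the current one.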
* Clarify software guards vs intentional access in DFP/Triggerer
Address issues raised in security discussion about the gap between
Airflow's isolation promises and reality:
- Clearly distinguish software guards (prevent accidental DB access)
from the inability to prevent intentional malicious access by code
running as the same Unix user as the parent process
- Document the specific mechanisms: /proc/PID/environ, config files,
_CMD commands, secrets manager credential reuse
- Clarify that worker isolation is genuine (no DB credentials at all)
while DFP/Triggerer isolation is software-level only
- Add Unix user impersonation as a deployment hardening measure
- Document strategic (API-based DFP/Triggerer) and tactical (user
impersonation) planned improvements
- Add warning about sensitive config leakage through task logs
- Add guidance to restrict task log access
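The `/proc/PID/environ` mechanism above can be demonstrated directly (Linux only): any process running as the same Unix user can read another process's environment, which is why software guards alone cannot hide secrets passed via environment variables. The variable name below is illustrative, standing in for something like an `AIRFLOW__...` secret.

```python
import os
import subprocess
import sys
from pathlib import Path

# Start a child process with a secret in its environment -- a stand-in
# for an Airflow component started with sensitive config env vars.
child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    env={**os.environ, "DEMO_SECRET": "hunter2"},
)
try:
    # Any process running as the same Unix user can read it back
    # from /proc/<pid>/environ (Linux).
    entries = Path(f"/proc/{child.pid}/environ").read_bytes().split(b"\0")
    leaked = [e.decode() for e in entries if e.startswith(b"DEMO_SECRET=")]
    print(leaked)  # ['DEMO_SECRET=hunter2']
finally:
    child.kill()
```

Running the parsing and triggerer processes as a different Unix user (impersonation) closes exactly this channel, which is why it appears above as a hardening measure.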
* Docs: Improve security docs wording, extract workload isolation, recommend DagBundle
- Reword DFP/Triggerer descriptions to clarify software guards vs intentional bypass
- Extract workload isolation section from jwt_token_authentication into workload.rst
- Recommend Dag Bundle mechanism (GitDagBundle) for DAG synchronization
- Fix typo in public-airflow-interface.rst and broken backtick in jwt_token_authentication.rst
- Update cross-references between security docs
`.github/instructions/code-review.instructions.md` (1 addition, 1 deletion)

```diff
@@ -11,7 +11,7 @@ Use these rules when reviewing pull requests to the Apache Airflow repository.
 - **Scheduler must never run user code.** It only processes serialized Dags. Flag any scheduler-path code that deserializes or executes Dag/task code.
 
 - **Flag any task execution code that accesses the metadata DB directly** instead of through the Execution API (`/execution` endpoints).
 
-- **Flag any code in Dag Processor or Triggerer that breaks process isolation** — these components run user code in isolated processes.
+- **Flag any code in Dag Processor or Triggerer that breaks process isolation** — these components run user code in separate processes from the Scheduler and API Server, but note that they potentially have direct metadata database access and potentially bypass JWT authentication via in-process Execution API transport. This is an intentional design choice documented in the security model, not a security vulnerability.
 
 - **Flag any provider importing core internals** like `SUPERVISOR_COMMS` or task-runner plumbing. Providers interact through the public SDK and execution API only.
```
```diff
 1. Users author Dags with the Task SDK (`airflow.sdk`).
-2. Dag Processor parses Dag files in isolated processes and stores serialized Dags in the metadata DB.
+2. Dag File Processor parses Dag files in separate processes and stores serialized Dags in the metadata DB. Software guards prevent individual parsing processes from accessing the database directly and enforce use of the Execution API, but these guards do not protect against intentional bypassing by malicious or misconfigured code.
 3. Scheduler reads serialized Dags — **never runs user code** — and creates Dag runs / task instances.
-4. Workers execute tasks via Task SDK and communicate with the API server through the Execution API — **never access the metadata DB directly**.
+4. Workers execute tasks via Task SDK and communicate with the API server through the Execution API — **never access the metadata DB directly**. Each task receives a short-lived JWT token scoped to its task instance ID.
 5. API Server serves the React UI and handles all client-database interactions.
-6. Triggerer evaluates deferred tasks/sensors in isolated processes.
+6. Triggerer evaluates deferred tasks/sensors in separate processes. Like the Dag File Processor, software guards steer it through the Execution API rather than direct database access, but these guards do not protect against intentional bypassing by malicious or misconfigured code.
 7. Shared libraries that are symbolically linked to different Python distributions are in `shared` folder.
 8. Airflow uses `uv workspace` feature to keep all the distributions sharing dependencies and venv
 9. Each of the distributions should declare other needed distributions: `uv --project <FOLDER> sync` command acts on the selected project in the monorepo with only dependencies that it has
+
+## Security Model
+
+When reviewing code, writing security documentation, or performing security research, keep in
+mind the following aspects of Airflow's security model. The authoritative reference is
+1. **Actual vulnerabilities** — code that violates the documented security model (e.g., a worker
+   gaining database access it shouldn't have, a Scheduler executing user code, an unauthenticated
+   user accessing protected endpoints).
+2. **Known limitations** — documented gaps where the current implementation doesn't provide full
+   isolation (e.g., DFP/Triggerer database access, shared Execution API resources, multi-team
+   not enforcing task-level isolation). These are tracked for improvement in future versions and
+   should not be reported as new findings.
+3. **Deployment hardening opportunities** — measures a Deployment Manager can take to improve
+   isolation beyond what Airflow enforces natively (e.g., per-component configuration, asymmetric
+   JWT keys, network policies). These belong in deployment guidance, not as code-level issues.
 
 # Shared libraries
 
 - shared libraries provide implementation of some common utilities like logging, configuration where the code should be reused in different distributions (potentially in different versions)
```
`airflow-core/docs/core-concepts/multi-team.rst` (1 addition, 1 deletion)

```diff
@@ -38,7 +38,7 @@ Multi-Team mode is designed for medium to large organizations that typically hav
 **Use Multi-Team mode when:**
 
 - You have many teams that need to share Airflow infrastructure
-- You need resource isolation (Variables, Connections, Secrets, etc) between teams
+- You need resource isolation (Variables, Connections, Secrets, etc) between teams at the UI and API level (see :doc:`/security/security_model` for task-level isolation limitations)
 - You want separate execution environments per team
 - You want separate views per team in the Airflow UI
 - You want to minimize operational overhead or cost by sharing a single Airflow deployment
```
`airflow-core/docs/installation/upgrading_to_airflow3.rst` (1 addition, 1 deletion)

```diff
@@ -54,7 +54,7 @@ In Airflow 3, direct metadata database access from task code is now restricted.
 - **No Direct Database Access**: Task code can no longer directly import and use Airflow database sessions or models.
 - **API-Based Resource Access**: All runtime interactions (state transitions, heartbeats, XComs, and resource fetching) are handled through a dedicated Task Execution API.
-- **Enhanced Security**: This ensures isolation and security by preventing malicious task code from accessing or modifying the Airflow metadata database.
+- **Enhanced Security**: This improves isolation and security by preventing worker task code from directly accessing or modifying the Airflow metadata database. Note that Dag author code potentially still executes with direct database access in the Dag File Processor and Triggerer — see :doc:`/security/security_model` for details.
 - **Stable Interface**: The Task SDK provides a stable, forward-compatible interface for accessing Airflow resources without direct database dependencies.
```