feat(extensions): Tier 3 persistent key-value storage for extensions#39168
feat(extensions): Tier 3 persistent key-value storage for extensions#39168
Conversation
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
…nts, pagination, encryption
Implements the Extension Storage Layer (Tier 3) from the architecture proposal:
- New `extension_storage` table and SQLAlchemy model with three storage scopes:
global (shared), user-scoped (per-user), resource-linked (tied to a Superset resource)
- `ExtensionStorageDAO` with upsert semantics, Fernet encryption, and scope isolation
- REST API under `/api/v1/extensions/<publisher>/<name>/storage/persistent/`:
GET/PUT/DELETE for global/<key>, user/<key>, resources/<type>/<uuid>/<key>
GET list endpoints (with ?page & ?page_size pagination) for /global/, /user/, /resources/<type>/<uuid>/
- Alembic migration creating `extension_storage` table
- Unit tests covering all three scopes, upsert, delete, pagination, and encryption
(encrypted roundtrip, Fernet token verification, cross-scope coexistence)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ator - allow_browser_login = True so web UI can authenticate via session cookie + CSRF token instead of requiring a JWT Bearer header - Replace db.session.commit() with @transaction() on all write endpoints to satisfy pylint W9001 and get automatic rollback on error Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…E_ENCRYPTION_KEYS - _fernet() now returns MultiFernet built from EXTENSION_STORAGE_ENCRYPTION_KEYS (falls back to SECRET_KEY). New encryptions use the first key; all keys are tried on decryption, giving zero-downtime rotation. - New CLI command: superset rotate-extension-storage-keys Re-encrypts every is_encrypted row with the current (first) key via MultiFernet.rotate(). Run after prepending a new key and restarting. - EXTENSION_STORAGE_ENCRYPTION_KEYS added to config.py with rotation docs. - Re-enable is_encrypted=true in notebook saves now that rotation is supported. - Tests: key_rotation_roundtrip and multi_fernet_decrypts_old_key. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename migration file to match the table it creates (extension_storage) - Replace unsafe assert patterns in API with proper None guards - Add direct import of ExtensionStorage into api.py (removes runtime import) - Add missing requestBody OpenAPI spec to set_user and set_resource - Add _key_to_fernet docstring - Add test_list_resource_filtered_by_category (parity with list_global/list_user) - Add docs/developer_docs/extensions/extension-points/persistent-storage.md covering all three scopes, REST API, frontend/backend usage examples, encryption, key rotation, and data model table Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0bdd575 to
02dbaeb
Compare
- Remove language_version: python3.10 from mypy hook — pre-commit would fail for developers with only Python 3.11/3.12 installed - Restore PT004 to ruff ignore list — removing it would flag every existing pytest fixture across the codebase python_version = "3.10" in pyproject.toml [tool.mypy] is intentional (sets mypy's type-checking target to Superset's minimum Python version). The PYTHONPATH fix for the pylint hook is also intentional. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #39168 +/- ##
==========================================
- Coverage 64.45% 64.40% -0.05%
==========================================
Files 2541 2544 +3
Lines 131662 131966 +304
Branches 30517 30540 +23
==========================================
+ Hits 84863 84997 +134
- Misses 45333 45503 +170
Partials 1466 1466
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…e5f6) The placeholder down_revision "c3d4e5f6a7b8" does not exist in the Alembic revision chain, causing KeyError on superset db upgrade and cascading failures across all backend test suites and E2E tests. Points to the actual master HEAD migration (add_granular_export_permissions). Also restores language_version: python3.10 for the mypy pre-commit hook. Without this, pre-commit builds its mypy virtualenv with an older Python, which then fails on the match statements present in this codebase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CI runners have Python 3.11, not 3.10 — specifying language_version: python3.10 causes pre-commit to fail with CalledProcessError when building the mypy virtualenv. Master doesn't set this; omitting it lets pre-commit use whatever Python 3.x is available, which handles both local (3.10) and CI (3.11). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| entries = ( | ||
| db.session.query(ExtensionStorage) | ||
| .filter(ExtensionStorage.is_encrypted.is_(True)) | ||
| .all() | ||
| ) | ||
| if not entries: | ||
| click.secho("No encrypted entries found.", fg="yellow") | ||
| return | ||
| failed = 0 | ||
| for entry in entries: | ||
| try: | ||
| entry.value = fernet.rotate(entry.value) | ||
| except Exception as exc: # pylint: disable=broad-except | ||
| click.secho( | ||
| f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}", | ||
| fg="red", | ||
| err=True, | ||
| ) | ||
| failed += 1 | ||
| rotated = len(entries) - failed |
There was a problem hiding this comment.
Suggestion: Loading every encrypted row with .all() materializes the full table (including large binary payloads) in memory at once, which can exhaust memory on real datasets. Iterate in batches (and avoid retaining all ORM instances) so key rotation remains safe for large tables. [possible bug]
Severity Level: Major ⚠️
- ❌ Key rotation CLI may OOM on large extension_storage tables.
- ⚠️ Operators may be unable to retire old encryption keys.
- ⚠️ Encrypted extension data may remain unrotated indefinitely.| entries = ( | |
| db.session.query(ExtensionStorage) | |
| .filter(ExtensionStorage.is_encrypted.is_(True)) | |
| .all() | |
| ) | |
| if not entries: | |
| click.secho("No encrypted entries found.", fg="yellow") | |
| return | |
| failed = 0 | |
| for entry in entries: | |
| try: | |
| entry.value = fernet.rotate(entry.value) | |
| except Exception as exc: # pylint: disable=broad-except | |
| click.secho( | |
| f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}", | |
| fg="red", | |
| err=True, | |
| ) | |
| failed += 1 | |
| rotated = len(entries) - failed | |
| encrypted_count = ( | |
| db.session.query(ExtensionStorage.id) | |
| .filter(ExtensionStorage.is_encrypted.is_(True)) | |
| .count() | |
| ) | |
| if encrypted_count == 0: | |
| click.secho("No encrypted entries found.", fg="yellow") | |
| return | |
| failed = 0 | |
| rotated = 0 | |
| for entry in ( | |
| db.session.query(ExtensionStorage) | |
| .filter(ExtensionStorage.is_encrypted.is_(True)) | |
| .yield_per(100) | |
| ): | |
| try: | |
| entry.value = fernet.rotate(entry.value) | |
| db.session.flush() | |
| db.session.expunge(entry) | |
| rotated += 1 | |
| except Exception as exc: # pylint: disable=broad-except | |
| click.secho( | |
| f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}", | |
| fg="red", | |
| err=True, | |
| ) | |
| failed += 1 |
Steps of Reproduction ✅
1. Start a Superset instance with the extension storage model defined at
`superset/extension_storage/models.py:40-83` (the `ExtensionStorage` table with `value =
Column(LargeBinary(EXTENSION_STORAGE_MAX_SIZE))` and `EXTENSION_STORAGE_MAX_SIZE` set to
~16 MB at line 36).
2. Populate many encrypted rows in `extension_storage` by calling
`ExtensionStorageDAO.set(..., is_encrypted=True)` from
`superset/extension_storage/daos.py:142-188` in a loop, so that the combined size of
`value` blobs for encrypted rows is large relative to available RAM (for example,
thousands of rows with multi‑MB payloads).
3. Configure key rotation by setting `EXTENSION_STORAGE_ENCRYPTION_KEYS` in the Superset
config (used by `_fernet()` at `superset/extension_storage/daos.py:45-57`), then run the
CLI command `superset rotate-extension-storage-keys`, which invokes
`rotate_extension_storage_keys()` defined at `superset/cli/update.py:104-149`.
4. During execution, `rotate_extension_storage_keys()` issues
`db.session.query(ExtensionStorage).filter(ExtensionStorage.is_encrypted.is_(True)).all()`
at `superset/cli/update.py:121-124`, materializing every encrypted row (including all
large `value` blobs) into the `entries` list at once; on a sufficiently large table this
can cause excessive memory usage or `MemoryError`/process OOM, preventing the rotation
from completing.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/cli/update.py
**Line:** 121:140
**Comment:**
*Possible Bug: Loading every encrypted row with `.all()` materializes the full table (including large binary payloads) in memory at once, which can exhaust memory on real datasets. Iterate in batches (and avoid retaining all ORM instances) so key rotation remains safe for large tables.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| appbuilder.add_api(SqlLabPermalinkRestApi) | ||
| appbuilder.add_api(LogRestApi) | ||
|
|
||
| from superset.extension_storage.api import ExtensionStorageRestApi |
There was a problem hiding this comment.
Suggestion: The extension storage API is registered unconditionally, which bypasses the ENABLE_EXTENSIONS feature flag and exposes extension endpoints even when extensions are configured as disabled. Register this API only when the extensions feature flag is enabled so runtime behavior matches the documented security/feature contract. [logic error]
Severity Level: Major ⚠️
- ⚠️ Extensions storage API exposed when ENABLE_EXTENSIONS is False.
- ⚠️ Feature flag semantics for /api/v1/extensions become inconsistent.| from superset.extension_storage.api import ExtensionStorageRestApi | |
| if feature_flag_manager.is_feature_enabled("ENABLE_EXTENSIONS"): |
Steps of Reproduction ✅
1. Configure and start Superset with the default feature flags, where
`"ENABLE_EXTENSIONS": False` as defined in `superset/config.py:576-13` (comment at 576-10
and value at 576-12).
2. During app startup, `SupersetAppInitializer.init_views` in
`superset/initialization/__init__.py:1-55` executes the block that unconditionally imports
and registers `ExtensionStorageRestApi` via `from superset.extension_storage.api import
ExtensionStorageRestApi` and `appbuilder.add_api(ExtensionStorageRestApi)` at
`superset/initialization/__init__.py:37-39` (corresponding to PR diff lines 276–278).
3. In the same `init_views` method, the main extensions API is only registered when the
feature flag is enabled: `if
feature_flag_manager.is_feature_enabled("ENABLE_EXTENSIONS"):` followed by importing and
adding `ExtensionsRestApi` at `superset/initialization/__init__.py:41-44`, so with the
flag disabled only `ExtensionStorageRestApi` remains mounted under `/api/v1/extensions`.
4. With the server running and extensions feature flag still disabled, send an
authenticated request (session or JWT protected by `@protect()`) to `GET
/api/v1/extensions/<publisher>/<name>/storage/persistent/global/`, which is exposed by
`ExtensionStorageRestApi.list_global` at `superset/extension_storage/api.py:93-103`;
observe that the endpoint is fully functional even though `"ENABLE_EXTENSIONS"` is False,
demonstrating that extension-specific storage endpoints bypass the feature flag gating
applied to other extension APIs.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/initialization/__init__.py
**Line:** 276:276
**Comment:**
*Logic Error: The extension storage API is registered unconditionally, which bypasses the `ENABLE_EXTENSIONS` feature flag and exposes extension endpoints even when extensions are configured as disabled. Register this API only when the extensions feature flag is enabled so runtime behavior matches the documented security/feature contract.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| body = request.get_json(force=True) or {} | ||
| raw_value: str | bytes = body.get("value", "") | ||
| value_bytes = raw_value.encode() if isinstance(raw_value, str) else raw_value |
There was a problem hiding this comment.
Suggestion: request.get_json(force=True) can return a non-dict JSON value (for example, a list), and the next .get(...) calls will raise AttributeError, producing a 500 instead of a client error. Validate that the parsed body is a JSON object before reading fields. [possible bug]
Severity Level: Major ⚠️
- ❌ PUT /storage/persistent/global/<key> can 500 on bad JSON.
- ❌ PUT /storage/persistent/user/<key> can 500 on bad JSON.
- ❌ PUT /storage/persistent/resources/... can 500 on bad JSON.
- ⚠️ Clients receive 500 instead of clear 4xx validation errors.| body = request.get_json(force=True) or {} | |
| raw_value: str | bytes = body.get("value", "") | |
| value_bytes = raw_value.encode() if isinstance(raw_value, str) else raw_value | |
| if not isinstance(body, dict): | |
| return self.response(400, message="Request body must be a JSON object") |
Steps of Reproduction ✅
1. Start Superset with this PR code so that `ExtensionStorageRestApi` in
`superset/extension_storage/api.py` is registered (class definition at lines 93–99).
2. Perform an HTTP PUT to the global storage endpoint
`/api/v1/extensions/<publisher>/<name>/storage/persistent/global/<key>` (handler
`set_global` at lines 188–241) with `Content-Type: application/json` and a JSON array
body, e.g. `["not-a-dict"]`.
3. Inside `set_global`, `body = request.get_json(force=True) or {}` (line 229) will set
`body` to the list `["not-a-dict"]` because `request.get_json(force=True)` returns that
list and the `or {}` fallback is not used.
4. The next line `raw_value: str | bytes = body.get("value", "")` (line 230) calls `.get`
on a list, raising `AttributeError: 'list' object has no attribute 'get'`, which Flask
propagates as a 500 Internal Server Error instead of a 4xx client error. The same pattern
exists in `set_user` (lines 417–419) and `set_resource` (lines 651–653), so the issue
reproduces identically on `/user/<key>` and
`/resources/<resource_type>/<resource_uuid>/<key>` PUT endpoints.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/extension_storage/api.py
**Line:** 229:231
**Comment:**
*Possible Bug: `request.get_json(force=True)` can return a non-dict JSON value (for example, a list), and the next `.get(...)` calls will raise `AttributeError`, producing a 500 instead of a client error. Validate that the parsed body is a JSON object before reading fields.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| value=value_bytes, | ||
| value_type=body.get("value_type", _MIME_DEFAULT), | ||
| category=body.get("category"), | ||
| description=body.get("description"), | ||
| is_encrypted=bool(body.get("is_encrypted", False)), | ||
| ) |
There was a problem hiding this comment.
Suggestion: value_type can be explicitly sent as null, and body.get("value_type", _MIME_DEFAULT) will return None, causing a DB integrity failure on a non-nullable column. Use a fallback that also handles None. [logic error]
Severity Level: Major ⚠️
- ❌ PUT global storage can fail on explicit null MIME type.
- ❌ PUT user storage can fail on explicit null MIME type.
- ❌ PUT resource storage can fail on explicit null MIME type.
- ⚠️ Clients see server error instead of default MIME type.| value=value_bytes, | |
| value_type=body.get("value_type", _MIME_DEFAULT), | |
| category=body.get("category"), | |
| description=body.get("description"), | |
| is_encrypted=bool(body.get("is_encrypted", False)), | |
| ) | |
| value_type=body.get("value_type") or _MIME_DEFAULT, |
Steps of Reproduction ✅
1. Start Superset with this PR code so `ExtensionStorageRestApi` in
`superset/extension_storage/api.py` is active and `ExtensionStorageDAO.set` is callable
from the API.
2. Send an HTTP PUT to
`/api/v1/extensions/<publisher>/<name>/storage/persistent/global/<key>` (handler
`set_global` at lines 188–241) with JSON body `{"value": "abc", "value_type": null}` and
`Content-Type: application/json`.
3. In `set_global`, `body = request.get_json(force=True) or {}` (line 229) yields
`{"value": "abc", "value_type": None}`; `value_type=body.get("value_type", _MIME_DEFAULT)`
(line 236) assigns `None` to `value_type` because `get` finds the key and returns its
`None` value instead of the default.
4. That `None` is passed into `ExtensionStorageDAO.set(..., value_type=None, ...)` (lines
232–240); when the DAO persists to the `ExtensionStorage` model (whose `value_type` column
is declared as non-nullable in the new table in this PR), the ORM/database layer raises an
integrity or type error at commit time, causing the request to fail with a 500 instead of
cleanly defaulting to the fallback MIME type.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/extension_storage/api.py
**Line:** 235:240
**Comment:**
*Logic Error: `value_type` can be explicitly sent as `null`, and `body.get("value_type", _MIME_DEFAULT)` will return `None`, causing a DB integrity failure on a non-nullable column. Use a fallback that also handles `None`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| "ix_ext_storage_lookup", | ||
| "extension_storage", | ||
| ["extension_id", "user_fk", "resource_type", "resource_uuid", "key"], | ||
| ) |
There was a problem hiding this comment.
Suggestion: The scope lookup index is non-unique, so concurrent writes can insert duplicate rows for the same logical key. Since reads and deletes use .first(), duplicate rows make results nondeterministic and can leave stale duplicates behind. Make the lookup index unique while normalizing nullable scope fields so NULL values don't bypass uniqueness. [race condition]
Severity Level: Major ⚠️
- ❌ Global storage PUT may create duplicate logical keys.
- ❌ User storage PUT may return stale values intermittently.
- ❌ Resource storage GET may read unpredictable version of value.
- ⚠️ Delete endpoints may leave undeletable ghost entries behind.
- ⚠️ Extension key-value semantics broken under concurrent writes.| ) | |
| [ | |
| "extension_id", | |
| sa.text("COALESCE(user_fk, -1)"), | |
| sa.text("COALESCE(resource_type, '')"), | |
| sa.text("COALESCE(resource_uuid, '')"), | |
| "key", | |
| ], | |
| unique=True, |
Steps of Reproduction ✅
1. Apply migration
`superset/migrations/versions/2026-03-20_12-00_e5f6a7b8c9d0_add_extension_storage_table.py`,
which creates table `extension_storage` and a non-unique composite index
`ix_ext_storage_lookup` on `(extension_id, user_fk, resource_type, resource_uuid, key)` at
lines 45–49.
2. Start Superset so the REST API in `superset/extension_storage/api.py` is registered;
note that `set_user` at lines 193–191 and `set_global` at lines 99–152 call
`ExtensionStorageDAO.set(...)` inside a `@transaction()`-wrapped request handler, creating
separate DB transactions per HTTP PUT to
`/api/v1/extensions/<publisher>/<name>/storage/persistent/user/<key>` or `/global/<key>`.
3. Trigger two concurrent PUT requests for the same logical scoped key (for example, same
`publisher`, `name`, and `key` to `set_global`, or same `publisher`, `name`, `key`, and
authenticated user to `set_user`), so that both requests enter `ExtensionStorageDAO.set`
in `superset/extension_storage/daos.py` lines 142–190; each transaction executes a SELECT
with `.first()` at lines 158–167, sees no existing row, and then independently executes an
INSERT and `db.session.flush()`, resulting in two rows with identical `(extension_id,
user_fk, resource_type, resource_uuid, key)` but different `id`.
4. Later reads and deletes for that key become nondeterministic: `ExtensionStorageDAO.get`
and `get_value` at lines 100–138 and `delete` at lines 195–218 all use `.first()` without
`ORDER BY`, so GET endpoints (`get_global`, `get_user`, `get_resource` in
`superset/extension_storage/api.py` lines 62–97, 242–260, 38–95) may return either the
stale or the latest value arbitrarily, and DELETE endpoints only remove one of the
duplicate rows, leaving stale data that continues to appear in list endpoints
(`list_global`, `list_user`, `list_resource`), even though the extension believes the key
was uniquely upserted.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/migrations/versions/2026-03-20_12-00_e5f6a7b8c9d0_add_extension_storage_table.py
**Line:** 78:78
**Comment:**
*Race Condition: The scope lookup index is non-unique, so concurrent writes can insert duplicate rows for the same logical key. Since reads and deletes use `.first()`, duplicate rows make results nondeterministic and can leave stale duplicates behind. Make the lookup index unique while normalizing nullable scope fields so `NULL` values don't bypass uniqueness.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| raw_keys: list[str | bytes] = current_app.config.get( | ||
| "EXTENSION_STORAGE_ENCRYPTION_KEYS" | ||
| ) or [current_app.config["SECRET_KEY"]] |
There was a problem hiding this comment.
Suggestion: EXTENSION_STORAGE_ENCRYPTION_KEYS is assumed to be a list, but if operators configure it as a single string, the current code iterates character-by-character and derives one Fernet key per character. That silently encrypts data with the first character-derived key, weakening security and causing decryption/rotation surprises when configuration is later corrected. Normalize string/bytes config values to a single-item list before building MultiFernet. [security]
Severity Level: Critical 🚨
- ❌ Encrypted extension storage reads 500 with misconfigured key type.
- ❌ Key rotation CLI fails decrypting data after config correction.
- ⚠️ Single-character key strings drastically reduce encryption key entropy.| raw_keys: list[str | bytes] = current_app.config.get( | |
| "EXTENSION_STORAGE_ENCRYPTION_KEYS" | |
| ) or [current_app.config["SECRET_KEY"]] | |
| raw_keys = current_app.config.get("EXTENSION_STORAGE_ENCRYPTION_KEYS") | |
| if not raw_keys: | |
| raw_keys = [current_app.config["SECRET_KEY"]] | |
| elif isinstance(raw_keys, (str, bytes)): | |
| raw_keys = [raw_keys] |
Steps of Reproduction ✅
1. Configure Superset so that `current_app.config["EXTENSION_STORAGE_ENCRYPTION_KEYS"]` is
a single string instead of a list (e.g. set it to `"mysupersecret"` in your deployment
config, despite the list type hint in `superset/config.py:242`).
2. Start Superset and call the encrypted global storage endpoint `PUT
/api/v1/extensions/<publisher>/<name>/storage/persistent/global/<key>` implemented in
`superset/extension_storage/api.py:188-241`, with JSON body
`{"value":"secret","is_encrypted":true}` so that `ExtensionStorageDAO.set(...,
is_encrypted=True)` at `superset/extension_storage/daos.py:142-156` is invoked.
3. During this request `_fernet()` at `superset/extension_storage/daos.py:45-57` reads the
string config value into `raw_keys`, iterates over it character-by-character in `[
_key_to_fernet(k) for k in raw_keys ]`, and encrypts the stored value with a Fernet key
derived only from the first single-character "key", producing ciphertext that can only be
decrypted with that unintended character-derived key.
4. Later, correct the configuration to a proper list (e.g.
`EXTENSION_STORAGE_ENCRYPTION_KEYS = ["mysupersecret"]`), restart Superset, and call `GET
/api/v1/extensions/<publisher>/<name>/storage/persistent/global/<key>` handled by
`get_global()` in `superset/extension_storage/api.py:151-187`;
`ExtensionStorageDAO.get_value()` at `superset/extension_storage/daos.py:122-138` now
builds a different `MultiFernet` without the old character-derived key and
`decrypt(entry.value)` raises `InvalidToken`, causing a 500 error when reading previously
written encrypted entries and similarly breaking the `rotate_extension_storage_keys` CLI
in `superset/cli/update.py:15-31`.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/extension_storage/daos.py
**Line:** 54:56
**Comment:**
*Security: `EXTENSION_STORAGE_ENCRYPTION_KEYS` is assumed to be a list, but if operators configure it as a single string, the current code iterates character-by-character and derives one Fernet key per character. That silently encrypts data with the first character-derived key, weakening security and causing decryption/rotation surprises when configuration is later corrected. Normalize string/bytes config values to a single-item list before building `MultiFernet`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| ) | ||
|
|
||
| __table_args__ = ( | ||
| # Composite index covering all lookup dimensions |
There was a problem hiding this comment.
Suggestion: The model does not enforce the documented per-scope key uniqueness at the database level, so concurrent set operations can insert duplicate rows for the same logical key and make reads/deletes nondeterministic (.first() may return any duplicate). Add unique partial indexes per scope so each logical key can exist only once. [race condition]
Severity Level: Major ⚠️
- ❌ Duplicate extension_storage rows cause inconsistent GET /storage/persistent responses.
- ⚠️ DELETE endpoints may leave duplicate ghost rows for same key.
- ⚠️ Extension authors see nondeterministic state when writing concurrently.| # Composite index covering all lookup dimensions | |
| # Enforce documented key uniqueness per scope. | |
| Index( | |
| "uq_ext_storage_global", | |
| "extension_id", | |
| "key", | |
| unique=True, | |
| postgresql_where=( | |
| user_fk.is_(None) | |
| & resource_type.is_(None) | |
| & resource_uuid.is_(None) | |
| ), | |
| sqlite_where=( | |
| user_fk.is_(None) | |
| & resource_type.is_(None) | |
| & resource_uuid.is_(None) | |
| ), | |
| ), | |
| Index( | |
| "uq_ext_storage_user", | |
| "extension_id", | |
| "key", | |
| "user_fk", | |
| unique=True, | |
| postgresql_where=( | |
| user_fk.is_not(None) | |
| & resource_type.is_(None) | |
| & resource_uuid.is_(None) | |
| ), | |
| sqlite_where=( | |
| user_fk.is_not(None) | |
| & resource_type.is_(None) | |
| & resource_uuid.is_(None) | |
| ), | |
| ), | |
| Index( | |
| "uq_ext_storage_resource", | |
| "extension_id", | |
| "key", | |
| "resource_type", | |
| "resource_uuid", | |
| unique=True, | |
| postgresql_where=( | |
| resource_type.is_not(None) & resource_uuid.is_not(None) | |
| ), | |
| sqlite_where=( | |
| resource_type.is_not(None) & resource_uuid.is_not(None) | |
| ), | |
| ), |
Steps of Reproduction ✅
1. Apply and run the Alembic migration
`superset/migrations/versions/2026-03-20_12-00_e5f6a7b8c9d0_add_extension_storage_table.py`
which creates the `extension_storage` table and only non-unique indexes
`ix_ext_storage_extension_id` and `ix_ext_storage_lookup` (lines 69–77), then start the
Superset webserver so that `ExtensionStorageRestApi` is registered in
`superset/initialization/__init__.py:260-29`.
2. Authenticate as a user and, following the documented frontend usage in
`docs/developer_docs/extensions/extension-points/persistent-storage.md:118-150`, issue two
concurrent HTTP `PUT` requests to the same user-scoped key, e.g. `PUT
/api/v1/extensions/acme/my-ext/storage/persistent/user/prefs`, from two separate
processes/threads, each with a different `"value"` in the JSON body.
3. Both requests are handled by `ExtensionStorageRestApi.set_user` in
`superset/extension_storage/api.py:372-71`, which is wrapped in `@transaction()` and calls
`ExtensionStorageDAO.set(...)` (API file lines 54–70) with identical `(extension_id, key,
user_fk)` arguments derived from `_extension_id()` and `get_user_id()`.
4. Inside `ExtensionStorageDAO.set` (`superset/extension_storage/daos.py:142-190`), each
transaction builds the same scope filter via `_scope_filter` (`daos.py:60-84`) and
executes `db.session.query(ExtensionStorage).filter(and_(...)).first()` (lines 158-167);
without any unique constraint on `(extension_id, user_fk, resource_type, resource_uuid,
key)` in the model (`superset/extension_storage/models.py:91-101`) or migration, both
concurrent SELECTs can see "no rows", each takes the `entry is None` branch, constructs a
new `ExtensionStorage` row (lines 176-187), and flushes, resulting in two rows with
identical logical key.
5. Subsequent reads via `GET /api/v1/extensions/acme/my-ext/storage/persistent/user/prefs`
hit `ExtensionStorageRestApi.get_user` (`api.py:331-369`), which calls
`ExtensionStorageDAO.get` (`daos.py:100-120`) and uses `.first()` without ordering; with
duplicate rows present, the database may return either row, making the returned value
nondeterministic, and `ExtensionStorageDAO.delete` (`daos.py:195-218`) will delete only
one of the duplicates, leaving "ghost" entries that continue to appear in `list_user`
(`daos.py:238-251`, exposed at `api.py:282-329`).Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/extension_storage/models.py
**Line:** 92:92
**Comment:**
*Race Condition: The model does not enforce the documented per-scope key uniqueness at the database level, so concurrent `set` operations can insert duplicate rows for the same logical key and make reads/deletes nondeterministic (`.first()` may return any duplicate). Add unique partial indexes per scope so each logical key can exist only once.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| # Simulate adding a NEW key at the front of EXTENSION_STORAGE_ENCRYPTION_KEYS. | ||
| new_raw = b"brand-new-secret-key-32-bytes!!!" | ||
| new_fernet = Fernet(base64.urlsafe_b64encode(hashlib.sha256(new_raw).digest())) | ||
| old_raw = current_app.config["SECRET_KEY"] |
There was a problem hiding this comment.
Suggestion: This test derives the old encryption key from SECRET_KEY, but ExtensionStorageDAO.set(..., is_encrypted=True) uses the first key from EXTENSION_STORAGE_ENCRYPTION_KEYS when configured. In environments where that list is set, multi.rotate(old_ciphertext) will be built with the wrong old key and fail with InvalidToken. Derive the old key using the same config precedence as the DAO. [logic error]
Severity Level: Major ⚠️
- ❌ Unit test test_key_rotation_roundtrip fails when encryption keys configured.
- ⚠️ Misaligned key precedence between DAO and unit tests.| old_raw = current_app.config["SECRET_KEY"] | |
| raw_keys = current_app.config.get("EXTENSION_STORAGE_ENCRYPTION_KEYS") or [ | |
| current_app.config["SECRET_KEY"] | |
| ] | |
| old_raw = raw_keys[0] |
Steps of Reproduction ✅
1. Ensure the unit-test Flask app fixture defined in `tests/unit_tests/conftest.py:5-38`
loads `superset.config` (which defines `EXTENSION_STORAGE_ENCRYPTION_KEYS` at
`superset/config.py:5-11`) and permits overriding config keys via `request.param` (lines
26-38).
2. Run `pytest
tests/unit_tests/daos/test_extension_storage_dao.py::test_key_rotation_roundtrip -k '' -q`
while parametrizing the `app` fixture (per the pattern in
`tests/unit_tests/conftest.py:30-35`) so that
`app.config["EXTENSION_STORAGE_ENCRYPTION_KEYS"] = [b"custom-key-A", b"custom-key-B"]`,
making this list non-empty and different from `SECRET_KEY`.
3. Inside `test_key_rotation_roundtrip` at
`tests/unit_tests/daos/test_extension_storage_dao.py:73-88`, the call
`ExtensionStorageDAO.set(EXT_A, "rot_key", payload, is_encrypted=True)` delegates to
`ExtensionStorageDAO.set` in `superset/extension_storage/daos.py:63-77`, which computes
`stored_value = _fernet().encrypt(value)` using `_fernet()` at
`superset/extension_storage/daos.py:14-26`; `_fernet()` builds `raw_keys =
current_app.config.get("EXTENSION_STORAGE_ENCRYPTION_KEYS") or
[current_app.config["SECRET_KEY"]]` (lines 23-25), so with the overridden config the
ciphertext is encrypted with `raw_keys[0]` (the first element of
`EXTENSION_STORAGE_ENCRYPTION_KEYS`), not with `SECRET_KEY`.
4. Later in the same test, at `tests/unit_tests/daos/test_extension_storage_dao.py:89-97`,
the code constructs `old_fernet` from `old_raw = current_app.config["SECRET_KEY"]` (lines
92-95) and then calls `entry.value = multi.rotate(old_ciphertext)` at line 100 where
`multi = MultiFernet([new_fernet, old_fernet])`. Because `old_ciphertext` was not produced
by `old_fernet` (it was produced from `EXTENSION_STORAGE_ENCRYPTION_KEYS[0]`),
`MultiFernet.rotate(old_ciphertext)` cannot decrypt with either `new_fernet` or
`old_fernet` and raises `cryptography.fernet.InvalidToken`, causing
`test_key_rotation_roundtrip` to fail. Updating the test to derive `old_raw` via the same
precedence as `_fernet()` (first key from `EXTENSION_STORAGE_ENCRYPTION_KEYS` or
`SECRET_KEY` when the list is empty) aligns the test with production behavior and prevents
this spurious failure.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/daos/test_extension_storage_dao.py
**Line:** 351:351
**Comment:**
*Logic Error: This test derives the old encryption key from `SECRET_KEY`, but `ExtensionStorageDAO.set(..., is_encrypted=True)` uses the first key from `EXTENSION_STORAGE_ENCRYPTION_KEYS` when configured. In environments where that list is set, `multi.rotate(old_ciphertext)` will be built with the wrong old key and fail with `InvalidToken`. Derive the old key using the same config precedence as the DAO.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.| # Build a MultiFernet with a new key first, then the current one. | ||
| new_raw = b"another-brand-new-secret-key!!!!" | ||
| new_fernet = Fernet(base64.urlsafe_b64encode(hashlib.sha256(new_raw).digest())) | ||
| old_raw = current_app.config["SECRET_KEY"] |
There was a problem hiding this comment.
Suggestion: This test also assumes the original ciphertext was encrypted with SECRET_KEY, but the DAO encrypts with the first key in EXTENSION_STORAGE_ENCRYPTION_KEYS when present. That mismatch makes the MultiFernet setup incorrect and can cause decryption failures in configured environments. [logic error]
Severity Level: Major ⚠️
- ❌ test_multi_fernet_decrypts_old_key fails when encryption keys configured.
- ⚠️ Tests no longer reflect DAO's real key precedence.| old_raw = current_app.config["SECRET_KEY"] | |
| raw_keys = current_app.config.get("EXTENSION_STORAGE_ENCRYPTION_KEYS") or [ | |
| current_app.config["SECRET_KEY"] | |
| ] | |
| old_raw = raw_keys[0] |
Steps of Reproduction ✅
1. Use the same unit-test Flask app fixture from `tests/unit_tests/conftest.py:5-38` and
override its configuration (via the documented parametrization pattern at lines 26-38) so
that `app.config["EXTENSION_STORAGE_ENCRYPTION_KEYS"] = [b\"custom-key-A\",
b\"custom-key-B\"]`, making the extension encryption key list non-empty and not matching
`SECRET_KEY` from `superset/config.py:1-3`.
2. Run `pytest
tests/unit_tests/daos/test_extension_storage_dao.py::test_multi_fernet_decrypts_old_key -k
'' -q` so that `test_multi_fernet_decrypts_old_key` in
`tests/unit_tests/daos/test_extension_storage_dao.py:109-135` executes under this
configuration.
3. Within that test, the call `ExtensionStorageDAO.set(EXT_A, "multi_key", payload,
is_encrypted=True)` at line 119 invokes `ExtensionStorageDAO.set` in
`superset/extension_storage/daos.py:63-77`, which encrypts using `_fernet()` defined at
`superset/extension_storage/daos.py:14-26`; `_fernet()` builds a `MultiFernet` from
`raw_keys = current_app.config.get("EXTENSION_STORAGE_ENCRYPTION_KEYS") or
[current_app.config["SECRET_KEY"]]` (lines 23-25), so with the overridden config the
ciphertext in `entry.value` at line 121 is produced using the first configured extension
key, not `SECRET_KEY`.
4. Later in the same test at
`tests/unit_tests/daos/test_extension_storage_dao.py:124-132`, the code sets `old_raw =
current_app.config["SECRET_KEY"]` (lines 127-129) and derives `old_fernet` from that, then
constructs `multi = MultiFernet([new_fernet, old_fernet])` and calls
`multi.decrypt(entry.value)` at line 135. Because `entry.value` was not encrypted with
either `new_raw` or `SECRET_KEY` but with `EXTENSION_STORAGE_ENCRYPTION_KEYS[0]`,
`MultiFernet.decrypt` exhausts both keys and raises `cryptography.fernet.InvalidToken`,
causing `test_multi_fernet_decrypts_old_key` to fail despite the DAO working correctly.
Adjusting the test to obtain `old_raw` from `EXTENSION_STORAGE_ENCRYPTION_KEYS` (falling
back to `SECRET_KEY` when the list is empty) makes the test's key-selection logic
consistent with `_fernet()` and removes this spurious failure.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/daos/test_extension_storage_dao.py
**Line:** 386:386
**Comment:**
*Logic Error: This test also assumes the original ciphertext was encrypted with `SECRET_KEY`, but the DAO encrypts with the first key in `EXTENSION_STORAGE_ENCRYPTION_KEYS` when present. That mismatch makes the MultiFernet setup incorrect and can cause decryption failures in configured environments.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.There was a problem hiding this comment.
Pull request overview
Adds Tier-3 (persistent) key-value storage for extensions by introducing a dedicated extension_storage table, DAO helpers, and REST endpoints, with optional Fernet encryption and a CLI for key rotation.
Changes:
- Introduces
ExtensionStoragemodel + Alembic migration, plusExtensionStorageDAOread/write/list helpers. - Adds
ExtensionStorageRestApiendpoints for global/user/resource scopes and registers the API during app init. - Adds encryption config (
EXTENSION_STORAGE_ENCRYPTION_KEYS), a CLI command to rotate encryption keys, plus docs and DAO unit tests.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
UPDATING.md |
Documents the new Tier-3 extension storage feature and endpoints. |
superset/extension_storage/models.py |
New SQLAlchemy model for extension_storage. |
superset/extension_storage/daos.py |
DAO implementing scoped get/set/delete/list and encryption hooks. |
superset/extension_storage/api.py |
New REST API for scoped extension storage operations. |
superset/migrations/versions/2026-03-20_12-00_e5f6a7b8c9d0_add_extension_storage_table.py |
Migration creating the extension_storage table and indexes. |
superset/initialization/__init__.py |
Registers the new storage API during view initialization. |
superset/core/api/core_api_injection.py |
Injects ExtensionStorageDAO/model into superset_core abstraction layer. |
superset/config.py |
Adds EXTENSION_STORAGE_ENCRYPTION_KEYS config. |
superset/cli/update.py |
Adds rotate-extension-storage-keys CLI command. |
tests/unit_tests/daos/test_extension_storage_dao.py |
Adds DAO unit tests across scopes + encryption/rotation. |
docs/developer_docs/extensions/extension-points/persistent-storage.md |
Adds developer docs for storage API and usage. |
pyproject.toml |
Pins mypy python_version to 3.10. |
.pre-commit-config.yaml |
Pins pre-commit mypy to python3.10 and adjusts pylint invocation. |
superset/extension_storage/__init__.py |
Adds package init file. |
| sa.ForeignKeyConstraint(["created_by_fk"], ["ab_user.id"]), | ||
| sa.ForeignKeyConstraint(["changed_by_fk"], ["ab_user.id"]), | ||
| sa.PrimaryKeyConstraint("id"), | ||
| sa.UniqueConstraint("uuid"), |
There was a problem hiding this comment.
The migration creates an index for (extension_id, user_fk, resource_type, resource_uuid, key) but does not add a UNIQUE constraint on these scope columns. Since the DAO implements “upsert” via a read-then-update of the first matching row, duplicates would lead to incorrect reads/updates and race conditions under concurrent writes. Add a composite UNIQUE constraint for the scoped identity (extension_id, key, user_fk, resource_type, resource_uuid) to enforce the documented key uniqueness guarantees.
| sa.UniqueConstraint("uuid"), | |
| sa.UniqueConstraint("uuid"), | |
| sa.UniqueConstraint( | |
| "extension_id", | |
| "key", | |
| "user_fk", | |
| "resource_type", | |
| "resource_uuid", | |
| name="uq_extension_storage_scope", | |
| ), |
| __table_args__ = ( | ||
| # Composite index covering all lookup dimensions | ||
| Index( | ||
| "ix_ext_storage_lookup", | ||
| "extension_id", |
There was a problem hiding this comment.
The model defines lookup indexes but does not enforce uniqueness for the scoped identity (extension_id, key, user_fk, resource_type, resource_uuid). Without a composite UNIQUE constraint, duplicate rows can exist and the DAO’s upsert (query-first) semantics become nondeterministic, especially under concurrent writes. Add a composite UniqueConstraint matching the scope identity (and keep the supporting index if needed for performance).
|
|
||
| allow_browser_login = True | ||
| route_base = "/api/v1/extensions" | ||
| resource_name = "extensions" |
There was a problem hiding this comment.
ExtensionStorageRestApi uses route_base = "/api/v1/extensions" and resource_name = "extensions", which collides with the existing ExtensionsRestApi (same resource_name, and it also defaults to the same route base). This can cause permission/role confusion (shared FAB permission view-menu) and makes it easy to accidentally grant storage read/write rights when intending to grant extension-list rights. Use a distinct resource_name (and, if needed, a distinct route_base or explicit class_permission_name/method permissions) so storage permissions are isolated from the extensions registry endpoints.
| resource_name = "extensions" | |
| resource_name = "extension_storage" | |
| class_permission_name = "ExtensionStorage" |
| from superset.extension_storage.api import ExtensionStorageRestApi | ||
|
|
||
| appbuilder.add_api(ExtensionStorageRestApi) | ||
|
|
||
| if feature_flag_manager.is_feature_enabled("ENABLE_EXTENSIONS"): | ||
| from superset.extensions.api import ExtensionsRestApi | ||
|
|
There was a problem hiding this comment.
This API is registered unconditionally, even when the ENABLE_EXTENSIONS feature flag is disabled (the main extensions API is gated below). If extensions are feature-flagged off, it’s unexpected for /api/v1/extensions/.../storage/... endpoints to still be available. Consider gating ExtensionStorageRestApi registration behind the same flag (or a dedicated storage flag) to avoid exposing new surface area when extensions are disabled.
| from superset.extension_storage.api import ExtensionStorageRestApi | |
| appbuilder.add_api(ExtensionStorageRestApi) | |
| if feature_flag_manager.is_feature_enabled("ENABLE_EXTENSIONS"): | |
| from superset.extensions.api import ExtensionsRestApi | |
| if feature_flag_manager.is_feature_enabled("ENABLE_EXTENSIONS"): | |
| from superset.extension_storage.api import ExtensionStorageRestApi | |
| from superset.extensions.api import ExtensionsRestApi | |
| appbuilder.add_api(ExtensionStorageRestApi) |
| entries = ExtensionStorageDAO.list_global(ext_id, category=category) | ||
| total = len(entries) | ||
| page_entries = entries[offset : offset + page_size] |
There was a problem hiding this comment.
Pagination is implemented by loading all rows into memory (list_global(...).all()), computing len(entries), and then slicing the Python list. This will not scale once extensions store many keys. Prefer pushing pagination into the DB query (LIMIT/OFFSET) and using a separate COUNT query for count.
| entries = ExtensionStorageDAO.list_global(ext_id, category=category) | |
| total = len(entries) | |
| page_entries = entries[offset : offset + page_size] | |
| query = ExtensionStorageDAO.list_global(ext_id, category=category) | |
| total = query.count() | |
| page_entries = query.offset(offset).limit(page_size).all() |
| entries = ExtensionStorageDAO.list_resource( | ||
| ext_id, resource_type, resource_uuid, category=category | ||
| ) | ||
| total = len(entries) | ||
| page_entries = entries[offset : offset + page_size] |
There was a problem hiding this comment.
list_resource paginates by fetching all matching rows and slicing in Python. For resource-linked state that could grow over time, prefer DB-side LIMIT/OFFSET and COUNT to avoid loading every key/value header into memory on each request.
| entries = ExtensionStorageDAO.list_resource( | |
| ext_id, resource_type, resource_uuid, category=category | |
| ) | |
| total = len(entries) | |
| page_entries = entries[offset : offset + page_size] | |
| query = ExtensionStorage.query.filter_by( | |
| extension_id=ext_id, | |
| resource_type=resource_type, | |
| resource_uuid=resource_uuid, | |
| ) | |
| if category is not None: | |
| query = query.filter_by(category=category) | |
| total = query.count() | |
| page_entries = query.offset(offset).limit(page_size).all() |
| """REST API for extension Tier-3 persistent storage. | ||
|
|
||
| URL structure (all under /api/v1/extensions/<publisher>/<name>/storage/persistent/): | ||
|
|
||
| GET /global/ |
There was a problem hiding this comment.
The docstring and PR description disagree on the URL shape: this file documents /api/v1/extensions/<publisher>/<name>/storage/persistent/..., while the PR summary describes a flat /api/v1/extensions/<ext_id>/storage/... form. Please align the implementation, docs (including UPDATING.md), and PR description to a single, supported URL structure to avoid breaking early adopters.
| resource_type: str | None = None, | ||
| resource_uuid: str | None = None, | ||
| ) -> ExtensionStorage | None: | ||
| """Return a storage entry, decrypting the value if needed.""" |
There was a problem hiding this comment.
The get() docstring says it decrypts values, but the method returns the raw DB row without decrypting; decryption only happens in get_value(). Update the docstring to reflect actual behavior to avoid callers accidentally using ciphertext bytes when is_encrypted=True.
| """Return a storage entry, decrypting the value if needed.""" | |
| """Return the raw storage entry without decrypting its value. | |
| Call :meth:`get_value` to retrieve the stored value with decryption | |
| applied when ``is_encrypted`` is true. | |
| """ |
| ```json | ||
| { | ||
| "value": "<string or base64 bytes>", | ||
| "value_type": "application/json", | ||
| "category": "my-category", |
There was a problem hiding this comment.
The docs claim value can be a “string or base64 bytes”, but the current REST API implementation stores string values as UTF-8 bytes without base64 decoding. Update the docs to match the actual accepted format (or update the API to decode base64 and validate errors) so extensions don’t inadvertently persist encoded text instead of raw bytes.
| @pytest.fixture | ||
| def session(session: Session) -> Iterator[Session]: | ||
| """Extend the base session fixture with the extension_storage table.""" | ||
| engine = session.get_bind() | ||
| ExtensionStorage.metadata.create_all(engine) | ||
| yield session | ||
| session.rollback() |
There was a problem hiding this comment.
This fixture is recursively dependent on itself (def session(session: Session) with the same fixture name), which will raise a pytest “recursive dependency” error and prevent the test module from running. Rename this fixture (e.g., session_with_extension_storage) and update tests to use it, or use request.getfixturevalue("session") to obtain the base session fixture without recursion.
There was a problem hiding this comment.
Code Review Agent Run #678cd6
Actionable Suggestions - 1
-
superset/cli/update.py - 1
- Blind exception catch for generic Exception type · Line 131-136
Additional Suggestions - 3
-
superset/initialization/__init__.py - 1
-
Inconsistent import placement · Line 276-278The import for ExtensionStorageRestApi is placed locally inside the function, which is inconsistent with other unconditional API imports that are at the top of init_views. Conditional imports like ExtensionsRestApi are local, but unconditional ones should be grouped at the top for better readability and consistency.
-
-
.pre-commit-config.yaml - 1
-
PYTHONPATH precedence risk · Line 142-142The PYTHONPATH assignment appends
$(pwd), which may not override conflicting paths already set. If PYTHONPATH contains a different superset installation, pylint could load the wrong plugin. Prepending $ (pwd) would ensure the local version is prioritized.
-
-
superset/extension_storage/daos.py - 1
-
Incorrect docstring for get method · Line 108-108The docstring claims the method 'decrypting the value if needed', but it actually returns the raw ExtensionStorage model instance with potentially encrypted value. This matches get_value's behavior for decryption instead.
Code suggestion
diff --git a/superset/extension_storage/daos.py b/superset/extension_storage/daos.py index abc123..def456 100644 --- a/superset/extension_storage/daos.py +++ b/superset/extension_storage/daos.py @@ -107,7 +107,7 @@ class ExtensionStorageDAO: @staticmethod def get( extension_id: str, key: str, user_fk: int | None = None, resource_type: str | None = None, resource_uuid: str | None = None, ) -> ExtensionStorage | None: - """Return a storage entry, decrypting the value if needed.""" + """Return a storage entry (value may be encrypted; use get_value for decryption).""" entry = ( db.session.query(ExtensionStorage) .filter( and_( *_scope_filter( extension_id, key, user_fk, resource_type, resource_uuid ) ) ) .first() ) return entry
-
Review Details
-
Files reviewed - 12 · Commit Range:
e9f795a..ea04aa3- .pre-commit-config.yaml
- pyproject.toml
- superset/cli/update.py
- superset/config.py
- superset/core/api/core_api_injection.py
- superset/extension_storage/__init__.py
- superset/extension_storage/api.py
- superset/extension_storage/daos.py
- superset/extension_storage/models.py
- superset/initialization/__init__.py
- superset/migrations/versions/2026-03-20_12-00_e5f6a7b8c9d0_add_extension_storage_table.py
- tests/unit_tests/daos/test_extension_storage_dao.py
-
Files skipped - 2
- UPDATING.md - Reason: Filter setting
- docs/developer_docs/extensions/extension-points/persistent-storage.md - Reason: Filter setting
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| try: | ||
| entry.value = fernet.rotate(entry.value) | ||
| except Exception as exc: # pylint: disable=broad-except | ||
| click.secho( | ||
| f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}", | ||
| fg="red", |
There was a problem hiding this comment.
The code catches a blind Exception at line 133. Replace with specific exception types (e.g., ValueError, DecryptionError) to improve error handling clarity and avoid masking unexpected errors.
Code suggestion
Check the AI-generated fix before applying
| try: | |
| entry.value = fernet.rotate(entry.value) | |
| except Exception as exc: # pylint: disable=broad-except | |
| click.secho( | |
| f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}", | |
| fg="red", | |
| try: | |
| entry.value = fernet.rotate(entry.value) | |
| except (ValueError, TypeError) as exc: | |
| click.secho( | |
| f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}", | |
| fg="red", |
Code Review Run #678cd6
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
SUMMARY
Implements the Tier 3 (Persistent State) portion of the Extension Storage design proposal — a dedicated
extension_storagetable that gives extensions scoped, encrypted, durable key-value storage without touching Superset's internalkey_valuetable.Three storage scopes:
(extension_id, key)(extension_id, key, user_fk)(extension_id, key, resource_type, resource_uuid)What's included:
extension_storagetable withLargeBinaryvalue column,value_type(MIME),category,is_encrypted, and all scope columns. Alembic migration included.ExtensionStorageDAOwith typedget,get_value,set(upsert),delete, andlist_global/list_user/list_resource.ExtensionStorageRestApiregistered at/api/v1/extensions/<ext_id>/storage/. Full OpenAPI spec. Browser session auth (allow_browser_login = True) in addition to JWT. All write endpoints wrapped in@transaction().is_encrypted=True. Key rotation supported: prepend a new key toEXTENSION_STORAGE_ENCRYPTION_KEYSin config, runsuperset rotate-extension-storage-keys, then remove old keys. Falls back toSECRET_KEYwhen list is absent.superset rotate-extension-storage-keysre-encrypts all encrypted rows with the current (first) key.MultiFernet.docs/developer_docs/extensions/extension-points/persistent-storage.md.Config additions (
superset/config.py):RELATION TO PROPOSAL
This PR implements Tier 3 of the three-tier storage proposal. Below is a full accounting of where the implementation diverges from the proposal and what remains deferred.
Deliberate divergences
Encryption: Fernet instead of
EncryptedFieldFactoryThe proposal suggested reusing
EncryptedFieldFactory(AES via sqlalchemy-utils). This PR uses application-level Fernet encryption instead. Reasons:EncryptedFieldFactoryencrypts the entire column transparently — there is no per-row opt-in. Fernet gives us an explicitis_encryptedflag so plaintext and ciphertext values can coexist in the same table.MultiFernetkey rotation without a full column migration. The newEXTENSION_STORAGE_ENCRYPTION_KEYSconfig +superset rotate-extension-storage-keysCLI provide zero-downtime rotation independent of the mainSECRET_KEY/SecretsMigratorpath.Key rotation: dedicated CLI instead of extending
SecretsMigratorThe proposal said to extend
SecretsMigrator. This PR adds a separaterotate-extension-storage-keyscommand instead. ExtendingSecretsMigratorwould couple extension key rotation to the mainSECRET_KEYrotation cycle, preventing operators from rotating extension keys independently. The dedicated command is more flexible and doesn't change existingre-encrypt-secretsbehavior.Value column:
LargeBinaryinstead ofTEXTThe proposal listed this as an open question (TEXT vs. LONGBLOB). This PR resolves it in favor of
LargeBinary(maps toBYTEAon PostgreSQL,BLOBon MySQL/SQLite). This is the natural choice because: encrypted values are opaque bytes regardless of the original content, and binary blobs allow extensions to store non-JSON formats (Parquet, images, serialized state) without a base64 round-trip.Primary key: integer instead of UUID
The proposal used a UUID primary key. This PR uses the standard Superset integer auto-increment pattern for the internal
idcolumn. The scoping columns (extension_id,user_fk,resource_type,resource_uuid) already carry the semantic identity via the UNIQUE constraint. A UUID PK adds no practical benefit here and diverges from every other Superset model.URL structure: flat
ext_idinstead of{publisher}/{name}The proposal used
/api/v1/extensions/{publisher}/{name}/storage/.... This PR uses/api/v1/extensions/{ext_id}/storage/...whereext_idis the full dot-namespaced identifier (e.g.community.jupyter-notebook). The flat form is simpler to route in Flask and avoids ambiguity when publisher names contain hyphens or dots.DAO/REST API instead of
persistent_statefluent moduleThe proposal described a high-level
from superset_core import persistent_statefluent API as the extension-facing interface. This PR implements the DAO and REST layer that such a fluent wrapper would sit on top of. The fluent SDK module is a thin convenience layer that can be added in a follow-up without any schema or API changes — it is intentionally deferred so the low-level API can be reviewed and stabilized first.Not implemented in this PR (deferred to follow-ups)
localStatein@apache-superset/core)localStoragewith managed namespacing; async API for future sandbox compatibilityephemeralStatefrontend +ephemeral_statebackend modulespersistentState/persistent_statefluent SDKextension.json; needed for admin review, runtime enforcement, and future sandbox provisioningSUM(LENGTH(value))check perextension_idbefore writes; depends on manifest declarations for per-extension limitsextension_storagerows; needs event hooks on each resource deletion pathDELETE WHERE extension_id = ?triggered on extension uninstall; no uninstall lifecycle hook exists yetOpen questions from proposal — resolved
valuecolumn: TEXT vs. LONGBLOB?LargeBinary(see above)EncryptedFieldFactoryor custom?MultiFernetrotation (see above)SecretsMigrator?Open questions from proposal — still open
extension_storagebe further scoped by org/workspace for multi-tenant Superset deployments?sessionStoragevariant for Tier 1: Should Tier 1 expose asessionStatemodule (cleared on tab close) alongsidelocalState?BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A — backend-only change; no UI.
TESTING INSTRUCTIONS
superset db upgradeflask fab create-permissionspytest tests/unit_tests/daos/test_extension_storage_dao.py -vEXTENSION_STORAGE_ENCRYPTION_KEYS = ["new-key", "old-key"]in configsuperset rotate-extension-storage-keys"old-key"— existing values still decryptADDITIONAL INFORMATION