
feat(extensions): Tier 3 persistent key-value storage for extensions#39168

Closed
rusackas wants to merge 8 commits into master from feat/tier3-extension-storage

Conversation


@rusackas rusackas commented Apr 7, 2026

SUMMARY

Implements the Tier 3 (Persistent State) portion of the Extension Storage design proposal — a dedicated extension_storage table that gives extensions scoped, encrypted, durable key-value storage without touching Superset's internal key_value table.

Three storage scopes:

| Scope | Description | Key uniqueness |
| --- | --- | --- |
| Global | Shared across all users | (extension_id, key) |
| User | Per-user private storage | (extension_id, key, user_fk) |
| Resource | Attached to a Superset object (dashboard, chart, …) | (extension_id, key, resource_type, resource_uuid) |
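A toy in-memory sketch of how the three scope tuples keep keys from colliding (illustrative only; the actual implementation is the extension_storage table with a UNIQUE constraint):

```python
# Toy in-memory model of the three storage scopes (illustrative only).
# The real table enforces these tuples with a UNIQUE constraint.

store: dict[tuple, bytes] = {}

def scope_key(extension_id, key, user_fk=None, resource_type=None, resource_uuid=None):
    """Build the uniqueness tuple for each scope."""
    if user_fk is not None:                # user scope
        return (extension_id, key, user_fk)
    if resource_type is not None:          # resource scope
        return (extension_id, key, resource_type, resource_uuid)
    return (extension_id, key)             # global scope

# The same key in different scopes never collides:
store[scope_key("my.ext", "prefs")] = b"global"
store[scope_key("my.ext", "prefs", user_fk=42)] = b"user-42"
store[scope_key("my.ext", "prefs", resource_type="dashboard",
                resource_uuid="abc-123")] = b"dash"
assert len(store) == 3

# Writing to an existing scope tuple upserts rather than duplicating:
store[scope_key("my.ext", "prefs")] = b"global-v2"
assert len(store) == 3
```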

What's included:

  • DB model — extension_storage table with LargeBinary value column, value_type (MIME), category, is_encrypted, and all scope columns. Alembic migration included.
  • DAO — ExtensionStorageDAO with typed get, get_value, set (upsert), delete, and list_global / list_user / list_resource.
  • REST API — ExtensionStorageRestApi registered at /api/v1/extensions/<ext_id>/storage/. Full OpenAPI spec. Browser session auth (allow_browser_login = True) in addition to JWT. All write endpoints wrapped in @transaction().
  • Encryption — Fernet encryption at rest via is_encrypted=True. Key rotation supported: prepend a new key to EXTENSION_STORAGE_ENCRYPTION_KEYS in config, run superset rotate-extension-storage-keys, then remove old keys. Falls back to SECRET_KEY when list is absent.
  • CLI — superset rotate-extension-storage-keys re-encrypts all encrypted rows with the current (first) key.
  • Tests — 25 unit tests covering all three scopes, upsert semantics, delete, pagination, encryption round-trips, Fernet token format validation, and key rotation with MultiFernet.
  • Docs — New Docusaurus page at docs/developer_docs/extensions/extension-points/persistent-storage.md.

Config additions (superset/config.py):

# List of raw secret strings for Fernet key derivation.
# First key encrypts new values; all keys tried on decryption (zero-downtime rotation).
# Falls back to SECRET_KEY when empty.
EXTENSION_STORAGE_ENCRYPTION_KEYS: list[str] = []
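The rotation contract described above (the first key encrypts new values, every listed key is tried on decryption) can be sketched with a stdlib stand-in. This toy uses HMAC tags rather than real Fernet encryption, purely to show the key-ordering behavior:

```python
import hashlib
import hmac

# Toy stand-in for MultiFernet semantics (NOT real encryption):
# a "token" is an HMAC tag plus the value; the first key mints new
# tags, and every configured key is tried when opening old tokens.

def seal(keys: list[str], value: bytes) -> bytes:
    tag = hmac.new(keys[0].encode(), value, hashlib.sha256).digest()
    return tag + value

def open_(keys: list[str], token: bytes) -> bytes:
    tag, value = token[:32], token[32:]
    for key in keys:  # newest key first, then fall back to older ones
        expected = hmac.new(key.encode(), value, hashlib.sha256).digest()
        if hmac.compare_digest(tag, expected):
            return value
    raise ValueError("no configured key matches")

old_token = seal(["old-key"], b"secret")
# Zero-downtime rotation: prepend the new key; old tokens still open.
keys = ["new-key", "old-key"]
assert open_(keys, old_token) == b"secret"
# Re-seal (what the rotate CLI does), after which the old key can go.
rotated = seal(keys, open_(keys, old_token))
assert open_(["new-key"], rotated) == b"secret"
```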

RELATION TO PROPOSAL

This PR implements Tier 3 of the three-tier storage proposal. Below is a full accounting of where the implementation diverges from the proposal and what remains deferred.

Deliberate divergences

Encryption: Fernet instead of EncryptedFieldFactory
The proposal suggested reusing EncryptedFieldFactory (AES via sqlalchemy-utils). This PR uses application-level Fernet encryption instead. Reasons:

  • EncryptedFieldFactory encrypts the entire column transparently — there is no per-row opt-in. Fernet gives us an explicit is_encrypted flag so plaintext and ciphertext values can coexist in the same table.
  • Application-level encryption enables MultiFernet key rotation without a full column migration. The new EXTENSION_STORAGE_ENCRYPTION_KEYS config + superset rotate-extension-storage-keys CLI provide zero-downtime rotation independent of the main SECRET_KEY / SecretsMigrator path.

Key rotation: dedicated CLI instead of extending SecretsMigrator
The proposal said to extend SecretsMigrator. This PR adds a separate rotate-extension-storage-keys command instead. Extending SecretsMigrator would couple extension key rotation to the main SECRET_KEY rotation cycle, preventing operators from rotating extension keys independently. The dedicated command is more flexible and doesn't change existing re-encrypt-secrets behavior.

Value column: LargeBinary instead of TEXT
The proposal listed this as an open question (TEXT vs. LONGBLOB). This PR resolves it in favor of LargeBinary (maps to BYTEA on PostgreSQL, BLOB on MySQL/SQLite). This is the natural choice: encrypted values are opaque bytes regardless of the original content, and binary blobs let extensions store non-JSON formats (Parquet, images, serialized state) without a base64 round-trip.
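A quick stdlib check of the base64 overhead that a TEXT column would impose on binary payloads:

```python
import base64
import os

blob = os.urandom(30_000)          # e.g. a small Parquet file or image
encoded = base64.b64encode(blob)   # what a TEXT column would have to store

# base64 expands 3 bytes into 4 ASCII characters (~33% overhead).
assert len(encoded) == 4 * ((len(blob) + 2) // 3)
assert len(encoded) / len(blob) > 1.33
# A LargeBinary/BYTEA column stores the raw bytes with no inflation,
# and the round-trip is lossless either way:
assert base64.b64decode(encoded) == blob
```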

Primary key: integer instead of UUID
The proposal used a UUID primary key. This PR uses the standard Superset integer auto-increment pattern for the internal id column. The scoping columns (extension_id, user_fk, resource_type, resource_uuid) already carry the semantic identity via the UNIQUE constraint. A UUID PK adds no practical benefit here and diverges from every other Superset model.

URL structure: flat ext_id instead of {publisher}/{name}
The proposal used /api/v1/extensions/{publisher}/{name}/storage/.... This PR uses /api/v1/extensions/{ext_id}/storage/... where ext_id is the full dot-namespaced identifier (e.g. community.jupyter-notebook). The flat form is simpler to route in Flask and avoids ambiguity when publisher names contain hyphens or dots.
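A toy illustration of that ambiguity: mapping a dot-namespaced id back onto a {publisher}/{name} pair admits more than one plausible split point, while the flat id stays a single opaque path segment:

```python
ext_id = "community.jupyter-notebook"

# Splitting the flat id into {publisher}/{name} requires choosing a
# separator position, and more than one candidate exists once
# publishers may contain dots or hyphens:
candidates = [
    (ext_id[:i], ext_id[i + 1:])
    for i, ch in enumerate(ext_id)
    if ch in ".-"
]
# e.g. ('community', 'jupyter-notebook') vs ('community.jupyter', 'notebook')
assert len(candidates) > 1
```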

DAO/REST API instead of persistent_state fluent module
The proposal described a high-level from superset_core import persistent_state fluent API as the extension-facing interface. This PR implements the DAO and REST layer that such a fluent wrapper would sit on top of. The fluent SDK module is a thin convenience layer that can be added in a follow-up without any schema or API changes — it is intentionally deferred so the low-level API can be reviewed and stabilized first.
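As a rough sketch of what the deferred fluent layer might look like, a thin client could be little more than URL construction over this REST API (the PersistentState class and its method names here are hypothetical, not part of this PR):

```python
from urllib.parse import quote

class PersistentState:
    """Hypothetical fluent wrapper over the Tier 3 REST API (sketch only)."""

    def __init__(self, base_url: str, ext_id: str):
        # ext_id is the flat dot-namespaced identifier, one path segment.
        self.base = f"{base_url}/api/v1/extensions/{quote(ext_id, safe='.')}/storage"

    def global_url(self, key: str) -> str:
        return f"{self.base}/global/{quote(key)}"

    def user_url(self, key: str) -> str:
        return f"{self.base}/user/{quote(key)}"

    def resource_url(self, rtype: str, ruuid: str, key: str) -> str:
        return f"{self.base}/resources/{rtype}/{ruuid}/{quote(key)}"

ps = PersistentState("http://localhost:8088", "community.jupyter-notebook")
assert ps.global_url("settings") == (
    "http://localhost:8088/api/v1/extensions/"
    "community.jupyter-notebook/storage/global/settings"
)
```

An actual SDK module would add the HTTP calls and typed get/set helpers on top of these routes.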

Not implemented in this PR (deferred to follow-ups)

| Feature | Notes |
| --- | --- |
| Tier 1 — Local State (localState in @apache-superset/core) | Browser localStorage with managed namespacing; async API for future sandbox compatibility |
| Tier 2 — Ephemeral State (cache backend) | Redis/Memcached-backed TTL store; ephemeralState frontend + ephemeral_state backend modules |
| persistentState / persistent_state fluent SDK | Fluent wrapper over this REST API; depends on Tier 3 being merged and stable first |
| Manifest storage declarations | Extensions declaring which tiers/scopes they use in extension.json; needed for admin review, runtime enforcement, and future sandbox provisioning |
| Quota enforcement | SUM(LENGTH(value)) check per extension_id before writes; depends on manifest declarations for per-extension limits |
| Cascade deletes on resource deletion | Deleting a dashboard/chart/database should delete linked extension_storage rows; needs event hooks on each resource deletion path |
| RBAC-based resource-scope access control | Resource-scope writes inheriting the caller's RBAC permissions on the linked resource; currently authenticated access only |
| Automated uninstall cleanup | DELETE WHERE extension_id = ? triggered on extension uninstall; no uninstall lifecycle hook exists yet |
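The deferred quota check could reduce to one aggregate guard before each write. A sketch on plain Python rows (the QUOTA_BYTES limit and within_quota helper are hypothetical stand-ins for the per-extension SUM(LENGTH(value)) query):

```python
# Hypothetical quota guard for the deferred SUM(LENGTH(value)) check,
# modeled here on a plain list of rows instead of a SQL table.

QUOTA_BYTES = 1_000  # per-extension limit (placeholder value)

rows = [
    {"extension_id": "my.ext", "value": b"x" * 600},
    {"extension_id": "other.ext", "value": b"y" * 900},
]

def within_quota(extension_id: str, new_value: bytes) -> bool:
    """Would writing new_value keep this extension under its byte quota?"""
    used = sum(len(r["value"]) for r in rows if r["extension_id"] == extension_id)
    return used + len(new_value) <= QUOTA_BYTES

assert within_quota("my.ext", b"z" * 400)       # 600 + 400 == 1000, allowed
assert not within_quota("my.ext", b"z" * 401)   # would exceed the limit
```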

Open questions from proposal — resolved

| Question | Resolution |
| --- | --- |
| value column: TEXT vs. LONGBLOB? | LargeBinary (see above) |
| Encryption: EncryptedFieldFactory or custom? | Fernet with MultiFernet rotation (see above) |
| Secret rotation: extend SecretsMigrator? | Separate CLI command (see above) |

Open questions from proposal — still open

  • Multi-tenancy: Should extension_storage be further scoped by org/workspace for multi-tenant Superset deployments?
  • sessionStorage variant for Tier 1: Should Tier 1 expose a sessionState module (cleared on tab close) alongside localState?
  • Quota enforcement strategy: Hard reject on quota exceeded vs. soft limit with alerting?

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — backend-only change; no UI.

TESTING INSTRUCTIONS

  1. Run the migration: superset db upgrade
  2. Register extension API permissions: flask fab create-permissions
  3. Run unit tests: pytest tests/unit_tests/daos/test_extension_storage_dao.py -v
  4. Exercise the REST API (requires a running Superset instance):
    # Set a global value
    curl -X PUT http://localhost:8088/api/v1/extensions/my.ext/storage/global/my_key \
      -H "Content-Type: application/json" \
      -d '{"value": "eyJmb28iOiAiYmFyIn0=", "value_type": "application/json"}'
    
    # Read it back
    curl http://localhost:8088/api/v1/extensions/my.ext/storage/global/my_key
  5. To test key rotation:
    • Set EXTENSION_STORAGE_ENCRYPTION_KEYS = ["new-key", "old-key"] in config
    • Run superset rotate-extension-storage-keys
    • Remove "old-key" — existing values still decrypt

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@github-actions github-actions Bot added risk:db-migration PRs that require a DB migration api Related to the REST API doc Namespace | Anything related to documentation labels Apr 7, 2026

netlify Bot commented Apr 7, 2026

Deploy Preview for superset-docs-preview ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 02d47bb |
| 🔍 Latest deploy log | https://app.netlify.com/projects/superset-docs-preview/deploys/69d54c82fe5fab0008904102 |
| 😎 Deploy Preview | https://deploy-preview-39168--superset-docs-preview.netlify.app |

aminghadersohi and others added 5 commits April 7, 2026 11:23
…nts, pagination, encryption

Implements the Extension Storage Layer (Tier 3) from the architecture proposal:

- New `extension_storage` table and SQLAlchemy model with three storage scopes:
    global (shared), user-scoped (per-user), resource-linked (tied to a Superset resource)
- `ExtensionStorageDAO` with upsert semantics, Fernet encryption, and scope isolation
- REST API under `/api/v1/extensions/<publisher>/<name>/storage/persistent/`:
    GET/PUT/DELETE for global/<key>, user/<key>, resources/<type>/<uuid>/<key>
    GET list endpoints (with ?page & ?page_size pagination) for /global/, /user/, /resources/<type>/<uuid>/
- Alembic migration creating `extension_storage` table
- Unit tests covering all three scopes, upsert, delete, pagination, and encryption
  (encrypted roundtrip, Fernet token verification, cross-scope coexistence)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ator

- allow_browser_login = True so web UI can authenticate via session
  cookie + CSRF token instead of requiring a JWT Bearer header
- Replace db.session.commit() with @transaction() on all write endpoints
  to satisfy pylint W9001 and get automatic rollback on error

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…E_ENCRYPTION_KEYS

- _fernet() now returns MultiFernet built from EXTENSION_STORAGE_ENCRYPTION_KEYS
  (falls back to SECRET_KEY).  New encryptions use the first key; all keys are
  tried on decryption, giving zero-downtime rotation.
- New CLI command: superset rotate-extension-storage-keys
  Re-encrypts every is_encrypted row with the current (first) key via
  MultiFernet.rotate(). Run after prepending a new key and restarting.
- EXTENSION_STORAGE_ENCRYPTION_KEYS added to config.py with rotation docs.
- Re-enable is_encrypted=true in notebook saves now that rotation is supported.
- Tests: key_rotation_roundtrip and multi_fernet_decrypts_old_key.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename migration file to match the table it creates (extension_storage)
- Replace unsafe assert patterns in API with proper None guards
- Add direct import of ExtensionStorage into api.py (removes runtime import)
- Add missing requestBody OpenAPI spec to set_user and set_resource
- Add _key_to_fernet docstring
- Add test_list_resource_filtered_by_category (parity with list_global/list_user)
- Add docs/developer_docs/extensions/extension-points/persistent-storage.md
  covering all three scopes, REST API, frontend/backend usage examples,
  encryption, key rotation, and data model table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rusackas rusackas force-pushed the feat/tier3-extension-storage branch from 0bdd575 to 02dbaeb Compare April 7, 2026 18:23
- Remove language_version: python3.10 from mypy hook — pre-commit would
  fail for developers with only Python 3.11/3.12 installed
- Restore PT004 to ruff ignore list — removing it would flag every
  existing pytest fixture across the codebase

python_version = "3.10" in pyproject.toml [tool.mypy] is intentional
(sets mypy's type-checking target to Superset's minimum Python version).
The PYTHONPATH fix for the pylint hook is also intentional.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

codecov Bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 44.07895% with 170 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.40%. Comparing base (5f9fc31) to head (ea04aa3).
⚠️ Report is 49 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| superset/extension_storage/api.py | 44.44% | 95 Missing ⚠️ |
| superset/extension_storage/daos.py | 34.14% | 54 Missing ⚠️ |
| superset/cli/update.py | 0.00% | 21 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39168      +/-   ##
==========================================
- Coverage   64.45%   64.40%   -0.05%     
==========================================
  Files        2541     2544       +3     
  Lines      131662   131966     +304     
  Branches    30517    30540      +23     
==========================================
+ Hits        84863    84997     +134     
- Misses      45333    45503     +170     
  Partials     1466     1466              
| Flag | Coverage Δ |
| --- | --- |
| hive | 40.10% <44.07%> (+0.02%) ⬆️ |
| mysql | 60.66% <44.07%> (-0.09%) ⬇️ |
| postgres | 60.74% <44.07%> (-0.09%) ⬇️ |
| presto | 40.11% <44.07%> (+0.02%) ⬆️ |
| python | 62.31% <44.07%> (-0.10%) ⬇️ |
| sqlite | 60.37% <44.07%> (-0.09%) ⬇️ |
| unit | 100.00% <ø> (ø) |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

…e5f6)

The placeholder down_revision "c3d4e5f6a7b8" does not exist in the
Alembic revision chain, causing KeyError on superset db upgrade and
cascading failures across all backend test suites and E2E tests.
Points to the actual master HEAD migration (add_granular_export_permissions).

Also restores language_version: python3.10 for the mypy pre-commit hook.
Without this, pre-commit builds its mypy virtualenv with an older Python,
which then fails on the match statements present in this codebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@rusackas rusackas requested a review from Copilot April 8, 2026 00:12
@rusackas rusackas marked this pull request as ready for review April 8, 2026 00:13
CI runners have Python 3.11, not 3.10 — specifying language_version: python3.10
causes pre-commit to fail with CalledProcessError when building the mypy
virtualenv. Master doesn't set this; omitting it lets pre-commit use whatever
Python 3.x is available, which handles both local (3.10) and CI (3.11).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@dosubot dosubot Bot added the change:backend Requires changing the backend label Apr 8, 2026
Comment thread superset/cli/update.py
Comment on lines +121 to +140
entries = (
    db.session.query(ExtensionStorage)
    .filter(ExtensionStorage.is_encrypted.is_(True))
    .all()
)
if not entries:
    click.secho("No encrypted entries found.", fg="yellow")
    return
failed = 0
for entry in entries:
    try:
        entry.value = fernet.rotate(entry.value)
    except Exception as exc:  # pylint: disable=broad-except
        click.secho(
            f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}",
            fg="red",
            err=True,
        )
        failed += 1
rotated = len(entries) - failed

Suggestion: Loading every encrypted row with .all() materializes the full table (including large binary payloads) in memory at once, which can exhaust memory on real datasets. Iterate in batches (and avoid retaining all ORM instances) so key rotation remains safe for large tables. [possible bug]

Severity Level: Major ⚠️
- ❌ Key rotation CLI may OOM on large extension_storage tables.
- ⚠️ Operators may be unable to retire old encryption keys.
- ⚠️ Encrypted extension data may remain unrotated indefinitely.
Suggested change
entries = (
    db.session.query(ExtensionStorage)
    .filter(ExtensionStorage.is_encrypted.is_(True))
    .all()
)
if not entries:
    click.secho("No encrypted entries found.", fg="yellow")
    return
failed = 0
for entry in entries:
    try:
        entry.value = fernet.rotate(entry.value)
    except Exception as exc:  # pylint: disable=broad-except
        click.secho(
            f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}",
            fg="red",
            err=True,
        )
        failed += 1
rotated = len(entries) - failed

encrypted_count = (
    db.session.query(ExtensionStorage.id)
    .filter(ExtensionStorage.is_encrypted.is_(True))
    .count()
)
if encrypted_count == 0:
    click.secho("No encrypted entries found.", fg="yellow")
    return
failed = 0
rotated = 0
for entry in (
    db.session.query(ExtensionStorage)
    .filter(ExtensionStorage.is_encrypted.is_(True))
    .yield_per(100)
):
    try:
        entry.value = fernet.rotate(entry.value)
        db.session.flush()
        db.session.expunge(entry)
        rotated += 1
    except Exception as exc:  # pylint: disable=broad-except
        click.secho(
            f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}",
            fg="red",
            err=True,
        )
        failed += 1
Steps of Reproduction ✅
1. Start a Superset instance with the extension storage model defined at
`superset/extension_storage/models.py:40-83` (the `ExtensionStorage` table with `value =
Column(LargeBinary(EXTENSION_STORAGE_MAX_SIZE))` and `EXTENSION_STORAGE_MAX_SIZE` set to
~16 MB at line 36).

2. Populate many encrypted rows in `extension_storage` by calling
`ExtensionStorageDAO.set(..., is_encrypted=True)` from
`superset/extension_storage/daos.py:142-188` in a loop, so that the combined size of
`value` blobs for encrypted rows is large relative to available RAM (for example,
thousands of rows with multi‑MB payloads).

3. Configure key rotation by setting `EXTENSION_STORAGE_ENCRYPTION_KEYS` in the Superset
config (used by `_fernet()` at `superset/extension_storage/daos.py:45-57`), then run the
CLI command `superset rotate-extension-storage-keys`, which invokes
`rotate_extension_storage_keys()` defined at `superset/cli/update.py:104-149`.

4. During execution, `rotate_extension_storage_keys()` issues
`db.session.query(ExtensionStorage).filter(ExtensionStorage.is_encrypted.is_(True)).all()`
at `superset/cli/update.py:121-124`, materializing every encrypted row (including all
large `value` blobs) into the `entries` list at once; on a sufficiently large table this
can cause excessive memory usage or `MemoryError`/process OOM, preventing the rotation
from completing.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/cli/update.py
**Line:** 121:140
**Comment:**
	*Possible Bug: Loading every encrypted row with `.all()` materializes the full table (including large binary payloads) in memory at once, which can exhaust memory on real datasets. Iterate in batches (and avoid retaining all ORM instances) so key rotation remains safe for large tables.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

appbuilder.add_api(SqlLabPermalinkRestApi)
appbuilder.add_api(LogRestApi)

from superset.extension_storage.api import ExtensionStorageRestApi

Suggestion: The extension storage API is registered unconditionally, which bypasses the ENABLE_EXTENSIONS feature flag and exposes extension endpoints even when extensions are configured as disabled. Register this API only when the extensions feature flag is enabled so runtime behavior matches the documented security/feature contract. [logic error]

Severity Level: Major ⚠️
- ⚠️ Extensions storage API exposed when ENABLE_EXTENSIONS is False.
- ⚠️ Feature flag semantics for /api/v1/extensions become inconsistent.
Suggested change
from superset.extension_storage.api import ExtensionStorageRestApi
if feature_flag_manager.is_feature_enabled("ENABLE_EXTENSIONS"):
Steps of Reproduction ✅
1. Configure and start Superset with the default feature flags, where
`"ENABLE_EXTENSIONS": False` as defined in `superset/config.py:576-13` (comment at 576-10
and value at 576-12).

2. During app startup, `SupersetAppInitializer.init_views` in
`superset/initialization/__init__.py:1-55` executes the block that unconditionally imports
and registers `ExtensionStorageRestApi` via `from superset.extension_storage.api import
ExtensionStorageRestApi` and `appbuilder.add_api(ExtensionStorageRestApi)` at
`superset/initialization/__init__.py:37-39` (corresponding to PR diff lines 276–278).

3. In the same `init_views` method, the main extensions API is only registered when the
feature flag is enabled: `if
feature_flag_manager.is_feature_enabled("ENABLE_EXTENSIONS"):` followed by importing and
adding `ExtensionsRestApi` at `superset/initialization/__init__.py:41-44`, so with the
flag disabled only `ExtensionStorageRestApi` remains mounted under `/api/v1/extensions`.

4. With the server running and extensions feature flag still disabled, send an
authenticated request (session or JWT protected by `@protect()`) to `GET
/api/v1/extensions/<publisher>/<name>/storage/persistent/global/`, which is exposed by
`ExtensionStorageRestApi.list_global` at `superset/extension_storage/api.py:93-103`;
observe that the endpoint is fully functional even though `"ENABLE_EXTENSIONS"` is False,
demonstrating that extension-specific storage endpoints bypass the feature flag gating
applied to other extension APIs.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/initialization/__init__.py
**Line:** 276:276
**Comment:**
	*Logic Error: The extension storage API is registered unconditionally, which bypasses the `ENABLE_EXTENSIONS` feature flag and exposes extension endpoints even when extensions are configured as disabled. Register this API only when the extensions feature flag is enabled so runtime behavior matches the documented security/feature contract.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

Comment on lines +229 to +231
body = request.get_json(force=True) or {}
raw_value: str | bytes = body.get("value", "")
value_bytes = raw_value.encode() if isinstance(raw_value, str) else raw_value

Suggestion: request.get_json(force=True) can return a non-dict JSON value (for example, a list), and the next .get(...) calls will raise AttributeError, producing a 500 instead of a client error. Validate that the parsed body is a JSON object before reading fields. [possible bug]

Severity Level: Major ⚠️
- ❌ PUT /storage/persistent/global/<key> can 500 on bad JSON.
- ❌ PUT /storage/persistent/user/<key> can 500 on bad JSON.
- ❌ PUT /storage/persistent/resources/... can 500 on bad JSON.
- ⚠️ Clients receive 500 instead of clear 4xx validation errors.
Suggested change
body = request.get_json(force=True) or {}
raw_value: str | bytes = body.get("value", "")
value_bytes = raw_value.encode() if isinstance(raw_value, str) else raw_value

if not isinstance(body, dict):
    return self.response(400, message="Request body must be a JSON object")
Steps of Reproduction ✅
1. Start Superset with this PR code so that `ExtensionStorageRestApi` in
`superset/extension_storage/api.py` is registered (class definition at lines 93–99).

2. Perform an HTTP PUT to the global storage endpoint
`/api/v1/extensions/<publisher>/<name>/storage/persistent/global/<key>` (handler
`set_global` at lines 188–241) with `Content-Type: application/json` and a JSON array
body, e.g. `["not-a-dict"]`.

3. Inside `set_global`, `body = request.get_json(force=True) or {}` (line 229) will set
`body` to the list `["not-a-dict"]` because `request.get_json(force=True)` returns that
list and the `or {}` fallback is not used.

4. The next line `raw_value: str | bytes = body.get("value", "")` (line 230) calls `.get`
on a list, raising `AttributeError: 'list' object has no attribute 'get'`, which Flask
propagates as a 500 Internal Server Error instead of a 4xx client error. The same pattern
exists in `set_user` (lines 417–419) and `set_resource` (lines 651–653), so the issue
reproduces identically on `/user/<key>` and
`/resources/<resource_type>/<resource_uuid>/<key>` PUT endpoints.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/extension_storage/api.py
**Line:** 229:231
**Comment:**
	*Possible Bug: `request.get_json(force=True)` can return a non-dict JSON value (for example, a list), and the next `.get(...)` calls will raise `AttributeError`, producing a 500 instead of a client error. Validate that the parsed body is a JSON object before reading fields.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

Comment on lines +235 to +240
value=value_bytes,
value_type=body.get("value_type", _MIME_DEFAULT),
category=body.get("category"),
description=body.get("description"),
is_encrypted=bool(body.get("is_encrypted", False)),
)

Suggestion: value_type can be explicitly sent as null, and body.get("value_type", _MIME_DEFAULT) will return None, causing a DB integrity failure on a non-nullable column. Use a fallback that also handles None. [logic error]

Severity Level: Major ⚠️
- ❌ PUT global storage can fail on explicit null MIME type.
- ❌ PUT user storage can fail on explicit null MIME type.
- ❌ PUT resource storage can fail on explicit null MIME type.
- ⚠️ Clients see server error instead of default MIME type.
Suggested change
value=value_bytes,
value_type=body.get("value_type", _MIME_DEFAULT),
category=body.get("category"),
description=body.get("description"),
is_encrypted=bool(body.get("is_encrypted", False)),
)

value_type=body.get("value_type") or _MIME_DEFAULT,
Steps of Reproduction ✅
1. Start Superset with this PR code so `ExtensionStorageRestApi` in
`superset/extension_storage/api.py` is active and `ExtensionStorageDAO.set` is callable
from the API.

2. Send an HTTP PUT to
`/api/v1/extensions/<publisher>/<name>/storage/persistent/global/<key>` (handler
`set_global` at lines 188–241) with JSON body `{"value": "abc", "value_type": null}` and
`Content-Type: application/json`.

3. In `set_global`, `body = request.get_json(force=True) or {}` (line 229) yields
`{"value": "abc", "value_type": None}`; `value_type=body.get("value_type", _MIME_DEFAULT)`
(line 236) assigns `None` to `value_type` because `get` finds the key and returns its
`None` value instead of the default.

4. That `None` is passed into `ExtensionStorageDAO.set(..., value_type=None, ...)` (lines
232–240); when the DAO persists to the `ExtensionStorage` model (whose `value_type` column
is declared as non-nullable in the new table in this PR), the ORM/database layer raises an
integrity or type error at commit time, causing the request to fail with a 500 instead of
cleanly defaulting to the fallback MIME type.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/extension_storage/api.py
**Line:** 235:240
**Comment:**
	*Logic Error: `value_type` can be explicitly sent as `null`, and `body.get("value_type", _MIME_DEFAULT)` will return `None`, causing a DB integrity failure on a non-nullable column. Use a fallback that also handles `None`.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

"ix_ext_storage_lookup",
"extension_storage",
["extension_id", "user_fk", "resource_type", "resource_uuid", "key"],
)

Suggestion: The scope lookup index is non-unique, so concurrent writes can insert duplicate rows for the same logical key. Since reads and deletes use .first(), duplicate rows make results nondeterministic and can leave stale duplicates behind. Make the lookup index unique while normalizing nullable scope fields so NULL values don't bypass uniqueness. [race condition]

Severity Level: Major ⚠️
- ❌ Global storage PUT may create duplicate logical keys.
- ❌ User storage PUT may return stale values intermittently.
- ❌ Resource storage GET may read unpredictable version of value.
- ⚠️ Delete endpoints may leave undeletable ghost entries behind.
- ⚠️ Extension key-value semantics broken under concurrent writes.
Suggested change
)

[
    "extension_id",
    sa.text("COALESCE(user_fk, -1)"),
    sa.text("COALESCE(resource_type, '')"),
    sa.text("COALESCE(resource_uuid, '')"),
    "key",
],
unique=True,
Steps of Reproduction ✅
1. Apply migration
`superset/migrations/versions/2026-03-20_12-00_e5f6a7b8c9d0_add_extension_storage_table.py`,
which creates table `extension_storage` and a non-unique composite index
`ix_ext_storage_lookup` on `(extension_id, user_fk, resource_type, resource_uuid, key)` at
lines 45–49.

2. Start Superset so the REST API in `superset/extension_storage/api.py` is registered;
note that `set_user` at lines 193–191 and `set_global` at lines 99–152 call
`ExtensionStorageDAO.set(...)` inside a `@transaction()`-wrapped request handler, creating
separate DB transactions per HTTP PUT to
`/api/v1/extensions/<publisher>/<name>/storage/persistent/user/<key>` or `/global/<key>`.

3. Trigger two concurrent PUT requests for the same logical scoped key (for example, same
`publisher`, `name`, and `key` to `set_global`, or same `publisher`, `name`, `key`, and
authenticated user to `set_user`), so that both requests enter `ExtensionStorageDAO.set`
in `superset/extension_storage/daos.py` lines 142–190; each transaction executes a SELECT
with `.first()` at lines 158–167, sees no existing row, and then independently executes an
INSERT and `db.session.flush()`, resulting in two rows with identical `(extension_id,
user_fk, resource_type, resource_uuid, key)` but different `id`.

4. Later reads and deletes for that key become nondeterministic: `ExtensionStorageDAO.get`
and `get_value` at lines 100–138 and `delete` at lines 195–218 all use `.first()` without
`ORDER BY`, so GET endpoints (`get_global`, `get_user`, `get_resource` in
`superset/extension_storage/api.py` lines 62–97, 242–260, 38–95) may return either the
stale or the latest value arbitrarily, and DELETE endpoints only remove one of the
duplicate rows, leaving stale data that continues to appear in list endpoints
(`list_global`, `list_user`, `list_resource`), even though the extension believes the key
was uniquely upserted.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/migrations/versions/2026-03-20_12-00_e5f6a7b8c9d0_add_extension_storage_table.py
**Line:** 78:78
**Comment:**
	*Race Condition: The scope lookup index is non-unique, so concurrent writes can insert duplicate rows for the same logical key. Since reads and deletes use `.first()`, duplicate rows make results nondeterministic and can leave stale duplicates behind. Make the lookup index unique while normalizing nullable scope fields so `NULL` values don't bypass uniqueness.

Validate the correctness of the flagged issue. If correct, how can I resolve it? If you propose a fix, implement it and keep it concise.

Comment on lines +54 to +56
raw_keys: list[str | bytes] = current_app.config.get(
    "EXTENSION_STORAGE_ENCRYPTION_KEYS"
) or [current_app.config["SECRET_KEY"]]
Contributor
Suggestion: EXTENSION_STORAGE_ENCRYPTION_KEYS is assumed to be a list, but if operators configure it as a single string, the current code iterates character-by-character and derives one Fernet key per character. That silently encrypts data with the first character-derived key, weakening security and causing decryption/rotation surprises when configuration is later corrected. Normalize string/bytes config values to a single-item list before building MultiFernet. [security]

Severity Level: Critical 🚨
- ❌ Encrypted extension storage reads 500 with misconfigured key type.
- ❌ Key rotation CLI fails decrypting data after config correction.
- ⚠️ Single-character key strings drastically reduce encryption key entropy.
Suggested change
raw_keys: list[str | bytes] = current_app.config.get(
    "EXTENSION_STORAGE_ENCRYPTION_KEYS"
) or [current_app.config["SECRET_KEY"]]
raw_keys = current_app.config.get("EXTENSION_STORAGE_ENCRYPTION_KEYS")
if not raw_keys:
    raw_keys = [current_app.config["SECRET_KEY"]]
elif isinstance(raw_keys, (str, bytes)):
    raw_keys = [raw_keys]
Steps of Reproduction ✅
1. Configure Superset so that `current_app.config["EXTENSION_STORAGE_ENCRYPTION_KEYS"]` is
a single string instead of a list (e.g. set it to `"mysupersecret"` in your deployment
config, despite the list type hint in `superset/config.py:242`).

2. Start Superset and call the encrypted global storage endpoint `PUT
/api/v1/extensions/<publisher>/<name>/storage/persistent/global/<key>` implemented in
`superset/extension_storage/api.py:188-241`, with JSON body
`{"value":"secret","is_encrypted":true}` so that `ExtensionStorageDAO.set(...,
is_encrypted=True)` at `superset/extension_storage/daos.py:142-156` is invoked.

3. During this request `_fernet()` at `superset/extension_storage/daos.py:45-57` reads the
string config value into `raw_keys`, iterates over it character-by-character in `[
_key_to_fernet(k) for k in raw_keys ]`, and encrypts the stored value with a Fernet key
derived only from the first single-character "key", producing ciphertext that can only be
decrypted with that unintended character-derived key.

4. Later, correct the configuration to a proper list (e.g.
`EXTENSION_STORAGE_ENCRYPTION_KEYS = ["mysupersecret"]`), restart Superset, and call `GET
/api/v1/extensions/<publisher>/<name>/storage/persistent/global/<key>` handled by
`get_global()` in `superset/extension_storage/api.py:151-187`;
`ExtensionStorageDAO.get_value()` at `superset/extension_storage/daos.py:122-138` now
builds a different `MultiFernet` without the old character-derived key and
`decrypt(entry.value)` raises `InvalidToken`, causing a 500 error when reading previously
written encrypted entries and similarly breaking the `rotate_extension_storage_keys` CLI
in `superset/cli/update.py:15-31`.
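The failure mode is easy to see in isolation: Python string iteration yields characters, so a scalar config value becomes many one-character "keys". A hedged sketch of the proposed normalization, where `normalize_keys` and the fallback literal are hypothetical stand-ins for the suggested `_fernet()` fix and `SECRET_KEY`:

```python
def normalize_keys(raw):
    """Hypothetical mirror of the suggested fix: coerce config to a key list."""
    if not raw:
        raw = ["fallback-secret"]  # stand-in for SECRET_KEY
    elif isinstance(raw, (str, bytes)):
        raw = [raw]
    return list(raw)

# Misconfigured scalar: naive iteration produces per-character "keys" ...
assert list("mysupersecret")[0] == "m"
# ... while normalization keeps the whole string as one key.
assert normalize_keys("mysupersecret") == ["mysupersecret"]
assert normalize_keys(None) == ["fallback-secret"]
assert normalize_keys([b"key-a", b"key-b"]) == [b"key-a", b"key-b"]
```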
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/extension_storage/daos.py
**Line:** 54:56
**Comment:**
	*Security: `EXTENSION_STORAGE_ENCRYPTION_KEYS` is assumed to be a list, but if operators configure it as a single string, the current code iterates character-by-character and derives one Fernet key per character. That silently encrypts data with the first character-derived key, weakening security and causing decryption/rotation surprises when configuration is later corrected. Normalize string/bytes config values to a single-item list before building `MultiFernet`.

Validate the correctness of the flagged issue. If correct, how can I resolve it? If you propose a fix, implement it and keep it concise.

)

__table_args__ = (
    # Composite index covering all lookup dimensions
Contributor
Suggestion: The model does not enforce the documented per-scope key uniqueness at the database level, so concurrent set operations can insert duplicate rows for the same logical key and make reads/deletes nondeterministic (.first() may return any duplicate). Add unique partial indexes per scope so each logical key can exist only once. [race condition]

Severity Level: Major ⚠️
- ❌ Duplicate extension_storage rows cause inconsistent GET /storage/persistent responses.
- ⚠️ DELETE endpoints may leave duplicate ghost rows for same key.
- ⚠️ Extension authors see nondeterministic state when writing concurrently.
Suggested change
# Composite index covering all lookup dimensions
# Enforce documented key uniqueness per scope.
Index(
    "uq_ext_storage_global",
    "extension_id",
    "key",
    unique=True,
    postgresql_where=(
        user_fk.is_(None)
        & resource_type.is_(None)
        & resource_uuid.is_(None)
    ),
    sqlite_where=(
        user_fk.is_(None)
        & resource_type.is_(None)
        & resource_uuid.is_(None)
    ),
),
Index(
    "uq_ext_storage_user",
    "extension_id",
    "key",
    "user_fk",
    unique=True,
    postgresql_where=(
        user_fk.is_not(None)
        & resource_type.is_(None)
        & resource_uuid.is_(None)
    ),
    sqlite_where=(
        user_fk.is_not(None)
        & resource_type.is_(None)
        & resource_uuid.is_(None)
    ),
),
Index(
    "uq_ext_storage_resource",
    "extension_id",
    "key",
    "resource_type",
    "resource_uuid",
    unique=True,
    postgresql_where=(
        resource_type.is_not(None) & resource_uuid.is_not(None)
    ),
    sqlite_where=(
        resource_type.is_not(None) & resource_uuid.is_not(None)
    ),
),
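Why partial indexes rather than one plain composite UNIQUE constraint: SQL treats NULL as distinct from NULL, so rows whose nullable scope columns are NULL slip past a conventional unique constraint. A self-contained demonstration against SQLite with an illustrative, simplified schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ext_storage ("
    "extension_id TEXT, key TEXT, user_fk INTEGER, "
    "UNIQUE (extension_id, key, user_fk))"
)
# Both inserts succeed: the NULL user_fk bypasses the composite UNIQUE constraint.
conn.execute("INSERT INTO ext_storage VALUES ('ext', 'k', NULL)")
conn.execute("INSERT INTO ext_storage VALUES ('ext', 'k', NULL)")
dupes = conn.execute("SELECT COUNT(*) FROM ext_storage").fetchone()[0]

# A partial unique index scoped to the NULL case closes the gap.
conn.execute("DELETE FROM ext_storage")
conn.execute(
    "CREATE UNIQUE INDEX uq_global ON ext_storage (extension_id, key) "
    "WHERE user_fk IS NULL"
)
conn.execute("INSERT INTO ext_storage VALUES ('ext', 'k', NULL)")
try:
    conn.execute("INSERT INTO ext_storage VALUES ('ext', 'k', NULL)")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
```

Both SQLite and PostgreSQL support partial (`WHERE`-qualified) unique indexes, which is why the suggestion above supplies matching `sqlite_where` and `postgresql_where` clauses.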
Steps of Reproduction ✅
1. Apply and run the Alembic migration
`superset/migrations/versions/2026-03-20_12-00_e5f6a7b8c9d0_add_extension_storage_table.py`
which creates the `extension_storage` table and only non-unique indexes
`ix_ext_storage_extension_id` and `ix_ext_storage_lookup` (lines 69–77), then start the
Superset webserver so that `ExtensionStorageRestApi` is registered in
`superset/initialization/__init__.py:260-29`.

2. Authenticate as a user and, following the documented frontend usage in
`docs/developer_docs/extensions/extension-points/persistent-storage.md:118-150`, issue two
concurrent HTTP `PUT` requests to the same user-scoped key, e.g. `PUT
/api/v1/extensions/acme/my-ext/storage/persistent/user/prefs`, from two separate
processes/threads, each with a different `"value"` in the JSON body.

3. Both requests are handled by `ExtensionStorageRestApi.set_user` in
`superset/extension_storage/api.py:372-71`, which is wrapped in `@transaction()` and calls
`ExtensionStorageDAO.set(...)` (API file lines 54–70) with identical `(extension_id, key,
user_fk)` arguments derived from `_extension_id()` and `get_user_id()`.

4. Inside `ExtensionStorageDAO.set` (`superset/extension_storage/daos.py:142-190`), each
transaction builds the same scope filter via `_scope_filter` (`daos.py:60-84`) and
executes `db.session.query(ExtensionStorage).filter(and_(...)).first()` (lines 158-167);
without any unique constraint on `(extension_id, user_fk, resource_type, resource_uuid,
key)` in the model (`superset/extension_storage/models.py:91-101`) or migration, both
concurrent SELECTs can see "no rows", each takes the `entry is None` branch, constructs a
new `ExtensionStorage` row (lines 176-187), and flushes, resulting in two rows with
identical logical key.

5. Subsequent reads via `GET /api/v1/extensions/acme/my-ext/storage/persistent/user/prefs`
hit `ExtensionStorageRestApi.get_user` (`api.py:331-369`), which calls
`ExtensionStorageDAO.get` (`daos.py:100-120`) and uses `.first()` without ordering; with
duplicate rows present, the database may return either row, making the returned value
nondeterministic, and `ExtensionStorageDAO.delete` (`daos.py:195-218`) will delete only
one of the duplicates, leaving "ghost" entries that continue to appear in `list_user`
(`daos.py:238-251`, exposed at `api.py:282-329`).
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/extension_storage/models.py
**Line:** 92:92
**Comment:**
	*Race Condition: The model does not enforce the documented per-scope key uniqueness at the database level, so concurrent `set` operations can insert duplicate rows for the same logical key and make reads/deletes nondeterministic (`.first()` may return any duplicate). Add unique partial indexes per scope so each logical key can exist only once.

Validate the correctness of the flagged issue. If correct, how can I resolve it? If you propose a fix, implement it and keep it concise.

# Simulate adding a NEW key at the front of EXTENSION_STORAGE_ENCRYPTION_KEYS.
new_raw = b"brand-new-secret-key-32-bytes!!!"
new_fernet = Fernet(base64.urlsafe_b64encode(hashlib.sha256(new_raw).digest()))
old_raw = current_app.config["SECRET_KEY"]
Contributor
Suggestion: This test derives the old encryption key from SECRET_KEY, but ExtensionStorageDAO.set(..., is_encrypted=True) uses the first key from EXTENSION_STORAGE_ENCRYPTION_KEYS when configured. In environments where that list is set, multi.rotate(old_ciphertext) will be built with the wrong old key and fail with InvalidToken. Derive the old key using the same config precedence as the DAO. [logic error]

Severity Level: Major ⚠️
- ❌ Unit test test_key_rotation_roundtrip fails when encryption keys configured.
- ⚠️ Misaligned key precedence between DAO and unit tests.
Suggested change
old_raw = current_app.config["SECRET_KEY"]
raw_keys = current_app.config.get("EXTENSION_STORAGE_ENCRYPTION_KEYS") or [
    current_app.config["SECRET_KEY"]
]
old_raw = raw_keys[0]
Steps of Reproduction ✅
1. Ensure the unit-test Flask app fixture defined in `tests/unit_tests/conftest.py:5-38`
loads `superset.config` (which defines `EXTENSION_STORAGE_ENCRYPTION_KEYS` at
`superset/config.py:5-11`) and permits overriding config keys via `request.param` (lines
26-38).

2. Run `pytest
tests/unit_tests/daos/test_extension_storage_dao.py::test_key_rotation_roundtrip -k '' -q`
while parametrizing the `app` fixture (per the pattern in
`tests/unit_tests/conftest.py:30-35`) so that
`app.config["EXTENSION_STORAGE_ENCRYPTION_KEYS"] = [b"custom-key-A", b"custom-key-B"]`,
making this list non-empty and different from `SECRET_KEY`.

3. Inside `test_key_rotation_roundtrip` at
`tests/unit_tests/daos/test_extension_storage_dao.py:73-88`, the call
`ExtensionStorageDAO.set(EXT_A, "rot_key", payload, is_encrypted=True)` delegates to
`ExtensionStorageDAO.set` in `superset/extension_storage/daos.py:63-77`, which computes
`stored_value = _fernet().encrypt(value)` using `_fernet()` at
`superset/extension_storage/daos.py:14-26`; `_fernet()` builds `raw_keys =
current_app.config.get("EXTENSION_STORAGE_ENCRYPTION_KEYS") or
[current_app.config["SECRET_KEY"]]` (lines 23-25), so with the overridden config the
ciphertext is encrypted with `raw_keys[0]` (the first element of
`EXTENSION_STORAGE_ENCRYPTION_KEYS`), not with `SECRET_KEY`.

4. Later in the same test, at `tests/unit_tests/daos/test_extension_storage_dao.py:89-97`,
the code constructs `old_fernet` from `old_raw = current_app.config["SECRET_KEY"]` (lines
92-95) and then calls `entry.value = multi.rotate(old_ciphertext)` at line 100 where
`multi = MultiFernet([new_fernet, old_fernet])`. Because `old_ciphertext` was not produced
by `old_fernet` (it was produced from `EXTENSION_STORAGE_ENCRYPTION_KEYS[0]`),
`MultiFernet.rotate(old_ciphertext)` cannot decrypt with either `new_fernet` or
`old_fernet` and raises `cryptography.fernet.InvalidToken`, causing
`test_key_rotation_roundtrip` to fail. Updating the test to derive `old_raw` via the same
precedence as `_fernet()` (first key from `EXTENSION_STORAGE_ENCRYPTION_KEYS` or
`SECRET_KEY` when the list is empty) aligns the test with production behavior and prevents
this spurious failure.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/daos/test_extension_storage_dao.py
**Line:** 351:351
**Comment:**
	*Logic Error: This test derives the old encryption key from `SECRET_KEY`, but `ExtensionStorageDAO.set(..., is_encrypted=True)` uses the first key from `EXTENSION_STORAGE_ENCRYPTION_KEYS` when configured. In environments where that list is set, `multi.rotate(old_ciphertext)` will be built with the wrong old key and fail with `InvalidToken`. Derive the old key using the same config precedence as the DAO.

Validate the correctness of the flagged issue. If correct, how can I resolve it? If you propose a fix, implement it and keep it concise.

# Build a MultiFernet with a new key first, then the current one.
new_raw = b"another-brand-new-secret-key!!!!"
new_fernet = Fernet(base64.urlsafe_b64encode(hashlib.sha256(new_raw).digest()))
old_raw = current_app.config["SECRET_KEY"]
Contributor
Suggestion: This test also assumes the original ciphertext was encrypted with SECRET_KEY, but the DAO encrypts with the first key in EXTENSION_STORAGE_ENCRYPTION_KEYS when present. That mismatch makes the MultiFernet setup incorrect and can cause decryption failures in configured environments. [logic error]

Severity Level: Major ⚠️
- ❌ test_multi_fernet_decrypts_old_key fails when encryption keys configured.
- ⚠️ Tests no longer reflect DAO's real key precedence.
Suggested change
old_raw = current_app.config["SECRET_KEY"]
raw_keys = current_app.config.get("EXTENSION_STORAGE_ENCRYPTION_KEYS") or [
    current_app.config["SECRET_KEY"]
]
old_raw = raw_keys[0]
Steps of Reproduction ✅
1. Use the same unit-test Flask app fixture from `tests/unit_tests/conftest.py:5-38` and
override its configuration (via the documented parametrization pattern at lines 26-38) so
that `app.config["EXTENSION_STORAGE_ENCRYPTION_KEYS"] = [b"custom-key-A",
b"custom-key-B"]`, making the extension encryption key list non-empty and not matching
`SECRET_KEY` from `superset/config.py:1-3`.

2. Run `pytest
tests/unit_tests/daos/test_extension_storage_dao.py::test_multi_fernet_decrypts_old_key -k
'' -q` so that `test_multi_fernet_decrypts_old_key` in
`tests/unit_tests/daos/test_extension_storage_dao.py:109-135` executes under this
configuration.

3. Within that test, the call `ExtensionStorageDAO.set(EXT_A, "multi_key", payload,
is_encrypted=True)` at line 119 invokes `ExtensionStorageDAO.set` in
`superset/extension_storage/daos.py:63-77`, which encrypts using `_fernet()` defined at
`superset/extension_storage/daos.py:14-26`; `_fernet()` builds a `MultiFernet` from
`raw_keys = current_app.config.get("EXTENSION_STORAGE_ENCRYPTION_KEYS") or
[current_app.config["SECRET_KEY"]]` (lines 23-25), so with the overridden config the
ciphertext in `entry.value` at line 121 is produced using the first configured extension
key, not `SECRET_KEY`.

4. Later in the same test at
`tests/unit_tests/daos/test_extension_storage_dao.py:124-132`, the code sets `old_raw =
current_app.config["SECRET_KEY"]` (lines 127-129) and derives `old_fernet` from that, then
constructs `multi = MultiFernet([new_fernet, old_fernet])` and calls
`multi.decrypt(entry.value)` at line 135. Because `entry.value` was not encrypted with
either `new_raw` or `SECRET_KEY` but with `EXTENSION_STORAGE_ENCRYPTION_KEYS[0]`,
`MultiFernet.decrypt` exhausts both keys and raises `cryptography.fernet.InvalidToken`,
causing `test_multi_fernet_decrypts_old_key` to fail despite the DAO working correctly.
Adjusting the test to obtain `old_raw` from `EXTENSION_STORAGE_ENCRYPTION_KEYS` (falling
back to `SECRET_KEY` when the list is empty) makes the test's key-selection logic
consistent with `_fernet()` and removes this spurious failure.
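The MultiFernet semantics both tests rely on can be checked in isolation. A sketch mirroring the sha256-based key derivation used in these tests (the key values are illustrative): encryption and `rotate()` always use the first key, while decryption tries every key in order.

```python
import base64
import hashlib

from cryptography.fernet import Fernet, MultiFernet

def derive(raw: bytes) -> Fernet:
    # Same derivation shape as the tests: sha256 digest, urlsafe base64.
    return Fernet(base64.urlsafe_b64encode(hashlib.sha256(raw).digest()))

old_fernet = derive(b"custom-key-A")    # first *configured* key, not SECRET_KEY
new_fernet = derive(b"brand-new-key")

token = old_fernet.encrypt(b"payload")  # ciphertext written before rotation

multi = MultiFernet([new_fernet, old_fernet])
assert multi.decrypt(token) == b"payload"  # old key still decrypts

rotated = multi.rotate(token)              # re-encrypted under the new (first) key
assert new_fernet.decrypt(rotated) == b"payload"
```

If `old_fernet` is built from the wrong raw key, both `multi.decrypt(token)` and `multi.rotate(token)` raise `InvalidToken`, which is exactly the failure mode described above.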
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/unit_tests/daos/test_extension_storage_dao.py
**Line:** 386:386
**Comment:**
	*Logic Error: This test also assumes the original ciphertext was encrypted with `SECRET_KEY`, but the DAO encrypts with the first key in `EXTENSION_STORAGE_ENCRYPTION_KEYS` when present. That mismatch makes the MultiFernet setup incorrect and can cause decryption failures in configured environments.

Validate the correctness of the flagged issue. If correct, how can I resolve it? If you propose a fix, implement it and keep it concise.

Contributor

Copilot AI left a comment
Pull request overview

Adds Tier-3 (persistent) key-value storage for extensions by introducing a dedicated extension_storage table, DAO helpers, and REST endpoints, with optional Fernet encryption and a CLI for key rotation.

Changes:

  • Introduces ExtensionStorage model + Alembic migration, plus ExtensionStorageDAO read/write/list helpers.
  • Adds ExtensionStorageRestApi endpoints for global/user/resource scopes and registers the API during app init.
  • Adds encryption config (EXTENSION_STORAGE_ENCRYPTION_KEYS), a CLI command to rotate encryption keys, plus docs and DAO unit tests.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 12 comments.

Show a summary per file
| File | Description |
| --- | --- |
| UPDATING.md | Documents the new Tier-3 extension storage feature and endpoints. |
| superset/extension_storage/models.py | New SQLAlchemy model for extension_storage. |
| superset/extension_storage/daos.py | DAO implementing scoped get/set/delete/list and encryption hooks. |
| superset/extension_storage/api.py | New REST API for scoped extension storage operations. |
| superset/migrations/versions/2026-03-20_12-00_e5f6a7b8c9d0_add_extension_storage_table.py | Migration creating the extension_storage table and indexes. |
| superset/initialization/__init__.py | Registers the new storage API during view initialization. |
| superset/core/api/core_api_injection.py | Injects ExtensionStorageDAO/model into superset_core abstraction layer. |
| superset/config.py | Adds EXTENSION_STORAGE_ENCRYPTION_KEYS config. |
| superset/cli/update.py | Adds rotate-extension-storage-keys CLI command. |
| tests/unit_tests/daos/test_extension_storage_dao.py | Adds DAO unit tests across scopes + encryption/rotation. |
| docs/developer_docs/extensions/extension-points/persistent-storage.md | Adds developer docs for storage API and usage. |
| pyproject.toml | Pins mypy python_version to 3.10. |
| .pre-commit-config.yaml | Pins pre-commit mypy to python3.10 and adjusts pylint invocation. |
| superset/extension_storage/__init__.py | Adds package init file. |

sa.ForeignKeyConstraint(["created_by_fk"], ["ab_user.id"]),
sa.ForeignKeyConstraint(["changed_by_fk"], ["ab_user.id"]),
sa.PrimaryKeyConstraint("id"),
sa.UniqueConstraint("uuid"),
Copilot AI Apr 8, 2026
The migration creates an index for (extension_id, user_fk, resource_type, resource_uuid, key) but does not add a UNIQUE constraint on these scope columns. Since the DAO implements “upsert” via a read-then-update of the first matching row, duplicates would lead to incorrect reads/updates and race conditions under concurrent writes. Add a composite UNIQUE constraint for the scoped identity (extension_id, key, user_fk, resource_type, resource_uuid) to enforce the documented key uniqueness guarantees.

Suggested change
sa.UniqueConstraint("uuid"),
sa.UniqueConstraint("uuid"),
sa.UniqueConstraint(
    "extension_id",
    "key",
    "user_fk",
    "resource_type",
    "resource_uuid",
    name="uq_extension_storage_scope",
),

Copilot uses AI. Check for mistakes.
Comment on lines +91 to +95
__table_args__ = (
    # Composite index covering all lookup dimensions
    Index(
        "ix_ext_storage_lookup",
        "extension_id",
Copilot AI Apr 8, 2026
The model defines lookup indexes but does not enforce uniqueness for the scoped identity (extension_id, key, user_fk, resource_type, resource_uuid). Without a composite UNIQUE constraint, duplicate rows can exist and the DAO’s upsert (query-first) semantics become nondeterministic, especially under concurrent writes. Add a composite UniqueConstraint matching the scope identity (and keep the supporting index if needed for performance).


allow_browser_login = True
route_base = "/api/v1/extensions"
resource_name = "extensions"
Copilot AI Apr 8, 2026
ExtensionStorageRestApi uses route_base = "/api/v1/extensions" and resource_name = "extensions", which collides with the existing ExtensionsRestApi (same resource_name, and it also defaults to the same route base). This can cause permission/role confusion (shared FAB permission view-menu) and makes it easy to accidentally grant storage read/write rights when intending to grant extension-list rights. Use a distinct resource_name (and, if needed, a distinct route_base or explicit class_permission_name/method permissions) so storage permissions are isolated from the extensions registry endpoints.

Suggested change
resource_name = "extensions"
resource_name = "extension_storage"
class_permission_name = "ExtensionStorage"

Comment on lines +276 to 282
from superset.extension_storage.api import ExtensionStorageRestApi

appbuilder.add_api(ExtensionStorageRestApi)

if feature_flag_manager.is_feature_enabled("ENABLE_EXTENSIONS"):
    from superset.extensions.api import ExtensionsRestApi

Copilot AI Apr 8, 2026
This API is registered unconditionally, even when the ENABLE_EXTENSIONS feature flag is disabled (the main extensions API is gated below). If extensions are feature-flagged off, it’s unexpected for /api/v1/extensions/.../storage/... endpoints to still be available. Consider gating ExtensionStorageRestApi registration behind the same flag (or a dedicated storage flag) to avoid exposing new surface area when extensions are disabled.

Suggested change
from superset.extension_storage.api import ExtensionStorageRestApi
appbuilder.add_api(ExtensionStorageRestApi)
if feature_flag_manager.is_feature_enabled("ENABLE_EXTENSIONS"):
    from superset.extensions.api import ExtensionsRestApi
if feature_flag_manager.is_feature_enabled("ENABLE_EXTENSIONS"):
    from superset.extension_storage.api import ExtensionStorageRestApi
    from superset.extensions.api import ExtensionsRestApi
    appbuilder.add_api(ExtensionStorageRestApi)

Comment on lines +142 to +144
entries = ExtensionStorageDAO.list_global(ext_id, category=category)
total = len(entries)
page_entries = entries[offset : offset + page_size]
Copilot AI Apr 8, 2026
Pagination is implemented by loading all rows into memory (list_global(...).all()), computing len(entries), and then slicing the Python list. This will not scale once extensions store many keys. Prefer pushing pagination into the DB query (LIMIT/OFFSET) and using a separate COUNT query for count.

Suggested change
entries = ExtensionStorageDAO.list_global(ext_id, category=category)
total = len(entries)
page_entries = entries[offset : offset + page_size]
query = ExtensionStorageDAO.list_global(ext_id, category=category)
total = query.count()
page_entries = query.offset(offset).limit(page_size).all()
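The same pattern in plain SQL terms, sketched against SQLite with an illustrative table: COUNT and LIMIT/OFFSET run in the database, so only one page of rows crosses the wire instead of the full result set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (id INTEGER PRIMARY KEY, key TEXT NOT NULL)")
conn.executemany("INSERT INTO kv (key) VALUES (?)", [(f"k{i}",) for i in range(25)])

page_size, offset = 10, 20
total = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
# A stable ORDER BY keeps page boundaries deterministic across requests.
page = conn.execute(
    "SELECT key FROM kv ORDER BY id LIMIT ? OFFSET ?", (page_size, offset)
).fetchall()
```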

Comment on lines +527 to +531
entries = ExtensionStorageDAO.list_resource(
    ext_id, resource_type, resource_uuid, category=category
)
total = len(entries)
page_entries = entries[offset : offset + page_size]
Copilot AI Apr 8, 2026
list_resource paginates by fetching all matching rows and slicing in Python. For resource-linked state that could grow over time, prefer DB-side LIMIT/OFFSET and COUNT to avoid loading every key/value header into memory on each request.

Suggested change
entries = ExtensionStorageDAO.list_resource(
    ext_id, resource_type, resource_uuid, category=category
)
total = len(entries)
page_entries = entries[offset : offset + page_size]
query = ExtensionStorage.query.filter_by(
    extension_id=ext_id,
    resource_type=resource_type,
    resource_uuid=resource_uuid,
)
if category is not None:
    query = query.filter_by(category=category)
total = query.count()
page_entries = query.offset(offset).limit(page_size).all()

Comment on lines +18 to +22
"""REST API for extension Tier-3 persistent storage.

URL structure (all under /api/v1/extensions/<publisher>/<name>/storage/persistent/):

GET /global/
Copilot AI Apr 8, 2026

The docstring and PR description disagree on the URL shape: this file documents /api/v1/extensions/<publisher>/<name>/storage/persistent/..., while the PR summary describes a flat /api/v1/extensions/<ext_id>/storage/... form. Please align the implementation, docs (including UPDATING.md), and PR description to a single, supported URL structure to avoid breaking early adopters.

    resource_type: str | None = None,
    resource_uuid: str | None = None,
) -> ExtensionStorage | None:
    """Return a storage entry, decrypting the value if needed."""
Copilot AI Apr 8, 2026
The get() docstring says it decrypts values, but the method returns the raw DB row without decrypting; decryption only happens in get_value(). Update the docstring to reflect actual behavior to avoid callers accidentally using ciphertext bytes when is_encrypted=True.

Suggested change
"""Return a storage entry, decrypting the value if needed."""
"""Return the raw storage entry without decrypting its value.
Call :meth:`get_value` to retrieve the stored value with decryption
applied when ``is_encrypted`` is true.
"""

Comment on lines +79 to +83
```json
{
  "value": "<string or base64 bytes>",
  "value_type": "application/json",
  "category": "my-category",
```
Copilot AI Apr 8, 2026
The docs claim value can be a “string or base64 bytes”, but the current REST API implementation stores string values as UTF-8 bytes without base64 decoding. Update the docs to match the actual accepted format (or update the API to decode base64 and validate errors) so extensions don’t inadvertently persist encoded text instead of raw bytes.

Comment on lines +34 to +40
@pytest.fixture
def session(session: Session) -> Iterator[Session]:
    """Extend the base session fixture with the extension_storage table."""
    engine = session.get_bind()
    ExtensionStorage.metadata.create_all(engine)
    yield session
    session.rollback()
Copilot AI Apr 8, 2026
This fixture is recursively dependent on itself (def session(session: Session) with the same fixture name), which will raise a pytest “recursive dependency” error and prevent the test module from running. Rename this fixture (e.g., session_with_extension_storage) and update tests to use it, or use request.getfixturevalue("session") to obtain the base session fixture without recursion.

Contributor

@bito-code-review Bot left a comment

Code Review Agent Run #678cd6

Actionable Suggestions - 1
  • superset/cli/update.py - 1
    • Blind exception catch for generic Exception type · Line 131-136
Additional Suggestions - 3
  • superset/initialization/__init__.py - 1
    • Inconsistent import placement · Line 276-278
      The import for ExtensionStorageRestApi is placed locally inside the function, which is inconsistent with other unconditional API imports that are at the top of init_views. Conditional imports like ExtensionsRestApi are local, but unconditional ones should be grouped at the top for better readability and consistency.
  • .pre-commit-config.yaml - 1
    • PYTHONPATH precedence risk · Line 142-142
      The PYTHONPATH assignment appends $(pwd), which may not override conflicting paths already set. If PYTHONPATH contains a different superset installation, pylint could load the wrong plugin. Prepending $(pwd) would ensure the local version is prioritized.
  • superset/extension_storage/daos.py - 1
    • Incorrect docstring for get method · Line 108-108
      The docstring claims the method is 'decrypting the value if needed', but it actually returns the raw ExtensionStorage model instance with a potentially encrypted value; decryption is handled by get_value instead.
      Code suggestion
diff --git a/superset/extension_storage/daos.py b/superset/extension_storage/daos.py
index abc123..def456 100644
--- a/superset/extension_storage/daos.py
+++ b/superset/extension_storage/daos.py
@@ -107,7 +107,7 @@ class ExtensionStorageDAO:
     @staticmethod
     def get(
         extension_id: str,
         key: str,
         user_fk: int | None = None,
         resource_type: str | None = None,
         resource_uuid: str | None = None,
     ) -> ExtensionStorage | None:
-        """Return a storage entry, decrypting the value if needed."""
+        """Return a storage entry (value may be encrypted; use get_value for decryption)."""
         entry = (
             db.session.query(ExtensionStorage)
             .filter(
                 and_(
                     *_scope_filter(
                         extension_id, key, user_fk, resource_type, resource_uuid
                     )
                 )
             )
             .first()
         )
         return entry
Review Details
  • Files reviewed - 12 · Commit Range: e9f795a..ea04aa3
    • .pre-commit-config.yaml
    • pyproject.toml
    • superset/cli/update.py
    • superset/config.py
    • superset/core/api/core_api_injection.py
    • superset/extension_storage/__init__.py
    • superset/extension_storage/api.py
    • superset/extension_storage/daos.py
    • superset/extension_storage/models.py
    • superset/initialization/__init__.py
    • superset/migrations/versions/2026-03-20_12-00_e5f6a7b8c9d0_add_extension_storage_table.py
    • tests/unit_tests/daos/test_extension_storage_dao.py
  • Files skipped - 2
    • UPDATING.md - Reason: Filter setting
    • docs/developer_docs/extensions/extension-points/persistent-storage.md - Reason: Filter setting
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful


Comment thread superset/cli/update.py
Comment on lines +131 to +136
try:
entry.value = fernet.rotate(entry.value)
except Exception as exc: # pylint: disable=broad-except
click.secho(
f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}",
fg="red",

Blanket except clause catches generic Exception

The code catches a bare Exception at line 133. Replace it with the specific exception types that fernet.rotate can raise (e.g., cryptography.fernet.InvalidToken for tokens that no configured key can decrypt, or TypeError for non-bytes input) to improve error-handling clarity and avoid masking unexpected errors.

Code suggestion
Check the AI-generated fix before applying
Suggested change
 try:
     entry.value = fernet.rotate(entry.value)
-except Exception as exc:  # pylint: disable=broad-except
+except (InvalidToken, TypeError) as exc:
     click.secho(
         f"Failed to rotate key={entry.key} ext={entry.extension_id}: {exc}",
         fg="red",

(This assumes InvalidToken is imported from cryptography.fernet.)
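The rotation call under review follows the MultiFernet.rotate contract from the cryptography library: try each configured key until one decrypts the token, then re-encrypt the plaintext with the first (current) key, raising if no key matches. A minimal stdlib sketch of that contract — the `rotate` helper, the XOR "cipher", and the `v1:` prefix check are hypothetical stand-ins, not the real Fernet API:

```python
def xor(data: bytes, key: bytes) -> bytes:
    # Toy symmetric cipher standing in for Fernet; applying it twice decrypts.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def rotate(token: bytes, keys: list[bytes]) -> bytes:
    # Mirrors MultiFernet.rotate: try each key until one decrypts the token,
    # then re-encrypt the plaintext under the first (current) key.
    for key in keys:
        plaintext = xor(token, key)
        if plaintext.startswith(b"v1:"):  # stand-in for Fernet token validation
            return xor(plaintext, keys[0])
    raise ValueError("token was not encrypted with any configured key")


OLD_KEY, NEW_KEY = b"old-secret", b"new-secret"
token = xor(b"v1:payload", OLD_KEY)          # row encrypted under the old key
rotated = rotate(token, [NEW_KEY, OLD_KEY])  # new key listed first, old kept for reads
print(xor(rotated, NEW_KEY))                 # b'v1:payload'
```

This is why the documented rotation flow works: while both keys are configured, old rows still decrypt; after `superset rotate-extension-storage-keys` re-encrypts every row under the first key, the old key can be dropped. A token encrypted with a key that was already removed is exactly the failure the CLI's except clause reports.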

Code Review Run #678cd6



@michael-s-molina
Member

Thanks for the PR @rusackas. Could you point your PR at the branch of #39171 instead of master so we can start syncing the concepts? My PR already has the slots for Tier 3, so it should be straightforward to integrate.


Labels

  • api - Related to the REST API
  • change:backend - Requires changing the backend
  • doc - Namespace | Anything related to documentation
  • preset-io
  • risk:db-migration - PRs that require a DB migration
  • size/XXL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants