[SSO] Add design doc for group to role mapping for OIDC #35899

Merged
mtabebe merged 1 commit into MaterializeInc:main from mtabebe:ma/sso/scim-design-doc
Apr 22, 2026

Conversation

@mtabebe mtabebe commented Apr 7, 2026

Proposes JWT-based group-to-role sync for self-managed.

On connection, Materialize reads group claims from the JWT and grants/revokes role memberships accordingly. Grants made by the sync are tracked with a dedicated sentinel grantor (MZ_JWT_SYNC_ROLE_ID) to distinguish sync-managed grants from manually managed ones.
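
The reconciliation described in this summary can be pictured as a set difference. The following is a minimal Python sketch, not the Rust implementation the design doc proposes: the `Grant` record, the `plan_sync` helper, and the string value assigned to `MZ_JWT_SYNC_ROLE_ID` are hypothetical, and the choice to skip roles the user already holds through a manual grant is an assumption made for illustration.

```python
from dataclasses import dataclass

# Placeholder value for illustration only; the real sentinel ID is defined by Materialize.
MZ_JWT_SYNC_ROLE_ID = "jwt_sync"


@dataclass(frozen=True)
class Grant:
    role: str     # role the user is a member of
    grantor: str  # who granted the membership


def plan_sync(desired_roles, current_grants):
    """Return (to_grant, to_revoke) sets of role names for one user.

    desired_roles: role names derived from the JWT group claim.
    current_grants: the user's existing role memberships.
    """
    desired = set(desired_roles)
    held = {g.role for g in current_grants}
    sync_managed = {g.role for g in current_grants if g.grantor == MZ_JWT_SYNC_ROLE_ID}
    # Assumption: grant only roles the user does not already hold from any grantor.
    to_grant = desired - held
    # Revoke only memberships the sync itself granted; manual grants are never touched.
    to_revoke = sync_managed - desired
    return to_grant, to_revoke
```

Because revocation only considers grants whose grantor is the sentinel, a membership an administrator granted by hand survives even when the matching group disappears from the JWT.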

@mtabebe mtabebe requested a review from SangJunBak April 7, 2026 20:51
github-actions Bot commented Apr 7, 2026

Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone.

PR title guidelines

  • Use imperative mood: "Fix X" not "Fixed X" or "Fixes X"
  • Be specific: "Fix panic in catalog sync when controller restarts" not "Fix bug" or "Update catalog code"
  • Prefix with area if helpful: `compute:`, `storage:`, `adapter:`, `sql:`

Pre-merge checklist

  • The PR title is descriptive and will make sense in the git log.
  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).

@mtabebe mtabebe force-pushed the ma/sso/scim-design-doc branch from e81c023 to f4c4dd1 Compare April 8, 2026 15:37
@mtabebe mtabebe marked this pull request as ready for review April 8, 2026 15:37
@mtabebe mtabebe requested a review from pH14 April 8, 2026 15:38
Comment thread doc/developer/design/20260407_jwt_role_mapping.md
Comment thread doc/developer/design/20260407_jwt_role_mapping.md Outdated
Comment thread doc/developer/design/20260407_jwt_role_mapping.md Outdated

For MVP, sync activity is surfaced through:

- **`mz_audit_log`**: All GRANTs and REVOKEs from sync are already logged (they go through `Op::GrantRole`/`Op::RevokeRole`).
Contributor

If we don't have any, we should indicate the source.

Comment on lines +69 to +76
The role names must match the IdP group names (case-insensitive).

**Step 3: Enable group sync.**
```sql
ALTER SYSTEM SET jwt_group_role_sync_enabled = true;
-- Optional: change the claim name if your IdP uses something other than "groups"
-- ALTER SYSTEM SET jwt_group_claim = 'groups';
```
Contributor

We must not sync users to roles prefixed with `mz_` or `pg_`.
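
The matching rule discussed in this thread (case-insensitive name match against pre-created roles, with `mz_` and `pg_` roles excluded) could look roughly like the Python sketch below. The function name `match_groups_to_roles` and its return shape are hypothetical and are not taken from the design doc.

```python
RESERVED_PREFIXES = ("mz_", "pg_")


def match_groups_to_roles(groups, existing_roles):
    """Map IdP group names to pre-created role names, case-insensitively.

    Returns (matched_roles, unmatched_groups). Roles with a reserved
    prefix are never matched, even if a group of that name is present.
    """
    # Note: roles that differ only by case would collide in this simple index.
    by_lower = {
        role.lower(): role
        for role in existing_roles
        if not role.lower().startswith(RESERVED_PREFIXES)
    }
    matched, unmatched = [], []
    for group in groups:
        role = by_lower.get(group.lower())
        if role is None:
            unmatched.append(group)
        elif role not in matched:
            matched.append(role)
    return matched, unmatched
```

Filtering reserved prefixes out of the role index, rather than out of the group list, means a group named `mz_system` simply ends up unmatched instead of granting a system role.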


- Auto-creating database roles from IdP groups
- Syncing role *privileges* (GRANTs on objects) from the IdP, only role *membership*
- Real-time push-based sync (SCIM webhook to Materialize), we sync on connection
Contributor

Sync on connect is the standard, but it also makes it very difficult to get an accurate picture of the current state in our web console, or to discover access by querying internal tables.

If this is standard practice and aligned with customer expectations, it is probably fine.

Contributor Author

That makes sense to me. Who makes the call on customer expectations?

Contributor

This would be @maheshwarip or @val-materialize

```sql
GRANT ALL ON SCHEMA infrastructure TO platform_eng;
```

The role names must match the IdP group names (case-insensitive).
Contributor

In the past I've had to create a transformation from upstream IdP group names to database role names, e.g. `materialize_platform_eng` -> `platform_eng` via something like `materialize_(.*)` -> `$1`.

Contributor Author

That makes sense. I'll leave it as an open question; can we incorporate it at a later point?
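
For reference, the prefix-stripping transformation suggested in this thread can be sketched in a few lines of Python. This is only an illustration of the idea left as an open question: `map_group_to_role` and its parameters are hypothetical, and Python writes the backreference as `\1` where the comment uses `$1`.

```python
import re


def map_group_to_role(group, pattern=r"materialize_(.*)", replacement=r"\1"):
    """Return the role name for an IdP group, or None if the pattern does not match."""
    match = re.fullmatch(pattern, group)
    if match is None:
        return None
    # expand() applies the same backreference substitution that re.sub() uses.
    return match.expand(replacement)
```

Returning `None` for non-matching groups lets the caller decide whether to skip them or fall back to a direct name match.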

@mtabebe mtabebe force-pushed the ma/sso/scim-design-doc branch from f4c4dd1 to 36bfb1a Compare April 20, 2026 19:41
@mtabebe mtabebe requested review from Alphadelta14 and jubrad April 20, 2026 20:05
@Alphadelta14 Alphadelta14 left a comment

diff looks good to me


## Open Questions

1. **Sync on the connection hot path**: `handle_startup_inner` runs on every connection. The sync does a `catalog_transact`, which takes a write lock on the catalog. If 50 users reconnect after a deploy, are we serializing these all through the catalog on startup? Or could we skip the operation if the groups haven't changed (which would be the common case).
Contributor

I think skipping the operation if the groups haven't changed would probably make this fast enough!
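
One way to picture the "skip if unchanged" idea is an in-memory record of the last group set synced per user, consulted before any catalog write. The Python sketch below is an assumption-laden illustration, not the proposed design: `GroupSyncCache` and its methods are hypothetical names, and the cache is assumed to live only for the lifetime of the process.

```python
class GroupSyncCache:
    """Remembers the last group set that was synced for each user (in memory only)."""

    def __init__(self):
        self._last = {}

    @staticmethod
    def _normalize(groups):
        # Ignore ordering, duplicates, and case so they do not trigger spurious syncs.
        return frozenset(g.lower() for g in groups)

    def needs_sync(self, user, groups):
        """True if the groups differ from the last recorded sync for this user."""
        return self._last.get(user) != self._normalize(groups)

    def record_synced(self, user, groups):
        """Call only after the catalog transaction has committed."""
        self._last[user] = self._normalize(groups)
```

A cache like this only notices changes on the JWT side. If a matching role is created later, or a sync-managed grant is revoked by hand, the groups look unchanged and the sync would be skipped, so a real implementation would likely also need to compare against current catalog state or invalidate the cache on role changes.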

| `jwt_group_claim` | `'groups'` | JWT claim name containing group memberships |
| `jwt_group_role_sync_strict` | `false` | When `true`, login is rejected if sync fails or groups resolve to empty (fail-closed). When `false`, sync errors are logged but login proceeds (fail-open). |

Group names from the JWT are matched directly to Materialize role names (case-insensitive). Roles must be pre-created in Materialize; group sync does not auto-create roles (only auto-creates users). Pre-created roles serve as the allowlist for IdP groups. Roles prefixed with `mz_` or `pg_` are always excluded from sync to prevent privilege escalation to system roles.
Contributor

Currently our dyncfgs related to SSO are prefixed with `oidc_` rather than `jwt_`. We decided to prefix them with `oidc_` since we were relying on the OpenID Connect spec; however, there may be plans of ...

2. **Unmatched group observability**: How should unmatched groups (IdP group with no corresponding Materialize role) be surfaced? The `mz_audit_log` uses a closed `EventType` enum (`Create`, `Drop`, `Alter`, `Grant`, `Revoke`, `Comment`), so adding a new event type requires a new `EventType` variant, a new `EventDetails` struct, and a proto migration. For MVP, server logs are sufficient, with a dedicated system table or audit log variant added later.
Contributor

I can imagine a user being part of many groups unrelated to Materialize, and if we're doing this per login, we might be storing a lot of redundant data if we decide to persist this. If we're improving on server logs, maybe these can be notices controlled by a new session variable that's off by default?

3. **Edge case behaviour**: Are these the right choices? (See Security: Shadowed Permissions section for the `strict` mode trade-off.)
4. **Frontegg group claims in JWT**: Can Frontegg be configured to include group membership as a claim in the JWT access token it issues?
5. **Two authentication paths for Cloud support**: The Frontegg authenticator uses app passwords. Users authenticate with a client ID and secret key, which are exchanged with Frontegg's API for a JWT (`exchange_app_password()`). This means group sync for Cloud would need its own implementation: extract groups from the JWT returned by the app password exchange, and push updated group memberships through another (or existing channel) on each token refresh. This is a separate code path from OIDC.
Contributor

This is unfortunate but not much we can do right now :(

4. **Frontegg group claims in JWT**: Can Frontegg be configured to include group membership as a claim in the JWT access token it issues?
Contributor

Before implementation, we should definitely have someone from Cloud verify this, and also verify whether we get the user's group membership not only from Frontegg but from the upstream IdP too.


### Edge Cases

The default behavior is fail-open: log warning and skip on misconfiguration, allowing login to proceed. When `jwt_group_role_sync_strict = true`, sync failures and empty group resolution reject the login instead.
Contributor

By "log", do we mean send a notice?


**Group maps to non-existent role**: Log warning, record in `mz_audit_log`, and skip
Contributor

What would we record in the audit log?

Contributor Author

Good point, there is nothing to record. I'll remove it.

**Missing groups claim vs empty groups claim**: These are different.
- `groups: []` (explicit empty) means revoke all sync-granted roles, keep manual grants. In strict mode, login is rejected if no manual grants remain. In default (fail-open) mode, login proceeds.
Contributor

> In strict mode, login is rejected if no manual grants remain

Why would login be rejected here? To me, this should just be the same as if the user had only default privileges (reference: https://materialize.com/docs/security/self-managed/access-control/#initial-privileges)

Contributor Author

Good point
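
The distinction between a missing and an explicitly empty groups claim, raised in the doc excerpt quoted in this thread, can be made visible in code by returning `None` for one and `[]` for the other. The Python sketch below is illustrative only: `read_group_claim` is a hypothetical helper, the default claim name `groups` comes from the `jwt_group_claim` setting quoted earlier, and accepting a single string value is an assumption about IdP behavior. What the caller does for a missing claim is not shown in the excerpt, so the sketch leaves that decision open.

```python
def read_group_claim(claims, claim_name="groups"):
    """Return None if the claim is absent, else a list of group names (possibly empty)."""
    if claim_name not in claims:
        # Missing claim: distinct from an explicit empty list.
        return None
    value = claims[claim_name]
    if isinstance(value, str):
        # Assumption: treat a single string as a one-element group list.
        return [value]
    return [str(g) for g in value]
```

With this shape, `[]` can drive "revoke all sync-granted roles, keep manual grants" as the quoted doc text describes, while `None` is handled separately.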

@mtabebe mtabebe force-pushed the ma/sso/scim-design-doc branch from 36bfb1a to 8e77b66 Compare April 22, 2026 13:41
@mtabebe mtabebe merged commit 4d48ae7 into MaterializeInc:main Apr 22, 2026
8 checks passed
4 participants