[SSO] Add design doc for group to role mapping for OIDC#35899
[SSO] Add design doc for group to role mapping for OIDC#35899mtabebe merged 1 commit intoMaterializeInc:mainfrom
Conversation
|
Thanks for opening this PR! Here are a few tips to help make the review process smooth for everyone. PR title guidelines
Pre-merge checklist
|
e81c023 to
f4c4dd1
Compare
|
|
||
| For MVP, sync activity is surfaced through: | ||
|
|
||
| - **`mz_audit_log`**: All GRANTs and REVOKEs from sync are already logged (they go through `Op::GrantRole`/`Op::RevokeRole`). |
There was a problem hiding this comment.
if we don't have any, we should indicate the source
| The role names must match the IdP group names (case-insensitive). | ||
|
|
||
| **Step 3: Enable group sync.** | ||
| ```sql | ||
| ALTER SYSTEM SET jwt_group_role_sync_enabled = true; | ||
| -- Optional: change the claim name if your IdP uses something other than "groups" | ||
| -- ALTER SYSTEM SET jwt_group_claim = 'groups'; | ||
| ``` |
There was a problem hiding this comment.
We must not sync users to roles prefixed with mz_ or pg_
|
|
||
| - Auto-creating database roles from IdP groups | ||
| - Syncing role *privileges* (GRANTs on objects) from the IdP, only role *membership* | ||
| - Real-time push-based sync (SCIM webhook to Materialize), we sync on connection |
There was a problem hiding this comment.
sync on connect is the standard, but this also make sit very difficult to get an accurate representation of the current state in our web console or to discover access via querying internal tables.
If this is standard art and aligned with customer expectations this is probably fine.
There was a problem hiding this comment.
That makes sense to me. Who makes the call on customer expectations?
| GRANT ALL ON SCHEMA infrastructure TO platform_eng; | ||
| ``` | ||
|
|
||
| The role names must match the IdP group names (case-insensitive). |
There was a problem hiding this comment.
In the past I've had to create a transformation for upstream IDP to user database roles
IE
materialize_platform_eng -> platform_eng via something like materialize_(.*)->$1
There was a problem hiding this comment.
That makes sense, I'll leave it as an open question, we can incorporate it at a later point?
f4c4dd1 to
36bfb1a
Compare
Alphadelta14
left a comment
There was a problem hiding this comment.
diff looks good to me
|
|
||
| ## Open Questions | ||
|
|
||
| 1. **Sync on the connection hot path**: `handle_startup_inner` runs on every connection. The sync does a `catalog_transact`, which takes a write lock on the catalog. If 50 users reconnect after a deploy, are we serializing these all through the catalog on startup? Or could we skip the operation if the groups haven't changed (which would be the common case). |
There was a problem hiding this comment.
I think skipping the operation if the groups haven't changed would probably make this fast enough!
| | `jwt_group_claim` | `'groups'` | JWT claim name containing group memberships | | ||
| | `jwt_group_role_sync_strict` | `false` | When `true`, login is rejected if sync fails or groups resolve to empty (fail-closed). When `false`, sync errors are logged but login proceeds (fail-open). | | ||
|
|
||
| Group names from the JWT are matched directly to Materialize role names (case-insensitive). Roles must be pre-created in Materialize; group sync does not auto-create roles (only auto-creates users). Pre-created roles serve as the allowlist for IdP groups. Roles prefixed with `mz_` or `pg_` are always excluded from sync to prevent privilege escalation to system roles. |
There was a problem hiding this comment.
Currently our dyncfgs related to SSO are prefixed with oidc_ rather than jwt_. However it might Decided to prefix it as OIDC since we were relying on the Open ID connect spec, however there may be plans of.
| ## Open Questions | ||
|
|
||
| 1. **Sync on the connection hot path**: `handle_startup_inner` runs on every connection. The sync does a `catalog_transact`, which takes a write lock on the catalog. If 50 users reconnect after a deploy, are we serializing these all through the catalog on startup? Or could we skip the operation if the groups haven't changed (which would be the common case). | ||
| 2. **Unmatched group observability**: How should unmatched groups (IdP group with no corresponding Materialize role) be surfaced? The `mz_audit_log` uses a closed `EventType` enum (`Create`, `Drop`, `Alter`, `Grant`, `Revoke`, `Comment`), so adding a new event type requires a new `EventType` variant, a new `EventDetails` struct, and a proto migration. For MVP, server logs are sufficient, with a dedicated system table or audit log variant added later. |
There was a problem hiding this comment.
I can imagine a user being part of many groups unrelated to Materialize, and if we're doing this per login, we might be storing a lot of redundant data if we decide to persist this. If we're improving on server logs, maybe these can be notices controlled by a new session variable that's off by default?
| 2. **Unmatched group observability**: How should unmatched groups (IdP group with no corresponding Materialize role) be surfaced? The `mz_audit_log` uses a closed `EventType` enum (`Create`, `Drop`, `Alter`, `Grant`, `Revoke`, `Comment`), so adding a new event type requires a new `EventType` variant, a new `EventDetails` struct, and a proto migration. For MVP, server logs are sufficient, with a dedicated system table or audit log variant added later. | ||
| 3. **Edge case behaviour**: Are these the right choices? (See Security: Shadowed Permissions section for the `strict` mode trade-off.) | ||
| 4. **Frontegg group claims in JWT**: Can Frontegg be configured to include group membership as a claim in the JWT access token it issues? | ||
| 5. **Two authentication paths for Cloud support**: The Frontegg authenticator uses app passwords. Users authenticate with a client ID and secret key, which are exchanged with Frontegg's API for a JWT (`exchange_app_password()`). This means group sync for Cloud would need its own implementation: extract groups from the JWT returned by the app password exchange, and push updated group memberships through another (or existing channel) on each token refresh. This is a separate code path from OIDC. |
There was a problem hiding this comment.
This is unfortunate but not much we can do right now :(
| 1. **Sync on the connection hot path**: `handle_startup_inner` runs on every connection. The sync does a `catalog_transact`, which takes a write lock on the catalog. If 50 users reconnect after a deploy, are we serializing these all through the catalog on startup? Or could we skip the operation if the groups haven't changed (which would be the common case). | ||
| 2. **Unmatched group observability**: How should unmatched groups (IdP group with no corresponding Materialize role) be surfaced? The `mz_audit_log` uses a closed `EventType` enum (`Create`, `Drop`, `Alter`, `Grant`, `Revoke`, `Comment`), so adding a new event type requires a new `EventType` variant, a new `EventDetails` struct, and a proto migration. For MVP, server logs are sufficient, with a dedicated system table or audit log variant added later. | ||
| 3. **Edge case behaviour**: Are these the right choices? (See Security: Shadowed Permissions section for the `strict` mode trade-off.) | ||
| 4. **Frontegg group claims in JWT**: Can Frontegg be configured to include group membership as a claim in the JWT access token it issues? |
There was a problem hiding this comment.
Before implementation, we should definitely should have someone from Cloud verify this, as well as verifying if we get the group membership of not only the user in Frontegg, but the upstream IdP too
|
|
||
| ### Edge Cases | ||
|
|
||
| The default behavior is fail-open: log warning and skip on misconfiguration, allowing login to proceed. When `jwt_group_role_sync_strict = true`, sync failures and empty group resolution reject the login instead. |
There was a problem hiding this comment.
By "log", do we mean send a notice?
|
|
||
| The default behavior is fail-open: log warning and skip on misconfiguration, allowing login to proceed. When `jwt_group_role_sync_strict = true`, sync failures and empty group resolution reject the login instead. | ||
|
|
||
| **Group maps to non-existent role**: Log warning, record in `mz_audit_log`, and skip |
There was a problem hiding this comment.
What would we record in the audit log?
There was a problem hiding this comment.
Good point, nothing to record. I'll remove
| **Group maps to non-existent role**: Log warning, record in `mz_audit_log`, and skip | ||
|
|
||
| **Missing groups claim vs empty groups claim**: These are different. | ||
| - `groups: []` (explicit empty) means revoke all sync-granted roles, keep manual grants. In strict mode, login is rejected if no manual grants remain. In default (fail-open) mode, login proceeds. |
There was a problem hiding this comment.
In strict mode, login is rejected if no manual grants remain
Why would login be rejected here? To me, this should just be the same as if the user had only default privileges (reference: https://materialize.com/docs/security/self-managed/access-control/#initial-privileges)
Proposes JWT-based group-to-role sync for self-managed. On connection, Materialize reads group claims from the JWT and grants/revokes role memberships accordingly. Track these using a dedicated sentinel grantor (MZ_JWT_SYNC_ROLE_ID) to distinguish sync-managed from manually-managed grants.
36bfb1a to
8e77b66
Compare
Proposes JWT-based group-to-role sync for self-managed.
On connection, Materialize reads group claims from the JWT and grants/revokes role memberships accordingly. Track these using a dedicated sentinel grantor (MZ_JWT_SYNC_ROLE_ID) to distinguish sync-managed from manually-managed grants.