Don't `:require_levelbuilder_mode` to edit datasets by snickell · Pull Request #64779 · code-dot-org/code-dot-org

snickell · 2025-03-25T01:48:25Z

Datablock Storage (PR) unintentionally broke the ability for curriculum to self-service in updating applab datasets.

This PR is a one-liner (it removes an unnecessary (?) security check) that restores that functionality by allowing users with levelbuilder permissions on prod to access existing endpoints https://studio.code.org/datasets and https://studio.code.org/datasets/manifest/edit.

This PR restores the same editing flow as prior to datablock storage, but curriculum should edit both levelbuilder and prod to make an update as they are no longer joined at the DB level.

History of the issue

Prior to Datablock Storage (which uses our mysql DB), we stored applab datasets in Firebase.

These were typically edited by curriculum accessing https://levelbuilder-studio.code.org/datasets/manifest/edit and https://levelbuilder-studio.code.org/datasets, which use the DatasetsController.

Behind the scenes, levelbuilder was actually sharing applab datasets with production, and therefore when you edited levelbuilder, prod was instantly updated. This bypassed the usual "updates on levelbuilder are content scooped, stored in git, and re-hydrated on prod" flow, but nobody noticed/remembered this was the case.

With Datablock Storage, we did not catch this impicit coupling between levelbuilder and prod. As a result, we broke the ability for curriculum to self-service.

Current status quo is that curriculum must request updates from engineering, who can run a hacky / error-prone script I created to do the updates manually using console access (relevant slack thread).

Possible fixes

rejected: use the same model firebase used: levelbuilder edits the prod dataset definition. We won't do this for obvious security reasons: we will not give levelbuilder access to the prod DB.
add a content scoop hook to dump both datasets and the dataset library manifest into git, and rehydrate from there when seeding/re-seeding the DB. this matches how levels/scripts work.

this is probably the 'right' fix, but it'd take significant dev time to implement for a procedure we do once or twice a year.
additionally, datasets are a significant amount of data kb-wise, and I'm reluctant to check another large thing into git
seeding is already slow, and this will add more seed data which is not actually used in tests
realistically, we might implement this fix long-term, but its probably not a real option today given the time involved and there's no work hours allocated toward it

approach taken by this PR: currently we only allow accessing /datasets and /datasets/manifest/edit when the user has UserPermission::LEVELBUILDER AND the server is in levelbuilder_mode. If we reduce this to simply requiring UserPermission::LEVELBUILDER, curriculum authors can self-service again.

They'll need to update both prod and levelbuilder when they make changes, but edits have not been frequent (once or twice per year).

Really the choice today is between #3, or continuing to edit SQL by hand / use scripts, which is relatively disempowering for curriculum, and relatively risk prone.

Security Implications

Currently ability.rb restricts access to the datasets controller to users with user.permission?(UserPermission::LEVELBUILDER).

Additionally, datasets_controller.rb limits usage of the datasets controller to levelbuilder intsances, which prevents accessing /datasets on production instances.

This PR removes the second check. Users still require levelbuilder permissions, this time on prod, but based on empirical testing, this does not open editing datasets up to random users.

If somebody unauthorized were able to escalate to levelbuilder permissions for their user, they would not have access to PII, but could disrupt applab usage by deleting our datasets (which would be pretty easy to restore, say it would take a few hours).

Additionally, if such an exploit were found, it would be a much more serious issue anyway, as it would allow deleting ALL our curriculum by using the exploit (with a content-scoop hrs delay).

UPDATE: security issues

We've been granting levelbuilder access on prod more freely than levelbuilder access to levelbuilder. It was a very bad assumption that it would be a subset of levelbuilder access on levelbuilder, the reverse is closer to true. The situation is that levelbuilder access on prod has been considered safe to grant, because it doesn't let you edit (due to checking if levelbuilder_mode is enabled), and therefore to let external reviewers and partners access courses / scripts with a published_state of in_development, we've historically granted them levelbuilder access on prod (but not on levelbuilder).

If we wanted to proceed down this route, which may not be worth doing in light of this, we'd need to create a new perms set that permits access to in_development courses (relevant code in ability.rb), without granting full levelbuilder access, and ALSO migrate existing accounts over to this (with some heuristic for preserving those, if any, that need levelbuilder access to prod, potentially for other reasons?).

cat5inthecradle · 2025-03-25T14:13:48Z

What about an option where we publish datasets from levelbuilder as an artifact to S3(?), and then on DTx we load the latest artifact into the datablock storage for that environment? This would keep the levelbuilder-user experience the same, and move us to a model where dataset updates are considered part of levelbuilder updates, with a new version entering the pipeline to production at night with the rest of the levelbuilder changes.

The thing I don't like about this approach is that it introduces (compared to the pre-datablocks implementation) a likely discrepancy between Levelbuilder and Production. The system will not enforce that Production be "behind" Levelbuilder, or that Levelbuilder be a "draft/unpublished" version - they are simply different and their consistency is enforced only by human decisions. This is on a per-dataset basis, too, so we will find ourselves in a situation where Levelbuilder and Production datasets are unintentionally different.

I'd argue that option 1 should have been the MVP of the datablocks project, and Option 3 wouldn't have been on the table, so I actually prefer 1.

snickell · 2025-03-25T15:16:01Z

The coupling in firebase seemed to be a UX surprise for curriculum, it hadn't broken the world because we'd gotten lucky vs it being a pattern we want to perpetuate.

I'm closing this PR given its reliance on an access grants ("levelbuilder on prod") that already have been handed out more widely than I'd like. Its possibly relevant to revisit it if we independently address that issue in a way that frees up levelbuilder on prod to more literally mean "levelbuilder access on prod".

Don't :require_levelbuilder_mode to edit datasets

009696a

cat5inthecradle requested a review from a team March 25, 2025 14:06

snickell closed this Mar 25, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Don't `:require_levelbuilder_mode` to edit datasets#64779

Don't `:require_levelbuilder_mode` to edit datasets#64779
snickell wants to merge 1 commit into
stagingfrom
seth/allow-editing-datasets-wo-levelbuilder-mode

snickell commented Mar 25, 2025 •

edited

Loading

Uh oh!

cat5inthecradle commented Mar 25, 2025 •

edited

Loading

Uh oh!

snickell commented Mar 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

snickell commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

History of the issue

Possible fixes

Security Implications

UPDATE: security issues

Uh oh!

cat5inthecradle commented Mar 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

snickell commented Mar 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

snickell commented Mar 25, 2025 •

edited

Loading

cat5inthecradle commented Mar 25, 2025 •

edited

Loading