Skip to content

Don't :require_levelbuilder_mode to edit datasets#64779

Closed
snickell wants to merge 1 commit into
stagingfrom
seth/allow-editing-datasets-wo-levelbuilder-mode
Closed

Don't :require_levelbuilder_mode to edit datasets#64779
snickell wants to merge 1 commit into
stagingfrom
seth/allow-editing-datasets-wo-levelbuilder-mode

Conversation

@snickell

@snickell snickell commented Mar 25, 2025

Copy link
Copy Markdown
Collaborator

Datablock Storage (PR) unintentionally broke the ability for curriculum to self-service in updating applab datasets.

This PR is a one-liner (it removes an unnecessary (?) security check) that restores that functionality by allowing users with levelbuilder permissions on prod to access existing endpoints https://studio.code.org/datasets and https://studio.code.org/datasets/manifest/edit.

This PR restores the same editing flow as prior to datablock storage, but curriculum should edit both levelbuilder and prod to make an update as they are no longer joined at the DB level.

History of the issue

Prior to Datablock Storage (which uses our mysql DB), we stored applab datasets in Firebase.

These were typically edited by curriculum accessing https://levelbuilder-studio.code.org/datasets/manifest/edit and https://levelbuilder-studio.code.org/datasets, which use the DatasetsController.

Behind the scenes, levelbuilder was actually sharing applab datasets with production, and therefore when you edited levelbuilder, prod was instantly updated. This bypassed the usual "updates on levelbuilder are content scooped, stored in git, and re-hydrated on prod" flow, but nobody noticed/remembered this was the case.

With Datablock Storage, we did not catch this impicit coupling between levelbuilder and prod. As a result, we broke the ability for curriculum to self-service.

Current status quo is that curriculum must request updates from engineering, who can run a hacky / error-prone script I created to do the updates manually using console access (relevant slack thread).

Possible fixes

  1. rejected: use the same model firebase used: levelbuilder edits the prod dataset definition. We won't do this for obvious security reasons: we will not give levelbuilder access to the prod DB.

  2. add a content scoop hook to dump both datasets and the dataset library manifest into git, and rehydrate from there when seeding/re-seeding the DB. this matches how levels/scripts work.

  • this is probably the 'right' fix, but it'd take significant dev time to implement for a procedure we do once or twice a year.
  • additionally, datasets are a significant amount of data kb-wise, and I'm reluctant to check another large thing into git
  • seeding is already slow, and this will add more seed data which is not actually used in tests
  • realistically, we might implement this fix long-term, but its probably not a real option today given the time involved and there's no work hours allocated toward it
  1. approach taken by this PR: currently we only allow accessing /datasets and /datasets/manifest/edit when the user has UserPermission::LEVELBUILDER AND the server is in levelbuilder_mode. If we reduce this to simply requiring UserPermission::LEVELBUILDER, curriculum authors can self-service again.
  • They'll need to update both prod and levelbuilder when they make changes, but edits have not been frequent (once or twice per year).

Really the choice today is between #3, or continuing to edit SQL by hand / use scripts, which is relatively disempowering for curriculum, and relatively risk prone.

Security Implications

Currently ability.rb restricts access to the datasets controller to users with user.permission?(UserPermission::LEVELBUILDER).

Additionally, datasets_controller.rb limits usage of the datasets controller to levelbuilder intsances, which prevents accessing /datasets on production instances.

This PR removes the second check. Users still require levelbuilder permissions, this time on prod, but based on empirical testing, this does not open editing datasets up to random users.

If somebody unauthorized were able to escalate to levelbuilder permissions for their user, they would not have access to PII, but could disrupt applab usage by deleting our datasets (which would be pretty easy to restore, say it would take a few hours).

Additionally, if such an exploit were found, it would be a much more serious issue anyway, as it would allow deleting ALL our curriculum by using the exploit (with a content-scoop hrs delay).

UPDATE: security issues

We've been granting levelbuilder access on prod more freely than levelbuilder access to levelbuilder. It was a very bad assumption that it would be a subset of levelbuilder access on levelbuilder, the reverse is closer to true. The situation is that levelbuilder access on prod has been considered safe to grant, because it doesn't let you edit (due to checking if levelbuilder_mode is enabled), and therefore to let external reviewers and partners access courses / scripts with a published_state of in_development, we've historically granted them levelbuilder access on prod (but not on levelbuilder).

If we wanted to proceed down this route, which may not be worth doing in light of this, we'd need to create a new perms set that permits access to in_development courses (relevant code in ability.rb), without granting full levelbuilder access, and ALSO migrate existing accounts over to this (with some heuristic for preserving those, if any, that need levelbuilder access to prod, potentially for other reasons?).

@cat5inthecradle cat5inthecradle requested a review from a team March 25, 2025 14:06
@cat5inthecradle

cat5inthecradle commented Mar 25, 2025

Copy link
Copy Markdown
Contributor

What about an option where we publish datasets from levelbuilder as an artifact to S3(?), and then on DTx we load the latest artifact into the datablock storage for that environment? This would keep the levelbuilder-user experience the same, and move us to a model where dataset updates are considered part of levelbuilder updates, with a new version entering the pipeline to production at night with the rest of the levelbuilder changes.

The thing I don't like about this approach is that it introduces (compared to the pre-datablocks implementation) a likely discrepancy between Levelbuilder and Production. The system will not enforce that Production be "behind" Levelbuilder, or that Levelbuilder be a "draft/unpublished" version - they are simply different and their consistency is enforced only by human decisions. This is on a per-dataset basis, too, so we will find ourselves in a situation where Levelbuilder and Production datasets are unintentionally different.

I'd argue that option 1 should have been the MVP of the datablocks project, and Option 3 wouldn't have been on the table, so I actually prefer 1.

@snickell

Copy link
Copy Markdown
Collaborator Author

The coupling in firebase seemed to be a UX surprise for curriculum, it hadn't broken the world because we'd gotten lucky vs it being a pattern we want to perpetuate.

I'm closing this PR given its reliance on an access grants ("levelbuilder on prod") that already have been handed out more widely than I'd like. Its possibly relevant to revisit it if we independently address that issue in a way that frees up levelbuilder on prod to more literally mean "levelbuilder access on prod".

@snickell snickell closed this Mar 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants