
Filter open-ended K-5 project types #66614

Merged
fisher-alice merged 24 commits into staging from alice/filter-text
Jun 25, 2025


@fisher-alice fisher-alice commented Jun 18, 2025

Contributor

This PR adds profanity and privacy filtering via WebPurify for projects that support open-ended text fields and are geared for young users (Sprite Lab, Poetry Lab, and Play Lab).

I reverted changes from:

We temporarily stopped filtering Play Lab project source files because of several reports from teachers that false positives were blocking projects and disrupting their classrooms. Many of these false positives were caused by block IDs, which include a random sequence of ASCII characters, parts of which were being flagged by WebPurify.

To resolve this, this PR updates the find_share_failure method in share_filtering, which now calls a new helper function, extract_text_blockly. This function first detects whether the program is in XML or JSON format. If XML, it strips the XML tags. If JSON, it extracts field values, block inputs, comments, and variables (block IDs are not included). traverse_block is a recursive helper function that extracts field values, comments, and input values within a Blockly 'block'.
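The extraction described above can be sketched roughly as follows. This is an illustrative approximation and not the code merged in this PR: the XML detection rule (a leading `<`), the JSON layout (`blocks.blocks`, `variables`, `fields`, `inputs`, `icons.comment.text`), and the handling of shadow blocks are assumptions based on Blockly's JSON serialization format. Only the function names come from the description.

```ruby
require "json"

# Hypothetical sketch: return the user-authored strings in a Blockly program.
def extract_text_blockly(source)
  stripped = source.to_s.strip
  return [] if stripped.empty?

  # Assumption: XML programs start with "<"; everything else is treated as JSON.
  if stripped.start_with?("<")
    # Replace each tag (including its id attribute) with a newline, keep the text.
    return stripped.gsub(/<[^>]*>/, "\n").split("\n").map(&:strip).reject(&:empty?)
  end

  data = JSON.parse(stripped)
  return [] unless data.is_a?(Hash)

  texts = []
  # Variable names are user-authored; variable ids are skipped.
  Array(data["variables"]).each { |v| texts << v["name"].to_s if v.is_a?(Hash) }
  # Top-level blocks (assumed to live under blocks.blocks).
  Array(data.dig("blocks", "blocks")).each { |block| traverse_block(block, texts) }
  texts.map(&:strip).reject(&:empty?)
end

# Hypothetical sketch: recursively collect text from one block.
def traverse_block(block, texts)
  return unless block.is_a?(Hash)

  # Field values; hash-valued fields (e.g. variable references by id) are skipped.
  fields = block["fields"]
  if fields.is_a?(Hash)
    fields.each_value { |v| texts << v.to_s if v.is_a?(String) || v.is_a?(Numeric) }
  end

  # Block comment, if present.
  comment = block.dig("icons", "comment", "text")
  texts << comment if comment.is_a?(String)

  # Blocks plugged into this block's inputs.
  inputs = block["inputs"]
  if inputs.is_a?(Hash)
    inputs.each_value do |input|
      next unless input.is_a?(Hash)
      traverse_block(input["block"], texts)
      traverse_block(input["shadow"], texts)
    end
  end

  # The block stacked below this one, if any.
  traverse_block(block.dig("next", "block"), texts)
end

program = {
  "blocks" => {"blocks" => [{
    "type" => "text_print", "id" => "a$$x!9",
    "fields" => {"TEXT" => "hello"},
    "next" => {"block" => {"type" => "note", "id" => "zz",
                           "icons" => {"comment" => {"text" => "my note"}}}}
  }]},
  "variables" => [{"name" => "score", "id" => "v1"}]
}.to_json

extract_text_blockly(program) # => ["score", "hello", "my note"]
```

Neither "a$$x!9" nor "zz" appears in the result, so only text the student typed would be sent to the filter.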

Before update

Sprite Lab standalone project (currently program is not filtered for profanity):

before-update-spritelab-project.mov

Sprite Lab standalone activity level (currently program is not filtered for profanity):

before-update-spritelab-level.mov

Poetry Lab standalone project (currently program is not filtered for PII):

poetry-before-pii.mov

Play Lab standalone project (currently program is not filtered for profanity):

before-update-playlab-project.mov

Note that Play Lab activity levels ARE currently being filtered. When a program is flagged, the level sources are not saved. This behavior is maintained:

on-prod-playlab-activity-level.mov

After update

Sprite Lab standalone project with profanity detected:

after-update-spritelab-project.mov

Sprite Lab activity level with profanity detected:

after-update-spritelab-level.mov

Poetry standalone project with PII detected:

after-update-poetry-project-pii-filtering.mov

Play Lab standalone project with profanity detected:

after-update-playlab-project.mov

Links

Testing story

  • I added back the unit tests in test_sources that were removed by "Remove WebPurify check for Play Lab when fetching source file" (#65468).
  • I added and updated unit tests in test_share_filtering for new and updated helper functions.
  • Locally, I verified that programs are filtered in standalone open-ended projects geared toward young users (Sprite Lab, Play Lab, and Poetry Lab) and on activity levels.

Deployment strategy

Follow-up work

Privacy

Security

Caching

PR Checklist:

  • Tests provide adequate coverage
  • Privacy and Security impacts have been assessed
  • Code is well-commented
  • New features are translatable or updates will not break translations
  • Relevant documentation has been added or updated
  • User impact is well-understood and desirable
  • Pull Request is labeled appropriately
  • Follow-up work items (including potential tech debt) are tracked and linked

_, project_id = storage_decrypt_channel_id(encrypted_channel_id)
project_type = Project.find(project_id).project_type
not_found if abuse_score >= SharedConstants::ABUSE_CONSTANTS.ABUSE_THRESHOLD && !can_view_abusive_assets?(encrypted_channel_id)
not_found if profanity_privacy_violation?(filename, result[:body], project_type) && !can_view_profane_or_pii_assets?(encrypted_channel_id)


});
}

function fetchPrivacyProfanityViolations(resolve) {

Contributor Author

Undoing #65397.

if params[:program] && sharing_allowed
  share_failure = nil
  if @level.game.sharing_filtered?
    project_type = 'playlab'

Contributor Author

The other open-ended projects geared toward young users are channel-backed and are filtered via a call to fetchPrivacyProfanityViolations in project.js within loadProjectBackedLevel_:

return fetchAbuseScoreAndPrivacyViolations(this);

@fisher-alice fisher-alice marked this pull request as ready for review June 24, 2025 16:53
@fisher-alice fisher-alice requested a review from a team June 24, 2025 16:54

@molly-moen molly-moen left a comment

Contributor

Looks good, a couple minor questions!

# convert to array of lines split at newline,
# strip leading/trailing whitespace from each line,
# drop any blank lines.
return stripped.gsub(/<[^>]*>/, "\n").split("\n").map(&:strip).reject(&:empty?)

Contributor

will this still include ids?

@fisher-alice fisher-alice Jun 25, 2025

Contributor Author

No, the returned string array will not include IDs. XML programs can include block IDs, but they're contained within the XML tag, e.g., <block type="math_number" id="fill_in_actor_qtip">, so they will be stripped.
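To make the reply above concrete, here is a small sketch that applies the same expression from the reviewed line to the example tag. The wrapper name strip_xml_tags and the sample program are hypothetical and used only for illustration.

```ruby
# Hypothetical wrapper around the expression under review.
def strip_xml_tags(xml)
  # Every "<...>" run, attributes included, becomes a newline;
  # the text between tags survives as its own line.
  xml.gsub(/<[^>]*>/, "\n").split("\n").map(&:strip).reject(&:empty?)
end

xml = '<xml><block type="math_number" id="fill_in_actor_qtip">' \
      '<title name="NUM">42</title></block></xml>'

strip_xml_tags(xml) # => ["42"]
```

One caveat with this approach: the pattern ends a tag at the first `>`, so an attribute value containing a literal `>` would leak the rest of that tag into the output. Whether that can happen depends on how IDs and attributes are generated and escaped, which is not shown here.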

end

# Recurse into the 'next' chain.
traverse_block(block.dig("next", "block"), texts)

Contributor

will there always be a next?

Contributor Author

No, there is not always a "next". I'm going to update the test to reflect that. If there is no "next", then block.dig("next", "block") returns nil, and when traverse_block is called again, it will return immediately since the block.is_a? type guard returns false for nil. Thanks!
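The termination argument above can be checked with a minimal sketch. The function name walk_next_chain and the is_a?(Hash) guard are assumptions for illustration; the actual guard in traverse_block is not shown in this thread.

```ruby
# Hypothetical, stripped-down version of the recursion over the "next" chain.
def walk_next_chain(block, seen)
  # Assumed guard: nil (or any non-Hash value) ends the recursion.
  return unless block.is_a?(Hash)

  seen << block["type"]
  # Hash#dig returns nil when "next" is missing, so the last block stops here.
  walk_next_chain(block.dig("next", "block"), seen)
end

last_block = {"type" => "b"}
chain = {"type" => "a", "next" => {"block" => last_block}}

last_block.dig("next", "block") # => nil

seen = []
walk_next_chain(chain, seen)
seen # => ["a", "b"]
```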

@fisher-alice fisher-alice merged commit d9f5b3a into staging Jun 25, 2025
6 checks passed
@fisher-alice fisher-alice deleted the alice/filter-text branch June 25, 2025 16:35
2 participants