App Lab - extract user text from source code before checking for profanity#70649
| * @param {string} code JavaScript source code | ||
| * @returns {string} Extracted text content separated by spaces | ||
| */ | ||
| export const extractTextFromCode = code => { |
Could this function go somewhere more specific than a generic utils file? Maybe within the libraries folder for now, since it's only used there?
That makes sense...will move!
| it('extracts multi-line comments', () => { | ||
| const code = '/* This is a\nmulti-line comment */\nlet x = 1;'; | ||
| const result = extractTextFromCode(code); | ||
| expect(result).to.include('This is a'); |
nit: should we just have these tests be something like expect(result).to.equal(<whatever the resulting string is>)? That would make it clearer what the util does.
Updated! It was a good exercise for me to update to use equals 😁
| const code = 'if(artistList[i]) == artist'; | ||
| const result = extractTextFromCode(code); | ||
| expect(result).to.equal('if artistList artist'); | ||
| expect(result).to.not.include('if(artlistList'); |
nit: I don't think we need the to.not.include when we have the to.equal above.
| @@ -0,0 +1,82 @@ | |||
| /** | |||
nit: I would name this file extractTextFromCode to make it easier to find.
This PR extracts user text from App Lab JavaScript source code before it is sent to be filtered for profanity.
In App Lab, only project code that is shared as a library with other students is moderated by our current text moderation service (WebPurify).
We have received repeated reports of projects being incorrectly flagged (false positives), such as this one: https://studio.code.org/projects/applab/07b62fe6-797f-4c19-8193-645188f76389/edit
Before update
Screen.Recording.2026-02-04.at.5.19.23.PM.mov
This same example came up in a Slack thread a couple of years ago and was reported again recently in this Zendesk ticket: https://codeorg.zendesk.com/agent/tickets/575371
Although we added the offending word to the allowlist in WebPurify, the text is still being flagged. Apparently, an adjacent parenthesis affects how the text is matched against the allowlist.
When I tested the phrase with a parenthesis, ‘if(artistlist’, a violation was found; without the parenthesis, no violation was found.
Without parentheses: no violation found

With parentheses: violation found

Thus, I added a call to a function that extracts user text from the JavaScript source code, which includes removing parentheses from the text.
This is similar to the approach we took when we added moderation of open-ended K-5 project types (Blockly), where we had been receiving many false-positive reports because block ids were being included in the code. See #67024, #66614
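To illustrate the idea of stripping punctuation before moderation, here is a minimal sketch. It is not the function added in this PR: the name extractTextSketch is hypothetical, and the rules (replace every non-word character with a space, drop one-character tokens) are assumptions chosen only to reproduce the 'if artistList artist' case from the unit test. The real extractTextFromCode evidently treats comments and strings differently, for example keeping "a" inside comments, which this sketch would drop.

```javascript
// Hypothetical sketch, NOT the extractTextFromCode implementation from this PR.
// Assumption: punctuation such as "(" is what trips the filter, so every
// non-word character is replaced with a space before the text is moderated.
function extractTextSketch(code) {
  return code
    // Brackets, operators, quotes, and newlines all become spaces.
    .replace(/[^A-Za-z0-9_]+/g, ' ')
    .split(' ')
    // Drops empty strings and one-character tokens such as the loop index `i`.
    .filter(token => token.length > 1)
    .join(' ');
}

console.log(extractTextSketch('if(artistList[i]) == artist'));
// prints "if artistList artist"
```

With this kind of preprocessing, the phrase that reaches the moderation service no longer contains 'if(artistlist', so the allowlisted word should be matched on its own.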
Links
Testing story
Added unit tests.
I tested locally with the source code above, and this is the text returned by extractTextFromCode:
"artistInList - Takes an artist and returns the number of times the artist appears in the list artist string - music artist like "The Beatles" "Nirvana" "Taylor Swift" etc return number - the number of times the artist appears in the list artistsInYear - Takes in a year and returns a list of artists that released an album during that year year number - year an album was released return list - the list of artists that released an album during the given year test to see if it works test to see if it works test to see if it works console log artistsInYear 2028 console log artistsInYear 1993 test to see if it works test to see if it works test to see if it works console log artistInList "The Beatles" console log artistInList "Taylor Swift" RollingStone 500 Albums Artist RollingStone 500 Albums Year RollingStone 500 Albums Artist No artist found function artistInList artist var artistList getColumn var filteredArtistList for var artistList length if artistList artist appendItem filteredArtistList artist return filteredArtistList length function artistsInYear year var yearList getColumn var filteredArtists var artistList getColumn for var yearList length if yearList year appendItem filteredArtists artistList if filteredArtists length return else return filteredArtists"
Deployment strategy
Follow-up work
Privacy
Security
Caching
PR Creation Checklist: