[AI Tutor] CT-432: Get profanity and PII filters working as expected#58083
Conversation
…mily/ct-427/students-provide-feedback
…mily/ct-432/flag-pii
| }; | ||
| if (!response) return {status: Status.ERROR}; | ||
|
|
||
| switch (response.status) { |
There was a problem hiding this comment.
This logic maps the response.status from the existing ShareFilter API (which is the type of violation) to one of two non-generic error statuses profanity_violation or pii_violation we'll store in the db.
The previous checks would never have passed
response?.status === Status.PROFANITY_VIOLATION
and I switched to a switch statement because of the many-1 mapping of ShareFilter statuses to AITutorInteractionStatus.
| const assistantChatMessage: ChatCompletionMessage = { | ||
| role: Role.ASSISTANT, | ||
| status: Status.OK, | ||
| status: chatApiResponse.status, |
There was a problem hiding this comment.
Storing the chatApiResponse status from the server lets us style the messages appropriately later.
| 'This chat has been hidden because it is inappropriate.'; | ||
| const PII_VIOLATION_USER_MESSAGE = | ||
| 'This chat has been hidden because it contains personal information.'; | ||
| const ERROR_USER_MESSAGE = |
There was a problem hiding this comment.
I standardized the variable names in this file so they all refer to "PII violation" and "profanity violation" instead of the more generic "inappropriate"/"personal."
| return null; | ||
| }; | ||
|
|
||
| const UserMessage: React.FC<UserMessageProps> = ({message}) => { |
There was a problem hiding this comment.
Breaking this up into a functional component called displayUserMessage felt redundant, so I condensed it.
| GENERAL_CHAT: 'general_chat', | ||
| }.freeze | ||
|
|
||
| PII_TYPES = { |
There was a problem hiding this comment.
We don't use these anywhere on the backend, so I removed them. cc @bencodeorg
molly-moen
left a comment
There was a problem hiding this comment.
One question on logging, otherwise looks good!
| // Analogous to https://github.com/code-dot-org/ml-playground/pull/299 | ||
| // We want to expose enough information to help troubleshoot false positives | ||
| const logViolationDetails = (response: OpenaiChatCompletionMessage) => { | ||
| console.info('Violation detected in chat completion response', { |
There was a problem hiding this comment.
should we log this to cloudwatch instead? Or in addition to this?
There was a problem hiding this comment.
Great idea. It'd be nice to not have to wait for users to report these to see how aggressive it's being. Thanks, Molly!
There was a problem hiding this comment.
seems reasonable to log a generic event for now!
|
@molly-moen I added a generic metric so we can view frequency, but I'd want to check with Travis before I start logging student PII/profanity to Cloudwatch. Does that seem reasonable to you? I also removed the use of the Lab2MetricsReporter per conversation with @sanchitmalhotra126. Diff: efe22e3 |
The following PR:
UserMessagecomponent to correctly "redact" any flagged messagesLinks
Jira ticket 1: https://codedotorg.atlassian.net/browse/CT-432
Jira ticket 2: https://codedotorg.atlassian.net/browse/CT-442
Testing story
Tested manually.
4_29_demo.mov
Here's what the log will look like.
Deployment strategy
Follow-up work
This style of "redacting" messages is unique to AI Tutor. Gen AI is rendering dismissable "alerts." This is fine for now according to Mark, but I will reach out in the #ai-for-student-learning to continue the conversation.
PR Checklist: