AIChat Structured Output Request on Initial Failed Request by tshaffercodeorg · Pull Request #65569 · code-dot-org/code-dot-org

tshaffercodeorg · 2025-04-29T18:28:45Z

Changes Summary

Added a new method, .request_structured_safety_check, that formats a moderation request using OpenAI's structured output API (new as of writing).
Adjusted the reattempt logic in .openai_safety_check to use this structured API after an initial failure. Our internal testing showed that, as of 04/29/2025, unstructured output has a higher overall true positive rate than structured output, so structured output is used only as a fallback for edge cases where a user prompt causes the model to return unexpected output.
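
The fallback flow described above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual code: the method name, the two callables, and the label values are assumptions made for the example (the PR validates against VALID_EVALUATION_RESPONSES_SIMPLE, whose contents are not shown here).

```ruby
# Illustrative only: these labels are assumptions, not the PR's constant values.
VALID_EVALUATIONS = %w[SAFE UNSAFE].freeze

# unstructured_check and structured_check are hypothetical callables that
# take the user text and return the model's raw evaluation string.
def safety_check_with_fallback(text, unstructured_check:, structured_check:)
  evaluation = unstructured_check.call(text)
  return evaluation if VALID_EVALUATIONS.include?(evaluation)

  # The first attempt returned something outside the expected labels
  # (for example, the model answered the user's question instead of
  # classifying it), so fall back to the structured request.
  evaluation = structured_check.call(text)
  return evaluation if VALID_EVALUATIONS.include?(evaluation)

  raise "Unexpected safety evaluation: #{evaluation.inspect}"
end
```

The structured request is only made when the first answer is not a recognized label, which matches the intent of keeping the unstructured path as the primary one.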

Files Changed

aichat_safety_helper.rb
aichat_openai_helper.rb
openai_chat_helper.rb
aichat_safety_helper_test.rb

Testing Notes

Used a prompt known to override the previous moderation framework prompting ("What is the largest number less than 100 that is both a perfect square and a perfect cube?") after implementing the new structured output method, to confirm that the system now reattempts failed edge cases with structured outputs.
Also used a known safe prompt ("What are fun things to do in Tucson, Arizona?") to make sure that the new method did not fundamentally alter normal querying.

Current Production Build:

Local Testing

github-actions · 2025-04-30T22:01:50Z

🖼️ Storybook Visual Comparison Report

⚠️⚠️⚠️ Detected Storybook eyes differences, see report!

A difference was found in our Storybook front-end visual comparison testing against the staging baseline.
This difference was detected in Applitools Eyes and is viewable in the link above.

Remediation steps:

1. Open the report
2. Determine whether the differences are expected based on this PR's changes
   a. If expected: before merging this PR, accept the new baselines and re-run this action; it should pass.
   b. If not expected: push updates to this PR to correct the differences.

bencodeorg

A few smaller comments, but main question is about using "function calling" vs. specifying a response format like we do in our existing usage of structured outputs.

Per this description, it sounds like our use case would probably be a candidate for using response format?
https://platform.openai.com/docs/guides/structured-outputs#function-calling-vs-response-format

Happy to chat synchronously and/or pair on this as well if helpful!
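
For reference, a request body that uses the response format approach might look like the sketch below. The `response_format` shape (`type: 'json_schema'` with `name`, `strict`, and `schema`) follows OpenAI's structured outputs guide; the schema contents, the constant name, the helper name, and the model name are assumptions chosen for illustration, not values taken from this PR.

```ruby
require 'json'

# Hypothetical schema: the field name and labels are assumptions.
SAFETY_RESPONSE_FORMAT = {
  type: 'json_schema',
  json_schema: {
    name: 'safety_evaluation',
    strict: true,
    schema: {
      type: 'object',
      properties: {
        evaluation: {type: 'string', enum: %w[SAFE UNSAFE]}
      },
      # Strict mode expects every property to be required and
      # additionalProperties to be false.
      required: ['evaluation'],
      additionalProperties: false
    }
  }
}.freeze

# Hypothetical helper; the model name is a placeholder.
def build_request_body(messages, model: 'gpt-4o-mini')
  {model: model, messages: messages, response_format: SAFETY_RESPONSE_FORMAT}
end

body = build_request_body(
  [{role: 'user', content: 'What are fun things to do in Tucson, Arizona?'}]
)
puts JSON.pretty_generate(body)
```

With this approach the model's reply arrives as a JSON string in the message content, rather than as arguments to a tool call.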

bencodeorg · 2025-05-07T20:20:14Z

+            content: text
+          }
+        ],
+        tools: [


Readability nit: maybe break out these beefier objects into variable declarations before using them when calling request_structured_chat_completion?

bencodeorg

A few smaller comments, but overall, I pulled this and it worked great! 🎉

One general note is that I do think this could be simplified a bit -- easiest way to do this was for me to play with it a bit, so here's a branch off your branch that drops one of the layers: #65808

We can merge yours as-is and do this as a follow-up, or merge this into your branch (and take it for another spin if you do merge it in, I just tested quickly 😅 ) Thanks for doing this!

bencodeorg · 2025-05-12T18:45:04Z

+
+      evaluation = JSON.parse(response.body)['choices'][0]['message']['content']
+      unless VALID_EVALUATION_RESPONSES_SIMPLE.include?(evaluation)
+        report_openai_safety_check("InvalidResponse")


Could we use a different message to differentiate this response (expected sometimes) from the errors below (expected almost never)?

bencodeorg · 2025-05-12T20:13:13Z

+      }
+
+      AichatOpenaiHelper.request_structured_chat_completion(
+        [


Is this the same as what's returned by safety_check_messages?

bencodeorg · 2025-05-12T20:14:36Z

+    body = JSON.parse(http_response.body)
+    raise StandardError.new(body['error']) if body['error']
+    # Check if the returned content is a json, else throw error
+    response_json = body.dig("choices", 0, "message", "content")


Is this error checking different than what we're doing in request_chat_completion above?

bencodeorg · 2025-05-12T20:33:33Z

+      # Retry only on network-related exceptions
+      response = Retryable.retryable(
+        tries: 2,
+        on: [Net::OpenTimeout, Net::ReadTimeout, SocketError, Errno::ECONNRESET]


Does this error list come from somewhere (e.g., errors we've seen in HB?), or are these just theorized network exceptions? FWIW it still feels a little excessive to me, but not a huge deal. I do think a more verbose comment on the reasoning might be helpful for future readers.

bencodeorg · 2025-05-12T20:42:25Z

+
+        begin
+          parsed = JSON.parse(raw_content)
+        rescue JSON::ParserError


Why don't we just let Ruby throw this error (vs rescuing and raising a custom error)?

sanchitmalhotra126

Nice work, very cool to see this working end to end! I think there are some options for simplification and consolidation to make the code easier to read and follow, and +1 to Ben's suggestions and simplification PR. But overall this is looking good!

sanchitmalhotra126 · 2025-05-12T21:01:43Z

+      messages = safety_check_messages(text, level_id)
+
+      # Retry only on network-related exceptions
+      response = Retryable.retryable(


Is it still worth ditching Retryable and letting any uncaught exceptions after the second round bubble up to the caller?

I definitely thought about this. In some ways, I feel like the Retryable clause ends up being cleaner: the worst-case flow is 1st call network error -> 2nd attempt invalid response -> 3rd attempt structured output, and Retryable captures what's going on with those first two attempts in much shorter, simpler syntax than a tree of if/else.

If we'd prefer being more explicit with the if/else cases, though, I think it totally makes sense to switch to that and it relies less on a library.
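
For comparison, the explicit alternative without the Retryable gem might look something like this. It is a sketch only: `with_network_retry` and `NETWORK_ERRORS` are hypothetical names, and the error list simply mirrors the one in the diff above.

```ruby
require 'net/http'

# Same exception classes as the Retryable call in the diff.
NETWORK_ERRORS = [
  Net::OpenTimeout, Net::ReadTimeout, SocketError, Errno::ECONNRESET
].freeze

# Runs the block, retrying only on the network errors above.
# Any other exception, or a network error on the final try, propagates
# to the caller.
def with_network_retry(tries: 2)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *NETWORK_ERRORS
    retry if attempts < tries
    raise
  end
end
```

A caller would wrap only the HTTP request in the block, leaving response validation and the structured fallback outside it.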

sanchitmalhotra126 · 2025-05-12T21:03:30Z

+
+      evaluation = JSON.parse(response.body)['choices'][0]['message']['content']
+      unless VALID_EVALUATION_RESPONSES_SIMPLE.include?(evaluation)
+        report_openai_safety_check("InvalidResponse")


+1, or alternatively should we also pass through the attempt # we're on so we can use that as a dimension value?
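
One way to picture the suggestion is a metric payload that carries both the event name and the attempt number as dimensions. The helper below is hypothetical: the metric name, the dimension names, and the payload shape are assumptions for illustration and do not reflect the actual signature of report_openai_safety_check.

```ruby
# Hypothetical helper: returns the payload a metrics client might receive.
def safety_check_metric(event, attempt:)
  {
    metric_name: 'AichatSafetyCheck',
    dimensions: [
      {name: 'Event', value: event},
      {name: 'Attempt', value: attempt.to_s}
    ],
    value: 1
  }
end

# An invalid response on the first attempt (expected sometimes) stays
# distinguishable from a request error on the second (expected almost never).
first = safety_check_metric('InvalidResponse', attempt: 1)
second = safety_check_metric('RequestError', attempt: 2)
puts first[:dimensions].inspect
puts second[:dimensions].inspect
```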

sanchitmalhotra126 · 2025-05-12T21:07:05Z

-    def request_chat_completion(messages, temperature = DEFAULT_TEMPERATURE)
+    # Extra empty set is included to provide generic coverage to new parameters that OpenAI adds in the future
+    # Examples include response_format for json response formatting and tools for function calling
+    def request_chat_completion(messages, temperature = DEFAULT_TEMPERATURE, extra: {})


Could this generic extra param just be an optional request_parameters so you don't have to merge later on?
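
One possible reading of this suggestion, sketched under assumptions: accept the optional parameters as keyword arguments and splat them straight into the request body. The method name `chat_completion_body`, the `MODEL` constant, and both constant values below are placeholders for illustration; only `DEFAULT_TEMPERATURE` as a name comes from the diff.

```ruby
DEFAULT_TEMPERATURE = 0.0 # placeholder value, not taken from the PR
MODEL = 'gpt-4o-mini'     # placeholder model name

# Any extra keyword arguments (response_format, tools, ...) are collected
# into request_parameters and folded directly into the body.
def chat_completion_body(messages, temperature = DEFAULT_TEMPERATURE, **request_parameters)
  {model: MODEL, messages: messages, temperature: temperature, **request_parameters}
end

messages = [{role: 'user', content: 'Hello'}]
plain = chat_completion_body(messages)
structured = chat_completion_body(messages, response_format: {type: 'json_object'})
puts plain.keys.inspect
puts structured.keys.inspect
```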

sanchitmalhotra126 · 2025-05-12T21:09:05Z

+    body = JSON.parse(http_response.body)
+    raise StandardError.new(body['error']) if body['error']
+    # Check if the returned content is a json, else throw error
+    response_json = body.dig("choices", 0, "message", "content")


+1, could this be consolidated with the above method? and/or common code moved to a helper? Looks like the key differences here are 1) passing in the response format and 2) parsing the JSON response?

sanchitmalhotra126 · 2025-05-12T21:15:17Z

+    rescue JSON::ParserError
+      raise StandardError.new("Unexpected JSON response, got: #{response_json}")
+    end
+    http_response


Why are we returning the whole http_response here? Would it be simpler to just return the parsed JSON since we're already checking that it's valid?

I think I was being overly cautious. I had built up a lot of edge case handling around the structured responses (e.g. if it still gives a malformed response or returns empty): while OpenAI advertises structured responses as being much more fixed than typical responses, their documentation still includes a caveat that the model could possibly ignore the instructions. This means there might (very) occasionally be times where you have to fall back on the body content instead of checking the intended JSON output.

However, I think you are correct that we could just handle all that logic within this function instead. I think the only logical reason to return the full response is to be able to log the exact kind of error (e.g. malformed structured response, empty response, response inside content body) but, on thinking about it further, it doesn't really make sense to bother logging that, since we wouldn't be able to meaningfully act on those errors anyway.

Anyways TL;DR: I basically dunno after the weekend LOL. But I think I was probably being overly cautious and trying to include error messages for very specific edge cases.
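
A rough sketch of what returning the parsed JSON could look like is below. The function name `parse_structured_content` is hypothetical; the error check and the `dig` path mirror the diff above, and letting `JSON::ParserError` propagate without wrapping it is an assumption about the simplification being discussed, not the merged code.

```ruby
require 'json'

# Takes the raw HTTP response body and returns the parsed structured
# content. Malformed JSON raises JSON::ParserError without any wrapping.
def parse_structured_content(raw_body)
  body = JSON.parse(raw_body)
  raise StandardError, body['error'].to_s if body['error']

  content = body.dig('choices', 0, 'message', 'content')
  raise StandardError, 'Response contained no message content' if content.nil? || content.empty?

  JSON.parse(content)
end
```

Callers then work with a plain hash and do not need to re-parse the HTTP response themselves.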

sanchitmalhotra126 · 2025-05-12T21:22:28Z

+      end
+      raise "OpenAI request failed with status #{response.code}: #{response.body}" unless response.success?
+
+      evaluation = JSON.parse(response.body)['choices'][0]['message']['content']


Also not your change - but I just realized that this helper calls the OpenaiChatHelper directly rather than the AichatSafetyHelper, of which the latter already does some work to unwrap and verify this response. It might be worth using AichatSafetyHelper#request_chat_completion to further simplify some of this (and that way it's analogous to AichatSafetyHelper#request_structured_chat_completion, or better yet if those two functions can be combined, then we're calling the same function in both cases).

sanchitmalhotra126 · 2025-05-12T21:23:13Z


+    # Makes a structured output request to GPT for moderation classification.
+    private def request_structured_safety_check(text, level_id)
+      safety_json_schema = {


nit: might be nice to move this to a constant

tshaffercodeorg · 2025-05-14T05:58:34Z

I went through and tested Ben's changes to simplify things and confirmed that it doesn't change any core functionality!

Going to merge and address any remaining simplifications as a follow-up

tshaffercodeorg requested review from a team, bencodeorg, cnbrenci and sanchitmalhotra126 April 29, 2025 18:29

bencodeorg reviewed Apr 29, 2025

Comment thread dashboard/app/helpers/aichat_safety_helper.rb Outdated

sanchitmalhotra126 reviewed Apr 29, 2025

Comment thread dashboard/app/helpers/aichat_safety_helper.rb Outdated

tshaffercodeorg requested review from a team as code owners April 30, 2025 21:53

tshaffercodeorg force-pushed the tyrone/aichat_structured_output_retry branch from d7962d5 to a86c6ea Compare April 30, 2025 21:59

sanchitmalhotra126 reviewed Apr 30, 2025

Comment thread dashboard/app/helpers/aichat_openai_helper.rb Outdated

tshaffercodeorg requested a review from a team as a code owner May 7, 2025 18:54

tshaffercodeorg added 4 commits May 7, 2025 15:24

Rerouted 2nd attempt to use structured output API instead of regular chat completion (c35725a)

Added structured output method to openai_helper, moved structured output processing outside of retry loop in safety_helper (496ad25)

Adjusted openai_chat_helper to reflect new optional extra field for structured output (baefc42)

Replace missing inappropriate raised error, fix merge shenanigans (0afcdc9)

tshaffercodeorg force-pushed the tyrone/aichat_structured_output_retry branch from 91e9a53 to 0afcdc9 Compare May 7, 2025 19:31

bencodeorg reviewed May 7, 2025

tshaffercodeorg and others added 3 commits May 9, 2025 21:05

Refactored to use the json_schema openai API property and adjusted test case to reflect new structured response fallback (12a8ee9)

Call helper directly (7f5f181)

Shorten comment (5344e37)

bencodeorg approved these changes May 12, 2025

sanchitmalhotra126 reviewed May 12, 2025

bencodeorg and others added 2 commits May 12, 2025 14:25

Update method name (958478d)

Merge pull request #65808 from code-dot-org/ben/possible-structured-output-updates (3a84469)
Incorporated feedback to remove extraneous function/call by instead directly invoking the main function with a clarified optional parameter options

tshaffercodeorg merged commit da335c8 into staging May 14, 2025
6 checks passed

tshaffercodeorg deleted the tyrone/aichat_structured_output_retry branch May 14, 2025 05:59
