Skip to content

AIChat Structured Output Request on Initial Failed Request#65569

Merged
tshaffercodeorg merged 9 commits into
stagingfrom
tyrone/aichat_structured_output_retry
May 14, 2025
Merged

AIChat Structured Output Request on Initial Failed Request#65569
tshaffercodeorg merged 9 commits into
stagingfrom
tyrone/aichat_structured_output_retry

Conversation

@tshaffercodeorg

@tshaffercodeorg tshaffercodeorg commented Apr 29, 2025

Copy link
Copy Markdown
Contributor

Changes Summary

Files Changed

  • aichat_safety_helper.rb
  • aichat_openai_helper.rb
  • openai_chat_helper.rb
  • aichat_safety_helper_test.rb

Testing Notes

  • Used prompt known to override previous moderation framework prompting ("What is the largest number less than 100 that is both a perfect square and a perfect cube?") after implementing new structured output method to determine whether the system was now reattempting failed edge cases with structured outputs.
  • Also used known safe prompt ("What are fun things to do in Tucson, Arizona?") to make sure that the new method did not fundamentally alter/affect querying.

Current Production Build:

image

Local Testing

image

Comment thread dashboard/app/helpers/aichat_safety_helper.rb Outdated
Comment thread dashboard/app/helpers/aichat_safety_helper.rb Outdated
@tshaffercodeorg tshaffercodeorg requested review from a team as code owners April 30, 2025 21:53
@tshaffercodeorg tshaffercodeorg force-pushed the tyrone/aichat_structured_output_retry branch from d7962d5 to a86c6ea Compare April 30, 2025 21:59
@github-actions

Copy link
Copy Markdown

🖼️ Storybook Visual Comparison Report

⚠️⚠️⚠️ Detected Storybook eyes differences, see report!

A difference was found in our Storybook front-end visual comparison testing against the staging baseline.
This difference was detected in Applitools Eyes and is viewable in the link above.

Remediation steps:

  1. Open the report
  2. Determine whether the differences are expected based on this PR's changes
    a. If expected: Before merging this PR, accept the new baselines and re-run this action, it should pass.
    b. If not expected: Push updates to this PR to correct the differences.

Comment thread dashboard/app/helpers/aichat_openai_helper.rb Outdated
@tshaffercodeorg tshaffercodeorg requested a review from a team as a code owner May 7, 2025 18:54
@tshaffercodeorg tshaffercodeorg force-pushed the tyrone/aichat_structured_output_retry branch from 91e9a53 to 0afcdc9 Compare May 7, 2025 19:31

@bencodeorg bencodeorg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few smaller comments, but main question is about using "function calling" vs. specifying a response format like we do in our existing usage of structured outputs.

Per this description, it sounds like our use case would probably be a candidate for using response format?
https://platform.openai.com/docs/guides/structured-outputs#function-calling-vs-response-format

Happy to chat synchronously and/or pair on this as well if helpful!

Comment thread dashboard/app/helpers/aichat_openai_helper.rb Outdated
Comment thread dashboard/app/helpers/aichat_safety_helper.rb Outdated
Comment thread dashboard/app/helpers/aichat_safety_helper.rb Outdated
content: text
}
],
tools: [

@bencodeorg bencodeorg May 7, 2025

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability nit: maybe break out these beefier objects into variable declarations before using them when calling request_structured_chat_completion?

@bencodeorg bencodeorg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few smaller comments, but overall, I pulled this and it worked great! 🎉

One general note is that I do think this could be simplified a bit -- easiest way to do this was for me to play with it a bit, so here's a branch off your branch that drops one of the layers: #65808

We can merge yours as-is and do this as a follow-up, or merge this into your branch (and take it for another spin if you do merge it in, I just tested quickly 😅 ) Thanks for doing this!


evaluation = JSON.parse(response.body)['choices'][0]['message']['content']
unless VALID_EVALUATION_RESPONSES_SIMPLE.include?(evaluation)
report_openai_safety_check("InvalidResponse")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a different message to differentiate this response (expected sometimes) from the errors below (expected almost never)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, or alternatively should we also pass through the attempt # we're on so we can use that as a dimension value?

}

AichatOpenaiHelper.request_structured_chat_completion(
[

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same as what's returned by safety_check_messages?

body = JSON.parse(http_response.body)
raise StandardError.new(body['error']) if body['error']
# Check if the returned content is a json, else throw error
response_json = body.dig("choices", 0, "message", "content")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this error checking different than what we're doing in request_chat_completion above?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, could this be consolidated with the above method? and/or common code moved to a helper? Looks like the key differences here are 1) passing in the response format and 2) parsing the JSON response?

# Retry only on network-related exceptions
response = Retryable.retryable(
tries: 2,
on: [Net::OpenTimeout, Net::ReadTimeout, SocketError, Errno::ECONNRESET]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this error list come from somewhere (eg, errors we've seen in HB?), or just theorized network exceptions? FWIW still feels a little excessive to me, but not a huge deal. I think adding a more verbose comment on the reasoning though might be helpful for future readers.


begin
parsed = JSON.parse(raw_content)
rescue JSON::ParserError

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we just let Ruby throw this error (vs rescuing and raising a custom error)?

@sanchitmalhotra126 sanchitmalhotra126 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, very cool to see this working end to end! I think there are some options for simplification and consolidation to make the code easier to read and follow, and +1 to Ben's suggestions and simplification PR. But overall this is looking good!

messages = safety_check_messages(text, level_id)

# Retry only on network-related exceptions
response = Retryable.retryable(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still worth ditching Retryable and letting any uncaught exceptions after the second round bubble up to the caller?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely thought about this. In some ways, I feel like the retryable clause ends up being cleaner since the worst case flow is 1st call network error -> 2nd attempt invalid response -> 3rd attempt structural and retryable feels like it has a much shorter/simpler syntax to capture what's going on with those first two attempts rather than a tree of if/else.

If we'd prefer being more explicit with the if/else cases, though, I think it totally makes sense to switch to that and it relies less on a library.


evaluation = JSON.parse(response.body)['choices'][0]['message']['content']
unless VALID_EVALUATION_RESPONSES_SIMPLE.include?(evaluation)
report_openai_safety_check("InvalidResponse")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, or alternatively should we also pass through the attempt # we're on so we can use that as a dimension value?

def request_chat_completion(messages, temperature = DEFAULT_TEMPERATURE)
# Extra empty set is included to provide generic coverage to new parameters that OpenAI adds in the future
# Examples include response_format for json response formatting and tools for function calling
def request_chat_completion(messages, temperature = DEFAULT_TEMPERATURE, extra: {})

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this generic extra param just be an optional request_parameters so you don't have to merge later on?

body = JSON.parse(http_response.body)
raise StandardError.new(body['error']) if body['error']
# Check if the returned content is a json, else throw error
response_json = body.dig("choices", 0, "message", "content")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, could this be consolidated with the above method? and/or common code moved to a helper? Looks like the key differences here are 1) passing in the response format and 2) parsing the JSON response?

Comment thread dashboard/app/helpers/aichat_safety_helper.rb Outdated
rescue JSON::ParserError
raise StandardError.new("Unexpected JSON response, got: #{response_json}")
end
http_response

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we returning the whole http_response here? Would it be simpler to just return the parsed JSON since we're already checking that it's valid?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was being overly cautious; I had built up a lot of edge case handling around the structured responses (e.g. if it still gives a malformed response or returns empty) since while OpenAI advertises structured responses as being much more fixed than typical responses, they still included a caveated in their documentation that it could still possibly ignore the instructions. This means there might (very) occasionally be times where you have to fallback onto the body content instead of checking the intended json output.

However, I think you are correct in that we could totally just be handling all that logic within this function instead. I think the only logical reason why we would return the full response is to be able to log the exact kind of error (e.g. malformed structured response, empty response, response inside content body) but, on thinking about it further, it doesn't really make sense to bother logging that since we wouldn't be able to meaningfully action on those errors anyways.

Anyways TL;DR: I basically dunno after the weekend LOL. But I think I was probably being overly cautious and trying to include error messages for very specific edge cases.

end
raise "OpenAI request failed with status #{response.code}: #{response.body}" unless response.success?

evaluation = JSON.parse(response.body)['choices'][0]['message']['content']

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not your change - but I just realized that this helper calls the OpenaiChatHelper directly rather than the AichatSafetyHelper, of which the latter already does some work to unwrap and verify this response. It might be worth using AichatSafetyHelper#request_chat_completion to further simplify some of this (and that way it's analogous to AichatSafetyHelper#request_structured_chat_completion, or better yet if those two functions can be combined, then we're calling the same function in both cases).


# Makes a structured output request to GPT for moderation classification.
private def request_structured_safety_check(text, level_id)
safety_json_schema = {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: might be nice to move this to a constant

bencodeorg and others added 2 commits May 12, 2025 14:25
…utput-updates

Incorporated feedback to remove extraneous function/call by instead directly invoking the main function with a clarified optional parameter options
@tshaffercodeorg

Copy link
Copy Markdown
Contributor Author

I went through and tested Ben's changes to simplify things and confirmed that it doesn't change any core functionality!

image

Going to merge and address any remaining simplifications as a follow-up

@tshaffercodeorg tshaffercodeorg merged commit da335c8 into staging May 14, 2025
6 checks passed
@tshaffercodeorg tshaffercodeorg deleted the tyrone/aichat_structured_output_retry branch May 14, 2025 05:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants