Categories: Table functions (Cortex Agents)

GET_AI_EVALUATION_DATA (SNOWFLAKE.LOCAL)

Retrieves the evaluation data for a run of a Cortex Agent or an External Agent application (see External Agent commands).

Call this function to inspect all recorded traces for an evaluation run. For more information on Cortex Agent evaluations, see Cortex Agent evaluations. For AI Observability applications, see Observability data.

See also: EXECUTE_AI_EVALUATION, GET_AI_RECORD_TRACE (SNOWFLAKE.LOCAL), GET_AI_OBSERVABILITY_LOGS (SNOWFLAKE.LOCAL), GET_AI_OBSERVABILITY_EVENTS (SNOWFLAKE.LOCAL)

Syntax

SNOWFLAKE.LOCAL.GET_AI_EVALUATION_DATA( <database> , <schema> , <agent_name> , <agent_type> , <run_name> )

Arguments

database: Name of the database containing the agent.
schema: Name of the schema containing the agent.
agent_name: Name of the agent to retrieve evaluation data for.
agent_type: The agent type string. Use CORTEX AGENT for a Cortex Agent or EXTERNAL AGENT for an External Agent object. This value is case-insensitive.
run_name: Name of the run to retrieve full evaluation data for.
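As an illustration of how the arguments fit together, the following sketch calls the function for an External Agent instead of a Cortex Agent. The database, schema, agent, and run names (`obs_db`, `obs_schema`, `my_external_app`, `nightly-run`) are hypothetical placeholders, not objects defined in this topic. Because `agent_type` is described as case-insensitive, the lowercase spelling is expected to behave the same as `EXTERNAL AGENT`.

```sql
-- Hypothetical object names; replace them with your own.
-- All five arguments are passed as string literals.
SELECT *
FROM TABLE(SNOWFLAKE.LOCAL.GET_AI_EVALUATION_DATA(
  'obs_db',            -- database containing the agent
  'obs_schema',        -- schema containing the agent
  'my_external_app',   -- agent_name
  'external agent',    -- agent_type (case-insensitive)
  'nightly-run'        -- run_name
));
```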

Returns

A table containing information for the specified evaluation, with the following columns:


Column	Data type	Description
RECORD_ID	VARCHAR	The unique identifier assigned by Snowflake for this evaluation record.
INPUT_ID	VARCHAR	The unique identifier assigned by Snowflake for this evaluation input.
REQUEST_ID	VARCHAR	The unique identifier assigned by Snowflake for this request.
TIMESTAMP	TIMESTAMP_TZ	The time (in UTC) at which the request was made.
DURATION_MS	INT	The amount of time, in milliseconds, that it took for the agent to return a response.
INPUT	VARCHAR	The query string used as input for this evaluation record.
OUTPUT	VARCHAR	The response returned by the Cortex Agent for this evaluation record.
ERROR	VARCHAR	Information about any errors that occurred during the request.
GROUND_TRUTH	VARCHAR	The ground truth information used to evaluate this record’s Cortex Agent output. This column holds the JSON from your dataset’s ground truth column, serialized as a string. For how `{{ground_truth}}` in custom metrics relates to this value, see the notes under Evaluation results table format.
METRIC_NAME	VARCHAR	The name of the metric evaluated for this record.
EVAL_AGG_SCORE	NUMBER	The evaluation score assigned for this record.
METRIC_TYPE	VARCHAR	The type of metric being evaluated. For built-in metrics, the value is `system`. For custom metrics, the value is `custom`.
METRIC_STATUS	VARIANT	A map containing information about the agent’s HTTP response for this record, with the following keys: `status` (the HTTP status code of the response) and `message` (the HTTP message sent in the status response).
METRIC_CALLS	ARRAY	An array of VARIANT values that contain information about the computed metric. Each array entry has the following keys: `criteria` (the criteria used by an LLM judge to evaluate response correctness); `explanation` (an explanation of why the score was assigned); and `full_metadata` (a VARIANT value that contains metadata and information about this metric’s processing by the LLM judge). The keys of the `full_metadata` map include: `completion_tokens` (the number of output tokens generated by the LLM for this metric evaluation call); `normalized_score` (the original evaluation score normalized to the range [0.0, 1.0], rounded to two decimal places); `original_score` (the original score assigned by this metric evaluation for the record); `prompt_tokens` (the number of tokens taken up by the prompt provided to the LLM judge); and `total_tokens` (the total number of tokens used by the LLM judge for this computation).
TOTAL_INPUT_TOKENS	INT	The total number of tokens used to process the input query.
TOTAL_OUTPUT_TOKENS	INT	The total number of output tokens produced by the Cortex Agent.
LLM_CALL_COUNT	INT	The number of times any LLM was called, either by the agent or by an evaluation judge.
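The nested METRIC_STATUS and METRIC_CALLS columns can be unpacked with standard Snowflake semi-structured syntax. The following is a minimal sketch, assuming the key names listed in the table above and reusing the object names from the example in this topic; the output aliases and the cast types (for example, `FLOAT` for `normalized_score`) are illustrative choices rather than documented types.

```sql
-- One output row per entry in METRIC_CALLS for each evaluation record.
SELECT
  d.record_id,
  d.metric_name,
  d.eval_agg_score,
  d.metric_status:status::VARCHAR                    AS http_status,
  d.metric_status:message::VARCHAR                   AS http_message,
  c.value:criteria::VARCHAR                          AS criteria,
  c.value:explanation::VARCHAR                       AS explanation,
  c.value:full_metadata:original_score::FLOAT        AS original_score,
  c.value:full_metadata:normalized_score::FLOAT      AS normalized_score,
  c.value:full_metadata:total_tokens::INT            AS judge_total_tokens
FROM TABLE(SNOWFLAKE.LOCAL.GET_AI_EVALUATION_DATA(
  'eval_db',
  'eval_schema',
  'evaluated_agent',
  'CORTEX AGENT',
  'run-1')) AS d,
  LATERAL FLATTEN(input => d.metric_calls) AS c;   -- expand the ARRAY column
```

Records whose METRIC_CALLS array is empty or NULL produce no rows with this form of FLATTEN; adding `OUTER => TRUE` to the FLATTEN call keeps them.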

Access control requirements

A role used to execute this operation must have the following privileges at a minimum:


Privilege	Object	Notes
CORTEX_USER	Database role
USAGE	Cortex Agent or External Agent	Required on the object identified by `agent_name`. For `EXTERNAL AGENT`, USAGE on the External Agent is sufficient to call this function (MONITOR does not apply).
MONITOR	Cortex Agent	Required on the Cortex Agent identified by `agent_name` when `agent_type` is `CORTEX AGENT`. Does not apply when `agent_type` is `EXTERNAL AGENT`.

Operating on an object in a schema requires at least one privilege on the parent database and at least one privilege on the parent schema.
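The privileges listed above could be granted to a custom role along the following lines. This is a sketch under stated assumptions: the role name `agent_eval_reader` is hypothetical, the object names are taken from the example in this topic, and the `ON AGENT` object type keyword in the last two statements is an assumption about how Cortex Agent privileges are granted; check the GRANT reference for your account before relying on it.

```sql
-- Hypothetical role name; replace it with your own.
CREATE ROLE IF NOT EXISTS agent_eval_reader;

-- Database role required to call the function.
GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE agent_eval_reader;

-- At least one privilege on the parent database and schema.
GRANT USAGE ON DATABASE eval_db TO ROLE agent_eval_reader;
GRANT USAGE ON SCHEMA eval_db.eval_schema TO ROLE agent_eval_reader;

-- Assumption: Cortex Agent privileges are granted with the AGENT keyword.
GRANT USAGE   ON AGENT eval_db.eval_schema.evaluated_agent TO ROLE agent_eval_reader;
GRANT MONITOR ON AGENT eval_db.eval_schema.evaluated_agent TO ROLE agent_eval_reader;
```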

For instructions on creating a custom role with a specified set of privileges, see Creating custom roles.

For general information about roles and privilege grants for performing SQL actions on securable objects, see Overview of Access Control.

When agent_type is EXTERNAL AGENT, only USAGE on that object is required to call this function. OWNERSHIP on the External Agent is required to modify or remove the object with ALTER EXTERNAL AGENT or DROP EXTERNAL AGENT.

For the full access control permissions required by Cortex Agent evaluations, see Cortex Agent evaluations – Access control requirements. For External Agent objects, see Observability data.

Examples

The following example displays the full evaluation details for a run named run-1 of the agent evaluated_agent, which is stored in the schema eval_db.eval_schema:

SELECT * FROM TABLE(SNOWFLAKE.LOCAL.GET_AI_EVALUATION_DATA(
  'eval_db',
  'eval_schema',
  'evaluated_agent',
  'CORTEX AGENT',
  'run-1')
);
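Because the function returns an ordinary table, its output can also be aggregated. The following sketch summarizes the same run per metric. It assumes that each returned row represents one record scored by one metric, and that the ERROR column is NULL when no error occurred; neither assumption is stated in this topic, so verify both against your own results.

```sql
-- Per-metric summary of a single evaluation run.
SELECT
  metric_name,
  metric_type,
  COUNT(*)                     AS scored_records,
  AVG(eval_agg_score)          AS avg_score,
  MIN(eval_agg_score)          AS min_score,
  AVG(duration_ms)             AS avg_duration_ms,
  COUNT_IF(error IS NOT NULL)  AS records_with_error   -- assumes NULL means no error
FROM TABLE(SNOWFLAKE.LOCAL.GET_AI_EVALUATION_DATA(
  'eval_db',
  'eval_schema',
  'evaluated_agent',
  'CORTEX AGENT',
  'run-1'))
GROUP BY metric_name, metric_type
ORDER BY metric_name;
```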