diff --git a/tutorials/ai-core-genaihub-evaluation/PUT_YOUR_DATASET_HERE/medicalqna_dataset.csv b/tutorials/ai-core-genaihub-evaluation-comprehensive/DATASET/medicalqna_dataset.csv similarity index 100% rename from tutorials/ai-core-genaihub-evaluation/PUT_YOUR_DATASET_HERE/medicalqna_dataset.csv rename to tutorials/ai-core-genaihub-evaluation-comprehensive/DATASET/medicalqna_dataset.csv diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/README.md b/tutorials/ai-core-genaihub-evaluation-comprehensive/README.md new file mode 100644 index 000000000..2528756d3 --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-comprehensive/README.md @@ -0,0 +1,548 @@ +# Generative AI Custom Evaluation Workflow + +This notebook demonstrates a workflow for using AI Core's custom evaluation capabilities to benchmark Large Language Models (LLMs), and evaluate different prompts for a specific use case. It utilizes the public [MedicationQA dataset](https://langtest.org/docs/pages/benchmarks/medical/medicationqa/) to showcase how to compute industry-standard metrics and assess the reliability of LLM-generated responses. + +## Prerequisites + +Before running this notebook, ensure you have the following: + +1. **Python Environment**: A running Jupyter Notebook environment. +2. **Dependencies**: The required Python packages can be installed by running the pip command in the notebook: + ```bash + pip install -r requirements.txt + ``` +3. **Environment Variables**: Create a `.env` file in the same directory as the notebook. This file should contain your credentials for SAP AI Core and AWS. A `sample.env` file is provided as a template. The notebook will prompt for any missing values. 
+
+   Your `.env` file should look like this:
+   ```
+   # SAP AI Core Credentials
+   AICORE_BASE_URL=
+   AICORE_RESOURCE_GROUP=
+   AICORE_AUTH_URL=
+   AICORE_CLIENT_ID=
+   AICORE_CLIENT_SECRET=
+
+   # AWS Credentials
+   AWS_ACCESS_KEY=
+   AWS_BUCKET_ID=
+   AWS_REGION=
+   AWS_SECRET_ACCESS_KEY=
+
+   # Optional Orchestration Deployment URL
+   DEPLOYMENT_URL=
+   ```
+
+## Workflow Overview
+
+The notebook is structured into the following key steps:
+
+### Step 1: Setup
+
+* **Install Dependencies**: Installs the necessary Python packages from `requirements.txt`.
+* **Load Credentials**: Loads the necessary credentials and configuration from the `.env` file. It initializes the `GenAIHubProxyClient` for interacting with SAP AI Core.
+
+### Step 2: Prepare for Evaluation
+
+This section involves preparing all the necessary assets for the evaluation run.
+
+1. **Register Object Store Secret**: Registers your AWS S3 bucket credentials with SAP AI Core. This allows the evaluation job to access your dataset.
+
+An [object store secret](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/register-your-object-store-secret) is required to store the credentials used to access your AWS S3 buckets and to limit access to a particular directory.
+The user needs to select a resource group while creating the secret.
+To read more about resource groups, visit: [Resource Groups](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/resource-groups)
+
+The API endpoint can be found at: [Object store secret endpoint](https://api.sap.com/api/AI_CORE_API/resource/Object_Store_Secret)
+```
+    {
+        "name": "genai-data-notebook",
+        "data": {
+            "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY,
+            "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY
+        },
+        "type": "S3",
+        "bucket": AWS_BUCKET_ID,
+        "endpoint": "https://s3.aws.com",
+        "region": AWS_REGION,
+        "pathPrefix": ""
+    }
+```
+
+2. **Upload Data to S3**: Uploads the local dataset from `DATASET` to your S3 object store and registers the root folder as an artifact with AI Core.
The File Upload and Artifact endpoints of the AI Core API may be used for this purpose. In this example, `genaiEvaluation/{prefix_guid}` is the root folder containing the orchestration configurations and test data, and it is registered as an AI Core artifact.
+
+3. **Register Artifact with AI Core**: Registers the uploaded dataset in S3 as an artifact in SAP AI Core. This makes the data accessible to the evaluation workflow.
+The input [artifact](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/manage-artifacts) is a placeholder in an executable or template that enables the attachment of datasets or models required for the execution of an AI workflow or pipeline.
+To register an artifact with AI Core:
+ - Upload the input files to the path specified in the object store.
+ - Register an artifact with AI Core by providing the path to the input artifact.
+
+The API endpoint can be found at: [Register Artifact](https://api.sap.com/api/AI_CORE_API/resource/Artifact)
+```
+    {
+        "labels": [
+            {
+                "key": "ext.ai.sap.com/prompt-evaluation",
+                "value": "true"
+            }
+        ],
+        "name": "genai-eval-simplified-test-data",
+        "kind": "other",
+        "url": input_artifact_path, # input artifact path
+        "description": "demo artifacts for evaluation flow.",
+        "scenarioId": "genai-evaluations"
+    }
+```
+ - The url needs to be constructed as ai://genai-data-notebook/genaiEvaluation/{prefix_guid}, where genai-data-notebook is the object store secret name created previously. The path therefore resolves relative to ai://genai-data-notebook/, the directory in which your dataset files are located.
+ - The url points to a directory, not a file, which gives you the advantage that you can store multiple files in an AWS S3 directory and register the directory containing all the files as a single artifact.
+ - All the files present in the path referenced by the artifact will be copied from your S3 storage to your SAP AI Core instance during training or inferencing. This includes subfolders, except for artifacts of kind MODEL.
+ - The scenario ID here refers to the Global Workflow already present in AI Core.
+
+
+4. **Create Orchestration Deployment**: If you don't have an existing orchestration deployment, this step creates one. The deployment provides the endpoint for running the LLM.
+
+
+5. **Select Metrics**: You can select from a list of system-defined metrics (e.g., ROUGE, BERT Score, Answer Relevance) and/or register your own custom metrics through the notebook.
+
+The following **system-defined computed metrics** are supported:
+
+| Name | Description | Reference required |
+|------|-------------|--------------------|
+| BERT Score | https://huggingface.co/spaces/evaluate-metric/bertscore | Yes |
+| BLEU | https://huggingface.co/spaces/evaluate-metric/bleu | Yes |
+| ROUGE | https://huggingface.co/spaces/evaluate-metric/rouge | Yes |
+| JSON Schema Match | Validates the LLM-generated response against a predefined JSON schema and returns a boolean result.
| Yes |
+| Content Filter on Input | Whether the orchestration input was rejected by the input filter | No |
+| Content Filter on Output | Whether the orchestration output was rejected by the output filter | No |
+| Exact Match | Whether the output exactly matches the reference | Yes |
+| Language Match | Returns true/false to indicate whether the text matches the given language | No |
+
+
+The following **system-defined model-as-a-judge metrics** are supported:
+
+| Name | Description | Reference required |
+|------|-------------|--------------------|
+| Pointwise Instruction Following | Assesses the model's ability to follow the instructions provided in the user prompt | No |
+| Pointwise Correctness | Assesses the model's ability to provide a correct response based on the user prompt | Yes |
+| Pointwise Answer Relevance | Assesses whether the model's response is relevant to the user prompt | No |
+| Pointwise Conciseness | Assesses whether the model's response is a short and concise answer to the user prompt | No |
+
+*Entries marked with an asterisk (*) are experimental metrics.
+
+
+
+## Model-as-a-Judge metrics internally follow these templates:
+
+Pointwise Instruction Following prompt template:
+
+```text
+Please act as an impartial judge and evaluate the quality of the responses based on the prompt and the following criteria:
+
+## Metric Definition
+You will be assessing the model's ability to follow instructions provided in the user prompt.
+
+## Criteria
+Instruction following: The response demonstrates a clear understanding of the instructions in the user prompt, satisfying all of the instruction's requirements.
+Evaluate the responses STRICTLY on the ability to follow instructions ONLY.
+
+## Rating Rubric
+5: (Complete fulfillment). Response addresses all aspects and adheres to all requirements of the instruction. The user would feel like their instruction was completely understood.
+4: (Good fulfillment). Response addresses most aspects and requirements of the instruction. It might miss very minor details or have slight deviations from requirements. The user would feel like their instruction was well understood.
+3: (Some fulfillment). Response does not address some minor aspects and/or ignores some requirements of the instruction. The user would feel like their instruction was partially understood.
+2: (Poor fulfillment). Response addresses some aspects of the instruction but misses key requirements or major components. The user would feel like their instruction was misunderstood in significant ways.
+1: (No fulfillment). Response does not address the most important aspects of the instruction. The user would feel like their request was not at all understood.
+
+
+User Prompt:
+{{?aicore_prompt_template}}
+
+Model Response:
+{{?aicore_llm_completion}}
+
+Begin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and output STRICTLY following this JSON format:
+{
+  "explanation": string,
+  "rating": integer
+}
+Output:
+```
+ +
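In these templates, the `{{?...}}` placeholders are filled in at evaluation time with the dataset prompt and the model's completion. As a purely illustrative sketch (the `render_template` helper below is hypothetical; the evaluation service performs this substitution internally), the mechanism looks like:

```python
import re

def render_template(template: str, values: dict) -> str:
    """Replace {{?name}} placeholders with the given values.

    Hypothetical helper for illustration only; unknown placeholders
    are left intact so partially filled templates remain visible.
    """
    def substitute(match):
        key = match.group(1)
        return values.get(key, match.group(0))
    return re.sub(r"\{\{\?(\w+)\}\}", substitute, template)

# Fill the two placeholders used by the judge templates above.
judge_prompt = render_template(
    "User Prompt:\n{{?aicore_prompt_template}}\n\nModel Response:\n{{?aicore_llm_completion}}",
    {
        "aicore_prompt_template": "List the side effects of ibuprofen.",
        "aicore_llm_completion": "Common side effects include nausea and heartburn.",
    },
)
print(judge_prompt)
```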
+
+Pointwise Correctness prompt template:
+
+```text
+You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models.
+We will provide you with the user input, an AI-generated response and a reference answer.
+You should first read the user input carefully to analyze the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below.
+You will assign the response a rating following the Rating Rubric and Evaluation Steps.
+Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.
+
+## Metric Definition
+You will be assessing correctness, which measures the ability to provide a correct response based on the user prompt and the reference.
+
+## Criteria
+Correctness: Is the response correct, accurate, and factual?
+
+## Rating Rubric
+5: (Completely correct). The response is completely correct, accurate, and factual.
+4: (Mostly correct). The response is mostly correct, accurate, and factual.
+3: (Somewhat correct). The response is somewhat correct, accurate, and factual.
+2: (Somewhat incorrect). The response is somewhat incorrect, inaccurate, or fictitious.
+1: (Incorrect). The response is incorrect, inaccurate, or fictitious.
+
+## Evaluation Steps
+STEP 1: Assess the response in aspects of Correctness. Identify any information in the response and provide an assessment according to the Criteria.
+STEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Correctness.
+
+Prompt:
+{{?aicore_prompt_template}}
+
+Response:
+{{?aicore_llm_completion}}
+
+Reference:
+{{?reference}}
+
+Begin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and output STRICTLY following this JSON format:
+{
+  "explanation": string,
+  "rating": integer
+}
+
+Output:
+```
+ +
+Pointwise Answer Relevance prompt template: + +```text +You are an expert evaluator. Your task is to evaluate the relevance of responses generated by AI models. +We will provide you with the user input and an AI-generated response. +You should first read the user input carefully to understand the context and intention, and then evaluate the relevance of the response based on the criteria provided in the Evaluation section below. +You will assign the response a rating following the Rating Rubric and Evaluation Steps. +Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric. + +## Metric Definition +You will be assessing relevance, which measures the ability to provide a response that is pertinent and useful based on the user prompt and the context provided. + +## Criteria +Relevance: Does the response address the user's query appropriately and provide pertinent information? + +## Rating Rubric +5: (Highly relevant). The response is highly relevant, directly addresses the user's query, and provides useful information. +4: (Mostly relevant). The response is mostly relevant and generally addresses the user's query with useful information. +3: (Somewhat relevant). The response is somewhat relevant but may miss key aspects of the user's query. +2: (Slightly relevant). The response is slightly relevant and largely misses the user's query. +1: (Irrelevant). The response is irrelevant and does not address the user's query. + +## Evaluation Steps +STEP 1: Assess the response in terms of Relevance. Identify how well the response aligns with the user's query and context according to the Criteria. +STEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Relevance. + +Prompt: +{{?aicore_prompt_template}} + +Response: +{{?aicore_llm_completion}} + +Begin your evaluation by providing a short explanation. Be as unbiased as possible. 
After providing your explanation, please rate the response according to the rubric and output STRICTLY following this JSON format:
+{
+  "explanation": string,
+  "rating": integer
+}
+
+Output:
+```
+ +
+Pointwise Conciseness prompt template: + +```text +You are an expert evaluator. Your task is to evaluate the conciseness of responses generated by AI models. +We will provide you with the user input and an AI-generated response. +You should first read the user input carefully to understand the context and intention, and then evaluate the conciseness of the response based on the criteria provided in the Evaluation section below. +You will assign the response a rating following the Rating Rubric and Evaluation Steps. +Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric. + +## Metric Definition +You will be assessing conciseness, which measures the ability to convey the necessary information in a clear and succinct manner. + +## Criteria +Conciseness: Does the response deliver the essential information without unnecessary words or redundancy? + +## Rating Rubric +5: (Highly concise). The response is very concise, delivering all necessary information in a succinct manner without any superfluous content. +4: (Mostly concise). The response is mostly concise and generally avoids unnecessary words while covering the essential information. +3: (Somewhat concise). The response is somewhat concise but may include some unnecessary words or slightly redundant information. +2: (Slightly concise). The response is slightly concise and contains a significant amount of unnecessary or redundant information. +1: (Not concise). The response is not concise and is filled with unnecessary or redundant content that obscures the main points. + +## Evaluation Steps +STEP 1: Assess the response in terms of Conciseness. Identify how effectively the response communicates essential information without unnecessary words according to the Criteria. +STEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Conciseness. 
+
+Prompt:
+{{?aicore_prompt_template}}
+
+Response:
+{{?aicore_llm_completion}}
+
+Begin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and output STRICTLY following this JSON format:
+{
+  "explanation": string,
+  "rating": integer
+}
+
+Output:
+```
+
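All four judge templates above instruct the judge model to answer with a JSON object containing an explanation and an integer rating from 1 to 5. A minimal sketch of how such output might be parsed and validated on the client side (the `parse_judge_output` helper is illustrative, not part of the service):

```python
import json

def parse_judge_output(raw: str) -> dict:
    """Parse and validate a judge response of the form
    {"explanation": str, "rating": int}. Illustrative only."""
    result = json.loads(raw)
    explanation = result["explanation"]
    rating = result["rating"]
    if not isinstance(rating, int) or not 1 <= rating <= 5:
        raise ValueError(f"rating must be an integer between 1 and 5, got {rating!r}")
    return {"explanation": explanation, "rating": rating}

parsed = parse_judge_output('{"explanation": "Fully addresses the question.", "rating": 5}')
print(parsed["rating"])  # 5
```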
+
+#### User-defined metrics (Custom metrics)
+
+User-defined metrics can be used to evaluate the LLM outputs according to the unique needs of a use case.
+A **user-defined LLM-as-a-judge metric** uses a judge LLM along with a rubric to compute a metric rating. The output of an LLM-as-a-judge metric can be numeric or text.
+
+
+
+The system defines a structure for the judge prompts, and users provide the metric definition in the pre-defined format. Relevant instructions, such as output instructions, are automatically added to ensure the desired output from the LLM.
+
+Example definition:
+```json
+{
+    "scenario": "genai-evaluations", #required only if metricId is not provided
+    "metricName": "my_custom_metric", #required only if metricId is not provided
+    "version": "0.0.1", #required only if metricId is not provided
+    "type": "structured", # structured
+    "model_configuration": { # model parameters are system-defined for structured prompts. User-defined model parameters will be ignored.
+        "model_name": "string",
+        "model_version": "string",
+    },
+    "prompt_configuration": {
+        "evaluation_task": "string", #Describe the goal of this evaluation.
+        "criteria": "string", #Describe in one or two sentences how the evaluation is done.
+        "rating_rubric": [
+            {
+                "rating": "number", #Rating is always an integer.
+                "rule": "string" #Describe the criteria for choosing this rating.
+            },
+            ...
+        ],
+        "include_properties": ["prompt", "reference"], #If present, a variable to hold the value (prompt, reference, etc.) will be included.
+        "examples": [ #optional, few-shot examples to provide context to the judge LLM for better results. Ensure that examples cover all ratings for good results.
+ { + "prompt": "string", #required only if prompt is present in include_properties + "response": "string", #mandatory + "reference": "string", #required only if reference is present in include_properties + "rating": "number", #mandatory + "explanation": "string", #mandatory, providing this value will improve the response from the judge llm. + }, + ... + ], + } +} +``` + +The constructed prompt will be: + +``` +Please act as an impartial judge and evaluate the quality of the responses based on the following criteria: + +## Evaluation Task +{{?evaluation_task}} # Example: You will be assessing correctness, which measures the ability to provide a correct response based on the user prompt and the reference. + +## Criteria +{{?Criteria}} # Example: Correctness: Is the response correct, accurate, and factual? + +## Rating Rubric +5: (Complete fulfillment). Response addresses all aspects and adheres to all requirements of the evaluation criteria. The user would feel like their expectations were completely met. +4: (Good fulfillment). Response addresses most aspects and requirements of the evaluation criteria. It might miss very minor details or have slight deviations from expectations. The user would feel like their expectations were well met. +3: (Some fulfillment). Response does not address some minor aspects and/or ignores some requirements of the evaluation criteria. The user would feel like their expectations were partially met. +2: (Poor fulfillment). Response addresses some aspects of the evaluation criteria but misses key requirements or major components. The user would feel like their expectations were misunderstood in significant ways. +1: (No fulfillment). Response does not address the most important aspects of the evaluation criteria. The user would feel like their expectations were not at all met. 
+
+Prompt:
+{{?aicore_llm_prompt}}
+
+Response:
+{{?aicore_llm_completion}}
+
+Reference:
+{{?reference}}
+
+Begin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and output STRICTLY following this JSON format:
+{
+  "explanation": string,
+  "rating": integer
+}
+
+Output:
+```
+**NOTE**: "scenario", "metricName", and "version" are required parameters for the custom metric in the evaluation configuration.
+
+**NOTE**: The user must provide at least one prompt; either the system prompt, the user prompt, or both can be provided.
+
+
+6. **Select Models**: Choose the foundation models you want to evaluate from a list of available models in your AI Core instance.
+
+
+⚠️ **Model Availability Notice**
+If you are in a region where the `gpt-4.1` model (version `2025-04-14`) is not available, the existing LLM-as-a-Judge metrics evaluation cannot be performed. Currently, the evaluation service relies on this specific model version for metrics computation.
+
+
+7. **Create Orchestration Registry Config**: The prompt and models are provided as part of the orchestration configuration (Inline Prompt).
+
+Sample Body:
+```json
+{
+    "name": "genai-eval-test",
+    "version": "1.0.0",
+    "scenario": "genai-evaluations",
+    "spec": {
+        "modules": {
+            "prompt_templating": {
+                "model": {
+                    "name": "model_name",
+                    "version": "model_version"
+                },
+                "prompt": {
+                    "template": [
+                        {
+                            "role": "user",
+                            "content": "List the benefits and side effects of the drug in the following consumer health question: {{?question}}."
+                        }
+                    ]
+                }
+            }
+        }
+    }
+}
+```
+
+### Step 3: Start Evaluation Run
+
+* **Create AI Core Configuration**: A configuration is created that binds together the dataset artifact, the selected models, the chosen metrics, and the prompt template.
+After registering the input artifacts, we create an AI Core configuration using the global executable of the genai-evaluations global scenario.
+The evaluation configuration takes the following input parameters, which are provided as parameterBindings.
+
+| Input parameter | Description |
+| --------------- | ----------- |
+| orchestrationDeploymentURL | The orchestration deployment to use for calling the LLM. |
+| metrics | A string containing comma-separated names of system-defined metrics or scenario/metricName/version for custom metrics to be evaluated. |
+| testDataset | JSON containing the path to a test dataset relative to the rootFolder and its file type. |
+| orchestrationRegistryIds | The ID of the orchestration config stored in the orchestration registry. |
+| tags (Optional) | A JSON containing name-value pairs of user-defined metadata. |
+| repetitions (Optional) | The number of times the same input is submitted to the LLM to evaluate the consistency of the LLM outputs. Should be greater than 1 if specified. Default is 1. |
+| testRowCount (Optional) | Specifies the number of rows to be selected from the testDataset for evaluation. |
+
+
+Below is an example of a configuration request body.
+The API endpoint for this can be found here: [Configuration Endpoint](https://api.sap.com/api/AI_CORE_API/resource/Configuration)
+
+```python
+request_body = {
+    "name": "genai-eval-conf",
+    "scenarioId": "genai-evaluations",
+    "executableId": "genai-evaluations-simplified",
+    "inputArtifactBindings": [
+        {
+            "key": "datasetFolder",
+            "artifactId": artifact_id
+        }
+    ],
+    "parameterBindings": [
+        {
+            "key": "repetitions",
+            "value": repetitions
+        },
+        {
+            "key": "orchestrationDeploymentURL",
+            "value": orchestration_deployment_url
+        },
+        {
+            "key": "metrics",
+            "value": metrics_list
+        },
+        {
+            "key": "testDataset",
+            "value": test_datasets
+        },
+        {
+            "key": "orchestrationRegistryIds",
+            "value": orchestration_registry_id
+        },
+        {
+            "key": "debugMode",
+            "value": "ON"
+        }
+    ]
+}
+```
+
+
+* **Execute Evaluation**: Once the configuration is created, we create the AI Core execution, which triggers the evaluation workload.
+The status of the execution needs to be Completed for the workflow to have succeeded.
+The evaluation job produces two outputs:
+1. A SQLite DB file which stores the orchestration input, the orchestration output, the values of all the metrics calculated for this orchestration output, and statistics such as latency for this orchestration output. These metric values are called raw metric values. This SQLite DB file is stored in the object store as an AI Core output artifact.
+2. A set of metrics whose values are aggregated from the raw metric values. The aggregate metrics are stored in the tracking service. The user-defined tags along with the run names are stored with the metrics.
+After the execution completes, the runs generated by the workload along with the aggregate metrics can be seen by calling the tracking API.
+
+The API endpoints to create an execution and monitor its status are: [Execution Endpoint](https://api.sap.com/api/AI_CORE_API/resource/Execution).
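The create-and-monitor flow can be sketched with plain HTTP calls. The endpoint paths below follow the public AI Core API (`/v2/lm/executions`, `AI-Resource-Group` header), but the `run_evaluation` helper, its payload handling, and the terminal-status set are a sketch, not the notebook's exact code; pass something like the `requests` module as `http`:

```python
import time

def run_evaluation(base_url, token, resource_group, configuration_id,
                   http, sleep=time.sleep, poll_seconds=30):
    """Create an execution for a configuration and poll until it reaches
    a terminal state. Illustrative helper, not part of any SDK."""
    headers = {
        "Authorization": f"Bearer {token}",    # OAuth token from AICORE_AUTH_URL
        "AI-Resource-Group": resource_group,   # resource-group header required by AI Core
    }
    resp = http.post(f"{base_url}/v2/lm/executions", headers=headers,
                     json={"configurationId": configuration_id})
    resp.raise_for_status()
    execution_id = resp.json()["id"]

    while True:
        status = http.get(f"{base_url}/v2/lm/executions/{execution_id}",
                          headers=headers).json().get("status")
        print(f"Execution {execution_id}: {status}")
        if status in ("COMPLETED", "DEAD", "STOPPED"):
            return execution_id, status
        sleep(poll_seconds)
```

Passing `http` and `sleep` as parameters keeps the polling loop testable without real network calls.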
+
+* **Monitor Execution**: You can monitor the status of the execution until it is `COMPLETED`.
+
+### Step 4: Analyze Evaluation Results
+
+Once the execution is complete, you can analyze the results.
+
+1. **Retrieve Aggregate Metrics**: Fetches the high-level, aggregated metrics from the AI Core Tracking service for each evaluation run. The evaluation job will report the following aggregate statistics.
+
+| statistic | description |
+| --------- | ----------- |
+| average_latency | The average time taken in seconds to get a completion from the orchestration service |
+| completion_count | Number of completions evaluated |
+| total_prompt_tokens | Sum of prompt_tokens of completions |
+| total_completion_tokens | Sum of completion_tokens of completions |
+
+The aggregate metrics can be found by calling the tracking endpoint. There are two ways to do this:
+- By using the execution ID from the previous step:
+
+`{base_url}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={execution_id}`
+
+- By using the run name used in the dataset:
+
+`{base_url}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/run-name={run_name}`
+
+
+For further drill-down, the output artifacts can be downloaded. The output artifact contains the following information. The results will be stored in AWS as an output artifact in the location
+object-store-secret-path/
+
+- folders containing results from each step of the workflow execution. The final result is stored in the sqlite_combined folder in the results.db.
+The results.db contains the following tables:
+  | table name | description |
+  | ---------- | ----------- |
+  | runs | Stores the prompt templates provided in the dataset |
+  | aggregation_results | Stores the aggregated statistics obtained from the tracking service |
+  | completion | Stores the response body received based on the provided configuration |
+  | submissions | Stores the requests that will be sent to the orchestration service |
+  | submissions_results | Stores the output received from the orchestration service |
+  | evaluation_results | Stores the computed results after applying evaluation metrics |
+
+- Custom_logs: Useful for debugging in case of errors.
+
+
+2. **Download Raw Results**: Downloads the detailed, instance-level results, which are stored in a SQLite database (`results.db`) in your S3 bucket.
+
+3. **View Detailed Results**: The notebook provides code to connect to the downloaded SQLite database and display the contents of various tables, including:
+    * `run`: Information about each run.
+    * `configuration`: The configuration used for the run.
+    * `submission` & `submission_result`: Details of the requests and responses to the LLM.
+    * `evaluation_result`: The raw, per-instance metric scores.
+    * `aggregation_result`: The aggregated results for the entire run.
+4. **Process and Rank Results**: The notebook includes scripts to process the raw results further:
+    * It calculates the mean and standard deviation for numerical metrics.
+    * It processes categorical and boolean metrics by applying a scoring system.
+    * Finally, it combines all the processed metrics and provides a weighted ranking of the different runs, helping you identify the best-performing model and prompt configuration based on your criteria.
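The weighted-ranking step can be sketched in plain Python. The run names, metric names, weights, and values below are invented for illustration; in the notebook, the per-run means would instead be computed from the evaluation results in `results.db`:

```python
# Hypothetical per-run metric means; in practice these would be aggregated
# from the raw per-instance scores in results.db.
scores = {
    "run_a": {"rouge": 0.41, "answer_relevance": 4.2},
    "run_b": {"rouge": 0.37, "answer_relevance": 4.6},
}
weights = {"rouge": 0.4, "answer_relevance": 0.6}

def rank_runs(scores, weights):
    """Min-max normalize each metric across runs so different scales are
    comparable, then combine with the given weights and sort descending."""
    totals = {run: 0.0 for run in scores}
    for metric, weight in weights.items():
        values = [m[metric] for m in scores.values()]
        lo, hi = min(values), max(values)
        for run, metrics in scores.items():
            normalized = (metrics[metric] - lo) / (hi - lo) if hi > lo else 0.0
            totals[run] += weight * normalized
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(rank_runs(scores, weights))  # run_b ranks first with these weights
```

Normalizing before weighting matters because ROUGE lives in [0, 1] while judge ratings live in [1, 5]; without it, the judge metrics would dominate any weighted sum.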
diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/ai-core-genaihub-evaluation-comprehensive.md b/tutorials/ai-core-genaihub-evaluation-comprehensive/ai-core-genaihub-evaluation-comprehensive.md new file mode 100644 index 000000000..5e2353108 --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-comprehensive/ai-core-genaihub-evaluation-comprehensive.md @@ -0,0 +1,2186 @@ +--- +parser: v2 +auto_validation: true +time: 45 +primary_tag: software-product>sap-ai-core +tags: [ tutorial>beginner, topic>artificial-intelligence, topic>machine-learning, software-product>sap-ai-core ] +author_name: Smita Naik +author_profile: https://github.com/I321506 +--- + +# Custom Evaluation for Generative AI – Comprehensive Guide + This tutorial demonstrates how to use SAP AI Core Custom Evaluation to benchmark Large Language Models (LLMs) using **Orchestration Registry**. It guides you through environment setup, configuration creation, execution, and result analysis in a unified and simplified workflow. + +It extends the Quick Start tutorial and is intended for Application Developers and Data Scientists who already know the basics of GenAI workflows in SAP AI Core. + +## You will learn +- How to prepare and organize datasets for evaluation. +- How to configure and run evaluations in SAP AI Core. +- How to analyze and interpret aggregated evaluation results. + +## Prerequisites +1. **BTP Account** + Set up your SAP Business Technology Platform (BTP) account. + [Create a BTP Account](https://developers.sap.com/group.btp-setup.html) +2. **For SAP Developers or Employees** + Internal SAP stakeholders should refer to the following documentation: [How to create BTP Account For Internal SAP Employee](https://me.sap.com/notes/3493139), [SAP AI Core Internal Documentation](https://help.sap.com/docs/sap-ai-core) +3. 
**For External Developers, Customers, or Partners**
+   Follow this tutorial to set up your environment and entitlements: [External Developer Setup Tutorial](https://developers.sap.com/tutorials/btp-cockpit-entitlements.html), [SAP AI Core External Documentation](https://help.sap.com/docs/sap-ai-core?version=CLOUD)
+4. **Create BTP Instance and Service Key for SAP AI Core**
+   Follow the steps to create an instance and generate a service key for SAP AI Core:
+   [Create Service Key and Instance](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/create-service-key?version=CLOUD)
+5. **AI Core Setup Guide**
+   Step-by-step guide to set up and get started with SAP AI Core:
+   [AI Core Setup Tutorial](https://developers.sap.com/tutorials/ai-core-setup.html)
+6. An Extended SAP AI Core service plan is required, as the Generative AI Hub is not available in the Free or Standard tiers. For more details, refer to
+[SAP AI Core Service Plans](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/service-plans?version=CLOUD)
+7. **Orchestration Deployment**
+   Ensure at least one orchestration deployment is ready to be consumed during this process.
+Refer to [this tutorial to understand the basic consumption of GenAI models using orchestration](https://developers.sap.com/tutorials/ai-core-orchestration-consumption.html).
+8. **Basic Knowledge**
+   Familiarity with the orchestration workflow is recommended.
+9. **Install Dependencies**
+   Install the required Python packages using the requirements.txt file provided.
+Download [requirements.txt](img/requirements.txt)
+
+💡 Right-click the link above and choose **"Save link as..."** to download it directly.
+
+## Pre-Read
+
+This tutorial showcases how a user can use AI Core custom evaluation to benchmark large language models and evaluate orchestration configurations or prompts for their use case.
It uses the publicly available [MedicationQA dataset](https://langtest.org/docs/pages/benchmarks/medical/medicationqa/), which consists of commonly asked consumer questions about medications. The workload computes industry-standard metrics to check the reliability of the responses generated by the LLM.

### Environment Variables Setup

[OPTION BEGIN [SAP AI Launchpad]]

- Navigate to your SAP AI Core Launchpad.

- In the Workspaces section, click on "Add" to create a new workspace.
  - A workspace in SAP AI Core is a logical container that holds your resources (like models and pipelines) and provides the isolation needed for your projects.

- When prompted, enter your AI Core credentials (such as Client ID, Client Secret, and Base URL).
  - Note: If you're unsure about where to find these credentials, refer to this [guide](https://developers.sap.com/tutorials/ai-core-generative-ai.html#1c4f36d7-f345-4822-be00-c15f133ff7d8).

- Once the workspace is successfully created, select your desired Resource Group to begin the evaluation process.

Refer to the screenshot below for guidance:
![img](img/image_34.png)

[OPTION END]

[OPTION BEGIN [Python]]

- Open **Visual Studio Code or Jupyter Notebook**. Create a new file with the .ipynb extension (e.g., custom_evaluation.ipynb).
- Create a **.env** file in the root directory of your project.
- Add your **AI Core** and **AWS credentials** as shown below.

```env
# AICORE CREDENTIALS
AICORE_CLIENT_ID=
AICORE_CLIENT_SECRET=
AICORE_AUTH_URL=
AICORE_BASE_URL=
AICORE_RESOURCE_GROUP=

# AWS CREDENTIALS
AWS_ACCESS_KEY=
AWS_BUCKET_ID=
AWS_REGION=
AWS_SECRET_ACCESS_KEY=

# ORCHESTRATION DEPLOYMENT URL
DEPLOYMENT_URL=
```

**Note:** Replace the placeholders (e.g., CLIENT_ID, CLIENT_SECRET, etc.) with your actual environment credentials.
+ +Refer to the below screenshot for clarity: +![img](img/image_1.png) + +#### Install Dependencies + +Install the required packages using the [requirements.txt](img/requirements.txt) file you downloaded in the Prerequisites section. +```bash +pip install -r requirements.txt +``` +#### Connect to AI Core Instance + +Once the environment variables are set and dependencies are installed, run the following code to connect to your instance: + +```PYTHON +# Loading the credentials from the env file +from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient +from dotenv import load_dotenv +import os + +load_dotenv(override=True) + +# Fetching environment variables +AICORE_BASE_URL = os.getenv("AICORE_BASE_URL") +AICORE_RESOURCE_GROUP = os.getenv("AICORE_RESOURCE_GROUP") +AICORE_AUTH_URL = os.getenv("AICORE_AUTH_URL") +AICORE_CLIENT_ID = os.getenv("AICORE_CLIENT_ID") +AICORE_CLIENT_SECRET = os.getenv("AICORE_CLIENT_SECRET") + +AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY") +AWS_BUCKET_ID = os.getenv("AWS_BUCKET_ID") +AWS_REGION = os.getenv("AWS_REGION") +AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY") +DEPLOYMENT_URL = os.getenv("DEPLOYMENT_URL") + +# Initializing the GenAIHubProxyClient +client = GenAIHubProxyClient( + base_url=AICORE_BASE_URL, + auth_url=AICORE_AUTH_URL, + client_id=AICORE_CLIENT_ID, + client_secret=AICORE_CLIENT_SECRET, + resource_group=AICORE_RESOURCE_GROUP +) +``` + +**NOTE:** +- Ensure the **requirements.txt** installation completes successfully before running the code. +- If you face any issues, recheck your **.env** values and installed packages. 
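Optionally, you can sanity-check the connection before moving on. The sketch below assumes `client.get_ai_core_token()` returns the full `Authorization` header value (it is used that way later in this tutorial); the hypothetical helper only validates the shape of the string, not the token itself:

```python
def looks_like_bearer_token(token):
    """Rough shape check for an Authorization header value ("Bearer <jwt>")."""
    return isinstance(token, str) and token.startswith("Bearer ") and len(token) > len("Bearer ")

# Requires valid credentials in .env:
# token = client.get_ai_core_token()
# assert looks_like_bearer_token(token), "Unexpected token format - recheck your .env values"
```

If the assertion fails, recheck your `.env` values before continuing.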
[OPTION END]

[OPTION BEGIN [Bruno]]

- Download the [Bruno_collections](img/AI_Core.json) file

- Please follow the steps in the [tutorial](https://developers.sap.com/tutorials/ai-core-orchestration-consumption.html) to set up your environment; refer to the step **Set Up Your Environment and Configure Access** and proceed until the token is generated

[OPTION END]

### Preparing Dataset Files

[OPTION BEGIN [SAP AI Launchpad]]

> **Note:** This step involves local setup using Python and does not require any action on the SAP AI Launchpad.

[OPTION END]

[OPTION BEGIN [Python]]

In this step, the evaluation notebook dynamically detects the dataset file from a predefined folder structure.
You are not required to hardcode the dataset filename.

```Python
import os


def get_dataset_file_name(folder_path):
    """
    Retrieves the name of the first file in the specified folder.
    """
    if not os.path.isdir(folder_path):
        print(f"The folder path '{folder_path}' does not exist.")
        return None

    for item in os.listdir(folder_path):
        item_path = os.path.join(folder_path, item)
        if os.path.isfile(item_path):
            return item

    print(f"No files were found in the folder '{folder_path}'.")
    return None


# --- MAIN EXECUTION ---
DATASET_FOLDER = "../DATASET"

DATASET_NAME = get_dataset_file_name(DATASET_FOLDER)

if DATASET_NAME:
    print(f"Dataset name: {DATASET_NAME}")
else:
    raise SystemExit("Exiting due to missing dataset file.")
```

![img](img/image_py_dtst.png)

[OPTION END]

[OPTION BEGIN [Bruno]]

> **Note:** This step involves local setup using Python and does not require any action in Bruno.

[OPTION END]

### Registering an Object Store Secret in AI Core

[OPTION BEGIN [SAP AI Launchpad]]

- Open the **SAP AI Core Launchpad** and navigate to the **Administration** tab.
+- Select the **Object Store** section from the left-hand menu. +- Click on **“Add”** to register a new object store secret. +- Fill in the required bucket details as shown in the screenshot below. + +![img](img/image_33.png) + +In the **Secret** field, use the following structure to provide your AWS credentials: + +```json +{ + "AWS_ACCESS_KEY_ID": "Enter Your value", + "AWS_SECRET_ACCESS_KEY": "Enter Your value" +} +``` + +[OPTION END] + +[OPTION BEGIN [Python]] + +To make your evaluation files available for AI Core orchestration, you need to: + +- Upload them to an object store (e.g., AWS S3). +- Register the object store secret in AI Core. + +#### **Setup Authentication and Headers** + +First, define the authentication headers for AI Core REST API calls. + +```PYTHON +def _get_headers(): + headers = { + "Authorization": client.get_ai_core_token(), + "AI-Resource-Group": AICORE_RESOURCE_GROUP, + "Content-Type": "application/json", + } + return headers +``` + +#### **Register Object Store Secret in AI Core** + +Register your S3 bucket and credentials as a secret. + +```PYTHON +# Register S3 secret with AI Core which will be used an input source +import requests +import json +import logging + +def delete_oss_secret(oss_name=""): + headers = _get_headers() + + DELETE_SECRETS_ENDPOINT = f'/v2/admin/objectStoreSecrets/{oss_name}' + request_url = f"{AICORE_BASE_URL}{DELETE_SECRETS_ENDPOINT}" + + try: + response = requests.delete(request_url, headers=headers, timeout=120) + if response.status_code == 202: + print(f"Successfully deleted object store secret: {oss_name}") + elif response.status_code == 404: + print(f"Object store secret not found: {oss_name}. 
It may not exist.")
        else:
            logging.error(f"Failed to delete object store secret: {oss_name}, Status Code: {response.status_code}")
    except Exception as e:
        logging.error(f"Error occurred while attempting to delete object store secret: {e}")
        raise

def register_oss_secret(oss_name="", path_prefix=""):
    headers = _get_headers()

    POST_SECRETS_ENDPOINT = '/v2/admin/objectStoreSecrets'
    request_url = f"{AICORE_BASE_URL}{POST_SECRETS_ENDPOINT}"

    request_body = {
        "name": oss_name,
        "data": {
            "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY,
            "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY
        },
        "type": "S3",
        "bucket": AWS_BUCKET_ID,
        "endpoint": "s3-eu-central-1.amazonaws.com",  # adjust the endpoint to match your bucket's region
        "region": AWS_REGION,
        "pathPrefix": path_prefix,
        "verifyssl": "0",
        "usehttps": "1",
    }
    try:
        response = requests.post(
            request_url, headers=headers, data=json.dumps(request_body), timeout=120
        )
        result = response.json()
        return result
    except Exception:
        logging.error("Error occurred while attempting to create object store secret")
        raise

delete_oss_secret(oss_name="default")
delete_oss_secret(oss_name="genai-simplified-notebook")

register_oss_secret(oss_name="default", path_prefix="")
register_oss_secret(oss_name="genai-simplified-notebook", path_prefix="")
```

![img](img/image_objsec.png)

[OPTION END]

[OPTION BEGIN [Bruno]]

Object store secrets securely store the AWS S3 credentials required for dataset access.

• Expand **objectStoreSecrets** under **admin** and select the create-secret request.

Use the payload below to create a secret for AWS S3 with NoAuthentication as the authentication type.
```CODE
{
    "name": "genai-data",
    "data": {
        "AWS_ACCESS_KEY_ID": "",
        "AWS_SECRET_ACCESS_KEY": ""
    },
    "type": "S3",
    "bucket": "",
    "endpoint": "",
    "region": "",
    "pathPrefix": ""
}
```
• Ensure that all values in the data dictionary are Base64-encoded as per AWS S3 credential requirements.

![img](img/image-br01.png)

[OPTION END]

> ⚠️ **Important Note (Must Read)**
>
> - You must **create an object store secret** with a user-defined name (for example, `default`) to store **output artifacts** from orchestration runs. This is **mandatory**.
> - For **input artifacts**, you may create additional object store secrets with different names if needed.
> - If such a secret (for example, `default`) is not configured, orchestration runs will **fail** due to the missing output target setup.


### Upload and Register Dataset

[OPTION BEGIN [SAP AI Launchpad]]

After creating the secret, upload your evaluation files to the S3 bucket and register them as an artifact in AI Core.

#### **Register Uploaded Files as Artifact in AI Core**

To register your evaluation dataset with SAP AI Core, you need to upload it as an artifact. Follow the instructions below using the **SAP AI Launchpad UI**.

---

- Open the **SAP AI Core Launchpad**.
- Navigate to the **Generative AI/Optimization/Artifacts** section to create a dataset artifact.

![img](img/image_19.png)

- In the **Artifacts** section, click **Add**.

---

- On the **General Information** screen, enter the following:

  - **Select Scenario:** `genai-evaluations`
  - **Name:** `genai-eval-test-data`
  - **Description:** `Demo artifacts for evaluation flow.`
  - **Select Object Store:** `genai-data`
  - **Sub-folder path:** `genaiEvaluation/`

  > 💡 Replace `` with your **SAP BTP user ID** or the folder path in your object store where the evaluation files are uploaded.
- On the **Labels** screen, click **“Add Label”** and provide the following:

  - **Key:** `prompt-evaluation`
  - **Value:** `true`
  *(Note: The prefix `ext.ai.sap.com/` is automatically pre-filled in the UI.)*

  ![img](img/image_21.png)

- Review all entered details carefully.
- Click **“Add”** to complete the artifact registration.

[OPTION END]

[OPTION BEGIN [Python]]

After creating the secret, place your evaluation files in a local folder (this tutorial uses `../DATASET`), upload them to S3 under a `testdata` prefix, and register them as an artifact in AI Core.

#### **Upload Files to S3 Bucket**
```python
# Upload these files to the object store so they can be registered as an artifact inside AI Core

import boto3
import os
import uuid

def upload_folder_to_s3(folder_path, bucket_name, s3_prefix=""):
    """
    Upload a folder to an S3 bucket recursively.

    :param folder_path: The local folder path to upload.
    :param bucket_name: The name of the S3 bucket.
    :param s3_prefix: Optional prefix to use for the S3 keys (e.g., subfolder in the bucket).
+ """ + s3_client = boto3.client( + 's3', + aws_access_key_id=AWS_ACCESS_KEY, + aws_secret_access_key=AWS_SECRET_ACCESS_KEY, + region_name=AWS_REGION + ) + + for root, dirs, files in os.walk(folder_path): + for file_name in files: + print("val of root is ", file_name) + local_path = os.path.join(root, file_name) + # Compute the relative path for the S3 key + relative_path = os.path.relpath(local_path, folder_path) + s3_key = os.path.join(s3_prefix, relative_path).replace("\\", "/") # Ensure S3-compatible paths + print("val of s3 key is ", s3_key) + print(f"Uploading {local_path} to s3://{bucket_name}/{s3_key}") + + # Upload the file + s3_client.upload_file(local_path, bucket_name, s3_key) + +# Example usage +folder_to_upload_testdata = "../DATASET" +user_directory_prefix = "" # replace with your i-number as string here +prefix_guid = user_directory_prefix if user_directory_prefix is not None else str(uuid.uuid4().hex) +s3_testdata_prefix = f"genaiEvaluation/{prefix_guid}/testdata" # Leave empty for root of the bucket + + +upload_folder_to_s3(folder_to_upload_testdata, AWS_BUCKET_ID, s3_testdata_prefix) +input_artifact_path = f"ai://genai-simplified-notebook/genaiEvaluation/{prefix_guid}" +``` + ![img](img/image_5.png) + +#### **Register Uploaded Files as Artifact in AI Core** + +```Python +import requests +import logging +# Registering the uploaded files from AWS as artifacts to use inside configuration. 
def register_artifact():
    headers = _get_headers()

    GET_ARTIFACTS_ENDPOINT = '/v2/lm/artifacts'
    request_url = f"{AICORE_BASE_URL}{GET_ARTIFACTS_ENDPOINT}"

    request_body = {
        "labels": [
            {
                "key": "ext.ai.sap.com/prompt-evaluation",
                "value": "true"
            }
        ],
        "name": "genai-eval-simplified-test-data",
        "kind": "other",
        "url": input_artifact_path,  # input artifact path
        "description": "demo artifacts for evaluation flow.",
        "scenarioId": "genai-evaluations"
    }
    try:
        response = requests.post(
            request_url, headers=headers, data=json.dumps(request_body), timeout=120
        )
        result = response.json()
        print(result)
        return result['id']
    except Exception:
        print("Error occurred while attempting to register the artifact")
        raise

artifact_id = register_artifact()
```
![img](img/image_6.png)

[OPTION END]

[OPTION BEGIN [Bruno]]

Before registering a dataset artifact in Bruno, you must upload your CSV file to the SAP AI Core object store using the Dataset API.
Bruno cannot upload files directly to S3; therefore, this step is required.

**Prerequisites**

  - An object store secret must already exist in your resource group. Typically, this is the default secret named **default**.

  - The Dataset API currently supports:

    - S3 object stores only

    - CSV file uploads

**Upload Your Dataset**

Use the Dataset API – Upload File request in Bruno:

```bash
PUT {{ai_api_url}}/v2/lm/dataset/files/{{secretName}}/{{datasetPath}}
```

**Headers**

```
Authorization: Bearer {{token}}
AI-Resource-Group: {{resourceGroup}}
Content-Type: text/csv
```

**Body**

Upload your .csv file directly as binary in Bruno’s Body.

Example Path Values:

  - secretName: default

  - datasetPath: testdata/medicalqna_dataset.csv

![img](img/image_br_dt.png)

**Note:**

Save the ai://… URL; you will use it when creating the dataset artifact.
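For reference, the `ai://` URL follows directly from the object store secret name and the upload path. A tiny hypothetical helper (not part of any SDK) makes the relationship explicit:

```python
def build_artifact_url(secret_name, dataset_path):
    """Compose the ai:// artifact URL from the object store secret name
    and the path used during upload via the Dataset API."""
    return f"ai://{secret_name}/{dataset_path.lstrip('/')}"

print(build_artifact_url("default", "testdata/medicalqna_dataset.csv"))
# ai://default/testdata/medicalqna_dataset.csv
```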
+ +**Register the Dataset Artifact** + +- Click on **Register artifact** under lm -> artifacts in bruno collection to register the artifact + +```CODE +{ + "name": "aiconfig", + "kind": "dataset", + "url": "ai://default/testdata/medicalqna_dataset.csv", + "scenarioId": "genai-evaluations" +} +``` +![img](img/image-br02.png) + +[OPTION END] + +### Approach Selection – How to Provide Prompts (Read-Up) + +In this evaluation workflow, prompts can be provided in two different ways. +Before proceeding, understand the available approaches and choose the one that fits your requirement. + +**🔹 Option 1 – Prompt Template + Model (Prompt Registry)** + + - The prompt is stored in the Prompt Registry + + - The model is referenced directly in the evaluation configuration + + - Prompts are reusable and version-controlled + + - Best suited for standardized or production-grade workflows + +**📌 When to use this?** + +If you want reusable, versioned prompts that can be managed independently. + +👉 If you would like to see this approach in action, refer to the [Evaluation Quickstart tutorial](LINK TO ADD), where we demonstrate the Prompt Registry method. + +**🔹 Option 2 – Orchestration Registry (Inline Prompt)** + + - The prompt is defined directly inside the orchestration configuration + + - No separate prompt registry entry is required + + - Ideal for ad-hoc, experimental, or one-time evaluations + +**📌 When to use this?** + +If the prompt is specific to this evaluation and does not need reuse or versioning. + +### Create a Prompt Template in Orchestration Registry + +In this tutorial, we will use the **Orchestration Registry (Inline Prompt)** approach. 
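Inline prompts use the `{{?name}}` placeholder syntax. The substitution itself is performed server-side by the orchestration service at evaluation time; the following standalone sketch (a hypothetical helper, not part of any SAP SDK) only illustrates how a placeholder such as `{{?question}}` is resolved against a dataset column:

```python
import re

def render_prompt(template, **values):
    """Resolve {{?name}} placeholders against the given values.
    Illustration only: the orchestration service does this server-side."""
    def replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"No value supplied for placeholder '{name}'")
        return values[name]
    return re.sub(r"\{\{\?(\w+)\}\}", replace, template)

rendered = render_prompt(
    "List the benefits and side effects of the drug in the following "
    "consumer health question: {{?question}}.",
    question="What are the side effects of ibuprofen?",
)
print(rendered)
```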
**Create Orchestration Registry Configuration**

[OPTION BEGIN [SAP AI Launchpad]]

Go to Generative AI Hub → Orchestration → Orchestration Configurations.

- Click **Create**.

- In the templating section, add the prompt:

```
List the benefits and side effects of the drug in the following consumer health question: {{?question}}.
```
![img](img/image_ail_or1.png)

- Select the model in the model configuration and save the orchestration configuration.

![img](img/image_ail_or2.png)

![img](img/image_ail_or3.png)

[OPTION END]

[OPTION BEGIN [Python]]

The following code defines a function `create_orchestration_registry_config()` that creates a new **Orchestration Configuration** in the **Orchestration Registry**.

```python
def create_orchestration_registry_config():
    headers = _get_headers()
    prompt_template = {
        "template": [
            {
                "role": "user",
                "content": "List the benefits and side effects of the drug in the following consumer health question: {{?question}}."
            }
        ]
    }
    CREATE_ORCHESTRATION_REGISTRY = '/v2/registry/v2/orchestrationConfigs'
    request_url = f"{AICORE_BASE_URL}{CREATE_ORCHESTRATION_REGISTRY}"
    # selected_models_str is set in the "Providing Models and Metrics for Evaluation" step
    model_name, model_version = selected_models_str.split(":")
    request_body = {
        "name": "genai-eval-test",
        "version": "1.0.0",
        "scenario": "genai-evaluations",
        "spec": {
            "modules": {
                "prompt_templating": {
                    "model": {
                        "name": model_name,
                        "version": model_version
                    },
                    "prompt": prompt_template
                }
            }
        }
    }
    try:
        response = requests.post(
            request_url, headers=headers, data=json.dumps(request_body), timeout=120
        )
        if response.status_code != 200:
            print(response.json())
            raise RuntimeError(f"Unexpected status code: {response.status_code}")
        result = response.json()
        print(result)
        return result['id']
    except Exception:
        logging.error("Error occurred while attempting to create an orchestration configuration")
        raise

orchestration_registry_id = create_orchestration_registry_config()
```

![img](img/image_py_or1.png)

**Note**: If you wish to use an existing orchestration config,
skip executing this cell and add the orchestration config id in `orchestration_registry_id` string in the next cell. + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +You can paste this directly into a Bruno .bru file or create a new request inside Bruno. + +**Url:** +```bash +POST {{AICORE_BASE_URL}}/v2/registry/v2/orchestrationConfigs +``` + +**headers:** +``` +{ + Authorization: Bearer {{token}} + AI-Resource-Group: {{resource_group}} + Content-Type: application/json + } +``` + +**body:** +```json +{ + "name": "genai-eval-test", + "version": "1.0.0", + "scenario": "genai-evaluations", + "spec": { + "modules": { + "prompt_templating": { + "model": { + "name": "model_name", + "version": "model_version" + }, + "prompt": { + "template": [ + { + "role": "user", + "content": "List the benefits and side effects of the drug in the following consumer health question: {{?question}}." + } + ], + "defaults": {} + } + } + } + } +} +``` + +![img](img/image_br_or1.png) + +[OPTION END] + +### Understanding Metrics (Pre-Read) + +Metrics determine how your model outputs are evaluated during an evaluation run. They define the scoring logic that SAP AI Core uses to compare models, measure quality, and validate improvements over time. + +In SAP AI Core, metrics are configured during the **Create Evaluation Configuration** step: + +```json +"metrics": "Content Filter on Input,Pointwise Instruction Following,Content Filter on Output" +``` + +You can specify one or multiple metrics (comma-separated). + +#### Types of Metrics + +SAP AI Core supports two major types: + +1. System-defined Metrics (Ready to use) + +2. Custom Metrics (User-defined) + + +**1. System-defined Metrics** + +These are built-in metrics provided by SAP AI Core. No additional setup required. + +They are grouped into two categories: + +**Computed Metrics** + +These use reference data, schema validation, or deterministic logic. 
| Name | Description | Reference required |
|------|-------------|--------------------|
| BERT Score | https://huggingface.co/spaces/evaluate-metric/bertscore | Yes |
| BLEU | https://huggingface.co/spaces/evaluate-metric/bleu | Yes |
| ROUGE | https://huggingface.co/spaces/evaluate-metric/rouge | Yes |
| JSON Schema Match | Validates the LLM-generated response against a predefined JSON schema and returns a boolean result | Yes |
| Content Filter on Input | Whether the orchestration input was rejected by the input filter | No |
| Content Filter on Output | Whether the orchestration output was rejected by the output filter | No |
| Exact Match | Whether the output exactly matches the reference | Yes |
| Language Match | Returns true/false to indicate whether the text matches the given language | No |

👉 Use computed metrics when:

  - You have ground truth/reference answers

  - You need deterministic validation

  - You want schema validation

**Model-as-a-judge metrics**

These use a judge LLM to evaluate responses qualitatively.

| Name | Description | Reference required |
|------|-------------|--------------------|
| Pointwise Instruction Following | Assesses the model's ability to follow instructions provided in the user prompt | No |
| Pointwise Correctness | Assesses the model's ability to provide a correct response based on the user prompt | Yes |
| Pointwise Answer Relevance | Assesses whether the model's response is related to the user prompt | No |
| Pointwise Conciseness | Assesses whether the model's response is a short and concise answer to the user prompt | No |

*Entries marked with an asterisk (\*) are experimental metrics.*
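To make the contrast with computed metrics concrete: a deterministic metric such as Exact Match boils down to a plain comparison, with no judge model involved. This is a minimal illustration only; the service's actual normalization rules are not documented here, and the whitespace trimming below is an assumption:

```python
def exact_match(output, reference):
    """Deterministic check: does the model output exactly match the reference?
    Whitespace trimming is an assumption for illustration."""
    return output.strip() == reference.strip()

exact_match("Take with food.", "Take with food.")  # True
exact_match("Take with food.", "Avoid alcohol.")   # False
```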
👉 Use model-as-a-judge metrics when:

  - You need qualitative evaluation

  - No exact ground truth exists

  - You want human-like evaluation logic

#### Custom Metrics (User-defined metrics)

When system metrics are insufficient, you can define your own metric.

Custom metrics can be used to evaluate LLM outputs according to the unique needs of a use case. A user-defined LLM-as-a-judge metric uses a judge LLM along with a rubric to compute a metric rating. The output of an LLM-as-a-judge metric can be numeric or text.

The system defines a structure for the judge prompts, and users provide the metric definition in the predefined format. Relevant instructions, such as output instructions, are automatically added to ensure the desired output from the LLM.

**Custom Metric Definition Structure**

```json
{
  "scenario": "genai-evaluations",
  "metricName": "my_custom_metric",
  "version": "0.0.1",
  "type": "structured",
  "model_configuration": {
    "model_name": "string",
    "model_version": "string"
  },
  "prompt_configuration": {
    "evaluation_task": "Describe the goal of this evaluation.",
    "criteria": "Explain how evaluation is performed.",
    "rating_rubric": [
      {
        "rating": 1,
        "rule": "Poor quality response"
      },
      {
        "rating": 5,
        "rule": "Excellent response"
      }
    ],
    "include_properties": ["prompt", "reference"],
    "examples": [
      {
        "prompt": "Sample prompt",
        "response": "Sample response",
        "reference": "Expected answer",
        "rating": 5,
        "explanation": "Why this rating was given"
      }
    ]
  }
}
```
**NOTE**: "scenario", "metricName", and "version" are required parameters for the custom metric in the evaluation configuration.

**NOTE**: You must provide at least one prompt; a system prompt, a user prompt, or both can be provided.
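Because `scenario`, `metricName`, and `version` are mandatory when referencing a custom metric, it can be worth checking a definition locally before submitting it. A small hypothetical helper (not part of the SAP SDK) based on the note above:

```python
def missing_required_fields(metric):
    """Return the required identifier fields absent from a custom metric definition."""
    required = ("scenario", "metricName", "version")
    return [field for field in required if not metric.get(field)]

missing_required_fields({"scenario": "genai-evaluations", "metricName": "my_custom_metric"})
# ["version"]
```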
**Model Availability Notice**

⚠️ If gpt-4.1 (2025-04-14) is not available in your region:

  - LLM-as-a-Judge metrics cannot be executed

  - The evaluation service depends on this specific model version


### Providing Models and Metrics for Evaluation

Metrics must be supplied before creating an Evaluation Configuration.

[OPTION BEGIN [SAP AI Launchpad]]

In SAP AI Launchpad, metrics are selected visually during the Evaluation Configuration creation flow.

You can choose:

  - System-defined metrics

  - Custom metrics (your own definitions stored in the metric registry; these cannot be created directly in AI Launchpad, so register them via the API or Bruno as described in this step, then select them in the Evaluation Configuration)

No manual JSON input is needed; the UI provides a selectable list of available metrics.

1. Go to Generative AI Hub → Optimization.

2. Click Create to start a new evaluation configuration.

![img](img/image_25.png)

- In the Select Test Input section,

  - Select the orchestration configuration

  - Select your registered dataset artifact

  - Enter the dataset path (example):
    testdata/medicalqna_dataset.csv

  - Set the number of test samples (e.g., 20)

  ![img](img/image_26.png)

- Click **Next** to go to Metrics selection.

#### Select Evaluation Metrics

Choose the metrics you want to evaluate.
+ +You may choose one or multiple system-defined or custom metrics—examples: + + - BERT Score + + - Content Filter on Input + + - Pointwise Instruction Following + + - Content Filter on Output + +![img](img/image_27.png) + +--- + +> 📘 **Helpful Resources**: +> +> - [System-Defined Evaluation Metrics – SAP Documentation](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/system-defined-evaluation-metrics) +> - [Define Your Own Custom Metrics – SAP Guide](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/custom-metrics) +> *(If your evaluation requires domain-specific or advanced scoring logic)* + +> **Note: You may select additional metrics based on your use case.** + +--- + +[OPTION END] + +[OPTION BEGIN [Python]] + +**Select your Models** + +Add the models you wish to use in the string `selected_models_str` + +```Python +# Manual selection of models +selected_models_str="gemini-2.5-pro:001" +print("Selected models string:", selected_models_str) +``` + +**Metrics Handling in Python Notebook** + +When running the evaluation through the Python notebook, metric setup is partially automated. +Before the evaluation configuration is created, the script performs the following: + + - Users can manually specify metric IDs + + - Or can pass custom metrics JSON directly + + - It checks if each metric already exists in AI Core + + - If not found → creates it automatically + + - Prints final list of metric IDs used for evaluation + +This ensures all metrics exist before the evaluation configuration is created. 
+ +```Python +user_metric_ids = "d18******************d1f,dbf56**********210c7e771" + +custom_metric_list = [ + { + "name": "test-metric", + "scenario": "genai-evaluations-test", + "version": "0.0.1", + "evaluationMethod": "llm-as-a-judge", + "managedBy": "imperative", + "systemPredefined": False, + "metricType": "evaluation", + "spec": { + "outputType": "numerical", + "promptType": "structured", + "configuration": { + "modelConfiguration": { + "name": "gpt-4.1-mini", + "version": "2025-08-07", + "parameters": [ + { + "key": "max_tokens", + "value": "10000" + } + ] + }, + "promptConfiguration": { + "definition": "You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\n\n## Metric Definition\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\n\n## Criteria\nGroundedness: The of information between a response generated by AI models and provided reference.\n\n## Rating Rubric\n5: (Fully grounded). The response and the reference are fully overlapped.\n4: (Mostly grounded). The response and the reference are mostly overlapped.\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\n2: (Poorly grounded). The response and the reference are slightly overlapped.\n1: (Not grounded). There is no overlap between the response and the reference.\n\n## Evaluation Steps\nSTEP 1: Assess the response in aspects of Groundedness. 
Identify any information in the response and provide assessment according to the Criteria.\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\n\nReference: {{?reference}}\nResponse: {{?aicore_llm_completion}}\n\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and outputs STRICTLY following this JSON format:\n\n{ \"explanation\": string, \"rating\": integer }\n\nOutput:\n", + "evaluationTask": "You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\n\n## Metric Definition\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\n\n## Criteria\nGroundedness: The of information between a response generated by AI models and provided reference.\n\n## Rating Rubric\n5: (Fully grounded). The response and the reference are fully overlapped.\n4: (Mostly grounded). The response and the reference are mostly overlapped.\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\n2: (Poorly grounded). The response and the reference are slightly overlapped.\n1: (Not grounded). There is no overlap between the response and the reference.\n\n## Evaluation Steps\nSTEP 1: Assess the response in aspects of Groundedness. 
Identify any information in the response and provide assessment according to the Criteria.\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\n\nReference: {{?reference}}\nResponse: {{?aicore_llm_completion}}\n\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and outputs STRICTLY following this JSON format:\n\n{ \"explanation\": string, \"rating\": integer }\n\nOutput:\n", + "criteria": "You should strictly follow the instruction given to you. Please act as an impartial judge and evaluate the quality of the responses based on the prompt and following criteria:", + "ratingRubric": [ + { + "rating": 3, + "rule": "Response is completely factual with no unsupported claims" + }, + { + "rating": 2, + "rule": "Response has minor inaccuracies but no major contradictions" + }, + { + "rating": 1, + "rule": "Response contains significant factual errors or hallucinations" + } + ] + } + } + } + } +] +``` + +```python +import os +import json +import requests + + +# --- Fetch all metrics from SAP AI Core --- +def fetch_all_metrics(): + request_url = f"{AICORE_BASE_URL}/v2/lm/evaluationMetrics" + resp = requests.get(request_url, headers=_get_headers()) + resp.raise_for_status() + return resp.json().get("resources", []) + +# --- Create or fetch a metric --- +def create_or_get_metric(custom_metric, user_metric_id=None): + all_metrics = fetch_all_metrics() + + # 1️⃣ User-supplied ID lookup + if user_metric_id: + for m in all_metrics: + if m.get("id") == user_metric_id: + print(f"✅ Metric already exists by ID: {user_metric_id}") + return user_metric_id + print(f"⚠️ User metric ID {user_metric_id} not found, will only include if valid later") + + # 2️⃣ Check by scenario, name, version + scenario = custom_metric.get("scenario") + name = custom_metric.get("name") + version = custom_metric.get("version") 
+ if not all([scenario, name, version]): + raise ValueError("Metric must include 'scenario', 'name', and 'version'") + + for m in all_metrics: + if (m.get("scenario") == scenario and + m.get("name") == name and + m.get("version") == version): + metric_id = m.get("id") + print(f"✅ Metric already exists: {scenario}/{name} v{version}, ID = {metric_id}") + return metric_id + + # 3️⃣ Create metric if not found + request_url = f"{AICORE_BASE_URL}/v2/lm/evaluationMetrics" + required_fields = ["scenario", "name", "version", "evaluationMethod", "metricType"] + for f in required_fields: + if f not in custom_metric: + raise ValueError(f"❌ Missing required field: {f}") + + resp = requests.post(request_url, headers=_get_headers(), json=custom_metric) + resp.raise_for_status() + metric_id = resp.json().get("id") + print(f"✅ Metric created successfully: {name} v{version}, ID = {metric_id}") + return metric_id + +# --- Main pipeline --- + +# 1️⃣ Create/fetch metrics from SAP AI Core +metric_ids = [] +for metric in custom_metric_list: + try: + print(f"metric:{metric}") + metric_id = create_or_get_metric(metric) + metric_ids.append(metric_id) + except ValueError as e: + print(f"Skipping metric due to error: {e}") + +# 2️⃣ Validate user_metric_ids separately if provided +if user_metric_ids and user_metric_ids.strip(): + all_metrics = fetch_all_metrics() + # Split comma-separated IDs and strip whitespace + for uid in [uid.strip() for uid in user_metric_ids.split(",")]: + if any(m.get("id") == uid for m in all_metrics): + metric_ids.append(uid) + else: + print(f"⚠️ User metric ID {uid} does not exist in AI Core, skipping.") +# 3️⃣ Convert to comma-separated string +custom_metric_ids_str = ",".join(metric_ids) +print("✅ All processed metric IDs:", custom_metric_ids_str) +``` +![img](img/image_py03.png) + +This ensures all required metrics are available before launching the evaluation. 
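
Because user-supplied IDs are appended after the created ones, the final comma-separated string can end up with duplicates or stray blanks. A small, self-contained sketch of a deduplicating merge (the helper name `merge_metric_ids` is illustrative, not part of any SDK):

```python
def merge_metric_ids(created_ids, user_ids_csv=""):
    """Merge created metric IDs with a comma-separated string of user-supplied
    IDs, preserving order and dropping blanks and duplicates."""
    seen, merged = set(), []
    for mid in list(created_ids) + [u.strip() for u in (user_ids_csv or "").split(",")]:
        if mid and mid not in seen:
            seen.add(mid)
            merged.append(mid)
    return ",".join(merged)

# Example: the duplicate "m-2" and the trailing blank entry are dropped
print(merge_metric_ids(["m-1", "m-2"], "m-2, m-3, "))  # → m-1,m-2,m-3
```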
+
+[OPTION END]
+
+[OPTION BEGIN [Bruno]]
+
+Bruno supports two ways of providing metrics:
+
+**Use System-Defined Metrics**
+
+You can pass system metrics directly in your configuration:
+
+Example:
+
+```json
+"metrics": "Pointwise Answer Relevance, Pointwise Instruction Following"
+```
+
+**Register Custom Metrics**
+
+If you want to register custom metrics, you must call:
+
+➡️ **Create Custom Metric**
+
+```bash
+POST {{ai_api_url}}/v2/lm/evaluationMetrics
+```
+**Body example:**
+
+```json
+{
+  "name": "test-metric",
+  "scenario": "genai-evaluations-test",
+  "version": "0.0.1",
+  "evaluationMethod": "llm-as-a-judge",
+  "managedBy": "imperative",
+  "metricType": "evaluation",
+  "spec": {
+    "outputType": "numerical",
+    "promptType": "structured",
+    "configuration": {
+      "modelConfiguration": {
+        "name": "gpt-4.1-mini",
+        "version": "2025-08-07",
+        "parameters": [
+          {
+            "key": "max_tokens",
+            "value": "10000"
+          }
+        ]
+      },
+      "promptConfiguration": {
+        "definition": "You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\n\n## Metric Definition\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\n\n## Criteria\nGroundedness: The overlap of information between a response generated by AI models and the provided reference.\n\n## Rating Rubric\n5: (Fully grounded). The response and the reference are fully overlapped.\n4: (Mostly grounded). The response and the reference are mostly overlapped.\n3: (Somewhat grounded). 
The response and the reference are somewhat overlapped.\n2: (Poorly grounded). The response and the reference are slightly overlapped.\n1: (Not grounded). There is no overlap between the response and the reference.\n\n## Evaluation Steps\nSTEP 1: Assess the response in aspects of Groundedness. Identify any information in the response and provide assessment according to the Criteria.\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\n\nReference: {{?reference}}\nResponse: {{?aicore_llm_completion}}\n\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and output STRICTLY following this JSON format:\n\n{ \"explanation\": string, \"rating\": integer }\n\nOutput:\n",
+        "evaluationTask": "You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\n\n## Metric Definition\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\n\n## Criteria\nGroundedness: The overlap of information between a response generated by AI models and the provided reference.\n\n## Rating Rubric\n5: (Fully grounded). The response and the reference are fully overlapped.\n4: (Mostly grounded). The response and the reference are mostly overlapped.\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\n2: (Poorly grounded). 
The response and the reference are slightly overlapped.\n1: (Not grounded). There is no overlap between the response and the reference.\n\n## Evaluation Steps\nSTEP 1: Assess the response in aspects of Groundedness. Identify any information in the response and provide assessment according to the Criteria.\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\n\nReference: {{?reference}}\nResponse: {{?aicore_llm_completion}}\n\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and output STRICTLY following this JSON format:\n\n{ \"explanation\": string, \"rating\": integer }\n\nOutput:\n",
+        "criteria": "You should strictly follow the instruction given to you. Please act as an impartial judge and evaluate the quality of the responses based on the prompt and the following criteria:",
+        "ratingRubric": [
+          {
+            "rating": 3,
+            "rule": "Response is completely factual with no unsupported claims"
+          },
+          {
+            "rating": 2,
+            "rule": "Response has minor inaccuracies but no major contradictions"
+          },
+          {
+            "rating": 1,
+            "rule": "Response contains significant factual errors or hallucinations"
+          }
+        ]
+      }
+    }
+  }
+}
+```
+![img](img/image_br_mtrs.png)
+
+You will receive:
+
+```json
+"id": ""
+```
+
+This metric ID can be passed directly into the evaluation configuration.
+
+[OPTION END]
+
+**Note**
+
+To evaluate and compare multiple models in a single execution, you must create a distinct orchestration registry ID for each model you wish to test. Assign a different foundation model to each registry ID, and then pass this list of registry IDs into your evaluation configuration. This ensures the system generates separate, comparable runs for each model simultaneously.
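
The fan-out described in the note can be sketched as a simple loop. Everything here is illustrative: `create_orchestration_registry` is a hypothetical placeholder for whichever registry-creation call you actually use (Launchpad, REST API, or SDK), and the model names are examples; only the one-registry-per-model pattern is the point:

```python
# Hypothetical placeholder: replace the body with your actual registry-creation
# call, which should return the new orchestration registry ID.
def create_orchestration_registry(model_name, model_version):
    return f"registry-{model_name}-{model_version}"

# Example models to compare (names and versions are illustrative)
models_to_compare = [
    ("gpt-4.1-mini", "latest"),
    ("claude-3.5-sonnet", "latest"),
]

# One registry ID per model; the joined list goes into the evaluation configuration
registry_ids = [create_orchestration_registry(n, v) for n, v in models_to_compare]
orchestration_registry_ids = ",".join(registry_ids)
print(orchestration_registry_ids)
```

Passing this comma-separated list as `orchestrationRegistryIds` yields one comparable run per model in a single execution.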
+
+### Define and Create Evaluation Configurations
+
+[OPTION BEGIN [SAP AI Launchpad]]
+
+Once your dataset artifact is registered and you have finished creating the Orchestration Registry, the next step is to create an Evaluation Configuration.
+
+An Evaluation Configuration tells SAP AI Core:
+
+  - which dataset to evaluate
+
+  - which prompt/model or orchestration config to use
+
+  - which metrics to compute
+
+  - which orchestration deployment endpoint to call
+
+  - how many repetitions to run
+
+  - which test dataset file to load
+
+This configuration becomes the blueprint for your evaluation execution.
+
+**Steps to Create Evaluation Configuration**
+
+Under **Additional Configuration**:
+
+- Set **Number of Repetitions** to `1`.
+- Choose an existing deployment for **Orchestration Endpoint**.
+
+  ![img](img/image_29.png)
+---
+
+#### Final Review & Start
+
+- Review all the details on the summary page.
+- Once confirmed, click **Create** to start the evaluation job.
+
+![img](img/image_40.png)
+
+> ✅ You have now successfully configured and triggered a Generative AI Evaluation.
+
+[OPTION END]
+
+[OPTION BEGIN [Python]]
+
+When using the Python notebook, the evaluation configuration is created automatically based on your selections.
+Before creating the configuration, the notebook will:
+
+  - Load the dataset artifact ID
+
+  - Resolve metric IDs
+
+  - Load orchestration registry IDs
+
+  - Validate all required parameters
+
+**Sample parameter setup:**
+
+```Python
+import json
+test_data_path = f"testdata/{DATASET_NAME}" # specify the test data path here. 
For the full folder just specifying testdata will work +test_datasets = json.dumps({'path': test_data_path, 'type': 'csv'}) +metrics_list = ",".join([selected_metrics_str,custom_metric_ids_str]) +models_list = selected_models_str +print(f"Selected metrics: {metrics_list}") +print(f"Selected models: {models_list}") +orchestration_deployment_url = deployment_url +repetitions = "1" +``` + +#### Create Configuration Body + +The notebook builds the configuration using the required SAP AI Core fields: + + - scenarioId + + - executableId + + - dataset artifact binding + + - selected metrics + + - test dataset details + + - repetitions + + - orchestration deployment URL + + - orchestrationRegistryIds + + - models. + +The following function dynamically creates the configuration body for AI Core. + +```Python +# creating an AICORE Configuration. +import requests + +request_body = { + "name": "genai-eval-conf", + "scenarioId": "genai-evaluations", + "executableId": "genai-evaluations-simplified", + "inputArtifactBindings": [ + { + "key": "datasetFolder", + "artifactId": "e30ef8d7-c3e1-4b9c-a834-a00ac0a9a053" + } + ], + "parameterBindings": [ + { + "key": "repetitions", + "value": repetitions + }, + { + "key": "orchestrationDeploymentURL", + "value": orchestration_deployment_url + }, + { + "key": "metrics", + "value": metrics_list + }, + { + "key": "testDataset", + "value": test_datasets + }, + { + "key": "orchestrationRegistryIds", + "value": orchestration_registry_id + } + ] +} + +def create_aicore_configuration(): + headers = _get_headers() + GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations' + request_url = f"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}" + try: + response = requests.post( + request_url, headers=headers, data=json.dumps(request_body), timeout=120 + ) + print(response) + if(response.status_code != 201): + raise + result = response.json() + print(result) + return result['id'] + except: + logging.error("Error occurred while attempting to create a 
Configuration") + raise + +configuration_id = create_aicore_configuration() +``` + +You will receive a configuration ID, which is required for the next step (Execution). + +![img](img/image_py_con.png) + +SAP AI Core returns a configuration ID, which is used to trigger the evaluation execution. + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +When creating an Evaluation Configuration through Bruno, you call: + +```bash +POST {{api_url}}/v2/lm/configurations +``` + +Below is the sample request body to create configuration. + +```json +{ + "name": "genai-eval-conf", + "scenarioId": "genai-evaluations", + "executableId": "genai-evaluations-simplified", + "inputArtifactBindings": [ + { + "key": "datasetFolder", + "artifactId": "{{artifactId}}" + } + ], + "parameterBindings": [ + { + "key": "repetitions", + "value": "1" + }, + { + "key": "orchestrationDeploymentURL", + "value": "{{deployment_url}}" + }, + { + "key": "metrics", + "value": "BERT Score, Pointwise Conciseness" + }, + { + "key": "testDataset", + "value": "{\"path\": \"testdata/{{dataset_file}}\", \"type\": \"csv\"}" + }, + { + "key": "orchestrationRegistryIds", + "value": "{{orchestrationRegistryIds}}" + }, + { + "key": "models", + "value": "{{model_name}}:{{model_version}}" + } + ] +} +``` +![img](img/image-br03.png) + +[OPTION END] + +### Create and Run Evaluation Execution + +After creating the Evaluation Configuration, the next step is to execute it. + +Execution triggers the evaluation workflow, which: + + - Reads the test dataset + + - Generates submissions to the orchestration service + + - Collects model outputs + + - Computes all selected metrics + + - Produces aggregate and raw evaluation results + +The process is identical for SAP AI Launchpad, Python, and Bruno, with only the invocation method differing. + +[OPTION BEGIN [SAP AI Launchpad]] + +- Once the evaluation configuration is created, the system automatically triggers an evaluation execution. 
+ +- Follow these steps to monitor its progress and verify completion: + + - Navigate to **ML Operations** in the SAP AI Core Launchpad. + + - In the sidebar, click **Executions**. + + ![img](img/image_41.png) + + - Locate the most recent execution triggered by your evaluation configuration. You can use the timestamp or configuration name to identify it. + + - Click on the execution entry to open its details. The Current Status will update as the process runs. + + ![img](img/image_31.png) + +- Once the Target Status reaches **COMPLETED** , your evaluation has successfully finished. + +> [For More information](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/create-evaluation) + +Track Execution Status + +The execution page will show: + + - Unknown + + - Pending + + - Running + + - Completed + +Once completed, you can navigate to: + + - Outputs → Tracking Metrics (aggregate results) + + - Output Artifacts (raw results stored in the SQLite DB) + +[OPTION END] + +[OPTION BEGIN [Python]] + +Once the configuration is ready, the next step is to trigger an execution. +An execution is a single evaluation run based on the configuration you defined. + +**Create Execution** + +The following function starts the evaluation in SAP AI Core using the configuration ID: + +```python +# create an execution with the created configuration. 
+
+import requests
+def create_execution():
+    headers = _get_headers()
+    GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions'
+    request_url = f"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}"
+    request_body = {"configurationId" : configuration_id}
+    try:
+        response = requests.post(
+            request_url, headers=headers, data=json.dumps(request_body), timeout=120
+        )
+        print("response received is ", response)
+        result = response.json()
+        print(result)
+        return result['id']
+    except:
+        logging.error("Error occurred while attempting to create an execution")
+        raise
+
+
+execution_id = create_execution()
+```
+![img](img/image_44.png)
+
+#### Monitor Execution Status
+
+The execution progresses through states:
+
+UNKNOWN → PENDING → RUNNING → COMPLETED
+
+```python
+# get execution status
+import requests
+def get_execution_status(execution_id):
+    headers = _get_headers()
+    LOG_EXECUTIONS_ENDPOINT = f'/v2/lm/executions/{execution_id}'
+    request_url = f"{AICORE_BASE_URL}{LOG_EXECUTIONS_ENDPOINT}"
+    try:
+        response = requests.get(
+            request_url, headers=headers, timeout=120
+        )
+        print("response received is ", response)
+        result = response.json()
+        return result
+    except:
+        logging.error("Error occurred while attempting to get execution status")
+        raise
+
+
+get_execution_status(execution_id)
+```
+
+#### Automatic Polling
+
+To continuously monitor until the evaluation finishes:
+
+```python
+# Polling the execution status until it is COMPLETED or DEAD or a timeout occurs
+import time
+
+def poll_execution_status(execution_id, timeout_minutes=1800, poll_interval=30):
+    start_time = time.time()
+    while True:
+        result = get_execution_status(execution_id)
+        print(f"Execution Status: {result.get('status')}")
+        if result.get("status") == "COMPLETED":
+            print(f"Execution completed successfully in {time.time() - start_time} seconds, proceed to fetch results.")
+            break
+        if result.get("status") == "DEAD":
+            print(f"Execution failed with status DEAD in {time.time() - start_time} seconds. 
Check the logs for more details.")
+            break
+        if time.time() - start_time > timeout_minutes * 60:
+            raise TimeoutError(f"Execution status polling timed out after {timeout_minutes} minutes.")
+        time.sleep(poll_interval)
+
+```
+
+![img](img/image_45.png)
+
+✅ Once the execution status shows COMPLETED, the evaluation results are available and can be analyzed in the next step.
+
+[OPTION END]
+
+[OPTION BEGIN [Bruno]]
+
+After creating the configuration, the next step is to trigger the evaluation workload by creating an AI Core execution.
+
+**Create an Execution with the Created Configuration**
+
+- Click **Create Execution** under **Executions** and pass the configuration ID created in the previous step.
+
+![img](img/image-br04.png)
+
+- The status field progresses through different states over time:
+UNKNOWN → PENDING → RUNNING → COMPLETED.
+
+**Get Execution Status**
+
+Check the status of the created execution by passing the execution ID. The Current Status will update as the process runs, as shown in the image below.
+
+![img](img/image-br05.png)
+
+[OPTION END]
+
+### View and Analyze Evaluation Results
+
+Once the evaluation execution is complete, SAP AI Core generates both aggregated metrics and detailed instance-level results.
+These results help compare model performance, understand quality metrics, and debug issues.
+
+[OPTION BEGIN [SAP AI Launchpad]]
+
+Once the evaluation workflow execution is completed, this step retrieves the aggregated evaluation metrics from the SAP AI Core service by specifying the run name.
+
+1. Go to **Optimizations**.
+
+2. In the **Runs** section, select the run you created.
+
+3. View detailed results of the run across your selected metrics.
+
+This is the easiest way to visually inspect evaluation outcomes, and you can also compare multiple model runs.
+
+![img](img/image_46_01.png)
+
+- Compare run performance across your selected metrics. Metrics are aggregated at run level. 
+
+![img](img/image_46.png)
+
+![img](img/image_46a.png)
+
+[OPTION END]
+
+[OPTION BEGIN [Python]]
+
+The notebook includes utility scripts to retrieve aggregated metrics, download detailed artifacts, and inspect the SQLite results.
+
+**Retrieve Aggregate Metrics (Tracking API)**
+
+Aggregated metrics summarize performance across all test samples; fetching them by execution ID returns all metric values per evaluated run:
+
+```python
+# Get aggregate metrics using execution id
+import requests
+def retrieve_aggregate_metrics(execution_id):
+    headers = _get_headers()
+    GET_METRICS_ENDPOINT = f'/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={execution_id}'
+    request_url = f"{AICORE_BASE_URL}{GET_METRICS_ENDPOINT}"
+    try:
+        response = requests.get(request_url, headers=headers, timeout=120)
+        print("response received is ", response)
+        result = response.json()
+        return result
+    except:
+        logging.error("Error occurred while attempting to retrieve aggregate metrics for the run")
+        raise
+
+runs_data = retrieve_aggregate_metrics(execution_id)
+```
+![img](img/image_47.png)
+
+**Download Raw Results (Output Artifact)**
+
+All detailed evaluation outputs are stored as an output artifact in your object store. To download all output files programmatically:
+
+```python
+# download the result artifacts from Object store.
+import boto3
+
+def download_all_objects(prefix, destination_folder):
+    """
+    Recursively download all objects from the S3 bucket (AWS_BUCKET_ID) starting with a specific prefix.
+
+    :param prefix: Prefix to filter objects in the bucket.
+    :param destination_folder: Local folder to save the downloaded files. 
+ """ + s3_client = boto3.client( + 's3', + aws_access_key_id=AWS_ACCESS_KEY, + aws_secret_access_key=AWS_SECRET_ACCESS_KEY, + region_name=AWS_REGION + ) + + # Ensure the destination folder exists + if not os.path.exists(destination_folder): + os.makedirs(destination_folder) + + # Paginate through objects + paginator = s3_client.get_paginator('list_objects_v2') + pages = paginator.paginate(Bucket=AWS_BUCKET_ID, Prefix=prefix) + + for page in pages: + if 'Contents' in page: + for obj in page['Contents']: + key = obj['Key'] + local_file_path = os.path.join(destination_folder, os.path.relpath(key, prefix)) + + # Ensure the local directory structure exists + local_directory = os.path.dirname(local_file_path) + if not os.path.exists(local_directory): + os.makedirs(local_directory) + + # Download the object + print(f"Downloading {key} to {local_file_path}") + s3_client.download_file(AWS_BUCKET_ID, key, local_file_path) + + +# Download the evaluation results from the object store. Look at execution status under "outputArtifacts" key to see the 'url' +# which shows the data path of where your output results are stored +EXECUTION_ID = execution_id +sqlite_db_prefix = f'{EXECUTION_ID}/tmp/' # change the prefix based on where your output artifact is stored in the bucket. +destination_folder = 'results-new' + +download_all_objects(sqlite_db_prefix, destination_folder) +``` + +![img](img/image_48.png) + +**View Detailed Results (SQLite DB)** + +The evaluation stores detailed instance-level results in results.db. + +Example: Reading SQLite tables: + +```python +# viewing the results from sqlite db in tabular format.. 
+import sqlite3 +import pandas as pd +from IPython.display import display, HTML + +# Path to your SQLite database file +db_file = 'results-new/results.db' + +connection = sqlite3.connect(db_file) + +# Specify the table names you want to display +table_names = ['run','configuration', 'submission', 'submission_result', 'evaluation_result'] + +# Create the CSS and HTML container +html_content = """ + +
+""" + +for table_name in table_names: + query = f"SELECT * FROM {table_name};" + df = pd.read_sql_query(query, connection) + # If you want to see all the rows across all tables, remove/comment the next line + df = df.head(5) # Limiting the number of rows displayed + table_html = df.to_html(classes='table-container', index=False) + html_content += f""" +
+

Table: {table_name}

+ {table_html} +
+ """ + +html_content += "
" + +display(HTML(html_content)) + +# Close the connection +connection.close() +``` + +![img](img/image_py_rk.png) + +#### Process and Rank Results + +This step generates a leaderboard ranking models by their Win Rate (percentage of pairwise victories), providing a robust, comparative measure of the best-performing model and prompt configuration. + +```Python +import pandas as pd +import numpy as np +import sqlite3 +import json +import os +from IPython.display import display, HTML + +# ========================================== +# 1. CONFIGURATION (Separated Groups) +# ========================================== +METRIC_GROUPS = { + "Categorical": { + "type": "categorical", + "description": "Weighted Average (1-5 scale)", + "metrics": [ + "Pointwise Conciseness", + "Pointwise Instruction Following", + "Pointwise Correctness", + "Pointwise Answer Relevance" + ] + }, + "Boolean": { + "type": "categorical", # Uses same weighted avg logic (0 or 1) + "description": "Pass Rate (0-1 scale)", + "metrics": [ + "Exact Match", + "Content Filter on Input", + "Content Filter on Output", + "Language Match", + "JSON Schema Match" + ] + }, + "Numerical": { + "type": "numerical", + "description": "Mean Value", + "metrics": [ + "BLEU", + "ROUGE", + "BERT Score", + "test-metric" + ] + } +} + +# ========================================== +# 2. 
DATA EXTRACTION +# ========================================== +def extract_db_metadata(db_path): + if not os.path.exists(db_path): return pd.DataFrame() + conn = sqlite3.connect(db_path) + df_runs = pd.read_sql_query("SELECT id, name, tags, config FROM run", conn) + conn.close() + + meta_data = [] + for _, row in df_runs.iterrows(): + run_id = str(row["id"]) + run_name = str(row["name"]) + tags = {} + config = {} + try: tags = json.loads(row["tags"]) if isinstance(row["tags"], str) else row["tags"] + except: pass + try: config = json.loads(row["config"]) if isinstance(row["config"], str) else row["config"] + except: pass + + model = "Unknown" + try: model = config["modules"]["prompt_templating"]["model"]["name"] + except: + if isinstance(tags, dict): model = tags.get("evaluation.ai.sap.com/model", "Unknown") + elif isinstance(tags, list): + for t in tags: + if t.get("key") == "evaluation.ai.sap.com/model": model = t.get("value") + + meta_data.append({"run_id": run_id, "run_name": run_name, "model": model}) + return pd.DataFrame(meta_data) + +def extract_api_metrics(runs_data_resource): + flat_data = [] + for run in runs_data_resource: + model = "Unknown" + for t in run.get("tags", []): + if t.get("name") == "evaluation.ai.sap.com/model": + model = t.get("value") + break + for m in run.get("metrics", []): + clean_name = m.get("name", "").replace('"', '').strip() + flat_data.append({ + "model": model, + "metrics_name_clean": clean_name, + "metric_value": m.get("value") + }) + df = pd.DataFrame(flat_data) + df['metric_value'] = pd.to_numeric(df['metric_value'], errors='coerce') + return df + +# ========================================== +# 3. SCORING & HELM LOGIC +# ========================================== +def calculate_weighted_avg_score(row, cols): + """ Returns a score based on counts. + Categorical: 1-5 scale. + Boolean: 0-1 scale (Pass Rate). 
+ """ + total_score = 0 + total_count = 0 + # Check counts 0-5 (covers Boolean 0/1 and Categorical 1-5) + for rating in range(0, 6): + col_name = next((c for c in cols if f"/{rating}/count" in c), None) + if col_name and not pd.isna(row[col_name]): + count = row[col_name] + total_score += count * rating + total_count += count + return total_score / total_count if total_count > 0 else 0.0 + +def get_metric_score_series(df_metrics, metric_name, group_type): + """ Returns a Series of SCORES (Scalar) for each model for a specific metric """ + subset = df_metrics[df_metrics['metrics_name_clean'].str.startswith(metric_name)] + if subset.empty: return None + + # Pivot to get columns for this metric + pivot = subset.pivot_table(index='model', columns='metrics_name_clean', values='metric_value', aggfunc='first') + cols = pivot.columns.tolist() + + if group_type == "categorical": + # Calculate Weighted Average (or Pass Rate for Boolean) + return pivot.apply(lambda row: calculate_weighted_avg_score(row, cols), axis=1) + else: + # Calculate Mean (Numerical) + c_mean = next((c for c in cols if "mean" in c), None) + if c_mean: return pivot[c_mean] + return None + +def calculate_group_win_rate(score_table): + """ + Calculates HELM Win Rate: % of times a model beats another model across all metrics in this group. + """ + models = score_table.index.tolist() + metrics = score_table.columns.tolist() + win_rates = {} + + for model_a in models: + wins = 0 + comparisons = 0 + + for model_b in models: + if model_a == model_b: continue + + # Compare across ALL metrics in this table + for metric in metrics: + score_a = score_table.at[model_a, metric] + score_b = score_table.at[model_b, metric] + + # Only compare valid scores + if pd.isna(score_a) or pd.isna(score_b): continue + + comparisons += 1 + if score_a > score_b: + wins += 1 + + win_rates[model_a] = wins / comparisons if comparisons > 0 else 0.0 + + return pd.Series(win_rates) + +# ========================================== +# 4. 
EXECUTION +# ========================================== +db_file = 'results-new/results.db' + +# A. Metadata +df_db_meta = extract_db_metadata(db_file) +df_db_unique = df_db_meta.drop_duplicates(subset=['model'], keep='last') + +# B. CSS +html_content = """ + +
+""" +if 'runs_data' in locals() and runs_data: + df_metrics_all = extract_api_metrics(runs_data['resources']) + + for group_name, config in METRIC_GROUPS.items(): + + # 1. Build Score Table + score_table = pd.DataFrame(index=df_db_unique['model'].unique()) + score_table.index.name = 'model' + + valid_metrics = [] + + # 2. Calculate Scores + for metric in config["metrics"]: + scores = get_metric_score_series(df_metrics_all, metric, config["type"]) + if scores is not None: + score_table[metric] = scores + valid_metrics.append(metric) + + if not valid_metrics: + continue + + # 3. Calculate HELM Win Rate (Specific to this group) + score_table['Win Rate'] = calculate_group_win_rate(score_table[valid_metrics]) + + # 4. Calculate Final Rank + score_table['Final Rank'] = score_table['Win Rate'].rank(ascending=False, method='min') + + # 5. Merge & Format + df_final = pd.merge(df_db_unique, score_table, on='model', how='inner') + df_final = df_final.sort_values('Final Rank') + + # Rounding + for c in valid_metrics: df_final[c] = df_final[c].fillna(0.0).astype(float).round(4) + df_final['Win Rate'] = df_final['Win Rate'].fillna(0.0).astype(float).round(4) + df_final['Final Rank'] = df_final['Final Rank'].fillna(0).astype(int) + + # Columns + meta_cols = ['run_id', 'run_name', 'model'] + final_cols = meta_cols + ['Win Rate', 'Final Rank'] + valid_metrics + + # 6. Generate HTML + table_html = df_final[final_cols].to_html(classes='table-container', index=False) + + html_content += f""" +
+

{group_name} Comparison

+

Values: {config['description']}. Win Rate based on head-to-head performance.

+ {table_html} +
+ """ + + html_content += "
" + display(HTML(html_content)) + +else: + print("'runs_data' missing.") +``` +![img](img/image_py_rnk1.png) + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +Retrieve Aggregate Metrics + +Send a GET request: + +**GET** +```bash +{{apiurl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={{execution_id}} +``` +**Retrieve Aggregate Metrics Using Run Name** + +Send a GET request: + +**GET** +```bash +{{apiurl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/run-name={{run_name}} +``` + +This returns aggregated values for: + + - latency + + - token usage + + - metric scores + + - completion count + +**Download Raw Results** + +1. Open the execution details + +2. Copy the output artifact URL + +3. Download the folder to obtain + + - step-wise results + + - sqlite_combined/results.db + +**Inspect Detailed Results** + +Open the SQLite DB in any client to inspect: + + - submissions + + - completion responses + + - evaluation_results (raw metric scores) + + - aggregation_results + + - custom_logs + +![img](img/image_49.png) + +[OPTION END] + +### Delete Evaluation Artifacts and Configurations + +Over time, your workspace may accumulate old configurations, executions, and metrics. +SAP AI Core allows you to safely delete these resources once they are no longer needed. + +This section explains how to delete: + + - Evaluation Executions + + - Evaluation Configurations + +⚠️ Important: + +Deletions are permanent and cannot be undone. + +[OPTION BEGIN [SAP AI Launchpad]] + +**Delete Executions** + +1. Go to ML Operations → Executions + +2. Select the execution + +3. Click Delete + +4. Confirm the deletion + +**Delete Evaluation Configurations** + +1. Go to ML Operations → Configurations + +2. Select the configuration you created + +3. Click Delete + +[OPTION END] + +[OPTION BEGIN [Python]] + +**1. 
Delete an Evaluation Execution**
+
+```python
+# Delete the execution by ID
+def delete_execution():
+    headers = _get_headers()
+    EXEC_ID = execution_id
+    GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions/'
+    request_url = f"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}{EXEC_ID}"
+    try:
+        response = requests.delete(
+            request_url, headers=headers, params={"AI-Resource-Group":AICORE_RESOURCE_GROUP}, timeout=120
+        )
+        print(response)
+        if response.status_code != 202:
+            raise RuntimeError(f"Unexpected status code: {response.status_code}")
+        result = response.json()
+        print(result)
+    except:
+        logging.error("Error occurred while attempting to delete the execution")
+        raise
+
+delete_execution()
+```
+**2. Delete an Evaluation Configuration**
+
+```python
+def delete_configuration(configuration_id):
+    headers = _get_headers()
+    endpoint = f"/v2/lm/configurations/{configuration_id}"
+    url = f"{AICORE_BASE_URL}{endpoint}"
+
+    response = requests.delete(url, headers=headers)
+    print("Status:", response.status_code)
+    print(response.text)
+
+# Example:
+delete_configuration(configuration_id)
+```
+
+[OPTION END]
+
+[OPTION BEGIN [Bruno]]
+
+**1. Delete Execution**
+
+**DELETE Request**
+```bash
+{{apiurl}}/v2/lm/executions/{{execution_id}}
+```
+**Headers:**
+```
+Authorization: Bearer {{access_token}}
+AI-Resource-Group: {{resource_group}}
+```
+**2. 
Delete Configuration** + +```bash +DELETE {{apiurl}}/v2/lm/configurations/{{configuration_id}} +``` + +[OPTION END] diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/evaluation_workflow.ipynb b/tutorials/ai-core-genaihub-evaluation-comprehensive/evaluation_workflow.ipynb new file mode 100644 index 000000000..fb89c71d9 --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-comprehensive/evaluation_workflow.ipynb @@ -0,0 +1,1848 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Generative AI Custom Evaluation\n", + "This example notebook showcases how to use AI Core custom evaluation to benchmark large language models and to evaluate orchestration configurations or prompts for a specific use case.\n", + "It uses the publicly available [MedicationQA dataset](https://langtest.org/docs/pages/benchmarks/medical/medicationqa/), which consists of commonly asked consumer questions about medications. The workload computes industry-standard metrics to assess the reliability of the responses generated by the LLM.\n", + "
**Note: For detailed instructions please refer to the [README](./README.md)**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Setup (Step 1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "! pip install -r ../requirements.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load your environment variables\n", + "\n", + "Ensure that your environment variables are set in a `.env` file (see `sample.env` for an example). If a field is missing, the notebook will prompt you for a value." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Loading the credentials from the env file\n", + "from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient\n", + "from dotenv import load_dotenv\n", + "import os\n", + "\n", + "load_dotenv(override=True)\n", + "\n", + "\n", + "# Fetching environment variables or prompting the user if missing\n", + "AICORE_BASE_URL = os.getenv(\"AICORE_BASE_URL\") or input(\"AICORE_BASE_URL is missing. Please enter it: \")\n", + "AICORE_RESOURCE_GROUP = os.getenv(\"AICORE_RESOURCE_GROUP\") or input(\"AICORE_RESOURCE_GROUP is missing. Please enter it (default: 'default'): \") or \"default\"\n", + "AICORE_AUTH_URL = os.getenv(\"AICORE_AUTH_URL\") or input(\"AICORE_AUTH_URL is missing. Please enter it: \")\n", + "AICORE_CLIENT_ID = os.getenv(\"AICORE_CLIENT_ID\") or input(\"AICORE_CLIENT_ID is missing. Please enter it: \")\n", + "AICORE_CLIENT_SECRET = os.getenv(\"AICORE_CLIENT_SECRET\") or input(\"AICORE_CLIENT_SECRET is missing. Please enter it: \")\n", + "\n", + "AWS_ACCESS_KEY = os.getenv(\"AWS_ACCESS_KEY\") or input(\"AWS_ACCESS_KEY is missing. Please enter it: \")\n", + "AWS_BUCKET_ID = os.getenv(\"AWS_BUCKET_ID\") or input(\"AWS_BUCKET_ID is missing. Please enter it: \")\n", + "AWS_REGION = os.getenv(\"AWS_REGION\") or input(\"AWS_REGION is missing. 
Please enter it: \")\n", + "AWS_SECRET_ACCESS_KEY = os.getenv(\"AWS_SECRET_ACCESS_KEY\") or input(\"AWS_SECRET_ACCESS_KEY is missing. Please enter it: \")\n", + "DEPLOYMENT_URL = os.getenv(\"DEPLOYMENT_URL\", None)\n", + "\n", + "# Initializing the GenAIHubProxyClient\n", + "client = GenAIHubProxyClient(\n", + " base_url=AICORE_BASE_URL,\n", + " auth_url=AICORE_AUTH_URL,\n", + " client_id=AICORE_CLIENT_ID,\n", + " client_secret=AICORE_CLIENT_SECRET,\n", + " resource_group=AICORE_RESOURCE_GROUP\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Dependencies and Helper Functions (Step 2)" + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dataset name: medicalqna_dataset.csv\n" + ] + } + ], + "source": [ + "import os\n", + "import json\n", + "\n", + "\n", + "\n", + "def get_dataset_file_name(folder_path):\n", + " \"\"\"\n", + " Retrieves the name of the first file in the specified folder.\n", + " \"\"\"\n", + " if not os.path.isdir(folder_path):\n", + " print(f\"The folder path '{folder_path}' does not exist.\")\n", + " return None\n", + "\n", + " items_in_folder = os.listdir(folder_path)\n", + "\n", + " for item in items_in_folder:\n", + " item_path = os.path.join(folder_path, item)\n", + " if os.path.isfile(item_path):\n", + " return item\n", + "\n", + " print(f\"No files were found in the folder '{folder_path}'.\")\n", + " return None\n", + "\n", + "\n", + "\n", + "# --- MAIN EXECUTION ---\n", + "DATASET_FOLDER = \"../DATASET\"\n", + "\n", + "DATASET_NAME = get_dataset_file_name(DATASET_FOLDER)\n", + "\n", + "if DATASET_NAME:\n", + " print(f\"Dataset name: {DATASET_NAME}\")\n", + "else:\n", + " print(\"Missing run or dataset file.\")\n", + " raise SystemExit(\"Exiting due to missing run/dataset file.\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register an Object Store Secret\n", + 
"To use the evaluations service, you must register an object store with the name default. Optionally, you can register an additional object store with a name of your choice." + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [], + "source": [ + "# setup authentication and headers needed for AI Core requests\n", + "def _get_headers():\n", + " headers = {\n", + " \"Authorization\": client.get_ai_core_token(),\n", + " \"AI-Resource-Group\": AICORE_RESOURCE_GROUP,\n", + " \"Content-Type\": \"application/json\",\n", + " }\n", + " return headers" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Successfully deleted object store secret: default\n", + "Object store secret not found: genai-simplified-notebook. It may not exist.\n" + ] + }, + { + "data": { + "text/plain": [ + "{'message': 'secret has been created'}" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Register S3 secret with AI Core which will be used an input source \n", + "import requests\n", + "import json\n", + "import logging\n", + "\n", + "def delete_oss_secret(oss_name=\"\"):\n", + " headers = _get_headers()\n", + " \n", + " DELETE_SECRETS_ENDPOINT = f'/v2/admin/objectStoreSecrets/{oss_name}'\n", + " request_url = f\"{AICORE_BASE_URL}{DELETE_SECRETS_ENDPOINT}\"\n", + " \n", + " try:\n", + " response = requests.delete(request_url, headers=headers, timeout=120)\n", + " if response.status_code == 202:\n", + " print(f\"Successfully deleted object store secret: {oss_name}\")\n", + " elif response.status_code == 404:\n", + " print(f\"Object store secret not found: {oss_name}. 
It may not exist.\")\n", + " else:\n", + " logging.error(f\"Failed to delete object store secret: {oss_name}, Status Code: {response.status_code}\")\n", + " except Exception as e:\n", + " logging.error(f\"Error occurred while attempting to delete object store secret: {e}\")\n", + " raise\n", + "\n", + "def register_oss_secret(oss_name=\"\", path_prefix=\"\"):\n", + " headers = _get_headers()\n", + " \n", + " POST_SECRETS_ENDPOINT = '/v2/admin/objectStoreSecrets'\n", + " request_url = f\"{AICORE_BASE_URL}{POST_SECRETS_ENDPOINT}\"\n", + " \n", + " request_body = {\n", + " \"name\": oss_name,\n", + " \"data\": {\n", + " \"AWS_ACCESS_KEY_ID\": AWS_ACCESS_KEY,\n", + " \"AWS_SECRET_ACCESS_KEY\": AWS_SECRET_ACCESS_KEY\n", + " },\n", + " \"type\": \"S3\",\n", + " \"bucket\": AWS_BUCKET_ID,\n", + " \"endpoint\": \"s3-eu-central-1.amazonaws.com\",\n", + " \"region\": AWS_REGION,\n", + " \"pathPrefix\": path_prefix,\n", + " \"verifyssl\": \"0\",\n", + " \"usehttps\": \"1\",\n", + " }\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " result = response.json()\n", + " return result\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create object store secret\")\n", + " raise\n", + " \n", + "delete_oss_secret(oss_name=\"default\")\n", + "delete_oss_secret(oss_name=\"genai-simplified-notebook\")\n", + " \n", + "register_oss_secret(oss_name=\"default\", path_prefix=\"\")\n", + "register_oss_secret(oss_name=\"genai-simplified-notebook\", path_prefix=\"\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# uploading these files to Object store to register as an artifact inside ai core\n", + "\n", + "import boto3\n", + "import os\n", + "import uuid\n", + "\n", + "def upload_folder_to_s3(folder_path, bucket_name, s3_prefix=\"\"):\n", + " \"\"\"\n", + " Upload a folder to an S3 bucket recursively.\n", + 
"\n", + " :param folder_path: The local folder path to upload.\n", + " :param bucket_name: The name of the S3 bucket.\n", + " :param s3_prefix: Optional prefix to use for the S3 keys (e.g., subfolder in the bucket).\n", + " \"\"\"\n", + " s3_client = boto3.client(\n", + " 's3',\n", + " aws_access_key_id=AWS_ACCESS_KEY,\n", + " aws_secret_access_key=AWS_SECRET_ACCESS_KEY,\n", + " region_name=AWS_REGION\n", + " )\n", + "\n", + " for root, dirs, files in os.walk(folder_path):\n", + " for file_name in files:\n", + " local_path = os.path.join(root, file_name)\n", + " # Compute the relative path for the S3 key\n", + " relative_path = os.path.relpath(local_path, folder_path)\n", + " s3_key = os.path.join(s3_prefix, relative_path).replace(\"\\\\\", \"/\") # Ensure S3-compatible paths\n", + " print(f\"Uploading {local_path} to s3://{bucket_name}/{s3_key}\")\n", + " \n", + " # Upload the file\n", + " s3_client.upload_file(local_path, bucket_name, s3_key)\n", + "\n", + "# Example usage\n", + "folder_to_upload_testdata = \"../DATASET\"\n", + "user_directory_prefix = \"\" # replace with your i-number as string here\n", + "prefix_guid = user_directory_prefix if user_directory_prefix else str(uuid.uuid4().hex)\n", + "s3_testdata_prefix = f\"genaiEvaluation/{prefix_guid}/testdata\" # Leave empty for root of the bucket\n", + "\n", + "\n", + "upload_folder_to_s3(folder_to_upload_testdata, AWS_BUCKET_ID, s3_testdata_prefix)\n", + "input_artifact_path = f\"ai://genai-simplified-notebook/genaiEvaluation/{prefix_guid}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The user stores the input files in the object store and registers the root folder as an artifact with AI Core. The File Upload and Artifact endpoints of the AI Core API may be used for this purpose. 
In this example `genaiEvaluation/{prefix_guid}` is the root folder containing the orchestration configurations and test data, which is registered as an AI Core artifact." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "import logging\n", + "# Registering the uploaded files from AWS as artifacts to use inside configuration.\n", + "\n", + "def register_artifact():\n", + " headers = _get_headers()\n", + " \n", + " GET_ARTIFACTS_ENDPOINT = '/v2/lm/artifacts'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_ARTIFACTS_ENDPOINT}\"\n", + " \n", + " request_body = {\n", + " \"labels\": [\n", + " {\n", + " \"key\": \"ext.ai.sap.com/prompt-evaluation\",\n", + " \"value\": \"true\"\n", + " }\n", + " ],\n", + " \"name\": \"genai-eval-simplified-test-data\",\n", + " \"kind\": \"other\",\n", + " \"url\": input_artifact_path, # input artifact path\n", + " \"description\": \"demo artifacts for evaluation flow.\",\n", + " \"scenarioId\": \"genai-evaluations\"\n", + " }\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except Exception:\n", + " print(\"Error occurred while attempting to register the artifact\")\n", + " raise\n", + " \n", + "\n", + "artifact_id = register_artifact()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Orchestration Deployment\n", + "An orchestration deployment URL is required to run the evaluation. Once created, we wait until the deployment is running and provides a deployment URL, which will be added to our configuration file in the next step. You can skip this step if you already have an orchestration deployment running."
+ ] + }, + { + "cell_type": "code", + "execution_count": 69, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "import json\n", + "import time\n", + "\n", + "\n", + "\n", + "def create_orchestration_configuration():\n", + " headers = _get_headers()\n", + " request_body = {\n", + " \"name\": \"orchestrationDeployment\",\n", + " \"executableId\": \"orchestration\",\n", + " \"scenarioId\": \"orchestration\",\n", + " \"parameterBindings\": [\n", + " {\n", + " \"key\": \"modelFilterList\",\n", + " \"value\": \"null\"\n", + " },\n", + " {\n", + " \"key\": \"modelFilterListType\",\n", + " \"value\": \"allow\"\n", + " }\n", + " ],\n", + " \"inputArtifactBindings\": []\n", + " }\n", + " \n", + " GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}\"\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(response)\n", + " if(response.status_code != 201):\n", + " raise\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create a Configuration\")\n", + " raise\n", + " \n", + "def execute_orchestration_deployment(configuration_id):\n", + " headers = _get_headers()\n", + " GET_DEPLOYMENTS_ENDPOINT = '/v2/lm/deployments'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_DEPLOYMENTS_ENDPOINT}\"\n", + " \n", + " request_body = {\n", + " \"configurationId\": configuration_id\n", + " }\n", + " \n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(response)\n", + " if(response.status_code != 202):\n", + " print(\"Deployment execution failed\")\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " \n", + " except:\n", + " logging.error(\"Error occurred while 
attempting to create an execution\")\n", + " raise\n", + "\n", + "def get_deployment_status(orchestration_deployment_id):\n", + " headers = _get_headers()\n", + " api_url = f\"{AICORE_BASE_URL}/v2/lm/deployments/{orchestration_deployment_id}?$select=status\"\n", + " timeout = 400 \n", + " initial_interval = 30 \n", + " pending_interval = 10\n", + " start = time.time()\n", + "\n", + " status = None\n", + " current_interval = initial_interval\n", + "\n", + " while time.time() - start < timeout:\n", + " response = requests.get(api_url, headers=headers)\n", + " if response.status_code == 200:\n", + " status = response.json().get('status')\n", + " print(f\"Deployment {orchestration_deployment_id} status: {status}\")\n", + " # Adjust polling interval based on status\n", + " if status == 'RUNNING':\n", + " return True\n", + " elif status == 'UNKNOWN':\n", + " current_interval = initial_interval\n", + " elif status == 'PENDING':\n", + " current_interval = pending_interval\n", + "\n", + " else:\n", + " print(f\"Failed to fetch deployment status. 
HTTP {response.status_code}\")\n", + " return False\n", + "\n", + " # Waiting according to status for API call\n", + " time.sleep(current_interval)\n", + "\n", + "def get_deployment_url(orchestration_deployment_id):\n", + " headers = _get_headers()\n", + " response = requests.get(f\"{AICORE_BASE_URL}/v2/lm/deployments/{orchestration_deployment_id}\", headers=headers)\n", + " if response.status_code != 200:\n", + " raise Exception(f\"Failed to get deployment URL: {response.status_code} - {response.text}\")\n", + " return response.json().get('deploymentUrl')\n", + "\n", + "# You can skip this step if you already have a orchestration deployment running\n", + "deployment_url = DEPLOYMENT_URL\n", + "if not deployment_url:\n", + " configuration_id = create_orchestration_configuration()\n", + " orchestration_deployment_id = execute_orchestration_deployment(configuration_id)\n", + " is_running = get_deployment_status(orchestration_deployment_id) \n", + " if is_running:\n", + " deployment_url = get_deployment_url(orchestration_deployment_id)\n", + " print(f\"Deployment URL: {deployment_url}\")\n", + " else:\n", + " print(\"Deployment is not running or failed.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Manually set the orchestration deployment url\n", + "# deployment_url=\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Select your Models\n", + " \n", + "Add the LLMs you wish to use in the string `selected_models_str`\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Selected models string: gemini-2.5-pro:001\n" + ] + } + ], + "source": [ + "# Manual selection of models\n", + "selected_models_str=\"gemini-2.5-pro:001\"\n", + "print(\"Selected models string:\", selected_models_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + 
"source": [ + "## Select system-defined metrics\n", + " \n", + "Add the system-defined metrics you wish to use in the string `selected_metrics_str`.\n", + "\n", + "**Note: If your dataset does not have a reference column, DO NOT select metrics that require a reference.**" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Pointwise Answer Relevance,Exact Match\n" + ] + } + ], + "source": [ + "# Manual Selection of Metrics\n", + "selected_metrics_str = \"Pointwise Answer Relevance,Exact Match\"\n", + "print(selected_metrics_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom Metric Creation and Selection\n", + "This script checks for an evaluation metric in SAP AI Core.\n", + "\n", + "1. You can provide metric IDs directly by setting the variable as a comma-separated string:\n", + " user_metric_ids = `\"\"`\n", + " - ✅ If the ID exists, it will be returned.\n", + " \n", + "2. You can create a new custom metric by adding the JSON to the `custom_metric_list`\n", + " - The script will use the contents of the `custom_metric_list`\n", + " to search for an existing metric by scenario + name + version.\n", + "\n", + "3. If no existing metric is found:\n", + " - A new metric will be created using the details in `custom_metric_list`.\n", + " - Required fields in custom_metric: scenario, name, version, evaluationMethod.\n", + "\n", + "4. 
At the end:\n", + " - The script prints the final Metric ID that was found or created.\n", + "\n", + "Note: Skip the two following cell if you do not want to create/select a custom metric for your workload" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "user_metric_ids = \"d1868b00-1601-407a-92cd-0b9065682d1f,dbf56851-8444-45d3-a0c1-adbe210c7e771\"\n", + "\n", + "custom_metric_list = [\n", + " {\n", + " \"name\": \"test-metric\",\n", + " \"scenario\": \"genai-evaluations-test\",\n", + " \"version\": \"0.0.1\",\n", + " \"evaluationMethod\": \"llm-as-a-judge\",\n", + " \"managedBy\": \"imperative\",\n", + " \"systemPredefined\": False,\n", + " \"metricType\": \"evaluation\",\n", + " \"spec\": {\n", + " \"outputType\": \"numerical\",\n", + " \"promptType\": \"structured\",\n", + " \"configuration\": {\n", + " \"modelConfiguration\": {\n", + " \"name\": \"gpt-5\",\n", + " \"version\": \"2025-08-07\",\n", + " \"parameters\": [\n", + " {\n", + " \"key\": \"max_tokens\",\n", + " \"value\": \"10000\"\n", + " }\n", + " ]\n", + " },\n", + " \"promptConfiguration\": {\n", + " \"definition\": \"You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. 
Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\\n\\n## Metric Definition\\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\\n\\n## Criteria\\nGroundedness: The of information between a response generated by AI models and provided reference.\\n\\n## Rating Rubric\\n5: (Fully grounded). The response and the reference are fully overlapped.\\n4: (Mostly grounded). The response and the reference are mostly overlapped.\\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\\n2: (Poorly grounded). The response and the reference are slightly overlapped.\\n1: (Not grounded). There is no overlap between the response and the reference.\\n\\n## Evaluation Steps\\nSTEP 1: Assess the response in aspects of Groundedness. Identify any information in the response and provide assessment according to the Criteria.\\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\\n\\nReference: {{?reference}}\\nResponse: {{?aicore_llm_completion}}\\n\\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and outputs STRICTLY following this JSON format:\\n\\n{ \\\"explanation\\\": string, \\\"rating\\\": integer }\\n\\nOutput:\\n\",\n", + " \"evaluationTask\": \"You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. 
Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\\n\\n## Metric Definition\\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\\n\\n## Criteria\\nGroundedness: The of information between a response generated by AI models and provided reference.\\n\\n## Rating Rubric\\n5: (Fully grounded). The response and the reference are fully overlapped.\\n4: (Mostly grounded). The response and the reference are mostly overlapped.\\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\\n2: (Poorly grounded). The response and the reference are slightly overlapped.\\n1: (Not grounded). There is no overlap between the response and the reference.\\n\\n## Evaluation Steps\\nSTEP 1: Assess the response in aspects of Groundedness. Identify any information in the response and provide assessment according to the Criteria.\\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\\n\\nReference: {{?reference}}\\nResponse: {{?aicore_llm_completion}}\\n\\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and outputs STRICTLY following this JSON format:\\n\\n{ \\\"explanation\\\": string, \\\"rating\\\": integer }\\n\\nOutput:\\n\",\n", + " \"criteria\": \"You should strictly follow the instruction given to you. 
Please act as an impartial judge and evaluate the quality of the responses based on the prompt and following criteria:\",\n", + " \"ratingRubric\": [\n", + " {\n", + " \"rating\": 3,\n", + " \"rule\": \"Response is completely factual with no unsupported claims\"\n", + " },\n", + " {\n", + " \"rating\": 2,\n", + " \"rule\": \"Response has minor inaccuracies but no major contradictions\"\n", + " },\n", + " {\n", + " \"rating\": 1,\n", + " \"rule\": \"Response contains significant factual errors or hallucinations\"\n", + " }\n", + " ]\n", + " }\n", + " }\n", + " }\n", + " }\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import json\n", + "import requests\n", + "\n", + "\n", + "# --- Fetch all metrics from SAP AI Core ---\n", + "def fetch_all_metrics():\n", + " request_url = f\"{AICORE_BASE_URL}/v2/lm/evaluationMetrics\"\n", + " resp = requests.get(request_url, headers=_get_headers())\n", + " resp.raise_for_status()\n", + " return resp.json().get(\"resources\", [])\n", + "\n", + "# --- Create or fetch a metric ---\n", + "def create_or_get_metric(custom_metric, user_metric_id=None):\n", + " all_metrics = fetch_all_metrics()\n", + "\n", + " # 1️⃣ User-supplied ID lookup\n", + " if user_metric_id:\n", + " for m in all_metrics:\n", + " if m.get(\"id\") == user_metric_id:\n", + " print(f\"✅ Metric already exists by ID: {user_metric_id}\")\n", + " return user_metric_id\n", + " print(f\"⚠️ User metric ID {user_metric_id} not found, will only include if valid later\")\n", + "\n", + " # 2️⃣ Check by scenario, name, version\n", + " scenario = custom_metric.get(\"scenario\")\n", + " name = custom_metric.get(\"name\")\n", + " version = custom_metric.get(\"version\")\n", + " if not all([scenario, name, version]):\n", + " raise ValueError(\"Metric must include 'scenario', 'name', and 'version'\")\n", + "\n", + " for m in all_metrics:\n", + " if (m.get(\"scenario\") == scenario and\n", + " 
m.get(\"name\") == name and\n", + " m.get(\"version\") == version):\n", + " metric_id = m.get(\"id\")\n", + " print(f\"✅ Metric already exists: {scenario}/{name} v{version}, ID = {metric_id}\")\n", + " return metric_id\n", + "\n", + " # 3️⃣ Create metric if not found\n", + " request_url = f\"{AICORE_BASE_URL}/v2/lm/evaluationMetrics\"\n", + " required_fields = [\"scenario\", \"name\", \"version\", \"evaluationMethod\", \"metricType\"]\n", + " for f in required_fields:\n", + " if f not in custom_metric:\n", + " raise ValueError(f\"❌ Missing required field: {f}\")\n", + "\n", + " resp = requests.post(request_url, headers=_get_headers(), json=custom_metric)\n", + " resp.raise_for_status()\n", + " metric_id = resp.json().get(\"id\")\n", + " print(f\"✅ Metric created successfully: {name} v{version}, ID = {metric_id}\")\n", + " return metric_id\n", + "\n", + "# --- Main pipeline ---\n", + "\n", + "# 1️⃣ Create/fetch metrics from SAP AI Core\n", + "metric_ids = []\n", + "for metric in custom_metric_list:\n", + " try:\n", + " print(f\"metric:{metric}\")\n", + " metric_id = create_or_get_metric(metric)\n", + " metric_ids.append(metric_id)\n", + " except ValueError as e:\n", + " print(f\"Skipping metric due to error: {e}\")\n", + "\n", + "# 2️⃣ Validate user_metric_ids separately if provided\n", + "if user_metric_ids and user_metric_ids.strip():\n", + " all_metrics = fetch_all_metrics()\n", + " # Split comma-separated IDs and strip whitespace\n", + " for uid in [uid.strip() for uid in user_metric_ids.split(\",\")]:\n", + " if any(m.get(\"id\") == uid for m in all_metrics):\n", + " metric_ids.append(uid)\n", + " else:\n", + " print(f\"⚠️ User metric ID {uid} does not exist in AI Core, skipping.\")\n", + "# 3️⃣ Convert to comma-separated string\n", + "custom_metric_ids_str = \",\".join(metric_ids)\n", + "print(\"✅ All processed metric IDs:\", custom_metric_ids_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Orchestration Registry 
Configuration\n", + "\n", + "The following code defines a function `create_orchestration_registry_config()` that creates a new **Orchestration Configuration** in **Orchestration Registry**.\n", + "\n", + "**Note** : If you wish to use an existing orchestration config, skip executing this cell and add the orchestration config id in `orchestration_registry_id` string in the next cell." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def create_orchestration_registry_config():\n", + " headers = _get_headers()\n", + " prompt_template = {\n", + " \"template\": [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\"\n", + " }\n", + " ]\n", + " }\n", + " CREATE_ORCHESTRATION_REGISTRY = '/v2/registry/v2/orchestrationConfigs'\n", + " request_url = f\"{AICORE_BASE_URL}{CREATE_ORCHESTRATION_REGISTRY}\"\n", + " model_name,model_version=selected_models_str.split(\":\")\n", + " request_body = {\n", + " \"name\": \"genai-eval-test\",\n", + " \"version\": \"1.0.0\",\n", + " \"scenario\": \"genai-evaluations\",\n", + " \"spec\": {\n", + " \"modules\": {\n", + " \"prompt_templating\": {\n", + " \"model\": {\n", + " \"name\": model_name,\n", + " \"version\": model_version\n", + " },\n", + " \"prompt\": prompt_template\n", + " }\n", + " }\n", + " }\n", + " }\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " if(response.status_code != 200):\n", + " print(response.json())\n", + " raise\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create a orchestration registry id\")\n", + " raise\n", + "orchestration_registry_id = create_orchestration_registry_config()" + ] + }, + { + "cell_type": "code", + 
"execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Manually set orchestration config id\n", + "# orchestration_registry_id=\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluation Configuration Creation" + ] + }, + { + "cell_type": "code", + "execution_count": 80, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Selected metrics: Pointwise Answer Relevance,Pointwise Instruction Following\n", + "Selected models: gemini-2.5-pro:001\n" + ] + } + ], + "source": [ + "\n", + "import json\n", + "test_data_path = f\"testdata/{DATASET_NAME}\" # specify the test data path here. For the full folder just specifying testdata will work\n", + "test_datasets = json.dumps({'path': test_data_path, 'type': 'csv'})\n", + "metrics_list = \",\".join([selected_metrics_str,custom_metric_ids_str])\n", + "models_list = selected_models_str\n", + "print(f\"Selected metrics: {metrics_list}\")\n", + "print(f\"Selected models: {models_list}\")\n", + "#variable_mapping = json.dumps({'prompt/question': 'data/topic'}) # to map the question prompt variable to the entry in dataset.\n", + "# orchestration_deployment_url = deployment_url # needs to specify this to use a specific deployment id\n", + "orchestration_deployment_url = deployment_url\n", + "repetitions = \"1\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# creating an AICORE Configuration.\n", + "import requests\n", + "\n", + "request_body = {\n", + " \"name\": \"genai-eval-conf\",\n", + " \"scenarioId\": \"genai-evaluations\",\n", + " \"executableId\": \"genai-evaluations-simplified\",\n", + " \"inputArtifactBindings\": [\n", + " {\n", + " \"key\": \"datasetFolder\",\n", + " \"artifactId\": artifact_id\n", + " }\n", + " ],\n", + " \"parameterBindings\": [\n", + " {\n", + " \"key\": \"repetitions\",\n", + " \"value\": repetitions\n", + " },\n", + " {\n", + " 
\"key\": \"orchestrationDeploymentURL\",\n", + " \"value\": orchestration_deployment_url\n", + " },\n", + " {\n", + " \"key\": \"metrics\",\n", + " \"value\": metrics_list\n", + " },\n", + " {\n", + " \"key\": \"testDataset\",\n", + " \"value\": test_datasets\n", + " },\n", + " {\n", + " \"key\": \"orchestrationRegistryIds\",\n", + " \"value\": orchestration_registry_id\n", + " }\n", + " ]\n", + "}\n", + "\n", + "def create_aicore_configuration():\n", + " headers = _get_headers()\n", + " GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}\"\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(response)\n", + " if response.status_code != 201:\n", + " raise RuntimeError(f\"Configuration creation returned status {response.status_code}\")\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except Exception:\n", + " logging.error(\"Error occurred while attempting to create a configuration\")\n", + " raise\n", + " \n", + "configuration_id = create_aicore_configuration()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluation Execution Creation\n", + "Once the configuration is created, we create the AI Core execution, which triggers the evaluation workload.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Create an execution with the created configuration.\n", + "\n", + "import requests\n", + "def create_execution():\n", + " headers = _get_headers()\n", + " GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}\"\n", + " request_body = {\"configurationId\": configuration_id}\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(\"response received is \", response)\n", + " result = response.json()\n", 
+ " print(result)\n", + " return result['id']\n", + " except Exception:\n", + " logging.error(\"Error occurred while attempting to create an execution\")\n", + " raise\n", + " \n", + "\n", + "execution_id = create_execution()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Get execution status\n", + "import requests\n", + "def get_execution_status(execution_id):\n", + " headers = _get_headers()\n", + " GET_EXECUTION_ENDPOINT = f'/v2/lm/executions/{execution_id}'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_EXECUTION_ENDPOINT}\"\n", + " try:\n", + " response = requests.get(\n", + " request_url, headers=headers, timeout=120\n", + " )\n", + " print(\"response received is \", response)\n", + " result = response.json()\n", + " return result\n", + " except Exception:\n", + " logging.error(\"Error occurred while attempting to get execution status\")\n", + " raise\n", + " \n", + "\n", + "get_execution_status(execution_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "1. Run the following cells only when the status field in the Execution response is \"COMPLETED\" to view the results.\n", + "2. The status field progresses through different states over time: UNKNOWN → PENDING → RUNNING → COMPLETED. Ensure it reaches COMPLETED before proceeding.\n", + "\n", + "\n", + "Note: The targetStatus will always be COMPLETED from the start, as it represents the intended final state of the Execution. Do not confuse it with the actual status field.\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluation Result\n", + "The evaluation job produces two outputs:\n", + "1. A SQLite DB file which stores, for each orchestration output, the orchestration input, the output itself, the values of all metrics calculated for it, and statistics such as latency. These metric values are called raw metric values. 
This SQLite DB file is stored in the object store as an AI Core output artifact.\n", + "2. A set of metrics whose values are aggregated from the raw metric values. The aggregate metrics are stored in the tracking service, together with the run names and any user-defined tags.\n", + "After the execution completes, you can see the runs generated by the workload, along with the aggregate metrics, by calling the tracking API as shown below." + ] + }, + { + "cell_type": "code", + "execution_count": 90, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "response received is \n" + ] + } + ], + "source": [ + "# Get aggregate metrics using the execution id\n", + "import requests\n", + "def retrieve_aggregate_metrics(execution_id):\n", + " headers = _get_headers()\n", + " GET_METRICS_ENDPOINT = f'/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={execution_id}'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_METRICS_ENDPOINT}\"\n", + " try:\n", + " response = requests.get(request_url, headers=headers, timeout=120)\n", + " print(\"response received is \", response)\n", + " result = response.json()\n", + " return result\n", + " except Exception:\n", + " logging.error(\"Error occurred while attempting to retrieve aggregate metrics for the run\")\n", + " raise\n", + "\n", + "runs_data = retrieve_aggregate_metrics(execution_id)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To drill down further, you can also download the SQLite DB file from the object store and analyse the results (instance-level metrics, logs, etc.) locally." 
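As a sketch of that local analysis step (the file name `results.db` and the `preview_tables` helper below are illustrative assumptions — check your downloaded folder for the actual file name), each table in the downloaded SQLite DB can be loaded into a pandas DataFrame like this:

```python
import os
import sqlite3

import pandas as pd

def preview_tables(db_path, n=5):
    """Load every table in the SQLite results DB and return
    {table_name: DataFrame with the first n rows}."""
    previews = {}
    with sqlite3.connect(db_path) as conn:
        # sqlite_master lists all tables in the DB (e.g. run, configuration,
        # submission, submission_result in the evaluation output)
        tables = pd.read_sql_query(
            "SELECT name FROM sqlite_master WHERE type='table'", conn
        )["name"]
        for table in tables:
            df = pd.read_sql_query(f'SELECT * FROM "{table}"', conn)
            previews[table] = df.head(n)  # drop .head(n) to keep all rows
    return previews

# Hypothetical path -- adjust to the file actually present in 'results-new'.
db_file = os.path.join("results-new", "results.db")
if os.path.exists(db_file):
    for name, df in preview_tables(db_file).items():
        print(f"Table: {name}")
        print(df)
```

Increasing `n` (or removing `.head(n)`) trades preview brevity for completeness, mirroring the `df.head(5)` convention used in the notebook's result cells.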
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Download the result artifacts from the object store.\n", + "import boto3\n", + "\n", + "def download_all_objects(prefix, destination_folder):\n", + " \"\"\"\n", + " Recursively download all objects in the AWS_BUCKET_ID S3 bucket that start with a specific prefix.\n", + "\n", + " :param prefix: Prefix to filter objects in the bucket.\n", + " :param destination_folder: Local folder to save the downloaded files.\n", + " \"\"\"\n", + " s3_client = boto3.client(\n", + " 's3',\n", + " aws_access_key_id=AWS_ACCESS_KEY,\n", + " aws_secret_access_key=AWS_SECRET_ACCESS_KEY,\n", + " region_name=AWS_REGION\n", + " )\n", + "\n", + " # Ensure the destination folder exists\n", + " if not os.path.exists(destination_folder):\n", + " os.makedirs(destination_folder)\n", + "\n", + " # Paginate through objects\n", + " paginator = s3_client.get_paginator('list_objects_v2')\n", + " pages = paginator.paginate(Bucket=AWS_BUCKET_ID, Prefix=prefix)\n", + "\n", + " for page in pages:\n", + " if 'Contents' in page:\n", + " for obj in page['Contents']:\n", + " key = obj['Key']\n", + " local_file_path = os.path.join(destination_folder, os.path.relpath(key, prefix))\n", + "\n", + " # Ensure the local directory structure exists\n", + " local_directory = os.path.dirname(local_file_path)\n", + " if not os.path.exists(local_directory):\n", + " os.makedirs(local_directory)\n", + "\n", + " # Download the object\n", + " print(f\"Downloading {key} to {local_file_path}\")\n", + " s3_client.download_file(AWS_BUCKET_ID, key, local_file_path)\n", + "\n", + "\n", + "# Download the evaluation results from the object store. 
Look at execution status under \"outputArtifacts\" key to see the 'url'\n", + "# which shows the data path of where your output results are stored\n", + "EXECUTION_ID = execution_id\n", + "sqlite_db_prefix = f'{EXECUTION_ID}/tmp/' # change the prefix based on where your output artifact is stored in the bucket.\n", + "destination_folder = 'results-new'\n", + "\n", + "download_all_objects(sqlite_db_prefix, destination_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "NOTE: The cell below shows the top 5 rows of the evaluation results for each SQLite table. If you wish to see all entries, comment out the df.head(5) line in the cell below, or adjust the number accordingly." + ] + }, + { + "cell_type": "code", + "execution_count": 95, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
\n", + "\n", + "
\n", + "

Table: run

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
idnameconfigtagscreated_atupdated_at
1571f78d465d4d53961f08758a243bb8Run-genai-eval-test-gemini-2.5-pro-001{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\", \"timeout\": 600, \"max_retries\": 2}}}}{}2026-02-10 07:23:05.0367702026-02-10 07:23:05.036775
19722d52bde94ac488b1bd8abbd5bec9Run-genai-eval-test-gemini-2.5-pro-001{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"{{?question}}\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\", \"timeout\": 600, \"max_retries\": 2}}}}{}2026-02-10 07:23:05.0367792026-02-10 07:23:05.036779
\n", + "
\n", + " \n", + "
\n", + "

Table: configuration

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
idtest_datasetsmetricsvariable_mappingtagsorchestration_deployment_urlrepetitionsmetric_templatescreated_atupdated_at
7ea46306da41430b88d3e4d3e83554c9{\"path\": \"testdata/medicalqna_dataset.csv\", \"type\": \"csv\"}[\"Pointwise Answer Relevance\", \"Pointwise Instruction Following\"]{}{}https://api.ai.aicore-pr.eu-west-1.mlf-aws-dev.com/v2/inference/deployments/d0d6f232abfea6721[{\"evaluationMethod\": \"llm-as-a-judge\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"0ae30283-0140-451e-8a88-267ef801f35c\", \"name\": \"Pointwise Answer Relevance\", \"description\": \"Measures how closely the model\\u2019s response relates to the user prompt, for both general and RAG use cases. Scores range from 1 to 5, with higher values indicating greater relevance.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"categorical\", \"promptType\": \"structured\", \"configuration\": {\"modelConfiguration\": {\"name\": \"gpt-4.1\", \"version\": \"2025-04-14\", \"parameters\": [{\"key\": \"temperature\", \"value\": \"0\"}]}, \"promptConfiguration\": {\"evaluationTask\": \"You are an expert evaluator. 
Your task is to evaluate the relevance of responses generated by AI models.\\nWe will provide you with the user input and an AI-generated response.\\nYou should first read the user input carefully to understand the context and intention, and then evaluate the relevance of the response based on the criteria provided in the Evaluation section below.\\nYou will assign the response a rating following the Rating Rubric and Evaluation Steps.\\nGive step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\", \"definition\": \"You will be assessing relevance, which measures the ability to provide a response that is pertinent and useful based on the user prompt and the context provided.\", \"criteria\": \"Relevance: Does the response address the user's query appropriately and provide pertinent information?\", \"ratingRubric\": [{\"rating\": \"1\", \"rule\": \"(Irrelevant). The response is irrelevant and does not address the user's query.\"}, {\"rating\": \"2\", \"rule\": \"(Slightly relevant). The response is slightly relevant and largely misses the user's query.\"}, {\"rating\": \"3\", \"rule\": \"(Somewhat relevant). The response is somewhat relevant but may miss key aspects of the user's query.\"}, {\"rating\": \"4\", \"rule\": \"(Mostly relevant). The response is mostly relevant and generally addresses the user's query with useful information.\"}, {\"rating\": \"5\", \"rule\": \"(Highly relevant). The response is highly relevant, directly addresses the user's query, and provides useful information.\"}], \"evaluationSteps\": [\"Assess the response in terms of Relevance. Identify how well the response aligns with the user's query and context according to the Criteria.\", \"Score based on the rating rubric. 
Give a brief rationale to explain your evaluation considering Relevance.\"]}}}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [], \"supported_values\": [1, 5], \"experimental\": true}}, {\"evaluationMethod\": \"llm-as-a-judge\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"cd3ffd21-faae-4f06-8184-52541182d9a5\", \"name\": \"Pointwise Instruction Following\", \"description\": \"Evaluates the model\\u2019s ability to follow the instructions provided in the user prompt. Scores range from 1 to 5, with 1 indicating no fulfillment and 5 indicating complete fulfillment.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"categorical\", \"promptType\": \"structured\", \"configuration\": {\"modelConfiguration\": {\"name\": \"gpt-4.1\", \"version\": \"2025-04-14\", \"parameters\": [{\"key\": \"temperature\", \"value\": \"0\"}]}, \"promptConfiguration\": {\"evaluationTask\": \"Please act as an impartial judge and evaluate the quality of the responses based on the prompt and following criteria:\", \"definition\": \"You will be assessing model's the ability to follow instructions provided in the user prompt.\", \"criteria\": \"Instruction following: The response demonstrates a clear understanding of the instructions in the user prompt, satisfying all of the instruction's requirements. Evaluate the responses STRICTLY on the ability to follow instruction ONLY.\", \"ratingRubric\": [{\"rating\": \"1\", \"rule\": \"(No fulfillment). Response does not address the most important aspects of the instruction. The user would feel like their request was not at all understood.\"}, {\"rating\": \"2\", \"rule\": \"(Poor fulfillment). Response addresses some aspects of the instruction but misses key requirements or major components. 
The user would feel like their instruction was misunderstood in significant ways.\"}, {\"rating\": \"3\", \"rule\": \"(Some fulfillment). Response does not address some minor aspects and/or ignores some requirements of the instruction. The user would feel like their instruction was partially understood.\"}, {\"rating\": \"4\", \"rule\": \"(Good fulfillment). Response addresses most aspects and requirements of the instruction. It might miss very minor details or have slight deviations from requirements. The user would feel like their instruction was well understood.\"}, {\"rating\": \"5\", \"rule\": \"(Complete fulfillment). Response addresses all aspects and adheres to all requirements of the instruction. The user would feel like their instruction was completely understood.\"}]}}}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [], \"supported_values\": [1, 5], \"experimental\": false}}]2026-02-10 07:23:05.0276612026-02-10 07:23:05.027666
\n", + "
\n", + " \n", + "
\n", + "

Table: submission

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
idrun_idorchestration_configurationtemplate_variablescreated_atupdated_at
b5f494721730469f922759caf919d4701571f78d465d4d53961f08758a243bb8{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"how does rivatigmine and otc sleep medicine interact\", \"sentiment\": \"Interaction\", \"reference\": \"tell your doctor and pharmacist what prescription and nonprescription medications, vitamins, nutritional supplements, and herbal products you are taking or plan to take. Be sure to mention any of the following: antihistamines; aspirin and other nonsteroidal anti-inflammatory medications (NSAIDs) such as ibuprofen (Advil, Motrin) and naproxen (Aleve, Naprosyn); bethanechol (Duvoid, Urecholine); ipratropium (Atrovent, in Combivent, DuoNeb); and medications for Alzheimer's disease, glaucoma, irritable bowel disease, motion sickness, ulcers, or urinary problems. Your doctor may need to change the doses of your medications or monitor you carefully for side effects.\"}2026-02-10 07:23:05.0467472026-02-10 07:23:05.046749
cb37a740df1b4a43a50ce2bf6720eda01571f78d465d4d53961f08758a243bb8{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"how does valium affect the brain\", \"sentiment\": \"Action\", \"reference\": \"Diazepam is a benzodiazepine that exerts anxiolytic, sedative, muscle-relaxant, anticonvulsant and amnestic effects. Most of these effects are thought to result from a facilitation of the action of gamma aminobutyric acid (GABA), an inhibitory neurotransmitter in the central nervous system.\"}2026-02-10 07:23:05.0467532026-02-10 07:23:05.046753
fe41557ffc8d410681a10dee1da5bc691571f78d465d4d53961f08758a243bb8{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"what is morphine\", \"sentiment\": \"Information\", \"reference\": \"Morphine is a pain medication of the opiate family which is found naturally in a number of plants and animals.[5][7] It acts directly on the central nervous system (CNS) to decrease the feeling of pain.\"}2026-02-10 07:23:05.0467552026-02-10 07:23:05.046756
03d1b9791ee640f088980fd7cb6426a41571f78d465d4d53961f08758a243bb8{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"what are the milligrams for oxycodone e\", \"sentiment\": \"Dose\", \"reference\": \"\\ufffd 10 mg \\ufffd 20 mg \\ufffd 40 mg \\ufffd 80 mg ...\"}2026-02-10 07:23:05.0467582026-02-10 07:23:05.046758
d2c9940f373d423b80eb75d1ccc39ad91571f78d465d4d53961f08758a243bb8{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"81% aspirin contain resin and shellac in it. ?\", \"sentiment\": \"Ingredient\", \"reference\": \"Inactive Ingredients Ingredient Name\"}2026-02-10 07:23:05.0467602026-02-10 07:23:05.046760
\n", + "
\n", + " \n", + "
\n", + "

Table: submission_result

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
submission_idrun_idrepetition_countcompletion_resultlatencycreated_atupdated_at
b5f494721730469f922759caf919d4701571f78d465d4d53961f08758a243bb81{\"request_id\": \"45f4160b-4fb2-9116-abbd-65a96bb43f04\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: how does rivatigmine and otc sleep medicine interact.\", \"role\": \"user\"}], \"llm\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770708295, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. Here is a breakdown of the benefits and side effects of the drugs involved in the question, followed by an explanation of their interaction.\\n\\n***Disclaimer: This information is for educational purposes only and is not a substitute for professional medical advice. Always consult with a healthcare provider or pharmacist before starting, stopping, or combining any medications.***\\n\\n### The Interaction: How Rivastigmine and OTC Sleep Medicine Interact\\n\\nThe interaction between rivastigmine and most over-the-counter (OTC) sleep medicines is significant and potentially harmful. They work in opposite ways in the brain.\\n\\n* **Rivastigmine's Job:** To **increase** the levels of a brain chemical called acetylcholine, which is crucial for memory and thinking.\\n* **OTC Sleep Medicine's Job:** Most OTC sleep aids (like ZzzQuil, Unisom, or Benadryl) contain antihistamines (diphenhydramine or doxylamine) that work by **blocking** the action of acetylcholine. This blocking effect causes drowsiness, but it also leads to confusion, dry mouth, and other side effects.\\n\\nBecause these drugs have opposing effects, taking them together can:\\n1. **Cancel out the benefits of rivastigmine**, making it less effective at treating the symptoms of dementia.\\n2. 
**Worsen cognitive side effects**, such as confusion, memory problems, and disorientation, which is particularly dangerous for a person already being treated for dementia.\\n\\n---\\n\\n### Drug 1: Rivastigmine (e.g., Exelon)\\n\\nThis is a prescription medication used to treat dementia.\\n\\n#### **Benefits:**\\n* **Treats Symptoms of Dementia:** Used for mild to moderate dementia associated with Alzheimer's disease and Parkinson's disease.\\n* **Improves Cognitive Function:** It can help improve or slow the decline of memory, thinking, attention, and the ability to perform simple daily tasks.\\n* **Improves Behavioral Symptoms:** In some patients, it can help manage behavioral symptoms associated with dementia, such as agitation or apathy.\\n* **It is important to note that rivastigmine is not a cure**; it only helps manage the symptoms.\\n\\n#### **Common Side Effects:**\\n* Nausea, vomiting, diarrhea\\n* Loss of appetite and weight loss\\n* Stomach pain or upset\\n* Dizziness or headache\\n* Weakness or fatigue\\n* **For the patch form:** Skin redness, itching, or irritation at the application site.\\n\\n---\\n\\n### Drug 2: Common OTC Sleep Medicines (Antihistamines)\\n\\nThese are non-prescription drugs used for short-term sleeplessness. 
The most common active ingredients are **Diphenhydramine** (found in Benadryl, ZzzQuil, Aleve PM) and **Doxylamine** (found in Unisom SleepTabs).\\n\\n#### **Benefits:**\\n* **Induces Drowsiness:** Helps a person fall asleep more easily.\\n* **Relieves Short-Term Insomnia:** Effective for occasional sleeplessness caused by stress, travel, or other temporary disruptions.\\n* **Widely Accessible:** Available over-the-counter without a prescription.\\n\\n#### **Common Side Effects (especially problematic in older adults):**\\n* **Cognitive Impairment:** **Confusion, memory problems, and difficulty concentrating.**\\n* **\\\"Hangover Effect\\\":** Next-day drowsiness, grogginess, and poor coordination.\\n* **Anticholinergic Effects:**\\n * Dry mouth, dry eyes\\n * Blurred vision\\n * Constipation\\n * Difficulty urinating (urinary retention)\\n* Dizziness and lightheadedness, which can increase the risk of falls.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2346, \"prompt_tokens\": 29, \"total_tokens\": 2375, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1561}}}}, \"final_result\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770708295, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. Here is a breakdown of the benefits and side effects of the drugs involved in the question, followed by an explanation of their interaction.\\n\\n***Disclaimer: This information is for educational purposes only and is not a substitute for professional medical advice. Always consult with a healthcare provider or pharmacist before starting, stopping, or combining any medications.***\\n\\n### The Interaction: How Rivastigmine and OTC Sleep Medicine Interact\\n\\nThe interaction between rivastigmine and most over-the-counter (OTC) sleep medicines is significant and potentially harmful. 
They work in opposite ways in the brain.\\n\\n* **Rivastigmine's Job:** To **increase** the levels of a brain chemical called acetylcholine, which is crucial for memory and thinking.\\n* **OTC Sleep Medicine's Job:** Most OTC sleep aids (like ZzzQuil, Unisom, or Benadryl) contain antihistamines (diphenhydramine or doxylamine) that work by **blocking** the action of acetylcholine. This blocking effect causes drowsiness, but it also leads to confusion, dry mouth, and other side effects.\\n\\nBecause these drugs have opposing effects, taking them together can:\\n1. **Cancel out the benefits of rivastigmine**, making it less effective at treating the symptoms of dementia.\\n2. **Worsen cognitive side effects**, such as confusion, memory problems, and disorientation, which is particularly dangerous for a person already being treated for dementia.\\n\\n---\\n\\n### Drug 1: Rivastigmine (e.g., Exelon)\\n\\nThis is a prescription medication used to treat dementia.\\n\\n#### **Benefits:**\\n* **Treats Symptoms of Dementia:** Used for mild to moderate dementia associated with Alzheimer's disease and Parkinson's disease.\\n* **Improves Cognitive Function:** It can help improve or slow the decline of memory, thinking, attention, and the ability to perform simple daily tasks.\\n* **Improves Behavioral Symptoms:** In some patients, it can help manage behavioral symptoms associated with dementia, such as agitation or apathy.\\n* **It is important to note that rivastigmine is not a cure**; it only helps manage the symptoms.\\n\\n#### **Common Side Effects:**\\n* Nausea, vomiting, diarrhea\\n* Loss of appetite and weight loss\\n* Stomach pain or upset\\n* Dizziness or headache\\n* Weakness or fatigue\\n* **For the patch form:** Skin redness, itching, or irritation at the application site.\\n\\n---\\n\\n### Drug 2: Common OTC Sleep Medicines (Antihistamines)\\n\\nThese are non-prescription drugs used for short-term sleeplessness. 
The most common active ingredients are **Diphenhydramine** (found in Benadryl, ZzzQuil, Aleve PM) and **Doxylamine** (found in Unisom SleepTabs).\\n\\n#### **Benefits:**\\n* **Induces Drowsiness:** Helps a person fall asleep more easily.\\n* **Relieves Short-Term Insomnia:** Effective for occasional sleeplessness caused by stress, travel, or other temporary disruptions.\\n* **Widely Accessible:** Available over-the-counter without a prescription.\\n\\n#### **Common Side Effects (especially problematic in older adults):**\\n* **Cognitive Impairment:** **Confusion, memory problems, and difficulty concentrating.**\\n* **\\\"Hangover Effect\\\":** Next-day drowsiness, grogginess, and poor coordination.\\n* **Anticholinergic Effects:**\\n * Dry mouth, dry eyes\\n * Blurred vision\\n * Constipation\\n * Difficulty urinating (urinary retention)\\n* Dizziness and lightheadedness, which can increase the risk of falls.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2346, \"prompt_tokens\": 29, \"total_tokens\": 2375, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1561}}}}28.0859652026-02-10 07:31:29.3301122026-02-10 07:31:29.330117
cb37a740df1b4a43a50ce2bf6720eda01571f78d465d4d53961f08758a243bb81{\"request_id\": \"8c78b558-11d3-9c9b-867f-855688bbdcf8\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: how does valium affect the brain.\", \"role\": \"user\"}], \"llm\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770708294, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. Here is a breakdown of the benefits and side effects of Valium (diazepam) based on the question of how it affects the brain.\\n\\n### How Valium (Diazepam) Affects the Brain\\n\\nValium belongs to a class of drugs called benzodiazepines. Its primary effect on the brain is to enhance the activity of a neurotransmitter called **GABA (gamma-aminobutyric acid)**.\\n\\nThink of GABA as the brain's primary \\\"braking system\\\" or calming chemical. It reduces the excitability of nerve cells. By making GABA more effective, Valium essentially \\\"turns down the volume\\\" on brain activity. 
This leads to a sedative, anti-anxiety, and muscle-relaxing effect.\\n\\n---\\n\\n### Benefits (Therapeutic Uses)\\n\\nBy slowing down excessive nerve signaling in the brain, Valium provides the following medical benefits:\\n\\n* **Anxiety Relief:** It is effective for treating generalized anxiety disorder, panic attacks, and short-term relief from severe anxiety symptoms.\\n* **Muscle Relaxation:** It calms the nerve signals that cause muscle tightness and spasms, making it useful for conditions like back pain, cerebral palsy, or other muscle injuries.\\n* **Seizure Control:** By reducing excessive electrical activity in the brain, it can be used to treat certain types of seizure disorders (epilepsy) and to stop prolonged seizures (status epilepticus).\\n* **Management of Alcohol Withdrawal:** It helps manage and reduce severe withdrawal symptoms like agitation, tremors, and seizures by providing a calming effect on the over-stimulated brain.\\n* **Sedation:** It is often used to calm patients before medical procedures or surgery, causing drowsiness and reducing anxiety.\\n\\n### Side Effects\\n\\nThe same brain-slowing mechanism that provides benefits also causes side effects.\\n\\n#### Common Side Effects:\\n\\n* **Drowsiness and Fatigue:** The most common side effect, as the drug slows overall brain function.\\n* **Dizziness and Lightheadedness:** Can affect balance and spatial awareness.\\n* **Muscle Weakness:** An extension of its muscle-relaxing properties.\\n* **Ataxia (Loss of Coordination):** Difficulty with balance, walking, and fine motor skills.\\n* **Confusion and \\\"Brain Fog\\\":** Slower thinking, difficulty concentrating, and short-term memory impairment.\\n\\n#### Less Common or More Serious Side Effects:\\n\\n* **Anterograde Amnesia:** Difficulty forming new memories while the drug is active.\\n* **Depression:** Can worsen or, in some cases, cause depressive symptoms.\\n* **Slurred Speech:** A result of reduced motor control.\\n* **Paradoxical 
Reactions:** In rare cases, it can cause the opposite effect, leading to agitation, aggression, anxiety, or hallucinations.\\n* **Respiratory Depression:** Slowed breathing, which can be dangerous, especially when Valium is combined with other depressants like alcohol or opioid painkillers.\\n\\n#### Risks Associated with Long-Term Use:\\n\\n* **Tolerance:** The body adapts to the drug, requiring higher doses to achieve the same effect.\\n* **Dependence:** The brain becomes reliant on the drug to function normally. Stopping the drug abruptly can lead to severe and potentially life-threatening **withdrawal symptoms**, including rebound anxiety, insomnia, tremors, and seizures.\\n* **Addiction:** Compulsive use of the drug despite negative consequences.\\n\\n***\\n\\n**Disclaimer:** This information is for educational purposes only and is not a substitute for professional medical advice. Valium is a potent prescription medication that should only be taken under the guidance of a healthcare provider. It is generally recommended for short-term use due to the high risk of dependence.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 1943, \"prompt_tokens\": 24, \"total_tokens\": 1967, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1174}}}}, \"final_result\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770708294, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. Here is a breakdown of the benefits and side effects of Valium (diazepam) based on the question of how it affects the brain.\\n\\n### How Valium (Diazepam) Affects the Brain\\n\\nValium belongs to a class of drugs called benzodiazepines. Its primary effect on the brain is to enhance the activity of a neurotransmitter called **GABA (gamma-aminobutyric acid)**.\\n\\nThink of GABA as the brain's primary \\\"braking system\\\" or calming chemical. 
It reduces the excitability of nerve cells. By making GABA more effective, Valium essentially \\\"turns down the volume\\\" on brain activity. This leads to a sedative, anti-anxiety, and muscle-relaxing effect.\\n\\n---\\n\\n### Benefits (Therapeutic Uses)\\n\\nBy slowing down excessive nerve signaling in the brain, Valium provides the following medical benefits:\\n\\n* **Anxiety Relief:** It is effective for treating generalized anxiety disorder, panic attacks, and short-term relief from severe anxiety symptoms.\\n* **Muscle Relaxation:** It calms the nerve signals that cause muscle tightness and spasms, making it useful for conditions like back pain, cerebral palsy, or other muscle injuries.\\n* **Seizure Control:** By reducing excessive electrical activity in the brain, it can be used to treat certain types of seizure disorders (epilepsy) and to stop prolonged seizures (status epilepticus).\\n* **Management of Alcohol Withdrawal:** It helps manage and reduce severe withdrawal symptoms like agitation, tremors, and seizures by providing a calming effect on the over-stimulated brain.\\n* **Sedation:** It is often used to calm patients before medical procedures or surgery, causing drowsiness and reducing anxiety.\\n\\n### Side Effects\\n\\nThe same brain-slowing mechanism that provides benefits also causes side effects.\\n\\n#### Common Side Effects:\\n\\n* **Drowsiness and Fatigue:** The most common side effect, as the drug slows overall brain function.\\n* **Dizziness and Lightheadedness:** Can affect balance and spatial awareness.\\n* **Muscle Weakness:** An extension of its muscle-relaxing properties.\\n* **Ataxia (Loss of Coordination):** Difficulty with balance, walking, and fine motor skills.\\n* **Confusion and \\\"Brain Fog\\\":** Slower thinking, difficulty concentrating, and short-term memory impairment.\\n\\n#### Less Common or More Serious Side Effects:\\n\\n* **Anterograde Amnesia:** Difficulty forming new memories while the drug is active.\\n* 
**Depression:** Can worsen or, in some cases, cause depressive symptoms.\\n* **Slurred Speech:** A result of reduced motor control.\\n* **Paradoxical Reactions:** In rare cases, it can cause the opposite effect, leading to agitation, aggression, anxiety, or hallucinations.\\n* **Respiratory Depression:** Slowed breathing, which can be dangerous, especially when Valium is combined with other depressants like alcohol or opioid painkillers.\\n\\n#### Risks Associated with Long-Term Use:\\n\\n* **Tolerance:** The body adapts to the drug, requiring higher doses to achieve the same effect.\\n* **Dependence:** The brain becomes reliant on the drug to function normally. Stopping the drug abruptly can lead to severe and potentially life-threatening **withdrawal symptoms**, including rebound anxiety, insomnia, tremors, and seizures.\\n* **Addiction:** Compulsive use of the drug despite negative consequences.\\n\\n***\\n\\n**Disclaimer:** This information is for educational purposes only and is not a substitute for professional medical advice. Valium is a potent prescription medication that should only be taken under the guidance of a healthcare provider. It is generally recommended for short-term use due to the high risk of dependence.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 1943, \"prompt_tokens\": 24, \"total_tokens\": 1967, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1174}}}}
27.508621 | 2026-02-10 07:31:29.330117 | 2026-02-10 07:31:29.330118
fe41557ffc8d410681a10dee1da5bc691571f78d465d4d53961f08758a243bb81{\"request_id\": \"824ae29d-8409-9f03-8d88-a5edfc13b7f0\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: what is morphine.\", \"role\": \"user\"}], \"llm\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770708300, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. Here is a breakdown of the benefits and side effects of morphine, framed as a response to the consumer health question, \\\"what is morphine.\\\"\\n\\n***\\n\\nMorphine is a powerful prescription pain medication that belongs to a class of drugs called opioids. It is derived from the opium poppy plant and works by changing the way your brain and nervous system respond to pain. Because it is very strong and can be habit-forming, it is used to treat severe pain that is not helped by other types of pain relievers.\\n\\nHere are the primary benefits and side effects associated with its use.\\n\\n### Benefits of Morphine\\n\\nWhen used as prescribed by a healthcare professional, morphine is a highly effective medication for managing severe pain. Its main benefits include:\\n\\n* **Powerful Pain Relief:** Morphine is one of the most effective pain relievers available. 
It is often used for acute pain after major surgery, serious injuries (like severe burns or trauma), or for heart attacks.\\n* **Management of Chronic Pain:** It can be used to manage severe, persistent pain, especially pain related to cancer.\\n* **Palliative and End-of-Life Care:** Morphine is a cornerstone of palliative care, where it not only relieves pain but can also ease the sensation of shortness of breath (dyspnea) in patients with terminal illnesses.\\n* **Anxiety Reduction:** By relieving severe pain, morphine can also help reduce the significant anxiety and distress that often accompany it.\\n\\n### Side Effects of Morphine\\n\\nMorphine has a significant risk of side effects, which is why it must be used under strict medical supervision. These can be divided into common and more serious categories.\\n\\n#### Common Side Effects:\\n\\n* **Drowsiness and Dizziness:** Feeling sleepy, sedated, or lightheaded is very common.\\n* **Constipation:** This is one of the most frequent and persistent side effects of all opioids.\\n* **Nausea and Vomiting:** Many people experience this, especially when first starting the medication.\\n* **Confusion or \\\"Fogginess\\\":** Difficulty thinking clearly or feeling disoriented.\\n* **Itching or Sweating:** These can occur as a reaction to the drug.\\n\\n#### Serious Side Effects (Require Immediate Medical Attention):\\n\\n* **Respiratory Depression:** This is the most dangerous side effect. Morphine can cause breathing to become dangerously slow and shallow, which can lead to unconsciousness, coma, brain damage, or death. 
The risk is much higher with large doses or when mixed with other substances like alcohol or sedatives.\\n* **Addiction, Dependence, and Tolerance:**\\n * **Dependence:** The body adapts to the drug, and stopping it suddenly can cause withdrawal symptoms (e.g., muscle aches, anxiety, sweating, nausea).\\n * **Tolerance:** Over time, a person may need higher doses of the drug to get the same level of pain relief.\\n * **Addiction:** A compulsive desire to use the drug for its euphoric effects, despite harmful consequences. This is a serious brain disease that can develop with opioid use.\\n* **Severe Low Blood Pressure:** This can cause fainting or extreme dizziness, especially when standing up.\\n* **Overdose:** Taking too much morphine can be fatal, primarily due to respiratory depression. Signs include unresponsiveness, pinpoint pupils, and slowed or stopped breathing.\\n\\n---\\n\\n**Disclaimer:** This information is for educational purposes only and is not a substitute for professional medical advice. Morphine is a controlled substance that should only be taken exactly as prescribed and monitored by a qualified healthcare provider. Always consult a doctor or pharmacist for guidance on your specific health condition and treatments.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2048, \"prompt_tokens\": 20, \"total_tokens\": 2068, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1284}}}}, \"final_result\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770708300, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. Here is a breakdown of the benefits and side effects of morphine, framed as a response to the consumer health question, \\\"what is morphine.\\\"\\n\\n***\\n\\nMorphine is a powerful prescription pain medication that belongs to a class of drugs called opioids. 
It is derived from the opium poppy plant and works by changing the way your brain and nervous system respond to pain. Because it is very strong and can be habit-forming, it is used to treat severe pain that is not helped by other types of pain relievers.\\n\\nHere are the primary benefits and side effects associated with its use.\\n\\n### Benefits of Morphine\\n\\nWhen used as prescribed by a healthcare professional, morphine is a highly effective medication for managing severe pain. Its main benefits include:\\n\\n* **Powerful Pain Relief:** Morphine is one of the most effective pain relievers available. It is often used for acute pain after major surgery, serious injuries (like severe burns or trauma), or for heart attacks.\\n* **Management of Chronic Pain:** It can be used to manage severe, persistent pain, especially pain related to cancer.\\n* **Palliative and End-of-Life Care:** Morphine is a cornerstone of palliative care, where it not only relieves pain but can also ease the sensation of shortness of breath (dyspnea) in patients with terminal illnesses.\\n* **Anxiety Reduction:** By relieving severe pain, morphine can also help reduce the significant anxiety and distress that often accompany it.\\n\\n### Side Effects of Morphine\\n\\nMorphine has a significant risk of side effects, which is why it must be used under strict medical supervision. 
These can be divided into common and more serious categories.\\n\\n#### Common Side Effects:\\n\\n* **Drowsiness and Dizziness:** Feeling sleepy, sedated, or lightheaded is very common.\\n* **Constipation:** This is one of the most frequent and persistent side effects of all opioids.\\n* **Nausea and Vomiting:** Many people experience this, especially when first starting the medication.\\n* **Confusion or \\\"Fogginess\\\":** Difficulty thinking clearly or feeling disoriented.\\n* **Itching or Sweating:** These can occur as a reaction to the drug.\\n\\n#### Serious Side Effects (Require Immediate Medical Attention):\\n\\n* **Respiratory Depression:** This is the most dangerous side effect. Morphine can cause breathing to become dangerously slow and shallow, which can lead to unconsciousness, coma, brain damage, or death. The risk is much higher with large doses or when mixed with other substances like alcohol or sedatives.\\n* **Addiction, Dependence, and Tolerance:**\\n * **Dependence:** The body adapts to the drug, and stopping it suddenly can cause withdrawal symptoms (e.g., muscle aches, anxiety, sweating, nausea).\\n * **Tolerance:** Over time, a person may need higher doses of the drug to get the same level of pain relief.\\n * **Addiction:** A compulsive desire to use the drug for its euphoric effects, despite harmful consequences. This is a serious brain disease that can develop with opioid use.\\n* **Severe Low Blood Pressure:** This can cause fainting or extreme dizziness, especially when standing up.\\n* **Overdose:** Taking too much morphine can be fatal, primarily due to respiratory depression. Signs include unresponsiveness, pinpoint pupils, and slowed or stopped breathing.\\n\\n---\\n\\n**Disclaimer:** This information is for educational purposes only and is not a substitute for professional medical advice. Morphine is a controlled substance that should only be taken exactly as prescribed and monitored by a qualified healthcare provider. 
Always consult a doctor or pharmacist for guidance on your specific health condition and treatments.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2048, \"prompt_tokens\": 20, \"total_tokens\": 2068, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1284}}}}
33.400185 | 2026-02-10 07:31:29.330118 | 2026-02-10 07:31:29.330120
03d1b9791ee640f088980fd7cb6426a41571f78d465d4d53961f08758a243bb81{\"request_id\": \"a0b58a8c-b636-962f-a2ca-21df8d71ccb3\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: what are the milligrams for oxycodone e.\", \"role\": \"user\"}], \"llm\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770708307, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. It is crucial to consult with a qualified healthcare provider for any questions about prescription medications like oxycodone. The information below is for educational purposes only and is not a substitute for professional medical advice, diagnosis, or treatment.\\n\\nRegarding your question about milligrams, oxycodone dosage is highly individualized and must be determined by a doctor. It depends on many factors, including:\\n\\n* The severity of the pain.\\n* The patient's age, weight, and overall health (especially liver and kidney function).\\n* Whether the patient has taken opioids before (tolerance).\\n* The specific formulation (immediate-release vs. extended-release).\\n\\nCommon strengths for immediate-release oxycodone tablets include 5 mg, 10 mg, 15 mg, 20 mg, and 30 mg. It is also available in combination with other drugs like acetaminophen (e.g., Percocet). **Never take oxycodone without a doctor's prescription and guidance.**\\n\\nHere is a list of the medical benefits and potential side effects of oxycodone.\\n\\n### Benefits (Medical Uses) of Oxycodone\\n\\nOxycodone is a powerful opioid analgesic prescribed for its primary benefit:\\n\\n* **Pain Relief:** Its main purpose is to manage moderate to severe pain that is not adequately controlled by other pain medications. It works by binding to opioid receptors in the brain and spinal cord, changing the way your body feels and responds to pain. 
It is used for:\\n * **Acute Pain:** Short-term, severe pain, such as after surgery or a major injury.\\n * **Chronic Pain:** Long-term, around-the-clock pain, often associated with conditions like cancer. For this, an extended-release formulation is typically used.\\n\\n---\\n\\n### Side Effects & Risks of Oxycodone\\n\\nOxycodone carries significant risks and a wide range of side effects, which can be categorized from common to severe.\\n\\n#### Common Side Effects\\n\\nThese are the most frequently reported side effects. While not typically life-threatening, they can be very uncomfortable.\\n\\n* **Constipation:** This is a very common and often persistent side effect of all opioids.\\n* **Drowsiness, Dizziness, or Lightheadedness:** Can impair your ability to drive or operate heavy machinery.\\n* **Nausea and Vomiting**\\n* **Headache**\\n* **Dry Mouth**\\n* **Itching or Sweating**\\n* **Feeling tired or weak (fatigue)**\\n\\n#### Serious Side Effects (Require Immediate Medical Attention)\\n\\nThese side effects can be dangerous and require you to contact a doctor or seek emergency medical help right away.\\n\\n* **Severe Respiratory Depression:** This is the most dangerous risk. Signs include slow, shallow, or stopped breathing. 
It can lead to coma and death.\\n* **Extreme Drowsiness or Inability to Wake Up**\\n* **Confusion, Hallucinations, or Severe Mood Changes**\\n* **Seizures**\\n* **Low Blood Pressure (Hypotension):** Signs include feeling faint, dizzy, or fainting.\\n* **Allergic Reaction:** Signs include rash, hives, difficulty breathing, and swelling of the face, lips, tongue, or throat.\\n\\n#### Major Risks and Long-Term Warnings\\n\\n* **Addiction, Abuse, and Dependence:** Oxycodone has a very high potential for creating physical dependence and psychological addiction.\\n * **Dependence** means your body adapts to the drug, and you will experience withdrawal symptoms if you stop taking it abruptly.\\n * **Addiction** is a brain disease characterized by compulsive drug-seeking and use despite harmful consequences.\\n* **Overdose:** Taking too much oxycodone can be fatal, primarily due to respiratory depression (stopped breathing). The risk is significantly higher when mixed with alcohol, benzodiazepines (like Xanax or Valium), or other central nervous system depressants.\\n* **Withdrawal Symptoms:** If you stop taking oxycodone suddenly after long-term use, you may experience severe withdrawal symptoms, including muscle aches, anxiety, insomnia, sweating, nausea, diarrhea, and intense cravings.\\n\\n**If you or someone you know is struggling with substance use, please call the SAMHSA National Helpline at 1-800-662-HELP (4357).**\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2437, \"prompt_tokens\": 25, \"total_tokens\": 2462, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1528}}}}, \"final_result\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770708307, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. 
It is crucial to consult with a qualified healthcare provider for any questions about prescription medications like oxycodone. The information below is for educational purposes only and is not a substitute for professional medical advice, diagnosis, or treatment.\\n\\nRegarding your question about milligrams, oxycodone dosage is highly individualized and must be determined by a doctor. It depends on many factors, including:\\n\\n* The severity of the pain.\\n* The patient's age, weight, and overall health (especially liver and kidney function).\\n* Whether the patient has taken opioids before (tolerance).\\n* The specific formulation (immediate-release vs. extended-release).\\n\\nCommon strengths for immediate-release oxycodone tablets include 5 mg, 10 mg, 15 mg, 20 mg, and 30 mg. It is also available in combination with other drugs like acetaminophen (e.g., Percocet). **Never take oxycodone without a doctor's prescription and guidance.**\\n\\nHere is a list of the medical benefits and potential side effects of oxycodone.\\n\\n### Benefits (Medical Uses) of Oxycodone\\n\\nOxycodone is a powerful opioid analgesic prescribed for its primary benefit:\\n\\n* **Pain Relief:** Its main purpose is to manage moderate to severe pain that is not adequately controlled by other pain medications. It works by binding to opioid receptors in the brain and spinal cord, changing the way your body feels and responds to pain. It is used for:\\n * **Acute Pain:** Short-term, severe pain, such as after surgery or a major injury.\\n * **Chronic Pain:** Long-term, around-the-clock pain, often associated with conditions like cancer. For this, an extended-release formulation is typically used.\\n\\n---\\n\\n### Side Effects & Risks of Oxycodone\\n\\nOxycodone carries significant risks and a wide range of side effects, which can be categorized from common to severe.\\n\\n#### Common Side Effects\\n\\nThese are the most frequently reported side effects. 
While not typically life-threatening, they can be very uncomfortable.\\n\\n* **Constipation:** This is a very common and often persistent side effect of all opioids.\\n* **Drowsiness, Dizziness, or Lightheadedness:** Can impair your ability to drive or operate heavy machinery.\\n* **Nausea and Vomiting**\\n* **Headache**\\n* **Dry Mouth**\\n* **Itching or Sweating**\\n* **Feeling tired or weak (fatigue)**\\n\\n#### Serious Side Effects (Require Immediate Medical Attention)\\n\\nThese side effects can be dangerous and require you to contact a doctor or seek emergency medical help right away.\\n\\n* **Severe Respiratory Depression:** This is the most dangerous risk. Signs include slow, shallow, or stopped breathing. It can lead to coma and death.\\n* **Extreme Drowsiness or Inability to Wake Up**\\n* **Confusion, Hallucinations, or Severe Mood Changes**\\n* **Seizures**\\n* **Low Blood Pressure (Hypotension):** Signs include feeling faint, dizzy, or fainting.\\n* **Allergic Reaction:** Signs include rash, hives, difficulty breathing, and swelling of the face, lips, tongue, or throat.\\n\\n#### Major Risks and Long-Term Warnings\\n\\n* **Addiction, Abuse, and Dependence:** Oxycodone has a very high potential for creating physical dependence and psychological addiction.\\n * **Dependence** means your body adapts to the drug, and you will experience withdrawal symptoms if you stop taking it abruptly.\\n * **Addiction** is a brain disease characterized by compulsive drug-seeking and use despite harmful consequences.\\n* **Overdose:** Taking too much oxycodone can be fatal, primarily due to respiratory depression (stopped breathing). 
The risk is significantly higher when mixed with alcohol, benzodiazepines (like Xanax or Valium), or other central nervous system depressants.\\n* **Withdrawal Symptoms:** If you stop taking oxycodone suddenly after long-term use, you may experience severe withdrawal symptoms, including muscle aches, anxiety, insomnia, sweating, nausea, diarrhea, and intense cravings.\\n\\n**If you or someone you know is struggling with substance use, please call the SAMHSA National Helpline at 1-800-662-HELP (4357).**\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2437, \"prompt_tokens\": 25, \"total_tokens\": 2462, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1528}}}}
40.434819 | 2026-02-10 07:31:29.330121 | 2026-02-10 07:31:29.330122
d2c9940f373d423b80eb75d1ccc39ad91571f78d465d4d53961f08758a243bb81{\"request_id\": \"fb75ca86-e855-95c2-ba17-646c5297ca17\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: 81% aspirin contain resin and shellac in it. ?.\", \"role\": \"user\"}], \"llm\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770708312, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. It appears there may be a misunderstanding in the original question, which is very common. Let's clarify that first and then detail the benefits and side effects of the drug in question.\\n\\n### Clarification of the Question\\n\\nIt seems you are asking about **low-dose aspirin**, which is commonly available in an **81 mg** tablet (not 81%). The \\\"resin and shellac\\\" you mentioned are not active ingredients; they are inactive components used to create an **enteric coating**.\\n\\n* **81 mg Aspirin:** This is a low dose, often called \\\"baby aspirin,\\\" used for long-term preventative health.\\n* **Enteric Coating (Resin and Shellac):** This is a special, hard coating on the pill. Its purpose is to prevent the aspirin from dissolving in the stomach. Instead, it dissolves later in the small intestine. This coating helps reduce stomach irritation, a common side effect of aspirin. Shellac and resin are common, FDA-approved ingredients used in many medications and even some foods for this purpose.\\n\\nSo, the drug is **Aspirin (at a low dose of 81 mg)**. Here are its primary benefits and potential side effects.\\n\\n---\\n\\n### Benefits of Low-Dose (81 mg) Aspirin\\n\\nThe main benefit of a daily low-dose aspirin regimen is its ability to prevent blood clots. It works as an antiplatelet agent, making platelets in the blood less \\\"sticky.\\\" This is primarily used for cardiovascular protection.\\n\\n**1. 
Prevention of Heart Attack:** For individuals who have already had a heart attack, daily aspirin can significantly reduce the risk of having a second one.\\n\\n**2. Prevention of Ischemic Stroke:** An ischemic stroke is caused by a blood clot in the brain. For people who have had a previous ischemic stroke or a transient ischemic attack (TIA or \\\"mini-stroke\\\"), daily aspirin can help prevent another one.\\n\\n**3. After Cardiovascular Procedures:** Doctors often prescribe it after procedures like stent placement, coronary artery bypass surgery, or angioplasty to prevent new clots from forming on the treated vessels.\\n\\n**4. Primary Prevention in High-Risk Individuals:** In some cases, a doctor might recommend daily aspirin for people who have *not* yet had a heart attack or stroke but are at very high risk due to factors like diabetes, high blood pressure, and high cholesterol. **However, this practice is now less common, as the risks can outweigh the benefits for many.**\\n\\n### Side Effects and Risks of Aspirin\\n\\nEven at a low dose, daily aspirin is a powerful medication with significant risks. The decision to take it must always be made with a doctor.\\n\\n**Common Side Effects:**\\n\\n* **Stomach Upset:** Heartburn, indigestion, or nausea. The enteric coating is designed to minimize this, but it can still occur.\\n\\n**Serious Side Effects and Risks:**\\n\\n* **Gastrointestinal (GI) Bleeding:** This is the most significant risk. Aspirin can irritate the stomach lining and lead to ulcers or bleeding. Signs include black or tarry stools, vomiting blood (or what looks like coffee grounds), and persistent stomach pain. The risk increases with age and in people with a history of ulcers.\\n* **Hemorrhagic Stroke (Bleeding in the Brain):** While aspirin helps prevent strokes caused by clots, it increases the risk of strokes caused by bleeding. 
This is a critical reason why it should only be taken under medical supervision.\\n* **Allergic Reaction:** Some people are allergic to aspirin. Symptoms can include hives, facial swelling, wheezing, and anaphylactic shock.\\n* **Tinnitus:** Ringing in the ears, which usually occurs at higher doses but can happen with long-term use.\\n* **Kidney Problems:** Long-term use can, in some cases, affect kidney function.\\n\\n**\\u26a0\\ufe0f Special Warning: Reye's Syndrome**\\n\\nAspirin should **NEVER** be given to children or teenagers recovering from a viral illness like the flu or chickenpox. It is linked to Reye's syndrome, a rare but extremely serious condition that can cause swelling in the liver and brain.\\n\\n---\\n\\n### **Important: Talk to Your Doctor**\\n\\nThe decision to start or stop taking daily low-dose aspirin is a medical one that you should make with your healthcare provider. They will weigh your personal risk of a heart attack or stroke against your personal risk of serious bleeding.\\n\\n**Do not start taking daily aspirin on your own without consulting a doctor.**\\n\\nThis information is for educational purposes only and does not constitute medical advice.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2415, \"prompt_tokens\": 30, \"total_tokens\": 2445, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1458}}}}, \"final_result\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770708312, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. It appears there may be a misunderstanding in the original question, which is very common. Let's clarify that first and then detail the benefits and side effects of the drug in question.\\n\\n### Clarification of the Question\\n\\nIt seems you are asking about **low-dose aspirin**, which is commonly available in an **81 mg** tablet (not 81%). 
The \\\"resin and shellac\\\" you mentioned are not active ingredients; they are inactive components used to create an **enteric coating**.\\n\\n* **81 mg Aspirin:** This is a low dose, often called \\\"baby aspirin,\\\" used for long-term preventative health.\\n* **Enteric Coating (Resin and Shellac):** This is a special, hard coating on the pill. Its purpose is to prevent the aspirin from dissolving in the stomach. Instead, it dissolves later in the small intestine. This coating helps reduce stomach irritation, a common side effect of aspirin. Shellac and resin are common, FDA-approved ingredients used in many medications and even some foods for this purpose.\\n\\nSo, the drug is **Aspirin (at a low dose of 81 mg)**. Here are its primary benefits and potential side effects.\\n\\n---\\n\\n### Benefits of Low-Dose (81 mg) Aspirin\\n\\nThe main benefit of a daily low-dose aspirin regimen is its ability to prevent blood clots. It works as an antiplatelet agent, making platelets in the blood less \\\"sticky.\\\" This is primarily used for cardiovascular protection.\\n\\n**1. Prevention of Heart Attack:** For individuals who have already had a heart attack, daily aspirin can significantly reduce the risk of having a second one.\\n\\n**2. Prevention of Ischemic Stroke:** An ischemic stroke is caused by a blood clot in the brain. For people who have had a previous ischemic stroke or a transient ischemic attack (TIA or \\\"mini-stroke\\\"), daily aspirin can help prevent another one.\\n\\n**3. After Cardiovascular Procedures:** Doctors often prescribe it after procedures like stent placement, coronary artery bypass surgery, or angioplasty to prevent new clots from forming on the treated vessels.\\n\\n**4. Primary Prevention in High-Risk Individuals:** In some cases, a doctor might recommend daily aspirin for people who have *not* yet had a heart attack or stroke but are at very high risk due to factors like diabetes, high blood pressure, and high cholesterol. 
**However, this practice is now less common, as the risks can outweigh the benefits for many.**\\n\\n### Side Effects and Risks of Aspirin\\n\\nEven at a low dose, daily aspirin is a powerful medication with significant risks. The decision to take it must always be made with a doctor.\\n\\n**Common Side Effects:**\\n\\n* **Stomach Upset:** Heartburn, indigestion, or nausea. The enteric coating is designed to minimize this, but it can still occur.\\n\\n**Serious Side Effects and Risks:**\\n\\n* **Gastrointestinal (GI) Bleeding:** This is the most significant risk. Aspirin can irritate the stomach lining and lead to ulcers or bleeding. Signs include black or tarry stools, vomiting blood (or what looks like coffee grounds), and persistent stomach pain. The risk increases with age and in people with a history of ulcers.\\n* **Hemorrhagic Stroke (Bleeding in the Brain):** While aspirin helps prevent strokes caused by clots, it increases the risk of strokes caused by bleeding. This is a critical reason why it should only be taken under medical supervision.\\n* **Allergic Reaction:** Some people are allergic to aspirin. Symptoms can include hives, facial swelling, wheezing, and anaphylactic shock.\\n* **Tinnitus:** Ringing in the ears, which usually occurs at higher doses but can happen with long-term use.\\n* **Kidney Problems:** Long-term use can, in some cases, affect kidney function.\\n\\n**\\u26a0\\ufe0f Special Warning: Reye's Syndrome**\\n\\nAspirin should **NEVER** be given to children or teenagers recovering from a viral illness like the flu or chickenpox. It is linked to Reye's syndrome, a rare but extremely serious condition that can cause swelling in the liver and brain.\\n\\n---\\n\\n### **Important: Talk to Your Doctor**\\n\\nThe decision to start or stop taking daily low-dose aspirin is a medical one that you should make with your healthcare provider. 
They will weigh your personal risk of a heart attack or stroke against your personal risk of serious bleeding.\\n\\n**Do not start taking daily aspirin on your own without consulting a doctor.**\\n\\nThis information is for educational purposes only and does not constitute medical advice.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2415, \"prompt_tokens\": 30, \"total_tokens\": 2445, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1458}}}}45.2776842026-02-10 07:31:29.3301222026-02-10 07:31:29.330122
\n", + "
\n", + " \n", + "
\n", + "

Table: evaluation_result

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
submission_idrun_idrepetition_countmetricaggregating_valuemetric_resulterrorcreated_atupdated_at
b5f494721730469f922759caf919d4701571f78d465d4d53961f08758a243bb81\"Pointwise Answer Relevance\"5{\"explanation\": \"The response directly addresses the user's query by explaining both the benefits and side effects of rivastigmine and common OTC sleep medicines, as requested. It also provides a clear and detailed explanation of their interaction, including the potential risks and why combining them can be harmful. The information is organized, accurate, and highly relevant to the user's question, offering both general drug information and specific interaction details. This fully meets the criteria for relevance.\", \"rating\": 5}None2026-02-10 07:42:50.3289072026-02-10 07:42:50.328912
cb37a740df1b4a43a50ce2bf6720eda01571f78d465d4d53961f08758a243bb81\"Pointwise Answer Relevance\"5{\"explanation\": \"The response directly addresses the user's question about how Valium affects the brain by explaining its mechanism of action (enhancing GABA activity), which is highly relevant. It then clearly lists both the benefits (therapeutic uses) and side effects, specifically focusing on those related to brain function and overall neurological impact. The information is detailed, accurate, and tailored to the context of the question. The response also includes a disclaimer, which is appropriate for consumer health information. Overall, the response is highly relevant, directly answering the user's query and providing useful, pertinent information.\", \"rating\": 5}None2026-02-10 07:42:50.3289132026-02-10 07:42:50.328913
fe41557ffc8d410681a10dee1da5bc691571f78d465d4d53961f08758a243bb81\"Pointwise Answer Relevance\"5{\"explanation\": \"The response directly addresses the user's query by providing a clear explanation of what morphine is, followed by a detailed list of its benefits and side effects. It covers both common and serious side effects, and explains the contexts in which morphine is used, such as pain management and palliative care. The information is pertinent, comprehensive, and framed appropriately for a consumer health question. The response is highly relevant and useful for someone seeking information about morphine's benefits and risks.\", \"rating\": 5}None2026-02-10 07:42:50.3289142026-02-10 07:42:50.328916
03d1b9791ee640f088980fd7cb6426a41571f78d465d4d53961f08758a243bb81\"Pointwise Answer Relevance\"5{\"explanation\": \"The user asked for the benefits and side effects of oxycodone in the context of a consumer health question about its milligram dosages. The response first addresses the milligram strengths available for oxycodone, then provides a thorough and detailed list of both the benefits (medical uses) and side effects (common, serious, and long-term risks) of the drug. The information is accurate, comprehensive, and directly relevant to the user's query. The response also includes appropriate safety warnings and resources, which are pertinent for a consumer health context. Overall, the response is highly relevant and fully addresses the user's request.\", \"rating\": 5}None2026-02-10 07:42:50.3289172026-02-10 07:42:50.328918
d2c9940f373d423b80eb75d1ccc39ad91571f78d465d4d53961f08758a243bb81\"Pointwise Answer Relevance\"5{\"explanation\": \"The response directly addresses the user's query by clarifying the confusion in the question (81% vs. 81 mg aspirin, and the role of resin and shellac as enteric coating agents). It then provides a comprehensive and accurate list of the benefits and side effects of low-dose aspirin, which is the drug in question. The explanation is detailed, relevant, and includes important warnings and context for consumer health. The response is highly pertinent and useful for the user's needs.\", \"rating\": 5}None2026-02-10 07:42:50.3289192026-02-10 07:42:50.328919
\n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# viewing the results from sqlite db in tabular format..\n", + "import sqlite3\n", + "import pandas as pd\n", + "from IPython.display import display, HTML\n", + "\n", + "# Path to your SQLite database file\n", + "db_file = 'results-new/results.db'\n", + "\n", + "connection = sqlite3.connect(db_file)\n", + "\n", + "# Specify the table names you want to display\n", + "table_names = ['run','configuration', 'submission', 'submission_result', 'evaluation_result'] \n", + "\n", + "# Create the CSS and HTML container\n", + "html_content = \"\"\"\n", + "\n", + "
\n", + "\"\"\"\n", + "\n", + "for table_name in table_names:\n", + " query = f\"SELECT * FROM {table_name};\"\n", + " df = pd.read_sql_query(query, connection)\n", + " # If you want to see all the rows across all tables, remove/comment the next line\n", + " df = df.head(5) # Limiting the number of rows displayed\n", + " table_html = df.to_html(classes='table-container', index=False)\n", + " html_content += f\"\"\"\n", + "
\n", + "

Table: {table_name}

\n", + " {table_html}\n", + "
\n", + " \"\"\"\n", + "\n", + "html_content += \"
\"\n", + "\n", + "display(HTML(html_content))\n", + "\n", + "# Close the connection\n", + "connection.close()" + ] + }, + { + "cell_type": "code", + "execution_count": 94, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
\n", + "\n", + "
\n", + "

Categorical Comparison

\n", + "

Values: Weighted Average (1-5 scale). Win Rate is based on head-to-head performance.

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
run_idrun_namemodelWin RateFinal RankPointwise Instruction FollowingPointwise Answer Relevance
19722d52bde94ac488b1bd8abbd5bec9Run-genai-eval-test-gemini-2.5-pro-001gemini-2.5-pro0.015.05.0
\n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import sqlite3\n", + "import json\n", + "import os\n", + "from IPython.display import display, HTML\n", + "\n", + "# ==========================================\n", + "# 1. CONFIGURATION (Separated Groups)\n", + "# ==========================================\n", + "METRIC_GROUPS = {\n", + " \"Categorical\": {\n", + " \"type\": \"categorical\",\n", + " \"description\": \"Weighted Average (1-5 scale)\",\n", + " \"metrics\": [\n", + " \"Pointwise Conciseness\", \n", + " \"Pointwise Instruction Following\", \n", + " \"Pointwise Correctness\", \n", + " \"Pointwise Answer Relevance\"\n", + " ]\n", + " },\n", + " \"Boolean\": {\n", + " \"type\": \"categorical\", # Uses same weighted avg logic (0 or 1)\n", + " \"description\": \"Pass Rate (0-1 scale)\",\n", + " \"metrics\": [\n", + " \"Exact Match\",\n", + " \"Content Filter on Input\",\n", + " \"Content Filter on Output\",\n", + " \"Language Match\",\n", + " \"JSON Schema Match\"\n", + " ]\n", + " },\n", + " \"Numerical\": {\n", + " \"type\": \"numerical\",\n", + " \"description\": \"Mean Value\",\n", + " \"metrics\": [\n", + " \"BLEU\", \n", + " \"ROUGE\", \n", + " \"BERT Score\",\n", + " \"test-metric\"\n", + " ]\n", + " }\n", + "}\n", + "\n", + "# ==========================================\n", + "# 2. 
DATA EXTRACTION\n", + "# ==========================================\n", + "def extract_db_metadata(db_path):\n", + " if not os.path.exists(db_path): return pd.DataFrame()\n", + " conn = sqlite3.connect(db_path)\n", + " df_runs = pd.read_sql_query(\"SELECT id, name, tags, config FROM run\", conn)\n", + " conn.close()\n", + " \n", + " meta_data = []\n", + " for _, row in df_runs.iterrows():\n", + " run_id = str(row[\"id\"])\n", + " run_name = str(row[\"name\"])\n", + " tags = {}\n", + " config = {}\n", + " try: tags = json.loads(row[\"tags\"]) if isinstance(row[\"tags\"], str) else row[\"tags\"]\n", + " except: pass\n", + " try: config = json.loads(row[\"config\"]) if isinstance(row[\"config\"], str) else row[\"config\"]\n", + " except: pass\n", + "\n", + " model = \"Unknown\"\n", + " try: model = config[\"modules\"][\"prompt_templating\"][\"model\"][\"name\"]\n", + " except:\n", + " if isinstance(tags, dict): model = tags.get(\"evaluation.ai.sap.com/model\", \"Unknown\")\n", + " elif isinstance(tags, list):\n", + " for t in tags: \n", + " if t.get(\"key\") == \"evaluation.ai.sap.com/model\": model = t.get(\"value\")\n", + "\n", + " meta_data.append({\"run_id\": run_id, \"run_name\": run_name, \"model\": model})\n", + " return pd.DataFrame(meta_data)\n", + "\n", + "def extract_api_metrics(runs_data_resource):\n", + " flat_data = []\n", + " for run in runs_data_resource:\n", + " model = \"Unknown\"\n", + " for t in run.get(\"tags\", []):\n", + " if t.get(\"name\") == \"evaluation.ai.sap.com/model\":\n", + " model = t.get(\"value\")\n", + " break\n", + " for m in run.get(\"metrics\", []):\n", + " clean_name = m.get(\"name\", \"\").replace('\"', '').strip()\n", + " flat_data.append({\n", + " \"model\": model,\n", + " \"metrics_name_clean\": clean_name,\n", + " \"metric_value\": m.get(\"value\")\n", + " })\n", + " df = pd.DataFrame(flat_data)\n", + " df['metric_value'] = pd.to_numeric(df['metric_value'], errors='coerce')\n", + " return df\n", + "\n", + "# 
==========================================\n", + "# 3. SCORING & HELM LOGIC\n", + "# ==========================================\n", + "def calculate_weighted_avg_score(row, cols):\n", + " \"\"\" Returns a score based on counts. \n", + " Categorical: 1-5 scale. \n", + " Boolean: 0-1 scale (Pass Rate). \n", + " \"\"\"\n", + " total_score = 0\n", + " total_count = 0\n", + " # Check counts 0-5 (covers Boolean 0/1 and Categorical 1-5)\n", + " for rating in range(0, 6):\n", + " col_name = next((c for c in cols if f\"/{rating}/count\" in c), None)\n", + " if col_name and not pd.isna(row[col_name]):\n", + " count = row[col_name]\n", + " total_score += count * rating\n", + " total_count += count\n", + " return total_score / total_count if total_count > 0 else 0.0\n", + "\n", + "def get_metric_score_series(df_metrics, metric_name, group_type):\n", + " \"\"\" Returns a Series of SCORES (Scalar) for each model for a specific metric \"\"\"\n", + " subset = df_metrics[df_metrics['metrics_name_clean'].str.startswith(metric_name)]\n", + " if subset.empty: return None\n", + "\n", + " # Pivot to get columns for this metric\n", + " pivot = subset.pivot_table(index='model', columns='metrics_name_clean', values='metric_value', aggfunc='first')\n", + " cols = pivot.columns.tolist()\n", + " \n", + " if group_type == \"categorical\":\n", + " # Calculate Weighted Average (or Pass Rate for Boolean)\n", + " return pivot.apply(lambda row: calculate_weighted_avg_score(row, cols), axis=1)\n", + " else:\n", + " # Calculate Mean (Numerical)\n", + " c_mean = next((c for c in cols if \"mean\" in c), None)\n", + " if c_mean: return pivot[c_mean]\n", + " return None\n", + "\n", + "def calculate_group_win_rate(score_table):\n", + " \"\"\"\n", + " Calculates HELM Win Rate: % of times a model beats another model across all metrics in this group.\n", + " \"\"\"\n", + " models = score_table.index.tolist()\n", + " metrics = score_table.columns.tolist()\n", + " win_rates = {}\n", + "\n", + " for model_a in 
models:\n", + " wins = 0\n", + " comparisons = 0\n", + " \n", + " for model_b in models:\n", + " if model_a == model_b: continue\n", + " \n", + " # Compare across ALL metrics in this table\n", + " for metric in metrics:\n", + " score_a = score_table.at[model_a, metric]\n", + " score_b = score_table.at[model_b, metric]\n", + " \n", + " # Only compare valid scores\n", + " if pd.isna(score_a) or pd.isna(score_b): continue\n", + " \n", + " comparisons += 1\n", + " if score_a > score_b:\n", + " wins += 1\n", + " \n", + " win_rates[model_a] = wins / comparisons if comparisons > 0 else 0.0\n", + " \n", + " return pd.Series(win_rates)\n", + "\n", + "# ==========================================\n", + "# 4. EXECUTION\n", + "# ==========================================\n", + "db_file = 'results-new/results.db'\n", + "\n", + "# A. Metadata\n", + "df_db_meta = extract_db_metadata(db_file)\n", + "df_db_unique = df_db_meta.drop_duplicates(subset=['model'], keep='last')\n", + "\n", + "# B. CSS\n", + "html_content = \"\"\"\n", + "\n", + "
\n", + "\"\"\"\n", + "if 'runs_data' in locals() and runs_data:\n", + " df_metrics_all = extract_api_metrics(runs_data['resources'])\n", + " \n", + " for group_name, config in METRIC_GROUPS.items():\n", + " \n", + " # 1. Build Score Table\n", + " score_table = pd.DataFrame(index=df_db_unique['model'].unique())\n", + " score_table.index.name = 'model'\n", + " \n", + " valid_metrics = []\n", + " \n", + " # 2. Calculate Scores\n", + " for metric in config[\"metrics\"]:\n", + " scores = get_metric_score_series(df_metrics_all, metric, config[\"type\"])\n", + " if scores is not None:\n", + " score_table[metric] = scores\n", + " valid_metrics.append(metric)\n", + " \n", + " if not valid_metrics:\n", + " continue\n", + "\n", + " # 3. Calculate HELM Win Rate (Specific to this group)\n", + " score_table['Win Rate'] = calculate_group_win_rate(score_table[valid_metrics])\n", + " \n", + " # 4. Calculate Final Rank\n", + " score_table['Final Rank'] = score_table['Win Rate'].rank(ascending=False, method='min')\n", + " \n", + " # 5. Merge & Format\n", + " df_final = pd.merge(df_db_unique, score_table, on='model', how='inner')\n", + " df_final = df_final.sort_values('Final Rank')\n", + " \n", + " # Rounding\n", + " for c in valid_metrics: df_final[c] = df_final[c].fillna(0.0).astype(float).round(4)\n", + " df_final['Win Rate'] = df_final['Win Rate'].fillna(0.0).astype(float).round(4)\n", + " df_final['Final Rank'] = df_final['Final Rank'].fillna(0).astype(int)\n", + " \n", + " # Columns\n", + " meta_cols = ['run_id', 'run_name', 'model']\n", + " final_cols = meta_cols + ['Win Rate', 'Final Rank'] + valid_metrics\n", + " \n", + " # 6. Generate HTML\n", + " table_html = df_final[final_cols].to_html(classes='table-container', index=False)\n", + " \n", + " html_content += f\"\"\"\n", + "
\n", + "

{group_name} Comparison

\n", + "

Values: {config['description']}. Win Rate is based on head-to-head performance.

\n", + " {table_html}\n", + "
\n", + " \"\"\"\n", + "\n", + " html_content += \"
\"\n", + " display(HTML(html_content))\n", + " \n", + "else:\n", + " print(\"'runs_data' missing.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Delete Execution Id\n", + "def delete_execution():\n", + " headers = _get_headers()\n", + " EXEC_ID = execution_id\n", + " GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions/'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}{EXEC_ID}\"\n", + " try:\n", + " response = requests.delete(\n", + " request_url, headers=headers, params={\"AI-Resource-Group\":AICORE_RESOURCE_GROUP}, timeout=120\n", + " )\n", + " print(response)\n", + " if(response.status_code != 202):\n", + " raise\n", + " result = response.json()\n", + " print(result)\n", + " except:\n", + " logging.error(\"Error occurred while attempting to delete a Configuration\")\n", + " raise\n", + " \n", + "delete_execution()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/AI_Core.json b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/AI_Core.json new file mode 100644 index 000000000..bb30bf61b --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/AI_Core.json @@ -0,0 +1,1578 @@ +{ + "name": "AI Core", + "version": "1", + "items": [ + { + "type": "http", + "name": "get_token", + "filename": "get_token.bru", + "seq": 1, + "request": { + "url": "{{ai_auth_url}}/oauth/token", + "method": "POST", + "headers": [ + { + "name": "Content-Type", + "value": "application/x-www-form-urlencoded", + "enabled": true + } + ], + 
"params": [], + "body": { + "mode": "formUrlEncoded", + "formUrlEncoded": [ + { + "name": "grant_type", + "value": "client_credentials", + "enabled": true + }, + { + "name": "client_id", + "value": "{{client_id}}", + "enabled": true + }, + { + "name": "client_secret", + "value": "{{client_secret}}", + "enabled": true + } + ], + "multipartForm": [], + "file": [] + }, + "script": { + "res": "if (res.getStatus() == 200) {\n bru.setEnvVar(\"access_token\", res.body.access_token);\n}" + }, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "", + "auth": { + "mode": "none" + } + } + }, + { + "type": "folder", + "name": "admin", + "filename": "admin", + "root": { + "meta": { + "name": "admin" + } + }, + "items": [ + { + "type": "folder", + "name": "objectStoreSecrets", + "filename": "objectStoreSecrets", + "root": { + "meta": { + "name": "objectStoreSecrets" + } + }, + "items": [ + { + "type": "http", + "name": "Create a secret", + "filename": "Create a secret.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/admin/objectStoreSecrets", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + }, + { + "name": "Authorization", + "value": "", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"name\": \"genai-data\",\n \"data\": {\n \"AWS_ACCESS_KEY_ID\": \"\",\n \"AWS_SECRET_ACCESS_KEY\": \"\"\n },\n \"type\": \"S3\",\n \"bucket\": \"\",\n \"endpoint\": \"https://s3.eu-central-1.amazonaws.com\",\n \"region\": \"\",\n \"pathPrefix\": \"\" \n }", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a secret based on the configuration in the request body\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": 
"{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get a list of metadata of available secrets.", + "filename": "Get a list of metadata of available secrets.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/admin/objectStoreSecrets?$top=&$skip=&$count=", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "$top", + "value": "", + "type": "query", + "enabled": true + }, + { + "name": "$skip", + "value": "", + "type": "query", + "enabled": true + }, + { + "name": "$count", + "value": "", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of metadata of the stored secrets.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + }, + { + "type": "folder", + "name": "{objectStoreName}", + "filename": "{objectStoreName}", + "root": { + "meta": { + "name": "{objectStoreName}" + } + }, + "items": [ + { + "type": "http", + "name": "Delete object store secret", + "filename": "Delete object store secret.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/admin/objectStoreSecrets/:objectStoreName", + "method": "DELETE", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": 
true + } + ], + "params": [ + { + "name": "objectStoreName", + "value": "qKoZ-aHSe", + "type": "path", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Delete a secret with the name of objectStoreName if it exists.", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + }, + { + "type": "http", + "name": "Returns the of metadata of secrets which match the query parameter.", + "filename": "Returns the of metadata of secrets which match the query parameter.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/admin/objectStoreSecrets", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "This retrieves the metadata of the stored secret which match the parameter objectStoreName.\nThe fetched secret is constructed like objectStoreName-object-store-secret\nThe base64 encoded field for the stored secret is not returned.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + 
"credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + }, + { + "type": "http", + "name": "Update object store secret", + "filename": "Update object store secret.bru", + "seq": 3, + "request": { + "url": "{{baseUrl}}/admin/objectStoreSecrets/:objectStoreName", + "method": "PATCH", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "objectStoreName", + "value": "qKoZ-aHSe", + "type": "path", + "enabled": true + } + ], + "body": { + "mode": "json", + "json": "{\n \"name\": \"\",\n \"type\": \"\",\n \"data\": {},\n \"bucket\": \"\",\n \"endpoint\": \"\",\n \"region\": \"\",\n \"pathPrefix\": \"\",\n \"verifyssl\": \"\",\n \"usehttps\": \"1\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Update a secret with name of objectStoreName if it exists.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + } + ] + }, + { + "type": "folder", + "name": "lm", + "filename": "lm", + "root": { + "meta": { + "name": "lm" + } + }, + "items": [ + { + "type": "folder", + "name": "configurations", + 
"filename": "configurations", + "root": { + "meta": { + "name": "configurations" + } + }, + "items": [ + { + "type": "http", + "name": "Create configuration Copy", + "filename": "Create configuration Copy.bru", + "seq": 3, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "DELETE", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"id\": \"\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a new configuration linked to a specific scenario and executable for use in an execution\nor deployment.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Create configuration", + "filename": "Create configuration.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"name\": \"genai-eval-conf\",\n \"scenarioId\": \"genai-evaluations\",\n \"executableId\": \"genai-evaluations-simplified\",\n \"inputArtifactBindings\": [\n {\n \"key\": \"datasetFolder\",\n \"artifactId\": \"\"\n }\n ],\n \"parameterBindings\": [\n {\n \"key\": \"repetitions\",\n \"value\": \"1\"\n },\n {\n \"key\": \"orchestrationDeploymentURL\",\n \"value\": \"\"\n\n },\n {\n \"key\": \"metrics\",\n \"value\": \"language_match\"\n },\n {\n \"key\": 
\"testDataset\",\n \"value\": \"{\\\"path\\\": \\\"testdata/global_customer_queries.csv\\\", \\\"type\\\": \\\"csv\\\"}\"\n },\n {\n \"key\": \"promptTemplate\",\n \"value\": \"\"\n },\n {\n \"key\": \"models\",\n \"value\": \"gpt-4.1:latest\"\n }\n ]\n}\n", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a new configuration linked to a specific scenario and executable for use in an execution\nor deployment.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get list of configurations", + "filename": "Get list of configurations.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of configurations. 
Filter results by scenario ID or a list of executable IDs.\nSearch for configurations containing the search string as substring in the configuration name.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "{configurationId}", + "filename": "{configurationId}", + "root": { + "meta": { + "name": "{configurationId}" + } + }, + "items": [ + { + "type": "http", + "name": "Get configuration by ID", + "filename": "Get configuration by ID.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve details for configuration with configurationId.", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + } + ] + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + }, + "items": [ + { + "type": "http", + "name": "Get number of configurations", + "filename": "Get number of configurations.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/lm/configurations/$count?scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&$search=}\"NI2Kn!V&searchCaseInsensitive=false&executableIds=T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "text/plain", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + 
"name": "$search", + "value": "}\"NI2Kn!V", + "type": "query", + "enabled": true + }, + { + "name": "searchCaseInsensitive", + "value": "false", + "type": "query", + "enabled": true + }, + { + "name": "executableIds", + "value": "T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve the number of available configurations that match the specified filter criteria.\nFilter criteria include a scenarioId or executableIdsList. Search by substring of configuration name is also possible.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + }, + { + "type": "folder", + "name": "artifacts", + "filename": "artifacts", + "root": { + "meta": { + "name": "artifacts" + } + }, + "items": [ + { + "type": "http", + "name": "Get list of artifacts", + "filename": "Get list of artifacts.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/artifacts", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": false + }, + { + "name": "executionId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": 
"query", + "enabled": false + }, + { + "name": "name", + "value": "[G7 ovyt8i", + "type": "query", + "enabled": false + }, + { + "name": "kind", + "value": "other", + "type": "query", + "enabled": false + }, + { + "name": "artifactLabelSelector", + "value": "ext.ai.sap.com/bXN1EAk=D*", + "type": "query", + "enabled": false + }, + { + "name": "$top", + "value": "10000", + "type": "query", + "enabled": false + }, + { + "name": "$skip", + "value": "", + "type": "query", + "enabled": false + }, + { + "name": "$search", + "value": "}\"NI2Kn!V", + "type": "query", + "enabled": false + }, + { + "name": "searchCaseInsensitive", + "value": "false", + "type": "query", + "enabled": false + }, + { + "name": "$expand", + "value": "scenario", + "type": "query", + "enabled": false + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of artifacts that matches the specified filter criteria.\nFilter criteria include scenario ID, execution ID, an artifact name, artifact kind, or artifact labels.\nUse top/skip parameters to paginate the result list.\nSearch by substring of artifact name or description, if required.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Register artifact", + "filename": "Register artifact.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/artifacts", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"name\": \"aiconfig\",\n \"kind\": \"dataset\",\n \"url\": \"ai://genai-data/genaiEvaluation/14af1af80b974edb8731632d17286343\",\n 
\"scenarioId\": \"genai-evaluations\"\n}\n", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Register an artifact for use in a configuration, for example a model or a dataset.", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + }, + "items": [ + { + "type": "http", + "name": "Get number of artifacts", + "filename": "Get number of artifacts.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/lm/artifacts/$count?scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&executionId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&name=[G7 ovyt8i&kind=other&$search=}\"NI2Kn!V&searchCaseInsensitive=false&artifactLabelSelector=ext.ai.sap.com/bXN1EAk=D*", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "text/plain", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "executionId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "name", + "value": "[G7 ovyt8i", + "type": "query", + "enabled": true + }, + { + "name": "kind", + "value": "other", + "type": "query", + "enabled": true + }, + { + "name": "$search", + "value": "}\"NI2Kn!V", + "type": "query", + "enabled": true + }, + { + "name": "searchCaseInsensitive", + "value": "false", + "type": "query", + "enabled": true + }, + { + "name": "artifactLabelSelector", + "value": "ext.ai.sap.com/bXN1EAk=D*", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + 
"script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve the number of available artifacts that match the specified filter criteria.\nFilter criteria include a scenarioId, executionId, an artifact name, artifact kind, or artifact labels.\nSearch by substring of artifact name or description is also possible.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + }, + { + "type": "folder", + "name": "executions", + "filename": "executions", + "root": { + "meta": { + "name": "executions" + } + }, + "items": [ + { + "type": "http", + "name": "Create execution", + "filename": "Create execution.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/executions", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"configurationId\": \"\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create an execution using the configuration specified by configurationId.", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get list of executions", + "filename": "Get list of executions.bru", + "seq": 1, + "request": { + "url": 
"{{baseUrl}}/v2/lm/executions/", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": false + }, + { + "name": "executionScheduleId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": false + }, + { + "name": "status", + "value": "DEAD", + "type": "query", + "enabled": false + }, + { + "name": "$top", + "value": "10000", + "type": "query", + "enabled": false + }, + { + "name": "$skip", + "value": "", + "type": "query", + "enabled": false + }, + { + "name": "$select", + "value": "status", + "type": "query", + "enabled": false + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of executions that match the specified filter criteria.\nFilter criteria include a list of executableIds, a scenarioId, a configurationId, or an execution status.\nWith top/skip parameters it is possible to paginate the result list.\nWith select parameter it is possible to select only status.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + } + } + ] + }, + { + "type": "folder", + "name": "deployments", + "filename": "deployments", + "root": { + "meta": { + "name": "deployments" + } + }, + "items": [ + { + "type": "http", + "name": "Create deployment", + "filename": "Create deployment.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/deployments", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": 
"{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"configurationId\": \"\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a deployment using the configuration specified by configurationId after synchronously checking the\ncorrectness of the configuration.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get list of deployments", + "filename": "Get list of deployments.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/deployments", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of deployments that match the specified filter criteria.\nFilter criteria include a list of executableIds, a scenarioId, a configurationId, or a deployment status.\nWith top/skip parameters it is possible to paginate the result list.\nWith select parameter it is possible to select only status.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + }, + "items": [ + { + "type": "http", + "name": "Get number of deployments", + "filename": "Get number of deployments.bru", + "seq": 1, + "request": { + "url": 
"{{baseUrl}}/lm/deployments/$count?executableIds=T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE&configurationId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&status=DEAD", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "text/plain", + "enabled": true + } + ], + "params": [ + { + "name": "executableIds", + "value": "T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", + "type": "query", + "enabled": true + }, + { + "name": "configurationId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "status", + "value": "DEAD", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve the number of available deployments. 
The number can be filtered by\nscenarioId, configurationId, executableIdsList or by deployment status.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + }, + { + "type": "folder", + "name": "metrics", + "filename": "metrics", + "root": { + "meta": { + "name": "metrics" + } + }, + "items": [ + { + "type": "http", + "name": "Evaluation Metrics via Execution ID", + "filename": "Evaluation Metrics via Execution ID.bru", + "seq": 4, + "request": { + "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of=", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "tagFilters", + "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of=", + "value": "evaluation.ai.sap.com/child-of=", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Metrics by Run Name", + "filename": "Metrics by Run Name.bru", + "seq": 5, + "request": { + "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/run-name=run1", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + 
"enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "tagFilters", + "value": "evaluation.ai.sap.com/run-name=run1", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + } + ] + } + ] + } + ], + "activeEnvironmentUid": "lWUmIcEkGnkMxwNBILLmY", + "environments": [ + { + "variables": [ + { + "name": "ai_auth_url", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "ai_api_url", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "client_id", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "client_secret", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "resource_group", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "orchestration_service_url", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "access_token", + "value": "", + "enabled": true, + "secret": true, + "type": "text" + } + ], + "name": "intprod" + } + ], + "root": { + "request": { + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "state": "", + "pkce": false, + "credentialsPlacement": "basic_auth_header", + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + }, + "vars": { + "req": [ + { + "name": "region", + "value": 
"prod.eu-central-1.aws", + "enabled": true, + "local": false, + "uid": "oYVk4DuVpyYqqP2roBVjE" + }, + { + "name": "baseUrl", + "value": "", + "enabled": true, + "local": false, + "uid": "I4KjDm7FxpSRwUYzjwfPG" + }, + { + "name": "auth_url", + "value": "", + "enabled": true, + "local": false, + "uid": "zuftvyCURtA9XYErCYDgo" + }, + { + "name": "client_id", + "value": "", + "enabled": true, + "local": false, + "uid": "JfGEVKm71BYTgR8UkQUGv" + }, + { + "name": "client_secret", + "value": "", + "enabled": true, + "local": false, + "uid": "ls3RYTJ40baTl8eYmilGt" + }, + { + "name": "AWS_ACCESS_KEY_ID", + "value": "", + "enabled": true, + "local": false, + "uid": "2O0YTTAdmYltm5XiHMhP2" + }, + { + "name": "AWS_SECRET_ACCESS_KEY", + "value": "", + "enabled": true, + "local": false, + "uid": "8rc4RYyPcHXyTkAnnI981" + }, + { + "name": "BUCKET_NAME", + "value": "", + "enabled": true, + "local": false, + "uid": "HqFIe8Rvc14i41WIAGGkl" + }, + { + "name": "DATABASE_URL", + "value": "https://s3-eu-central-1.amazonaws.com", + "enabled": true, + "local": false, + "uid": "aWIwuJZH5XQ5Guu2D69Sq" + } + ] + } + }, + "docs": "Provides tools to manage your scenarios and workflows in SAP AI Core. Execute pipelines as a batch job, for example to pre-process or train your models, or perform batch inference. Serve inference requests of trained models. Deploy a trained machine learning model as a web service to serve inference requests with high performance. 
Register your own Docker registry, synchronize your AI content from your own git repository, and register your own object store for training data and trained models.\n", + "meta": { + "name": "AI Core" + } + }, + "brunoConfig": { + "version": "1", + "name": "AI Core", + "type": "collection", + "ignore": [ + "node_modules", + ".git" + ], + "size": 0.10747432708740234, + "filesCount": 151 + } +} diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br01.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br01.png new file mode 100644 index 000000000..5424ea51d Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br01.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br02.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br02.png new file mode 100644 index 000000000..4ed9d9ab0 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br02.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br03.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br03.png new file mode 100644 index 000000000..2347470e7 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br03.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br04.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br04.png new file mode 100644 index 000000000..9f8a175e4 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br04.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br05.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br05.png new file mode 100644 index 000000000..69a105ef0 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br05.png differ diff --git 
a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br06.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br06.png new file mode 100644 index 000000000..81128b34b Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image-br06.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_007.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_007.png new file mode 100644 index 000000000..0cdc4cf4a Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_007.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_008.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_008.png new file mode 100644 index 000000000..2f12f021a Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_008.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_009.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_009.png new file mode 100644 index 000000000..1c979c6b0 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_009.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_1.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_1.png new file mode 100644 index 000000000..6db3eb05c Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_1.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_10.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_10.png new file mode 100644 index 000000000..275de8254 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_10.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_19.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_19.png new file 
mode 100644 index 000000000..91498a203 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_19.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_21.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_21.png new file mode 100644 index 000000000..dd9f9f22b Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_21.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_22.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_22.png new file mode 100644 index 000000000..abcae67d6 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_22.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_23.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_23.png new file mode 100644 index 000000000..97b0bc60f Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_23.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_24.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_24.png new file mode 100644 index 000000000..5471c2e38 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_24.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_25.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_25.png new file mode 100644 index 000000000..afdb0e197 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_25.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_26.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_26.png new file mode 100644 index 000000000..a2107fe85 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_26.png differ diff --git 
a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_27.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_27.png new file mode 100644 index 000000000..ec99b587c Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_27.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_29.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_29.png new file mode 100644 index 000000000..bd3a81ebc Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_29.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_31.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_31.png new file mode 100644 index 000000000..7a1a959fb Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_31.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_32.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_32.png new file mode 100644 index 000000000..fe827f346 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_32.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_33.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_33.png new file mode 100644 index 000000000..546d43b52 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_33.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_34.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_34.png new file mode 100644 index 000000000..4fa0960a1 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_34.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_40.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_40.png new file mode 100644 
index 000000000..bc104b465 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_40.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_41.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_41.png new file mode 100644 index 000000000..975e57dc3 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_41.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_43.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_43.png new file mode 100644 index 000000000..d594ffa7c Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_43.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_44.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_44.png new file mode 100644 index 000000000..8b352c79e Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_44.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_45.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_45.png new file mode 100644 index 000000000..7cf1a3f63 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_45.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_46.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_46.png new file mode 100644 index 000000000..ef67d82f2 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_46.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_46_01.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_46_01.png new file mode 100644 index 000000000..131317edd Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_46_01.png differ diff --git 
a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_46a.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_46a.png new file mode 100644 index 000000000..c493e2a5d Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_46a.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_47.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_47.png new file mode 100644 index 000000000..fc729b5ea Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_47.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_48.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_48.png new file mode 100644 index 000000000..a7d8b132f Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_48.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_49.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_49.png new file mode 100644 index 000000000..2a2bbcd75 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_49.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_5.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_5.png new file mode 100644 index 000000000..bc6b2a187 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_5.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_50.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_50.png new file mode 100644 index 000000000..74fea1ca6 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_50.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_6.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_6.png new file mode 100644 
index 000000000..0d7a4a11a Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_6.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_ail_or1.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_ail_or1.png new file mode 100644 index 000000000..1c754cd04 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_ail_or1.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_ail_or2.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_ail_or2.png new file mode 100644 index 000000000..aac9bd73a Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_ail_or2.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_ail_or3.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_ail_or3.png new file mode 100644 index 000000000..c3548b9f8 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_ail_or3.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_dt.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_dt.png new file mode 100644 index 000000000..841683c51 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_dt.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_mtrs.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_mtrs.png new file mode 100644 index 000000000..b2fe6925a Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_mtrs.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_or1.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_or1.png new file mode 100644 index 000000000..8af37314e Binary files /dev/null and 
b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_or1.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_pr.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_pr.png new file mode 100644 index 000000000..22d143968 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_br_pr.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_objsec.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_objsec.png new file mode 100644 index 000000000..f4905708c Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_objsec.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py03.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py03.png new file mode 100644 index 000000000..e82630d0a Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py03.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_con.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_con.png new file mode 100644 index 000000000..12bf2650b Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_con.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_dtst.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_dtst.png new file mode 100644 index 000000000..71f8ba2ee Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_dtst.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_or1.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_or1.png new file mode 100644 index 000000000..0469ab08c Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_or1.png differ diff --git 
a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_rk.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_rk.png new file mode 100644 index 000000000..36b500fa1 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_rk.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_rnk1.png b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_rnk1.png new file mode 100644 index 000000000..af5cb4a4d Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/image_py_rnk1.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/img/requirements.txt b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/requirements.txt new file mode 100644 index 000000000..c9e0b941d --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-comprehensive/img/requirements.txt @@ -0,0 +1,7 @@ +generative-ai-hub-sdk==4.4.3 +python-dotenv==1.0.1 +boto3==1.37.4 +pandas==2.2.3 +json2html==1.3.0 +numpy==1.26.4 +ipywidgets==8.1.0 \ No newline at end of file diff --git a/tutorials/ai-core-genaihub-evaluation-comprehensive/sample.env b/tutorials/ai-core-genaihub-evaluation-comprehensive/sample.env new file mode 100644 index 000000000..09eeddf3f --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-comprehensive/sample.env @@ -0,0 +1,13 @@ +# AICORE CREDENTIALS +AICORE_CLIENT_ID= +AICORE_CLIENT_SECRET= +AICORE_AUTH_URL= +AICORE_BASE_URL= + +# AWS CREDENTIALS +AWS_ACCESS_KEY= +AWS_BUCKET_ID= +AWS_REGION= +AWS_SECRET_ACCESS_KEY= +AWS_USERNAME= +AWS_HOST= diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/DATASET_RAG/AICore_feature_description.pdf b/tutorials/ai-core-genaihub-evaluation-with-grounding/DATASET_RAG/AICore_feature_description.pdf new file mode 100644 index 000000000..b1de4b040 Binary files /dev/null and
b/tutorials/ai-core-genaihub-evaluation-with-grounding/DATASET_RAG/AICore_feature_description.pdf differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/DATASET_RAG/context_output.pdf b/tutorials/ai-core-genaihub-evaluation-with-grounding/DATASET_RAG/context_output.pdf new file mode 100644 index 000000000..b990413c0 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/DATASET_RAG/context_output.pdf differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/DATASET_RAG/testdata/emanual.csv b/tutorials/ai-core-genaihub-evaluation-with-grounding/DATASET_RAG/testdata/emanual.csv new file mode 100644 index 000000000..7ccddeeec --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-with-grounding/DATASET_RAG/testdata/emanual.csv @@ -0,0 +1,51 @@ +topic,answer,context +'I want to enter into Ambient mode. How can I do that?',"'To enter into Ambient Mode, you need to press the specified button on the remote control. In the provided context, it mentions that to enter Ambient Mode, you should press the button on the Samsung TV remote control.'","Changing the content and settings for Ambient Mode. When you press the button in Ambient Mode , the Ambient Mode browser screen appears. In the Ambient Mode browser screen, you can select content and change the Ambient Mode settings. Setting up the content for Ambient Mode The Ambient Mode browser screen displays content at the top and categories at the bottom. Use the left or right directional buttons in the content list at the top to move the focus to content you want, and then press the Select button. The selected content is played in Ambient Mode In the future, more content that you can set up in the Ambient Mode browser will be provided. You can select the following categories and content: Decor : Allows you to select beautiful screens. Info : Provides information such as weather, news headlines, and more. 
This function may not be supported depending on the geographical area. Photo : Allows you to set a picture stored in your mobile device as the wallpaper of the Ambient Mode screen. You can configure special layouts using your photos. To save photos from your mobile device to the TV and import them in Ambient Mode , use the SmartThings app on your mobile device. Setting up the Ambient Mode details In the Ambient Mode browser screen, move the focus to , and then press the Select button. You can change the following settings: Brightness : Adjusts the screen brightness for Ambient Mode Color Tone : Adjusts the colors of the screen for Ambient Mode Auto Brightness : Changes the auto brightness setting for Ambient Mode When this function is set to Off , the brightness level of the TV screen is not automatically adjusted according to the ambient light level. Ambient Off Timer : Sets the time that the Ambient Mode screen turns off automatically. If there is no remote control input for the set time, the TV switches to the black screen state. Changing the background color of Ambient Mode To change the background color of Ambient Mode , move the focus to in the Ambient Mode browser screen, and then press the Select button. You can change the background color or pattern. Move the focus to a color or pattern you want, and then press the Select button. Take a picture of a wall using the SmartThings app on your mobile device to set it as the background of Ambient Mode This function may have a delay in image transmission and optimization depending on the network conditions. Changing the content and settings for Ambient Mode. When you press the button in Ambient Mode , the Ambient Mode browser screen appears. In the Ambient Mode browser screen, you can select content and change the Ambient Mode settings. Setting up the content for Ambient Mode The Ambient Mode browser screen displays content at the top and categories at the bottom. 
Use the left or right directional buttons in the content list at the top to move the focus to content you want, and then press the Select button. The selected content is played in Ambient Mode In the future, more content that you can set up in the Ambient Mode browser will be provided. You can select the following categories and content: Decor : Allows you to select beautiful screens. Info : Provides information such as weather, news headlines, and more. This function may not be supported depending on the geographical area. Photo : Allows you to set a picture stored in your mobile device as the wallpaper of the Ambient Mode screen. You can configure special layouts using your photos. To save photos from your mobile device to the TV and import them in Ambient Mode , use the SmartThings app on your mobile device. Setting up the Ambient Mode details In the Ambient Mode browser screen, move the focus to , and then press the Select button. You can change the following settings: Brightness : Adjusts the screen brightness for Ambient Mode Color Tone : Adjusts the colors of the screen for Ambient Mode Auto Brightness : Changes the auto brightness setting for Ambient Mode When this function is set to Off , the brightness level of the TV screen is not automatically adjusted according to the ambient light level. Ambient Off Timer : Sets the time that the Ambient Mode screen turns off automatically. If there is no remote control input for the set time, the TV switches to the black screen state. Changing the background color of Ambient Mode To change the background color of Ambient Mode , move the focus to in the Ambient Mode browser screen, and then press the Select button. You can change the background color or pattern. Move the focus to a color or pattern you want, and then press the Select button. 
Take a picture of a wall using the SmartThings app on your mobile device to set it as the background of Ambient Mode This function may have a delay in image transmission and optimization depending on the network conditions. Using the Ambient Mode. Learn about the functions available in Ambient Mode, which is a QLED TV-specific function. Ambient Mode The image on your TV may differ from the image above depending on the model and geographical area. Ambient Mode , you can view beautiful screens, various visual information, and notifications. To enter Ambient Mode , press the button. To return to the TV mode, press the button. To shut off the TV, press the button. If you press the button when the TV is turned off, the TV turns on in Ambient Mode If you use a remote control other than the Samsung TV remote control, there may be restrictions to entering Ambient Mode Because this function is a QLED TV-specific function, it may not be supported depending on the model. Decor Info Photo " +'Where do I find signal information ?',"'You can find signal information under ""Settings"" -> ""Support"" -> ""Self Diagnosis"" -> ""Signal Information.""'","Checking digital channel signal info and strength. Settings Support Self Diagnosis Signal Information Try Now If your TV is attached to a separate, over-the-air antenna, you can use the signal strength information to adjust the antenna for increased signal strength and improved HD channel reception. Viewing Broadcast Information at a Glance. See an overview of each channel's program lineup.Finding the contact information for service. Settings Support About This TV Try Now You can view the address of the Samsung website, the call center phone number, your TV's model number, your TV's software version, Open Source License and other information you may need to get service support from a Samsung call agent or the Samsung website. You can also view information by scanning the QR code of your TV. 
You can also start this function by pressing and holding the button for 5 or more seconds. For standard remote control, press and hold the button for 5 or more seconds." +'How can I view the channels that are serached by auto program function and How can I view first five favorite channel?',"'To view the channels that are searched by the auto program function, you can access the Channel List screen by pressing the right directional button on your remote control. From there, you can select ""All Channels"" to display the channels found during the Auto Program channel search.\n\nTo view the first five favorite channels, you can also access the Channel List screen by pressing the right directional button on your remote control. From there, you can select ""Favorites"" and then choose Favorites 1 from the five favorites lists displayed on the screen. This will show you the channels in the first favorites list.'","Scanning for available channels. Settings Broadcasting Auto Program Try Now Do not use this function if your TV is connected to a cable box or satellite box. If the TV has saved channels, Auto Program deletes the existing list, and then saves the newly scanned channels.Viewing and selecting channels on Favorites lists only. Live TV Channel List Press the right directional button on your remote control. Select Favorites Select the desired list among the five favorites lists on the screen. The channels in the selected favorites list appear. Only favorites lists that contain registered channels will display channels.Using the Channel List. Change the channel or check programs on other digital channels. Live TV Channel List Using the Channel List , you can change the channel or check programs on other digital channels while watching TV. The Channel List screen contains the following icons: : A favorite channel : A locked channel. 
If you press the right directional button when viewing the Channel List screen, you can use the following functions: All Channels Displays the channels that the TV found during the Auto Program channel search. Favorites Displays Favorites 1 Favorites 5 Air Cable To allow you to select Air Cable as the DTV mode, move the focus to Air Cable , and then press the Select button. Air is suitable for homes equipped with separate, over-the-air antennas. Cable is suitable for homes that subscribe to a local cable-service. You do not need to use this function if your TV is connected to a cable box or satellite box. TV PLUS While the TV is connected to the Internet, you can watch popular programs or highlights for each theme through a virtual channel at any time. As with regular channels, the TV provides the daily program schedules for TV PLUS This function may not be supported depending on the model or geographical area. " +'Can I configure Tint?',"'Yes, you can configure Tint by accessing the Picture settings under Expert Settings in the TV menu.'","Inverting the screen color. Settings General Accessibility Color Inversion Try Now You can invert the colors of the text and background displayed on the TV screen to make it easier to read them. Color Inversion is on, some Accessibility menus are not available. This function may not be supported depending on the model or geographical area.Configuring advanced picture settings. Settings Picture Expert Settings Try Now Configure the screen settings to your taste by using the following functions: Backlight Try Now Brightness Try Now Contrast Try Now Sharpness Try Now Color Try Now Tint (G/R) Try Now Apply Picture Settings Try Now Digital Clean View Try Now Auto Motion Plus Settings Try Now When LED Clear Motion is set to , the screen appears darker than when it is Off Local Dimming This function may not be supported depending on the model or geographical area. 
Contrast Enhancer Try Now HDR+ Mode Try Now Automatically provide an optimal HDR effect based on the video source. The HDR (High Dynamic Range) technology implements video images that are very similar to the images seen through human eyes by finely adjusting the contrast of the source. This function may not be supported depending on the model or geographical area. Film Mode Try Now This function is only available when the input signal is TV, AV, Component (480i, 1080i), or HDMI (1080i). Color Tone Try Now White Balance Try Now Gamma Try Now RGB Only Mode Try Now Color Space Settings Try Now Reset Picture Try Now Changing the content and settings for Ambient Mode. When you press the button in Ambient Mode , the Ambient Mode browser screen appears. In the Ambient Mode browser screen, you can select content and change the Ambient Mode settings. Setting up the content for Ambient Mode The Ambient Mode browser screen displays content at the top and categories at the bottom. Use the left or right directional buttons in the content list at the top to move the focus to content you want, and then press the Select button. The selected content is played in Ambient Mode In the future, more content that you can set up in the Ambient Mode browser will be provided. You can select the following categories and content: Decor : Allows you to select beautiful screens. Info : Provides information such as weather, news headlines, and more. This function may not be supported depending on the geographical area. Photo : Allows you to set a picture stored in your mobile device as the wallpaper of the Ambient Mode screen. You can configure special layouts using your photos. To save photos from your mobile device to the TV and import them in Ambient Mode , use the SmartThings app on your mobile device. Setting up the Ambient Mode details In the Ambient Mode browser screen, move the focus to , and then press the Select button. 
You can change the following settings: Brightness : Adjusts the screen brightness for Ambient Mode Color Tone : Adjusts the colors of the screen for Ambient Mode Auto Brightness : Changes the auto brightness setting for Ambient Mode When this function is set to Off , the brightness level of the TV screen is not automatically adjusted according to the ambient light level. Ambient Off Timer : Sets the time that the Ambient Mode screen turns off automatically. If there is no remote control input for the set time, the TV switches to the black screen state. Changing the background color of Ambient Mode To change the background color of Ambient Mode , move the focus to in the Ambient Mode browser screen, and then press the Select button. You can change the background color or pattern. Move the focus to a color or pattern you want, and then press the Select button. Take a picture of a wall using the SmartThings app on your mobile device to set it as the background of Ambient Mode This function may have a delay in image transmission and optimization depending on the network conditions. " +'How do I fix the missing/wrong color issue ?',"""To fix the missing/wrong color issue on your TV, you can try the following steps:\n\n1. Run the Start Picture Test to determine if the problem is caused by the TV's settings. \n2. Adjust the color settings by going to Picture Expert Settings and modifying the Color and Tint (G/R) settings.\n3. Ensure that the video input connectors are connected properly to the external device.\n4. If the colors are still incorrect, try adjusting the Picture Mode settings such as Brightness and Sharpness.\n5. Check if Power Saving Mode is enabled and disable it.\n6. If the issue persists, try resetting the picture settings in the Picture Expert Settings or change the Picture Mode to standard 16:9 settings.\n7. If you are using AV or analog equipment, make sure the adapter is connected to the correct input port on the TV.\n8. 
If none of the above steps resolve the issue, contact Samsung's Call Center for further assistance.""","Testing the picture. Settings Support Self Diagnosis Start Picture Test Try Now Before you review the list of problems and solutions below, use Start Picture Test to determine if the problem is caused by the TV. Start Picture Test displays a high definition picture you can examine for flaws or faults. The problem Try this! Flickering and Dimming Component Connections/ Screen Color Screen Brightness Ghosting, Blurring, or Juddering If your TV is flickering or dimming sporadically, you may need to disable some of the energy efficiency features. Disable Ambient Light Detection Power Saving Mode , or Motion Lighting Settings General Eco Solution Ambient Light Detection Settings General Eco Solution Power Saving Mode Settings General Eco Solution Motion Lighting If the color on your TV screen is not correct or the black and white colors are off, run Start Picture Test Settings Support Self Diagnosis Start Picture Test If the test results indicate that the problem is not caused by the TV, do the following: Confirm that the video input connectors are connected to the correct external device video output connectors. Check the other connections as well. If the TV is connected to an external device via a component cable, confirm that the Pb, Pr, and Y jacks are plugged into their proper connectors. If the colors on your TV are correct but just a little too dark or bright, try adjusting the following settings first. Settings Picture Expert Settings Backlight Settings Picture Expert Settings Contrast Settings Picture Expert Settings Brightness Settings Picture Expert Settings Sharpness Settings Picture Expert Settings Color Settings Picture Expert Settings Tint (G/R) If you notice ghosting or blurring on the screen, use the Auto Motion Plus Settings function to resolve the issue. Settings Picture Expert Settings Auto Motion Plus Settings The problem Try this! 
Unwanted Powering Off Problems Powering On Unable to find a Channel The TV image does not look as good as it did in the store. The picture is distorted. The color is wrong or missing. If your TV appears to turn off by itself, try disabling some of the TV's energy efficiency functions. See if Sleep Timer has been enabled. The Sleep Timer automatically turns the TV off after a specified period of time. Settings General System Manager Time Sleep Timer If the Sleep Timer has not been enabled, see if Auto Power Off Off Timer has been enabled and disable it. Settings General Eco Solution Auto Power Off Settings General System Manager Time Off Timer If you are having problems powering on your TV, there are a number of things to check before calling the service department. Confirm that the TV's power cord is connected correctly at both ends and that the remote control is operating normally. Make sure that the antenna cable or cable TV cable is firmly connected. If you have a cable box or satellite box, confirm that it is plugged in and turned on. If your TV is not connected to a cable box or satellite box, run Auto Program Settings Broadcasting Auto Program Store displays are all tuned to digital, HD (high definition) channels. If you have an analog cable box or satellite box, upgrade to a digital cable box or satellite box. Use HDMI or Component cables to deliver HD (high definition) picture quality. Many HD channels are upscaled from SD (Standard Definition) content. Look for a channel that is broadcasting HD content. Cable/Satellite Subscribers: Try HD channels from the channel lineup. Air/Cable Antenna Connection: Try HD channels after running the Auto Program function. Settings Broadcasting Auto Program Adjust the cable box or satellite box's video output resolution to 1080i or 720p. The compression of video content may cause picture distortions, especially in fast moving pictures from sports programs and action movies. 
If the signal reception is weak or poor, screen distortion may be visible but it is not a malfunction. Mobile phones used close to the TV (within 3.2 ft) may cause noise on analog and digital channels. If you're using a Component connection, make sure that the Component cables are connected to the correct jacks. Incorrect or loose connections may cause color problems or a blank screen. The problem Try this! The color is poor or the picture is not bright enough. There is a dotted line on the edge of the screen. The picture is black and white. If the test picture does not appear or there is noise or distortion, the TV may have a problem. Contact Samsung's Call Center for assistance. If the test picture is displayed properly, there may be a problem with an external device. Please check the connections. If the problem persists, check the signal strength or refer to the external device's user manual. Go to Picture and then adjust the Picture Mode Brightness Sharpness , and Color settings. Settings Picture Picture Mode Settings Picture Expert Settings Brightness Settings Picture Expert Settings Sharpness Settings Picture Expert Settings Color See if Power Saving Mode has been enabled. Settings General Eco Solution Power Saving Mode Try resetting the picture. Settings Picture Expert Settings Reset Picture Change Picture Size Settings 16:9 Standard Settings Picture Picture Size Settings 16:9 Standard If you are using AV or analog equipment, disconnect the adapter from the Component (Blue) input port on the TV and connect it to the AV (Yellow) input port. The Component or AV input port may not be provided depending on the model or geographical area. Check whether Grayscale is set to Settings General Accessibility Grayscale There Is a Problem with the Picture. When the TV has trouble with the picture, these steps may help resolve the problem.Inverting the screen color. 
Settings General Accessibility Color Inversion Try Now You can invert the colors of the text and background displayed on the TV screen to make it easier to read them. Color Inversion is on, some Accessibility menus are not available. This function may not be supported depending on the model or geographical area." +'How do I fix blurring issues on TV screen?',"'To fix blurring issues on a TV screen, you can try adjusting the Auto Motion Plus settings in the Picture Expert settings. This function can help resolve the issue of ghosting or blurring on the screen.'","There Is a Problem with the Picture. When the TV has trouble with the picture, these steps may help resolve the problem.Setting the screen to black and white. Settings General Accessibility Grayscale Try Now You can change the color of the TV screen to black and white to sharpen blurred edges caused by colors. Grayscale is on, some Accessibility menus are not available. This function may not be supported depending on the model or geographical area.Testing the picture. Settings Support Self Diagnosis Start Picture Test Try Now Before you review the list of problems and solutions below, use Start Picture Test to determine if the problem is caused by the TV. Start Picture Test displays a high definition picture you can examine for flaws or faults. The problem Try this! Flickering and Dimming Component Connections/ Screen Color Screen Brightness Ghosting, Blurring, or Juddering If your TV is flickering or dimming sporadically, you may need to disable some of the energy efficiency features. 
Disable Ambient Light Detection Power Saving Mode , or Motion Lighting Settings General Eco Solution Ambient Light Detection Settings General Eco Solution Power Saving Mode Settings General Eco Solution Motion Lighting If the color on your TV screen is not correct or the black and white colors are off, run Start Picture Test Settings Support Self Diagnosis Start Picture Test If the test results indicate that the problem is not caused by the TV, do the following: Confirm that the video input connectors are connected to the correct external device video output connectors. Check the other connections as well. If the TV is connected to an external device via a component cable, confirm that the Pb, Pr, and Y jacks are plugged into their proper connectors. If the colors on your TV are correct but just a little too dark or bright, try adjusting the following settings first. Settings Picture Expert Settings Backlight Settings Picture Expert Settings Contrast Settings Picture Expert Settings Brightness Settings Picture Expert Settings Sharpness Settings Picture Expert Settings Color Settings Picture Expert Settings Tint (G/R) If you notice ghosting or blurring on the screen, use the Auto Motion Plus Settings function to resolve the issue. Settings Picture Expert Settings Auto Motion Plus Settings The problem Try this! Unwanted Powering Off Problems Powering On Unable to find a Channel The TV image does not look as good as it did in the store. The picture is distorted. The color is wrong or missing. If your TV appears to turn off by itself, try disabling some of the TV's energy efficiency functions. See if Sleep Timer has been enabled. The Sleep Timer automatically turns the TV off after a specified period of time. Settings General System Manager Time Sleep Timer If the Sleep Timer has not been enabled, see if Auto Power Off Off Timer has been enabled and disable it. 
Settings General Eco Solution Auto Power Off Settings General System Manager Time Off Timer If you are having problems powering on your TV, there are a number of things to check before calling the service department. Confirm that the TV's power cord is connected correctly at both ends and that the remote control is operating normally. Make sure that the antenna cable or cable TV cable is firmly connected. If you have a cable box or satellite box, confirm that it is plugged in and turned on. If your TV is not connected to a cable box or satellite box, run Auto Program Settings Broadcasting Auto Program Store displays are all tuned to digital, HD (high definition) channels. If you have an analog cable box or satellite box, upgrade to a digital cable box or satellite box. Use HDMI or Component cables to deliver HD (high definition) picture quality. Many HD channels are upscaled from SD (Standard Definition) content. Look for a channel that is broadcasting HD content. Cable/Satellite Subscribers: Try HD channels from the channel lineup. Air/Cable Antenna Connection: Try HD channels after running the Auto Program function. Settings Broadcasting Auto Program Adjust the cable box or satellite box's video output resolution to 1080i or 720p. The compression of video content may cause picture distortions, especially in fast moving pictures from sports programs and action movies. If the signal reception is weak or poor, screen distortion may be visible but it is not a malfunction. Mobile phones used close to the TV (within 3.2 ft) may cause noise on analog and digital channels. If you're using a Component connection, make sure that the Component cables are connected to the correct jacks. Incorrect or loose connections may cause color problems or a blank screen. The problem Try this! The color is poor or the picture is not bright enough. There is a dotted line on the edge of the screen. The picture is black and white. 
If the test picture does not appear or there is noise or distortion, the TV may have a problem. Contact Samsung's Call Center for assistance. If the test picture is displayed properly, there may be a problem with an external device. Please check the connections. If the problem persists, check the signal strength or refer to the external device's user manual. Go to Picture and then adjust the Picture Mode Brightness Sharpness , and Color settings. Settings Picture Picture Mode Settings Picture Expert Settings Brightness Settings Picture Expert Settings Sharpness Settings Picture Expert Settings Color See if Power Saving Mode has been enabled. Settings General Eco Solution Power Saving Mode Try resetting the picture. Settings Picture Expert Settings Reset Picture Change Picture Size Settings 16:9 Standard Settings Picture Picture Size Settings 16:9 Standard If you are using AV or analog equipment, disconnect the adapter from the Component (Blue) input port on the TV and connect it to the AV (Yellow) input port. The Component or AV input port may not be provided depending on the model or geographical area. Check whether Grayscale is set to Settings General Accessibility Grayscale " +'What is the use of universal guide?',"""The Universal Guide app allows users to search for and enjoy various content such as TV shows, dramas, movies, sports broadcasts, and music in one place. It can recommend content based on the user's preferences and notify them of new drama series. Additionally, the Universal Guide app can be accessed on a mobile device using the Samsung SmartThings app.""","Using the Universal Guide App. Search for and enjoy content such as TV shows, dramas, movies, sports broadcasts, and music. Universal Guide The image on your TV may differ from the image above depending on the model and geographical area. Universal Guide is an app that allows you to search for and enjoy various content such as TV shows, dramas, movies, and music in one place. 
Universal Guide can recommend content tailored to your preferences and notify you of new drama series. You can use this feature on your mobile with Samsung SmartThings app. To enjoy the content from these apps on your TV, they must be installed on the TV. When you watch some paid content, you may need to make a payment using their associated app. Images may look blurry depending on the service provider's circumstances. This function may not be supported depending on the model or geographical area. For You TV Shows Movies Sports Music On Now Accessibility Guidance. Provides a menu and a remote control guide that aid the visually impaired.Using additional functions. You can use the following features on the Source screen. Connection Guide : Displays device connection instructions. Universal Remote : Lets you register external devices to your Samsung Smart Remote and control them using the Remote. This function may not be supported depending on the model or geographical area. For more information, refer to External Devices with the Samsung Smart Remote - Using the Universal " +'What is the feature of Bixby guide?','The feature of the Bixby guide is that it provides a tutorial on how to use Bixby quickly.',"Quick Guides. You can learn quickly how to run and use the frequently used functions, such as Bixby, Ambient Mode, and Smart Hub.Running Bixby. Press and hold the button on your Samsung Smart Remote, say a command, and then release the button. The TV recognizes the voice command. To view the Bixby guide, press the button once: When you press the button for the first time, the Using Bixby button appears at the bottom of the screen. Press the Select button. The Using Bixby popup window appears and a tutorial on using Bixby is shown. When you press the button after the first time, the Enter My Bixby button appears at the bottom of the screen. Press the Select button to go to the My Bixby screen. Running Bixby. 
Press and hold the button on your Samsung Smart Remote, say a command, and then release the button. The TV recognizes the voice command. To view the Bixby guide, press the button once: When you press the button for the first time, the Using Bixby button appears at the bottom of the screen. Press the Select button. The Using Bixby popup window appears and a tutorial on using Bixby is shown. When you press the button after the first time, the Enter My Bixby button appears at the bottom of the screen. Press the Select button to go to the My Bixby screen. " +'How to launch the last used app automatically?',"'To launch the last used app automatically, go to Settings -> General -> Smart Features -> Autorun Last App. Then, turn on the Autorun Last App function by pressing the Select button at the current menu. When this function is enabled, the last used app will automatically run when you turn on the TV.'","Launching the last used app automatically. Settings General Smart Features Autorun Last App Try Now Autorun Last App is set to , the last used app is automatically run when you turn on the TV. You can also turn this function on or off: press the Select button at the current menu. This function may not be supported depending on the app. Move Remove Installing and running an app. Installing an app Move to the app you want to install, and then press the Select button. The detailed information screen appears. Select Install . When the installation is complete, the Open menu appears. Select Open to run the app immediately. You can view installed apps on the Settings screen. If the TV's internal memory is insufficient, you can install some specific apps on a USB device. You can run an app installed on a USB device only when the USB device is connected to the TV. If the USB device is disconnected while an app is running, the app is terminated. You cannot run an app installed on a USB device on a PC or another TV. 
Launching an app You can run the app desired from Downloaded App The icons below appear within the selected app's icon and indicate the following: : The app is installed on a USB device. : The app has a password. : The app needs to be updated. : The app supports the mirroring function. Managing purchased or installed apps. Select Settings APPS . You can manage installed apps. The image on your TV may differ from the image above depending on the model and geographical area. Removing an app Select an app to delete. Select Delete The selected app is deleted. Ensure that the related app data is also removed when you remove an app. Adding apps to the Home Screen Select an app to add. Select Add to Home After the Home screen appears, move the selected app to the desired location. Press the Select button. The selected app is added to the Home Screen. Settings Delete Add to Home Lock/Unlock Move Reinstall View Details Auto Update : ON Locking and unlocking apps Select an app to lock or unlock. Select Lock/Unlock The selected app is locked or unlocked. To lock or unlock an app, enter the PIN. The default PIN is You can set the PIN in Settings General System Manager Change PIN Moving apps Select an app to move. Select Move Move the app to the desired location. Press the Select button. The selected app is moved. This function may not be supported depending on the model or geographical area. Reinstalling an app Select the app to install again. Select Reinstall Reinstallation starts. Checking the app information details Select the app to check. Select View Details The app information screen appears. Automatically updating apps Installed apps can be automatically updated. If you do not want apps automatically updated, set Auto Update OFF Automatic update is enabled only when the TV is connected to the Internet." 
+'Where do I find the list of my favorite channels?',"'You can find the list of your favorite channels by pressing the right directional button on your remote control to access the Live TV Channel List, then selecting ""Favorites"" and choosing the desired list among the five favorites lists displayed on the screen. Only favorites lists that contain registered channels will display the channels you have selected as your favorites.'","Creating a Personal Favorites List. Designate frequently watched channels as favorite channels. Favorite channels are highlighted in the Edit Channels and Channel List screens with the symbol. You can create up to five favorites lists so that your family members can create their own personal favorites list.Viewing and selecting channels on Favorites lists only. Live TV Channel List Press the right directional button on your remote control. Select Favorites Select the desired list among the five favorites lists on the screen. The channels in the selected favorites list appear. Only favorites lists that contain registered channels will display channels.Adding channels to a favorites list. Select channels to add, and then select the icon. The selected channels are added to the favorites list." +' I want to setup a beautiful screens. How can I do that?',"'To set up beautiful screens, you can use the Ambient Mode feature on your TV. Simply press the button on your remote control to enter Ambient Mode, where you can view various visual information and notifications. To return to TV mode, press the button.'","Setting Up a Schedule Viewing. Configure the TV to show a specific channel or program at a specific time and date. The icon appears next to programs that have been configured for a schedule viewing. 
To set up a schedule viewing, you must first set the TV's clock ( Settings General System Manager Time Clock Setting up a schedule viewing The Guide Screen On the Guide screen, select a program you would like to view, and then press and hold the Select button. Select Schedule Viewing on the pop-up menu that appears. The Program Info Screen Press the Select button while watching the TV. The Program Info window appears. Select a broadcast scheduled program by using the left or right directional buttons, and then press the Select button. You can schedule watching the program by selecting Schedule Viewing Setting the screen to black and white. Settings General Accessibility Grayscale Try Now You can change the color of the TV screen to black and white to sharpen blurred edges caused by colors. Grayscale is on, some Accessibility menus are not available. This function may not be supported depending on the model or geographical area.Displaying the Home Screen. Press the button. The image on your TV may differ from the image above depending on the model and geographical area. On the Home Screen, you can easily run the apps you have used previously or frequently. The apps can also be moved or deleted from the screen. Notification You can view a list of notifications for all events that occur on your TV. A notification appears on the screen when it is time to view a scheduled program or when an event occurs on a registered device. If you move the focus to Notification , and then press the Select button, a notification window appears on the right and the following functions are available: Delete All You can delete all your notifications. Settings You can select services you want to be notified about. When you select Allow sound , notifications are displayed with a notification sound. Sources Connection HDMI 1 HDMI 2 USB 1 USB 2 Source Universal Remote Guide Settings When the focus is moved to the icon, a list of quick settings icons appears above the top of the menu. 
You can quickly set frequently used functions by clicking the icons. Picture Mode You can select the picture mode that provides the best viewing experience. To change the picture mode, press the Select button. To make fine adjustments, press the up directional button, and then select Picture Setup Sound Mode You can select a sound mode to optimize your listening experience. To change the sound mode, press the Select button. To make fine adjustments, press the up directional button, and then select Equalizer Setup Sound Output You can select which speakers the TV uses for audio output. To change the audio output, press the Select button. To connect to a Bluetooth speaker, press the up directional button, and then select Speaker List Connecting Bluetooth speaker may not be supported depending on the model or geographical area. Caption You can watch TV broadcasts with captions. To activate/deactivate the Caption function, press the Select button. To run Accessibility Shortcuts , press the up directional button, and then select Accessibility Shortcuts Sleep Timer You can have the TV automatically turn off at a specific time. To change the sleep time, press the Select button. To set the specific time at which the TV turns off automatically, press the up directional button, and then select Set Up Off Timer Network You can view the current network and Internet status. Press the up directional button, and then select Network Status Network Settings Pressing Settings displays all setting menus available. This function may not be supported depending on the model or geographical area. Source You can select an external device connected to the TV. For more information, refer to between external devices connected to the Search You can search the apps or games in Smart Hub services. To use this feature, the TV must be connected to the Internet. APPS You can enjoy a wide range of content including news, sports, weather, and games by installing the corresponding apps on your TV. 
To use this feature, the TV must be connected to the Internet. For more information, refer to the APPS Ambient Mode Ambient Mode , you can view beautiful screens, various visual information, and notifications. To enter Ambient Mode , press the button. To return to the TV mode, press the button. To shut off the TV, press the button. If you press the button when the TV is turned off, the TV turns on in Ambient Mode If you use a remote control other than the Samsung TV remote control, there may be restrictions to entering Ambient Mode Because this function is a QLED TV-specific function, it may not be supported depending on the model. For more information about the Ambient Mode , refer to the Ambient Universal Guide Universal Guide is an app that allows you to search for and enjoy various content such as TV shows, dramas, movies, and music in one place. Universal Guide can recommend content tailored to your preferences and notify you of new drama series. For more information about the Universal Guide , refer to the Universal Guide Images may look blurry depending on the service provider's circumstances. This function may not be supported depending on the model or geographical area. The image on your TV may differ from the image above depending on the model and geographical area. Moving an item on the Home Screen Move the focus to the app you want to move, press the down directional button on the remote control, and then select Move . Move the app to the desired position by pressing the left or right directional button, and then press the Select button. The selected app is moved on the Home screen. Removing an item on the Home Screen Move the focus to the app you want to delete, press the down directional button on the remote control, and then select Remove . The selected app is deleted. You can add the apps you want to use often to the Home Screen using APPS. 
To add your favorite apps to the Home Screen, refer to purchased or installed" +'How do I record using time Timeshift function?',"'To record using the Timeshift function, you can press the button while watching the program to display a menu. From this menu, you can select the ""Record"" option, which immediately starts to record the program you are watching.'","Using Timeshift. Play, pause, or rewind live TV. You can pause and rewind live TV by using Timeshift . Activating this function disables the Broadcasting function. Try Now To activate the Timeshift function while watching TV, press the button and then select Timeshift , or press the button. This function is not available in the U.S.A. and Canada. This function may not be supported depending on the model or geographical area. When the Timeshift function is running, some functions are not available in Settings Broadcasting To use Timeshift, you must connect a USB device for recording. Read all precautions before using the Timeshift function. For more information, refer to Using the Recording and TimeshiftBefore Using the Recording and Timeshift Functions. Read these instructions before using Recording and Timeshift.Using the Instant and Schedule Recording options from the program info. window Recording the program you are watching To display a menu for recording a program or starting the Timeshift function, press the button while watching the program. You can select the following functions on the menu: Record : Immediately starts to record the program you are watching. Timeshift : Starts the Timeshift function. Stop : Stops the recording or the Timeshift function that is running. Info : Displays the details about the program you are watching. Scheduling viewing from the program info window Press the Select button while viewing a program to display the program info window. 
In the program info window, use the left and right directional buttons to select a broadcast scheduled program, and then press the Select button. To schedule a recording for the program, select Schedule Recording" +'My IP auto setting failed. How to configure it?',"'To configure the IP auto setting that failed, go to Settings > General > Network > Network Status > IP Settings. Make sure that the DHCP server is enabled on the router. Then, unplug the router and plug it back in. Reset the router if required. For a wireless connection, make sure to enter the correct password if needed.'","Troubleshooting Internet connectivity issues. Try Now After reading the following content, troubleshoot the Internet connection issue. If the problem persists, contact your Internet Service Provider. No network cable found Make sure that the LAN cable is plugged in on both ends. If it is plugged in, make sure that the router is turned on. If the router is on, try turning it off and then on. Wireless network connection failed If a selected wireless router is not found, go to Open Network Settings , and then select the correct router. Settings General Network Open Network Settings Unable to connect to a wireless router Check if the router is turned on. If it is, turn it off and then on. Enter the correct password if required. IP auto setting failed Configure the settings in IP Settings Settings General Network Network Status IP Settings Make sure that the DHCP server is enabled on the router, and then unplug the router and plug it back in. Reset the router if required. For wireless connection, enter the correct password if required. Unable to connect to the network Check all IP Settings Settings General Network Network Status IP Settings After checking the DHCP server status (must be active) on the router, remove the LAN cable, and then connect it again. For wireless connection, enter the correct password if required. 
Connected to a local network, but not to the Internet Make sure that the Internet LAN cable is connected to the router's external LAN port. Check the DNS values in IP Settings Settings General Network Network Status IP Settings Network setup is complete, but unable to connect to the Internet If the problem persists, contact your Internet Service Provider. Setting up an Internet connection over IPv6. Configuring the IPv6 connection settings Settings General Network Expert Settings IPv6 This function may not be supported depending on the model or geographical area. Checking the Internet connection set up over IPv6 Settings General Network Expert Settings IPv6 Status This function may not be supported depending on the model or geographical area.Connecting an IP control device to the TV. Settings General Network Expert Settings IP Remote You can connect an IP control device to the TV remotely to manage your IP device. To use this function, Power On with Mobile must be turned on. Turning this feature on may allow other IP control devices to control your TV. We recommend turning this feature on only if an authorized third-party custom controller is installed and configured specifically for a Samsung TV and if your Wi-Fi network is password protected. This function may not be supported depending on the model or geographical area." +'How can I connect my mobile device to the TV?',"'To connect your mobile device to the TV, you can follow these steps:\n\n1. Make sure your TV supports SmartThings and that you are signed in to your Samsung account on the TV.\n2. Start the SmartThings app on your mobile device.\n3. Tap on Add Device on the dashboard of the SmartThings app.\n4. Connectable TVs will be searched for - select the model of your TV from the search results.\n5. Enter the PIN number displayed on the TV screen.\n6. Once the TV is registered with your Samsung account, you can use the SmartThings app to control the TV.\n7. 
Make sure both your mobile device and the TV are connected to the same network for optimal functionality.'","Connecting to the TV via the SmartThings app. Turn on the TV. Start the SmartThings app on your mobile device. Tap Add Device on the dashboard of the SmartThings app. Connectable TVs are searched for. From the search results, select the model of your TV. Enter the PIN number displayed on the TV screen. Once the TV is registered with your Samsung account, you can use the SmartThings app to control the TV. If your TV is not found on your mobile device, turn both of them off and on, and then try again. If you have more than one TV, you can easily select the TV you want by setting different TV names at Settings General System Manager Device Name Connecting Your Mobile Device. Connect your mobile device to the TV on the same network to control the TV with the mobile device or enjoy the content of the mobile device on the TV. Use the SmartThings app on your mobile device to quickly and easily connect to the TV and control its main settings. In addition, you can remotely check and control the statuses of various devices such as Samsung smart devices, home appliances, and speakers registered to the server. Make sure that your TV supports SmartThings. You can check this with the Supported Device menu in the SmartThings app. To use the SmartThings app, you must be signed in to your Samsung account on the TV. This function may not be supported depending on the TV model or mobile device. The supported functions may differ depending on the version of the SmartThings app. You can install the SmartThings app from App Store or Google Play Store.Connection notes for mobile devices. To use the Smart View function, the mobile device must support a mirroring function such as Screen Mirroring or Smart View. To check whether your mobile device supports the mirroring function, please visit the mobile device manufacturer's website. 
To use Wi-Fi Direct, the mobile device must support the Wi-Fi Direct function. Please check if your mobile device supports Wi-Fi Direct. The mobile device and your Smart TV must be connected to each other on the same network. The video or audio may stop intermittently, depending on network conditions. When sharing content with other network-based devices such as those in an IP (Internet Protocol) storage system, sharing may not be supported due to the network's configuration, quality, or functionality, for example, if the network has an NAS (Network-Attached Storage) device. " +'How to configure Contrast and Sharpness?',"'To configure Contrast and Sharpness, go to the Picture Expert Settings and use the functions provided there. Adjust the Contrast and Sharpness settings to your preference.'","Configuring advanced picture settings. Settings Picture Expert Settings Try Now Configure the screen settings to your taste by using the following functions: Backlight Try Now Brightness Try Now Contrast Try Now Sharpness Try Now Color Try Now Tint (G/R) Try Now Apply Picture Settings Try Now Digital Clean View Try Now Auto Motion Plus Settings Try Now When LED Clear Motion is set to , the screen appears darker than when it is Off Local Dimming This function may not be supported depending on the model or geographical area. Contrast Enhancer Try Now HDR+ Mode Try Now Automatically provide an optimal HDR effect based on the video source. The HDR (High Dynamic Range) technology implements video images that are very similar to the images seen through human eyes by finely adjusting the contrast of the source. This function may not be supported depending on the model or geographical area. Film Mode Try Now This function is only available when the input signal is TV, AV, Component (480i, 1080i), or HDMI (1080i). Color Tone Try Now White Balance Try Now Gamma Try Now RGB Only Mode Try Now Color Space Settings Try Now Reset Picture Try Now Adjusting the Picture Quality. 
Change the Picture Mode and adjust Expert Settings.Setting the screen to black and white. Settings General Accessibility Grayscale Try Now You can change the color of the TV screen to black and white to sharpen blurred edges caused by colors. Grayscale is on, some Accessibility menus are not available. This function may not be supported depending on the model or geographical area." +'What are the steps to reset network?',"'To reset the network, follow these steps:\n\n1. Go to Settings.\n2. Select General.\n3. Select Network.\n4. Choose Network Reset.\n5. Select Try Now.\n6. Restore the network settings to the factory default.'","Resetting Your Network. Settings General Network Reset Network Try Now Restore the network settings to the factory default.Restoring the TV to the factory settings. Settings Support Self Diagnosis Reset Try Now You can restore all TV settings (excluding the Internet and network settings) to the factory defaults. Select Reset . The security PIN entry window appears. Enter the security PIN, and then select Yes . All settings are then reset. The TV turns off and on again automatically and displays the Start Setup screen. For more information on Start Setup , refer to the user manual that came with the TV. Troubleshooting Internet connectivity issues. Try Now After reading the following content, troubleshoot the Internet connection issue. If the problem persists, contact your Internet Service Provider. No network cable found Make sure that the LAN cable is plugged in on both ends. If it is plugged in, make sure that the router is turned on. If the router is on, try turning it off and then on. Wireless network connection failed If a selected wireless router is not found, go to Open Network Settings , and then select the correct router. Settings General Network Open Network Settings Unable to connect to a wireless router Check if the router is turned on. If it is, turn it off and then on. Enter the correct password if required. 
IP auto setting failed Configure the settings in IP Settings Settings General Network Network Status IP Settings Make sure that the DHCP server is enabled on the router, and then unplug the router and plug it back in. Reset the router if required. For wireless connection, enter the correct password if required. Unable to connect to the network Check all IP Settings Settings General Network Network Status IP Settings After checking the DHCP server status (must be active) on the router, remove the LAN cable, and then connect it again. For wireless connection, enter the correct password if required. Connected to a local network, but not to the Internet Make sure that the Internet LAN cable is connected to the router's external LAN port. Check the DNS values in IP Settings Settings General Network Network Status IP Settings Network setup is complete, but unable to connect to the Internet If the problem persists, contact your Internet Service Provider. " +'How do I view a list of mobile devices registered to the TV?',"'To view a list of mobile devices registered to the TV, you would need to go to the ""Managing Mobile Devices"" section on the TV settings. Under the Device Connect Manager, there should be an option called ""Device List"" where you can view, edit, or remove a list of mobile devices registered to the TV.'","Managing Mobile Devices. Settings General External Device Manager Device Connect Manager Try Now Access Notification Set whether to display a notification when a mobile device attempts to connect to the TV. Device List View, edit, or remove a list of mobile devices registered to the TV.Connecting to the TV via the SmartThings app. Turn on the TV. Start the SmartThings app on your mobile device. Tap Add Device on the dashboard of the SmartThings app. Connectable TVs are searched for. From the search results, select the model of your TV. Enter the PIN number displayed on the TV screen. 
Once the TV is registered with your Samsung account, you can use the SmartThings app to control the TV. If your TV is not found on your mobile device, turn both of them off and on, and then try again. If you have more than one TV, you can easily select the TV you want by setting different TV names at Settings General System Manager Device Name Using the SmartThings App. Monitor and configure the smart devices you have connected to the SmartThings service. SmartThings The image on your TV may differ from the image above depending on the model and geographical area. From your TV, you can monitor and configure smart devices you connected to the SmartThings service via the SmartThings app installed on them. You can receive notifications about the statuses of your smart devices. To use this function, you must be logged in to your Samsung account. This function may not be supported depending on the model or geographical area. Location selection list Press the Select button to identify all connected devices or the devices specific to a location. Sign In Go to the Samsung Account screen to create a new account or sign in to your account. Set Your Scene Devices All Devices Settings Turn notifications on or off for the connected smart devices. Set Your Scene Create a custom mode to easily control multiple devices with a single command. Device list View a list of the connected smart devices and their statuses. You can add a new device through the SmartThings app on your mobile phone. Please refer to the user guide of the SmartThings app for more details." +"""I get this error 'some files cannot be played'. How do I fix this?""","""To fix the error of 'some files cannot be played', you can try the following steps:\n\n1. Make sure the file format is supported by the media player or device you are using.\n2. Check the encoding method of the file and ensure it is compatible with the media player.\n3. 
If the content is not playing smoothly due to network communication problems, try using a USB device instead.\n4. Connect a USB device to the USB port of your device, then browse the list of multimedia content stored on the device and select a content item to play.""","My File Won't Play. When files don't play, this may help resolve the problem. The problem Try this! Some files can't be played.Read Before Playing Photo, Video, or Music Files. Read this information before playing media content.Playing media content. Playing multimedia content on a PC or mobile device Select a device with media content in Source . The media content list in the device appears. Select a media content item from the list. The content is played. For more information about buttons, refer to and functions available while playing multimedia The content may not be played depending on the encoding method or file format. Furthermore, several functions may be restricted. Content on devices connected to the TV via your network may not play smoothly due to network communication problems. If this occurs, use a USB device. For more information about connecting to mobile devices, refer to Your Mobile Playing multimedia content on a USB device Connect a USB device to the USB port. When a pop up message appears on the screen, select Browse to easily move to the list of multimedia content stored on the device. Select a content item to play from the list. The content is played. " +'How do I set scheduled viewing time?',"'To set scheduled viewing time, first set the TV\'s clock by going to Settings --> General --> System Manager --> Time Clock. Then, on the Guide screen, select a program you would like to view, press and hold the Select button, and choose ""Schedule Viewing"" from the pop-up menu that appears. 
Alternatively, while watching TV, press the Select button to bring up the Program Info screen, select a program using the directional buttons, and choose ""Schedule Viewing"" to set the viewing time.'","Setting Up a Schedule Viewing. Configure the TV to show a specific channel or program at a specific time and date. The appears next to programs that have been configured for a scheduled viewing. To set up a scheduled viewing, you must first set the TV's clock ( Settings General System Manager Time ClockSetting Up a Schedule Viewing. Configure the TV to show a specific channel or program at a specific time and date. The icon appears next to programs that have been configured for a schedule viewing. To set up a schedule viewing, you must first set the TV's clock ( Settings General System Manager Time Clock Setting up a schedule viewing The Guide Screen On the Guide screen, select a program you would like to view, and then press and hold the Select button. Select Schedule Viewing on the pop-up menu that appears. The Program Info Screen Press the Select button while watching the TV. The Program Info window appears. Select a broadcast scheduled program by using the left or right directional buttons, and then press the Select button. You can schedule watching the program by selecting Schedule Viewing Setting up a scheduled viewing for a channel. Using the schedule viewing options from the guide screen On the Guide screen, select a program you would like to view, press and hold the Select button, and then select Schedule Viewing in the pop-up menu that appears. Using the schedule viewing options in the program info window Press the Select button while watching TV. The Program Info window appears. Select a program that will be broadcast using the left or right directional buttons. Press the Select button, and then select Schedule Viewing schedule viewing of that program. 
Using schedule viewing to view programs at a specified time on a specified date (U.S.A and Canada) Live TV Schedule Manager Add Viewing (Other countries) Live TV Schedule Manager Recordings Schedules Add Viewing To schedule a viewing, set the values for Antenna Channel Repeat Date and Start Time of the broadcast program to view, and then select Save Check the current time in Settings General System Manager Time Clock This function may not be supported depending on the model or geographical area. " +'Can I scan TV for malicious code ?',"'Yes, you can scan your TV for malicious code by accessing the Smart Security feature in the Settings of your TV. It will scan the TV and connected storage for any malicious code, and if any is found, you can quarantine it to prevent it from executing.'","Scanning the TV and connected storage for malicious code. Settings General System Manager Smart Security Scan If malicious code is found, the results of the scan will appear on the screen. This result window displays all malicious code found. You can quarantine (isolate) the code. Quarantining prevents it from executing. Select all the malicious code to quarantine. Select Isolate The selected malicious code is relocated to the Isolated List The Isolated List displays all quarantined malicious codes. Protecting the TV from Hacking and Malicious Code. Protect your TV from hacking attempts and malicious code. Settings General System Manager Smart Security Try Now You can use Smart Security to protect the TV from hacking and malicious code when it is connected to the Internet.Scanning for available channels. Settings Broadcasting Auto Program Try Now Do not use this function if your TV is connected to a cable box or satellite box. If the TV has saved channels, Auto Program deletes the existing list, and then saves the newly scanned channels." 
+'What is decor and how to set wallpaper of the Ambient Mode screen?',"'Decor is a category in Ambient Mode that allows you to select beautiful screens for your TV. To set a wallpaper for the Ambient Mode screen from your mobile device, you can use the Photo category. Select a picture stored in your mobile device and import it into Ambient Mode using the SmartThings app.'","Changing the content and settings for Ambient Mode. When you press the button in Ambient Mode , the Ambient Mode browser screen appears. In the Ambient Mode browser screen, you can select content and change the Ambient Mode settings. Setting up the content for Ambient Mode The Ambient Mode browser screen displays content at the top and categories at the bottom. Use the left or right directional buttons in the content list at the top to move the focus to content you want, and then press the Select button. The selected content is played in Ambient Mode In the future, more content that you can set up in the Ambient Mode browser will be provided. You can select the following categories and content: Decor : Allows you to select beautiful screens. Info : Provides information such as weather, news headlines, and more. This function may not be supported depending on the geographical area. Photo : Allows you to set a picture stored in your mobile device as the wallpaper of the Ambient Mode screen. You can configure special layouts using your photos. To save photos from your mobile device to the TV and import them in Ambient Mode , use the SmartThings app on your mobile device. Setting up the Ambient Mode details In the Ambient Mode browser screen, move the focus to , and then press the Select button. 
You can change the following settings: Brightness : Adjusts the screen brightness for Ambient Mode Color Tone : Adjusts the colors of the screen for Ambient Mode Auto Brightness : Changes the auto brightness setting for Ambient Mode When this function is set to Off , the brightness level of the TV screen is not automatically adjusted according to the ambient light level. Ambient Off Timer : Sets the time that the Ambient Mode screen turns off automatically. If there is no remote control input for the set time, the TV switches to the black screen state. Changing the background color of Ambient Mode To change the background color of Ambient Mode , move the focus to in the Ambient Mode browser screen, and then press the Select button. You can change the background color or pattern. Move the focus to a color or pattern you want, and then press the Select button. Take a picture of a wall using the SmartThings app on your mobile device to set it as the background of Ambient Mode This function may have a delay in image transmission and optimization depending on the network conditions. Changing the content and settings for Ambient Mode. When you press the button in Ambient Mode , the Ambient Mode browser screen appears. In the Ambient Mode browser screen, you can select content and change the Ambient Mode settings. Setting up the content for Ambient Mode The Ambient Mode browser screen displays content at the top and categories at the bottom. Use the left or right directional buttons in the content list at the top to move the focus to content you want, and then press the Select button. The selected content is played in Ambient Mode In the future, more content that you can set up in the Ambient Mode browser will be provided. You can select the following categories and content: Decor : Allows you to select beautiful screens. Info : Provides information such as weather, news headlines, and more. This function may not be supported depending on the geographical area. 
Photo : Allows you to set a picture stored in your mobile device as the wallpaper of the Ambient Mode screen. You can configure special layouts using your photos. To save photos from your mobile device to the TV and import them in Ambient Mode , use the SmartThings app on your mobile device. Setting up the Ambient Mode details In the Ambient Mode browser screen, move the focus to , and then press the Select button. You can change the following settings: Brightness : Adjusts the screen brightness for Ambient Mode Color Tone : Adjusts the colors of the screen for Ambient Mode Auto Brightness : Changes the auto brightness setting for Ambient Mode When this function is set to Off , the brightness level of the TV screen is not automatically adjusted according to the ambient light level. Ambient Off Timer : Sets the time that the Ambient Mode screen turns off automatically. If there is no remote control input for the set time, the TV switches to the black screen state. Changing the background color of Ambient Mode To change the background color of Ambient Mode , move the focus to in the Ambient Mode browser screen, and then press the Select button. You can change the background color or pattern. Move the focus to a color or pattern you want, and then press the Select button. Take a picture of a wall using the SmartThings app on your mobile device to set it as the background of Ambient Mode This function may have a delay in image transmission and optimization depending on the network conditions. Using the Ambient Mode. Learn about the functions available in Ambient Mode, which is a QLED TV-specific function. Ambient Mode The image on your TV may differ from the image above depending on the model and geographical area. Ambient Mode , you can view beautiful screens, various visual information, and notifications. To enter Ambient Mode , press the button. To return to the TV mode, press the button. To shut off the TV, press the button. 
If you press the button when the TV is turned off, the TV turns on in Ambient Mode If you use a remote control other than the Samsung TV remote control, there may be restrictions to entering Ambient Mode Because this function is a QLED TV-specific function, it may not be supported depending on the model. Decor Info Photo " +'Can I request service if I am having a problem with the TV?',"'Yes, you can request service if you are having a problem with the TV by selecting the item matching the problem you encountered and then selecting ""Request Now"", ""Schedule Appointment"", or ""Send"". Your service request will be registered, and the Samsung Contact Center will contact you to set up or confirm your service appointment.'","Requesting service. Settings Support Request Support Try Now You can request service when you encounter a problem with the TV. Select the item matching the problem that you encountered, and then select Request Now Schedule Appointment Send . Your service request will be registered. The Samsung Contact Center will contact you to set up or confirm your service appointment. You must agree to the terms and conditions for the service request. This function may not be supported depending on the geographical area. This function requires an Internet connection. Requesting service. Settings Support Request Support Try Now You can request service when you encounter a problem with the TV. Select the item matching the problem that you encountered, and then select Request Now Schedule Appointment Send . Your service request will be registered. The Samsung Contact Center will contact you to set up or confirm your service appointment. You must agree to the terms and conditions for the service request. This function may not be supported depending on the geographical area. This function requires an Internet connection.Troubleshooting. If you have a problem with your TV, you can access various solutions while watching TV. 
If your problem is not addressed in the Troubleshooting section, contact the Samsung Call Center." +'Where can I see program information?',"'You can see program information on the Guide screen or the Program Info window. To access program information on the Guide screen, select a program you would like to view and press and hold the Select button, then select Schedule Viewing. To access program information on the Program Info window, press the Select button while watching TV, select a program using the directional buttons, and then select Schedule Viewing.'","Viewing Broadcast Information at a Glance. See an overview of each channel's program lineup.Setting up a scheduled viewing for a channel. Using the schedule viewing options from the guide screen On the Guide screen, select a program you would like to view, press and hold the Select button, and then select Schedule Viewing in the pop-up menu that appears. Using the schedule viewing options in the program info window Press the Select button while watching TV. The Program Info window appears. Select a program that will be broadcast using the left or right directional buttons. Press the Select button, and then select Schedule Viewing schedule viewing of that program. Using schedule viewing to view programs at a specified time on a specified date (U.S.A and Canada) Live TV Schedule Manager Add Viewing (Other countries) Live TV Schedule Manager Recordings Schedules Add Viewing To schedule a viewing, set the values for Antenna Channel Repeat Date and Start Time of the broadcast program to view, and then select Save Check the current time in Settings General System Manager Time Clock This function may not be supported depending on the model or geographical area. Setting Up a Schedule Viewing. Configure the TV to show a specific channel or program at a specific time and date. The icon appears next to programs that have been configured for a schedule viewing. 
To set up a schedule viewing, you must first set the TV's clock ( Settings General System Manager Time Clock Setting up a schedule viewing The Guide Screen On the Guide screen, select a program you would like to view, and then press and hold the Select button. Select Schedule Viewing on the pop-up menu that appears. The Program Info Screen Press the Select button while watching the TV. The Program Info window appears. Select a broadcast scheduled program by using the left or right directional buttons, and then press the Select button. You can schedule watching the program by selecting Schedule Viewing " +'What is source and how to search data for channels?',"'Source refers to the input signal that is currently displayed on the TV screen. To search for data on channels, you can use the Channel List feature on your TV. This feature allows you to change the channel or check programs on other digital channels while watching TV. The Channel List screen contains icons for favorite channels and locked channels. Additionally, by pressing the right directional button on the Channel List screen, you can access functions such as displaying all channels found during a channel search and selecting between Air and Cable as the DTV mode.'","Changing the input signal. Source When you select a connected external device on the Source screen, the output of the selected device is displayed on the TV's screen. To control a device (Blu-ray player, game console, etc.) that supports universal remote control with the TV's remote, connect the device to an HDMI port on the TV, and then turn on the device. You can switch to the output of the device automatically or you can configure universal remote control for the device automatically. To configure the device for universal remote control automatically, point the TV's remote control at the device. When a USB device is connected to the USB port, a pop-up message appears that lets you switch easily to the media content listed on the device. 
This function may not be supported depending on the device and geographical area. Connection Guide. You can view detailed information about external devices that can be connected to the TV. Source Connection Guide It shows you how to connect various external devices such as set-top boxes, game consoles, and PCs using pictures. If you select the connection method and an external device, the connection details appear. The image on your TV may differ from the image above depending on the model and geographical area.Using the Channel List. Change the channel or check programs on other digital channels. Live TV Channel List Using the Channel List , you can change the channel or check programs on other digital channels while watching TV. The Channel List screen contains the following icons: : A favorite channel : A locked channel. If you press the right directional button when viewing the Channel List screen, you can use the following functions: All Channels Displays the channels that the TV found during the Auto Program channel search. Favorites Displays Favorites 1 Favorites 5 Air Cable To allow you to select Air Cable as the DTV mode, move the focus to Air Cable , and then press the Select button. Air is suitable for homes equipped with separate, over-the-air antennas. Cable is suitable for homes that subscribe to a local cable-service. You do not need to use this function if your TV is connected to a cable box or satellite box. TV PLUS While the TV is connected to the Internet, you can watch popular programs or highlights for each theme through a virtual channel at any time. As with regular channels, the TV provides the daily program schedules for TV PLUS This function may not be supported depending on the model or geographical area. " +'How can I change Antenna type?',"'To change the antenna type on your TV, you can use the remote control. Press the right directional button to move the focus to Air Cable and then press the Select button. 
This function allows you to switch between different types of antenna signals. However, this function is not necessary if your TV is already connected to a cable box or satellite box.'","Connecting an Antenna. You can connect an antenna cable to your TV. An antenna connection is not necessary if you connect a cable box or satellite box. Cable Box/Disc Player HDMI Game Console Audio Device Mobile Device External Storage Accessory ANT IN Connection Guide Changing the broadcast signal. Live TV Channel List Press the right directional button on your remote control, move the focus to Air Cable , and then press the Select button. You do not need to use this function if your TV is connected to a cable box or satellite box.Connections. You can watch live broadcasts by connecting an antenna and antenna cable to your TV and can get access to the Internet by using a LAN cable or wireless router. Using various connectors, you can also connect external devices." +'Can I turn on the TV with a mobile device?',"'Yes, you can turn on the TV with a mobile device when both the TV and the mobile device are connected to the same network. This can be done using the SmartThings app on your mobile device.'","Turning on the TV with a mobile device. Settings General Network Expert Settings Power On with Mobile You can turn on the TV with your mobile device when both of them are connected to the same network.Connecting to the TV via the SmartThings app. Turn on the TV. Start the SmartThings app on your mobile device. Tap Add Device on the dashboard of the SmartThings app. Connectable TVs are searched for. From the search results, select the model of your TV. Enter the PIN number displayed on the TV screen. Once the TV is registered with your Samsung account, you can use the SmartThings app to control the TV. If your TV is not found on your mobile device, turn both of them off and on, and then try again. 
If you have more than one TV, you can easily select the TV you want by setting different TV names at Settings General System Manager Device Name Connecting Your Mobile Device. Connect your mobile device to the TV on the same network to control the TV with the mobile device or enjoy the content of the mobile device on the TV. Use the SmartThings app on your mobile device to quickly and easily connect to the TV and control its main settings. In addition, you can remotely check and control the statuses of various devices such as Samsung smart devices, home appliances, and speakers registered to the server. Make sure that your TV supports SmartThings. You can check this with the Supported Device menu in the SmartThings app. To use the SmartThings app, you must be signed in to your Samsung account on the TV. This function may not be supported depending on the TV model or mobile device. The supported functions may differ depending on the version of the SmartThings app. You can install the SmartThings app from App Store or Google Play Store." +"""What is the function of 'Learn TV Remote'?""","""The function of 'Learn TV Remote' is to help individuals with visual impairments learn the positions of the buttons on the remote control. When activated, pressing a button on the remote will prompt the TV to tell the user the name of that button.""","Learning about the remote control (for the visually impaired). Settings General Accessibility Learn TV Remote Try Now This function helps individuals with a visual impairment to learn the positions of the buttons on the remote control. When this function is activated, you can press a button on the remote control and the TV will tell you its name. To return to the previous screen, press the button twice. To exit, press and hold the button. This function is only available in the U.S.A. and Canada. This function is only available when Voice Guide is enabled. About the Samsung Smart Remote (UHD TV). 
Learn about the buttons on the Samsung Smart Remote that comes with the UHD TV. Button Description (Bixby) (Number button) (Color button) Directional pad (up/ down/left/right) Select (Return) Press to return to the Home Screen. (Smart Hub) (Play/pause) Move the button up or down to adjust the volume. To mute the sound, press the button. When VOL (Volume) pressed for 1 second or more, the Accessibility Shortcuts appear. (Channel) Use the Samsung Smart Remote less than 20 feet from the TV. The usable distance may vary with the wireless environmental conditions. The images, buttons, and functions of the Samsung Smart Remote may differ with the model. The Samsung Smart Remote may not be supported depending on the model or geographical area. To use the Samsung Smart Remote to control a compatible external device that does not support HDMI-CEC (Anynet+), you must configure universal remote control for the device. For more information, refer to External Devices with the Samsung Smart Remote - Using the Universal To use the Samsung Smart Remote to control a compatible external device that does support HDMI-CEC (Anynet+), connect the device to an HDMI port on the TV using an HDMI cable. For more information, refer to Anynet+ (HDMI- Runs Bixby . Press and hold the button, say a command, and then release the button to run Bixby When you press the button, the Enter My Bixby menu appears at the bottom of the screen. Press to bring up the virtual numeric pad on the screen. Use the numeric keypad to enter numeric values. Select numbers and then select Done to enter a numeric value. Use to change the channel, enter a PIN, enter a ZIP code, etc. If the Color Button appears with a number strip on the screen, select the Color Button and then select a specific color by using the directional pad (up/down/left/right). Use this to access additional options that are specific to the feature in use. 
Use these colored buttons to access additional options that are specific to the feature in use. These buttons are not available in the U.S.A. and Canada. Moves the focus. Selects or runs a focused item. When pressed while you are watching content, detailed program information appears. Press to return to the previous menu. When pressed for 1 second or more, the running function is terminated. When pressed while you are watching a program, the previous channel appears. When pressed, the playback controls appear. Using these controls, you can control the media content that is playing. Move the button up or down to change the channel. To see the Guide screen, press the button. Press and hold the button up or down to quickly change the channel. Using the e-Manual. Control and read the manual embedded in your TV." +'Can I select Ambient Light Detection ?',"'Yes, you can select Ambient Light Detection, which automatically adjusts the brightness level of the TV according to the ambient light level to reduce power consumption.'","Reducing the energy consumption of the TV. Settings General Eco Solution Try Now You can adjust the brightness level of the TV, reduce overall power consumption, and prevent overheating. Ambient Light Detection Try Now Automatically adjusts the brightness level of the TV, according to the ambient light level, to reduce power consumption. If Ambient Light Detection has adjusted the screen brightness to a too bright or dark level, you can select Minimum Backlight to manually adjust the minimum screen brightness. Minimum Backlight Try Now When Ambient Light Detection is turned on, you can manually adjust the minimum brightness of the TV screen. This function acts only when the value is less than the setting in Settings Picture Expert Settings Backlight Power Saving Mode Try Now Allows you to select a brightness setting from the list to reduce the TV's power consumption. 
Motion Lighting Try Now Adjusts the brightness in response to on-screen movements to reduce power consumption. Auto Power Off Try Now Automatically turns off the TV to reduce unnecessary power consumption if there is no operation for 4 hours. Changing the content and settings for Ambient Mode. When you press the button in Ambient Mode , the Ambient Mode browser screen appears. In the Ambient Mode browser screen, you can select content and change the Ambient Mode settings. Setting up the content for Ambient Mode The Ambient Mode browser screen displays content at the top and categories at the bottom. Use the left or right directional buttons in the content list at the top to move the focus to content you want, and then press the Select button. The selected content is played in Ambient Mode In the future, more content that you can set up in the Ambient Mode browser will be provided. You can select the following categories and content: Decor : Allows you to select beautiful screens. Info : Provides information such as weather, news headlines, and more. This function may not be supported depending on the geographical area. Photo : Allows you to set a picture stored in your mobile device as the wallpaper of the Ambient Mode screen. You can configure special layouts using your photos. To save photos from your mobile device to the TV and import them in Ambient Mode , use the SmartThings app on your mobile device. Setting up the Ambient Mode details In the Ambient Mode browser screen, move the focus to , and then press the Select button. You can change the following settings: Brightness : Adjusts the screen brightness for Ambient Mode Color Tone : Adjusts the colors of the screen for Ambient Mode Auto Brightness : Changes the auto brightness setting for Ambient Mode When this function is set to Off , the brightness level of the TV screen is not automatically adjusted according to the ambient light level. 
Ambient Off Timer : Sets the time that the Ambient Mode screen turns off automatically. If there is no remote control input for the set time, the TV switches to the black screen state. Changing the background color of Ambient Mode To change the background color of Ambient Mode , move the focus to in the Ambient Mode browser screen, and then press the Select button. You can change the background color or pattern. Move the focus to a color or pattern you want, and then press the Select button. Take a picture of a wall using the SmartThings app on your mobile device to set it as the background of Ambient Mode This function may have a delay in image transmission and optimization depending on the network conditions. Changing the content and settings for Ambient Mode. When you press the button in Ambient Mode , the Ambient Mode browser screen appears. In the Ambient Mode browser screen, you can select content and change the Ambient Mode settings. Setting up the content for Ambient Mode The Ambient Mode browser screen displays content at the top and categories at the bottom. Use the left or right directional buttons in the content list at the top to move the focus to content you want, and then press the Select button. The selected content is played in Ambient Mode In the future, more content that you can set up in the Ambient Mode browser will be provided. You can select the following categories and content: Decor : Allows you to select beautiful screens. Info : Provides information such as weather, news headlines, and more. This function may not be supported depending on the geographical area. Photo : Allows you to set a picture stored in your mobile device as the wallpaper of the Ambient Mode screen. You can configure special layouts using your photos. To save photos from your mobile device to the TV and import them in Ambient Mode , use the SmartThings app on your mobile device. 
Setting up the Ambient Mode details In the Ambient Mode browser screen, move the focus to , and then press the Select button. You can change the following settings: Brightness : Adjusts the screen brightness for Ambient Mode Color Tone : Adjusts the colors of the screen for Ambient Mode Auto Brightness : Changes the auto brightness setting for Ambient Mode When this function is set to Off , the brightness level of the TV screen is not automatically adjusted according to the ambient light level. Ambient Off Timer : Sets the time that the Ambient Mode screen turns off automatically. If there is no remote control input for the set time, the TV switches to the black screen state. Changing the background color of Ambient Mode To change the background color of Ambient Mode , move the focus to in the Ambient Mode browser screen, and then press the Select button. You can change the background color or pattern. Move the focus to a color or pattern you want, and then press the Select button. Take a picture of a wall using the SmartThings app on your mobile device to set it as the background of Ambient Mode This function may have a delay in image transmission and optimization depending on the network conditions. " +'How do I fix odd sound of speaker?',"'To fix an odd sound coming from the speakers of the TV, you should first check the volume control of your TV and the external device (such as a cable box or DVD player) connected to your TV. If the issue persists, you can reboot the connected device by disconnecting and then reconnecting its power cable. Additionally, you should ensure that the audio cable is connected to the correct audio output connector on the external device. If there is still an issue, check the signal information for antenna or cable connections as a low signal level may cause sound distortions. If the problem continues, seek assistance from the manufacturer of the audio device connected to your TV.'","Testing the sound. 
Settings Support Self Diagnosis Start Sound Test Try Now If the TV plays the Start Sound Test melody without distortion, there may be a problem with an external device or the broadcast signal's strength. The problem Try this! There is no sound or the sound is too low at maximum volume. The picture is good but there is no sound. No sound is heard. The speakers are making an odd sound. Check the volume control of your TV, and then check the volume control of the external device (cable box or satellite box, DVD, Blu-ray, etc.) connected to your TV. Set Sound Output TV Speaker Settings Sound Sound Output If you are using an external device, check the device's audio output option. For example, you may need to change your cable box's audio option to HDMI if the box connected to your TV is using an HDMI cable. To listen to computer sound, connect an external speaker to the computer's audio output connector. If your TV has a headphone jack, make sure there is nothing plugged into it. Reboot the connected device by disconnecting and then reconnecting the device's power cable. Check whether the Digital Output Audio Format is set to Dolby Digital+ If you are using a receiver that does not support Dolby Digital Plus, you will hear no sound when you select Dolby Digital+ Settings Sound Expert Settings Digital Output Audio Format Dolby Digital+ Run Start Sound Test Settings Support Self Diagnosis Start Sound Test Make sure that the audio cable is connected to the correct audio output connector on the external device. For antenna or cable connections, check the signal information. A low signal level may cause sound distortions. I Can't Hear the Sound Clearly. When the TV has difficulties with sound, these steps may help resolve the problem.Connection notes for audio devices. For better audio quality, it is a good idea to use an AV receiver. If you connect an external audio device using an optical cable, the Sound Output setting is automatically changed to the connected device. 
However, to make this happen, you must turn on the external audio device before connecting the optical cable. To manually change the Sound Output setting, do one of the following: Use the Quick Settings screen to change to the connected device: Use the Select button to select Audio Out/Optical on the Sound Output menu. ( Settings directional button Sound Output Use the Settings screen to change to the connected device: Select Audio Out/Optical on the Sound Output menu. ( Settings Sound Sound Output An unusual noise coming from a connected audio device while you are using it may indicate a problem with the audio device itself. If this occurs, ask for assistance from the audio device's manufacturer. Digital audio is only available with 5.1 channel broadcasts. " +'How can I search for the channels?',"'To search for channels, you can use the Channel List feature on your TV. Press the button on the remote to open Smart Hub, then use the directional buttons to move to Live TV and then to Channel List. This will display a list of channels with the program currently on. You can navigate through the list using the directional buttons and enter a channel number to go to a specific channel directly. Additionally, you can add channels to a favorites list for easier access.'","Using the Channel List. See how to list the channels available on your TV. Live TV Channel List Using Channel List , you can change the channel or check programs on other digital channels while watching TV. Press the button on the remote to open Smart Hub , press the left or right directional buttons to move to Live TV press the up directional button to move to the top row, press the right directional button to move to Channel List and then press the Select button. This shows a list of the channels and the program currently on. The focus will be on the channel you are currently watching, showing channel name, channel number, and program title. 
Use the up and down directional buttons to move within the channel list (or use the button to go up and down a page at a time). You can also use the button to enter a channel number and go to that channel directly. Press the right directional button in the Channel List to move to the Category List . Use the up and down directional buttons to move within this list. Press the Select button to choose a category you want from the Category List . The Category List contains All Channels and Favorites . Use the up and down directional buttons to move to a channel you want to view, and then press the Select button. All Channels Displays the channels that have been automatically searched. Favorites Displays Favorites 1 Favorites 5 . Use the up and down directional buttons to move between the lists of favorites. Highlight the one you want, and then press Select. The channel list will now show only the channels in this list. This option is only available on specific models in specific geographical areas. Adding channels to a favorites list. Select channels to add, and then select the icon. The selected channels are added to the favorites list.Using the Channel List. Change the channel or check programs on other digital channels. Live TV Channel List Using the Channel List , you can change the channel or check programs on other digital channels while watching TV. The Channel List screen contains the following icons: : A favorite channel : A locked channel. If you press the right directional button when viewing the Channel List screen, you can use the following functions: All Channels Displays the channels that the TV found during the Auto Program channel search. Favorites Displays Favorites 1 Favorites 5 Air Cable To allow you to select Air Cable as the DTV mode, move the focus to Air Cable , and then press the Select button. Air is suitable for homes equipped with separate, over-the-air antennas. Cable is suitable for homes that subscribe to a local cable-service. 
You do not need to use this function if your TV is connected to a cable box or satellite box. TV PLUS While the TV is connected to the Internet, you can watch popular programs or highlights for each theme through a virtual channel at any time. As with regular channels, the TV provides the daily program schedules for TV PLUS This function may not be supported depending on the model or geographical area. " +'Explain the steps how to do Schedule Recording while watching a program?',"'To do a Schedule Recording while watching a program, follow these steps:\n\n1. Press the Select button while watching the program to display the program info window.\n2. Use the left or right directional button to select a scheduled program that you want to record.\n3. Press the Select button.\n4. Select ""Schedule Recording"" from the options presented on the screen.\n5. This will set up a scheduled recording of the selected program.\n6. The program will be scheduled to record at the specified time, and you will be returned to watching the current program.\n\nThese steps allow you to easily schedule a recording of a program while already watching another program.'","Recording programs. Using the instant and schedule recording options from the guide screen Instant Recording Select a live program from the Guide , press and hold the Select button, and then select Record from the pop- up menu to start recording immediately. Schedule Recording Select a program scheduled to be broadcast, press the Select button, and then select Schedule Recording set up a scheduled recording of the scheduled program. Using the instant and schedule recording options while watching a program Instant Recording If you select Record after pressing the button while watching a broadcast, recording starts. Schedule Recording Press the Select button while watching a program to display the program info window. Use the left or right directional button to select a scheduled program. 
Press the Select button, and then select Schedule Recording to set up a scheduled recording of the scheduled program. Scheduling a video recording after entering the date and time Live TV Recordings Schedules Add Recording To schedule a recording, set the values for Antenna Channel Repeat Date Start Time , and End Time of the broadcast program to record, and then select Save Check the current time in Settings General System Manager Time ClockUsing the instant and schedule recording options from the guide screen:. To record a program that is on now, open the program guide, move to the program, and then press and hold the Select button. You will get a pop-up menu. Use the up and down directional buttons to move to Record . Press select to record the program. You will be returned to the program guide. Recording automatically ends at the time when the program ends. To open the Guide and stop recording the program, press and hold the Select button. The menu is displayed and there is now a Stop Recording option instead of a Record option. Move to the Stop Recording option, and then press the Select button. To record a program that is on later, move to the program in the Guide , and then press Select. You will get a pop-up menu. Use the up and down directional button to move within this menu. Move to the Schedule Recording , and then press Select. You will hear a message saying the program has been set to record and then you will be returned to the program guide. The icon is placed visually next to the program title to show it is set to record. To find out if a program is set to record or to cancel a scheduled recording, press select on the program. If it is already set to record, then the option on the menu will be Cancel Scheduled Recording . You can press the Select button to cancel the scheduled recording or just press the button to go back to the program guide without cancelling the recording. Setting up a scheduled viewing for a channel. 
Using the schedule viewing options from the guide screen On the Guide screen, select a program you would like to view, press and hold the Select button, and then select Schedule Viewing in the pop-up menu that appears. Using the schedule viewing options in the program info window Press the Select button while watching TV. The Program Info window appears. Select a program that will be broadcast using the left or right directional buttons. Press the Select button, and then select Schedule Viewing schedule viewing of that program. Using schedule viewing to view programs at a specified time on a specified date (U.S.A and Canada) Live TV Schedule Manager Add Viewing (Other countries) Live TV Schedule Manager Recordings Schedules Add Viewing To schedule a viewing, set the values for Antenna Channel Repeat Date and Start Time of the broadcast program to view, and then select Save Check the current time in Settings General System Manager Time Clock This function may not be supported depending on the model or geographical area. " +'How can I use HDMI UHD Color?',"""To use HDMI UHD Color, you need to access the settings on your TV. Go to General > External Device Manager > HDMI UHD Color. If an external UHD player is connected to the TV, use the HDMI UHD Color function to convert the TV's HDMI connector to a mode optimized for the UHD 50P/60P signals. You can select the HDMI connector you want to use for HDMI UHD Color and then press the Select button to set the HDMI UHD Color function to On. This will enable support for up to UHD 50P/60P 4:4:4 and 4:2:2 input signals. If your external device only supports UHD 24 or 30 Hz frequency or any FHD frequency, the HDMI UHD Color function may not be available, and you should set HDMI UHD Color to Off in that case.""","Viewing UHD videos. 
Settings General External Device Manager HDMI UHD Color Try Now If an external UHD player is connected to the TV, use the HDMI UHD Color function to convert the TV's HDMI connector to a mode optimized for the UHD 50P/60P signals. When you select the HDMI connector you want to use for HDMI UHD Color , and then press the Select button to set the HDMI UHD Color function to , the TV screen flickers. The HDMI connection with HDMI UHD Color set to Off supports up to UHD 50P/60P 4:2:0 input signals, while the HDMI connection with HDMI UHD Color set to supports up to UHD 50P/60P 4:4:4 and 4:2:2 input signals. For more information, refer to Resolutions for UHD Input When you connect the TV to an external device that supports only the UHD 24 or 30 Hz frequency or any FHD frequency, the HDMI UHD Color function may not be available. In this case, set HDMI UHD Color OffIf HDMI UHD Color is set to On. Color Depth / Frame rate (fps) 50 / 60 10 bit RGB 4:4:4 YCbCr 4:4:4 YCbCr 4:2:2 YCbCr 4:2:0 Chroma Sampling RGB 4:4:4 YCbCr 4:4:4 YCbCr 4:2:2 YCbCr 4:2:0 Chroma Sampling 8 bit 12 bit Supported Resolutions for UHD Input Signals. Check the supported resolution for UHD input signals. Resolution: 3840 x 2160p, 4096 x 2160p An HDMI connection with HDMI UHD Color set to Off supports up to UHD 50P/60P 4:2:0 input signals, while an HDMI connection with HDMI UHD Color set to supports up to UHD 50P/60P 4:4:4 and 4:2:2 input signals." +'Can I turn TV in Ambient Mode?',"'Yes, you can turn the TV into Ambient Mode by pressing the button on the remote control when the TV is turned off.'","Using the Ambient Mode. Learn about the functions available in Ambient Mode, which is a QLED TV-specific function. Ambient Mode The image on your TV may differ from the image above depending on the model and geographical area. Ambient Mode , you can view beautiful screens, various visual information, and notifications. To enter Ambient Mode , press the button. To return to the TV mode, press the button. 
To shut off the TV, press the button. If you press the button when the TV is turned off, the TV turns on in Ambient Mode If you use a remote control other than the Samsung TV remote control, there may be restrictions to entering Ambient Mode Because this function is a QLED TV-specific function, it may not be supported depending on the model. Decor Info Photo Using the Ambient Mode. Learn about the functions available in Ambient Mode, which is a QLED TV-specific function. Ambient Mode The image on your TV may differ from the image above depending on the model and geographical area. Ambient Mode , you can view beautiful screens, various visual information, and notifications. To enter Ambient Mode , press the button. To return to the TV mode, press the button. To shut off the TV, press the button. If you press the button when the TV is turned off, the TV turns on in Ambient Mode If you use a remote control other than the Samsung TV remote control, there may be restrictions to entering Ambient Mode Because this function is a QLED TV-specific function, it may not be supported depending on the model. Decor Info Photo Screen states related to Ambient Mode. Ambient Mode Ambient Mode , you can set up and enjoy visual information and design. In this state, less power is consumed and brightness is automatically adjusted by the illuminance sensor. Getting dark state Ambient Mode , the TV screen changes automatically according to the surroundings. When the surroundings get dark, the TV screen becomes darker. When the surroundings get bright before the TV screen turns completely black or when the remote control or the Bixby function is operated, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV can activate voice guides and process voice commands. 
Black screen state Ambient Mode , if the surroundings get to a specific level of darkness or there is no external operation, the TV turns soft-off. Then, when the surroundings get bright, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV cannot activate voice guides and process voice commands. TV off state To enter Ambient Mode when the TV is turned off, press the button on the remote control. To enter TV mode when the TV is turned off, press the button. In TV mode, you can watch broadcasts or enjoy the content on the external connected devices. " +'How can I jump forward / jump backward?',"'To jump forward or jump backward while playing multimedia content, you can press the up directional button to move the focus to the playback control bar. Then, select the left or right directional button in the playback control bar to move backward or forward by 10 seconds in the video. This allows you to quickly navigate through the video content.'","Buttons and functions available while recording a program or. Timeshift Review available buttons and functions of how they are used while recording a program or Timeshift. This function is not available in the U.S.A. and Canada. This function may not be supported depending on the model or geographical area. The available buttons and functions may differ with the functions. When you press the down directional button, the control bar appears and the following options are available: Edit Recording Time You can set for what period of time the program will be recorded. This function is only available when a program is being recorded. Record You can switch from the Timeshift function to the recording function. When switching to the recording function, you can include the content recorded by the Timeshift function in the new recording. This function is only available when the Timeshift function is running. 
Go to Live TV Select to return to the current scenes of the program when the recording or Timeshift function is playing its past scenes. Stop Recording Stop Timeshift Ends the recording or Timeshift function. Info Displays the program info window of the program you are recording or time-shifting. Pause Play You can use the following functions when the video is paused. (Note that with the video paused, the TV does not play audio.) Slow Rewind Slow Forward : Allows you to play the video slowly (1/8, 1/4, 1/2) backward or forward by selecting the option. To increase the rewind or forward speed in slow mode up to 3 times, select the option repeatedly. To return to normal speed, select the option. When the Slow Rewind function is activated, you can view the difference between the current recording time and the current rewind time. Jump Backward / Jump Forward: Press the up directional button to move the focus to the playback control bar, and then select the left or right directional button in the playback control bar to move backward or forward by 10 seconds in the video. When the Jump Backward function is activated, you can view the difference between the current recording time and the current rewind time. Rewind Fast Forward This function is not available while you are watching a program that is currently being broadcast. Buttons and functions available while playing multimedia. content Review available media playback, control, and record buttons and descriptions of how they are used. Press the Select button while playing any video, photo, or recorded content. The following buttons appear. The provided buttons or functions may differ with the media content type. The available buttons and functions may differ with the content you are viewing or playing. Pause Play Start Pauses or plays the multimedia content. You can use the following functions when the video is paused. 
Slow Rewind Slow Forward : Allows you to play a video slowly backward or forward by selecting the option. There are 3 playback speeds. To change the playback speed, press the option repeatedly. To return to normal speed, select the option or press the button. Jump Backward / Jump Forward: Press the up directional button to move the focus to the playback control bar, and then select the left or right directional button in the playback control bar to move backward or forward by 10 seconds in the video. Move to a specific playback section, move up the focus on the playback bar, and then select one of the five thumbnails. This function may not be supported depending on the file format. Previous Next Displays the previous or the next multimedia content file. Rewind Fast Forward Rewinds or fast forwards the multimedia content. To increase the rewind or fast forward speed up to 3 times faster than normal, select the button repeatedly. To return to normal speed, select the option or press the button. 360 Mode Provides a 360-degree view for videos and photos. This function may not be supported depending on the file format. Repeat Plays the current multimedia content repeatedly or all multimedia content files in the same folder repeatedly. Shuffle Plays music files in random order. Picture Off Plays multimedia content with the screen off. Screen Fit Fits a photo to the screen. Rotate left Rotate right Rotates a photo left or right. Zoom Zooms in a photo by up to a factor of 4. Background Music During a slideshow, pauses or resumes the background music. Options The available options may differ with the model and content. Function Description Slideshow Speed Sets the slideshow speed. Slideshow Effect Applies transition effects to the slide show. Background Music Subtitles Rotate Rotates the video. Audio Language Information Displays detailed information about the current multimedia content. Plays background music while the TV displays photos. 
The music files must be saved in the same USB device as the photo files. To pause the background music, select Background Music on the playback screen bottom. Controls video subtitles. Subtitles : Turns the language of the subtitles on or off. Sync : Adjusts the synchronization when the video and the subtitles are not synchronized. Reset Sync : Resets the subtitle synchronization adjustment to 0. Size : Changes the font size of the subtitles. Encoding : Displays a list of encoding languages. Allows you to change the encoding language when subtitles are displayed incorrectly. Edge Type : Adjusts the position of text and subtitle edges. Edge Color : Changes the color of subtitle edges. Changes the audio language. This function is only available if the video supports multi-track sound. Changing the recording time. When recording a live program you can specify how long you want to record for. Press the down directional button to display the control bar at the bottom. Press the left or right directional buttons to move to Edit Recording Time , and then press the Select button. A slider scale appears and you will hear the recording time currently set. Use the left and right directional buttons to change the time, and then press the Select button. You will hear a menu that gives information about the recording time. To return to the program you are watching, press the button on the remote. The maximum recording time differs depending on the capacity of the USB device. " +'How can I turn on ambient mode on TV screen?',"'To turn on Ambient Mode on the TV screen, you can press the button on the remote control. If the TV is turned off, pressing the button will turn the TV on in Ambient Mode.'","Screen states related to Ambient Mode. Ambient Mode Ambient Mode , you can set up and enjoy visual information and design. In this state, less power is consumed and brightness is automatically adjusted by the illuminance sensor. 
Getting dark state Ambient Mode , the TV screen changes automatically according to the surroundings. When the surroundings get dark, the TV screen becomes darker. When the surroundings get bright before the TV screen turns completely black or when the remote control or the Bixby function is operated, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV can activate voice guides and process voice commands. Black screen state Ambient Mode , if the surroundings get to a specific level of darkness or there is no external operation, the TV turns soft-off. Then, when the surroundings get bright, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV cannot activate voice guides and process voice commands. TV off state To enter Ambient Mode when the TV is turned off, press the button on the remote control. To enter TV mode when the TV is turned off, press the button. In TV mode, you can watch broadcasts or enjoy the content on the external connected devices. Screen states related to Ambient Mode. Ambient Mode Ambient Mode , you can set up and enjoy visual information and design. In this state, less power is consumed and brightness is automatically adjusted by the illuminance sensor. Getting dark state Ambient Mode , the TV screen changes automatically according to the surroundings. When the surroundings get dark, the TV screen becomes darker. When the surroundings get bright before the TV screen turns completely black or when the remote control or the Bixby function is operated, the TV screen returns to Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV can activate voice guides and process voice commands. 
Black screen state Ambient Mode , if the surroundings get to a specific level of darkness or there is no external operation, the TV turns soft-off. Then, when the surroundings get bright, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV cannot activate voice guides and process voice commands. TV off state To enter Ambient Mode when the TV is turned off, press the button on the remote control. To enter TV mode when the TV is turned off, press the button. In TV mode, you can watch broadcasts or enjoy the content on the external connected devices. Using the Ambient Mode. Learn about the functions available in Ambient Mode, which is a QLED TV-specific function. Ambient Mode The image on your TV may differ from the image above depending on the model and geographical area. Ambient Mode , you can view beautiful screens, various visual information, and notifications. To enter Ambient Mode , press the button. To return to the TV mode, press the button. To shut off the TV, press the button. If you press the button when the TV is turned off, the TV turns on in Ambient Mode If you use a remote control other than the Samsung TV remote control, there may be restrictions to entering Ambient Mode Because this function is a QLED TV-specific function, it may not be supported depending on the model. Decor Info Photo " +'What are natural and movie mode?',"'Natural mode reduces eye strain for a comfortable viewing experience, while Movie mode is suitable for watching TV or movies in a dark room.'","Choosing a picture mode. Settings Picture Picture Mode Try Now You can select the Picture Mode that provides the best viewing experience. Dynamic Makes the picture brighter and clearer in bright viewing environments. Standard Is the default mode suitable for general viewing environments. Natural Reduces eye strain for a comfortable viewing experience. 
Movie Is suitable for watching TV or movies in a dark room. This function may not be supported depending on the model.Playing games on an optimized screen. Settings General External Device Manager Game Mode Settings Game Mode You can enable game mode to optimize the TV's settings for playing video games with a gaming console, such as a PlayStation or Xbox. The Game Mode Settings may not appear depending on the model. The game mode is not available for normal TV viewing. The screen may shake somewhat. When Game Mode is enabled, Picture Mode and Sound Mode are switched to Game automatically. To use a different external device on the same port, remove the game console connection, turn off Game Mode, and then connect the external device to the port.Screen states related to Ambient Mode. Ambient Mode Ambient Mode , you can set up and enjoy visual information and design. In this state, less power is consumed and brightness is automatically adjusted by the illuminance sensor. Getting dark state Ambient Mode , the TV screen changes automatically according to the surroundings. When the surroundings get dark, the TV screen becomes darker. When the surroundings get bright before the TV screen turns completely black or when the remote control or the Bixby function is operated, the TV screen returns to Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV can activate voice guides and process voice commands. Black screen state Ambient Mode , if the surroundings get to a specific level of darkness or there is no external operation, the TV turns soft-off. Then, when the surroundings get bright, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV cannot activate voice guides and process voice commands. 
TV off state To enter Ambient Mode when the TV is turned off, press the button on the remote control. To enter TV mode when the TV is turned off, press the button. In TV mode, you can watch broadcasts or enjoy the content on the external connected devices. " +'Can you explain Ambient Mode?',"'Ambient Mode is a feature available on some TVs that allows users to set up and enjoy visual information and design when the TV is not in use. In Ambient Mode, the TV consumes less power and automatically adjusts brightness based on the surrounding light conditions using an illuminance sensor. There are different states of Ambient Mode, such as getting dark state and black screen state, where the TV screen changes automatically according to the surroundings, becoming darker when it gets dark and turning off when it reaches a specific level of darkness. Users can also change settings like screen brightness, color tone, and auto brightness for Ambient Mode. Additionally, users can select different categories of content, such as Decor, Info, and Photo, to display on the Ambient Mode screen. The TV can be switched between Ambient Mode and regular TV mode by pressing buttons on the remote control.'","Screen states related to Ambient Mode. Ambient Mode Ambient Mode , you can set up and enjoy visual information and design. In this state, less power is consumed and brightness is automatically adjusted by the illuminance sensor. Getting dark state Ambient Mode , the TV screen changes automatically according to the surroundings. When the surroundings get dark, the TV screen becomes darker. When the surroundings get bright before the TV screen turns completely black or when the remote control or the Bixby function is operated, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV can activate voice guides and process voice commands. 
Black screen state Ambient Mode , if the surroundings get to a specific level of darkness or there is no external operation, the TV turns soft-off. Then, when the surroundings get bright, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV cannot activate voice guides and process voice commands. TV off state To enter Ambient Mode when the TV is turned off, press the button on the remote control. To enter TV mode when the TV is turned off, press the button. In TV mode, you can watch broadcasts or enjoy the content on the external connected devices. Screen states related to Ambient Mode. Ambient Mode Ambient Mode , you can set up and enjoy visual information and design. In this state, less power is consumed and brightness is automatically adjusted by the illuminance sensor. Getting dark state Ambient Mode , the TV screen changes automatically according to the surroundings. When the surroundings get dark, the TV screen becomes darker. When the surroundings get bright before the TV screen turns completely black or when the remote control or the Bixby function is operated, the TV screen returns to Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV can activate voice guides and process voice commands. Black screen state Ambient Mode , if the surroundings get to a specific level of darkness or there is no external operation, the TV turns soft-off. Then, when the surroundings get bright, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV cannot activate voice guides and process voice commands. TV off state To enter Ambient Mode when the TV is turned off, press the button on the remote control. 
To enter TV mode when the TV is turned off, press the button. In TV mode, you can watch broadcasts or enjoy the content on the external connected devices. Changing the content and settings for Ambient Mode. When you press the button in Ambient Mode , the Ambient Mode browser screen appears. In the Ambient Mode browser screen, you can select content and change the Ambient Mode settings. Setting up the content for Ambient Mode The Ambient Mode browser screen displays content at the top and categories at the bottom. Use the left or right directional buttons in the content list at the top to move the focus to content you want, and then press the Select button. The selected content is played in Ambient Mode In the future, more content that you can set up in the Ambient Mode browser will be provided. You can select the following categories and content: Decor : Allows you to select beautiful screens. Info : Provides information such as weather, news headlines, and more. This function may not be supported depending on the geographical area. Photo : Allows you to set a picture stored in your mobile device as the wallpaper of the Ambient Mode screen. You can configure special layouts using your photos. To save photos from your mobile device to the TV and import them in Ambient Mode , use the SmartThings app on your mobile device. Setting up the Ambient Mode details In the Ambient Mode browser screen, move the focus to , and then press the Select button. You can change the following settings: Brightness : Adjusts the screen brightness for Ambient Mode Color Tone : Adjusts the colors of the screen for Ambient Mode Auto Brightness : Changes the auto brightness setting for Ambient Mode When this function is set to Off , the brightness level of the TV screen is not automatically adjusted according to the ambient light level. Ambient Off Timer : Sets the time that the Ambient Mode screen turns off automatically. 
If there is no remote control input for the set time, the TV switches to the black screen state. Changing the background color of Ambient Mode To change the background color of Ambient Mode , move the focus to in the Ambient Mode browser screen, and then press the Select button. You can change the background color or pattern. Move the focus to a color or pattern you want, and then press the Select button. Take a picture of a wall using the SmartThings app on your mobile device to set it as the background of Ambient Mode This function may have a delay in image transmission and optimization depending on the network conditions. " +'Can I configure Brightness?',"'Yes, you can configure the brightness level of the TV using the advanced picture settings. You can adjust the brightness setting in the Settings Picture Expert Settings to your taste.'","Reducing the energy consumption of the TV. Settings General Eco Solution Try Now You can adjust the brightness level of the TV, reduce overall power consumption, and prevent overheating. Ambient Light Detection Try Now Automatically adjusts the brightness level of the TV, according to the ambient light level, to reduce power consumption. If Ambient Light Detection has adjusted the screen brightness to a too bright or dark level, you can select Minimum Backlight to manually adjust the minimum screen brightness. Minimum Backlight Try Now When Ambient Light Detection is turned on, you can manually adjust the minimum brightness of the TV screen. This function acts only when the value is less than the setting in Settings Picture Expert Settings Backlight Power Saving Mode Try Now Allows you to select a brightness setting from the list to reduce the TV's power consumption. Motion Lighting Try Now Adjusts the brightness in response to on-screen movements to reduce power consumption. Auto Power Off Try Now Automatically turns off the TV to reduce unnecessary power consumption if there is no operation for 4 hours. 
Configuring advanced picture settings. Settings Picture Expert Settings Try Now Configure the screen settings to your taste by using the following functions: Backlight Try Now Brightness Try Now Contrast Try Now Sharpness Try Now Color Try Now Tint (G/R) Try Now Apply Picture Settings Try Now Digital Clean View Try Now Auto Motion Plus Settings Try Now When LED Clear Motion is set to , the screen appears darker than when it is Off Local Dimming This function may not be supported depending on the model or geographical area. Contrast Enhancer Try Now HDR+ Mode Try Now Automatically provide an optimal HDR effect based on the video source. The HDR (High Dynamic Range) technology implements video images that are very similar to the images seen through human eyes by finely adjusting the contrast of the source. This function may not be supported depending on the model or geographical area. Film Mode Try Now This function is only available when the input signal is TV, AV, Component (480i, 1080i), or HDMI (1080i). Color Tone Try Now White Balance Try Now Gamma Try Now RGB Only Mode Try Now Color Space Settings Try Now Reset Picture Try Now Screen states related to Ambient Mode. Ambient Mode Ambient Mode , you can set up and enjoy visual information and design. In this state, less power is consumed and brightness is automatically adjusted by the illuminance sensor. Getting dark state Ambient Mode , the TV screen changes automatically according to the surroundings. When the surroundings get dark, the TV screen becomes darker. When the surroundings get bright before the TV screen turns completely black or when the remote control or the Bixby function is operated, the TV screen returns to Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV can activate voice guides and process voice commands. 
Black screen state Ambient Mode , if the surroundings get to a specific level of darkness or there is no external operation, the TV turns soft-off. Then, when the surroundings get bright, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV cannot activate voice guides and process voice commands. TV off state To enter Ambient Mode when the TV is turned off, press the button on the remote control. To enter TV mode when the TV is turned off, press the button. In TV mode, you can watch broadcasts or enjoy the content on the external connected devices. " +'What are the uses of buttons in the e-manual?','The uses of buttons in the e-manual are:\n1. Search: Select an item from the search results to load the corresponding page.\n2. Index: Select a keyword to navigate to the relevant page.\n3. Recently Viewed Topics: Select a topic from the list of recently viewed topics.\n4. Try Now: Allows you to access the corresponding menu item and try out the feature right away.\n5. Link: Access an underlined topic referred to on an e-Manual page immediately.',"Using the buttons in the e-Manual. Search ): Select an item from the search results to load the corresponding page. Index ): Select a keyword to navigate to the relevant page. Recently Viewed Topics ): Select a topic from the list of recently viewed topics. Try Now ): Allows you to access the corresponding menu item and try out the feature right away. Link ): Access an underlined topic referred to on an e-Manual page immediately. Some menu screens cannot be accessed from the e-Manual.Launching the e-Manual. Open the user manual embedded in your TV. Press the E-MANUAL button on the standard remote control to open the menu. Alternatively, press the button to open Smart Hub , press the left directional button to move to Settings , and then press the Select button. 
Use the up and down directional buttons to move to Support , and then select Open e-Manual to open the e-Manual. Settings Support Open e-Manual You can view the embedded e-Manual containing information about your TV's key features. Alternatively, you can download a copy of the e-Manual from Samsung's website. Words in blue (e.g., Picture Mode ) indicate a menu item. The e-Manual contains 2 rows of menu icons. Use the up and down directional buttons to move between rows and the left and right directional buttons to move within a row. Press Select to open the section you want to read. The e-Manual contains a section called Learn TV Remote in Accessibility Guidance (at the right corner of the screen). This is particularly useful for people who cannot easily see the buttons on the control or who are not clear what each button does. When you are viewing this section of the e-Manual, pressing the buttons on the remote control will not affect the TV. Note that the Accessibility Guidance item is only available when Voice Guide Settings General Accessibility Voice Guide Settings Voice Guide ) is enabled. Launching the e-Manual. Settings Support Open e-Manual You can view the embedded e-Manual that contains information about your TV's key features. Alternatively, you can download a copy of the e-Manual from Samsung's website. Words in blue (e.g., Picture Mode ) indicate a menu item. " +'Can I fix powering on issue?',"""Yes, you can fix the powering on issue by following the steps provided in the context. Some solutions include confirming that the TV's power cord is connected correctly, ensuring the remote control is operating normally, checking the antenna or cable TV connection, confirming the cable box or satellite box is plugged in and turned on, and running the Auto Program function.""","Turning on the TV with a mobile device. 
Settings General Network Expert Settings Power On with Mobile You can turn on the TV with your mobile device when both of them are connected to the same network.Restoring the TV to the factory settings. Settings Support Self Diagnosis Reset Try Now You can restore all TV settings (excluding the Internet and network settings) to the factory defaults. Select Reset . The security PIN entry window appears. Enter the security PIN, and then select Yes . All settings are then reset. The TV turns off and on again automatically and displays the Start Setup screen. For more information on Start Setup , refer to the user manual that came with the TV. Testing the picture. Settings Support Self Diagnosis Start Picture Test Try Now Before you review the list of problems and solutions below, use Start Picture Test to determine if the problem is caused by the TV. Start Picture Test displays a high definition picture you can examine for flaws or faults. The problem Try this! Flickering and Dimming Component Connections/ Screen Color Screen Brightness Ghosting, Blurring, or Juddering If your TV is flickering or dimming sporadically, you may need to disable some of the energy efficiency features. Disable Ambient Light Detection Power Saving Mode , or Motion Lighting Settings General Eco Solution Ambient Light Detection Settings General Eco Solution Power Saving Mode Settings General Eco Solution Motion Lighting If the color on your TV screen is not correct or the black and white colors are off, run Start Picture Test Settings Support Self Diagnosis Start Picture Test If the test results indicate that the problem is not caused by the TV, do the following: Confirm that the video input connectors are connected to the correct external device video output connectors. Check the other connections as well. If the TV is connected to an external device via a component cable, confirm that the Pb, Pr, and Y jacks are plugged into their proper connectors. 
If the colors on your TV are correct but just a little too dark or bright, try adjusting the following settings first. Settings Picture Expert Settings Backlight Settings Picture Expert Settings Contrast Settings Picture Expert Settings Brightness Settings Picture Expert Settings Sharpness Settings Picture Expert Settings Color Settings Picture Expert Settings Tint (G/R) If you notice ghosting or blurring on the screen, use the Auto Motion Plus Settings function to resolve the issue. Settings Picture Expert Settings Auto Motion Plus Settings The problem Try this! Unwanted Powering Off Problems Powering On Unable to find a Channel The TV image does not look as good as it did in the store. The picture is distorted. The color is wrong or missing. If your TV appears to turn off by itself, try disabling some of the TV's energy efficiency functions. See if Sleep Timer has been enabled. The Sleep Timer automatically turns the TV off after a specified period of time. Settings General System Manager Time Sleep Timer If the Sleep Timer has not been enabled, see if Auto Power Off Off Timer has been enabled and disable it. Settings General Eco Solution Auto Power Off Settings General System Manager Time Off Timer If you are having problems powering on your TV, there are a number of things to check before calling the service department. Confirm that the TV's power cord is connected correctly at both ends and that the remote control is operating normally. Make sure that the antenna cable or cable TV cable is firmly connected. If you have a cable box or satellite box, confirm that it is plugged in and turned on. If your TV is not connected to a cable box or satellite box, run Auto Program Settings Broadcasting Auto Program Store displays are all tuned to digital, HD (high definition) channels. If you have an analog cable box or satellite box, upgrade to a digital cable box or satellite box. Use HDMI or Component cables to deliver HD (high definition) picture quality. 
Many HD channels are upscaled from SD (Standard Definition) content. Look for a channel that is broadcasting HD content. Cable/Satellite Subscribers: Try HD channels from the channel lineup. Air/Cable Antenna Connection: Try HD channels after running the Auto Program function. Settings Broadcasting Auto Program Adjust the cable box or satellite box's video output resolution to 1080i or 720p. The compression of video content may cause picture distortions, especially in fast moving pictures from sports programs and action movies. If the signal reception is weak or poor, screen distortion may be visible but it is not a malfunction. Mobile phones used close to the TV (within 3.2 ft) may cause noise on analog and digital channels. If you're using a Component connection, make sure that the Component cables are connected to the correct jacks. Incorrect or loose connections may cause color problems or a blank screen. The problem Try this! The color is poor or the picture is not bright enough. There is a dotted line on the edge of the screen. The picture is black and white. If the test picture does not appear or there is noise or distortion, the TV may have a problem. Contact Samsung's Call Center for assistance. If the test picture is displayed properly, there may be a problem with an external device. Please check the connections. If the problem persists, check the signal strength or refer to the external device's user manual. Go to Picture and then adjust the Picture Mode Brightness Sharpness , and Color settings. Settings Picture Picture Mode Settings Picture Expert Settings Brightness Settings Picture Expert Settings Sharpness Settings Picture Expert Settings Color See if Power Saving Mode has been enabled. Settings General Eco Solution Power Saving Mode Try resetting the picture. 
Settings Picture Expert Settings Reset Picture Change Picture Size Settings 16:9 Standard Settings Picture Picture Size Settings 16:9 Standard If you are using AV or analog equipment, disconnect the adapter from the Component (Blue) input port on the TV and connect it to the AV (Yellow) input port. The Component or AV input port may not be provided depending on the model or geographical area. Check whether Grayscale is set to Settings General Accessibility Grayscale " +'how do I fix low volume issue?',"""To fix the low volume issue on your TV, you can try the following steps:\n\n1. Check the volume control of your TV and the external device (cable box, satellite box, DVD player, Blu-ray player, etc.) connected to your TV.\n2. If you are using an external device, make sure to check the device's audio output option. For example, you may need to change the audio output option to HDMI if the box connected to your TV is using an HDMI cable.\n3. If you are listening to computer sound, connect an external speaker to the computer's audio output connector.\n4. Make sure there is nothing plugged into the headphone jack of your TV.\n5. Reboot the connected device by disconnecting and then reconnecting the device's power cable.\n6. Check if the Digital Output Audio Format is set to Dolby Digital+. If you are using a receiver that does not support Dolby Digital Plus, you will hear no sound when you select Dolby Digital+.\n7. Make sure the audio cable is connected to the correct audio output connector on the external device.\n8. For antenna or cable connections, check the signal information, as a low signal level may cause sound distortions.\n\nAdditionally, for better audio quality, consider using an AV receiver when connecting external audio devices. If you connect an external audio device using an optical cable, the Sound Output setting is automatically changed to the connected device. 
To manually change the Sound Output setting, you can use the Quick Settings screen or the Settings screen on your TV.""","Testing the sound. Settings Support Self Diagnosis Start Sound Test Try Now If the TV plays the Start Sound Test melody without distortion, there may be a problem with an external device or the broadcast signal's strength. The problem Try this! There is no sound or the sound is too low at maximum volume. The picture is good but there is no sound. No sound is heard. The speakers are making an odd sound. Check the volume control of your TV, and then check the volume control of the external device (cable box or satellite box, DVD, Blu-ray, etc.) connected to your TV. Set Sound Output TV Speaker Settings Sound Sound Output If you are using an external device, check the device's audio output option. For example, you may need to change your cable box's audio option to HDMI if the box connected to your TV is using an HDMI cable. To listen to computer sound, connect an external speaker to the computer's audio output connector. If your TV has a headphone jack, make sure there is nothing plugged into it. Reboot the connected device by disconnecting and then reconnecting the device's power cable. Check whether the Digital Output Audio Format is set to Dolby Digital+ If you are using a receiver that does not support Dolby Digital Plus, you will hear no sound when you select Dolby Digital+ Settings Sound Expert Settings Digital Output Audio Format Dolby Digital+ Run Start Sound Test Settings Support Self Diagnosis Start Sound Test Make sure that the audio cable is connected to the correct audio output connector on the external device. For antenna or cable connections, check the signal information. A low signal level may cause sound distortions. I Can't Hear the Sound Clearly. When the TV has difficulties with sound, these steps may help resolve the problem.Connection notes for audio devices. For better audio quality, it is a good idea to use an AV receiver. 
If you connect an external audio device using an optical cable, the Sound Output setting is automatically changed to the connected device. However, to make this happen, you must turn on the external audio device before connecting the optical cable. To manually change the Sound Output setting, do one of the following: Use the Quick Settings screen to change to the connected device: Use the Select button to select Audio Out/Optical on the Sound Output menu. ( Settings directional button Sound Output Use the Settings screen to change to the connected device: Select Audio Out/Optical on the Sound Output menu. ( Settings Sound Sound Output An unusual noise coming from a connected audio device while you are using it may indicate a problem with the audio device itself. If this occurs, ask for assistance from the audio device's manufacturer. Digital audio is only available with 5.1 channel broadcasts. " +'My software update over the Internet has failed. How do I fix this?',"""If your software update over the Internet has failed, you can try updating via a USB device. Download the update file from the Samsung website and store it on a USB device. Connect the USB device to the TV and update using the instructions provided. Remember to save the update package in the USB device's top-level folder to ensure the TV can locate it.""","Updating the TV's software to the latest version. DO NOT turn off the TV's power until the update is complete. The TV will turn off and on automatically after completing the software update. Video and audio settings will be reset to their defaults after a software update. Updating through the Internet Settings Support Software Update Update Now Try Now Updating from the Internet requires an active Internet connection. Updating through a USB device Settings Support Software Update Update Now After downloading the update file from the Samsung website and storing it on a USB device, connect the USB device to the TV to update. 
To update using a USB flash drive, download the update package from Samsung.com to your computer. Then, save the update package in the USB device's top-level folder. Otherwise, the TV will not be able to locate the update package.Updating the TV's software to the latest version. DO NOT turn off the TV's power until the update is complete. The TV will turn off and on automatically after completing the software update. Video and audio settings will be reset to their defaults after a software update. Updating through the Internet Settings Support Software Update Update Now Try Now Updating from the Internet requires an active Internet connection. Updating through a USB device Settings Support Software Update Update Now After downloading the update file from the Samsung website and storing it on a USB device, connect the USB device to the TV to update. To update using a USB flash drive, download the update package from Samsung.com to your computer. Then, save the update package in the USB device's top-level folder. Otherwise, the TV will not be able to locate the update package.Updating the TV automatically. Settings Support Software Update Auto Update Try Now If the TV is connected to the Internet, you can have the TV's software update itself automatically while you are watching the TV. When the background update is completed, it is applied the next time the TV is turned on. If you agree to the Smart Hub terms and conditions, Auto Update is set to automatically. If you want this function disabled, use the Select button to turn it off. This function may take a longer time if another network function is running concurrently. This function requires an Internet connection. " +'How to turn TV in Ambient Mode?',"'To turn the TV in Ambient Mode, you need to press the button on the remote control. If the TV is turned off and you press the button, the TV will turn on in Ambient Mode.'","Using the Ambient Mode. 
Learn about the functions available in Ambient Mode, which is a QLED TV-specific function. Ambient Mode The image on your TV may differ from the image above depending on the model and geographical area. Ambient Mode , you can view beautiful screens, various visual information, and notifications. To enter Ambient Mode , press the button. To return to the TV mode, press the button. To shut off the TV, press the button. If you press the button when the TV is turned off, the TV turns on in Ambient Mode If you use a remote control other than the Samsung TV remote control, there may be restrictions to entering Ambient Mode Because this function is a QLED TV-specific function, it may not be supported depending on the model. Decor Info Photo Using the Ambient Mode. Learn about the functions available in Ambient Mode, which is a QLED TV-specific function. Ambient Mode The image on your TV may differ from the image above depending on the model and geographical area. Ambient Mode , you can view beautiful screens, various visual information, and notifications. To enter Ambient Mode , press the button. To return to the TV mode, press the button. To shut off the TV, press the button. If you press the button when the TV is turned off, the TV turns on in Ambient Mode If you use a remote control other than the Samsung TV remote control, there may be restrictions to entering Ambient Mode Because this function is a QLED TV-specific function, it may not be supported depending on the model. Decor Info Photo Screen states related to Ambient Mode. Ambient Mode Ambient Mode , you can set up and enjoy visual information and design. In this state, less power is consumed and brightness is automatically adjusted by the illuminance sensor. Getting dark state Ambient Mode , the TV screen changes automatically according to the surroundings. When the surroundings get dark, the TV screen becomes darker. 
When the surroundings get bright before the TV screen turns completely black or when the remote control or the Bixby function is operated, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV can activate voice guides and process voice commands. Black screen state Ambient Mode , if the surroundings get to a specific level of darkness or there is no external operation, the TV turns soft-off. Then, when the surroundings get bright, the TV screen returns to the Ambient Mode You can change the auto brightness setting for Ambient Mode . For more information, refer to up the Ambient Mode In this state, the TV cannot activate voice guides and process voice commands. TV off state To enter Ambient Mode when the TV is turned off, press the button on the remote control. To enter TV mode when the TV is turned off, press the button. In TV mode, you can watch broadcasts or enjoy the content on the external connected devices. " +' TV audio is not being played through the receiver. What should I do?',"'To troubleshoot the issue of TV audio not being played through the receiver:\n1. Make sure the audio cable is connected to the correct audio output connector on the external device (receiver).\n2. Check the Digital Output Audio Format settings on the TV. Ensure it is set to Dolby Digital+ and change it if necessary based on the capabilities of the receiver.\n3. If using an HDMI connection, verify the audio output setting on the TV.\n4. Reboot the connected device (receiver) by disconnecting and then reconnecting its power cable.\n5. Ensure that the receiver supports Dolby Digital Plus for sound transmission.'","Testing the sound. Settings Support Self Diagnosis Start Sound Test Try Now If the TV plays the Start Sound Test melody without distortion, there may be a problem with an external device or the broadcast signal's strength. The problem Try this! 
There is no sound or the sound is too low at maximum volume. The picture is good but there is no sound. No sound is heard. The speakers are making an odd sound. Check the volume control of your TV, and then check the volume control of the external device (cable box or satellite box, DVD, Blu-ray, etc.) connected to your TV. Set Sound Output TV Speaker Settings Sound Sound Output If you are using an external device, check the device's audio output option. For example, you may need to change your cable box's audio option to HDMI if the box connected to your TV is using an HDMI cable. To listen to computer sound, connect an external speaker to the computer's audio output connector. If your TV has a headphone jack, make sure there is nothing plugged into it. Reboot the connected device by disconnecting and then reconnecting the device's power cable. Check whether the Digital Output Audio Format is set to Dolby Digital+ If you are using a receiver that does not support Dolby Digital Plus, you will hear no sound when you select Dolby Digital+ Settings Sound Expert Settings Digital Output Audio Format Dolby Digital+ Run Start Sound Test Settings Support Self Diagnosis Start Sound Test Make sure that the audio cable is connected to the correct audio output connector on the external device. For antenna or cable connections, check the signal information. A low signal level may cause sound distortions. My Computer Won't Connect. When the TV has difficulties connecting to a PC, these steps may help resolve the problem. The problem Try this! The Not Set your PC's output resolution so it matches a resolution supported by the TV. message appears. The video is OK but there is no audio. Select Source to confirm that the correct input source has been selected. If the TV is not connected to a cable box or satellite box, run Auto Program to search for channels. Settings Broadcasting Auto Program Confirm that the coaxial cable is securely connected to the TV. 
Run Start Setup Auto Program Settings General Start Setup Settings Broadcasting Auto Program Go to Caption Settings and change the Caption Mode Settings General Accessibility Caption Settings Caption Mode Some channels may not have caption data. The compression of the video content may cause picture distortions. This is especially true with fast moving pictures from sports programs and action movies. A weak signal can cause picture distortions. This is not a problem with the TV. If you are using an HDMI connection, check the audio output setting on your PC. If you are using an HDMI-to-DVI cable, a separate audio cable is required. Note that the HDMI-to-DVI connection is only supported by the HDMI (DVI) port and this port does not transmit audio. To listen to the computer sound, connect external speakers to the audio output connection of the computer. I Can't Hear the Sound Clearly. When the TV has difficulties with sound, these steps may help resolve the problem." +'How do I change the current time on TV?',"'To change the current time on your TV, you can set it manually by going to Settings > General > System Manager > Time > Clock > Clock Mode Manual. From there, you can directly enter the current time using the directional buttons on the remote control.'","Setting the current time. Settings General System Manager Time Try Now You can set the Clock manually or automatically. Once the Clock is set, you can view the current time on the TV anytime. You must reset the clock in the following cases: The power cable is disconnected and then connected. The Clock Mode is changed from Manual Auto The TV is not connected to the Internet. No broadcast signals are received. Setting the clock automatically Settings General System Manager Time Clock Clock Mode Auto This function works only when the TV is connected to the Internet or is receiving digital broadcasts through a connected antenna. The accuracy of the time information received may differ with the channel and signal. 
If you get your TV signal from a cable broadcast receiver/set-top box or a satellite receiver/satellite set-top box connected to an HDMI or Component port, you must set the current time manually. Adjusting the clock for DST and time zone Settings General System Manager Time Clock Sets the right time by setting Daylight Savings Time (DST) and your local time zone. Time Zone Selects your time zone. This function is only available when the Clock Mode is set to Auto DST Automatically adjusts for Daylight Saving Time (DST). This function is only available when the Clock Mode is set to Auto Changing the current time Settings General System Manager Time Clock Time Offset Time Offset adjusts the time through a network connection. This function is only available if Clock Mode is set to Auto and the TV is connected to the Internet though a local area network. Time Offset adjusts the time if the TV fails to receive time information through normal digital broadcast signals. Setting the clock manually Settings General System Manager Time Clock Clock Mode Manual When Clock Mode is set to Manual , you can directly enter the current time. Select the Date and Time by using the directional buttons on the remote control.Setting Up a Schedule Viewing. Configure the TV to show a specific channel or program at a specific time and date. The icon appears next to programs that have been configured for a schedule viewing. To set up a schedule viewing, you must first set the TV's clock ( Settings General System Manager Time Clock Setting up a schedule viewing The Guide Screen On the Guide screen, select a program you would like to view, and then press and hold the Select button. Select Schedule Viewing on the pop-up menu that appears. The Program Info Screen Press the Select button while watching the TV. The Program Info window appears. Select a broadcast scheduled program by using the left or right directional buttons, and then press the Select button. 
You can schedule watching the program by selecting Schedule Viewing Changing the recording time. When recording a live program you can specify how long you want to record for. Press the down directional button to display the control bar at the bottom. Press the left or right directional buttons to move to Edit Recording Time , and then press the Select button. A slider scale appears and you will hear the recording time currently set. Use the left and right directional buttons to change the time, and then press the Select button. You will hear a menu that gives information about the recording time. To return to the program you are watching, press the button on the remote. The maximum recording time differs depending on the capacity of the USB device. " +'How do I select Optimized sound mode?',"'To select the Optimized sound mode, you would need to go to Settings, then Sound, then Sound Mode, and finally choose the Optimized option.'","Choosing a sound mode. Settings Sound Sound Mode Try Now You can select a favorite sound mode suitable for the content type or the listening environment. Standard Optimized Amplify This function may not be supported depending on the model.Configuring a Sound Mode and Expert Settings. Change the Sound Mode and adjust Expert Settings.Configuring advanced sound settings. Settings Sound Expert Settings Try Now You can personalize the sound quality by adjusting the following settings. Balance Try Now Equalizer Try Now HDMI Input Audio Format Try Now Digital Output Audio Format Try Now If you use a receiver that does not support Dolby Digital Plus, you will hear no sound if you select Dolby Digital+ The Dolby Digital+ option is only available via ARC(HDMI) for external devices that support the format. Audio Delay Try Now Sound Mirroring When the Sound Mirroring function is set to , you can share your mobile device's sound using your TV speakers. This function may not be supported depending on the TV model or mobile device. 
Auto Volume Sound Feedback Try Now Reset Sound Try Now " +'How do I turn on or off Remote Management?',"'To turn on or off Remote Management, you can access the settings on your TV. From the Settings menu, go to Support, then select Remote Management. In the Remote Management section, you can read and agree to the service agreements and choose to turn it on or off. Alternatively, you can also start this function by pressing and holding the button for 5 or more seconds.'","Getting support through Remote Management. Settings Support Remote Management Try Now After consenting to our service agreement, you can use Remote Management to access Remote Support and have a Samsung service technician diagnose your TV, correct problems, and update your TV's software, remotely, via the web. You can also turn Remote Management on and off. You can also start this function by pressing and holding the button for 5 or more seconds. This function requires an Internet connection. What is Remote Support? Samsung Remote Support service offers you one-on-one support with a Samsung Technician who can remotely: Diagnose your TV Adjust the TV settings for you Perform a factory reset on your TV Install recommended firmware updates How does Remote Support Work? You can easily get Samsung Remote Support service for your TV: Call the Samsung Contact Center and ask for remote support. Open the menu on your TV and go to the Support section. ( Settings Support Select Remote Management , then read and agree to the service agreements. When the PIN screen appears, provide the PIN number to the agent. The agent will then access your TV. Getting support through Remote Management. Settings Support Remote Management Try Now After consenting to our service agreement, you can use Remote Management to access Remote Support and have a Samsung service technician diagnose your TV, correct problems, and update your TV's software remotely, via the web. You can also turn Remote Management on and off. 
You can also start this function by pressing and holding the button for 5 or more seconds. This function requires an Internet connection. What is Remote Support? Samsung Remote Support service offers you one-on-one support with a Samsung Technician who can remotely: Diagnose your TV Adjust the TV settings for you Perform a factory reset on your TV Install recommended firmware updates How does Remote Support Work? You can easily get Samsung Remote Support service for your TV: Call the Samsung Contact Center and ask for remote support. Open the menu on your TV and go to the Support section. ( Settings Support Select Remote Management , then read and agree to the service agreements. When the PIN screen appears, provide the PIN number to the agent. The agent will then access your TV. Connecting an IP control device to the TV. Settings General Network Expert Settings IP Remote You can connect an IP control device to the TV remotely to manage your IP device. To use this function, Power On with Mobile must be turned on. Turning this feature on may allow other IP control devices to control your TV. We recommend turning this feature on only if an authorized third-party custom controller is installed and configured specifically for a Samsung TV and if your Wi-Fi network is password protected. This function may not be supported depending on the model or geographical area." +'How do I turn on High Contrast using Bixby?',"'To turn on High Contrast using Bixby, you need to press the button on the Samsung Smart Remote, say ""High Contrast on"", and then release the button. This voice command will activate the High Contrast feature on your TV.'","Setting accessibility settings using Bixby. Bixby lets you control your TV with voice commands spoken into the microphone built into the Samsung Smart Remote. To use Bixby , the TV must be set up and tuned and must be connected to the Internet. 
You can connect the TV to the Internet during the initial setup or do it later through the settings menu. There are many commands you can say to control the TV. These can be used together with Voice Guide to give you voice input and voice output. To speak a command, press and hold the button (located near the top of the Samsung Smart Remote, directly below the LED power indicator and above the directional button) and say the command, and then release the button. The TV will confirm the command. For example you can: Turn Voice Guide on: Press the button on the Samsung Smart Remote, and then say Voice Guide on Turn Video Description on: Press the button on the Samsung Smart Remote, and then say Video Description on Turn High Contrast on: Press the button on the Samsung Smart Remote, and then say High Contrast on White text on black background (high contrast). Settings General Accessibility High Contrast Try Now You can change Smart Hub and setting menu screens to white text on a black background or change the transparent TV menus to opaque so that text can be more easily read. To activate this function, set High Contrast High Contrast is on, some Accessibility menus are not available.Running Bixby. Press and hold the button on your Samsung Smart Remote, say a command, and then release the button. The TV recognizes the voice command. To view the Bixby guide, press the button once: When you press the button for the first time, the Using Bixby button appears at the bottom of the screen. Press the Select button. The Using Bixby popup window appears and a tutorial on using Bixby is shown. When you press the button after the first time, the Enter My Bixby button appears at the bottom of the screen. Press the Select button to go to the My Bixby screen. " +'How do I check scheduled viewings?',"'To check scheduled viewings, you can follow these steps:\n\n1. Open the Smart Hub by pressing the button on your remote.\n2. 
Use the left or right directional button to move to Live TV.\n3. Press the up directional button to move to the top row.\n4. Use the left or right directional button to move to Schedule Manager Recordings.\n5. Press Select to open Schedule Manager Recordings.\n6. Use the up directional button to move to the Schedules menu.\n7. Use the down directional button to move to the list of programs scheduled to view.\n\nFrom there, you can see the list of programs that are scheduled for viewing.'","Setting up a scheduled viewing for a channel. Using the schedule viewing options from the guide screen On the Guide screen, select a program you would like to view, press and hold the Select button, and then select Schedule Viewing in the pop-up menu that appears. Using the schedule viewing options in the program info window Press the Select button while watching TV. The Program Info window appears. Select a program that will be broadcast using the left or right directional buttons. Press the Select button, and then select Schedule Viewing schedule viewing of that program. Using schedule viewing to view programs at a specified time on a specified date (U.S.A and Canada) Live TV Schedule Manager Add Viewing (Other countries) Live TV Schedule Manager Recordings Schedules Add Viewing To schedule a viewing, set the values for Antenna Channel Repeat Date and Start Time of the broadcast program to view, and then select Save Check the current time in Settings General System Manager Time Clock This function may not be supported depending on the model or geographical area. Setting Up a Schedule Viewing. Configure the TV to show a specific channel or program at a specific time and date. The icon appears next to programs that have been configured for a schedule viewing. 
To set up a schedule viewing, you must first set the TV's clock ( Settings General System Manager Time Clock Setting up a schedule viewing The Guide Screen On the Guide screen, select a program you would like to view, and then press and hold the Select button. Select Schedule Viewing on the pop-up menu that appears. The Program Info Screen Press the Select button while watching the TV. The Program Info window appears. Select a broadcast scheduled program by using the left or right directional buttons, and then press the Select button. You can schedule watching the program by selecting Schedule Viewing Cancelling a scheduled viewing. You have two ways to cancel a scheduled viewing. 1. Cancelling a scheduled viewing from the Guide Live TV Guide Move to the program you want to cancel scheduled viewing for, and then press the Select button. Move to Cancel Scheduled Viewing , and then press the Select button. When a pop-up message asking you to cancel the selected scheduled viewing appears, select Yes . The selected scheduled viewing is canceled and the screen returns to the Guide screen. 2. Cancelling a scheduled viewing from Smart Hub Live TV Schedule Manager Recordings Schedules Delete Press the button to open Smart Hub , and then press the left or right directional button to move to Live TV Press the up directional button to move to the top row, and then press the left or right directional button to move to Schedule Manager Recordings Press Select to open Schedule Manager Recordings Press the up directional button to move to the Schedules menu. Use the down directional button to move to the list of programs scheduled to view. Use the right directional button to move to the Delete option, and then press the Select button to delete the selected item. This function may not be supported depending on the model or geographical area. 
" +'What do I do if wireless network connection failed?',"'If wireless network connection failed, you should try going to Open Network Settings and then selecting the correct router. Additionally, make sure the selected router is turned on, and if it is, turn it off and then back on. Enter the correct password if required for the wireless connection.'","Troubleshooting Internet connectivity issues. Try Now After reading the following content, troubleshoot the Internet connection issue. If the problem persists, contact your Internet Service Provider. No network cable found Make sure that the LAN cable is plugged in on both ends. If it is plugged in, make sure that the router is turned on. If the router is on, try turning it off and then on. Wireless network connection failed If a selected wireless router is not found, go to Open Network Settings , and then select the correct router. Settings General Network Open Network Settings Unable to connect to a wireless router Check if the router is turned on. If it is, turn it off and then on. Enter the correct password if required. IP auto setting failed Configure the settings in IP Settings Settings General Network Network Status IP Settings Make sure that the DHCP server is enabled on the router, and then unplug the router and plug it back in. Reset the router if required. For wireless connection, enter the correct password if required. Unable to connect to the network Check all IP Settings Settings General Network Network Status IP Settings After checking the DHCP server status (must be active) on the router, remove the LAN cable, and then connect it again. For wireless connection, enter the correct password if required. Connected to a local network, but not to the Internet Make sure that the Internet LAN cable is connected to the router's external LAN port. 
Check the DNS values in IP Settings Settings General Network Network Status IP Settings Network setup is complete, but unable to connect to the Internet If the problem persists, contact your Internet Service Provider. The TV Won't Connect to the Internet. When the TV has difficulties connecting to the Internet, these steps may help resolve the problem. The problem Try this! The TV cannot connect to your network or apps (for Internet compatible models only). The wireless network Confirm your wireless modem/router is on and connected to the Internet. connection failed. The wireless network signal is too weak.Troubleshooting Internet Connectivity Issues. If your TV won't connect to the Internet, try the solutions below." diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/ai-core-genaihub-evaluation-with-grounding.md b/tutorials/ai-core-genaihub-evaluation-with-grounding/ai-core-genaihub-evaluation-with-grounding.md new file mode 100644 index 000000000..1143bd8c0 --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-with-grounding/ai-core-genaihub-evaluation-with-grounding.md @@ -0,0 +1,1692 @@ +--- +parser: v2 +auto_validation: true +time: 45 +primary_tag: software-product>sap-ai-core +tags: [ tutorial>beginner, topic>artificial-intelligence, topic>machine-learning, software-product>sap-ai-core ] +author_name: Smita Naik +author_profile: https://github.com/I321506 +--- + +# GenAI Grounding Evaluations with SAP AI Core + This guide describes how to use SAP AI Core Custom Evaluation to benchmark Large Language Models (LLMs) in a Retrieval-Augmented Generation (RAG) scenario, with a specific focus on groundedness evaluation. + +In RAG-based enterprise applications, model responses must be grounded in trusted data sources such as enterprise documents, knowledge bases, or curated repositories. 
SAP AI Core’s evaluation capabilities allow you to systematically measure grounding quality, retrieval relevance, and alignment of generated responses with source content. + +## You will learn +- How to configure a grounding evaluation workflow in SAP AI Core. +- How to upload and manage RAG-based test datasets that include retrieved context. +- How to define grounding-specific evaluation metrics for assessing LLM responses. +- How to execute grounding evaluations and analyze the grounding results. + +## Prerequisites +1. **BTP Account** + Set up your SAP Business Technology Platform (BTP) account. + [Create a BTP Account](https://developers.sap.com/group.btp-setup.html) +2. **For SAP Developers or Employees** + Internal SAP stakeholders should refer to the following documentation: [How to create BTP Account For Internal SAP Employee](https://me.sap.com/notes/3493139), [SAP AI Core Internal Documentation](https://help.sap.com/docs/sap-ai-core) +3. **For External Developers, Customers, or Partners** + Follow this tutorial to set up your environment and entitlements: [External Developer Setup Tutorial](https://developers.sap.com/tutorials/btp-cockpit-entitlements.html), [SAP AI Core External Documentation](https://help.sap.com/docs/sap-ai-core?version=CLOUD) +4. **Create BTP Instance and Service Key for SAP AI Core** + Follow the steps to create an instance and generate a service key for SAP AI Core: + [Create Service Key and Instance](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/create-service-key?version=CLOUD) +5. **AI Core Setup Guide** + Step-by-step guide to set up and get started with SAP AI Core: + [AI Core Setup Tutorial](https://developers.sap.com/tutorials/ai-core-setup.html) +6. An Extended SAP AI Core service plan is required, as the Generative AI Hub is not available in the Free or Standard tiers. 
For more details, refer to
+[SAP AI Core Service Plans](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/service-plans?version=CLOUD)
+7. **Orchestration Deployment**
+   Ensure at least one orchestration deployment is ready to be consumed during this process.
+Refer to [this tutorial to understand the basic consumption of GenAI models using orchestration](https://developers.sap.com/tutorials/ai-core-orchestration-consumption.html).
+8. **Basic Knowledge**
+   Familiarity with the orchestration workflow is recommended.
+9. **Install Dependencies**
+   Install the required Python packages using the requirements.txt file provided.
+Download [requirements.txt](img/requirements.txt)
+
+💡 Right-click the link above and choose **"Save link as..."** to download it directly.
+
+
+**Below are the Steps to Run a GenAI Evaluation in SAP AI Core**
+
+### Pre-Read
+
+This tutorial uses a structured evaluation dataset named **emanual.csv**, placed inside the folder **DATASET_RAG**.
+
+You can download DATASET_RAG.zip from the GitHub repository.
+
+- [Download Full Dataset as ZIP](https://github.com/SAP-samples/aicore-genai-samples/tree/main/genai-sample-apps/Evaluation/evaluation_with_grounding/data)
+
+**NOTE:** If you download the ZIP file, extract it and navigate to the **DATASET_RAG** folder. Place the entire folder in your designated location for further use.
+
+**Dataset**
+
+The tutorial leverages the publicly available **emanual.csv**, which contains commonly asked e-manual questions. Each entry includes:
+
+  - topic (user query)
+  - answer
+  - context
+
+#### How it works
+
+- A query and its retrieved context are sent to the model.
+
+- The model generates a grounded response.
+
+- The grounding metrics evaluate whether the output faithfully uses the provided context.
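The steps above assume the dataset exposes `topic`, `answer`, and `context` columns. Before uploading, you can sanity-check a CSV locally with a small stdlib-only sketch; the column names here are taken from the dataset description above, so adjust them if your headers differ:

```python
import csv
import io

REQUIRED_COLUMNS = {"topic", "answer", "context"}  # assumed column names

def validate_rag_dataset(csv_text):
    """Check that every row has the columns a grounding evaluation needs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Dataset is missing required columns: {sorted(missing)}")
    # Skip fully blank rows, then count rows with no retrieved context
    rows = [row for row in reader if any(v.strip() for v in row.values())]
    empty_context = sum(1 for row in rows if not row["context"].strip())
    return {"rows": len(rows), "rows_without_context": empty_context}

# Tiny inline sample mirroring the emanual.csv structure
sample = (
    "topic,answer,context\n"
    '"How do I reset sound?","Go to Settings > Sound > Reset Sound.","Settings Sound Expert Settings Reset Sound"\n'
)
print(validate_rag_dataset(sample))  # {'rows': 1, 'rows_without_context': 0}
```

In practice you would pass the contents of `DATASET_RAG/testdata/emanual.csv` instead of the inline sample; rows without context will always score poorly on groundedness, so it pays to catch them before the evaluation run.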
+
+### Notebook Reference
+
+For hands-on execution and end-to-end reference, use the accompanying [Evaluation Grounding Notebook](https://github.com/SAP-samples/aicore-genai-samples/blob/main/genai-sample-apps/Evaluation/evaluation_with_grounding/evaluation_RAG.ipynb). It includes complete Python code examples that align with each step of this tutorial, from dataset preparation and artifact registration to configuration creation, execution, and result retrieval.
+
+💡 Even though this tutorial provides stepwise code snippets for clarity, the notebook contains all required imports, object initializations, and helper functions to run the flow seamlessly in one place.
+
+**To use the notebook:**
+- Download and open the [notebook](https://github.com/SAP-samples/aicore-genai-samples/blob/main/genai-sample-apps/Evaluation/evaluation_with_grounding/evaluation_RAG.ipynb) in your preferred environment (e.g., VS Code, JupyterLab).
+- Configure your environment variables such as AICORE_BASE_URL, AICORE_AUTH_TOKEN, and object store credentials.
+- Execute each cell in order to reproduce the complete Evaluation Grounding workflow demonstrated in this tutorial.
+
+### Environment Variables Setup
+
+[OPTION BEGIN [SAP AI Launchpad]]
+
+- Navigate to your SAP AI Core Launchpad.
+
+- In the Workspaces section, click on "Add" to create a new workspace.
+  - A workspace in SAP AI Core is a logical container that holds your resources (like models and pipelines) and provides the isolation needed for your projects.
+
+- When prompted, enter your AI Core credentials (such as Client ID, Client Secret, and Base URL).
+  - Note: If you're unsure about where to find these credentials, refer to this [guide](https://developers.sap.com/tutorials/ai-core-generative-ai.html#1c4f36d7-f345-4822-be00-c15f133ff7d8).
+
+- Once the workspace is successfully created, select your desired Resource Group to begin the evaluation process.
+ +Refer to the screenshot below for guidance: +![img](img/image_34.png) + +[OPTION END] + +[OPTION BEGIN [Python]] + +- Open **Visual Studio Code or Jupyter Notebook**. Create a new file with the .ipynb extension (e.g., custom_evaluation.ipynb). +- Create a **.env** file in the root directory of your project. +- Add your **AI Core** and **AWS credentials** as shown below. + +```env +AICORE_CLIENT_ID="" +AICORE_CLIENT_SECRET="" +AICORE_AUTH_URL="" +AICORE_BASE_URL="" +AICORE_RESOURCE_GROUP="default" + +AWS_ACCESS_KEY="" +AWS_SECRET_ACCESS_KEY="" +AWS_BUCKET_ID="" +AWS_REGION="" +``` + +**Note:** Replace the empty strings "" with your actual credentials. + +Refer to the below screenshot for clarity: +![img](img/image_1.png) + +#### Install Dependencies + +Install the required packages using the [requirements.txt](img/requirements.txt) file you downloaded in the Prerequisites section. +```bash +pip install -r requirements.txt +``` +#### Connect to AI Core Instance + +Once the environment variables are set and dependencies are installed, run the following code to connect to your instance: + +```PYTHON +# Loading the credentials from the .env file +from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient +from dotenv import load_dotenv +import os + +# Load environment variables +load_dotenv(override=True) + +# AI Core Credentials +AICORE_BASE_URL = os.getenv("AICORE_BASE_URL") +AICORE_RESOURCE_GROUP = os.getenv("AICORE_RESOURCE_GROUP") +AICORE_AUTH_URL = os.getenv("AICORE_AUTH_URL") +AICORE_CLIENT_ID = os.getenv("AICORE_CLIENT_ID") +AICORE_CLIENT_SECRET = os.getenv("AICORE_CLIENT_SECRET") + +# AWS Credentials +AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY") +AWS_BUCKET_ID = os.getenv("AWS_BUCKET_ID") +AWS_REGION = os.getenv("AWS_REGION") +AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY") + +# Initialize GenAIHub Proxy Client +client = GenAIHubProxyClient( + base_url=AICORE_BASE_URL, + auth_url=AICORE_AUTH_URL, + client_id=AICORE_CLIENT_ID, + 
client_secret=AICORE_CLIENT_SECRET,
+    resource_group=AICORE_RESOURCE_GROUP
+)
+```
+
+**NOTE:**
+- Ensure the **requirements.txt** installation completes successfully before running the code.
+- If you face any issues, recheck your **.env** values and installed packages.
+
+[OPTION END]
+
+[OPTION BEGIN [Bruno]]
+
+- Download the [Bruno_collections](img/AI_Core.json) file.
+
+- Follow the steps in the [Tutorial](https://developers.sap.com/tutorials/ai-core-orchestration-consumption.html) to set up your environment: refer to the step **Set Up Your Environment and Configure Access** and proceed until you have generated a token.
+
+[OPTION END]
+
+**Important Note:** To use the document grounding service, your request must contain the **document grounding label** set to **true**. Existing resource groups without the label won’t work.
+
+### Preparing Dataset Files and Reference Files
+
+[OPTION BEGIN [SAP AI Launchpad]]
+
+> **Note:** This step involves local setup using Python and does not require any action on the SAP AI Launchpad.
+
+[OPTION END]
+
+[OPTION BEGIN [Python]]
+
+In this step, we prepare the dataset and optional reference documents required for grounding evaluation.
+
+The evaluation notebook dynamically detects the dataset file from a predefined folder structure.
+You are not required to hardcode the dataset filename.
+
+```Python
+import os
+import json
+def get_dataset_file_name(folder_path):
+    """
+    Retrieves the name of the first file in the specified folder.
+ """ + if not os.path.isdir(folder_path): + print(f"The folder path '{folder_path}' does not exist.") + return None + + items_in_folder = os.listdir(folder_path) + + for item in items_in_folder: + item_path = os.path.join(folder_path, item) + if os.path.isfile(item_path): + return item + + print(f"No files were found in the folder '{folder_path}'.") + return None + + +# --- MAIN EXECUTION --- +DATASET_FOLDER = "./DATASET_RAG/testdata" + +DATASET_NAME = get_dataset_file_name(DATASET_FOLDER) + +if DATASET_NAME: + print(f"Dataset name: {DATASET_NAME}") +else: + print("Missing run or dataset file.") + raise SystemExit("Exiting due to missing run/dataset file.") +``` + +![img](img/image_py_dtst.png) + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +> **Note:** This step involves local setup using Python and does not require any action on Bruno. + +[OPTION END] + +### Registering an Object Store Secret in AI Core + +You can upload the orchestration run files, grounding test datasets, and any optional metric definitions to SAP AI Core using the Tracking API. To upload these files, you must first register an object store secret containing your object store credentials + +[OPTION BEGIN [SAP AI Launchpad]] + +- Open the **SAP AI Core Launchpad** and navigate to the **Administration** tab. +- Select the **Object Store** section from the left-hand menu. +- Click on **“Add”** to register a new object store secret. +- Fill in the required bucket details as shown in the screenshot below. + +![img](img/image_33.png) + +In the **Secret** field, use the following structure to provide your AWS credentials: + +```json +{ + "AWS_ACCESS_KEY_ID": "Enter Your value", + "AWS_SECRET_ACCESS_KEY": "Enter Your value" +} +``` + +[OPTION END] + +[OPTION BEGIN [Python]] +To make your evaluation files available for AI Core orchestration, you need to: + +- Upload them to an object store (e.g., AWS S3). +- Register the object store secret in AI Core. 
+
+**Step 4.1: Set Up Authentication and Headers**
+
+First, define the authentication headers for AI Core REST API calls.
+
+```PYTHON
+def _get_headers():
+    headers = {
+        "Authorization": client.get_ai_core_token(),
+        "AI-Resource-Group": AICORE_RESOURCE_GROUP,
+        "Content-Type": "application/json",
+    }
+    return headers
+```
+
+**Step 4.2: Register Object Store Secret in AI Core**
+
+Register your S3 bucket and credentials as a secret.
+
+```PYTHON
+# Register an S3 secret with AI Core, which will be used as an input source
+import json
+import logging
+import requests
+
+def register_oss_secret():
+    headers = _get_headers()
+
+    POST_SECRETS_ENDPOINT = '/v2/admin/objectStoreSecrets'
+    request_url = f"{AICORE_BASE_URL}{POST_SECRETS_ENDPOINT}"
+
+    request_body = {
+        "name": "genai-data",
+        "data": {
+            "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY,
+            "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY
+        },
+        "type": "S3",
+        "bucket": AWS_BUCKET_ID,
+        "endpoint": "s3-eu-central-1.amazonaws.com",
+        "region": AWS_REGION,
+        "pathPrefix": ""
+    }
+    try:
+        response = requests.post(
+            request_url, headers=headers, data=json.dumps(request_body), timeout=120
+        )
+        result = response.json()
+        print(result)
+        return result
+    except Exception:
+        logging.error("Error occurred while attempting to create object store secret")
+        raise
+
+register_oss_secret()
+```
+[OPTION END]
+
+[OPTION BEGIN [Bruno]]
+
+Object store secrets securely store the AWS S3 credentials required for dataset access.
+
+• Expand **objectStoreSecrets** under **admin** and select the create-a-secret request.
+
+Use the payload below to create an object store secret for your AWS S3 bucket.
+
+```CODE
+{
+    "name": "genai-data",
+    "data": {
+        "AWS_ACCESS_KEY_ID": "",
+        "AWS_SECRET_ACCESS_KEY": ""
+    },
+    "type": "S3",
+    "bucket": "",
+    "endpoint": "",
+    "region": "",
+    "pathPrefix": ""
+}
+```
+• Note that, unlike generic secrets (created in the next step), the values in the data dictionary are provided in plain text, not Base64-encoded.
+
+![img](img/image-br01.png)
+
+[OPTION END]
+
+> ⚠️ **Important Note (Must Read)**
+>
+> - You must **create an object store secret named `default`** to store **output artifacts** from orchestration runs. This is **mandatory**.
+> - For **input artifacts**, you may create additional object store secrets with different names if needed.
+> - If a secret named `default` is not configured, orchestration runs will **fail** due to missing output target setup.
+
+### Create a Generic Secret
+
+In the next step, we create a secret that enables grounding by adding the "labels" configuration. This generic secret provides the hyperscaler and bucket details so that the grounding service knows how to retrieve data from the bucket.
+
+[OPTION BEGIN [SAP AI Launchpad]]
+
+**Generic secret for AWS S3**
+
+1. **Open the Workspaces app** and choose the **AI API connection**.
+
+2. If needed, toggle between **tenant-level** and **resource-group-level** secret creation.
+
+3. Navigate to the **SAP AI Core Administration** app and go to **Generic Secrets**.
+
+4. Choose **Add** to create a new secret.
+
+5. 
Fill out the form as follows:
+   - **Resource Group**: ``
+   - **Name**: `aws-credentials-1`
+   - **Secret (JSON format)**:
+
+```json
+  {
+    "access_key_id": "",
+    "secret_access_key": "",
+    "bucket": "",
+    "host": "",
+    "region": "",
+    "url": "",
+    "username": "",
+    "authentication": "NoAuthentication",
+    "description": "AWS S3 credentials for document grounding",
+    "type": "HTTP",
+    "proxyType": "Internet"
+  }
+
+```
+
+**Labels**
+
+Add the following key-value pairs as labels:
+
+| Key | Value |
+|--------------------------------------------------|-------|
+| ext.ai.sap.com/document-grounding | true |
+| ext.ai.sap.com/documentRepositoryType | S3 |
+
+![img](img/image078.png)
+
+Click **Add** to save the secret.
+
+[OPTION END]
+
+[OPTION BEGIN [Python]]
+
+Generic secrets securely store the AWS credentials required for document access.
+
+```python
+import time
+import base64
+import os
+import requests
+
+# AWS_HOST and AWS_USERNAME are not part of the sample .env shown earlier;
+# add them to your .env (or set them here) before running this cell.
+AWS_HOST = os.getenv("AWS_HOST", "")
+AWS_USERNAME = os.getenv("AWS_USERNAME", "")
+
+def encode_base64(value):
+    return base64.b64encode(value.encode('utf-8')).decode('utf-8')
+
+def create_generic_secret():
+    payload = {
+        "name": "groundingsecret",
+        "data": {
+            "url": encode_base64("https://s3-eu-central-1.amazonaws.com"),
+            "authentication": encode_base64("NoAuthentication"),
+            "description": encode_base64("grounding secret"),
+            "access_key_id": encode_base64(AWS_ACCESS_KEY),
+            "bucket": encode_base64(AWS_BUCKET_ID),
+            "host": encode_base64(AWS_HOST),
+            "region": encode_base64("eu-central-1"),
+            "secret_access_key": encode_base64(AWS_SECRET_ACCESS_KEY),
+            "username": encode_base64(AWS_USERNAME),
+        },
+        "labels": [
+            {
+                "key": "ext.ai.sap.com/document-grounding",
+                "value": "true"
+            },
+            {
+                "key": "ext.ai.sap.com/documentRepositoryType",
+                "value": "S3"
+            }
+        ]
+    }
+    time.sleep(60)
+    try:
+        headers = _get_headers()
+        api_url = f"{AICORE_BASE_URL}/v2/admin/secrets"
+        response = requests.post(api_url, headers=headers, json=payload)
+        if response.status_code == 200:
+            print("Generic secret created successfully")
+        else:
+            print(f"Failed to create generic secret: {response}")
+
except Exception as e: + print(f"Error creating secret: {e}") +create_generic_secret() +``` +![img](img/image_py_sec.png) + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +**Generic secret for AWS S3** + +Generic secrets securely store AWS S3 credentials required for document access + +**Endpoint:** +``` +POST:{{ai_api_url}}/v2/admin/secrets +``` + +Use the below payload to create a secret for AWS S3 with NoAuthentication as authentication type. + +```CODE + +{ + "name": "", // Name of the generic secret to be created + "data": { + "url": "", // Base64 encoded value of url + "authentication": "Tm9BdXRoZW50aWNhdGlvbg=", // Base64 encoded value for NoAuthentication + "description": "", // Base64 encoded description of the secret + "access_key_id": "", // Base64 encoded value of access key id + "bucket": "", // Base64 encoded value of bucket name + "host": "", // Base64 encoded value of host + "region": "", // Base64 encoded value of region + "secret_access_key": "", // Base64 encoded value of secret access key + "username": "", // Base64 encoded value of username + "type": "SFRUUA==", // [Optional] Base64 encoded value for HTTP + "proxyType": "SW50ZXJuZXQ=", // [Optional] Base64 encoded value for Internet + }, + "labels": [ + { + "key": "ext.ai.sap.com/document-grounding", // Label for Document Grounding feature + "value": "true" + }, + { + "key": "ext.ai.sap.com/documentRepositoryType", // Label for Document Repository Type + "value": "S3" + } + ] +} +``` + +•Ensure that all values in the data dictionary are Base64-encoded as per AWS S3 credential requirements. + +![img](img/image_br_sec.png) + +[OPTION END] + +### Create a Grounding Pipeline + +Before running grounding evaluations, you must create a grounding pipeline in SAP AI Core. This pipeline is responsible for reading documents from your object store, processing them, and preparing them for retrieval. + +[OPTION BEGIN [SAP AI Launchpad]] + +1. Navigate to Generative AI Hub from the side menu. + +2. 
Click on Grounding Management. + +3. Click Create to open the Create Data Repository wizard. + +4. In the **Create Data Repository** form: + - **Embedding Model**: Leave as default (`Text Embedding 3 Large`). + - **Document Store Type**: Select `S3`. + - **Document Grounding Generic Secret**: Select the AWS secret you created in **Step 5** (e.g., `aws-credentials-1`). + +5. Once selected, you're ready to proceed. The required S3 bucket, region, and credentials are handled through the secret. + +6. Click **Create** to finish. + +![img](img/image080.png) + +--- + +> ✅ After completing this step, your knowledge base (data repository) will be linked to your document source. The documents will be embedded and made available for grounding in the chat experience. + +[OPTION END] + +[OPTION BEGIN [Python]] + +The following code creates an S3-based grounding pipeline using the generic secret you created earlier: + +```python +def create_s3_grounding_pipeline(): + headers = _get_headers() + api_url = f"{AICORE_BASE_URL}/v2/lm/document-grounding/pipelines" + payload = { + "type": "S3", + "configuration": { + "destination": "groundingsecret" + } + } + time.sleep(5) + + try: + response = requests.post(api_url, headers=headers, json=payload) + if response.status_code == 201: + print("S3 document grounding pipeline created successfully") + else: + print(f"Failed to create pipeline. Status: {response.status_code}, Response: {response.text}") + except Exception as e: + print(f"Error creating S3 document grounding pipeline: {e}") +create_s3_grounding_pipeline() +``` + +This registers the grounding pipeline in SAP AI Core. After creation, you will upload documents and trigger pipeline runs to populate the data repository. 
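Embedding the documents can take a while after the pipeline is created, so it is useful to poll for a terminal status before starting evaluations. The status values `COMPLETED`, `FAILED`, and `RUNNING` below are assumptions (check the AI Core document grounding API reference for the exact values); the polling logic itself is generic and can wrap whatever status call you use:

```python
import time

def wait_for_status(fetch_status, done={"COMPLETED"}, failed={"FAILED"},
                    interval=0.0, max_attempts=10):
    """Poll fetch_status() until it returns a terminal status or attempts run out."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status in done:
            return status
        if status in failed:
            raise RuntimeError(f"Pipeline ended in status {status}")
        time.sleep(interval)
    raise TimeoutError("Pipeline did not reach a terminal status in time")

# Stubbed example: the pipeline reports RUNNING twice, then COMPLETED.
statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])
print(wait_for_status(lambda: next(statuses)))  # COMPLETED
```

In real usage, `fetch_status` would be a small function that issues an authenticated GET against the pipeline's status endpoint (using `_get_headers()` from earlier) and returns the status field from the response; set `interval` to a few seconds there.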
+ +![img](img/image_py_pip.png) + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +You can create the pipeline through an API request in Bruno like below: + +**Endpoint:** +``` +POST: {{ai_api_url}}/v2/lm/document-grounding/pipelines +``` + +**Headers:** +```bash +Content-Type: application/json +Authorization: Bearer +``` +**Body:** + +```json +{ + "type": "S3", + "configuration": { + "destination": "s3-secret" + } +} +``` +This creates the S3-based grounding pipeline and links it to your previously created generic secret. + +![img](img/image_br_pip.png) + +[OPTION END] + +### Upload Evaluation Files to Object Store and Register Artifact in AI Core + +[OPTION BEGIN [SAP AI Launchpad]] + +After creating the secret, upload your evaluation files to the S3 bucket and register them as an artifact in AI Core. + +#### Register Uploaded Files as Artifact in AI Core + +To register your evaluation dataset with SAP AI Core, you need to upload it as an artifact. Follow the instructions below using the **SAP AI Launchpad UI**. + +--- + +- Open the **SAP AI Core Launchpad**. +- Navigate to the **Generative AI/Optimization/Artifacts** section to create dataset artifact. + +![img](img/image_19.png) + +- On the **Artifacts** section, click **add**. + +--- + +- On the **General Information** screen, enter the following: + + - **Select Scenario:** `genai-evaluations` + - **Name:** `genai-eval-test-data` + - **Description:** `Demo artifacts for evaluation flow.` + - **Select Object Store:** `genai-data` + - **Sub-folder path:** `genaiEvaluation/` + + > 💡 Replace `` with your **SAP BTP user ID** or the folder path in your object store where the evaluation files are uploaded. + +- On the **Labels** screen, click **“Add Label”** and provide the following: + + - **Key:** `prompt-evaluation` + - **Value:** `true` + *(Note: The prefix `ext.ai.sap.com/` is automatically pre-filled in the UI.)* + + ![img](img/image_21.png) + +- Review all entered details carefully. 
+- Click **“Add”** to complete the artifact registration.
+
+[OPTION END]
+
+[OPTION BEGIN [Python]]
+
+After creating the secret, upload your evaluation files to the S3 bucket and register them as an artifact in AI Core.
+
+**Step 5.1: Upload Files to S3 Bucket**
+
+```PYTHON
+# Uploading these files to the object store so that they can be registered as an artifact inside AI Core
+
+import boto3
+import os
+import uuid
+
+def upload_folder_to_s3(folder_path, bucket_name, s3_prefix=""):
+    """
+    Upload a folder to an S3 bucket recursively.
+
+    :param folder_path: The local folder path to upload.
+    :param bucket_name: The name of the S3 bucket.
+    :param s3_prefix: Optional prefix to use for the S3 keys (e.g., subfolder in the bucket).
+    """
+    s3_client = boto3.client(
+        's3',
+        aws_access_key_id=AWS_ACCESS_KEY,
+        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
+        region_name=AWS_REGION
+    )
+
+    for root, dirs, files in os.walk(folder_path):
+        for file_name in files:
+            local_path = os.path.join(root, file_name)
+            # Compute the relative path for the S3 key
+            relative_path = os.path.relpath(local_path, folder_path)
+            s3_key = os.path.join(s3_prefix, relative_path).replace("\\", "/")  # Ensure S3-compatible paths
+            print(f"Uploading {local_path} to s3://{bucket_name}/{s3_key}")
+
+            # Upload the file
+            s3_client.upload_file(local_path, bucket_name, s3_key)
+
+# Example usage
+folder_to_upload_testdata = "./DATASET_RAG"
+user_directory_prefix = ""  # replace with your i-number as string here
+# Fall back to a random GUID when no prefix is provided (an empty string counts as "no prefix")
+prefix_guid = user_directory_prefix if user_directory_prefix else str(uuid.uuid4().hex)
+s3_testdata_prefix = f"genaiEvaluation/{prefix_guid}/testdata"  # Leave empty for root of the bucket
+
+upload_folder_to_s3(folder_to_upload_testdata, AWS_BUCKET_ID, s3_testdata_prefix)
+# The ai:// URL references the object store secret registered earlier ("genai-data")
+input_artifact_path = f"ai://genai-data/genaiEvaluation/{prefix_guid}"
+```
+![img](img/image_5.png)
+
+**Step 5.2: Register Uploaded Files as 
Artifact in AI Core**
+
+```PYTHON
+import json
+import logging
+import requests
+
+# Register the uploaded AWS files as an artifact to use inside a configuration.
+
+def register_artifact():
+    headers = _get_headers()
+
+    GET_ARTIFACTS_ENDPOINT = '/v2/lm/artifacts'
+    request_url = f"{AICORE_BASE_URL}{GET_ARTIFACTS_ENDPOINT}"
+
+    request_body = {
+        "labels": [
+            {
+                "key": "ext.ai.sap.com/prompt-evaluation",
+                "value": "true"
+            }
+        ],
+        "name": "genai-eval-test-data",
+        "kind": "other",
+        "url": input_artifact_path, # input artifact path
+        "description": "demo artifacts for evaluation flow.",
+        "scenarioId": "genai-evaluations"
+    }
+    try:
+        response = requests.post(
+            request_url, headers=headers, data=json.dumps(request_body), timeout=120
+        )
+        result = response.json()
+        print(result)
+        return result['id']
+    except Exception:
+        logging.error("Error occurred while attempting to register the artifact")
+        raise
+
+
+artifact_id = register_artifact()
+```
+![img](img/image_6.png)
+
+[OPTION END]
+
+[OPTION BEGIN [Bruno]]
+
+Before registering a dataset artifact in Bruno, you must upload your CSV file to the SAP AI Core object store using the Dataset API.
+Bruno cannot upload files directly to S3; therefore, this step is required.
+
+**Prerequisites**
+
+ - An object store secret must already exist in your resource group. Typically, this is the default secret named **default**.
+
+ - The Dataset API currently supports:
+
+    - S3 object stores only
+
+    - CSV file uploads
+
+**Upload Your Dataset**
+
+Use the Dataset API – Upload File request in Bruno:
+
+```bash
+PUT:{{ai_api_url}}/v2/lm/dataset/files/{{secretName}}/{{datasetPath}}
+```
+
+**Headers**
+
+```CODE
+Authorization: Bearer {{token}}
+AI-Resource-Group: {{resourceGroup}}
+Content-Type: text/csv
+```
+
+**Body**
+
+Upload your `.csv` file directly as binary in Bruno’s **Body** tab.
+
+Example Path Values:
+
+ - secretName: default
+
+ - datasetPath: testdata/emanual.csv
+
+![img](img/image_br_dt.png)
+
+**Note:**
+
+Save the ai://… URL; you will use it when creating the dataset artifact.
+
+**Register the Dataset Artifact**
+
+- Click **Register artifact** under **lm → artifacts** in the Bruno collection to register the artifact:
+
+```CODE
+{
+  "name": "aiconfig",
+  "kind": "dataset",
+  "url": "ai://default/testdata/emanual.csv",
+  "scenarioId": "genai-evaluations"
+}
+```
+![img](img/image-br02.png)
+
+[OPTION END]
+
+### Create an Evaluation Configuration
+
+[OPTION BEGIN [SAP AI Launchpad]]
+
+**Create Orchestration Registry Configuration**
+
+The Orchestration Registry allows you to define how different modules, such as prompting, grounding, LLM execution, and safety filters, work together as a single workflow. By creating an orchestration configuration, you specify the exact steps the system will execute for each evaluation input.
+
+- Go to Generative AI Hub → Orchestration → Orchestration Configurations.
+
+- Click **Create**.
+
+- In the **Grounding** section, specify the input and output variables, and under **Data repositories** add the pipeline created in the previous step.
+
+![img](img/image_ail_or1.png)
+
+![img](img/image_ail_or2.png)
+
+- In the **Templating** section, add the user prompt:
+
+```CODE
+You are a helpful assistant specialized in SAP-related topics. Answer the following SAP question using the provided context.
If the answer is not explicitly available in the context, respond with: `The answer is not available in the provided context.`
+
+Request: {{?topic}}.
+
+Context: {{?groundingOutput}}
+```
+![img](img/image_ail_or3.png)
+
+- Select the model in the **Model Configuration** section and save.
+
+![img](img/image_ail_or4.png)
+
+![img](img/image_ail_or5.png)
+
+
+To begin evaluating your model, you need to create an Evaluation Configuration using the **genai-evaluations** scenario in SAP AI Core. This configuration defines what to evaluate, which dataset to use, which metrics to apply, and how.
+
+#### Steps to Create Evaluation Configuration
+
+1. Go to Generative AI Hub → Optimization.
+
+2. Click **Create** to start a new evaluation configuration.
+
+![img](img/image_25.png)
+
+- Under **Test Input / Runs**, select **Orchestration Configuration**.
+
+Then:
+
+ - Select your registered dataset artifact
+
+ - Enter the dataset path (example):
+   testdata/emanual.csv
+
+ - Set the number of test samples (e.g., 20)
+
+ ![img](img/image_26_01.png)
+
+- Click **Next** to go to Metrics selection.
+
+#### Select Evaluation Metrics
+
+Choose the metrics you want to evaluate.
+
+You may choose one or more system-defined or custom metrics, for example:
+
+ - Pointwise RAG Groundedness
+
+ - Pointwise RAG Context Relevance
+
+ - Pointwise RAG Context Precision
+
+ - Pointwise RAG Completeness
+
+![img](img/image_27.png)
+
+---
+
+> 📘 **Helpful Resources**:
+>
+> - [System-Defined Evaluation Metrics – SAP Documentation](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/system-defined-evaluation-metrics)
+> - [Define Your Own Custom Metrics – SAP Guide](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/custom-metrics)
+>   *(If your evaluation requires domain-specific or advanced scoring logic)*
+
+> **Note: You may select additional metrics based on your use case.**
+
+---
+
+#### Additional Configuration
+
+- Set **Number of Repetitions** to `1`.
+- Choose an existing deployment for **Orchestration Endpoint**. +- In the **Input Variable Mapping**, enter the following mapping: + + ```json + { + "prompt/question": "data/topic" + } + ``` + ![img](img/image_29.png) +--- +[Learn more about variable mapping](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/variable-mapping) + +#### Final Review & Start + +- Review all the details on the summary page. +- Once confirmed, click **Create** to start the evaluation job. +![img](img/image_40.png) + +> ✅ You have now successfully configured and triggered a Generative AI Evaluation. + +[OPTION END] + +[OPTION BEGIN [Python]] + +To begin evaluating your model programmatically, you need to create an Evaluation Configuration using the **genai-evaluations** scenario in **SAP AI Core**. This configuration defines what to evaluate, which orchestration deployment to use, which dataset and metrics to apply, and how the evaluation will be executed — all through Python. + +#### Create Orchestration Deployment + +Before proceeding with the evaluation configuration, you must first deploy your orchestration workflow. + +An **orchestration deployment URL** is required to run the evaluation. Once the deployment is created, you should wait until its status is **running** and the deployment provides a **URL**. 
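Once a deployment URL is available (from `DEPLOYMENT_URL` in your `.env`, or from the deployment created in the next cell), you can optionally send one minimal completion request to confirm the deployment is actually serving before wiring it into the evaluation. The sketch below is not part of the official flow: the payload shape follows the orchestration completion API, and the model name and the `_get_headers()` helper are assumptions carried over from the rest of this notebook.

```PYTHON
# Optional smoke test for a RUNNING orchestration deployment (sketch only).
import json

def build_smoke_payload(model_name="gpt-4o"):
    """Minimal orchestration completion payload with a trivial user prompt.

    The structure (llm_module_config + templating_module_config) mirrors the
    orchestration completion API; adjust model_name to your landscape.
    """
    return {
        "orchestration_config": {
            "module_configurations": {
                "llm_module_config": {"model_name": model_name, "model_params": {}},
                "templating_module_config": {
                    "template": [{"role": "user", "content": "Reply with the single word OK."}]
                }
            }
        },
        "input_params": {}
    }

def smoke_test_orchestration(deployment_url, headers, model_name="gpt-4o"):
    """Send one completion request and fail fast if the deployment is not serving."""
    import requests  # local import: the payload builder above has no network deps
    response = requests.post(
        f"{deployment_url}/completion",
        headers=headers,
        data=json.dumps(build_smoke_payload(model_name)),
        timeout=120
    )
    response.raise_for_status()
    return response.json()

# Example usage (requires a RUNNING deployment):
# print(smoke_test_orchestration(deployment_url, _get_headers()))
```

If the call succeeds, the URL is safe to plug into the evaluation configuration.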
+
+**NOTE:** This URL will be used in the configuration definition in the next step.
+
+```PYTHON
+import requests
+import json
+import logging
+import time
+
+
+def create_orchestration_configuration():
+    headers = _get_headers()
+    request_body = {
+        "name": "orchestrationDeployment",
+        "executableId": "orchestration",
+        "scenarioId": "orchestration",
+        "parameterBindings": [
+            {
+                "key": "modelFilterList",
+                "value": "null"
+            },
+            {
+                "key": "modelFilterListType",
+                "value": "allow"
+            }
+        ],
+        "inputArtifactBindings": []
+    }
+
+    GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'
+    request_url = f"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}"
+    try:
+        response = requests.post(
+            request_url, headers=headers, data=json.dumps(request_body), timeout=120
+        )
+        print(response)
+        if response.status_code != 201:
+            raise Exception(f"Configuration creation failed: {response.text}")
+        result = response.json()
+        print(result)
+        return result['id']
+    except Exception:
+        logging.error("Error occurred while attempting to create a configuration")
+        raise
+
+def execute_orchestration_deployment(configuration_id):
+    headers = _get_headers()
+    GET_DEPLOYMENTS_ENDPOINT = '/v2/lm/deployments'
+    request_url = f"{AICORE_BASE_URL}{GET_DEPLOYMENTS_ENDPOINT}"
+
+    request_body = {
+        "configurationId": configuration_id
+    }
+
+    try:
+        response = requests.post(
+            request_url, headers=headers, data=json.dumps(request_body), timeout=120
+        )
+        print(response)
+        if response.status_code != 202:
+            raise Exception(f"Deployment creation failed: {response.text}")
+        result = response.json()
+        print(result)
+        return result['id']
+
+    except Exception:
+        logging.error("Error occurred while attempting to create the deployment")
+        raise
+
+def get_deployment_status(orchestration_deployment_id):
+    headers = _get_headers()
+    api_url = f"{AICORE_BASE_URL}/v2/lm/deployments/{orchestration_deployment_id}?$select=status"
+    timeout = 400
+    initial_interval = 30
+    pending_interval = 10
+    start = time.time()
+
+    status = None
+    current_interval = initial_interval
+
+    while time.time() - start < timeout:
+        response = requests.get(api_url, headers=headers)
+        if response.status_code == 200:
+            status = response.json().get('status')
+            print(f"Deployment {orchestration_deployment_id} status: {status}")
+            # Adjust polling interval based on status
+            if status == 'RUNNING':
+                return True
+            elif status == 'UNKNOWN':
+                current_interval = initial_interval
+            elif status == 'PENDING':
+                current_interval = pending_interval
+        else:
+            print(f"Failed to fetch deployment status. HTTP {response.status_code}")
+            return False
+
+        # Wait before the next status check
+        time.sleep(current_interval)
+
+    print(f"Timed out after {timeout}s waiting for deployment {orchestration_deployment_id} to reach RUNNING.")
+    return False
+
+def get_deployment_url(orchestration_deployment_id):
+    headers = _get_headers()
+    response = requests.get(f"{AICORE_BASE_URL}/v2/lm/deployments/{orchestration_deployment_id}", headers=headers)
+    if response.status_code != 200:
+        raise Exception(f"Failed to get deployment URL: {response.status_code} - {response.text}")
+    return response.json().get('deploymentUrl')
+
+# You can skip this step if you already have an orchestration deployment running
+deployment_url = DEPLOYMENT_URL
+if not deployment_url:
+    configuration_id = create_orchestration_configuration()
+    orchestration_deployment_id = execute_orchestration_deployment(configuration_id)
+    is_running = get_deployment_status(orchestration_deployment_id)
+    if is_running:
+        deployment_url = get_deployment_url(orchestration_deployment_id)
+        print(f"Deployment URL: {deployment_url}")
+    else:
+        print("Deployment is not running or failed.")
+```
+
+![img](img/image_36.png)
+
+#### Select your Models
+
+Add the LLMs you wish to use to the `selected_models_str` string.
+
+```PYTHON
+# Manual selection of models
+selected_models_str = "gpt-4o:2024-05-13"
+print("Selected models string:", selected_models_str)
+```
+
+#### Select system-defined metrics
+
+Add the system-defined metrics you wish to use to the `selected_metrics_str` string.
+
+Note: If your dataset does not have a reference column, DO NOT select metrics that require a reference.
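To make that rule harder to violate, you can add a small guard before the manual selection in the next cell. This is an illustrative sketch only: `REQUIRES_REFERENCE` is a hypothetical placeholder set, not the authoritative list; check the system-defined metrics documentation for which metrics actually need a reference column.

```PYTHON
# Illustrative guard: drop metrics that need a reference column when the
# dataset has none. REQUIRES_REFERENCE below is a hypothetical example set,
# not the authoritative list from the documentation.
REQUIRES_REFERENCE = {"Pointwise Answer Correctness"}  # hypothetical

def filter_metrics(candidates, dataset_columns):
    """Keep only metrics that can run against the given dataset columns."""
    has_reference = "reference" in dataset_columns
    kept = [m for m in candidates if has_reference or m not in REQUIRES_REFERENCE]
    for dropped in set(candidates) - set(kept):
        print(f"Skipping '{dropped}': dataset has no reference column.")
    return kept

# Example: a dataset with only 'topic' and 'context' columns
print(filter_metrics(
    ["Pointwise RAG Context Precision", "Pointwise Answer Correctness"],
    ["topic", "context"],
))  # ['Pointwise RAG Context Precision']
```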
+
+```PYTHON
+# Manual Selection of Metrics
+selected_metrics_str = "Pointwise RAG Context Precision,Pointwise RAG Completeness"
+print(selected_metrics_str)
+```
+
+#### Create Orchestration Registry Configuration
+
+The following code defines a function `create_orchestration_registry_config()` that creates a new orchestration configuration in the Orchestration Registry.
+
+Note: If you wish to use an existing orchestration configuration, skip executing this cell and assign its ID to the `orchestration_registry_id` variable instead.
+
+```PYTHON
+def create_orchestration_registry_config():
+    headers = _get_headers()
+
+    CREATE_ORCHESTRATION_REGISTRY = '/v2/registry/v2/orchestrationConfigs'
+    request_url = f"{AICORE_BASE_URL}{CREATE_ORCHESTRATION_REGISTRY}"
+    model_name, model_version = selected_models_str.split(":")
+    request_body = {
+        "name": "genai-eval-test-1",
+        "version": "0.0.1",
+        "scenario": "genai-evaluations",
+        "spec": {
+            "modules": {
+                "prompt_templating": {
+                    "prompt": {
+                        "template": [
+                            {
+                                "role": "user",
+                                "content": "You are a helpful assistant specialized in e-manual topics. Answer the following e-manual questions using the provided context. If the answer is not explicitly available in the context, respond with: `The answer is not available in the provided context.` \\n\\nRequest: {{?topic}}.
\\n\\nContext: {{?groundingOutput}}"
+                            }
+                        ],
+                        "defaults": {}
+                    },
+                    "model": {"name": model_name, "version": model_version},
+                },
+                "grounding": {
+                    "type": "document_grounding_service",
+                    "config": {
+                        "filters": [
+                            {
+                                "id": "helpRepo",
+                                "data_repositories": [
+                                    "*"
+                                ],
+                                "search_config": {
+                                    "max_chunk_count": 10
+                                },
+                                "data_repository_type": "help.sap.com"
+                            }
+                        ],
+                        "placeholders": {
+                            "input": [
+                                "topic"
+                            ],
+                            "output": "groundingOutput"
+                        }
+                    }
+                }
+            }
+        }
+    }
+    try:
+        response = requests.post(
+            request_url, headers=headers, data=json.dumps(request_body), timeout=120
+        )
+        if response.status_code != 200:
+            print(response.json())
+            raise Exception(f"Orchestration registry configuration creation failed: {response.status_code}")
+        result = response.json()
+        print(result)
+        return result['id']
+    except Exception:
+        logging.error("Error occurred while attempting to create an orchestration registry configuration")
+        raise
+orchestration_registry_id = create_orchestration_registry_config()
+```
+
+#### Define Evaluation Flow Parameters
+
+Below is an example of defining the required input parameters for the prompt evaluation flow.
+
+```PYTHON
+# Defining required input parameters for the prompt Evaluation Flow
+import json
+test_data_path = f"testdata/testdata/{DATASET_NAME}" # specify the test data path here. For the full folder just specifying testdata will work
+test_datasets = json.dumps({'path': test_data_path, 'type': 'csv'})
+print(test_datasets)
+metrics_list = selected_metrics_str # comma-separated metric names
+models_list = selected_models_str
+print(f"Selected metrics: {metrics_list}")
+print(f"Selected models: {models_list}")
+#variable_mapping = json.dumps({'prompt/question': 'data/topic'}) # to map the question prompt variable to the entry in the dataset.
+orchestration_deployment_url = deployment_url # set this to the URL of the orchestration deployment you want to use
+repetitions = "1"
+```
+
+**NOTE: For custom metrics, ensure they follow the structured format: scenario/metric_name/version — for example, genai-evaluations/groundedness_formatted/0.0.1 or genai-evaluations/correctness_structured/0.0.1.**
+
+> 📘 **Helpful Resources**:
+>
+> - [System-Defined Evaluation Metrics – SAP Documentation](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/system-defined-evaluation-metrics)
+>
+> - [Define Your Own Custom Metrics – SAP Guide](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/custom-metrics) *(if your evaluation requires domain-specific or advanced scoring logic)*
+>
+> - [Learn more about variable mapping](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/variable-mapping)
+>
+Now, we will create an AI Core configuration using the defined parameters.
+
+
+```PYTHON
+# Creating an AI Core configuration.
+import json
+import logging
+import requests
+
+request_body = {
+    "name": "genai-eval-conf",
+    "scenarioId": "genai-evaluations",
+    "executableId": "genai-evaluations-simplified",
+    "inputArtifactBindings": [
+        {
+            "key": "datasetFolder",
+            "artifactId": artifact_id
+        }
+    ],
+    "parameterBindings": [
+        {
+            "key": "repetitions",
+            "value": repetitions
+        },
+        {
+            "key": "orchestrationDeploymentURL",
+            "value": orchestration_deployment_url
+        },
+        {
+            "key": "metrics",
+            "value": metrics_list
+        },
+        {
+            "key": "testDataset",
+            "value": test_datasets
+        },
+        {
+            "key": "orchestrationRegistryIds",
+            "value": orchestration_registry_id
+        },
+        {
+            "key": "testRowCount",
+            "value": "2"
+        }
+    ]
+}
+
+def create_aicore_configuration():
+    headers = _get_headers()
+    GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'
+    request_url = f"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}"
+    try:
+        response = requests.post(
+            request_url, headers=headers, data=json.dumps(request_body), timeout=120
+        )
+        print(response)
+        if response.status_code != 201:
+            raise Exception(f"Configuration creation failed: {response.text}")
+        result = response.json()
+        print(result)
+        return result['id']
+    except Exception:
+        logging.error("Error occurred while attempting to create a configuration")
+        raise
+
+configuration_id = create_aicore_configuration()
+```
+![img](img/image_8.png)
+
+[OPTION END]
+
+[OPTION BEGIN [Bruno]]
+
+Before setting up your evaluation configuration in **Bruno**, you need to create and deploy the orchestration workflow that powers the evaluation process. This is a prerequisite step and must be completed before proceeding.
+
+The **orchestration deployment URL** is a critical input required in the configuration JSON. Once the orchestration is deployed, ensure its status is `running` and the **deployment URL** is generated. You will reference this URL while defining the evaluation configuration.
+
+> 📘 **Need help deploying the orchestration workflow?** Check the official guide: [Deploy an Orchestration Workflow in SAP AI Core](https://developers.sap.com/tutorials/ai-core-orchestration-consumption.html)
+
+---
+
+#### Sample Evaluation Configuration in Bruno
+
+Below is a sample configuration payload that you can use in **Bruno** to trigger an evaluation. Replace the placeholder values (the artifact ID, the orchestration deployment URL, and the orchestration registry ID) with the actual values from the previous steps.
+
+```json
+{
+  "name": "genai-eval-conf",
+  "scenarioId": "genai-evaluations",
+  "executableId": "genai-evaluations-simplified",
+  "inputArtifactBindings": [
+    {
+      "key": "datasetFolder",
+      "artifactId": ""
+    }
+  ],
+  "parameterBindings": [
+    {
+      "key": "repetitions",
+      "value": "1"
+    },
+    {
+      "key": "orchestrationDeploymentURL",
+      "value": ""
+    },
+    {
+      "key": "metrics",
+      "value": "Pointwise RAG Context Precision,Pointwise RAG Completeness"
+    },
+    {
+      "key": "testDataset",
+      "value": "{\"path\": \"testdata/emanual.csv\", \"type\": \"csv\"}"
+    },
+    {
+      "key": "orchestrationRegistryIds",
+      "value": ""
+    },
+    {
+      "key": "testRowCount",
+      "value": "2"
+    }
+  ]
+}
+```
+
+![img](img/image-br03.png)
+
+**NOTE: For custom metrics, ensure they follow the structured format: scenario/metric_name/version — for example, genai-evaluations/groundedness_formatted/0.0.1 or genai-evaluations/correctness_structured/0.0.1.**
+
+> 📘 **Helpful Resources**:
+>
+> - [System-Defined Evaluation Metrics – SAP Documentation](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/system-defined-evaluation-metrics)
+>
+> - [Define Your Own Custom Metrics – SAP Guide](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/custom-metrics) *(if your evaluation requires domain-specific or advanced scoring logic)*
+>
+> - [Learn more about variable mapping](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/variable-mapping)
+
+[OPTION END]
+
+### Evaluation Execution Creation
+[OPTION BEGIN [SAP AI Launchpad]]
+
+- Once the evaluation configuration is created, the system
automatically triggers an evaluation execution.
+
+- Follow these steps to monitor its progress and verify completion:
+
+    - Navigate to **ML Operations** in the SAP AI Core Launchpad.
+
+    - In the sidebar, click **Executions**.
+
+    ![img](img/image_41.png)
+
+    - Locate the most recent execution triggered by your evaluation configuration. You can use the timestamp or configuration name to identify it.
+
+    - Click on the execution entry to open its details. The Current Status will update as the process runs.
+
+    ![img](img/image_31.png)
+
+- Once the Target Status reaches **COMPLETED**, your evaluation has successfully finished.
+
+![img](img/image_32.png)
+
+> [For more information](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/create-evaluation)
+> You’ve now completed an evaluation run and are ready to view and interpret the results.
+
+[OPTION END]
+
+[OPTION BEGIN [Python]]
+
+After creating the configuration, the next step is to trigger the evaluation workload by creating an AI Core execution.
+
+**Create an Execution with the Created Configuration**
+
+- The code below will initiate the evaluation process based on your configuration.
+
+```PYTHON
+# Create an execution with the created configuration.
+
+import json
+import logging
+import requests
+
+def create_execution():
+    headers = _get_headers()
+    GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions'
+    request_url = f"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}"
+    request_body = {"configurationId": configuration_id} # replace with your created configuration id
+    try:
+        response = requests.post(
+            request_url, headers=headers, data=json.dumps(request_body), timeout=120
+        )
+        print("response received is ", response)
+        result = response.json()
+        print(result)
+        return result['id']
+    except Exception:
+        logging.error("Error occurred while attempting to create an execution")
+        raise
+
+
+execution_id = create_execution()
+```
+![img](img/image_11.png)
+
+**Get Execution Status**
+
+Check the status of the triggered execution.
You’ll need to wait for the status to be **COMPLETED** before moving to the next steps.
+
+```PYTHON
+# Get execution status
+import logging
+import requests
+
+def get_execution_status(execution_id):
+    headers = _get_headers()
+    LOG_EXECUTIONS_ENDPOINT = f'/v2/lm/executions/{execution_id}'
+    request_url = f"{AICORE_BASE_URL}{LOG_EXECUTIONS_ENDPOINT}"
+    try:
+        response = requests.get(
+            request_url, headers=headers, timeout=120
+        )
+        print("response received is ", response)
+        result = response.json()
+        return result
+    except Exception:
+        logging.error("Error occurred while attempting to get execution status")
+        raise
+
+
+get_execution_status(execution_id)
+```
+
+- The status field progresses through different states over time:
+UNKNOWN → PENDING → RUNNING → COMPLETED.
+
+![img](img/image_9.png)
+
+![img](img/image_10.png)
+
+![img](img/image_12.png)
+
+- Ensure it reaches COMPLETED before proceeding.
+
+> [For more information](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/create-evaluation)
+
+**NOTE:** After triggering the execution, wait a few minutes, then re-run the **get_execution_status()** function. Once the status is **COMPLETED**, proceed to the next steps.
+
+[OPTION END]
+
+[OPTION BEGIN [Bruno]]
+
+After creating the configuration, the next step is to trigger the evaluation workload by creating an AI Core execution.
+
+**Create an Execution with the Created Configuration**
+
+- Click **Create execution** under **Executions** and pass the configuration ID created in the previous step.
+
+![img](img/image-br04.png)
+
+- The status field progresses through different states over time:
+UNKNOWN → PENDING → RUNNING → COMPLETED.
+
+**Get Execution Status**
+
+Check the status of the created execution by passing the execution ID. The Current Status will update as the process runs.
Please refer to the image below.
+
+![img](img/image-br05.png)
+
+[OPTION END]
+
+### Evaluation Results Analysis
+
+[OPTION BEGIN [SAP AI Launchpad]]
+
+#### Retrieve Aggregate Metrics Using Run Name
+
+Once the evaluation workflow execution is completed, this step retrieves the aggregated evaluation metrics from the SAP AI Core service by specifying the run name.
+
+![img](img/image_35.png)
+
+[OPTION END]
+
+[OPTION BEGIN [Python]]
+#### Retrieve Aggregate Metrics Using Execution ID
+
+Once the evaluation workflow execution is completed, we can retrieve the aggregated evaluation metrics using the execution ID. These metrics provide a quick summary of the model's performance across all completions.
+
+Below is the Python code that calls the AI Core Tracking API to fetch these aggregated metrics.
+
+```PYTHON
+# Get aggregate metrics using execution id
+import logging
+import requests
+
+def retrieve_aggregate_metrics(execution_id):
+    headers = _get_headers()
+    GET_METRICS_ENDPOINT = f'/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={execution_id}'
+    request_url = f"{AICORE_BASE_URL}{GET_METRICS_ENDPOINT}"
+    try:
+        response = requests.get(request_url, headers=headers, timeout=120)
+        print("response received is ", response)
+        result = response.json()
+        return result
+    except Exception:
+        logging.error("Error occurred while attempting to retrieve aggregate metrics for the run")
+        raise
+
+runs_data = retrieve_aggregate_metrics(execution_id)
+```
+
+**Example Output**
+![img](img/image_13.png)
+The **Response [200]** indicates that the request was successful, and the aggregated metrics have been retrieved.
+
+#### Download the Result Artifacts from Object Store for Further Analysis
+
+- To drill down further into the **instance-level metrics, logs, or additional result files**, you can download the **SQLite DB** and other artifacts from object storage.
+
+```PYTHON
+# Download the result artifacts from the object store.
+import boto3
+import os
+
+def download_all_objects(prefix, destination_folder):
+    """
+    Recursively download all objects from the S3 bucket (AWS_BUCKET_ID) starting with a specific prefix.
+
+    :param prefix: Prefix to filter objects in the bucket.
+    :param destination_folder: Local folder to save the downloaded files.
+    """
+    s3_client = boto3.client(
+        's3',
+        aws_access_key_id=AWS_ACCESS_KEY,
+        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
+        region_name=AWS_REGION
+    )
+
+    # Ensure the destination folder exists
+    if not os.path.exists(destination_folder):
+        os.makedirs(destination_folder)
+
+    # Paginate through objects
+    paginator = s3_client.get_paginator('list_objects_v2')
+    pages = paginator.paginate(Bucket=AWS_BUCKET_ID, Prefix=prefix)
+
+    for page in pages:
+        if 'Contents' in page:
+            for obj in page['Contents']:
+                key = obj['Key']
+                local_file_path = os.path.join(destination_folder, os.path.relpath(key, prefix))
+
+                # Ensure the local directory structure exists
+                local_directory = os.path.dirname(local_file_path)
+                if not os.path.exists(local_directory):
+                    os.makedirs(local_directory)
+
+                # Download the object
+                print(f"Downloading {key} to {local_file_path}")
+                s3_client.download_file(AWS_BUCKET_ID, key, local_file_path)
+
+# Example usage
+EXECUTION_ID = execution_id
+sqlite_db_prefix = f'{EXECUTION_ID}/evaluation_result/'
+destination_folder = 'results-new'
+
+download_all_objects(sqlite_db_prefix, destination_folder)
+```
+
+**Sample Output**
+![img](img/image_15.png)
+
+#### Viewing Results from SQLite Database in a Tabular Format
+
+In this step, we will visualize the evaluation results stored in the SQLite database (results.db) in a clean and readable tabular format directly within the notebook. This allows for quick inspection and validation of the data across different tables such as run, configuration, submission, etc.
+
+**Objective**
+
+- Connect to the SQLite database.
+
+- Query specific tables.
+
+- Display their contents in a structured HTML format.
+
+- Enhance readability using custom CSS styling.
+
+```PYTHON
+# Viewing the results from the SQLite DB in tabular format.
+import sqlite3
+import pandas as pd
+from IPython.display import display, HTML
+
+# Path to your SQLite database file
+db_file = 'results-new/results.db'
+
+connection = sqlite3.connect(db_file)
+
+# Specify the table names you want to display
+table_names = ['run', 'configuration', 'submission', 'submission_result', 'evaluation_result']
+
+# Create the CSS and HTML container (basic styling for the rendered tables)
+html_content = """
+<style>
+.table-container table { border-collapse: collapse; width: 100%; }
+.table-container th, .table-container td { border: 1px solid #ddd; padding: 6px; text-align: left; }
+.table-container th { background-color: #f2f2f2; }
+</style>
+"""
+
+for table_name in table_names:
+    query = f"SELECT * FROM {table_name};"
+    df = pd.read_sql_query(query, connection)
+    # If you want to see all the rows across all tables, remove/comment the next line
+    df = df.head(5) # Limiting the number of rows displayed
+    table_html = df.to_html(classes='table-container', index=False)
+    html_content += f"""
+    <div class="table-container">
+        <h3>Table: {table_name}</h3>
+        {table_html}
+    </div>
+    """
+
+display(HTML(html_content))
+
+# Close the connection
+connection.close()
+```
+
+**Output Example**
+
+Below is an example of the output rendered in the notebook.
+![img](img/image_16.png)
+
+#### Delete an Execution by Execution ID (Optional)
+
+Once you have completed the evaluation and gathered the necessary aggregated metrics, you may want to delete the execution associated with a specific run. This helps maintain a clean and organized workspace in your SAP AI Core environment by removing outdated or unnecessary executions.
+
+**NOTE:** Deleting an execution is irreversible. Ensure you have saved all relevant results and metrics before proceeding.
+
+**Delete Execution by ID**
+
+```PYTHON
+# Delete an execution by ID
+def delete_execution():
+    headers = _get_headers()
+    EXEC_ID = execution_id
+    GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions/'
+    request_url = f"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}{EXEC_ID}"
+    try:
+        response = requests.delete(
+            request_url, headers=headers, params={"AI-Resource-Group": AICORE_RESOURCE_GROUP}, timeout=120
+        )
+        print(response)
+        if response.status_code != 202:
+            raise Exception(f"Execution deletion failed: {response.text}")
+        result = response.json()
+        print(result)
+    except Exception:
+        logging.error("Error occurred while attempting to delete the execution")
+        raise
+
+delete_execution()
+```
+**How It Works**
+
+- **Execution ID:** This is the unique identifier for the execution you wish to delete. Ensure the execution_id variable is properly assigned in your script.
+
+- **DELETE Request:** The function sends an HTTP DELETE request to SAP AI Core’s executions endpoint.
+
+- **Response Handling:**
+    - If the status code is 202 Accepted, the deletion request was successfully initiated.
+
+[OPTION END]
+
+[OPTION BEGIN [Bruno]]
+
+**Retrieve Aggregate Metrics Using Execution ID**
+
+Once the evaluation workflow execution is completed, we can retrieve the aggregated evaluation metrics using the execution ID.
These metrics provide a quick summary of the model's performance across all completions.
+
+![img](img/image-br06.png)
+
+[OPTION END]
+
diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/evaluation_RAG.ipynb b/tutorials/ai-core-genaihub-evaluation-with-grounding/evaluation_RAG.ipynb
new file mode 100644
index 000000000..bfac281a6
--- /dev/null
+++ b/tutorials/ai-core-genaihub-evaluation-with-grounding/evaluation_RAG.ipynb
@@ -0,0 +1,1430 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Generative AI Custom Evaluation\n",
+    "This is an example notebook that showcases how you can use AI Core custom evaluation to benchmark large language models and evaluate orchestration configurations or prompts for your use case.\n",
+    "It uses the publicly available emanual.csv. The workload computes industry-standard metrics to check the reliability of the responses generated by the LLM.\n",
+    "
**Note: For detailed instructions please refer to [Readme](./Readme.md)**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# SetUp (Step 1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "! pip install -r ../requirements.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load your environment variables\n", + "\n", + "Ensure that your environment variables are set in a `.env` file (see sample.env for an example). If there is a missing field the notebook will prompt you for a value." + ] + }, + { + "cell_type": "code", + "execution_count": 351, + "metadata": {}, + "outputs": [], + "source": [ + "# Loading the credentials from the env file\n", + "from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient\n", + "from dotenv import load_dotenv\n", + "import os\n", + "\n", + "load_dotenv(override=True)\n", + "\n", + "\n", + "# Fetching environment variables or prompting the user if missing\n", + "AICORE_BASE_URL = os.getenv(\"AICORE_BASE_URL\") or input(\"AICORE_BASE_URL is missing. Please enter it: \")\n", + "AICORE_RESOURCE_GROUP = os.getenv(\"AICORE_RESOURCE_GROUP\") or input(\"AICORE_RESOURCE_GROUP is missing. Please enter it (default: 'default'): \") or \"default\"\n", + "AICORE_AUTH_URL = os.getenv(\"AICORE_AUTH_URL\") or input(\"AICORE_AUTH_URL is missing. Please enter it: \")\n", + "AICORE_CLIENT_ID = os.getenv(\"AICORE_CLIENT_ID\") or input(\"AICORE_CLIENT_ID is missing. Please enter it: \")\n", + "AICORE_CLIENT_SECRET = os.getenv(\"AICORE_CLIENT_SECRET\") or input(\"AICORE_CLIENT_SECRET is missing. Please enter it: \")\n", + "\n", + "AWS_ACCESS_KEY = os.getenv(\"AWS_ACCESS_KEY\") or input(\"AWS_ACCESS_KEY is missing. Please enter it: \")\n", + "AWS_BUCKET_ID = os.getenv(\"AWS_BUCKET_ID\") or input(\"AWS_BUCKET_ID is missing. Please enter it: \")\n", + "AWS_REGION = os.getenv(\"AWS_REGION\") or input(\"AWS_REGION is missing. 
Please enter it: \")\n", + "AWS_SECRET_ACCESS_KEY = os.getenv(\"AWS_SECRET_ACCESS_KEY\") or input(\"AWS_SECRET_ACCESS_KEY is missing. Please enter it: \")\n", + "DEPLOYMENT_URL = os.getenv(\"DEPLOYMENT_URL\", None)\n", + "AWS_USERNAME = os.getenv(\"AWS_USERNAME\")\n", + "AWS_HOST = os.getenv(\"AWS_HOST\")\n", + "\n", + "# Initializing the GenAIHubProxyClient\n", + "client = GenAIHubProxyClient(\n", + " base_url=AICORE_BASE_URL,\n", + " auth_url=AICORE_AUTH_URL,\n", + " client_id=AICORE_CLIENT_ID,\n", + " client_secret=AICORE_CLIENT_SECRET,\n", + " resource_group=AICORE_RESOURCE_GROUP\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Dependencies and Helper Functions (Step 2)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import json\n", + "\n", + "\n", + "\n", + "def get_dataset_file_name(folder_path):\n", + " \"\"\"\n", + " Retrieves the name of the first file in the specified folder.\n", + " \"\"\"\n", + " if not os.path.isdir(folder_path):\n", + " print(f\"The folder path '{folder_path}' does not exist.\")\n", + " return None\n", + "\n", + " items_in_folder = os.listdir(folder_path)\n", + "\n", + " for item in items_in_folder:\n", + " item_path = os.path.join(folder_path, item)\n", + " if os.path.isfile(item_path):\n", + " return item\n", + "\n", + " print(f\"No files were found in the folder '{folder_path}'.\")\n", + " return None\n", + "\n", + "\n", + "\n", + "# --- MAIN EXECUTION ---\n", + "DATASET_FOLDER = \"../DATASET\"\n", + "\n", + "DATASET_NAME = get_dataset_file_name(DATASET_FOLDER)\n", + "\n", + "if DATASET_NAME:\n", + " print(f\"Dataset name: {DATASET_NAME}\")\n", + "else:\n", + " print(\"Missing run or dataset file.\")\n", + " raise SystemExit(\"Exiting due to missing run/dataset file.\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Bearer token" + ] + }, + { + "cell_type": "code", + 
"execution_count": 360, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Response status code: \n" + ] + } + ], + "source": [ + "import requests\n", + "def create_token():\n", + " \n", + " payload = {\n", + " 'grant_type': 'client_credentials',\n", + " 'client_id': AICORE_CLIENT_ID,\n", + " 'client_secret':AICORE_CLIENT_SECRET\n", + " }\n", + " response = requests.post(AICORE_AUTH_URL, data=payload)\n", + " print(f\"Response status code: {response}\")\n", + " response_data = response.json()\n", + " if 'access_token' in response_data:\n", + " return response_data['access_token']\n", + " else:\n", + " raise Exception(f\"Failed to get token: {response_data}\")\n", + "token = create_token()\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Resource Group" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "If you already have a resource group provisioned with grounding and rag enabled, you can add the name fo your resource group at `user_resource_group_id`\n", + "\n", + "**Note: the \"labels\" config is required to enable your resource group to use grounding and Rag metrics. 
Ensure you set the value to true**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "def create_resource_group():\n", + " headers = {\n", + " 'Authorization': f'Bearer {token}',\n", + " 'Content-Type': 'application/json',\n", + " }\n", + " resource_group_id =f\"rag-notebook-test\"\n", + " api_url = f\"{AICORE_BASE_URL}/v2/admin/resourceGroups\"\n", + " payload = {\n", + " \"resourceGroupId\": resource_group_id,\n", + " \"labels\": [\n", + " {\n", + " \"key\": \"ext.ai.sap.com/document-grounding\",\n", + " \"value\": \"true\"\n", + " }\n", + " ]\n", + " }\n", + " response = requests.post(api_url, json=payload, headers=headers)\n", + " if response.status_code == 202:\n", + " return resource_group_id\n", + " else:\n", + " raise Exception(f\"Failed to create resource group: {response.json()}\")\n", + "\n", + "user_resource_group_id = \"\" # add your provisioned resource group id here, if you have one\n", + "resource_group_id = user_resource_group_id or create_resource_group()\n", + "print(f\"Resource Group created with ID: {resource_group_id}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Register an Object Store Secret\n", + "To use the evaluations service, you must register an object store with the name default. Optionally, you can register an additional object store with a name of your choice." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 361, + "metadata": {}, + "outputs": [], + "source": [ + "# Set up authentication and headers needed for AI Core requests\n", + "def _get_headers():\n", + " headers = {\n", + " \"Authorization\": client.get_ai_core_token(),\n", + " \"AI-Resource-Group\": resource_group_id,\n", + " \"Content-Type\": \"application/json\",\n", + " }\n", + " return headers" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Register the S3 secret with AI Core, which will be used as an input source\n", + "import requests\n", + "import json\n", + "import logging\n", + "\n", + "def delete_oss_secret(oss_name=\"\"):\n", + " headers = _get_headers()\n", + " \n", + " DELETE_SECRETS_ENDPOINT = f'/v2/admin/objectStoreSecrets/{oss_name}'\n", + " request_url = f\"{AICORE_BASE_URL}{DELETE_SECRETS_ENDPOINT}\"\n", + " \n", + " try:\n", + " response = requests.delete(request_url, headers=headers, timeout=120)\n", + " if response.status_code == 202:\n", + " print(f\"Successfully deleted object store secret: {oss_name}\")\n", + " elif response.status_code == 404:\n", + " print(f\"Object store secret not found: {oss_name}. 
It may not exist.\")\n", + " else:\n", + " logging.error(f\"Failed to delete object store secret: {oss_name}, Status Code: {response.status_code}\")\n", + " except Exception as e:\n", + " logging.error(f\"Error occurred while attempting to delete object store secret: {e}\")\n", + " raise\n", + "\n", + "def register_oss_secret(oss_name=\"\", path_prefix=\"\"):\n", + " headers = _get_headers()\n", + " \n", + " POST_SECRETS_ENDPOINT = '/v2/admin/objectStoreSecrets'\n", + " request_url = f\"{AICORE_BASE_URL}{POST_SECRETS_ENDPOINT}\"\n", + " \n", + " request_body = {\n", + " \"name\": oss_name,\n", + " \"data\": {\n", + " \"AWS_ACCESS_KEY_ID\": AWS_ACCESS_KEY,\n", + " \"AWS_SECRET_ACCESS_KEY\": AWS_SECRET_ACCESS_KEY\n", + " },\n", + " \"type\": \"S3\",\n", + " \"bucket\": AWS_BUCKET_ID,\n", + " \"endpoint\": \"s3-eu-central-1.amazonaws.com\",\n", + " \"region\": AWS_REGION,\n", + " \"pathPrefix\": path_prefix,\n", + " \"verifyssl\": \"0\",\n", + " \"usehttps\": \"1\",\n", + " }\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " result = response.json()\n", + " return result\n", + " except Exception:\n", + " logging.error(\"Error occurred while attempting to create object store secret\")\n", + " raise\n", + " \n", + "delete_oss_secret(oss_name=\"default\")\n", + "delete_oss_secret(oss_name=\"genai-simplified-notebook\")\n", + " \n", + "register_oss_secret(oss_name=\"default\", path_prefix=\"\")\n", + "register_oss_secret(oss_name=\"genai-simplified-notebook\", path_prefix=\"\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Grounding Secret\n", + "\n", + "In the next step, we create a generic secret that enables grounding via the \"labels\" config. This secret provides the hyperscaler and bucket details so that the grounding service knows how to retrieve data from the object store."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import time\n", + "import base64\n", + "def encode_base64(value):\n", + " return base64.b64encode(value.encode('utf-8')).decode('utf-8')\n", + " \n", + "def create_generic_secret():\n", + " payload ={\n", + " \"name\": \"groundingsecret\",\n", + " \"data\": {\n", + " \"url\": encode_base64(\"https://s3-eu-central-1.amazonaws.com\"), \n", + " \"authentication\": encode_base64(\"NoAuthentication\"),\n", + " \"description\": encode_base64(\"grounding secret\"),\n", + " \"access_key_id\": encode_base64(AWS_ACCESS_KEY),\n", + " \"bucket\": encode_base64(AWS_BUCKET_ID),\n", + " \"host\": encode_base64(AWS_HOST),\n", + " \"region\": encode_base64(\"eu-central-1\"),\n", + " \"secret_access_key\": encode_base64(AWS_SECRET_ACCESS_KEY),\n", + " \"username\": encode_base64(AWS_USERNAME),\n", + " },\n", + " \"labels\": [\n", + " {\n", + " \"key\": \"ext.ai.sap.com/document-grounding\",\n", + " \"value\": \"true\"\n", + " },\n", + " {\n", + " \"key\": \"ext.ai.sap.com/documentRepositoryType\",\n", + " \"value\": \"S3\"\n", + " }\n", + " ]\n", + "}\n", + " time.sleep(60)\n", + " try:\n", + " headers = _get_headers()\n", + " api_url = f\"{AICORE_BASE_URL}/v2/admin/secrets\"\n", + " response = requests.post(api_url, headers=headers, json=payload)\n", + " if(response.status_code == 200):\n", + " print(\"Generic secret created successfully\")\n", + " else:\n", + " print(f\"Failed to create generic secret: {response}\")\n", + " except Exception as e:\n", + " print(f\"Error creating secret: {e}\")\n", + "create_generic_secret()\n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Grounding Pipeline\n", + "\n", + "This step creates the connection to where you have stored your grounding documents and allows a path to retrieve the documents during evaluation " + ] + }, + { + "cell_type": "code", + "execution_count": null, + 
"metadata": {}, + "outputs": [], + "source": [ + "def create_s3_grounding_pipeline():\n", + " headers = _get_headers()\n", + " api_url = f\"{AICORE_BASE_URL}/v2/lm/document-grounding/pipelines\"\n", + " payload = {\n", + " \"type\": \"S3\",\n", + " \"configuration\": {\n", + " \"destination\": \"groundingsecret\"\n", + " }\n", + " }\n", + " time.sleep(5) # Optional wait for secret availability\n", + "\n", + " try:\n", + " response = requests.post(api_url, headers=headers, json=payload)\n", + " if response.status_code == 201:\n", + " print(\"S3 document grounding pipeline created successfully\")\n", + " else:\n", + " print(f\"Failed to create pipeline. Status: {response.status_code}, Response: {response.text}\")\n", + " except Exception as e:\n", + " print(f\"Error creating S3 document grounding pipeline: {e}\")\n", + "create_s3_grounding_pipeline()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note: Check that the next step successfully runs to ensure you have set up properly**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def test_get_retrieval_repository(headers):\n", + " # To ensure chunking happens.\n", + " api_url = f\"{AICORE_BASE_URL}/v2/lm/document-grounding/retrieval/dataRepositories\"\n", + "\n", + " try:\n", + " response = requests.get(api_url, headers=headers)\n", + " print(\"Check to see if the s3 is added in the body:\", response.json())\n", + " if response.status_code == 200:\n", + " print(\"S3 document retrieval successfull\")\n", + " else:\n", + " raise Exception(f\"Failed to create pipeline. 
Status: {response.status_code}, Response: {response.text}\")\n", + " except Exception as e:\n", + " raise Exception(f\"Error retrieving data repositories: {e}\")\n", + "test_get_retrieval_repository(_get_headers())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Upload these files to the object store so they can be registered as an artifact in AI Core\n", + "\n", + "import boto3\n", + "import os\n", + "import uuid\n", + "\n", + "def upload_folder_to_s3(folder_path, bucket_name, s3_prefix=\"\"):\n", + " \"\"\"\n", + " Upload a folder to an S3 bucket recursively.\n", + "\n", + " :param folder_path: The local folder path to upload.\n", + " :param bucket_name: The name of the S3 bucket.\n", + " :param s3_prefix: Optional prefix to use for the S3 keys (e.g., subfolder in the bucket).\n", + " \"\"\"\n", + " s3_client = boto3.client(\n", + " 's3',\n", + " aws_access_key_id=AWS_ACCESS_KEY,\n", + " aws_secret_access_key=AWS_SECRET_ACCESS_KEY,\n", + " region_name=AWS_REGION\n", + " )\n", + "\n", + " for root, dirs, files in os.walk(folder_path):\n", + " for file_name in files:\n", + " local_path = os.path.join(root, file_name)\n", + " # Compute the relative path for the S3 key\n", + " relative_path = os.path.relpath(local_path, folder_path)\n", + " s3_key = os.path.join(s3_prefix, relative_path).replace(\"\\\\\", \"/\") # Ensure S3-compatible paths\n", + " print(f\"Uploading {local_path} to s3://{bucket_name}/{s3_key}\")\n", + " \n", + " # Upload the file\n", + " s3_client.upload_file(local_path, bucket_name, s3_key)\n", + "\n", + "# Example usage\n", + "folder_to_upload_testdata = \"../DATASET_RAG\"\n", + "user_directory_prefix = \"\" # replace with your i-number as string here\n", + "prefix_guid = user_directory_prefix if user_directory_prefix else str(uuid.uuid4().hex) # fall back to a random GUID when no prefix is given\n", + "s3_testdata_prefix = 
f\"genaiEvaluation/{prefix_guid}/testdata\" # Leave empty for root of the bucket\n", + "\n", + "\n", + "upload_folder_to_s3(folder_to_upload_testdata, AWS_BUCKET_ID, s3_testdata_prefix)\n", + "input_artifact_path = f\"ai://genai-simplified-notebook/genaiEvaluation/{prefix_guid}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The user stores the input files in the object store and registers the root folder as artifact with AI Core. The File Upload and Artifact endpoints of AI Core API may be used for this purpose. In this example `genaiEvaluation\\{prefix_guid}` is the root folder containing the orchestration configurations and test data which is registered as AI Core artifact." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "import logging\n", + "# Registering the uploaded files from AWS as artifacts to use inside configuration.\n", + "\n", + "def register_artifact():\n", + " headers = _get_headers()\n", + " \n", + " GET_ARTIFACTS_ENDPOINT = '/v2/lm/artifacts'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_ARTIFACTS_ENDPOINT}\"\n", + " \n", + " request_body = {\n", + " \"labels\": [\n", + " {\n", + " \"key\": \"ext.ai.sap.com/prompt-evaluation\",\n", + " \"value\": \"true\"\n", + " }\n", + " ],\n", + " \"name\": \"genai-eval-simplified-test-data\",\n", + " \"kind\": \"other\",\n", + " \"url\": input_artifact_path, # input artifact path\n", + " \"description\": \"demo artifacts for evaluation flow.\",\n", + " \"scenarioId\": \"genai-evaluations\"\n", + " }\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except:\n", + " print(\"Error occurred while attempting to create an execution\")\n", + " raise\n", + " \n", + "\n", + "artifact_id = register_artifact()" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Orchestration Deployment\n", + "An orchestration Deployment URL is required for us to run our evaluation. Once created we need to wait until the deployment is running and provides us a deployment url which will be add to our configuration file in the next step. You can skip this step if you already have a orchestration deployment running." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "import json\n", + "import time\n", + "\n", + "\n", + "\n", + "def create_orchestration_configuration():\n", + " headers = _get_headers()\n", + " request_body = {\n", + " \"name\": \"orchestrationDeployment\",\n", + " \"executableId\": \"orchestration\",\n", + " \"scenarioId\": \"orchestration\",\n", + " \"parameterBindings\": [\n", + " {\n", + " \"key\": \"modelFilterList\",\n", + " \"value\": \"null\"\n", + " },\n", + " {\n", + " \"key\": \"modelFilterListType\",\n", + " \"value\": \"allow\"\n", + " }\n", + " ],\n", + " \"inputArtifactBindings\": []\n", + " }\n", + " \n", + " GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}\"\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(response)\n", + " if(response.status_code != 201):\n", + " raise\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create a Configuration\")\n", + " raise\n", + " \n", + "def execute_orchestration_deployment(configuration_id):\n", + " headers = _get_headers()\n", + " GET_DEPLOYMENTS_ENDPOINT = '/v2/lm/deployments'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_DEPLOYMENTS_ENDPOINT}\"\n", + " \n", + " request_body = {\n", + " \"configurationId\": configuration_id\n", + " 
}\n", + " \n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(response)\n", + " if(response.status_code != 202):\n", + " print(\"Deployment execution failed\")\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " \n", + " except:\n", + " logging.error(\"Error occurred while attempting to create an execution\")\n", + " raise\n", + "\n", + "def get_deployment_status(orchestration_deployment_id):\n", + " headers = _get_headers()\n", + " api_url = f\"{AICORE_BASE_URL}/v2/lm/deployments/{orchestration_deployment_id}?$select=status\"\n", + " timeout = 400 \n", + " initial_interval = 30 \n", + " pending_interval = 10\n", + " start = time.time()\n", + "\n", + " status = None\n", + " current_interval = initial_interval\n", + "\n", + " while time.time() - start < timeout:\n", + " response = requests.get(api_url, headers=headers)\n", + " if response.status_code == 200:\n", + " status = response.json().get('status')\n", + " print(f\"Deployment {orchestration_deployment_id} status: {status}\")\n", + " # Adjust polling interval based on status\n", + " if status == 'RUNNING':\n", + " return True\n", + " elif status == 'UNKNOWN':\n", + " current_interval = initial_interval\n", + " elif status == 'PENDING':\n", + " current_interval = pending_interval\n", + "\n", + " else:\n", + " print(f\"Failed to fetch deployment status. 
HTTP {response.status_code}\")\n", + " return False\n", + "\n", + " # Waiting according to status for API call\n", + " time.sleep(current_interval)\n", + "\n", + "def get_deployment_url(orchestration_deployment_id):\n", + " headers = _get_headers()\n", + " response = requests.get(f\"{AICORE_BASE_URL}/v2/lm/deployments/{orchestration_deployment_id}\", headers=headers)\n", + " if response.status_code != 200:\n", + " raise Exception(f\"Failed to get deployment URL: {response.status_code} - {response.text}\")\n", + " return response.json().get('deploymentUrl')\n", + "\n", + "# You can skip this step if you already have an orchestration deployment running\n", + "deployment_url = DEPLOYMENT_URL\n", + "if not deployment_url:\n", + " configuration_id = create_orchestration_configuration()\n", + " orchestration_deployment_id = execute_orchestration_deployment(configuration_id)\n", + " is_running = get_deployment_status(orchestration_deployment_id) \n", + " if is_running:\n", + " deployment_url = get_deployment_url(orchestration_deployment_id)\n", + " print(f\"Deployment URL: {deployment_url}\")\n", + " else:\n", + " print(\"Deployment is not running or failed.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Manually set the orchestration deployment URL\n", + "# deployment_url=\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Select your Models\n", + " \n", + "Add the LLMs you wish to use in the string `selected_models_str`.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 368, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Selected models string: gpt-4o:2024-05-13\n" + ] + } + ], + "source": [ + "# Manual selection of models\n", + "selected_models_str = \"gpt-4o:2024-05-13\"\n", + "print(\"Selected models string:\", selected_models_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + 
"source": [ + "## Select system defined metrics\n", + " \n", + "Add the system defined metrics you wish to use in the string `selected_metrics_str`.\n", + "\n", + "**Note: If your dataset does not have a reference column, DO NOT Select metrics where reference is required.**" + ] + }, + { + "cell_type": "code", + "execution_count": 369, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Pointwise RAG Context Precision,Pointwise RAG Completeness\n" + ] + } + ], + "source": [ + "# Manual Selection of Metrics\n", + "selected_metrics_str = \"Pointwise RAG Context Precision,Pointwise RAG Completeness\"\n", + "print(selected_metrics_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Custom Metric Creation and Selection\n", + "This script checks for an evaluation metric in SAP AI Core.\n", + "\n", + "1. You can provide Metric ID's directly by setting the variable as comma separated string:\n", + " user_metric_ids = `\"\"`\n", + " - ✅ If the ID exists, it will be returned.\n", + " \n", + "2. You can create a new custom metric by adding the json in `custom_metric_list` string\n", + " - The script will use the contents of the `custom_metric_list`\n", + " to search for an existing metric by scenario + name + version.\n", + "\n", + "3. If no existing metric is found:\n", + " - A new metric will be created using the details in `custom_metric_list`.\n", + " - Required fields in custom_metric: scenario, name, version, evaluationMethod.\n", + "\n", + "4. 
At the end:\n", + " - The script prints the final Metric ID that was found or created.\n", + "\n", + "Note: Skip the following two cells if you do not want to create or select a custom metric for your workload." + ] + }, + { + "cell_type": "code", + "execution_count": 370, + "metadata": {}, + "outputs": [], + "source": [ + "user_metric_ids = \"d1868b00-1601-407a-92cd-0b9065682d1f,dbf56851-8444-45d3-a0c1-adbe210c7e771\"\n", + "\n", + "custom_metric_list = [\n", + " {\n", + " \"name\": \"test-metric\",\n", + " \"scenario\": \"genai-evaluations-test\",\n", + " \"version\": \"0.0.1\",\n", + " \"evaluationMethod\": \"llm-as-a-judge\",\n", + " \"managedBy\": \"imperative\",\n", + " \"systemPredefined\": False,\n", + " \"metricType\": \"evaluation\",\n", + " \"spec\": {\n", + " \"outputType\": \"numerical\",\n", + " \"promptType\": \"structured\",\n", + " \"configuration\": {\n", + " \"modelConfiguration\": {\n", + " \"name\": \"gpt-4o\",\n", + " \"version\": \"2024-05-13\",\n", + " \"parameters\": [\n", + " {\n", + " \"key\": \"max_tokens\",\n", + " \"value\": \"10000\"\n", + " }\n", + " ]\n", + " },\n", + " \"promptConfiguration\": {\n", + " \"definition\": \"You will be assessing Groundedness (also known as Faithfulness), which measures whether the response relies solely on the provided context and avoids introducing external information or making claims not supported by it.\",\n", + " \"evaluationTask\": \"You are an expert evaluator. Your task is to evaluate the groundedness of responses generated by AI models based on provided context.\\nWe will provide you with the provided context (information the AI was supposed to use) and the AI-generated response. 
The original user query is also provided for background.\\nYou should first read the provided context carefully, then evaluate if the response is fully supported by this context, based on the criteria provided in the Evaluation section below.\\nYou will assign the response a rating following the Rating Rubric and Evaluation Steps.\\nGive step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\",\n", + " \"criteria\": \"Groundedness: Is all the information presented in the response verifiable against the provided context? Does the response avoid making claims or stating facts not present in the context?\",\n", + " \"ratingRubric\": [\n", + " {\n", + " \"rating\": 3,\n", + " \"rule\": \"Response is completely factual with no unsupported claims\"\n", + " },\n", + " {\n", + " \"rating\": 2,\n", + " \"rule\": \"Response has minor inaccuracies but no major contradictions\"\n", + " },\n", + " {\n", + " \"rating\": 1,\n", + " \"rule\": \"Response contains significant factual errors or hallucinations\"\n", + " }\n", + " ]\n", + " }\n", + " }\n", + " },\n", + " \"includeProperties\": [\n", + " \"grounding_response\"\n", + " ],\n", + " \"additionalProperties\": {\n", + " \"variables\": [],\n", + " \"supported_values\": [\n", + " 1,\n", + " 3\n", + " ],\n", + " \"experimental\": False\n", + " }\n", + "}\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "import json\n", + "import requests\n", + "\n", + "\n", + "# --- Fetch all metrics from SAP AI Core ---\n", + "def fetch_all_metrics():\n", + " request_url = f\"{AICORE_BASE_URL}/v2/lm/evaluationMetrics\"\n", + " resp = requests.get(request_url, headers=_get_headers())\n", + " resp.raise_for_status()\n", + " return resp.json().get(\"resources\", [])\n", + "\n", + "# --- Create or fetch a metric ---\n", + "def create_or_get_metric(custom_metric, user_metric_id=None):\n", + " all_metrics = 
fetch_all_metrics()\n", + "\n", + " # 1️⃣ User-supplied ID lookup\n", + " if user_metric_id:\n", + " for m in all_metrics:\n", + " if m.get(\"id\") == user_metric_id:\n", + " print(f\"✅ Metric already exists by ID: {user_metric_id}\")\n", + " return user_metric_id\n", + " print(f\"⚠️ User metric ID {user_metric_id} not found, will only include if valid later\")\n", + "\n", + " # 2️⃣ Check by scenario, name, version\n", + " scenario = custom_metric.get(\"scenario\")\n", + " name = custom_metric.get(\"name\")\n", + " version = custom_metric.get(\"version\")\n", + " if not all([scenario, name, version]):\n", + " raise ValueError(\"Metric must include 'scenario', 'name', and 'version'\")\n", + "\n", + " for m in all_metrics:\n", + " if (m.get(\"scenario\") == scenario and\n", + " m.get(\"name\") == name and\n", + " m.get(\"version\") == version):\n", + " metric_id = m.get(\"id\")\n", + " print(f\"✅ Metric already exists: {scenario}/{name} v{version}, ID = {metric_id}\")\n", + " return metric_id\n", + "\n", + " # 3️⃣ Create metric if not found\n", + " request_url = f\"{AICORE_BASE_URL}/v2/lm/evaluationMetrics\"\n", + " required_fields = [\"scenario\", \"name\", \"version\", \"evaluationMethod\", \"metricType\"]\n", + " for f in required_fields:\n", + " if f not in custom_metric:\n", + " raise ValueError(f\"❌ Missing required field: {f}\")\n", + "\n", + " resp = requests.post(request_url, headers=_get_headers(), json=custom_metric)\n", + " resp.raise_for_status()\n", + " metric_id = resp.json().get(\"id\")\n", + " print(f\"✅ Metric created successfully: {name} v{version}, ID = {metric_id}\")\n", + " return metric_id\n", + "\n", + "# --- Main pipeline ---\n", + "\n", + "# 1️⃣ Create/fetch metrics from SAP AI Core\n", + "metric_ids = []\n", + "for metric in custom_metric_list:\n", + " try:\n", + " print(f\"metric:{metric}\")\n", + " metric_id = create_or_get_metric(metric)\n", + " metric_ids.append(metric_id)\n", + " except ValueError as e:\n", + " print(f\"Skipping metric 
due to error: {e}\")\n", + "\n", + "# 2️⃣ Validate user_metric_ids separately if provided\n", + "if user_metric_ids and user_metric_ids.strip():\n", + " all_metrics = fetch_all_metrics()\n", + " # Split comma-separated IDs and strip whitespace\n", + " for uid in [uid.strip() for uid in user_metric_ids.split(\",\")]:\n", + " if any(m.get(\"id\") == uid for m in all_metrics):\n", + " metric_ids.append(uid)\n", + " else:\n", + " print(f\"⚠️ User metric ID {uid} does not exist in AI Core, skipping.\")\n", + "# 3️⃣ Convert to comma-separated string\n", + "custom_metric_ids_str = \",\".join(metric_ids)\n", + "print(\"✅ All processed metric IDs:\", custom_metric_ids_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create Orchestration Registry Configuration\n", + "\n", + "The following code defines a function `create_orchestration_registry_config()` that creates a new **Orchestration Configuration** in **Orchestration Registry**.\n", + "\n", + "**Note** : If you wish to use an existing orchestration config, skip executing this cell and add the orchestration config id in `orchestration_registry_id` string in the next cell." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'message': 'Orchestration config updated successfully.', 'id': '22ba2b67-ca81-41ab-989e-cd63a54a6499', 'scenario': 'genai-evaluations', 'name': 'genai-eval-test-1', 'version': '0.0.1'}\n" + ] + } + ], + "source": [ + "def create_orchestration_registry_config():\n", + " headers = _get_headers()\n", + " \n", + " CREATE_ORCHESTRATION_REGISTRY = '/v2/registry/v2/orchestrationConfigs'\n", + " request_url = f\"{AICORE_BASE_URL}{CREATE_ORCHESTRATION_REGISTRY}\"\n", + " model_name,model_version=selected_models_str.split(\":\")\n", + " request_body = {\n", + " \"name\": \"genai-eval-test-1\",\n", + " \"version\": \"0.0.1\",\n", + " \"scenario\": \"genai-evaluations\",\n", + " \"spec\": {\n", + " \"modules\": {\n", + " \"prompt_templating\": {\n", + " \"prompt\": {\n", + " \"template\": [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": \"You are a helpful assistant specialized in e-manual topics. Answer the following e-manual questions using the provided context. If the answer is not explicitly available in the context, respond with: `The answer is not available in the provided context.` \\\\n\\\\nRequest: {{?topic}}. 
\\\\n\\\\nContext: {{?groundingOutput}}\"\n", + " }\n", + " ],\n", + " \"defaults\": {}\n", + " },\n", + " \"model\": {\"name\": f\"{model_name}\", \"version\": f\"{model_version}\",\n", + " },\n", + " },\n", + " \"grounding\": {\n", + " \"type\": \"document_grounding_service\",\n", + " \"config\": {\n", + " \"filters\": [\n", + " {\n", + " \"id\": \"helpRepo\",\n", + " \"data_repositories\": [\n", + " \"*\"\n", + " ],\n", + " \"search_config\": {\n", + " \"max_chunk_count\": 10\n", + " },\n", + " \"data_repository_type\": \"help.sap.com\"\n", + " }\n", + " ],\n", + " \"placeholders\": {\n", + " \"input\": [\n", + " \"topic\"\n", + " ],\n", + " \"output\": \"groundingOutput\"\n", + " }\n", + " }\n", + " }\n", + " }\n", + " }\n", + " }\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " if(response.status_code != 200):\n", + " raise Exception(f\"Failed to create orchestration config: {response.json()}\")\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except Exception:\n", + " logging.error(\"Error occurred while attempting to create an orchestration registry config\")\n", + " raise\n", + "orchestration_registry_id = create_orchestration_registry_config()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Manually set the orchestration config id\n", + "# orchestration_registry_id=\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluation Configuration Creation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "import json\n", + "test_data_path = f\"testdata/testdata/{DATASET_NAME}\" # specify the test data path here. 
For the full folder just specifying testdata will work\n", + "test_datasets = json.dumps({'path': test_data_path, 'type': 'csv'})\n", + "print(test_datasets)\n", + "metrics_list = \",\".join([selected_metrics_str,custom_metric_ids_str])\n", + "models_list = selected_models_str\n", + "print(f\"Selected metrics: {metrics_list}\")\n", + "print(f\"Selected models: {models_list}\")\n", + "#variable_mapping = json.dumps({'prompt/question': 'data/topic'}) # to map the question prompt variable to the entry in dataset.\n", + "# orchestration_deployment_url = deployment_url # needs to specify this to use a specific deployment id\n", + "orchestration_deployment_url = deployment_url\n", + "repetitions = \"1\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# creating an AICORE Configuration.\n", + "import requests\n", + "\n", + "request_body = {\n", + " \"name\": \"genai-eval-conf\",\n", + " \"scenarioId\": \"genai-evaluations\",\n", + " \"executableId\": \"genai-evaluations-simplified\",\n", + " \"inputArtifactBindings\": [\n", + " {\n", + " \"key\": \"datasetFolder\",\n", + " \"artifactId\": artifact_id\n", + " }\n", + " ],\n", + " \"parameterBindings\": [\n", + " {\n", + " \"key\": \"repetitions\",\n", + " \"value\": repetitions\n", + " },\n", + " {\n", + " \"key\": \"orchestrationDeploymentURL\",\n", + " \"value\": orchestration_deployment_url\n", + " },\n", + " {\n", + " \"key\": \"metrics\",\n", + " \"value\": metrics_list\n", + " },\n", + " {\n", + " \"key\": \"testDataset\",\n", + " \"value\": test_datasets\n", + " },\n", + " {\n", + " \"key\": \"orchestrationRegistryIds\",\n", + " \"value\": orchestration_registry_id\n", + " },\n", + " {\n", + " \"key\": \"testRowCount\",\n", + " \"value\": \"2\"\n", + " }\n", + " ]\n", + "}\n", + "\n", + "def create_aicore_configuration():\n", + " headers = _get_headers()\n", + " GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'\n", + " request_url = 
f\"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}\"\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(response)\n", + " if(response.status_code != 201):\n", + " raise\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create a Configuration\")\n", + " raise\n", + " \n", + "configuration_id = create_aicore_configuration()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluation Execution Creation\n", + "Once Configration is create, we create the AI Core execution which triggers the evaluation workload.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create an execution with the created configuration.\n", + "\n", + "import requests\n", + "def create_execution():\n", + " headers = _get_headers()\n", + " GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}\"\n", + " request_body = {\"configurationId\" : configuration_id} \n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(\"response received is \", response)\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create an execution\")\n", + " raise\n", + " \n", + "\n", + "execution_id = create_execution()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get execution status\n", + "import requests\n", + "def get_execution_status(execution_id):\n", + " headers = _get_headers()\n", + " LOG_EXECUTIONS_ENDPOINT = f'/v2/lm/executions/{execution_id}'\n", + " request_url = 
f\"{AICORE_BASE_URL}{LOG_EXECUTIONS_ENDPOINT}\"\n", + " try:\n", + " response = requests.get(\n", + " request_url, headers=headers, timeout=120\n", + " )\n", + " print(\"response received is \", response)\n", + " result = response.json()\n", + " return result\n", + " except:\n", + " logging.error(\"Error occurred while attempting to get execution status\")\n", + " raise\n", + " \n", + "\n", + "get_execution_status(execution_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "\n", + "1. Run the following cells only when the status field in the Execution response is \"COMPLETED\" to view the results.\n", + "2. The status field progresses through different states over time: UNKNOWN → PENDING → RUNNING → COMPLETED. Ensure it reaches COMPLETED before proceeding.\n", + "\n", + "\n", + "Note: The targetStatus will always be COMPLETED from the start, as it represents the intended final state of the Execution. Do not confuse it with the actual status field.\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluation Result\n", + "The evaluation job produces two outputs\n", + "1. A SQLite DB file which stores the orchestration input, orchestration output, values for all the metrics calculated for this orchestration output and statistics such as latency for this orchestration output. These metric values are called raw metric values. This SQLite DB file is stored in the object store as an AI Core output artifact.\n", + "2. A set of metrics whose values are aggregated from the raw metric values. The aggregate metrics are stored in the tracking service. 
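The status progression described above (UNKNOWN → PENDING → RUNNING → COMPLETED) can be watched with a small polling helper instead of re-running the status cell by hand. This is a sketch, not part of the notebook: it assumes a status function like the `get_execution_status(execution_id)` defined earlier that returns a dict with a `status` key, and it treats `DEAD` and `STOPPED` as terminal failure states (an assumption based on the status values the AI Core API exposes).

```python
import time

def wait_for_completion(get_status, execution_id, poll_seconds=30, timeout_seconds=3600):
    """Poll an execution until its status reaches COMPLETED or a terminal failure.

    get_status: callable taking an execution id and returning a dict with a 'status' key,
    e.g. the get_execution_status helper defined in this notebook.
    """
    terminal_failures = {"DEAD", "STOPPED"}  # assumption: failure states reported by AI Core
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        status = get_status(execution_id).get("status", "UNKNOWN")
        if status == "COMPLETED":
            return status
        if status in terminal_failures:
            raise RuntimeError(f"Execution {execution_id} ended in state {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Execution {execution_id} did not complete within {timeout_seconds}s")
```

With this in place, `wait_for_completion(get_execution_status, execution_id)` blocks until the result cells below are safe to run.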
The user-defined tags along with the run names are stored with the metrics.\n", + "Post execution completion user can see the runs generated by the workload along with the aggregate metrics by calling the tracking api as show below" + ] + }, + { + "cell_type": "code", + "execution_count": 299, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "response received is \n" + ] + } + ], + "source": [ + "# Get aggregate metrics using execution id\n", + "import requests\n", + "def retrieve_aggregate_metrics(execution_id):\n", + " headers = _get_headers()\n", + " GET_METRICS_ENDPOINT = f'/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={execution_id}'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_METRICS_ENDPOINT}\"\n", + " try:\n", + " response = requests.get(request_url, headers=headers, timeout=120)\n", + " print(\"response received is \", response)\n", + " result = response.json()\n", + " return result\n", + " except:\n", + " logging.error(\"Error occurred while attempting to retreive aggeregate metrics for the run\")\n", + " raise\n", + "\n", + "runs_data = retrieve_aggregate_metrics(execution_id)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To further drill down , User can also download the SQLite DB file from object storage and analyse the results(instance level metrics, logs etc) locally." 
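The local analysis mentioned above can start with nothing more than the standard-library `sqlite3` module. A minimal sketch, assuming the downloaded file is the `results.db` used later in this notebook and contains tables such as `evaluation_result`:

```python
import sqlite3
from contextlib import closing

def preview_table(db_path, table, limit=5):
    """Return the column names and the first `limit` rows of one table in the results DB."""
    with closing(sqlite3.connect(db_path)) as conn:
        cur = conn.execute(f"SELECT * FROM {table} LIMIT ?", (limit,))
        columns = [desc[0] for desc in cur.description]
        return columns, cur.fetchall()
```

For example, `preview_table("results-new/results.db", "evaluation_result")` returns the header and the first five instance-level metric rows; the pandas-based cell further below renders the same tables as HTML.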
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# download the result artifacts from Object store.\n", + "import boto3\n", + "\n", + "def download_all_objects(prefix, destination_folder):\n", + " \"\"\"\n", + " Recursively download all objects from an S3 bucket starting with a specific prefix.\n", + "\n", + " :param bucket_name: Name of the S3 bucket.\n", + " :param prefix: Prefix to filter objects in the bucket.\n", + " :param destination_folder: Local folder to save the downloaded files.\n", + " \"\"\"\n", + " s3_client = boto3.client(\n", + " 's3',\n", + " aws_access_key_id=AWS_ACCESS_KEY,\n", + " aws_secret_access_key=AWS_SECRET_ACCESS_KEY,\n", + " region_name=AWS_REGION\n", + " )\n", + "\n", + " # Ensure the destination folder exists\n", + " if not os.path.exists(destination_folder):\n", + " os.makedirs(destination_folder)\n", + "\n", + " # Paginate through objects\n", + " paginator = s3_client.get_paginator('list_objects_v2')\n", + " pages = paginator.paginate(Bucket=AWS_BUCKET_ID, Prefix=prefix)\n", + "\n", + " for page in pages:\n", + " if 'Contents' in page:\n", + " for obj in page['Contents']:\n", + " key = obj['Key']\n", + " local_file_path = os.path.join(destination_folder, os.path.relpath(key, prefix))\n", + "\n", + " # Ensure the local directory structure exists\n", + " local_directory = os.path.dirname(local_file_path)\n", + " if not os.path.exists(local_directory):\n", + " os.makedirs(local_directory)\n", + "\n", + " # Download the object\n", + " print(f\"Downloading {key} to {local_file_path}\")\n", + " s3_client.download_file(AWS_BUCKET_ID, key, local_file_path)\n", + "\n", + "\n", + "# Download the evaluation results from the object store. 
Look at execution status under \"outputArtifacts\" key to see the 'url'\n", + "# which shows the data path of where your output results are stored\n", + "EXECUTION_ID = execution_id\n", + "sqlite_db_prefix = f'{EXECUTION_ID}/tmp/' # change the prefix based on where your output artifact is stored in the bucket.\n", + "destination_folder = 'results-new'\n", + "\n", + "download_all_objects(sqlite_db_prefix, destination_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "NOTE: The below Cell shows results of top 5 rows of the Evaluation Results across all SQLite tables. IF you wish to see all the entries you can comment the line saying df.head(5) in the below cell or modify the number accordingly." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# viewing the results from sqlite db in tabular format..\n", + "import sqlite3\n", + "import pandas as pd\n", + "from IPython.display import display, HTML\n", + "\n", + "# Path to your SQLite database file\n", + "db_file = 'results-new/results.db'\n", + "\n", + "connection = sqlite3.connect(db_file)\n", + "\n", + "# Specify the table names you want to display\n", + "table_names = ['run','configuration', 'submission', 'submission_result', 'evaluation_result'] \n", + "\n", + "# Create the CSS and HTML container\n", + "html_content = \"\"\"\n", + "\n", + "
\n", + "\"\"\"\n", + "\n", + "for table_name in table_names:\n", + " query = f\"SELECT * FROM {table_name};\"\n", + " df = pd.read_sql_query(query, connection)\n", + " # If you want to see all the rows across all tables, remove/comment the next line\n", + " df = df.head(5) # Limiting the number of rows displayed\n", + " table_html = df.to_html(classes='table-container', index=False)\n", + " html_content += f\"\"\"\n", + "
<div>\n", + "<h3>Table: {table_name}</h3>\n", + " {table_html}\n", + "</div>\n", + " \"\"\"\n", + "\n", + "html_content += \"</div>
\"\n", + "\n", + "display(HTML(html_content))\n", + "\n", + "# Close the connection\n", + "connection.close()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "#Delete Execution Id\n", + "def delete_execution():\n", + " headers = _get_headers()\n", + " EXEC_ID = execution_id\n", + " GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions/'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}{EXEC_ID}\"\n", + " try:\n", + " response = requests.delete(\n", + " request_url, headers=headers, params={\"AI-Resource-Group\":AICORE_RESOURCE_GROUP}, timeout=120\n", + " )\n", + " print(response)\n", + " if(response.status_code != 202):\n", + " raise\n", + " result = response.json()\n", + " print(result)\n", + " except:\n", + " logging.error(\"Error occurred while attempting to delete a Configuration\")\n", + " raise\n", + " \n", + "delete_execution()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/AI_Core.json b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/AI_Core.json new file mode 100644 index 000000000..6ceca23fc --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/AI_Core.json @@ -0,0 +1,1578 @@ +{ + "name": "AI Core", + "version": "1", + "items": [ + { + "type": "http", + "name": "get_token", + "filename": "get_token.bru", + "seq": 1, + "request": { + "url": "{{ai_auth_url}}/oauth/token", + "method": "POST", + "headers": [ + { + "name": "Content-Type", + "value": "application/x-www-form-urlencoded", + "enabled": true + } + ], 
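The `get_token` request in this collection exchanges client credentials for a bearer token via a form-encoded POST to `{{ai_auth_url}}/oauth/token`, and its response script stores the returned `access_token` for the other requests. A stdlib sketch of building the same request (the `auth_url` value below is a placeholder, not a real endpoint):

```python
import urllib.parse
import urllib.request

def build_token_request(auth_url, client_id, client_secret):
    """Build the client-credentials POST that the get_token request sends to /oauth/token."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return urllib.request.Request(
        f"{auth_url}/oauth/token",
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Opening this request with `urllib.request.urlopen` would return a JSON body whose `access_token` field is the bearer token that the collection's response script saves as `access_token`.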
+ "params": [], + "body": { + "mode": "formUrlEncoded", + "formUrlEncoded": [ + { + "name": "grant_type", + "value": "client_credentials", + "enabled": true + }, + { + "name": "client_id", + "value": "{{client_id}}", + "enabled": true + }, + { + "name": "client_secret", + "value": "{{client_secret}}", + "enabled": true + } + ], + "multipartForm": [], + "file": [] + }, + "script": { + "res": "if (res.getStatus() == 200) {\n bru.setEnvVar(\"access_token\", res.body.access_token);\n}" + }, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "", + "auth": { + "mode": "none" + } + } + }, + { + "type": "folder", + "name": "admin", + "filename": "admin", + "root": { + "meta": { + "name": "admin" + } + }, + "items": [ + { + "type": "folder", + "name": "objectStoreSecrets", + "filename": "objectStoreSecrets", + "root": { + "meta": { + "name": "objectStoreSecrets" + } + }, + "items": [ + { + "type": "http", + "name": "Create a secret", + "filename": "Create a secret.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/admin/objectStoreSecrets", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + }, + { + "name": "Authorization", + "value": "", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"name\": \"genai-data\",\n \"data\": {\n \"AWS_ACCESS_KEY_ID\": \"\",\n \"AWS_SECRET_ACCESS_KEY\": \"\"\n },\n \"type\": \"S3\",\n \"bucket\": \"\",\n \"endpoint\": \"https://s3.eu-central-1.amazonaws.com\",\n \"region\": \"\",\n \"pathPrefix\": \"example-dataset/veritasai\" \n }", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a secret based on the configuration in the request body\n", + "auth": { + "mode": "bearer", + 
"bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get a list of metadata of available secrets.", + "filename": "Get a list of metadata of available secrets.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/admin/objectStoreSecrets?$top=&$skip=&$count=", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "$top", + "value": "", + "type": "query", + "enabled": true + }, + { + "name": "$skip", + "value": "", + "type": "query", + "enabled": true + }, + { + "name": "$count", + "value": "", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of metadata of the stored secrets.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + }, + { + "type": "folder", + "name": "{objectStoreName}", + "filename": "{objectStoreName}", + "root": { + "meta": { + "name": "{objectStoreName}" + } + }, + "items": [ + { + "type": "http", + "name": "Delete object store secret", + "filename": "Delete object store secret.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/admin/objectStoreSecrets/:objectStoreName", + "method": "DELETE", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": 
"application/json", + "enabled": true + } + ], + "params": [ + { + "name": "objectStoreName", + "value": "qKoZ-aHSe", + "type": "path", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Delete a secret with the name of objectStoreName if it exists.", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + }, + { + "type": "http", + "name": "Returns the of metadata of secrets which match the query parameter.", + "filename": "Returns the of metadata of secrets which match the query parameter.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/admin/objectStoreSecrets", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "This retrieves the metadata of the stored secret which match the parameter objectStoreName.\nThe fetched secret is constructed like objectStoreName-object-store-secret\nThe base64 encoded field for the stored secret is not returned.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": 
"", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + }, + { + "type": "http", + "name": "Update object store secret", + "filename": "Update object store secret.bru", + "seq": 3, + "request": { + "url": "{{baseUrl}}/admin/objectStoreSecrets/:objectStoreName", + "method": "PATCH", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "objectStoreName", + "value": "qKoZ-aHSe", + "type": "path", + "enabled": true + } + ], + "body": { + "mode": "json", + "json": "{\n \"name\": \"\",\n \"type\": \"\",\n \"data\": {},\n \"bucket\": \"\",\n \"endpoint\": \"\",\n \"region\": \"\",\n \"pathPrefix\": \"\",\n \"verifyssl\": \"\",\n \"usehttps\": \"1\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Update a secret with name of objectStoreName if it exists.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + } + ] + }, + { + "type": "folder", + "name": "lm", + "filename": "lm", + "root": { + "meta": { + "name": "lm" + } + }, + "items": [ + { + "type": "folder", + "name": 
"configurations", + "filename": "configurations", + "root": { + "meta": { + "name": "configurations" + } + }, + "items": [ + { + "type": "http", + "name": "Create configuration Copy", + "filename": "Create configuration Copy.bru", + "seq": 3, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "DELETE", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"id\": \"\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a new configuration linked to a specific scenario and executable for use in an execution\nor deployment.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Create configuration", + "filename": "Create configuration.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"name\": \"genai-eval-conf\",\n \"scenarioId\": \"genai-evaluations\",\n \"executableId\": \"genai-evaluations\",\n \"inputArtifactBindings\": [\n {\n \"key\": \"rootFolder\",\n \"artifactId\": \"\"\n }\n ],\n \"parameterBindings\": [\n {\n \"key\": \"repetitions\",\n \"value\": \"2\"\n },\n {\n \"key\": \"orchestrationDeploymentURL\",\n \"value\": \"\"\n },\n {\n \"key\": \"tags\",\n \"value\": \"{}\"\n },\n {\n \"key\": \"variableMapping\",\n 
\"value\": \"{\\\"prompt/question\\\": \\\"data/topic\\\"}\"\n },\n {\n \"key\": \"metrics\",\n \"value\": \"bert_score,bleu,rouge,content_filter_on_input,content_filter_on_output,exact_match,pointwise_instruction_following,pointwise_correctness,genai-evaluations/groundedness_formatted/0.0.1,genai-evaluations/correctness_structured/0.0.1\"\n },\n {\n \"key\": \"testDataset\",\n \"value\": \"{\\\"path\\\":\\\"testdata/medicalqna_dataset.csv\\\", \\\"type\\\": \\\"csv\\\"}\"\n },\n {\n \"key\": \"runs\",\n \"value\": \"runs/run1.json\"\n },\n {\n \"key\": \"customMetricConfig\",\n \"value\": \"custom-llm-metric.jsonl\"\n },\n {\n \"key\": \"testRowCount\",\n \"value\": \"2\"\n },\n {\n \"key\": \"debugMode\",\n \"value\": \"ON\"\n }\n ]\n}\n", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a new configuration linked to a specific scenario and executable for use in an execution\nor deployment.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get list of configurations", + "filename": "Get list of configurations.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of configurations. 
Filter results by scenario ID or a list of executable IDs.\nSearch for configurations containing the search string as substring in the configuration name.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "{configurationId}", + "filename": "{configurationId}", + "root": { + "meta": { + "name": "{configurationId}" + } + }, + "items": [ + { + "type": "http", + "name": "Get configuration by ID", + "filename": "Get configuration by ID.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve details for configuration with configurationId.", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + } + ] + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + }, + "items": [ + { + "type": "http", + "name": "Get number of configurations", + "filename": "Get number of configurations.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/lm/configurations/$count?scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&$search=}\"NI2Kn!V&searchCaseInsensitive=false&executableIds=T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "text/plain", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + 
"name": "$search", + "value": "}\"NI2Kn!V", + "type": "query", + "enabled": true + }, + { + "name": "searchCaseInsensitive", + "value": "false", + "type": "query", + "enabled": true + }, + { + "name": "executableIds", + "value": "T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve the number of available configurations that match the specified filter criteria.\nFilter criteria include a scenarioId or executableIdsList. Search by substring of configuration name is also possible.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + }, + { + "type": "folder", + "name": "artifacts", + "filename": "artifacts", + "root": { + "meta": { + "name": "artifacts" + } + }, + "items": [ + { + "type": "http", + "name": "Get list of artifacts", + "filename": "Get list of artifacts.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/artifacts", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": false + }, + { + "name": "executionId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": 
"query", + "enabled": false + }, + { + "name": "name", + "value": "[G7 ovyt8i", + "type": "query", + "enabled": false + }, + { + "name": "kind", + "value": "other", + "type": "query", + "enabled": false + }, + { + "name": "artifactLabelSelector", + "value": "ext.ai.sap.com/bXN1EAk=D*", + "type": "query", + "enabled": false + }, + { + "name": "$top", + "value": "10000", + "type": "query", + "enabled": false + }, + { + "name": "$skip", + "value": "", + "type": "query", + "enabled": false + }, + { + "name": "$search", + "value": "}\"NI2Kn!V", + "type": "query", + "enabled": false + }, + { + "name": "searchCaseInsensitive", + "value": "false", + "type": "query", + "enabled": false + }, + { + "name": "$expand", + "value": "scenario", + "type": "query", + "enabled": false + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of artifacts that matches the specified filter criteria.\nFilter criteria include scenario ID, execution ID, an artifact name, artifact kind, or artifact labels.\nUse top/skip parameters to paginate the result list.\nSearch by substring of artifact name or description, if required.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Register artifact", + "filename": "Register artifact.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/artifacts", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"name\": \"aiconfig\",\n \"kind\": \"dataset\",\n \"url\": \"ai://genai-data/genaiEvaluation/14af1af80b974edb8731632d17286343\",\n 
\"scenarioId\": \"genai-evaluations\"\n}\n", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Register an artifact for use in a configuration, for example a model or a dataset.", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + }, + "items": [ + { + "type": "http", + "name": "Get number of artifacts", + "filename": "Get number of artifacts.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/lm/artifacts/$count?scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&executionId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&name=[G7 ovyt8i&kind=other&$search=}\"NI2Kn!V&searchCaseInsensitive=false&artifactLabelSelector=ext.ai.sap.com/bXN1EAk=D*", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "text/plain", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "executionId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "name", + "value": "[G7 ovyt8i", + "type": "query", + "enabled": true + }, + { + "name": "kind", + "value": "other", + "type": "query", + "enabled": true + }, + { + "name": "$search", + "value": "}\"NI2Kn!V", + "type": "query", + "enabled": true + }, + { + "name": "searchCaseInsensitive", + "value": "false", + "type": "query", + "enabled": true + }, + { + "name": "artifactLabelSelector", + "value": "ext.ai.sap.com/bXN1EAk=D*", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + 
"script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve the number of available artifacts that match the specified filter criteria.\nFilter criteria include a scenarioId, executionId, an artifact name, artifact kind, or artifact labels.\nSearch by substring of artifact name or description is also possible.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + }, + { + "type": "folder", + "name": "executions", + "filename": "executions", + "root": { + "meta": { + "name": "executions" + } + }, + "items": [ + { + "type": "http", + "name": "Create execution", + "filename": "Create execution.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/executions", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"configurationId\": \"\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create an execution using the configuration specified by configurationId.", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get list of executions", + "filename": "Get list of executions.bru", + "seq": 1, + "request": { + "url": 
"{{baseUrl}}/v2/lm/executions/", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": false + }, + { + "name": "executionScheduleId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": false + }, + { + "name": "status", + "value": "DEAD", + "type": "query", + "enabled": false + }, + { + "name": "$top", + "value": "10000", + "type": "query", + "enabled": false + }, + { + "name": "$skip", + "value": "", + "type": "query", + "enabled": false + }, + { + "name": "$select", + "value": "status", + "type": "query", + "enabled": false + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of executions that match the specified filter criteria.\nFilter criteria include a list of executableIds, a scenarioId, a configurationId, or an execution status.\nWith top/skip parameters it is possible to paginate the result list.\nWith select parameter it is possible to select only status.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + } + } + ] + }, + { + "type": "folder", + "name": "deployments", + "filename": "deployments", + "root": { + "meta": { + "name": "deployments" + } + }, + "items": [ + { + "type": "http", + "name": "Create deployment", + "filename": "Create deployment.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/deployments", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": 
"{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"configurationId\": \"\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a deployment using the configuration specified by configurationId after synchronously checking the\ncorrectness of the configuration.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get list of deployments", + "filename": "Get list of deployments.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/deployments", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of deployments that match the specified filter criteria.\nFilter criteria include a list of executableIds, a scenarioId, a configurationId, or a deployment status.\nWith top/skip parameters it is possible to paginate the result list.\nWith select parameter it is possible to select only status.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + }, + "items": [ + { + "type": "http", + "name": "Get number of deployments", + "filename": "Get number of deployments.bru", + "seq": 1, + "request": { + "url": 
"{{baseUrl}}/lm/deployments/$count?executableIds=T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE&configurationId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&status=DEAD", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "text/plain", + "enabled": true + } + ], + "params": [ + { + "name": "executableIds", + "value": "T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", + "type": "query", + "enabled": true + }, + { + "name": "configurationId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "status", + "value": "DEAD", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve the number of available deployments. 
The number can be filtered by\nscenarioId, configurationId, executableIdsList or by deployment status.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + }, + { + "type": "folder", + "name": "metrics", + "filename": "metrics", + "root": { + "meta": { + "name": "metrics" + } + }, + "items": [ + { + "type": "http", + "name": "Evaluation Metrics via Execution ID", + "filename": "Evaluation Metrics via Execution ID.bru", + "seq": 4, + "request": { + "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of=", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "tagFilters", + "value": "evaluation.ai.sap.com/child-of=", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Metrics by Run Name", + "filename": "Metrics by Run Name.bru", + "seq": 5, + "request": { + "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/run-name=run1", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + 
"enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "tagFilters", + "value": "evaluation.ai.sap.com/run-name=run1", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + } + ] + } + ] + } + ], + "activeEnvironmentUid": "lWUmIcEkGnkMxwNBILLmY", + "environments": [ + { + "variables": [ + { + "name": "ai_auth_url", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "ai_api_url", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "client_id", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "client_secret", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "resource_group", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "orchestration_service_url", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "access_token", + "value": "", + "enabled": true, + "secret": true, + "type": "text" + } + ], + "name": "intprod" + } + ], + "root": { + "request": { + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "state": "", + "pkce": false, + "credentialsPlacement": "basic_auth_header", + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + }, + "vars": { + "req": [ + { + "name": "region", + "value": 
"prod.eu-central-1.aws", + "enabled": true, + "local": false, + "uid": "oYVk4DuVpyYqqP2roBVjE" + }, + { + "name": "baseUrl", + "value": "", + "enabled": true, + "local": false, + "uid": "I4KjDm7FxpSRwUYzjwfPG" + }, + { + "name": "auth_url", + "value": "", + "enabled": true, + "local": false, + "uid": "zuftvyCURtA9XYErCYDgo" + }, + { + "name": "client_id", + "value": "", + "enabled": true, + "local": false, + "uid": "JfGEVKm71BYTgR8UkQUGv" + }, + { + "name": "client_secret", + "value": "", + "enabled": true, + "local": false, + "uid": "ls3RYTJ40baTl8eYmilGt" + }, + { + "name": "AWS_ACCESS_KEY_ID", + "value": "", + "enabled": true, + "local": false, + "uid": "2O0YTTAdmYltm5XiHMhP2" + }, + { + "name": "AWS_SECRET_ACCESS_KEY", + "value": "", + "enabled": true, + "local": false, + "uid": "8rc4RYyPcHXyTkAnnI981" + }, + { + "name": "BUCKET_NAME", + "value": "", + "enabled": true, + "local": false, + "uid": "HqFIe8Rvc14i41WIAGGkl" + }, + { + "name": "DATABASE_URL", + "value": "https://s3-eu-central-1.amazonaws.com", + "enabled": true, + "local": false, + "uid": "aWIwuJZH5XQ5Guu2D69Sq" + } + ] + } + }, + "docs": "Provides tools to manage your scenarios and workflows in SAP AI Core. Execute pipelines as a batch job, for example to pre-process or train your models, or perform batch inference. Serve inference requests of trained models. Deploy a trained machine learning model as a web service to serve inference requests with high performance. 
Register your own Docker registry, synchronize your AI content from your own git repository, and register your own object store for training data and trained models.\n", + "meta": { + "name": "AI Core" + } + }, + "brunoConfig": { + "version": "1", + "name": "AI Core", + "type": "collection", + "ignore": [ + "node_modules", + ".git" + ], + "size": 0.10747432708740234, + "filesCount": 151 + } +} \ No newline at end of file diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/custom-eval.jpg b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/custom-eval.jpg new file mode 100644 index 000000000..034cdba2d Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/custom-eval.jpg differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br01.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br01.png new file mode 100644 index 000000000..5424ea51d Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br01.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br02.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br02.png new file mode 100644 index 000000000..4ed9d9ab0 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br02.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br03.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br03.png new file mode 100644 index 000000000..acba788ce Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br03.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br04.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br04.png new file mode 100644 index 000000000..9f8a175e4 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br04.png 
differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br05.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br05.png new file mode 100644 index 000000000..8ed409f63 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br05.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br06.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br06.png new file mode 100644 index 000000000..cdcd63eef Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br06.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br07.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br07.png new file mode 100644 index 000000000..e4607d817 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image-br07.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image078.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image078.png new file mode 100644 index 000000000..91824dea4 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image078.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image080.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image080.png new file mode 100644 index 000000000..fae2959a0 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image080.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_1.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_1.png new file mode 100644 index 000000000..6db3eb05c Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_1.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_10.png 
b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_10.png new file mode 100644 index 000000000..3d7e561b2 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_10.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_11.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_11.png new file mode 100644 index 000000000..d229b3e6e Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_11.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_12.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_12.png new file mode 100644 index 000000000..49fda0642 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_12.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_13.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_13.png new file mode 100644 index 000000000..a26476d70 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_13.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_14.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_14.png new file mode 100644 index 000000000..c4d522fb2 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_14.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_15.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_15.png new file mode 100644 index 000000000..c8fcfd6b2 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_15.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_16.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_16.png new file mode 100644 index 000000000..106fc608e Binary files /dev/null 
and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_16.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_17.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_17.png new file mode 100644 index 000000000..762560132 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_17.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_18.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_18.png new file mode 100644 index 000000000..45641b91a Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_18.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_19.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_19.png new file mode 100644 index 000000000..91498a203 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_19.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_2.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_2.png new file mode 100644 index 000000000..1232299cc Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_2.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_20.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_20.png new file mode 100644 index 000000000..3b58f47ce Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_20.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_21.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_21.png new file mode 100644 index 000000000..dd9f9f22b Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_21.png differ diff --git 
a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_22.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_22.png new file mode 100644 index 000000000..abcae67d6 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_22.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_23.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_23.png new file mode 100644 index 000000000..97b0bc60f Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_23.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_24.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_24.png new file mode 100644 index 000000000..a1cb40822 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_24.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_25.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_25.png new file mode 100644 index 000000000..afdb0e197 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_25.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_26.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_26.png new file mode 100644 index 000000000..1b0bdc013 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_26.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_26_01.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_26_01.png new file mode 100644 index 000000000..0e76115ab Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_26_01.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_27.png 
b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_27.png new file mode 100644 index 000000000..10711ea62 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_27.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_28.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_28.png new file mode 100644 index 000000000..89ba9a753 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_28.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_29.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_29.png new file mode 100644 index 000000000..bc30129b1 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_29.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_3.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_3.png new file mode 100644 index 000000000..4020c1f58 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_3.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_30.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_30.png new file mode 100644 index 000000000..7cfc06342 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_30.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_31.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_31.png new file mode 100644 index 000000000..7a1a959fb Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_31.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_32.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_32.png new file mode 100644 index 000000000..fe827f346 Binary files /dev/null and 
b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_32.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_33.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_33.png new file mode 100644 index 000000000..546d43b52 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_33.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_34.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_34.png new file mode 100644 index 000000000..4fa0960a1 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_34.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_35.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_35.png new file mode 100644 index 000000000..0f08f722b Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_35.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_36.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_36.png new file mode 100644 index 000000000..e16733c52 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_36.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_37.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_37.png new file mode 100644 index 000000000..93052a306 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_37.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_38.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_38.png new file mode 100644 index 000000000..19c9bce7f Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_38.png differ diff --git 
a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_39.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_39.png new file mode 100644 index 000000000..2fa160e3f Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_39.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_4.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_4.png new file mode 100644 index 000000000..db1a55ca0 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_4.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_40.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_40.png new file mode 100644 index 000000000..d7a45cf53 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_40.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_41.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_41.png new file mode 100644 index 000000000..455af09dc Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_41.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_42.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_42.png new file mode 100644 index 000000000..43ca443e0 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_42.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_5.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_5.png new file mode 100644 index 000000000..2e5ddd73f Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_5.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_6.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_6.png new file 
mode 100644 index 000000000..2e8d9e4f9 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_6.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_8.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_8.png new file mode 100644 index 000000000..7b15dacbb Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_8.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_9.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_9.png new file mode 100644 index 000000000..c995a0bb3 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_9.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or1.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or1.png new file mode 100644 index 000000000..9bc2a9787 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or1.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or2.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or2.png new file mode 100644 index 000000000..fedc32621 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or2.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or3.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or3.png new file mode 100644 index 000000000..4819de3fc Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or3.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or4.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or4.png new file mode 100644 index 000000000..d523dc873 Binary files /dev/null and 
b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or4.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or5.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or5.png new file mode 100644 index 000000000..a039b11fa Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_ail_or5.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_br_dt.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_br_dt.png new file mode 100644 index 000000000..75aba902d Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_br_dt.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_br_pip.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_br_pip.png new file mode 100644 index 000000000..30b41816d Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_br_pip.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_br_sec.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_br_sec.png new file mode 100644 index 000000000..1f08fcea1 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_br_sec.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_py_dtst.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_py_dtst.png new file mode 100644 index 000000000..bc861c016 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_py_dtst.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_py_pip.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_py_pip.png new file mode 100644 index 000000000..0376b7307 Binary files /dev/null and 
b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_py_pip.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_py_sec.png b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_py_sec.png new file mode 100644 index 000000000..bbb423978 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/image_py_sec.png differ diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/img/requirements.txt b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/requirements.txt new file mode 100644 index 000000000..7d4a6ccff --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-with-grounding/img/requirements.txt @@ -0,0 +1,5 @@ +generative-ai-hub-sdk==4.4.3 +python-dotenv==1.0.1 +boto3==1.37.4 +pandas==2.2.3 +json2html==1.3.0 \ No newline at end of file diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/requirements.txt b/tutorials/ai-core-genaihub-evaluation-with-grounding/requirements.txt new file mode 100644 index 000000000..7d4a6ccff --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-with-grounding/requirements.txt @@ -0,0 +1,5 @@ +generative-ai-hub-sdk==4.4.3 +python-dotenv==1.0.1 +boto3==1.37.4 +pandas==2.2.3 +json2html==1.3.0 \ No newline at end of file diff --git a/tutorials/ai-core-genaihub-evaluation-with-grounding/sample.env b/tutorials/ai-core-genaihub-evaluation-with-grounding/sample.env new file mode 100644 index 000000000..09eeddf3f --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation-with-grounding/sample.env @@ -0,0 +1,13 @@ +# AICORE CREDENTIALS +AICORE_CLIENT_ID= +AICORE_CLIENT_SECRET= +AICORE_AUTH_URL= +AICORE_BASE_URL= + +# AWS CREDENTIALS +AWS_ACCESS_KEY= +AWS_BUCKET_ID= +AWS_REGION= +AWS_SECRET_ACCESS_KEY= +AWS_USERNAME= +AWS_HOST= diff --git a/tutorials/ai-core-genaihub-evaluation/PUT_YOUR_CUSTOM_METRIC_HERE/custom-llm-metric.json 
b/tutorials/ai-core-genaihub-evaluation/PUT_YOUR_CUSTOM_METRIC_HERE/custom-llm-metric.json deleted file mode 100644 index fbce301d6..000000000 --- a/tutorials/ai-core-genaihub-evaluation/PUT_YOUR_CUSTOM_METRIC_HERE/custom-llm-metric.json +++ /dev/null @@ -1,34 +0,0 @@ -{ - "createdAt": "2025-08-18 09:38:01.990700", - "name": "groundedness", - "scenario": "genai-evaluations", - "version": "0.0.1", - "evaluationMethod": "llm-as-a-judge", - "metricType": "evaluation", - "managedBy": "imperative", - "systemPredefined": false, - "spec": { - "promptType": "free-form", - "configuration": { - "modelConfiguration": { - "name": "gpt-4o", - "version": "2024-08-06", - "parameters": [ - { - "key": "temperature", - "value": "0.1" - }, - { - "key": "max_tokens", - "value": "110" - } - ] - }, - "promptConfiguration": { - "systemPrompt": "You should strictly follow the instruction given to you. Please act as an impartial judge and evaluate the quality of the responses based on the prompt and following criteria:", - "userPrompt": "You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\n\n## Metric Definition\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\n\n## Criteria\nGroundedness: The overlap of information between a response generated by AI models and provided reference.\n\n## Rating Rubric\n5: (Fully grounded). The response and the reference are fully overlapped.\n4: (Mostly grounded). 
The response and the reference are mostly overlapped.\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\n2: (Poorly grounded). The response and the reference are slightly overlapped.\n1: (Not grounded). There is no overlap between the response and the reference.\n\n## Evaluation Steps\nSTEP 1: Assess the response in aspects of Groundedness. Identify any information in the response and provide assessment according to the Criteria.\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\n\nReference: {{?reference}}\nResponse: {{?aicore_llm_completion}}\n\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and outputs STRICTLY following this JSON format:\n\n{ \"explanation\": string, \"rating\": integer }\n\nOutput:\n", - "dataType": "numeric" - } - } - } -} \ No newline at end of file diff --git a/tutorials/ai-core-genaihub-evaluation/PUT_YOUR_CUSTOM_METRIC_HERE/custom-llm-metric.jsonl b/tutorials/ai-core-genaihub-evaluation/PUT_YOUR_CUSTOM_METRIC_HERE/custom-llm-metric.jsonl deleted file mode 100644 index c73e47a55..000000000 --- a/tutorials/ai-core-genaihub-evaluation/PUT_YOUR_CUSTOM_METRIC_HERE/custom-llm-metric.jsonl +++ /dev/null @@ -1 +0,0 @@ -{"createdAt":"2025-08-18 09:38:01.990700","name":"groundedness","scenario":"genai-evaluations","version":"0.1.6","evaluationMethod":"llm-as-a-judge", "metricType":"evaluation", "managedBy":"imperative","systemPredefined":false,"spec":{"promptType":"free-form","configuration":{"modelConfiguration":{"name":"gpt-4o","version":"2024-08-06","parameters":[{"key":"temperature","value":"0.1"},{"key":"max_tokens","value":"110"}]},"promptConfiguration":{"systemPrompt":"You should strictly follow the instruction given to you. 
Please act as an impartial judge and evaluate the quality of the responses based on the prompt and following criteria:","userPrompt":"You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\n\n## Metric Definition\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\n\n## Criteria\nGroundedness: The overlap of information between a response generated by AI models and provided reference.\n\n## Rating Rubric\n5: (Fully grounded). The response and the reference are fully overlapped.\n4: (Mostly grounded). The response and the reference are mostly overlapped.\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\n2: (Poorly grounded). The response and the reference are slightly overlapped.\n1: (Not grounded). There is no overlap between the response and the reference.\n\n## Evaluation Steps\nSTEP 1: Assess the response in aspects of Groundedness. Identify any information in the response and provide assessment according to the Criteria.\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\n\nReference: {{?reference}}\nResponse: {{?aicore_llm_completion}}\n\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. 
After providing your explanation, please rate the response according to the rubric and outputs STRICTLY following this JSON format:\n\n{ \"explanation\": string, \"rating\": integer }\n\nOutput:\n","dataType":"numeric"}}}} \ No newline at end of file diff --git a/tutorials/ai-core-genaihub-evaluation/PUT_YOUR_PROMPT_TEMPLATE_HERE/prompt_template.json b/tutorials/ai-core-genaihub-evaluation/PUT_YOUR_PROMPT_TEMPLATE_HERE/prompt_template.json deleted file mode 100644 index c22605a33..000000000 --- a/tutorials/ai-core-genaihub-evaluation/PUT_YOUR_PROMPT_TEMPLATE_HERE/prompt_template.json +++ /dev/null @@ -1,8 +0,0 @@ -{ - "template": [ - { - "role": "user", - "content": "List the benefits and side effects of the drug in the following consumer health question: {{?question}}." - } - ] -} \ No newline at end of file diff --git a/tutorials/ai-core-genaihub-evaluation/ai-core-genaihub-evaluation.md b/tutorials/ai-core-genaihub-evaluation/ai-core-genaihub-evaluation.md index 546d84d15..0356b561e 100644 --- a/tutorials/ai-core-genaihub-evaluation/ai-core-genaihub-evaluation.md +++ b/tutorials/ai-core-genaihub-evaluation/ai-core-genaihub-evaluation.md @@ -1,2520 +1,1951 @@ ---- -parser: v2 -auto_validation: true -time: 45 -primary_tag: software-product>sap-business-technology-platform -tags: [ tutorial>beginner, topic>artificial-intelligence, topic>machine-learning, software-product>sap-business-technology-platform ] -author_name: Smita Naik -author_profile: https://github.com/I321506 ---- - -# Using Evaluation Service available in SAP AI Core - This tutorial demonstrates how to use SAP AI Core Custom Evaluation to benchmark Large Language Models (LLMs) using two different approaches **Prompt Registry** and **Orchestration Registry**. It guides you through dataset preparation, environment setup, configuration creation, execution, and result analysis in a unified and simplified workflow. 
- -It extends the Quick Start tutorial and is intended for Application Developers and Data Scientists who already know the basics of GenAI workflows in SAP AI Core. - -## You will learn -- How to prepare and organize datasets for evaluation. -- How to choose between **Prompt Registry** and **Orchestration Registry** approaches. -- How to configure and run evaluations in SAP AI Core. -- How to analyze and interpret aggregated evaluation results. - -## Prerequisites - -- Setup Environment: -Ensure your instance and AI Core credentials are properly configured according to the steps provided in the initial tutorial. -- Orchestration Deployment: -Ensure at least one orchestration deployment is ready to be consumed during this process. -Refer to [this tutorial to understand the basic consumption of GenAI models using orchestration](https://developers.sap.com/tutorials/ai-core-orchestration-consumption.html) -- Basic Knowledge: Familiarity with the orchestration workflow is recommended. -- Install Dependencies: Install the required Python packages using the requirements.txt file provided. -Download [requirements.txt](img/requirements.txt) -💡 Right-click the link above and choose **"Save link as..."** to download it directly. 
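Before starting, it can save time to fail fast when a credential is missing. The snippet below is an illustrative helper (not part of the tutorial files); the variable names match the `.env` keys used throughout this tutorial, and `load_dotenv()` from the setup steps would populate `os.environ` first.

```python
# Illustrative helper: check that the credentials used in this tutorial are
# set before running any evaluation steps. Names match the .env keys below.
import os

REQUIRED_VARS = [
    "AICORE_BASE_URL", "AICORE_AUTH_URL", "AICORE_CLIENT_ID",
    "AICORE_CLIENT_SECRET", "AICORE_RESOURCE_GROUP",
    "AWS_ACCESS_KEY", "AWS_SECRET_ACCESS_KEY", "AWS_BUCKET_ID", "AWS_REGION",
]

def missing_env_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if missing := missing_env_vars():
    print("Missing credentials:", ", ".join(missing))
```

Running this at the top of the notebook surfaces an incomplete `.env` file immediately instead of failing later with an authentication error.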
- -**Below are the Steps to Run a GenAI Evaluation in SAP AI Core** - -## Pre-Read - -The structure of the input data should be as follows: - -``` -Root -├── PUT_YOUR_PROMPT_TEMPLATE_HERE -| ├── prompt_template.json -│ -├── PUT_YOUR_DATASET_HERE -│ ├── medicalqna_dataset.csv -| -└── PUT_YOUR_CUSTOM_METRIC_HERE - ├── custom-llm-metric.json - ├── custom-llm-metric.jsonl -``` - -**Dataset and Configuration**: -To run this evaluation, all required input files must be placed inside the folder structure provided in the repository. - -You can download or clone the complete folder from the link below and place your files inside the respective folders: [Download / Open Full Folder Structure](https://github.com/SAP-samples/aicore-genai-samples/blob/main/genai-sample-apps/prompt-evaluation) - - 1. **Prompt Template Configuration (`PUT_YOUR_PROMPT_TEMPLATE_HERE`)** - * Place one or more prompt template configurations as JSON files in this folder. - 2. **Test Dataset (`PUT_YOUR_DATASET_HERE`)** - * The test dataset should be a CSV, JSON, or JSONL file containing prompt variables, ground truth references, and other data required for evaluation. - 3. **Custom Metrics (`PUT_YOUR_CUSTOM_METRIC_HERE`)** - * (Optional) You can provide custom metric definitions in a single JSON or JSONL file. For JSONL, each line should be a JSON object defining one metric. For JSON, it should be an array of metric-definition objects. - -### Environment Variables Setup - -[OPTION BEGIN [SAP AI Launchpad]] - -- Navigate to your SAP AI Core Launchpad. - -- In the Workspaces section, click on "Add" to create a new workspace. - - A workspace in SAP AI Core is a logical container that holds your resources (like models and pipelines) and provides the isolation needed for your projects. - -- When prompted, enter your AI Core credentials (such as Client ID, Client Secret, and Base URL). 
- - Note: If you're unsure about where to find these credentials, refer to this [guide](https://developers.sap.com/tutorials/ai-core-generative-ai.html#1c4f36d7-f345-4822-be00-c15f133ff7d8). - -- Once the workspace is successfully created, select your desired Resource Group to begin the evaluation process. - -Refer to the screenshot below for guidance: -![img](img/image_34.png) - -[OPTION END] - -[OPTION BEGIN [Python]] - -- Open **Visual Studio Code or Jupyter Notebook**. Create a new file with the .ipynb extension (e.g., custom_evaluation.ipynb). -- Create a **.env** file in the root directory of your project. -- Add your **AI Core** and **AWS credentials** as shown below. - -```env -# AICORE CREDENTIALS -AICORE_CLIENT_ID= -AICORE_CLIENT_SECRET= -AICORE_AUTH_URL= -AICORE_BASE_URL= -AICORE_RESOURCE_GROUP= - -# AWS CREDENTIALS -AWS_ACCESS_KEY= -AWS_BUCKET_ID= -AWS_REGION= -AWS_SECRET_ACCESS_KEY= - -# ORCHESTRATION DEPLOYMENT URL -DEPLOYMENT_URL= -``` - -**Note:** Replace placeholders (e.g., CLIENT_ID, CLIENT_SECRET, etc) with your actual environment credentials. - -Refer to the below screenshot for clarity: -![img](img/image_1.png) - -#### Install Dependencies - -Install the required packages using the [requirements.txt](img/requirements.txt) file you downloaded in the Prerequisites section. 
-```bash -pip install -r requirements.txt -``` -#### Connect to AI Core Instance - -Once the environment variables are set and dependencies are installed, run the following code to connect to your instance: - -```PYTHON -# Loading the credentials from the env file -from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient -from dotenv import load_dotenv -import os - -load_dotenv(override=True) - -# Fetching environment variables -AICORE_BASE_URL = os.getenv("AICORE_BASE_URL") -AICORE_RESOURCE_GROUP = os.getenv("AICORE_RESOURCE_GROUP") -AICORE_AUTH_URL = os.getenv("AICORE_AUTH_URL") -AICORE_CLIENT_ID = os.getenv("AICORE_CLIENT_ID") -AICORE_CLIENT_SECRET = os.getenv("AICORE_CLIENT_SECRET") - -AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY") -AWS_BUCKET_ID = os.getenv("AWS_BUCKET_ID") -AWS_REGION = os.getenv("AWS_REGION") -AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY") -DEPLOYMENT_URL = os.getenv("DEPLOYMENT_URL") - -# Initializing the GenAIHubProxyClient -client = GenAIHubProxyClient( - base_url=AICORE_BASE_URL, - auth_url=AICORE_AUTH_URL, - client_id=AICORE_CLIENT_ID, - client_secret=AICORE_CLIENT_SECRET, - resource_group=AICORE_RESOURCE_GROUP -) -``` - -**NOTE:** -- Ensure the **requirements.txt** installation completes successfully before running the code. -- If you face any issues, recheck your **.env** values and installed packages. - -[OPTION END] - -[OPTION BEGIN [Bruno]] - -- Download the [Bruno_collections](img/AI_Core.json) file - -- please follow the steps in the [Tutorial](https://developers.sap.com/tutorials/ai-core-orchestration-consumption.html) to set up your environment, refer step - **Set Up Your Environment and Configure Access** and proceed till generating the token - -[OPTION END] - -### Registering an Object Store Secret in AI Core - -[OPTION BEGIN [SAP AI Launchpad]] - -- Open the **SAP AI Core Launchpad** and navigate to the **Administration** tab. -- Select the **Object Store** section from the left-hand menu. 
-- Click on **“Add”** to register a new object store secret. -- Fill in the required bucket details as shown in the screenshot below. - -![img](img/image_33.png) - -In the **Secret** field, use the following structure to provide your AWS credentials: - -```json -{ - "AWS_ACCESS_KEY_ID": "Enter Your value", - "AWS_SECRET_ACCESS_KEY": "Enter Your value" -} -``` - -[OPTION END] - -[OPTION BEGIN [Python]] - -To make your evaluation files available for AI Core orchestration, you need to: - -- Upload them to an object store (e.g., AWS S3). -- Register the object store secret in AI Core. - -#### **Setup Authentication and Headers** - -First, define the authentication headers for AI Core REST API calls. - -```PYTHON -def _get_headers(): - headers = { - "Authorization": client.get_ai_core_token(), - "AI-Resource-Group": AICORE_RESOURCE_GROUP, - "Content-Type": "application/json", - } - return headers -``` - -#### **Register Object Store Secret in AI Core** - -Register your S3 bucket and credentials as a secret. 
- -```PYTHON -# Register S3 secret with AI Core which will be used as an input source -import json -import logging -import requests - -def register_oss_secret(): - headers = _get_headers() - - POST_SECRETS_ENDPOINT = '/v2/admin/objectStoreSecrets' - request_url = f"{AICORE_BASE_URL}{POST_SECRETS_ENDPOINT}" - - request_body = { - "name": "genai-data", - "data": { - "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY, - "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY - }, - "type": "S3", - "bucket": AWS_BUCKET_ID, - "endpoint": "s3-eu-central-1.amazonaws.com", - "region": AWS_REGION, - "pathPrefix": "" - } - try: - response = requests.post( - request_url, headers=headers, data=json.dumps(request_body), timeout=120 - ) - result = response.json() - print(result) - return result - except: - logging.error("Error occurred while attempting to create object store secret") - raise - -register_oss_secret() -``` -[OPTION END] - -[OPTION BEGIN [Bruno]] - -Object store secrets securely store the AWS S3 credentials required for dataset access. - -• Expand **objectStoreSecrets** under admin and select the create secret request. - -Use the payload below to create an object store secret for AWS S3. - -```CODE -{ - "name": "genai-data", - "data": { - "AWS_ACCESS_KEY_ID": "", - "AWS_SECRET_ACCESS_KEY": "" - }, - "type": "S3", - "bucket": "", - "endpoint": "", - "region": "", - "pathPrefix": "" - } -``` -• Ensure that all values in the data dictionary are Base64-encoded as per AWS S3 credential requirements. - -![img](img/image-br01.png) - -[OPTION END] - -> ⚠️ **Important Note (Must Read)** -> -> - You must **create an object store secret** with a user-defined name (e.g., default) to store **output artifacts** from orchestration runs. This is **mandatory**. -> - For **input artifacts**, you may create additional object store secrets with different names if needed. -> - If a user-defined name (e.g., default) is not configured, orchestration runs will **fail** due to missing output target setup. 
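Because a missing output secret causes runs to fail, it may be worth verifying your secrets before continuing. The sketch below is an illustration, not tutorial code: it calls the `GET` variant of the `/v2/admin/objectStoreSecrets` endpoint used above, and it assumes the response contains a `resources` array of secret objects with `name` fields. `genai-data` and `default` are the example names from this tutorial.

```python
# Hedged sketch: list registered object store secrets and check that both
# the input secret and the output secret exist before running an evaluation.
import requests

def list_object_store_secrets(base_url, headers):
    """Return the names of all object store secrets in the resource group."""
    response = requests.get(
        f"{base_url}/v2/admin/objectStoreSecrets", headers=headers, timeout=120
    )
    response.raise_for_status()
    return [secret["name"] for secret in response.json().get("resources", [])]

def has_required_secrets(secret_names, required=("genai-data", "default")):
    """True when every expected secret name is present."""
    return all(name in secret_names for name in required)
```

For example, `has_required_secrets(list_object_store_secrets(AICORE_BASE_URL, _get_headers()))` should be `True` before you proceed to the next step.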
- - -### Upload and Register Dataset - -[OPTION BEGIN [SAP AI Launchpad]] - -After creating the secret, upload your evaluation files to the S3 bucket and register them as an artifact in AI Core. - -#### **Register Uploaded Files as Artifact in AI Core** - -To register your evaluation dataset with SAP AI Core, you need to upload it as an artifact. Follow the instructions below using the **SAP AI Launchpad UI**. - ---- - -- Open the **SAP AI Core Launchpad**. -- Navigate to the **Generative AI/Optimization/Artifacts** section to create a dataset artifact. - -![img](img/image_19.png) - -- On the **Artifacts** section, click **Add**. - ---- - -- On the **General Information** screen, enter the following: - - - **Select Scenario:** `genai-evaluations` - - **Name:** `genai-eval-test-data` - - **Description:** `Demo artifacts for evaluation flow.` - - **Select Object Store:** `genai-data` - - **Sub-folder path:** `genaiEvaluation/` - - > 💡 Replace `` with your **SAP BTP user ID** or the folder path in your object store where the evaluation files are uploaded. - -- On the **Labels** screen, click **“Add Label”** and provide the following: - - - **Key:** `prompt-evaluation` - - **Value:** `true` - *(Note: The prefix `ext.ai.sap.com/` is automatically pre-filled in the UI.)* - - ![img](img/image_21.png) - -- Review all entered details carefully. -- Click **“Add”** to complete the artifact registration. - -[OPTION END] - -[OPTION BEGIN [Python]] - -After creating the secret, organize your evaluation files into the `testdata` folder. Upload them to S3 and register them as artifacts in AI Core. - -#### **Upload Files to S3 Bucket** -```python -import os -import boto3 - -# Uploads the testdata folder to Object Store for simplified workflow -def upload_folder_to_s3(root_folder, bucket_name, s3_prefix=None): - """ - Look for 'testdata' folder inside root_folder and upload it to S3 under the same s3_prefix. - If no s3_prefix is provided, a static prefix or a UUID will be used. 
- - The S3 structure will be: - genaiEvaluation/{s3_prefix}/testdata/... - - Args: - root_folder (str): Path containing the 'testdata' subfolder. - bucket_name (str): Name of the S3 bucket. - s3_prefix (str, optional): S3 prefix path. Defaults to None. - - Returns: - str: The path for newly uploaded input artifacts on S3. - - Raises: - FileNotFoundError: If 'testdata' subfolder is missing. - """ - testdata_folder = os.path.join(root_folder, "testdata") - if not os.path.isdir(testdata_folder): - raise FileNotFoundError(f"Missing required folder: testdata in {root_folder}") - - if s3_prefix is None: - # Generate a unique prefix using UUID or static ID - prefix_guid = "" # replace with UUID if needed - s3_prefix = f"genaiEvaluation/{prefix_guid}" - - s3_client = boto3.client( - 's3', - aws_access_key_id=AWS_ACCESS_KEY, - aws_secret_access_key=AWS_SECRET_ACCESS_KEY, - region_name=AWS_REGION - ) - - # Upload only the testdata folder - full_prefix = f"{s3_prefix}/testdata" - for root, _, files in os.walk(testdata_folder): - for file in files: - local_path = os.path.join(root, file) - relative_path = os.path.relpath(local_path, testdata_folder) - s3_key = f"{full_prefix}/{relative_path}".replace("\\", "/") - print(f"Uploading {local_path} to s3://{bucket_name}/{s3_key}") - s3_client.upload_file(local_path, bucket_name, s3_key) - - return f"ai://genai-data/{s3_prefix}" -``` - ![img](img/image_5.png) - -#### **Register Uploaded Files as Artifact in AI Core** - -```python -# Registering the uploaded files from AWS as artifacts to use inside configuration. 
-def register_artifact(input_artifact_path): - headers = _get_headers() - - GET_ARTIFACTS_ENDPOINT = '/v2/lm/artifacts' - request_url = f"{AICORE_BASE_URL}{GET_ARTIFACTS_ENDPOINT}" - - request_body = { - "labels": [ - { - "key": "ext.ai.sap.com/prompt-evaluation", - "value": "true" - } - ], - "name": "genai-eval-test-data", - "kind": "other", - "url": input_artifact_path, - "description": "demo artifacts for evaluation flow.", - "scenarioId": "genai-evaluations" - } - try: - response = requests.post( - request_url, headers=headers, data=json.dumps(request_body), timeout=120 - ) - result = response.json() - print(result) - return result['id'] - except: - print("Error occurred while attempting to register the artifact") - raise -``` -![img](img/image_6.png) - -[OPTION END] - -[OPTION BEGIN [Bruno]] - -Before registering a dataset artifact in Bruno, you must upload your CSV file to the SAP AI Core object store using the Dataset API. -Bruno cannot upload files directly to S3; therefore, this step is required. - -**Prerequisites** - - - An object store secret must already exist in your resource group. Typically, this is the default secret named **default**. - - - The Dataset API currently supports: - - - S3 object stores only - - - CSV file uploads - -**Upload Your Dataset** - -Use the Dataset API – Upload File request in Bruno: - -```bash -PUT {{ai_api_url}}/v2/lm/dataset/files/{{secretName}}/{{datasetPath}} -``` - -**Headers** - -```json -Authorization: Bearer {{token}} -AI-Resource-Group: {{resourceGroup}} -Content-Type: text/csv -``` - -**Body** - -Upload your .csv file directly as binary in Bruno’s Body. - -Example Path Values: - - - secretName: default - - - datasetPath: testdata/medicalqna_dataset.csv - -![img](img/image_br_dt.png) - -**Note:** - -Save the ai://… URL — you will use this when creating the dataset artifact. 
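If you prefer to script this upload rather than use Bruno, the same Dataset API call can be sketched in Python. The `requests.put` call below mirrors the endpoint shown above (it is an assumption based on that endpoint, not tutorial code), and the small helper reflects the secret-name/path mapping described in the note.

```python
# Hedged sketch: upload a CSV via the Dataset API and derive the ai:// URL
# that the artifact registration payload expects.
import requests

def upload_dataset_csv(ai_api_url, headers, secret_name, dataset_path, csv_bytes):
    """PUT a CSV file to the Dataset API under the given object store secret."""
    url = f"{ai_api_url}/v2/lm/dataset/files/{secret_name}/{dataset_path}"
    response = requests.put(
        url,
        headers={**headers, "Content-Type": "text/csv"},  # override JSON default
        data=csv_bytes,
        timeout=300,
    )
    response.raise_for_status()
    return response.json()

def dataset_artifact_url(secret_name, dataset_path):
    """Build the ai:// URL used when registering the dataset artifact."""
    return f"ai://{secret_name}/{dataset_path}"
```

For example, `dataset_artifact_url("default", "testdata/medicalqna_dataset.csv")` produces the `ai://default/testdata/medicalqna_dataset.csv` URL used in the artifact registration payload.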
- -**Register the Dataset Artifact** - -- Click on **Register artifact** under lm -> artifacts in the Bruno collection to register the artifact. - -```CODE -{ - "name": "aiconfig", - "kind": "dataset", - "url": "ai://default/testdata/medicalqna_dataset.csv", - "scenarioId": "genai-evaluations" -} -``` -![img](img/image-br02.png) - -[OPTION END] - -### Approach Selection – Choose How You Want to Provide Prompts (Read-up) - -In this evaluation workflow, you can provide prompts in two different ways. -Choose only one option based on your requirement. - -Here are your two options: - -| Option | Approach | Description | When to Use | -| ------------ | ------------------------------------------- | --------------------------------------------------------------- | ----------------------------------------------------- | -| **Option 1** | Prompt Template + Model Directly | Prompt stored in Prompt Registry and model referenced directly. | When you want reusable, versioned prompts. | -| **Option 2** | Orchestration Registry (Inline Prompt) | Prompt provided as part of orchestration config. | When prompt is ad-hoc or not reused. | - -After selecting your option: - - - Follow only the steps for that option. - - - Skip the other options. - - - After completing your selected option, go directly to Create Evaluation Configuration. - -### (Option 1) - Providing Prompts via Prompt Template + Model Directly - -✔ Follow this step **ONLY IF** you want to use **Prompt Template**. - -If not, **skip this step and go to Option 2**. - -[OPTION BEGIN [SAP AI Launchpad]] - -A Prompt Template defines: - - - The message roles (system, user, etc.) - - - Variables that get substituted from your dataset (e.g., questions) - - - Optional model configuration (temperature, max tokens, etc.) - -We’ll create a prompt template to guide the model to answer the questions. - -**Create the Prompt Template** - -- In SAP AI Launchpad, go to the left-hand menu and select Generative AI Hub → Prompt Management. 
- -- Click on Templates → Create - -- This is where you can define reusable templates with variables for evaluations. - -![img](img/image_007.png) - -**Define the Prompt** - -In the Message Blocks section: - -- Add a user role message: -```json -{ - "template": [ - { - "role": "user", - "content": "List the benefits and side effects of the drug in the following consumer health question: {{?question}}." - } - ] -} -``` - -**Configure Variables** - -Scroll down to Variable Definitions and add entries for each variable: - -- question - - - Default Value: leave empty, or set a fallback value - -This ensures the placeholders are dynamically substituted during evaluation. - -![img](img/image_008.png) - -**Save the Template** - -Click Save Template (top right): - -- Scenario → genai-evaluations - -- Name → prompt-registry-eval-acc-test - -- Version → 1.0.0 - -Click Save to persist the template. - -**Verify the Template** - -Go to Generative AI Hub → Prompt Management → Templates and confirm: - -- The template appears with the correct name, scenario, and version. - -- Managed By → shows how the template is stored. - -- Versioning is tracked automatically. - -![img](img/image_10.png) - -[OPTION END] - -[OPTION BEGIN [Python]] - -```python -import os -import json - -def get_prompt_config_file(folder_path): - """ - Retrieves a list of all JSON file names in the specified folder. - """ - if not os.path.isdir(folder_path): - print(f"The folder path '{folder_path}' does not exist.") - return [] - - json_files = [file for file in os.listdir(folder_path) if file.endswith(".json")] - - if not json_files: - print(f"No JSON files were found in the folder '{folder_path}'.") - return json_files - - -def get_dataset_file_name(folder_path): - """ - Retrieves the name of the first file in the specified folder. 
- """ - if not os.path.isdir(folder_path): - print(f"The folder path '{folder_path}' does not exist.") - return None - - items_in_folder = os.listdir(folder_path) - - for item in items_in_folder: - item_path = os.path.join(folder_path, item) - if os.path.isfile(item_path): - return item - - print(f"No files were found in the folder '{folder_path}'.") - return None - - -def load_prompt_template(folder_path, file_name): - """ - Loads the contents of a JSON prompt template into a variable. - """ - file_path = os.path.join(folder_path, file_name) - try: - with open(file_path, "r", encoding="utf-8") as f: - return json.load(f) # returns Python dict - except Exception as e: - print(f"Error loading prompt template: {e}") - return None - -# --- MAIN EXECUTION --- -PROMPT_FOLDER = "./PUT_YOUR_PROMPT_TEMPLATE_HERE" -DATASET_FOLDER = "./PUT_YOUR_DATASET_HERE" - -PROMPT_CONFIG_FILES = get_prompt_config_file(PROMPT_FOLDER) -DATASET_NAME = get_dataset_file_name(DATASET_FOLDER) - -if PROMPT_CONFIG_FILES and DATASET_NAME: - # Load the first JSON prompt template - PROMPT_TEMPLATE = load_prompt_template(PROMPT_FOLDER, PROMPT_CONFIG_FILES[0]) - print(f"Prompt configs: {PROMPT_CONFIG_FILES}") - print(f"Dataset name: {DATASET_NAME}") - print("Prompt template contents:", PROMPT_TEMPLATE) -else: - print("Missing run or dataset file.") - raise SystemExit("Exiting due to missing run/dataset file.") -``` - -```python -def create_prompt_template(): - headers = _get_headers() - GET_PROMPT_TEMPLATES_ENDPOINT = '/v2/lm/promptTemplates' - request_url = f"{AICORE_BASE_URL}{GET_PROMPT_TEMPLATES_ENDPOINT}" - - request_body = { - "name": "prompt-registry-eval-acc-test", - "version": "1.0.0", - "scenario": "genai-evaluations", - "spec": PROMPT_TEMPLATE - } - try: - response = requests.post( - request_url, headers=headers, data=json.dumps(request_body), timeout=120 - ) - if(response.status_code != 200): - raise - result = response.json() - print(result) - return result['id'] - except: - 
logging.error("Error occurred while attempting to create a prompt template") - raise - -prompt_template_id = create_prompt_template() -``` -**Note** - -If you wish to use a prompt template that already exists in prompt registry, you can manually set prompt_template_id in the next cell and skip executing this cell - -If you already have an existing template set the ID manually: - -```python -prompt_template_id = "" -``` - -[OPTION END] - -[OPTION BEGIN [Bruno]] - -In Bruno, you can create a prompt template by sending a POST request to the AI Core API: - -**Request: Create Prompt Template** - -**URL:** - -```bash -POST {{api_url}}/v2/lm/promptTemplates -``` - -**Headers:** -``` -Authorization: Bearer {{access_token}} -Content-Type: application/json -``` - -**Body (JSON):** -```json -{ - "name": "prompt-registry-eval-acc-test", - "version": "1.0.0", - "scenario": "genai-evaluations", - "spec": { - "template": [ - { - "role": "user", - "content": "List the benefits and side effects of the drug in the following consumer health question: {{?question}}." - } - ], - "defaults": {}, - "additional_fields": { - "modelParams": { - "temperature": 0.3, - "max_tokens": 100 - }, - "modelGroup": "chat" - } - } -} -``` -![img](img/image_br_pr.png) - -[OPTION END] - -🔑 Tip: Always increment the version (e.g., 1.0.1, 1.0.2) when updating a template. This ensures reproducibility across evaluations. - -### (Option 2) - Providing Prompts via Orchestration Registry (Inline Prompt) - -Follow this step only if you want to **store prompt + model configuration inside Orchestration Registry**. - -**Create Orchestration Registry Configuration** - -[OPTION BEGIN [SAP AI Launchpad]] - -Go to Generative AI Hub → Orchestration → Orchestration Configurations - -- click create - -- In templating add the system prompt - -```json -List the benefits and side effects of the drug in the following consumer health question: {{?question}}. 
-``` -![img](img/image_ail_or1.png) - -- select the model in model configuration and save - -![img](img/image_ail_or2.png) - -![img](img/image_ail_or3.png) - -[OPTION END] - -[OPTION BEGIN [Python]] - -```python -def create_orchestration_registry_config(): - headers = _get_headers() - CREATE_ORCHESTRATION_REGISTRY = '/v2/registry/v2/orchestrationConfigs' - request_url = f"{AICORE_BASE_URL}{CREATE_ORCHESTRATION_REGISTRY}" - model_name,model_version=selected_models_str.split(":") - request_body = { - "name": "genai-eval-test", - "version": "1.0.0", - "scenario": "genai-evaluations", - "spec": { - "modules": { - "prompt_templating": { - "model": { - "name": model_name, - "version": model_version - }, - "prompt": PROMPT_TEMPLATE - } - } - } - } - try: - response = requests.post( - request_url, headers=headers, data=json.dumps(request_body), timeout=120 - ) - if(response.status_code != 200): - print(response.json()) - raise - result = response.json() - print(result) - return result['id'] - except: - logging.error("Error occurred while attempting to create a orchestration registry id") - raise -orchestration_registry_id = create_orchestration_registry_config() -``` - -![img](img/image_py_or1.png) - -[OPTION END] - -[OPTION BEGIN [Bruno]] - -You can paste this directly into a Bruno .bru file or create a new request inside Bruno. - -**Url:** -```bash -POST {{AICORE_BASE_URL}}/v2/registry/v2/orchestrationConfigs -``` - -**headers:** -``` -{ - Authorization: Bearer {{token}} - AI-Resource-Group: {{resource_group}} - Content-Type: application/json - } -``` - -**body:** -```json -{ - "name": "genai-eval-test", - "version": "1.0.0", - "scenario": "genai-evaluations", - "spec": { - "modules": { - "prompt_templating": { - "model": { - "name": "model_name", - "version": "model_version" - }, - "prompt": { - "template": [ - { - "role": "user", - "content": "List the benefits and side effects of the drug in the following consumer health question: {{?question}}." 
- } - ], - "defaults": {} - } - } - } - } -} -``` - -![img](img/image_br_or1.png) - -[OPTION END] - -After completing Option 2: - - - Proceed directly to the “Create Evaluation Configuration” section - - -### Understanding Metrics (Pre-Read) - -Metrics determine how your model outputs are evaluated during an evaluation run. They define the scoring logic that SAP AI Core uses to compare models, measure quality, and validate improvements over time. - -In SAP AI Core, you can use: - - - System-defined metrics (ready-made, no setup needed) - - - Custom metrics (your own definitions stored in the metric registry) - -**How Metrics Apply in Each Approach** - -| Approach | How Metrics Apply | -| ------------------------------------- | ----------------------------------------------------------------------------- | -| **Option 1 – Prompt Template** | Metrics score responses generated using the prompt template + selected model. | -| **Option 2 – Orchestration Registry** | Metrics score responses generated through orchestration configuration. | - -Metrics are provided later during **Create Evaluation Configuration**: - -```json -"metrics": "BERT, answer_relevance" -``` - -You can specify one or multiple metrics (comma-separated). - -#### Types of Metrics - -**1. System-defined Metrics** - -These come in two categories: - -**Computed Metrics** - -Score outputs using reference data or validation logic. - -| Metric | Description | Needs Reference? | -| --------------------- | ------------------------------------------ | ---------------- | -| **BERT Score** | Embedding similarity to reference | Yes | -| **BLEU** | N-gram overlap | Yes | -| **ROUGE** | Recall-based overlap | Yes | -| **Exact Match** | Checks if output exactly matches reference | Yes | -| **JSON Schema Match** | Validates output against a schema | Yes | -| **Language Match** | Detects language | No | -| **Content Filter** | Safety filter triggered (input/output) | No | - -**2. 
LLM-as-a-Judge Metrics** - -These metrics use a judge LLM to score responses based on a rubric. -They are ideal for open-ended tasks with no exact references. - -| Metric | What It Measures | Needs Reference? | -| ------------------------- | --------------------------------- | ---------------- | -| **Instruction Following** | How well the prompt was followed | No | -| **Correctness** | Factual accuracy | Yes | -| **Answer Relevance** | Relevance of the generated answer | No | -| **Conciseness** | Brevity + clarity | No | -| **RAG Groundedness** | Grounding in the provided context | No | -| **RAG Context Relevance** | Usefulness of retrieved context | No | - ---- - -#### Custom Metrics - -Create them when system metrics are insufficient. - -Two ways to define custom metrics: - -**1. Structured metrics (recommended)** - - - Provide task, criteria, rubric, optional examples - - - AI Core constructs the judge prompt - -**2. Free-form metrics** - - - You define prompts and scoring logic manually - -**Custom metric registration:** - -```bash -POST {{ai_api_url}}/v2/lm/evaluationMetrics -``` -Once registered, use them like system metrics: - -```json -"metrics": "my_custom_metric" -``` - -**Example — Prompt Template Approach** - -```json -"metrics": "BERT Score,answer_relevance" -``` - -**Example — Orchestration Registry Approach** - -```json -"metrics": "Pointwise Conciseness" -``` - -The chosen metrics determine: - - - scoring - - - dashboard visualizations - - - aggregated results - - - model ranking logic - -### Providing Metrics for Evaluation - -Metrics must be supplied before creating an Evaluation Configuration. - -[OPTION BEGIN [SAP AI Launchpad]] - -In SAP AI Launchpad, metrics are selected visually during the Evaluation Configuration creation flow. 
- -You can choose: - - - System-defined metrics - - - Custom metrics (your own definitions stored in the metric registry — cannot be created directly in AI Launchpad; to use them, register them via API/Bruno mentioned in the same step and then select them in the Evaluation Configuration) - -No manual JSON input is needed—the UI provides a selectable list of available metrics. - -1. Go to Generative AI Hub → Optimization. - -2. Click Create to start a new evaluation configuration. - -![img](img/image_25.png) - -Select Test Input / Runs depending on the option you used earlier: - -| Earlier Option Used | What to Select in AIL | -| ------------------------------------- | ----------------------------------------------------------------- | -| **Option 1 – Prompt Template** | Select your **Prompt Template** and choose one or more **Models** | -| **Option 2 – Orchestration Registry** | Select your **Orchestration Registry Config ID** | - -Then: - - - Select your registered dataset artifact - - - Enter the dataset path (example): - testdata/global_customer_queries.csv - - - Set the number of test samples (e.g., 20) - - ![img](img/image_26.png) - -- Click **Next** to go to Metrics selection. - -#### Select Evaluation Metrics - -Choose the metrics you want to evaluate. 
- -You may choose one or multiple system-defined or custom metrics—examples: - - - BERT Score - - - answer_relevance - - - instruction_following - -![img](img/image_27.png) - ---- - -> 📘 **Helpful Resources**: -> -> - [System-Defined Evaluation Metrics – SAP Documentation](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/system-defined-evaluation-metrics) -> - [Define Your Own Custom Metrics – SAP Guide](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/custom-metrics) -> *(If your evaluation requires domain-specific or advanced scoring logic)* - -> **Note: You may select additional metrics based on your use case.** - ---- - -[OPTION END] - -[OPTION BEGIN [Python]] - -**Metrics Handling in Python Notebook (Automatic Detection & Creation)** - -When running the evaluation through the Python notebook, metric setup is partially automated. -Before the evaluation configuration is created, the script performs the following: - - - Users can manually specify metric IDs - - - Or place custom metric JSON files in CUSTOM_METRIC_FOLDER - - - The notebook loads all custom metric definitions automatically - - - It checks if each metric already exists in AI Core - - - If not found → creates it automatically - - - Prints final list of metric IDs used for evaluation - -This ensures all metrics exist before the evaluation configuration is created. - -```python -import os -import json -import requests - -# --- Load JSON / JSONL files --- -def load_all_metrics(folder_path): - """ - Loads all JSON and JSONL files from a folder into a single list of dicts. 
- """ - metrics = [] - files = [f for f in os.listdir(folder_path) if f.endswith((".json", ".jsonl"))] - - if not files: - print(f"No JSON/JSONL files found in {folder_path}") - return metrics - - for file_name in files: - file_path = os.path.join(folder_path, file_name) - try: - with open(file_path, "r", encoding="utf-8") as f: - content = f.read().strip() - try: - data = json.loads(content) - if isinstance(data, list): - metrics.extend(data) - elif isinstance(data, dict): - metrics.append(data) - except json.JSONDecodeError: - # Attempt to parse as JSONL line by line - for line in content.splitlines(): - line = line.strip() - if not line: - continue - try: - metrics.append(json.loads(line)) - except json.JSONDecodeError: - print(f"Skipping invalid JSON line in {file_name}: {line[:50]}...") - except Exception as e: - print(f"Error reading {file_name}: {e}") - return metrics - -# --- Fetch all metrics from SAP AI Core --- -def fetch_all_metrics(): - request_url = f"{AICORE_BASE_URL}/v2/lm/evaluationMetrics" - resp = requests.get(request_url, headers=_get_headers()) - resp.raise_for_status() - return resp.json().get("resources", []) - -# --- Create or fetch a metric --- -def create_or_get_metric(custom_metric, user_metric_id=None): - all_metrics = fetch_all_metrics() - - # 1️⃣ User-supplied ID lookup - if user_metric_id: - for m in all_metrics: - if m.get("id") == user_metric_id: - print(f"✅ Metric already exists by ID: {user_metric_id}") - return user_metric_id - print(f"⚠️ User metric ID {user_metric_id} not found, will only include if valid later") - - # 2️⃣ Check by scenario, name, version - scenario = custom_metric.get("scenario") - name = custom_metric.get("name") - version = custom_metric.get("version") - if not all([scenario, name, version]): - raise ValueError("Metric must include 'scenario', 'name', and 'version'") - - for m in all_metrics: - if (m.get("scenario") == scenario and - m.get("name") == name and - m.get("version") == version): - metric_id = 
m.get("id") - print(f"✅ Metric already exists: {scenario}/{name} v{version}, ID = {metric_id}") - return metric_id - - # 3️⃣ Create metric if not found - request_url = f"{AICORE_BASE_URL}/v2/lm/evaluationMetrics" - required_fields = ["scenario", "name", "version", "evaluationMethod", "metricType"] - for f in required_fields: - if f not in custom_metric: - raise ValueError(f"❌ Missing required field: {f}") - - resp = requests.post(request_url, headers=_get_headers(), json=custom_metric) - resp.raise_for_status() - metric_id = resp.json().get("id") - print(f"✅ Metric created successfully: {name} v{version}, ID = {metric_id}") - return metric_id - -# --- Main pipeline --- -CUSTOM_METRIC_FOLDER = "./PUT_YOUR_CUSTOM_METRIC_HERE" -user_metric_ids = "" # set by user if needed - -# 1️⃣ Load all metrics from JSON/JSONL -custom_metric_list = load_all_metrics(CUSTOM_METRIC_FOLDER) - -# 2️⃣ Create/fetch metrics from SAP AI Core -metric_ids = [] -for metric in custom_metric_list: - try: - metric_id = create_or_get_metric(metric) - metric_ids.append(metric_id) - except ValueError as e: - print(f"Skipping metric due to error: {e}") - -# 3️⃣ Validate user_metric_ids separately if provided -if user_metric_ids and user_metric_ids.strip(): - all_metrics = fetch_all_metrics() - # Split comma-separated IDs and strip whitespace - for uid in [uid.strip() for uid in user_metric_ids.split(",")]: - if any(m.get("id") == uid for m in all_metrics): - metric_ids.append(uid) - else: - print(f"⚠️ User metric ID {uid} does not exist in AI Core, skipping.") -# 4️⃣ Convert to comma-separated string -custom_metric_ids_str = ",".join(metric_ids) -print("✅ All processed metric IDs:", custom_metric_ids_str) -``` -![img](img/image_py03.png) - -This ensures all required metrics are available before launching the evaluation. 
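For reference, here is a minimal sketch of a structured custom metric definition that the loader above would pick up from `CUSTOM_METRIC_FOLDER`. The metric name, version, and folder here are hypothetical placeholders; only the top-level keys mirror the `required_fields` check in `create_or_get_metric`.

```python
import json
import os
import tempfile

# Hypothetical structured metric definition; the top-level keys match the
# required_fields check performed before the POST to /v2/lm/evaluationMetrics.
clarity_metric = {
    "scenario": "genai-evaluations",
    "name": "clarity",
    "version": "0.0.1",
    "evaluationMethod": "llm-as-a-judge",
    "metricType": "evaluation",
}

# Drop it into the metric folder as a .json file so load_all_metrics finds it.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "clarity_metric.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(clarity_metric, f, indent=2)

# The same required-field validation the notebook performs before creating it.
required = ["scenario", "name", "version", "evaluationMethod", "metricType"]
missing = [k for k in required if k not in clarity_metric]
print(missing)  # → []
```

Any file saved this way is picked up automatically on the next run of the metrics cell, created in AI Core if it does not exist, and its ID appended to the final comma-separated metric list.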
- -[OPTION END] - -[OPTION BEGIN [Bruno]] - -Bruno supports two ways of providing metrics: - -**Use System-Defined Metrics** - -You can directly pass system metrics in your configuration: - -Example: - -```json -"metrics": "answer_relevance" -``` - -If you want to register custom metrics, you must call: - -➡️ **Create Custom Metric** - -```bash -POST {{ai_api_url}}/v2/lm/evaluationMetrics -``` -**Body example:** - -```json -{ - "scenario": "genai-evaluations", - "name": "groundedness", - "version": "0.0.1", - "evaluationMethod": "llm-as-a-judge", - "metricType": "evaluation", - "promptType": "structured", - "spec": { - "configuration": { - "modelConfiguration": { - "name": "gpt-4o", - "version": "2024-08-06", - "parameters": [ - { - "key": "temperature", - "value": "0.1" - }, - { - "key": "max_tokens", - "value": "110" - } - ] - }, - "promptConfiguration": { - "evaluationTask": "You will be assessing groundedness, which measures how well the AI-generated response aligns with and is supported by the provided reference.", - "criteria": "Groundedness: The degree of factual and contextual overlap between the response and the reference.", - "ratingRubric": [ - { - "rating": 5, - "rule": "Fully grounded — the response completely aligns with and is fully supported by the reference." - }, - { - "rating": 4, - "rule": "Mostly grounded — the response largely aligns with the reference with only minor deviations." - }, - { - "rating": 3, - "rule": "Somewhat grounded — the response partially aligns, but some details are missing or loosely connected." - }, - { - "rating": 2, - "rule": "Poorly grounded — the response contains minimal overlap with the reference." - }, - { - "rating": 1, - "rule": "Not grounded — the response has no meaningful overlap with the reference." - } - ], - "includeProperties": ["reference","response"] - } - } - } -} - -``` - -You will receive: - -```json -"id": "" -``` - -This metric ID can be directly passed into the evaluation configuration. 
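Regardless of the tool used to register it, the returned metric ID is simply appended to the comma-separated `metrics` value used later in the evaluation configuration. A small illustration (the custom metric ID below is a made-up placeholder):

```python
system_metrics = "answer_relevance"   # system-defined metric name
custom_metric_id = "0a1b2c3d"         # hypothetical ID returned by the POST above

# Combine system metrics and custom metric IDs, skipping empty entries.
metrics_value = ",".join(v for v in [system_metrics, custom_metric_id] if v)
print(metrics_value)  # → answer_relevance,0a1b2c3d
```

Filtering out empty entries avoids a stray leading or trailing comma when only one of the two kinds of metrics is used.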
- -[OPTION END] - -### Define and Create Evaluation Configurations - -[OPTION BEGIN [SAP AI Launchpad]] - -Once your dataset artifact is registered and you have completed Option 1 (Prompt Template) or Option 2 (Orchestration Registry), the next step is to create an Evaluation Configuration. - -An Evaluation Configuration tells SAP AI Core: - - - which dataset to evaluate - - - which prompt/model or orchestration config to use - - - which metrics to compute - - - which orchestration deployment endpoint to call - - - how many repetitions to run - - - which test dataset file to load - -This configuration becomes the blueprint for your evaluation execution. - -**Steps to Create Evaluation Configuration** - -In Additional Configuration - -- Set **Number of Repetitions** to `1`. -- Choose an existing deployment for **Orchestration Endpoint**. - - ![img](img/image_29.png) ---- - -#### Final Review & Start - -- Review all the details on the summary page. -- Once confirmed, click **Create** to start the evaluation job. - -![img](img/image_40.png) - -> ✅ You have now successfully configured and triggered a Generative AI Evaluation. - -[OPTION END] - -[OPTION BEGIN [Python]] - -When using the Python notebook, the evaluation configuration is created automatically based on your selections. -Before creating the configuration, the notebook will: - - - Load the dataset artifact ID - - - Resolve metric IDs (system + custom) - - - Load prompt template or orchestration registry IDs - - - Validate all required parameters - -**Choose Configuration Mode (Option 1 or Option 2)** - -The notebook provides a simple UI with two checkboxes: - -**Option 1 – Prompt Template + Models** - -**Option 2 – Orchestration Registry** - -You must select only one. 
- -The notebook ensures mutual exclusivity and stores your selection in the variable: - -```python -from ipywidgets import Checkbox, VBox, HBox, Output, Label, Layout -from IPython.display import display -import textwrap - -# --- Selection state --- -approach = None -suppress_update = False - -# --- Define options --- -flag_options = [ - "prompt_registry", - "orchestration_registry" -] - -# --- Output widget to show current selection --- - -output = Output(layout=Layout(border="1px solid black", height="70px", overflow="auto", width="900px")) - - -# --- Handler for checkbox changes --- -def on_flag_change(change): - global approach, suppress_update - if suppress_update: - return - - if change["new"]: # A checkbox was checked - suppress_update = True - # Uncheck all other checkboxes - for cb in checkboxes: - if cb.description != change["owner"].description: - cb.value = False - suppress_update = False - approach = change["owner"].description - else: - # Only clear if the unchecked one was the currently selected - if approach == change["owner"].description: - approach = None - - # Update display once per action - with output: - output.clear_output(wait=True) - msg = f"Selected approach: {approach or 'None'}" - wrapped = textwrap.fill(msg, width=60) - output.append_stdout(wrapped + "\n") - -# --- Create checkboxes --- -checkboxes = [ - Checkbox(value=False, description=option, layout=Layout(width="250px")) - for option in flag_options -] - -# --- Attach event handler --- -for cb in checkboxes: - cb.observe(on_flag_change, names="value") - -# --- Display UI --- -header = Label( - value="Please select the configuration mode:", - layout=Layout(margin="10px 0px 10px 0px") -) -ui = VBox([header, HBox(checkboxes), output]) -display(ui) -``` - -This value determines which fields are passed later: - - - If approach == "prompt_registry" → notebook passes promptTemplate + models - - - If approach == "orchestration_registry" → notebook passes orchestrationRegistryIds - -#### 
Create Configuration Body

The notebook builds the configuration using the required SAP AI Core fields:

- scenarioId

- executableId

- dataset artifact binding

- selected metrics

- test dataset details

- repetitions

- orchestration deployment URL

- and Option 1 or Option 2 fields, depending on the chosen approach.

The following function dynamically creates the configuration body for AI Core.

```python
# creating an AI Core Configuration
import requests

request_body = {
    "name": "genai-eval-conf",
    "scenarioId": "genai-evaluations",
    "executableId": "genai-evaluations-simplified",
    "inputArtifactBindings": [
        {
            "key": "datasetFolder",
            "artifactId": artifact_id
        }
    ],
    "parameterBindings": [
        {
            "key": "repetitions",
            "value": repetitions
        },
        {
            "key": "orchestrationDeploymentURL",
            "value": orchestration_deployment_url
        },
        {
            "key": "metrics",
            "value": metrics_list
        },
        {
            "key": "testDataset",
            "value": test_datasets
        },
        {
            "key": "promptTemplate",
            "value": prompt_template_id if approach == "prompt_registry" else ""
        },
        {
            "key": "models",
            "value": models_list if approach == "prompt_registry" else ""
        },
        {
            "key": "orchestrationRegistryIds",
            "value": orchestration_registry_id if approach == "orchestration_registry" else ""
        }
    ]
}

def create_aicore_configuration():
    headers = _get_headers()
    GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'
    request_url = f"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}"
    try:
        response = requests.post(
            request_url, headers=headers, data=json.dumps(request_body), timeout=120
        )
        if response.status_code != 201:
            print(response.json())
            raise RuntimeError(f"Unexpected status code: {response.status_code}")
        result = response.json()
        print(result)
        return result['id']
    except Exception:
        logging.error("Error occurred while attempting to create a Configuration")
        raise

configuration_id = create_aicore_configuration()
```

**Sample parameter setup:**

```python
import json

test_data_path = f"testdata/{DATASET_NAME}"  # specify the test data path here; for the full folder, "testdata" alone will work
test_datasets = json.dumps({'path': test_data_path, 'type': 'csv'})
# Combine system metric names and custom metric IDs, skipping empty entries
metrics_list = ",".join(v for v in [selected_metrics_str, custom_metric_ids_str] if v)
models_list = selected_models_str
print(f"Selected metrics: {metrics_list}")
print(f"Selected models: {models_list}")
orchestration_deployment_url = ""
repetitions = "1"
```

You will receive a configuration ID, which is required for the next step (Execution).

![img](img/image_py_con.png)

[OPTION END]

[OPTION BEGIN [Bruno]]

When creating an Evaluation Configuration through Bruno, you call:

```bash
POST {{api_url}}/v2/lm/configurations
```

There is no separate endpoint per option; instead, you choose between:

**Option 1 — Prompt Template + Models**

**Option 2 — Orchestration Registry**

based on which fields you include in your request body.

| Option Selected | Fields You Must Pass |
| ------------------------------------- | -------------------------- |
| **Option 1 – Prompt Template** | `promptTemplate`, `models` |
| **Option 2 – Orchestration Registry** | `orchestrationRegistryIds` |

All other fields (metrics, testDataset, repetitions, orchestrationDeploymentURL) remain the same across both options.

Below are the sample request bodies for each option.
- -#### Option 1 — Using Prompt Template + Models - -```json -{ - "name": "genai-eval-conf", - "scenarioId": "genai-evaluations", - "executableId": "genai-evaluations-simplified", - "inputArtifactBindings": [ - { - "key": "datasetFolder", - "artifactId": "{{artifactId}}" - } - ], - "parameterBindings": [ - { - "key": "repetitions", - "value": "1" - }, - { - "key": "orchestrationDeploymentURL", - "value": "{{deployment_url}}" - }, - { - "key": "metrics", - "value": "language_match" - }, - { - "key": "testDataset", - "value": "{\"path\": \"testdata/{{dataset_file}}\", \"type\": \"csv\"}" - }, - { - "key": "promptTemplate", - "value": "{{prompt_template_id}}" - }, - { - "key": "models", - "value": "{{model_name}}:{{model_version}}" - } - ] -} -``` -![img](img/image-br03.png) - -#### Option 2 — Using Orchestration Registry - -```json -{ - "name": "genai-eval-conf", - "scenarioId": "genai-evaluations", - "executableId": "genai-evaluations-simplified", - "inputArtifactBindings": [ - { - "key": "datasetFolder", - "artifactId": "{{artifactId}}" - } - ], - "parameterBindings": [ - { - "key": "repetitions", - "value": "1" - }, - { - "key": "orchestrationDeploymentURL", - "value": "{{deployment_url}}" - }, - { - "key": "metrics", - "value": "language_match" - }, - { - "key": "testDataset", - "value": "{\"path\": \"testdata/{{dataset_file}}\", \"type\": \"csv\"}" - }, - { - "key": "orchestrationRegistryIds", - "value": "{{orchestration_registry_id}}" - } - ] -} -``` - -![img](img/image-br06.png) - -[OPTION END] - -### Create and Run Evaluation Execution - -After creating the Evaluation Configuration, the next step is to execute it. 
Execution triggers the evaluation workflow, which:

- Reads the test dataset

- Generates submissions to the orchestration service

- Collects model outputs

- Computes all selected metrics

- Produces aggregate and raw evaluation results

The process is identical for SAP AI Launchpad, Python, and Bruno; only the invocation method differs.

[OPTION BEGIN [SAP AI Launchpad]]

- Once the evaluation configuration is created, the system automatically triggers an evaluation execution.

- Follow these steps to monitor its progress and verify completion:

    - Navigate to **ML Operations** in the SAP AI Core Launchpad.

    - In the sidebar, click **Executions**.

    ![img](img/image_41.png)

    - Locate the most recent execution triggered by your evaluation configuration. You can use the timestamp or configuration name to identify it.

    - Click on the execution entry to open its details. The Current Status will update as the process runs.

    ![img](img/image_31.png)

- Once the Current Status reaches **COMPLETED**, your evaluation has successfully finished.

> [For more information](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/create-evaluation)

**Track Execution Status**

The execution page will show:

- Unknown

- Pending

- Running

- Completed

Once completed, you can navigate to:

- Outputs → Tracking Metrics (aggregate results)

- Output Artifacts (raw results stored in the SQLite DB)

[OPTION END]

[OPTION BEGIN [Python]]

Once the configuration is ready, the next step is to trigger an execution.
An execution is a single evaluation run based on the configuration you defined.
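Across all three tools, the status handling reduces to one rule: only `COMPLETED` and `DEAD` are terminal states; everything else means the run is still in progress. A tiny illustrative helper (the function name is not part of the notebook):

```python
TERMINAL_STATES = {"COMPLETED", "DEAD"}

def is_terminal(status: str) -> bool:
    # UNKNOWN, PENDING and RUNNING are transient; COMPLETED and DEAD are final.
    return status in TERMINAL_STATES

for s in ["UNKNOWN", "PENDING", "RUNNING", "COMPLETED"]:
    print(s, is_terminal(s))
```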
**Create Execution**

The following function starts the evaluation in SAP AI Core using the configuration ID:

```python
# Trigger an execution with the created configuration
import requests

def create_execution():
    headers = _get_headers()
    GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions'
    request_url = f"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}"
    request_body = {"configurationId": configuration_id}
    try:
        response = requests.post(
            request_url, headers=headers, data=json.dumps(request_body), timeout=120
        )
        print("response received is ", response)
        result = response.json()
        print(result)
        return result['id']
    except Exception:
        logging.error("Error occurred while attempting to create an execution")
        raise

execution_id = create_execution()
```
![img](img/image_44.png)

#### Monitor Execution Status

The execution progresses through the states:

UNKNOWN → PENDING → RUNNING → COMPLETED

```python
# Get execution status
import requests

def get_execution_status(execution_id):
    headers = _get_headers()
    LOG_EXECUTIONS_ENDPOINT = f'/v2/lm/executions/{execution_id}'
    request_url = f"{AICORE_BASE_URL}{LOG_EXECUTIONS_ENDPOINT}"
    try:
        response = requests.get(
            request_url, headers=headers, timeout=120
        )
        print("response received is ", response)
        result = response.json()
        return result
    except Exception:
        logging.error("Error occurred while attempting to get execution status")
        raise

get_execution_status(execution_id)
```

#### Automatic Polling

To continuously monitor until the evaluation finishes:

```python
# Poll the execution status until it is COMPLETED or DEAD, or a timeout occurs
import time

def poll_execution_status(execution_id, timeout_minutes=1800, poll_interval=30):
    start_time = time.time()
    while True:
        result = get_execution_status(execution_id)
        print(f"Execution Status: {result.get('status')}")
        if result.get("status") == "COMPLETED":
            print(f"Execution completed successfully in {time.time() - start_time} seconds,
proceed to fetch results.") - break - if result.get("status") == "DEAD": - print(f"Execution failed with status DEAD in {time.time() - start_time} seconds. Check the logs for more details.") - break - if time.time() - start_time > timeout_minutes * 60: - raise TimeoutError(f"Execution status polling timed out after {timeout_minutes} minutes.") - time.sleep(poll_interval) - -``` - -![img](img/image_45.png) - -✅ Once the execution status shows COMPLETED, the evaluation results are available and can be analyzed in the next step. - -[OPTION END] - -[OPTION BEGIN [Bruno]] - -After creating the configuration, the next step is to trigger the evaluation workload by creating an AI Core execution. - -**Create an Execution with the Created Configuration** - -- Click on create execution under executions, pass the configuration id created in previous step - -![img](img/image-br04.png) - -- The status field progresses through different states over time: -UNKNOWN → PENDING → RUNNING → COMPLETED. - -**Get Execution Status** - -check the status of created execution by passing the execution ID, The Current Status will update as the process runs. please refer the below image - -![img](img/image-br05.png) - -[OPTION END] - -### View and Analyze Evaluation Results - -Once the evaluation execution is complete, SAP AI Core generates both aggregated metrics and detailed instance-level results. -These results help compare model performance, understand quality metrics, and debug issues. - -[OPTION BEGIN [SAP AI Launchpad]] - -Once the evaluation workflow execution is completed, this step retrieves the aggregated evaluation metrics from the SAP AI Core service by specifying the run name. - -1. Go to Evaluations → Executions - -2. Select your execution - -3. Open the Metrics tab to view: - - - average latency - - - token usage - - - metric scores - -4. 
Open the Artifacts tab to download:

- the complete result folder

- the SQLite DB for deeper analysis

This is the easiest way to visually inspect evaluation outcomes and compare multiple model runs.

![img](img/image_46.png)

[OPTION END]

[OPTION BEGIN [Python]]

The notebook includes utility scripts to retrieve aggregated metrics, download detailed artifacts, and inspect the SQLite results. The Tracking API returns all metric values per evaluated run, which the notebook then:

- Converts into a DataFrame

- Creates a pivot table

- Prepares for ranking and scoring

**Retrieve Aggregate Metrics (Tracking API)**

Aggregated metrics summarize performance across all test samples.
To fetch them using the execution ID:

```python
# Get aggregate metrics using the execution ID
import pandas as pd
from IPython.display import HTML

def get_model_from_run(run):
    for tag in run.get("tags", []):
        if tag.get("name") == "evaluation.ai.sap.com/model":
            return tag.get("value")

def aggregate_metrics_by_model(runs_list):
    transformed_data = []
    for run in runs_list:
        model = get_model_from_run(run)
        for metric in run["metrics"]:
            output_json = {
                "model": model,
                "metrics_name": metric.get("name"),
                "metric_value": metric.get("value")
            }
            transformed_data.append(output_json)
    return transformed_data


def create_metrics_pivot_table(transformed_data):
    """
    Creates a pivot table where rows are models and columns are metrics.
- - Args: - transformed_data: List of dictionaries with 'model', 'metrics_name', 'metric_value' - - Returns: - DataFrame with models as rows and metrics as columns - """ - # Convert list of dictionaries to DataFrame - df = pd.DataFrame(transformed_data) - - # Create pivot table - pivot_table = df.pivot_table( - index='model', - columns='metrics_name', - values='metric_value', - aggfunc='first' # Use 'first' to get the single value, or 'mean' if there are duplicates - ) - - return pivot_table - -transformed_data = aggregate_metrics_by_model(runs_data['resources']) -metrics_pivot = create_metrics_pivot_table(transformed_data) - -HTML(metrics_pivot.to_html()) -``` -![img](img/image_47.png) - -You can also retrieve using run name: - -```bash -{base_url}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/run-name={run_name} -``` - -**Download Raw Results (Output Artifact)** - -All detailed evaluation outputs are stored as an output artifact in your object store. To download all output files programmatically: - -```python -# download the result artifacts from Object store. -import boto3 - -def download_all_objects(prefix, destination_folder): - """ - Recursively download all objects from an S3 bucket starting with a specific prefix. - - :param bucket_name: Name of the S3 bucket. - :param prefix: Prefix to filter objects in the bucket. - :param destination_folder: Local folder to save the downloaded files. 
- """ - s3_client = boto3.client( - 's3', - aws_access_key_id=AWS_ACCESS_KEY, - aws_secret_access_key=AWS_SECRET_ACCESS_KEY, - region_name=AWS_REGION - ) - - # Ensure the destination folder exists - if not os.path.exists(destination_folder): - os.makedirs(destination_folder) - - # Paginate through objects - paginator = s3_client.get_paginator('list_objects_v2') - pages = paginator.paginate(Bucket=AWS_BUCKET_ID, Prefix=prefix) - - for page in pages: - if 'Contents' in page: - for obj in page['Contents']: - key = obj['Key'] - local_file_path = os.path.join(destination_folder, os.path.relpath(key, prefix)) - - # Ensure the local directory structure exists - local_directory = os.path.dirname(local_file_path) - if not os.path.exists(local_directory): - os.makedirs(local_directory) - - # Download the object - print(f"Downloading {key} to {local_file_path}") - s3_client.download_file(AWS_BUCKET_ID, key, local_file_path) - - -# Download the evaluation results from the object store. Look at execution status under "outputArtifacts" key to see the 'url' -# which shows the data path of where your output results are stored -EXECUTION_ID = execution_id -sqlite_db_prefix = f'{EXECUTION_ID}/evaluation_result/' # change the prefix based on where your output artifact is stored in the bucket. -destination_folder = 'results-new' - -download_all_objects(sqlite_db_prefix, destination_folder) -``` - -![img](img/image_48.png) - -**View Detailed Results (SQLite DB)** - -The evaluation stores detailed instance-level results in results.db. - -Example: Reading SQLite tables: - -```python -# viewing the results from sqlite db in tabular format.. 
import sqlite3
import pandas as pd
from IPython.display import display, HTML

# Path to your SQLite database file
db_file = 'results-new/results.db'

connection = sqlite3.connect(db_file)

# Specify the table names you want to display
table_names = ['run', 'configuration', 'submission', 'submission_result', 'evaluation_result']

# Create the CSS and HTML container
html_content = """
<style>
.table-container { border-collapse: collapse; }
.table-container th, .table-container td { border: 1px solid #ccc; padding: 4px 8px; }
</style>
<div>
"""

for table_name in table_names:
    query = f"SELECT * FROM {table_name};"
    df = pd.read_sql_query(query, connection)
    # If you want to see all the rows across all tables, remove/comment the next line
    df = df.head(10)  # Limiting the number of rows displayed
    table_html = df.to_html(classes='table-container', index=False)
    html_content += f"""
    <div>
        <h3>Table: {table_name}</h3>
        {table_html}
    </div>
    """

html_content += "</div>"

display(HTML(html_content))

# Close the connection
connection.close()
```

**Process and Rank Models (Optional Python Helpers)**

The notebook includes post-processing utilities that:

- normalize numeric metrics

- process boolean and categorical metrics

- compute weighted scores

- generate a final ranking to identify the best model

```python
import pandas as pd
from IPython.display import HTML

# Scoring logic depends on "scoring_type"
# "weight" represents the relative weight of this metric to all SELECTED metrics
METRICS_SCORING_TYPE_MAPPING = {
    "Content Filter on Input": {
        "scoring_type": "bool-false",  # False is good
        "weight": 1
    },
    "Content Filter on Output": {
        "scoring_type": "bool-false",  # False is good
        "weight": 1
    },
    "Pointwise Instruction Following": {
        "scoring_type": "num_1_to_5",
        "weight": 1
    },
    "Pointwise Answer Relevance": {
        "scoring_type": "num_1_to_5",
        "weight": 1
    },
    "Pointwise Conciseness": {
        "scoring_type": "num_1_to_5",
        "weight": 1
    },
    "Pointwise Correctness": {
        "scoring_type": "num_1_to_5",
        "weight": 1
    },
    "BLEU": {
        "scoring_type": "num_0_to_1",
        "weight": 1
    },
    "ROUGE": {
        "scoring_type": "num_0_to_1",
        "weight": 1
    },
    "BERT Score": {
        "scoring_type": "F1/Precision/Recall num_0_to_1",
        "weight": 1
    }
}

def calculate_bool_metric_score(pivot_df, metric_base_name, true_is_good):
    """
    Calculate scores for boolean metrics based on False/True counts.
- - Args: - pivot_df: DataFrame with models as rows and metrics as columns - metric_base_name: Base name of the metric (without /False/count or /True/count) - true_is_good: Boolean indicating if True is considered a good outcome - - Returns: - Series with boolean metric scores per model (scaled to -1 to 1) - """ - false_col = f"{metric_base_name}/False/count" - true_col = f"{metric_base_name}/True/count" - - false_values = pivot_df[false_col] if false_col in pivot_df.columns else 0 - true_values = pivot_df[true_col] if true_col in pivot_df.columns else 0 - total_values = true_values + false_values - - score = ((false_values * 1) + (true_values * -1)) / total_values - - if true_is_good: - score = 0 - score - - return score - -def calculate_numeric_metric_score(pivot_df, metric_base_name, range_min=0, range_max=1): - """ - Calculate scores for numeric metrics with /mean - The mean is normalized to a score between -1 and 1 using the provided range. - - Args: - pivot_df: DataFrame with models as rows and metrics as columns - metric_base_name: Base name of the metric (without suffixes) - range_min: Minimum possible value of the metric - range_max: Maximum possible value of the metric - - Returns: - Series with numeric metric scores per model (scaled to -1 to 1) - """ - mean_col = f"{metric_base_name}/mean" - - if mean_col not in pivot_df.columns: - return pd.Series(0.0, index=pivot_df.index) - - mean_values = pivot_df[mean_col] - - # Linear normalization from [range_min, range_max] to [0, 1] - normalized = (mean_values - range_min) / (range_max - range_min) - - # Scale to [-1, 1] - score = (normalized * 2) - 1 - - return score - -def calculate_bert_score(pivot_df, metric_base_name): - """ - Calculate BERT Score by averaging F1, Precision, and Recall scores. 
- - Args: - pivot_df: DataFrame with models as rows and metrics as columns - metric_base_name: Base name "BERT Score" - - Returns: - Series with BERT scores per model (scaled to -1 to 1) - """ - f1_col = f"{metric_base_name}/F1/mean" - precision_col = f"{metric_base_name}/Precision/mean" - recall_col = f"{metric_base_name}/Recall/mean" - - scores = [] - for col in [f1_col, precision_col, recall_col]: - if col in pivot_df.columns: - scores.append(pivot_df[col]) - - if not scores: - return pd.Series(0.0, index=pivot_df.index) - - # Average the three metrics (already in 0 to 1 range) - avg_score = sum(scores) / len(scores) - - # Scale to [-1, 1] - score = (avg_score * 2) - 1 - - return score - -def find_unique_metrics_in_pivot(pivot_df): - """ - Identify unique metric base names present in the pivot table. - - Args: - pivot_df: DataFrame with models as rows and metrics as columns - """ - # Extract unique metric names from pivot table columns - unique_metrics = set() - for col in pivot_df.columns: - # Extract base metric name by removing suffixes - base_name = col - for suffix in ['/False/count', '/True/count', '/F1_score/mean','/Precision_score/mean', - '/Recall_score/mean','/mean','/median', '/p90', '/p95', '/stddev']: - if suffix in base_name and "BERT Score" not in base_name: - base_name = base_name.replace(suffix, '') - unique_metrics.add(base_name) - break - if base_name.startswith("BERT Score/"): - base_name = "BERT Score" - unique_metrics.add(base_name) - if not unique_metrics: - raise ValueError("No valid metrics found in pivot table") - return unique_metrics - - -def rank_models(pivot_df, unique_metrics=None): - """ - Rank models based on metrics present in the pivot table. 
- - Args: - pivot_df: DataFrame with models as rows (index) and metrics as columns - - Returns: - DataFrame with model rankings and scores - """ - # Calculate total weight for metrics present in pivot table - total_weight = sum(METRICS_SCORING_TYPE_MAPPING[m]["weight"] for m in unique_metrics) - - # Initialize total score - total_scores = pd.Series(0.0, index=pivot_df.index) - - # Process each metric found in the pivot table - for metric_name in unique_metrics: - config = METRICS_SCORING_TYPE_MAPPING[metric_name] - scoring_type = config["scoring_type"] - weight = config["weight"] / total_weight - - if scoring_type == "bool-false": - # False is good (True is bad) - metric_score = calculate_bool_metric_score(pivot_df, metric_name, true_is_good=False) - total_scores += metric_score * weight - - elif scoring_type == "bool-true": - # True is good (False is bad) - metric_score = calculate_bool_metric_score(pivot_df, metric_name, true_is_good=True) - total_scores += metric_score * weight - - elif scoring_type == "num_1_to_5": - metric_score = calculate_numeric_metric_score(pivot_df, metric_name, range_min=1, range_max=5) - total_scores += metric_score * weight - - elif scoring_type == "num_0_to_1": - metric_score = calculate_numeric_metric_score(pivot_df, metric_name, range_min=0, range_max=1) - total_scores += metric_score * weight - - elif scoring_type == "F1/Precision/Recall num_0_to_1": - # BERT Score - metric_score = calculate_bert_score(pivot_df, metric_name) - total_scores += metric_score * weight - - # Create results DataFrame - results_df = pd.DataFrame({ - 'model': pivot_df.index, - 'total_score': total_scores.values - }) - - # Rank models (higher score = better rank) - results_df['rank'] = results_df['total_score'].rank(ascending=False, method='min').astype(int) - results_df = results_df.sort_values('rank') - - return results_df - -def get_detailed_scores(pivot_df, unique_metrics): - """ - Get detailed breakdown of scores per metric for each model. 
- - Args: - pivot_df: DataFrame with models as rows and metrics as columns - - Returns: - DataFrame with detailed scores per metric - """ - detailed_scores = pd.DataFrame(index=pivot_df.index) - - # Process each metric in the mapping - for metric_name in unique_metrics: - scoring_type = METRICS_SCORING_TYPE_MAPPING[metric_name]["scoring_type"] - - if scoring_type == "bool-false": - detailed_scores[f"{metric_name}_score"] = calculate_bool_metric_score(pivot_df, metric_name, true_is_good=False) - - elif scoring_type == "bool-true": - detailed_scores[f"{metric_name}_score"] = calculate_bool_metric_score(pivot_df, metric_name, true_is_good=True) - - elif scoring_type == "num_1_to_5": - detailed_scores[f"{metric_name}_score"] = calculate_numeric_metric_score(pivot_df, metric_name, range_min=1, range_max=5) - - elif scoring_type == "num_0_to_1": - detailed_scores[f"{metric_name}_score"] = calculate_numeric_metric_score(pivot_df, metric_name, range_min=0, range_max=1) - - elif scoring_type == "F1/Precision/Recall num_0_to_1": - detailed_scores[f"{metric_name}_score"] = calculate_bert_score(pivot_df, metric_name) - - return detailed_scores - -unique_metrics = find_unique_metrics_in_pivot(metrics_pivot) - -# Get detailed scores breakdown -detailed = get_detailed_scores(metrics_pivot, unique_metrics) -display(HTML(detailed.to_html())) - -# Rank models -ranking = rank_models(metrics_pivot, unique_metrics) -display(HTML(ranking.to_html())) -``` -This provides a clear ranking of models based on the metrics you selected during evaluation. 
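As a quick sanity check of the scoring logic above, the linear mapping from a metric mean to a score in [-1, 1] can be reproduced in isolation. This is a standalone sketch: `normalize_mean` and the sample means are illustrative only, mirroring the body of `calculate_numeric_metric_score`:

```python
# Standalone check of the normalization used in calculate_numeric_metric_score:
# a metric mean is mapped linearly from [range_min, range_max] onto [-1, 1].
import pandas as pd

def normalize_mean(mean_values, range_min, range_max):
    normalized = (mean_values - range_min) / (range_max - range_min)
    return normalized * 2 - 1

# Hypothetical "Pointwise Correctness/mean" values on the 1-to-5 scale.
means = pd.Series({"model-a": 4.0, "model-b": 1.0, "model-c": 5.0})
scores = normalize_mean(means, range_min=1, range_max=5)
print(scores.to_dict())  # {'model-a': 0.5, 'model-b': -1.0, 'model-c': 1.0}
```

A mean of 3.0 (the midpoint of a 1-to-5 scale) maps to 0, so positive scores indicate above-midpoint performance.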
- -![img](img/image_py_rk.png) - -[OPTION END] - -[OPTION BEGIN [Bruno]] - -Retrieve Aggregate Metrics - -Send a GET request: - -**GET** -```bash -{{apiurl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={{execution_id}} -``` -or using dataset run name: - -**GET** -```bash -{{apiurl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/run-name={{run_name}} -``` - -This returns aggregated values for: - - - latency - - - token usage - - - metric scores - - - completion count - -**Download Raw Results** - -1. Open the execution details - -2. Copy the output artifact URL - -3. Download the folder to obtain - - - step-wise results - - - sqlite_combined/results.db - -**Inspect Detailed Results** - -Open the SQLite DB in any client to inspect: - - - submissions - - - completion responses - - - evaluation_results (raw metric scores) - - - aggregation_results - - - custom_logs - -![img](img/image_49.png) - -[OPTION END] - -### Delete Evaluation Artifacts, Configurations & Metrics - -Over time, your workspace may accumulate old configurations, executions, and metrics. -SAP AI Core allows you to safely delete these resources once they are no longer needed. - -This section explains how to delete: - - - Evaluation Executions - - - Evaluation Configurations - - - Custom Metrics (if created) - -⚠️ Important: - -Deletions are permanent and cannot be undone. -System-defined metrics cannot be deleted — only your custom metrics. - -[OPTION BEGIN [SAP AI Launchpad]] - -**Delete Executions** - -1. Go to Evaluations → Executions - -2. Select the execution - -3. Click Delete - -4. Confirm the deletion - -**Delete Evaluation Configurations** - -1. Go to Evaluations → Configurations - -2. Select the configuration you created - -3. Click Delete - -[OPTION END] - -[OPTION BEGIN [Python]] - -**1. 
Delete an Evaluation Execution** - -```python -# Delete an execution by its ID -def delete_execution(): - headers = _get_headers() - EXEC_ID = execution_id - GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions/' - request_url = f"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}{EXEC_ID}" - try: - response = requests.delete( - request_url, headers=headers, timeout=120 - ) - print(response) - if response.status_code != 202: - raise RuntimeError(f"Unexpected status code: {response.status_code}") - result = response.json() - print(result) - except Exception: - logging.error("Error occurred while attempting to delete the Execution") - raise - -delete_execution() -``` -**2. Delete an Evaluation Configuration** - -```python -def delete_configuration(configuration_id): - headers = _get_headers() - endpoint = f"/v2/lm/configurations/{configuration_id}" - url = f"{AICORE_BASE_URL}{endpoint}" - - response = requests.delete(url, headers=headers) - print("Status:", response.status_code) - print(response.text) - -# Example: -delete_configuration(configuration_id) -``` - -**3. Delete a Custom Metric** - -```python -def delete_metric(metric_id): - headers = _get_headers() - endpoint = f"/v2/lm/evaluationMetrics/{metric_id}" - url = f"{AICORE_BASE_URL}{endpoint}" - - response = requests.delete(url, headers=headers) - print("Status:", response.status_code) - print(response.text) - -# Example: -delete_metric(metric_id) -``` - -[OPTION END] - -[OPTION BEGIN [Bruno]] - -**1. Delete Execution** - -**DELETE Request** -```bash -{{apiurl}}/v2/lm/executions/{{execution_id}} -``` -**Headers:** -``` -Authorization: Bearer {{access_token}} -AI-Resource-Group: {{resource_group}} -``` -**2. Delete Configuration** - -```bash -DELETE {{apiurl}}/v2/lm/configurations/{{configuration_id}} -``` - -**3. 
Delete Custom Metric** - -```bash -DELETE {{apiurl}}/v2/lm/evaluationMetrics/{{metric_id}} -``` - -[OPTION END] +--- +parser: v2 +auto_validation: true +time: 45 +primary_tag: software-product>sap-ai-core +tags: [ tutorial>beginner, topic>artificial-intelligence, topic>machine-learning, software-product>sap-ai-core ] +author_name: Smita Naik +author_profile: https://github.com/I321506 +--- + +# Generative AI Custom Evaluation - Quickstart + This tutorial demonstrates how to use SAP AI Core Custom Evaluation to benchmark Large Language Models (LLMs) using **Prompt Registry**. It guides you through environment setup, configuration creation, execution, and result analysis in a unified and simplified workflow. + +## You will learn +- How to prepare and organize datasets for evaluation. +- How to configure and run evaluations in SAP AI Core. +- How to analyze and interpret aggregated evaluation results. + +## Prerequisites +1. **BTP Account** + Set up your SAP Business Technology Platform (BTP) account. + [Create a BTP Account](https://developers.sap.com/group.btp-setup.html) +2. **For SAP Developers or Employees** + Internal SAP stakeholders should refer to the following documentation: [How to create BTP Account For Internal SAP Employee](https://me.sap.com/notes/3493139), [SAP AI Core Internal Documentation](https://help.sap.com/docs/sap-ai-core) +3. **For External Developers, Customers, or Partners** + Follow this tutorial to set up your environment and entitlements: [External Developer Setup Tutorial](https://developers.sap.com/tutorials/btp-cockpit-entitlements.html), [SAP AI Core External Documentation](https://help.sap.com/docs/sap-ai-core?version=CLOUD) +4. **Create BTP Instance and Service Key for SAP AI Core** + Follow the steps to create an instance and generate a service key for SAP AI Core: + [Create Service Key and Instance](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/create-service-key?version=CLOUD) +5. 
**AI Core Setup Guide** + Step-by-step guide to set up and get started with SAP AI Core: + [AI Core Setup Tutorial](https://developers.sap.com/tutorials/ai-core-setup.html) +6. An Extended SAP AI Core service plan is required, as the Generative AI Hub is not available in the Free or Standard tiers. For more details, refer to +[SAP AI Core Service Plans](https://help.sap.com/docs/sap-ai-core/sap-ai-core-service-guide/service-plans?version=CLOUD) +7. **Orchestration Deployment** + Ensure at least one orchestration deployment is ready to be consumed during this process. +Refer to [this tutorial](https://developers.sap.com/tutorials/ai-core-orchestration-consumption.html) to understand the basic consumption of GenAI models using orchestration. +8. **Basic Knowledge** + Familiarity with the orchestration workflow is recommended. +9. **Install Dependencies** + Install the required Python packages using the requirements.txt file provided. +Download [requirements.txt](img/requirements.txt) + +💡 Right-click the link above and choose **"Save link as..."** to download it directly. + +## Pre-Read + +This tutorial is designed for users who are new to AI Core services and do not need much flexibility in their use case. It is set up so that the evaluation is configured automatically; at minimum, only a dataset is required. +It demonstrates a simplified quick-start workflow for using AI Core's custom evaluation capabilities to benchmark Large Language Models (LLMs) and evaluate different prompts for a specific use case. It utilizes the public [MedicationQA dataset](https://langtest.org/docs/pages/benchmarks/medical/medicationqa/) to showcase how to compute industry-standard metrics and assess the reliability of LLM-generated responses. + +### Environment Variables Setup + +[OPTION BEGIN [SAP AI Launchpad]] + +- Navigate to your SAP AI Core Launchpad. + +- In the Workspaces section, click on "Add" to create a new workspace. 
+ - A workspace in SAP AI Core is a logical container that holds your resources (like models and pipelines) and provides the isolation needed for your projects. + +- When prompted, enter your AI Core credentials (such as Client ID, Client Secret, and Base URL). + - Note: If you're unsure about where to find these credentials, refer to this [guide](https://developers.sap.com/tutorials/ai-core-generative-ai.html#1c4f36d7-f345-4822-be00-c15f133ff7d8). + +- Once the workspace is successfully created, select your desired Resource Group to begin the evaluation process. + +Refer to the screenshot below for guidance: +![img](img/image_34.png) + +[OPTION END] + +[OPTION BEGIN [Python]] + +- Open **Visual Studio Code or Jupyter Notebook**. Create a new file with the .ipynb extension (e.g., custom_evaluation.ipynb). +- Create a **.env** file in the root directory of your project. +- Add your **AI Core** and **AWS credentials** as shown below. + +```env +# AICORE CREDENTIALS +AICORE_CLIENT_ID= +AICORE_CLIENT_SECRET= +AICORE_AUTH_URL= +AICORE_BASE_URL= +AICORE_RESOURCE_GROUP= + +# AWS CREDENTIALS +AWS_ACCESS_KEY= +AWS_BUCKET_ID= +AWS_REGION= +AWS_SECRET_ACCESS_KEY= + +# ORCHESTRATION DEPLOYMENT URL +DEPLOYMENT_URL= +``` + +**Note:** Replace placeholders (e.g., CLIENT_ID, CLIENT_SECRET, etc) with your actual environment credentials. + +Refer to the below screenshot for clarity: +![img](img/image_1.png) + +#### Install Dependencies + +Install the required packages using the [requirements.txt](img/requirements.txt) file you downloaded in the Prerequisites section. 
+```bash +pip install -r requirements.txt +``` +#### Connect to AI Core Instance + +Once the environment variables are set and dependencies are installed, run the following code to connect to your instance: + +```PYTHON +# Loading the credentials from the env file +from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient +from dotenv import load_dotenv +import os + +load_dotenv(override=True) + +# Fetching environment variables +AICORE_BASE_URL = os.getenv("AICORE_BASE_URL") +AICORE_RESOURCE_GROUP = os.getenv("AICORE_RESOURCE_GROUP") +AICORE_AUTH_URL = os.getenv("AICORE_AUTH_URL") +AICORE_CLIENT_ID = os.getenv("AICORE_CLIENT_ID") +AICORE_CLIENT_SECRET = os.getenv("AICORE_CLIENT_SECRET") + +AWS_ACCESS_KEY = os.getenv("AWS_ACCESS_KEY") +AWS_BUCKET_ID = os.getenv("AWS_BUCKET_ID") +AWS_REGION = os.getenv("AWS_REGION") +AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY") +DEPLOYMENT_URL = os.getenv("DEPLOYMENT_URL") + +# Initializing the GenAIHubProxyClient +client = GenAIHubProxyClient( + base_url=AICORE_BASE_URL, + auth_url=AICORE_AUTH_URL, + client_id=AICORE_CLIENT_ID, + client_secret=AICORE_CLIENT_SECRET, + resource_group=AICORE_RESOURCE_GROUP +) +``` + +**NOTE:** +- Ensure the **requirements.txt** installation completes successfully before running the code. +- If you face any issues, recheck your **.env** values and installed packages. + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +- Download the [Bruno_collections](img/AI_Core.json) file. + +- Follow the steps in the [tutorial](https://developers.sap.com/tutorials/ai-core-orchestration-consumption.html) to set up your environment: refer to the step **Set Up Your Environment and Configure Access** and proceed until you have generated a token. + +[OPTION END] + +### Preparing Dataset Files + +[OPTION BEGIN [SAP AI Launchpad]] + +> **Note:** This step involves local setup using Python and does not require any action on the SAP AI Launchpad. 
+ +[OPTION END] + +[OPTION BEGIN [Python]] + +In this step, the evaluation notebook dynamically detects the dataset file from a predefined folder structure. +You are not required to hardcode the dataset filename. + +```Python +import os +import json + +def get_dataset_file_name(folder_path): + """ + Retrieves the name of the first file in the specified folder. + """ + if not os.path.isdir(folder_path): + print(f"The folder path '{folder_path}' does not exist.") + return None + + items_in_folder = os.listdir(folder_path) + + for item in items_in_folder: + item_path = os.path.join(folder_path, item) + if os.path.isfile(item_path): + return item + + print(f"No files were found in the folder '{folder_path}'.") + return None + + +# --- MAIN EXECUTION --- +DATASET_FOLDER = "./DATASET" + +DATASET_NAME = get_dataset_file_name(DATASET_FOLDER) + +if DATASET_NAME: + print(f"Dataset name: {DATASET_NAME}") +else: + print("Missing run or dataset file.") + raise SystemExit("Exiting due to missing run/dataset file.") +``` + +![img](img/image_py_dtst.png) + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +> **Note:** This step involves local setup using Python and does not require any action on Bruno. + +[OPTION END] + +### Registering an Object Store Secret in AI Core + +[OPTION BEGIN [SAP AI Launchpad]] + +- Open the **SAP AI Core Launchpad** and navigate to the **Administration** tab. +- Select the **Object Store** section from the left-hand menu. +- Click on **“Add”** to register a new object store secret. +- Fill in the required bucket details as shown in the screenshot below. + +![img](img/image_33.png) + +In the **Secret** field, use the following structure to provide your AWS credentials: + +```json +{ + "AWS_ACCESS_KEY_ID": "Enter Your value", + "AWS_SECRET_ACCESS_KEY": "Enter Your value" +} +``` + +[OPTION END] + +[OPTION BEGIN [Python]] + +To make your evaluation files available for AI Core orchestration, you need to: + +- Upload them to an object store (e.g., AWS S3). 
+- Register the object store secret in AI Core. + +#### **Set Up Authentication and Headers** + +First, define the authentication headers for AI Core REST API calls. + +```PYTHON +def _get_headers(): + headers = { + "Authorization": client.get_ai_core_token(), + "AI-Resource-Group": AICORE_RESOURCE_GROUP, + "Content-Type": "application/json", + } + return headers +``` + +#### **Register Object Store Secret in AI Core** + +Register your S3 bucket and credentials as a secret. + +```PYTHON +# Register S3 secret with AI Core which will be used as an input source +import requests +import json +import logging + +def delete_oss_secret(oss_name=""): + headers = _get_headers() + + DELETE_SECRETS_ENDPOINT = f'/v2/admin/objectStoreSecrets/{oss_name}' + request_url = f"{AICORE_BASE_URL}{DELETE_SECRETS_ENDPOINT}" + + try: + response = requests.delete(request_url, headers=headers, timeout=120) + if response.status_code == 202: + print(f"Successfully deleted object store secret: {oss_name}") + elif response.status_code == 404: + print(f"Object store secret not found: {oss_name}. 
It may not exist.") + else: + logging.error(f"Failed to delete object store secret: {oss_name}, Status Code: {response.status_code}") + except Exception as e: + logging.error(f"Error occurred while attempting to delete object store secret: {e}") + raise + +def register_oss_secret(oss_name="", path_prefix=""): + headers = _get_headers() + + POST_SECRETS_ENDPOINT = '/v2/admin/objectStoreSecrets' + request_url = f"{AICORE_BASE_URL}{POST_SECRETS_ENDPOINT}" + + request_body = { + "name": oss_name, + "data": { + "AWS_ACCESS_KEY_ID": AWS_ACCESS_KEY, + "AWS_SECRET_ACCESS_KEY": AWS_SECRET_ACCESS_KEY + }, + "type": "S3", + "bucket": AWS_BUCKET_ID, + "endpoint": "s3-eu-central-1.amazonaws.com", + "region": AWS_REGION, + "pathPrefix": path_prefix, + "verifyssl": "0", + "usehttps": "1", + } + try: + response = requests.post( + request_url, headers=headers, data=json.dumps(request_body), timeout=120 + ) + result = response.json() + return result + except: + logging.error("Error occurred while attempting to create object store secret") + raise + +delete_oss_secret(oss_name="default") +delete_oss_secret(oss_name="genai-quick-data-notebook") + +register_oss_secret(oss_name="default", path_prefix="") +register_oss_secret(oss_name="genai-quick-data-notebook", path_prefix="") +``` + +![img](img/image_objsec.png) + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +Generic secrets securely store AWS S3 credentials required for document access + +• Expand **objectStoreSecrets** under admin and select create a secret request + +Use the below payload to create a secret for AWS S3 with NoAuthentication as authentication type. 
+```CODE +{ + "name": "genai-data", + "data": { + "AWS_ACCESS_KEY_ID": "", + "AWS_SECRET_ACCESS_KEY": "" + }, + "type": "S3", + "bucket": "", + "endpoint": "", + "region": "", + "pathPrefix": "" + } +``` +• Ensure that all values in the data dictionary are Base64-encoded, as per AWS S3 credential requirements. + +![img](img/image-br01.png) + +[OPTION END] + +> ⚠️ **Important Note (Must Read)** +> +> - You must **create an object store secret** with a user-defined name (e.g., default) to store **output artifacts** from orchestration runs. This is **mandatory**. +> - For **input artifacts**, you may create additional object store secrets with different names if needed. +> - If a user-defined name (e.g., default) is not configured, orchestration runs will **fail** due to missing output target setup. + + +### Upload and Register Dataset + +[OPTION BEGIN [SAP AI Launchpad]] + +After creating the secret, upload your evaluation files to the S3 bucket and register them as an artifact in AI Core. + +#### **Register Uploaded Files as Artifact in AI Core** + +To register your evaluation dataset with SAP AI Core, you need to upload it as an artifact. Follow the instructions below using the **SAP AI Launchpad UI**. + +--- + +- Open the **SAP AI Core Launchpad**. +- Navigate to the **Generative AI/Optimization/Artifacts** section to create a dataset artifact. + +![img](img/image_19.png) + +- On the **Artifacts** section, click **Add**. + +--- + +- On the **General Information** screen, enter the following: + + - **Select Scenario:** `genai-evaluations` + - **Name:** `genai-eval-test-data` + - **Description:** `Demo artifacts for evaluation flow.` + - **Select Object Store:** `genai-data` + - **Sub-folder path:** `genaiEvaluation/` + + > 💡 Replace `` with your **SAP BTP user ID** or the folder path in your object store where the evaluation files are uploaded. 
+ +- On the **Labels** screen, click **“Add Label”** and provide the following: + + - **Key:** `prompt-evaluation` + - **Value:** `true` + *(Note: The prefix `ext.ai.sap.com/` is automatically pre-filled in the UI.)* + + ![img](img/image_21.png) + +- Review all entered details carefully. +- Click **“Add”** to complete the artifact registration. + +[OPTION END] + +[OPTION BEGIN [Python]] + +After creating the secret, place your evaluation files in the `DATASET` folder, then upload them to S3 and register them as an artifact in AI Core. + +#### **Upload Files to S3 Bucket** +```python +# uploading these files to Object store to register as an artifact inside ai core + +import boto3 +import os +import uuid + +def upload_folder_to_s3(folder_path, bucket_name, s3_prefix=""): + """ + Upload a folder to an S3 bucket recursively. + + :param folder_path: The local folder path to upload. + :param bucket_name: The name of the S3 bucket. + :param s3_prefix: Optional prefix to use for the S3 keys (e.g., subfolder in the bucket). 
+ """ + s3_client = boto3.client( + 's3', + aws_access_key_id=AWS_ACCESS_KEY, + aws_secret_access_key=AWS_SECRET_ACCESS_KEY, + region_name=AWS_REGION + ) + + for root, dirs, files in os.walk(folder_path): + for file_name in files: + print("val of root is ", file_name) + local_path = os.path.join(root, file_name) + # Compute the relative path for the S3 key + relative_path = os.path.relpath(local_path, folder_path) + s3_key = os.path.join(s3_prefix, relative_path).replace("\\", "/") # Ensure S3-compatible paths + print("val of s3 key is ", s3_key) + print(f"Uploading {local_path} to s3://{bucket_name}/{s3_key}") + + # Upload the file + s3_client.upload_file(local_path, bucket_name, s3_key) + +# Example usage +folder_to_upload_testdata = "../DATASET" +user_directory_prefix = "" # replace with your i-number as string here +prefix_guid = user_directory_prefix if user_directory_prefix is not None else str(uuid.uuid4().hex) +s3_testdata_prefix = f"genaiEvaluation/{prefix_guid}/testdata" # Leave empty for root of the bucket + + +upload_folder_to_s3(folder_to_upload_testdata, AWS_BUCKET_ID, s3_testdata_prefix) +input_artifact_path = f"ai://genai-quick-data-notebook/genaiEvaluation/{prefix_guid}" +``` + ![img](img/image_5.png) + +#### **Register Uploaded Files as Artifact in AI Core** + +```Python +import requests +import logging +# Registering the uploaded files from AWS as artifacts to use inside configuration. 
+ +def register_artifact(): + headers = _get_headers() + + GET_ARTIFACTS_ENDPOINT = '/v2/lm/artifacts' + request_url = f"{AICORE_BASE_URL}{GET_ARTIFACTS_ENDPOINT}" + + request_body = { + "labels": [ + { + "key": "ext.ai.sap.com/prompt-evaluation", + "value": "true" + } + ], + "name": "genai-eval-simplified-test-data", + "kind": "other", + "url": input_artifact_path, # input artifact path + "description": "demo artifacts for evaluation flow.", + "scenarioId": "genai-evaluations" + } + try: + response = requests.post( + request_url, headers=headers, data=json.dumps(request_body), timeout=120 + ) + result = response.json() + print(result) + return result['id'] + except Exception: + print("Error occurred while attempting to register artifact") + raise + +artifact_id = register_artifact() +``` +![img](img/image_6.png) + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +Before registering a dataset artifact in Bruno, you must upload your CSV file to the SAP AI Core object store using the Dataset API. + +Bruno cannot upload files directly to S3; therefore, this step is required. + +**Prerequisites** + +- An object store secret must already exist in your resource group. Typically, this is the default secret named **default**. + +- The Dataset API currently supports: + + - S3 object stores only + + - CSV file uploads + +**Upload Your Dataset** + +Use the Dataset API – Upload File request in Bruno: + +```bash +PUT {{ai_api_url}}/v2/lm/dataset/files/{{secretName}}/{{datasetPath}} +``` + +**Headers** + +``` +Authorization: Bearer {{token}} +AI-Resource-Group: {{resourceGroup}} +Content-Type: text/csv +``` + +**Body** + +Upload your .csv file directly as binary in Bruno’s Body. + +Example Path Values: + + - secretName: default + + - datasetPath: testdata/medicalqna_dataset.csv + +![img](img/image_br_dt.png) + +**Note:** + +Save the ai://… URL; you will use this when creating the dataset artifact. 
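For notebook users, the same Dataset API call can also be issued from Python. This is a hedged sketch: it reuses `AICORE_BASE_URL`, `AICORE_RESOURCE_GROUP`, and `client` from the earlier setup cells, mirrors the PUT endpoint shown above, and assumes the response body is JSON containing the stored file's `ai://` URL:

```python
import requests

def dataset_file_url(base_url, secret_name, dataset_path):
    """Build the Dataset API upload URL used in the Bruno request above."""
    return f"{base_url}/v2/lm/dataset/files/{secret_name}/{dataset_path}"

def upload_dataset_csv(local_path, secret_name="default",
                       dataset_path="testdata/medicalqna_dataset.csv"):
    # AICORE_BASE_URL, AICORE_RESOURCE_GROUP and client come from the setup cells.
    url = dataset_file_url(AICORE_BASE_URL, secret_name, dataset_path)
    headers = {
        "Authorization": client.get_ai_core_token(),
        "AI-Resource-Group": AICORE_RESOURCE_GROUP,
        "Content-Type": "text/csv",
    }
    with open(local_path, "rb") as f:
        response = requests.put(url, headers=headers, data=f, timeout=120)
    response.raise_for_status()
    return response.json()  # expected to include the ai://... URL of the file
```

As with the Bruno request, keep the returned `ai://…` URL for the artifact registration step.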
**Register the Dataset Artifact** + +- Click **Register artifact** under lm -> artifacts in the Bruno collection to register the artifact. + +```CODE +{ + "name": "aiconfig", + "kind": "dataset", + "url": "ai://default/testdata/medicalqna_dataset.csv", + "scenarioId": "genai-evaluations" +} +``` +![img](img/image-br02.png) + +[OPTION END] + +### Create a Prompt Template in Prompt Registry + +[OPTION BEGIN [SAP AI Launchpad]] + +A Prompt Template defines: + + - The message roles (system, user, etc.) + + - Variables that get substituted from your dataset (e.g., questions) + + - Optional model configuration (temperature, max tokens, etc.) + +We’ll create a prompt template to guide the model to answer the questions. + +**Create the Prompt Template** + +- In SAP AI Launchpad, go to the left-hand menu and select Generative AI Hub → Prompt Management. + +- Click Templates → Create. + +- This is where you can define reusable templates with variables for evaluations. + +![img](img/image_007.png) + +**Define the Prompt** + +In the Message Blocks section: + +- Add a User role message: +```json +{ + "template": [ + { + "role": "user", + "content": "List the benefits and side effects of the drug in the following consumer health question: {{?question}}." + } + ] +} +``` + +**Configure Variables** + +Scroll down to Variable Definitions and add entries for each variable: + +- question + + - Default Value: leave empty, or provide a fallback value + +This ensures the placeholders are dynamically substituted during evaluation. + +![img](img/image_008.png) + +**Save the Template** + +Click Save Template (top right): + +- Scenario → genai-evaluations + +- Name → prompt-registry-eval-acc-test + +- Version → 1.0.0 + +Click Save to persist the template. + +**Verify the Template** + +Go to Generative AI Hub → Prompt Management → Templates and confirm: + +- The template appears with the correct name, scenario, and version. + +- Managed By → shows how the template is stored. 
- Versioning is tracked automatically.

![img](img/image_10.png)

[OPTION END]

[OPTION BEGIN [Python]]

The following code defines a function `create_prompt_template()` that creates a new **Prompt Template** in the SAP AI Core **Prompt Registry**.

```python
def create_prompt_template():
    headers = _get_headers()
    GET_PROMPT_TEMPLATES_ENDPOINT = '/v2/lm/promptTemplates'
    request_url = f"{AICORE_BASE_URL}{GET_PROMPT_TEMPLATES_ENDPOINT}"

    prompt_template = {
        "template": [
            {
                "role": "user",
                "content": "List the benefits and side effects of the drug in the following consumer health question: {{?question}}."
            }
        ]
    }

    request_body = {
        "name": "prompt-registry-eval-demo",
        "version": "1.0.0",
        "scenario": "genai-evaluations",
        "spec": prompt_template
    }
    try:
        response = requests.post(
            request_url, headers=headers, data=json.dumps(request_body), timeout=120
        )
        if response.status_code not in (200, 201):
            raise Exception(f"Failed to create prompt template: {response.status_code} - {response.text}")
        result = response.json()
        print(result)
        return result['id']
    except Exception:
        logging.error("Error occurred while attempting to create a prompt template")
        raise

prompt_template_id = create_prompt_template()
```
![img](img/image__py_pmtreg.png)

**Note**

If you want to reuse a prompt template that already exists in the Prompt Registry, set `prompt_template_id` manually in the next cell and skip executing this one:

```python
prompt_template_id = ""
```

[OPTION END]

[OPTION BEGIN [Bruno]]

In Bruno, you can create a prompt template by sending a POST request to the AI Core API:

**Request: Create Prompt Template**

**URL:**

```bash
POST {{api_url}}/v2/lm/promptTemplates
```

**Headers:**
```
Authorization: Bearer {{access_token}}
Content-Type: application/json
```

**Body (JSON):**
```json
{
  "name": "prompt-registry-eval-acc-test",
  "version": "1.0.0",
  "scenario": "genai-evaluations",
  "spec": {
    "template": [
      {
        
"role": "user", + "content": "List the benefits and side effects of the drug in the following consumer health question: {{?question}}." + } + ], + "defaults": {}, + "additional_fields": { + "modelParams": { + "temperature": 0.3, + "max_tokens": 100 + }, + "modelGroup": "chat" + } + } +} +``` +![img](img/image_br_pr.png) + +[OPTION END] + +🔑 Tip: Always increment the version (e.g., 1.0.1, 1.0.2) when updating a template. This ensures reproducibility across evaluations. + +### Providing Models and Metrics for Evaluation + +Metrics determine how your model outputs are evaluated during an evaluation run. They define the scoring logic that SAP AI Core uses to compare models, measure quality, and validate improvements over time. + +Metrics must be supplied before creating an Evaluation Configuration. + +[OPTION BEGIN [SAP AI Launchpad]] + +In SAP AI Launchpad, metrics are selected visually during the Evaluation Configuration creation flow, the UI provides a selectable list of available metrics. + +1. Go to Generative AI Hub → Optimization. + +2. Click Create to start a new evaluation configuration. + +![img](img/image_25.png) + +- Select Test Input, then: + + - Select the prompt and select more than one model + + - Select your registered dataset artifact + + - Enter the dataset path (example): + testdata/medicalqna_dataset.csv + + - Set the number of test samples (e.g., 20) + + ![img](img/image_ail_26.png) + +- Click **Next** to go to Metrics selection. + +#### Select Evaluation Metrics + +Choose the metrics you want to evaluate. 
+ +You may choose one or multiple system-defined metrics—examples: + + - BERT Score + + - Pointwise Answer Relevance + + - Pointwise Correctness + + - Pointwise Instruction Following + +![img](img/image_27.png) + +--- + +> 📘 **Helpful Resources**: +> +> - [System-Defined Evaluation Metrics – SAP Documentation](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/system-defined-evaluation-metrics) + +> **Note: You may select additional metrics based on your use case.** + +--- + +[OPTION END] + +[OPTION BEGIN [Python]] + +**Select your Models** + +Add the models you wish to use in the string `selected_models_str` + +```Python +# Manual selection of models +selected_models_str="gemini-2.5-pro:001,gpt-4o:2024-08-06,gpt-5:2025-08-07" +print("Selected models string:", selected_models_str) +``` + +**Metrics Handling in Python Notebook (Automatic Detection & Creation)** + +When running the evaluation through the Python notebook, metric setup is partially automated. +Before the evaluation configuration is created, the script performs the following: + + - Users can manually specify metric IDs + + - It checks if each metric already exists in AI Core + + - If not found → creates it automatically + + - Prints final list of metric IDs used for evaluation + +This ensures all metrics exist before the evaluation configuration is created. + +```python +# Manual Selection of Metrics +selected_metrics_str = "Pointwise Conciseness,Pointwise Instruction Following,Pointwise Correctness,Pointwise Answer Relevance,Exact Match,BLEU,ROUGE,Content Filter on Input,Content Filter on Output" +print(selected_metrics_str) +``` +![img](img/image_py03.png) + +This ensures all required metrics are available before launching the evaluation. 
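The check-then-create behaviour described above can be reduced to a small, self-contained sketch. Here `existing_ids` stands in for a lookup of metrics already registered in AI Core and `create_metric` for the corresponding create call; both are placeholders for illustration, not real SDK functions.

```python
def resolve_metric_ids(selected_metrics_str, existing_ids, create_metric):
    """Map comma-separated metric names to IDs, creating any metric not yet known."""
    resolved = []
    for name in (n.strip() for n in selected_metrics_str.split(",") if n.strip()):
        if name not in existing_ids:                   # metric not found in AI Core
            existing_ids[name] = create_metric(name)   # create it automatically
        resolved.append(existing_ids[name])
    return resolved

# Example with stubbed lookups:
known = {"BLEU": "m-1", "ROUGE": "m-2"}
created = []
def fake_create(name):
    created.append(name)
    return f"m-new-{len(created)}"

print(resolve_metric_ids("BLEU, Exact Match, ROUGE", known, fake_create))
# -> ['m-1', 'm-new-1', 'm-2']; only 'Exact Match' is created on the fly
```

The notebook performs the lookup and creation against the AI Core metrics API; this sketch only shows the control flow.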
+ +[OPTION END] + +[OPTION BEGIN [Bruno]] + +You can directly pass models and system metrics in your configuration: + +Example Models: + +```json +"models":"gemini-2.5-pro:001,gpt-4o:2024-08-06,gpt-5:2025-08-07" +``` + +Example metrics: + +```json +"metrics": "Pointwise Conciseness,Pointwise Instruction Following,Pointwise Correctness,Pointwise Answer Relevance,Exact Match,BLEU,ROUGE,Content Filter on Input,Content Filter on Output" +``` + +[OPTION END] + +**Note:** + +To compare different models and generate a leaderboard, you must select more than one model. +When multiple models are provided, the evaluation system automatically creates separate +evaluation runs for each model within the same execution. This enables the evaluation workflow +to compare the runs and compute head-to-head win rates across the selected models. + +### Define and Create Evaluation Configurations + +[OPTION BEGIN [SAP AI Launchpad]] + +Once your dataset artifact is registered, the next step is to create an Evaluation Configuration. + +An Evaluation Configuration tells SAP AI Core: + + - which dataset to evaluate + + - which prompt/model or orchestration config to use + + - which metrics to compute + + - which orchestration deployment endpoint to call + + - how many repetitions to run + + - which test dataset file to load + +This configuration becomes the blueprint for your evaluation execution. + +**Steps to Create Evaluation Configuration** + +In Additional Configuration + +- Set **Number of Repetitions** to `1`. +- Choose an existing deployment for **Orchestration Endpoint**. + + ![img](img/image_29.png) +--- + +#### Final Review & Start + +- Review all the details on the summary page. +- Once confirmed, click **Create** to start the evaluation job. + +![img](img/image_40.png) + +> ✅ You have now successfully configured and triggered a Generative AI Evaluation. 
[OPTION END]

[OPTION BEGIN [Python]]

When using the Python notebook, the evaluation configuration is created automatically based on your selections.
Before creating the configuration, the notebook will:

  - Load the dataset artifact ID

  - Resolve metric IDs

  - Load prompt template IDs

  - Validate all required parameters

**Sample parameter setup:**

```Python
import json
test_data_path = f"testdata/{DATASET_NAME}"  # test data path; to evaluate the whole folder, use just "testdata"
test_datasets = json.dumps({'path': test_data_path, 'type': 'csv'})
metrics_list = selected_metrics_str
models_list = selected_models_str
print(f"Selected metrics: {metrics_list}")
print(f"Selected models: {models_list}")
orchestration_deployment_url = deployment_url  # set this to use a specific orchestration deployment
repetitions = "1"
```

#### Create Configuration Body

The notebook builds the configuration using the required SAP AI Core fields:

  - scenarioId

  - executableId

  - dataset artifact binding

  - selected metrics

  - test dataset details

  - repetitions

  - orchestration deployment URL

  - promptTemplate

  - models

The following function dynamically creates the configuration body for AI Core.

```Python
# Create an AI Core configuration. 
import requests

request_body = {
    "name": "genai-eval-conf",
    "scenarioId": "genai-evaluations",
    "executableId": "genai-evaluations-simplified",
    "inputArtifactBindings": [
        {
            "key": "datasetFolder",
            "artifactId": artifact_id
        }
    ],
    "parameterBindings": [
        {
            "key": "repetitions",
            "value": repetitions
        },
        {
            "key": "orchestrationDeploymentURL",
            "value": orchestration_deployment_url
        },
        {
            "key": "metrics",
            "value": metrics_list
        },
        {
            "key": "testDataset",
            "value": test_datasets
        },
        {
            "key": "promptTemplate",
            "value": prompt_template_id
        },
        {
            "key": "models",
            "value": models_list
        }
    ]
}

def create_aicore_configuration():
    headers = _get_headers()
    GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'
    request_url = f"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}"
    try:
        print(request_body)
        response = requests.post(
            request_url, headers=headers, data=json.dumps(request_body), timeout=120
        )
        print(response)
        if response.status_code != 201:
            raise Exception(f"Failed to create configuration: {response.status_code} - {response.text}")
        result = response.json()
        print(result)
        return result['id']
    except Exception:
        logging.error("Error occurred while attempting to create a Configuration")
        raise

configuration_id = create_aicore_configuration()
```

SAP AI Core returns a configuration ID, which is required for the next step (creating the execution).

![img](img/image_py_con.png)

[OPTION END]

[OPTION BEGIN [Bruno]]

When creating an Evaluation Configuration through Bruno, you call:

```bash
POST {{api_url}}/v2/lm/configurations
```

Below is a sample request body for creating the configuration. 
+ +```json +{ + "name": "genai-eval-conf", + "scenarioId": "genai-evaluations", + "executableId": "genai-evaluations-simplified", + "inputArtifactBindings": [ + { + "key": "datasetFolder", + "artifactId": "{{artifactId}}" + } + ], + "parameterBindings": [ + { + "key": "repetitions", + "value": "1" + }, + { + "key": "orchestrationDeploymentURL", + "value": "{{deployment_url}}" + }, + { + "key": "metrics", + "value": "language_match" + }, + { + "key": "testDataset", + "value": "{\"path\": \"testdata/{{dataset_file}}\", \"type\": \"csv\"}" + }, + { + "key": "promptTemplate", + "value": "{{prompt_template_id}}" + }, + { + "key": "models", + "value": "{{model_name}}:{{model_version}}" + } + ] +} +``` +![img](img/image-br03.png) + +[OPTION END] + +### Create and Run Evaluation Execution + +After creating the Evaluation Configuration, the next step is to execute it. + +Execution triggers the evaluation workflow, which: + + - Reads the test dataset + + - Generates submissions to the orchestration service + + - Collects model outputs + + - Computes all selected metrics + + - Produces aggregate and raw evaluation results + +The process is identical for SAP AI Launchpad, Python, and Bruno, with only the invocation method differing. + +[OPTION BEGIN [SAP AI Launchpad]] + +- Once the evaluation configuration is created, the system automatically triggers an evaluation execution. + +- Follow these steps to monitor its progress and verify completion: + + - Navigate to **ML Operations** in the SAP AI Core Launchpad. + + - In the sidebar, click **Executions**. + + ![img](img/image_41.png) + + - Locate the most recent execution triggered by your evaluation configuration. You can use the timestamp or configuration name to identify it. + + - Click on the execution entry to open its details. The Current Status will update as the process runs. + + ![img](img/image_31.png) + +- Once the Target Status reaches **COMPLETED** , your evaluation has successfully finished. 
+ +> [For More information](https://help.sap.com/docs/sap-ai-core/generative-ai-hub/create-evaluation) + +Track Execution Status + +The execution page will show: + + - Unknown + + - Pending + + - Running + + - Completed + +Once completed, you can navigate to: + + - Outputs → Tracking Metrics (aggregate results) + + - Output Artifacts (raw results stored in the SQLite DB) + +[OPTION END] + +[OPTION BEGIN [Python]] + +Once the configuration is ready, the next step is to trigger an execution. +An execution is a single evaluation run based on the configuration you defined. + +**Create Execution** + +The following function starts the evaluation in SAP AI Core using the configuration ID: + +```python +# create an execution with the created configuration. + +import requests +def create_execution(): + headers = _get_headers() + GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions' + request_url = f"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}" + request_body = {"configurationId" : configuration_id} + try: + response = requests.post( + request_url, headers=headers, data=json.dumps(request_body), timeout=120 + ) + print("response received is ", response) + result = response.json() + print(result) + return result['id'] + except: + logging.error("Error occurred while attempting to create an execution") + raise + + +execution_id = create_execution() +``` +![img](img/image_44.png) + +#### Monitor Execution Status + +The execution progresses through states: + +UNKNOWN → PENDING → RUNNING → COMPLETED + +```python +# get execution status +import requests +def get_execution_status(execution_id): + headers = _get_headers() + LOG_EXECUTIONS_ENDPOINT = f'/v2/lm/executions/{execution_id}' + request_url = f"{AICORE_BASE_URL}{LOG_EXECUTIONS_ENDPOINT}" + try: + response = requests.get( + request_url, headers=headers, timeout=120 + ) + print("response received is ", response) + result = response.json() + return result + except: + logging.error("Error occurred while attempting to get execution 
status")
        raise


get_execution_status(execution_id)
```

#### Automatic Polling

To continuously monitor until the evaluation finishes:

```python
# Poll the execution status until it is COMPLETED or DEAD, or a timeout occurs
import time

def poll_execution_status(execution_id, timeout_minutes=1800, poll_interval=30):
    start_time = time.time()
    while True:
        result = get_execution_status(execution_id)
        print(f"Execution Status: {result.get('status')}")
        if result.get("status") == "COMPLETED":
            print(f"Execution completed successfully in {time.time() - start_time} seconds, proceed to fetch results.")
            break
        if result.get("status") == "DEAD":
            print(f"Execution failed with status DEAD in {time.time() - start_time} seconds. Check the logs for more details.")
            break
        if time.time() - start_time > timeout_minutes * 60:
            raise TimeoutError(f"Execution status polling timed out after {timeout_minutes} minutes.")
        time.sleep(poll_interval)

```

![img](img/image_45.png)

✅ Once the execution status shows COMPLETED, the evaluation results are available and can be analyzed in the next step.

[OPTION END]

[OPTION BEGIN [Bruno]]

After creating the configuration, the next step is to trigger the evaluation workload by creating an AI Core execution.

**Create an Execution with the Created Configuration**

- Click **Create execution** under **executions** and pass the configuration ID created in the previous step.

![img](img/image-br04.png)

- The status field progresses through different states over time:
UNKNOWN → PENDING → RUNNING → COMPLETED.

**Get Execution Status**

Check the status of the created execution by passing the execution ID. The Current Status field updates as the process runs; refer to the image below.

![img](img/image-br05.png)

[OPTION END]

### View and Analyze Evaluation Results

Once the evaluation execution is complete, SAP AI Core generates both aggregated metrics and detailed instance-level results.
These results help compare model performance, understand quality metrics, and debug issues.

[OPTION BEGIN [SAP AI Launchpad]]

Once the evaluation workflow execution is completed, you can view the aggregated evaluation metrics for a run directly in SAP AI Launchpad.

1. Go to **Generative AI Hub → Optimization**.

2. In the Runs section, select the runs you created.

3. View the detailed results of a run across your selected metrics and models.

This is the easiest way to visually inspect evaluation outcomes and compare multiple model runs.

![img](img/image_46_01.png)

- Compare run performance across your selected metrics. Metrics are aggregated at run level.

![img](img/image_46.png)

![img](img/image_46a.png)

[OPTION END]

[OPTION BEGIN [Python]]

The notebook includes utility scripts to retrieve aggregated metrics, download detailed artifacts, and inspect the SQLite results. In this step the notebook:

  - Retrieves the aggregated evaluation metrics

  - Downloads the raw instance-level results

  - Prepares the data for ranking and scoring

**Retrieve Aggregate Metrics (Tracking API)**

Aggregated metrics summarize performance across all test samples.
To fetch them using the execution ID:

```Python
# Get aggregate metrics using execution id
import requests
def retrieve_aggregate_metrics(execution_id):
    headers = _get_headers()
    GET_METRICS_ENDPOINT = f'/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={execution_id}'
    request_url = f"{AICORE_BASE_URL}{GET_METRICS_ENDPOINT}"
    try:
        response = requests.get(request_url, headers=headers, timeout=120)
        print("response received is ", response)
        result = response.json()
        return result
    except Exception:
        logging.error("Error occurred while attempting to retrieve aggregate metrics for the run")
        raise

runs_data = retrieve_aggregate_metrics(execution_id)
```

**Transform Metrics by Model**

Each run contains tags that identify the evaluated model. 
+ +```python +import pandas as pd + +def get_model_from_run(run): + for tag in run.get("tags", []): + if tag.get("name") == "evaluation.ai.sap.com/model": + return tag.get("value") + +def aggregate_metrics_by_model(runs_list): + transformed_data = [] + for run in runs_list: + model = get_model_from_run(run) + for metric in run["metrics"]: + metric_value = metric.get("value") + + # Override only for /mode + if metric.get("name").endswith("/mode"): + for label in metric.get("labels", []): + if label.get("name") == "evaluation.ai.sap.com/mode_category": + metric_value = label.get("value") + break + output_json = { + "model": model, + "metrics_name": metric.get("name"), + "metric_value": metric_value + } + transformed_data.append(output_json) + return transformed_data + + +def create_metrics_pivot_table(transformed_data): + """ + Creates a pivot table where rows are models and columns are metrics. + + Args: + transformed_data: List of dictionaries with 'model', 'metrics_name', 'metric_value' + + Returns: + DataFrame with models as rows and metrics as columns + """ + # Convert list of dictionaries to DataFrame + df = pd.DataFrame(transformed_data) + + # Create pivot table + pivot_table = df.pivot_table( + index='model', + columns='metrics_name', + values='metric_value', + aggfunc='first' # Use 'first' to get the single value, or 'mean' if there are duplicates + ) + + return pivot_table + +transformed_data = aggregate_metrics_by_model(runs_data['resources']) +metrics_pivot = create_metrics_pivot_table(transformed_data) + +HTML(metrics_pivot.to_html()) +``` +![img](img/image_47.png) + +**Download Raw Results (Output Artifact)** + +All detailed evaluation outputs are stored as an output artifact in your object store. To download all output files programmatically: + +```python +# download the result artifacts from Object store. 
+import boto3 + +def download_all_objects(prefix, destination_folder): + """ + Recursively download all objects from an S3 bucket starting with a specific prefix. + + :param bucket_name: Name of the S3 bucket. + :param prefix: Prefix to filter objects in the bucket. + :param destination_folder: Local folder to save the downloaded files. + """ + s3_client = boto3.client( + 's3', + aws_access_key_id=AWS_ACCESS_KEY, + aws_secret_access_key=AWS_SECRET_ACCESS_KEY, + region_name=AWS_REGION + ) + + # Ensure the destination folder exists + if not os.path.exists(destination_folder): + os.makedirs(destination_folder) + + # Paginate through objects + paginator = s3_client.get_paginator('list_objects_v2') + pages = paginator.paginate(Bucket=AWS_BUCKET_ID, Prefix=prefix) + + for page in pages: + if 'Contents' in page: + for obj in page['Contents']: + key = obj['Key'] + local_file_path = os.path.join(destination_folder, os.path.relpath(key, prefix)) + + # Ensure the local directory structure exists + local_directory = os.path.dirname(local_file_path) + if not os.path.exists(local_directory): + os.makedirs(local_directory) + + # Download the object + print(f"Downloading {key} to {local_file_path}") + s3_client.download_file(AWS_BUCKET_ID, key, local_file_path) + + +# Download the evaluation results from the object store. Look at execution status under "outputArtifacts" key to see the 'url' +# which shows the data path of where your output results are stored +EXECUTION_ID = execution_id +sqlite_db_prefix = f'{EXECUTION_ID}/tmp/' # change the prefix based on where your output artifact is stored in the bucket. +destination_folder = 'results-new' + +download_all_objects(sqlite_db_prefix, destination_folder) +``` + +![img](img/image_48.png) + +**View Detailed Results (SQLite DB)** + +The evaluation stores detailed instance-level results in results.db. + +Example: Reading SQLite tables: + +```python +# viewing the results from sqlite db in tabular format.. 
+import sqlite3 +import pandas as pd +from IPython.display import display, HTML + +# Path to your SQLite database file +db_file = 'results-new/results.db' + +connection = sqlite3.connect(db_file) + +# Specify the table names you want to display +table_names = ['run','configuration', 'submission', 'submission_result', 'evaluation_result'] + +# Create the CSS and HTML container +html_content = """ + +
<style>
    .table-container { border-collapse: collapse; font-family: sans-serif; font-size: 13px; }
    .table-container th, .table-container td { border: 1px solid #cccccc; padding: 4px 8px; text-align: left; }
</style>
<div>
"""

for table_name in table_names:
    query = f"SELECT * FROM {table_name};"
    df = pd.read_sql_query(query, connection)
    # If you want to see all the rows across all tables, remove/comment the next line
    df = df.head(5)  # Limiting the number of rows displayed
    table_html = df.to_html(classes='table-container', index=False)
    html_content += f"""
    <div>
        <h4>Table: {table_name}</h4>
        {table_html}
    </div>
    """

html_content += "</div>
" + +display(HTML(html_content)) + +# Close the connection +connection.close() +``` + +![img](img/image_py_rk.png) + +#### Process and Rank Results + +This step generates a leaderboard ranking models by their Win Rate (percentage of pairwise victories), providing a robust, comparative measure of the best-performing model and prompt configuration. + +```Python + +import pandas as pd +import numpy as np +import sqlite3 +import json +import os +from IPython.display import display, HTML + +# ========================================== +# 1. CONFIGURATION (Separated Groups) +# ========================================== +METRIC_GROUPS = { + "Categorical": { + "type": "categorical", + "description": "Weighted Average (1-5 scale)", + "metrics": [ + "Pointwise Conciseness", + "Pointwise Instruction Following", + "Pointwise Correctness", + "Pointwise Answer Relevance" + ] + }, + "Boolean": { + "type": "categorical", # Uses same weighted avg logic (0 or 1) + "description": "Pass Rate (0-1 scale)", + "metrics": [ + "Exact Match", + "Content Filter on Input", + "Content Filter on Output", + "Language Match", + "JSON Schema Match" + ] + }, + "Numerical": { + "type": "numerical", + "description": "Mean Value", + "metrics": [ + "BLEU", + "ROUGE", + "BERT Score", + "test-metric" + ] + } +} + +# ========================================== +# 2. 
DATA EXTRACTION +# ========================================== +def extract_db_metadata(db_path): + if not os.path.exists(db_path): return pd.DataFrame() + conn = sqlite3.connect(db_path) + df_runs = pd.read_sql_query("SELECT id, name, tags, config FROM run", conn) + conn.close() + + meta_data = [] + for _, row in df_runs.iterrows(): + run_id = str(row["id"]) + run_name = str(row["name"]) + tags = {} + config = {} + try: tags = json.loads(row["tags"]) if isinstance(row["tags"], str) else row["tags"] + except: pass + try: config = json.loads(row["config"]) if isinstance(row["config"], str) else row["config"] + except: pass + + model = "Unknown" + try: model = config["modules"]["prompt_templating"]["model"]["name"] + except: + if isinstance(tags, dict): model = tags.get("evaluation.ai.sap.com/model", "Unknown") + elif isinstance(tags, list): + for t in tags: + if t.get("key") == "evaluation.ai.sap.com/model": model = t.get("value") + + meta_data.append({"run_id": run_id, "run_name": run_name, "model": model}) + return pd.DataFrame(meta_data) + +def extract_api_metrics(runs_data_resource): + flat_data = [] + for run in runs_data_resource: + model = "Unknown" + for t in run.get("tags", []): + if t.get("name") == "evaluation.ai.sap.com/model": + model = t.get("value") + break + for m in run.get("metrics", []): + clean_name = m.get("name", "").replace('"', '').strip() + flat_data.append({ + "model": model, + "metrics_name_clean": clean_name, + "metric_value": m.get("value") + }) + df = pd.DataFrame(flat_data) + df['metric_value'] = pd.to_numeric(df['metric_value'], errors='coerce') + return df + +# ========================================== +# 3. SCORING & HELM LOGIC +# ========================================== +def calculate_weighted_avg_score(row, cols): + """ Returns a score based on counts. + Categorical: 1-5 scale. + Boolean: 0-1 scale (Pass Rate). 
+ """ + total_score = 0 + total_count = 0 + # Check counts 0-5 (covers Boolean 0/1 and Categorical 1-5) + for rating in range(0, 6): + col_name = next((c for c in cols if f"/{rating}/count" in c), None) + if col_name and not pd.isna(row[col_name]): + count = row[col_name] + total_score += count * rating + total_count += count + return total_score / total_count if total_count > 0 else 0.0 + +def get_metric_score_series(df_metrics, metric_name, group_type): + """ Returns a Series of SCORES (Scalar) for each model for a specific metric """ + subset = df_metrics[df_metrics['metrics_name_clean'].str.startswith(metric_name)] + if subset.empty: return None + + # Pivot to get columns for this metric + pivot = subset.pivot_table(index='model', columns='metrics_name_clean', values='metric_value', aggfunc='first') + cols = pivot.columns.tolist() + + if group_type == "categorical": + # Calculate Weighted Average (or Pass Rate for Boolean) + return pivot.apply(lambda row: calculate_weighted_avg_score(row, cols), axis=1) + else: + # Calculate Mean (Numerical) + c_mean = next((c for c in cols if "mean" in c), None) + if c_mean: return pivot[c_mean] + return None + +def calculate_group_win_rate(score_table): + """ + Calculates HELM Win Rate: % of times a model beats another model across all metrics in this group. + """ + models = score_table.index.tolist() + metrics = score_table.columns.tolist() + win_rates = {} + + for model_a in models: + wins = 0 + comparisons = 0 + + for model_b in models: + if model_a == model_b: continue + + # Compare across ALL metrics in this table + for metric in metrics: + score_a = score_table.at[model_a, metric] + score_b = score_table.at[model_b, metric] + + # Only compare valid scores + if pd.isna(score_a) or pd.isna(score_b): continue + + comparisons += 1 + if score_a > score_b: + wins += 1 + + win_rates[model_a] = wins / comparisons if comparisons > 0 else 0.0 + + return pd.Series(win_rates) + +# ========================================== +# 4. 
EXECUTION +# ========================================== +db_file = 'results-new/results.db' + +# A. Metadata +df_db_meta = extract_db_metadata(db_file) +df_db_unique = df_db_meta.drop_duplicates(subset=['model'], keep='last') + +# B. CSS +html_content = """ + +
<style>
    .table-container { border-collapse: collapse; font-family: sans-serif; font-size: 13px; }
    .table-container th, .table-container td { border: 1px solid #cccccc; padding: 4px 8px; text-align: left; }
</style>
<div>
"""
if 'runs_data' in locals() and runs_data:
    df_metrics_all = extract_api_metrics(runs_data['resources'])

    for group_name, config in METRIC_GROUPS.items():

        # 1. Build Score Table
        score_table = pd.DataFrame(index=df_db_unique['model'].unique())
        score_table.index.name = 'model'

        valid_metrics = []

        # 2. Calculate Scores
        for metric in config["metrics"]:
            scores = get_metric_score_series(df_metrics_all, metric, config["type"])
            if scores is not None:
                score_table[metric] = scores
                valid_metrics.append(metric)

        if not valid_metrics:
            continue

        # 3. Calculate HELM Win Rate (specific to this group)
        score_table['Win Rate'] = calculate_group_win_rate(score_table[valid_metrics])

        # 4. Calculate Final Rank
        score_table['Final Rank'] = score_table['Win Rate'].rank(ascending=False, method='min')

        # 5. Merge & Format
        df_final = pd.merge(df_db_unique, score_table, on='model', how='inner')
        df_final = df_final.sort_values('Final Rank')

        # Rounding
        for c in valid_metrics:
            df_final[c] = df_final[c].fillna(0.0).astype(float).round(4)
        df_final['Win Rate'] = df_final['Win Rate'].fillna(0.0).astype(float).round(4)
        df_final['Final Rank'] = df_final['Final Rank'].fillna(0).astype(int)

        # Columns
        meta_cols = ['run_id', 'run_name', 'model']
        final_cols = meta_cols + ['Win Rate', 'Final Rank'] + valid_metrics

        # 6. Generate HTML
        table_html = df_final[final_cols].to_html(classes='table-container', index=False)

        html_content += f"""
        <div>
            <h4>{group_name} Comparison</h4>
            <p>Values: {config['description']}. Win Rate based on head-to-head performance.</p>
            {table_html}
        </div>
        """

    html_content += "</div>
" + display(HTML(html_content)) + +else: + print("'runs_data' missing.") +``` +![img](img/image_py_rnk1.png) + +[OPTION END] + +[OPTION BEGIN [Bruno]] + +**Retrieve Aggregate Metrics by execution_id** + +Send a GET request: + +**GET** +```bash +{{apiurl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={{execution_id}} +``` + +**Retrieve Aggregate Metrics Using Run Name** + +Send a GET request: + +**GET** +```bash +{{apiurl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/run-name={{run_name}} +``` + +This returns aggregated values for: + + - latency + + - token usage + + - metric scores + + - completion count + +**Download Raw Results** + +1. Open the execution details + +2. Copy the output artifact URL + +3. Download the folder to obtain + + - step-wise results + + - sqlite_combined/results.db + +**Inspect Detailed Results** + +Open the SQLite DB in any client to inspect: + + - submissions + + - completion responses + + - evaluation_results (raw metric scores) + + - aggregation_results + + - custom_logs + +![img](img/image_49.png) + +![img](img/image_49a.png) + +[OPTION END] + +### Delete Evaluation Artifacts and Configurations + +Over time, your workspace may accumulate old configurations, executions, and metrics. +SAP AI Core allows you to safely delete these resources once they are no longer needed. + +This section explains how to delete: + + - Evaluation Executions + + - Evaluation Configurations + +⚠️ Important: + +Deletions are permanent and cannot be undone. + +[OPTION BEGIN [SAP AI Launchpad]] + +**Delete Executions** + +1. Go to ML Operations → Executions + +2. Select the execution + +3. Click Delete + +4. Confirm the deletion + +**Delete Evaluation Configurations** + +1. Go to ML Operations → Configurations + +2. Select the configuration you created + +3. Click Delete + +[OPTION END] + +[OPTION BEGIN [Python]] + +**1. 
Delete an Evaluation Execution**

```python
# Delete the execution by ID
def delete_execution():
    headers = _get_headers()
    EXEC_ID = execution_id
    GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions/'
    request_url = f"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}{EXEC_ID}"
    try:
        response = requests.delete(
            request_url, headers=headers, params={"AI-Resource-Group": AICORE_RESOURCE_GROUP}, timeout=120
        )
        print(response)
        if response.status_code != 202:
            raise Exception(f"Failed to delete execution: {response.status_code} - {response.text}")
        result = response.json()
        print(result)
    except Exception:
        logging.error("Error occurred while attempting to delete the execution")
        raise

delete_execution()
```
**2. Delete an Evaluation Configuration**

```python
def delete_configuration(configuration_id):
    headers = _get_headers()
    endpoint = f"/v2/lm/configurations/{configuration_id}"
    url = f"{AICORE_BASE_URL}{endpoint}"

    response = requests.delete(url, headers=headers, timeout=120)
    print("Status:", response.status_code)
    print(response.text)

# Example:
delete_configuration(configuration_id)
```

[OPTION END]

[OPTION BEGIN [Bruno]]

**1. Delete Execution**

**DELETE Request**
```bash
{{apiurl}}/v2/lm/executions/{{execution_id}}
```
**Headers:**
```
Authorization: Bearer {{access_token}}
AI-Resource-Group: {{resource_group}}
```
**2. 
Delete Configuration** + +```bash +DELETE {{apiurl}}/v2/lm/configurations/{{configuration_id}} +``` + +[OPTION END] diff --git a/tutorials/ai-core-genaihub-evaluation/evaluation_public_simplified_workflow.ipynb b/tutorials/ai-core-genaihub-evaluation/evaluation_public_simplified_workflow.ipynb deleted file mode 100644 index a03d5dbfc..000000000 --- a/tutorials/ai-core-genaihub-evaluation/evaluation_public_simplified_workflow.ipynb +++ /dev/null @@ -1,2745 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Generative AI Custom Evaluation\n", - "This is an example notebook which showcases how a user can use AI Core custom evaluation to benchmark their large language models, evaluate orchestration configuration or prompts for their use case.\n", - "It uses publicly available [MedicationQA dataset](https://langtest.org/docs/pages/benchmarks/medical/medicationqa/) which consists of commonly asked consumer questions about medications. The workload computes industry standard metrics to check the reliability of the response generate by llm.\n", - "
**Note: For detailed instructions please refer to [Readme](./Readme.md)**" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Place your Dataset and Run configuration\n", - "Place your files as such:\n", - "1. Place your dataset in the folder `PUT_YOUR_DATASET_HERE`\n", - "2. Place your prompt template configurations in the folder `PUT_YOUR_PROMPT_TEMPLATE_HERE`\n", - "3. Place your custom metric files in folder `PUT_YOUR_CUSTOM_METRIC_HERE`\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# SetUp (Step 1)\n" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Requirement already satisfied: generative-ai-hub-sdk==4.4.3 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from -r requirements.txt (line 1)) (4.4.3)\n", - "Requirement already satisfied: python-dotenv==1.0.1 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from -r requirements.txt (line 2)) (1.0.1)\n", - "Requirement already satisfied: boto3==1.37.4 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from -r requirements.txt (line 3)) (1.37.4)\n", - "Requirement already satisfied: pandas==2.2.3 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from -r requirements.txt (line 4)) (2.2.3)\n", - "Requirement already satisfied: json2html==1.3.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from -r requirements.txt (line 5)) (1.3.0)\n", - "Requirement already satisfied: numpy==1.26.4 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from -r requirements.txt (line 6)) (1.26.4)\n", - "Requirement already satisfied: ipywidgets==8.1.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from -r requirements.txt (line 7)) (8.1.0)\n", - 
"Requirement already satisfied: pydantic==2.9.2 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (2.9.2)\n", - "Requirement already satisfied: dacite>=1.8.1 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (1.9.2)\n", - "Requirement already satisfied: ai-core-sdk>=2.5.7 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (2.6.2)\n", - "Requirement already satisfied: httpx>=0.27.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (0.28.1)\n", - "Requirement already satisfied: click>=8.1.7 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (8.1.8)\n", - "Requirement already satisfied: openai>=1.56.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (2.6.1)\n", - "Requirement already satisfied: overloading==0.5.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (0.5.0)\n", - "Requirement already satisfied: packaging>=23.2 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (25.0)\n", - "Requirement already satisfied: s3transfer<0.12.0,>=0.11.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from boto3==1.37.4->-r requirements.txt (line 3)) (0.11.5)\n", - "Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in 
/Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from boto3==1.37.4->-r requirements.txt (line 3)) (1.0.1)\n", - "Requirement already satisfied: botocore<1.38.0,>=1.37.4 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from boto3==1.37.4->-r requirements.txt (line 3)) (1.37.38)\n", - "Requirement already satisfied: pytz>=2020.1 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from pandas==2.2.3->-r requirements.txt (line 4)) (2025.2)\n", - "Requirement already satisfied: python-dateutil>=2.8.2 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from pandas==2.2.3->-r requirements.txt (line 4)) (2.9.0.post0)\n", - "Requirement already satisfied: tzdata>=2022.7 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from pandas==2.2.3->-r requirements.txt (line 4)) (2025.2)\n", - "Requirement already satisfied: widgetsnbextension~=4.0.7 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipywidgets==8.1.0->-r requirements.txt (line 7)) (4.0.15)\n", - "Requirement already satisfied: jupyterlab-widgets~=3.0.7 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipywidgets==8.1.0->-r requirements.txt (line 7)) (3.0.16)\n", - "Requirement already satisfied: comm>=0.1.3 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipywidgets==8.1.0->-r requirements.txt (line 7)) (0.2.3)\n", - "Requirement already satisfied: traitlets>=4.3.1 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipywidgets==8.1.0->-r requirements.txt (line 7)) (5.14.3)\n", - "Requirement already satisfied: ipython>=6.1.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipywidgets==8.1.0->-r requirements.txt (line 7)) (8.18.1)\n", - "Requirement already satisfied: 
annotated-types>=0.6.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from pydantic==2.9.2->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (0.7.0)\n", - "Requirement already satisfied: typing-extensions>=4.6.1 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from pydantic==2.9.2->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (4.15.0)\n", - "Requirement already satisfied: pydantic-core==2.23.4 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from pydantic==2.9.2->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (2.23.4)\n", - "Requirement already satisfied: ai-api-client-sdk==2.6.1 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ai-core-sdk>=2.5.7->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (2.6.1)\n", - "Requirement already satisfied: aenum~=3.1 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ai-api-client-sdk==2.6.1->ai-core-sdk>=2.5.7->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (3.1.16)\n", - "Requirement already satisfied: pyhumps~=3.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ai-api-client-sdk==2.6.1->ai-core-sdk>=2.5.7->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (3.8.0)\n", - "Requirement already satisfied: requests<3.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ai-api-client-sdk==2.6.1->ai-core-sdk>=2.5.7->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (2.32.5)\n", - "Requirement already satisfied: urllib3<1.27,>=1.25.4 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from botocore<1.38.0,>=1.37.4->boto3==1.37.4->-r requirements.txt (line 3)) (1.26.20)\n", - "Requirement already satisfied: httpcore==1.* in 
/Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from httpx>=0.27.0->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (1.0.9)\n", - "Requirement already satisfied: anyio in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from httpx>=0.27.0->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (4.11.0)\n", - "Requirement already satisfied: idna in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from httpx>=0.27.0->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (3.11)\n", - "Requirement already satisfied: certifi in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from httpx>=0.27.0->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (2025.10.5)\n", - "Requirement already satisfied: h11>=0.16 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from httpcore==1.*->httpx>=0.27.0->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (0.16.0)\n", - "Requirement already satisfied: matplotlib-inline in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (0.2.1)\n", - "Requirement already satisfied: decorator in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (5.2.1)\n", - "Requirement already satisfied: jedi>=0.16 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (0.19.2)\n", - "Requirement already satisfied: pygments>=2.4.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (2.19.2)\n", - "Requirement already satisfied: exceptiongroup in 
/Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (1.3.0)\n", - "Requirement already satisfied: pexpect>4.3 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (4.9.0)\n", - "Requirement already satisfied: stack-data in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (0.6.3)\n", - "Requirement already satisfied: prompt-toolkit<3.1.0,>=3.0.41 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (3.0.52)\n", - "Requirement already satisfied: distro<2,>=1.7.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from openai>=1.56.0->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (1.9.0)\n", - "Requirement already satisfied: tqdm>4 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from openai>=1.56.0->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (4.67.1)\n", - "Requirement already satisfied: sniffio in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from openai>=1.56.0->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (1.3.1)\n", - "Requirement already satisfied: jiter<1,>=0.10.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from openai>=1.56.0->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (0.11.1)\n", - "Requirement already satisfied: six>=1.5 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from python-dateutil>=2.8.2->pandas==2.2.3->-r requirements.txt (line 4)) (1.17.0)\n", - "Requirement already satisfied: parso<0.9.0,>=0.8.4 in 
/Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from jedi>=0.16->ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (0.8.5)\n", - "Requirement already satisfied: ptyprocess>=0.5 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from pexpect>4.3->ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (0.7.0)\n", - "Requirement already satisfied: wcwidth in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from prompt-toolkit<3.1.0,>=3.0.41->ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (0.2.14)\n", - "Requirement already satisfied: pure-eval in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from stack-data->ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (0.2.3)\n", - "Requirement already satisfied: asttokens>=2.1.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from stack-data->ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (3.0.0)\n", - "Requirement already satisfied: executing>=1.2.0 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from stack-data->ipython>=6.1.0->ipywidgets==8.1.0->-r requirements.txt (line 7)) (2.2.1)\n", - "Requirement already satisfied: charset_normalizer<4,>=2 in /Users/c5408555/Downloads/notebook-update-v2/venv/lib/python3.9/site-packages (from requests<3.0->ai-api-client-sdk==2.6.1->ai-core-sdk>=2.5.7->generative-ai-hub-sdk==4.4.3->-r requirements.txt (line 1)) (3.4.4)\n", - "\u001b[33mWARNING: You are using pip version 22.0.4; however, version 25.3 is available.\n", - "You should consider upgrading via the '/Users/c5408555/Downloads/notebook-update-v2/venv/bin/python3 -m pip install --upgrade pip' command.\u001b[0m\u001b[33m\n", - "\u001b[0m\u001b[33m(Deprecated) Installing extensions with the jupyter labextension install command is now deprecated and will be removed in a 
future major version of JupyterLab.\n", - "\n", - "Users should manage prebuilt extensions with package managers like pip and conda, and extension authors are encouraged to distribute their extensions as prebuilt packages \u001b[0m\n", - "Building jupyterlab assets (production, minimized)\n" - ] - } - ], - "source": [ - "! pip install -r requirements.txt\n", - "! jupyter labextension install @jupyter-widgets/jupyterlab-manager" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Load your environment variables\n", - "\n", - "Ensure that your environment variables are set in a `.env` file (see sample.env for an example). If there is a missing field the notebook will prompt you for a value." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "# Loading the credentials from the env file\n", - "from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient\n", - "from dotenv import load_dotenv\n", - "import os\n", - "\n", - "load_dotenv(override=True)\n", - "\n", - "\n", - "# Fetching environment variables or prompting the user if missing\n", - "AICORE_BASE_URL = os.getenv(\"AICORE_BASE_URL\") or input(\"AICORE_BASE_URL is missing. Please enter it: \")\n", - "AICORE_RESOURCE_GROUP = os.getenv(\"AICORE_RESOURCE_GROUP\") or input(\"AICORE_RESOURCE_GROUP is missing. Please enter it (default: 'default'): \") or \"default\"\n", - "AICORE_AUTH_URL = os.getenv(\"AICORE_AUTH_URL\") or input(\"AICORE_AUTH_URL is missing. Please enter it: \")\n", - "AICORE_CLIENT_ID = os.getenv(\"AICORE_CLIENT_ID\") or input(\"AICORE_CLIENT_ID is missing. Please enter it: \")\n", - "AICORE_CLIENT_SECRET = os.getenv(\"AICORE_CLIENT_SECRET\") or input(\"AICORE_CLIENT_SECRET is missing. Please enter it: \")\n", - "\n", - "AWS_ACCESS_KEY = os.getenv(\"AWS_ACCESS_KEY\") or input(\"AWS_ACCESS_KEY is missing. 
Please enter it: \")\n", - "AWS_BUCKET_ID = os.getenv(\"AWS_BUCKET_ID\") or input(\"AWS_BUCKET_ID is missing. Please enter it: \")\n", - "AWS_REGION = os.getenv(\"AWS_REGION\") or input(\"AWS_REGION is missing. Please enter it: \")\n", - "AWS_SECRET_ACCESS_KEY = os.getenv(\"AWS_SECRET_ACCESS_KEY\") or input(\"AWS_SECRET_ACCESS_KEY is missing. Please enter it: \")\n", - "DEPLOYMENT_URL = os.getenv(\"DEPLOYMENT_URL\", None)\n", - "\n", - "# Initializing the GenAIHubProxyClient\n", - "client = GenAIHubProxyClient(\n", - " base_url=AICORE_BASE_URL,\n", - " auth_url=AICORE_AUTH_URL,\n", - " client_id=AICORE_CLIENT_ID,\n", - " client_secret=AICORE_CLIENT_SECRET,\n", - " resource_group=AICORE_RESOURCE_GROUP\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Dependencies and Helper Functions (Step 2)" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Prompt configs: ['prompt_template.json']\n", - "Dataset name: medicalqna_dataset.csv\n", - "Prompt template contents: {'template': [{'role': 'user', 'content': 'List the benefits and side effects of the drug in the following consumer health question: {{?question}}.'}]}\n" - ] - } - ], - "source": [ - "import os\n", - "import json\n", - "\n", - "def get_prompt_config_file(folder_path):\n", - " \"\"\"\n", - " Retrieves a list of all JSON file names in the specified folder.\n", - " \"\"\"\n", - " if not os.path.isdir(folder_path):\n", - " print(f\"The folder path '{folder_path}' does not exist.\")\n", - " return []\n", - "\n", - " json_files = [file for file in os.listdir(folder_path) if file.endswith(\".json\")]\n", - "\n", - " if not json_files:\n", - " print(f\"No JSON files were found in the folder '{folder_path}'.\")\n", - " return json_files\n", - "\n", - "\n", - "def get_dataset_file_name(folder_path):\n", - " \"\"\"\n", - " Retrieves the name of the first file in the specified 
folder.\n", - " \"\"\"\n", - " if not os.path.isdir(folder_path):\n", - " print(f\"The folder path '{folder_path}' does not exist.\")\n", - " return None\n", - "\n", - " items_in_folder = os.listdir(folder_path)\n", - "\n", - " for item in items_in_folder:\n", - " item_path = os.path.join(folder_path, item)\n", - " if os.path.isfile(item_path):\n", - " return item\n", - "\n", - " print(f\"No files were found in the folder '{folder_path}'.\")\n", - " return None\n", - "\n", - "\n", - "def load_prompt_template(folder_path, file_name):\n", - " \"\"\"\n", - " Loads the contents of a JSON prompt template into a variable.\n", - " \"\"\"\n", - " file_path = os.path.join(folder_path, file_name)\n", - " try:\n", - " with open(file_path, \"r\", encoding=\"utf-8\") as f:\n", - " return json.load(f) # returns Python dict\n", - " except Exception as e:\n", - " print(f\"Error loading prompt template: {e}\")\n", - " return None\n", - "\n", - "# --- MAIN EXECUTION ---\n", - "PROMPT_FOLDER = \"./PUT_YOUR_PROMPT_TEMPLATE_HERE\"\n", - "DATASET_FOLDER = \"./PUT_YOUR_DATASET_HERE\"\n", - "\n", - "PROMPT_CONFIG_FILES = get_prompt_config_file(PROMPT_FOLDER)\n", - "DATASET_NAME = get_dataset_file_name(DATASET_FOLDER)\n", - "\n", - "if PROMPT_CONFIG_FILES and DATASET_NAME:\n", - " # Load the first JSON prompt template\n", - " PROMPT_TEMPLATE = load_prompt_template(PROMPT_FOLDER, PROMPT_CONFIG_FILES[0])\n", - " print(f\"Prompt configs: {PROMPT_CONFIG_FILES}\")\n", - " print(f\"Dataset name: {DATASET_NAME}\")\n", - " print(\"Prompt template contents:\", PROMPT_TEMPLATE)\n", - "else:\n", - " print(\"Missing run or dataset file.\")\n", - " raise SystemExit(\"Exiting due to missing run/dataset file.\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Register an Object Store Secret\n", - "To use the evaluations service, you must register an object store with the name default. Optionally, you can register an additional object store with a name of your choice." 
- ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "# setup authentication and headers needed for AI Core requests\n", - "def _get_headers():\n", - " headers = {\n", - " \"Authorization\": client.get_ai_core_token(),\n", - " \"AI-Resource-Group\": AICORE_RESOURCE_GROUP,\n", - " \"Content-Type\": \"application/json\",\n", - " }\n", - " return headers" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Successfully deleted object store secret: default\n", - "Successfully deleted object store secret: genai-quick-data-notebook\n" - ] - }, - { - "data": { - "text/plain": [ - "{'message': 'secret has been created'}" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# Register S3 secret with AI Core which will be used an input source \n", - "import requests\n", - "import json\n", - "import logging\n", - "\n", - "def delete_oss_secret(oss_name=\"\"):\n", - " headers = _get_headers()\n", - " \n", - " DELETE_SECRETS_ENDPOINT = f'/v2/admin/objectStoreSecrets/{oss_name}'\n", - " request_url = f\"{AICORE_BASE_URL}{DELETE_SECRETS_ENDPOINT}\"\n", - " \n", - " try:\n", - " response = requests.delete(request_url, headers=headers, timeout=120)\n", - " if response.status_code == 202:\n", - " print(f\"Successfully deleted object store secret: {oss_name}\")\n", - " elif response.status_code == 404:\n", - " print(f\"Object store secret not found: {oss_name}. 
It may not exist.\")\n", - " else:\n", - " logging.error(f\"Failed to delete object store secret: {oss_name}, Status Code: {response.status_code}\")\n", - " except Exception as e:\n", - " logging.error(f\"Error occurred while attempting to delete object store secret: {e}\")\n", - " raise\n", - "\n", - "def register_oss_secret(oss_name=\"\", path_prefix=\"\"):\n", - " headers = _get_headers()\n", - " \n", - " POST_SECRETS_ENDPOINT = '/v2/admin/objectStoreSecrets'\n", - " request_url = f\"{AICORE_BASE_URL}{POST_SECRETS_ENDPOINT}\"\n", - " \n", - " request_body = {\n", - " \"name\": oss_name,\n", - " \"data\": {\n", - " \"AWS_ACCESS_KEY_ID\": AWS_ACCESS_KEY,\n", - " \"AWS_SECRET_ACCESS_KEY\": AWS_SECRET_ACCESS_KEY\n", - " },\n", - " \"type\": \"S3\",\n", - " \"bucket\": AWS_BUCKET_ID,\n", - " \"endpoint\": \"s3-eu-central-1.amazonaws.com\",\n", - " \"region\": AWS_REGION,\n", - " \"pathPrefix\": path_prefix,\n", - " \"verifyssl\": \"0\",\n", - " \"usehttps\": \"1\",\n", - " }\n", - " try:\n", - " response = requests.post(\n", - " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", - " )\n", - " result = response.json()\n", - " return result\n", - " except:\n", - " logging.error(\"Error occurred while attempting to create object store secret\")\n", - " raise\n", - " \n", - "delete_oss_secret(oss_name=\"default\")\n", - "delete_oss_secret(oss_name=\"genai-quick-data-notebook\")\n", - " \n", - "register_oss_secret(oss_name=\"default\", path_prefix=\"\")\n", - "register_oss_secret(oss_name=\"genai-quick-data-notebook\", path_prefix=\"\")" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "val of root is medicalqna_dataset.csv\n", - "val of s3 key is genaiEvaluation/I321506/testdata/medicalqna_dataset.csv\n", - "Uploading ./PUT_YOUR_DATASET_HERE\\medicalqna_dataset.csv to 
s3://hcp-b60330de-a879-4848-9a3d-0ac828f4517c/genaiEvaluation/I321506/testdata/medicalqna_dataset.csv\n" - ] - } - ], - "source": [ - "# uploading these files to Object store to register as an artifact inside ai core\n", - "\n", - "import boto3\n", - "import os\n", - "import uuid\n", - "\n", - "def upload_folder_to_s3(folder_path, bucket_name, s3_prefix=\"\"):\n", - " \"\"\"\n", - " Upload a folder to an S3 bucket recursively.\n", - "\n", - " :param folder_path: The local folder path to upload.\n", - " :param bucket_name: The name of the S3 bucket.\n", - " :param s3_prefix: Optional prefix to use for the S3 keys (e.g., subfolder in the bucket).\n", - " \"\"\"\n", - " s3_client = boto3.client(\n", - " 's3',\n", - " aws_access_key_id=AWS_ACCESS_KEY,\n", - " aws_secret_access_key=AWS_SECRET_ACCESS_KEY,\n", - " region_name=AWS_REGION\n", - " )\n", - "\n", - " for root, dirs, files in os.walk(folder_path):\n", - " for file_name in files:\n", - " print(\"val of root is \", file_name)\n", - " local_path = os.path.join(root, file_name)\n", - " # Compute the relative path for the S3 key\n", - " relative_path = os.path.relpath(local_path, folder_path)\n", - " s3_key = os.path.join(s3_prefix, relative_path).replace(\"\\\\\", \"/\") # Ensure S3-compatible paths\n", - " print(\"val of s3 key is \", s3_key)\n", - " print(f\"Uploading {local_path} to s3://{bucket_name}/{s3_key}\")\n", - " \n", - " # Upload the file\n", - " s3_client.upload_file(local_path, bucket_name, s3_key)\n", - "\n", - "# Example usage\n", - "folder_to_upload_testdata = \"./PUT_YOUR_DATASET_HERE\"\n", - "folder_to_upload_custommetric = \"./PUT_YOUR_CUSTOM_METRIC_HERE\"\n", - "user_directory_prefix = \"I321506\" # replace with your i-number as string here\n", - "prefix_guid = user_directory_prefix if user_directory_prefix is not None else str(uuid.uuid4().hex)\n", - "s3_testdata_prefix = f\"genaiEvaluation/{prefix_guid}/testdata\" # Leave empty for root of the bucket\n", - "\n", - "\n", - 
"upload_folder_to_s3(folder_to_upload_testdata, AWS_BUCKET_ID, s3_testdata_prefix)\n", - "input_artifact_path = f\"ai://genai-quick-data-notebook/genaiEvaluation/{prefix_guid}\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The user stores the input files in the object store and registers the root folder as artifact with AI Core. The File Upload and Artifact endpoints of AI Core API may be used for this purpose. In this example `genaiEvaluation\\{prefix_guid}` is the root folder containing the orchestration configurations and test data which is registered as AI Core artifact." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'id': '75dd115f-1707-4de6-9031-ce22d8c12015', 'message': 'Artifact acknowledged', 'url': 'ai://genai-quick-data-notebook/genaiEvaluation/I321506'}\n" - ] - } - ], - "source": [ - "import requests\n", - "import logging\n", - "# Registering the uploaded files from AWS as artifacts to use inside configuration.\n", - "\n", - "def register_artifact():\n", - " headers = _get_headers()\n", - " \n", - " GET_ARTIFACTS_ENDPOINT = '/v2/lm/artifacts'\n", - " request_url = f\"{AICORE_BASE_URL}{GET_ARTIFACTS_ENDPOINT}\"\n", - " \n", - " request_body = {\n", - " \"labels\": [\n", - " {\n", - " \"key\": \"ext.ai.sap.com/prompt-evaluation\",\n", - " \"value\": \"true\"\n", - " }\n", - " ],\n", - " \"name\": \"genai-eval-simplified-test-data\",\n", - " \"kind\": \"other\",\n", - " \"url\": input_artifact_path, # input artifact path\n", - " \"description\": \"demo artifacts for evaluation flow.\",\n", - " \"scenarioId\": \"genai-evaluations\"\n", - " }\n", - " try:\n", - " response = requests.post(\n", - " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", - " )\n", - " result = response.json()\n", - " print(result)\n", - " return result['id']\n", - " except:\n", - " print(\"Error occurred while 
attempting to create an execution\")\n", - " raise\n", - " \n", - "\n", - "artifact_id = register_artifact()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create Orchestration Deployment\n", - "An orchestration Deployment URL is required for us to run our evaluation. Once created we need to wait until the deployment is running and provides us a deployment url which will be add to our configuration file in the next step. You can skip this step if you already have a orchestration deployment running." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [], - "source": [ - "import requests\n", - "import json\n", - "import time\n", - "\n", - "\n", - "\n", - "def create_orchestration_configuration():\n", - " headers = _get_headers()\n", - " request_body = {\n", - " \"name\": \"orchestrationDeployment\",\n", - " \"executableId\": \"orchestration\",\n", - " \"scenarioId\": \"orchestration\",\n", - " \"parameterBindings\": [\n", - " {\n", - " \"key\": \"modelFilterList\",\n", - " \"value\": \"null\"\n", - " },\n", - " {\n", - " \"key\": \"modelFilterListType\",\n", - " \"value\": \"allow\"\n", - " }\n", - " ],\n", - " \"inputArtifactBindings\": []\n", - " }\n", - " \n", - " GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'\n", - " request_url = f\"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}\"\n", - " try:\n", - " response = requests.post(\n", - " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", - " )\n", - " print(response)\n", - " if(response.status_code != 201):\n", - " raise\n", - " result = response.json()\n", - " print(result)\n", - " return result['id']\n", - " except:\n", - " logging.error(\"Error occurred while attempting to create a Configuration\")\n", - " raise\n", - " \n", - "def execute_orchestration_deployment(configuration_id):\n", - " headers = _get_headers()\n", - " GET_DEPLOYMENTS_ENDPOINT = '/v2/lm/deployments'\n", - " request_url = 
f\"{AICORE_BASE_URL}{GET_DEPLOYMENTS_ENDPOINT}\"\n", - " \n", - " request_body = {\n", - " \"configurationId\": configuration_id\n", - " }\n", - " \n", - " try:\n", - " response = requests.post(\n", - " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", - " )\n", - " print(response)\n", - " if(response.status_code != 202):\n", - " print(\"Deployment execution failed\")\n", - " result = response.json()\n", - " print(result)\n", - " return result['id']\n", - " \n", - " except:\n", - " logging.error(\"Error occurred while attempting to create an execution\")\n", - " raise\n", - "\n", - "def get_deployment_status(orchestration_deployment_id):\n", - " headers = _get_headers()\n", - " api_url = f\"{AICORE_BASE_URL}/v2/lm/deployments/{orchestration_deployment_id}?$select=status\"\n", - " timeout = 400 \n", - " initial_interval = 30 \n", - " pending_interval = 10\n", - " start = time.time()\n", - "\n", - " status = None\n", - " current_interval = initial_interval\n", - "\n", - " while time.time() - start < timeout:\n", - " response = requests.get(api_url, headers=headers)\n", - " if response.status_code == 200:\n", - " status = response.json().get('status')\n", - " print(f\"Deployment {orchestration_deployment_id} status: {status}\")\n", - " # Adjust polling interval based on status\n", - " if status == 'RUNNING':\n", - " return True\n", - " elif status == 'UNKNOWN':\n", - " current_interval = initial_interval\n", - " elif status == 'PENDING':\n", - " current_interval = pending_interval\n", - "\n", - " else:\n", - " print(f\"Failed to fetch deployment status. 
HTTP {response.status_code}\")\n", - " return False\n", - "\n", - " # Waiting according to status for API call\n", - " time.sleep(current_interval)\n", - "\n", - "def get_deployment_url(orchestration_deployment_id):\n", - " headers = _get_headers()\n", - " response = requests.get(f\"{AICORE_BASE_URL}/v2/lm/deployments/{orchestration_deployment_id}\", headers=headers)\n", - " if response.status_code != 200:\n", - " raise Exception(f\"Failed to get deployment URL: {response.status_code} - {response.text}\")\n", - " return response.json().get('deploymentUrl')\n", - "\n", - "# You can skip this step if you already have a orchestration deployment running\n", - "deployment_url = DEPLOYMENT_URL\n", - "if not deployment_url:\n", - " configuration_id = create_orchestration_configuration()\n", - " orchestration_deployment_id = execute_orchestration_deployment(configuration_id)\n", - " is_running = get_deployment_status(orchestration_deployment_id) \n", - " if is_running:\n", - " deployment_url = get_deployment_url(orchestration_deployment_id)\n", - " print(f\"Deployment URL: {deployment_url}\")\n", - " else:\n", - " print(\"Deployment is not running or failed.\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Approach Selection\n", - "\n", - "Select whether to use `Prompt Registry` or `Orchestration Registry` approach" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "50f821a5ff974acebf2de68badce0bd3", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "VBox(children=(Label(value='Please select the configuration mode:', layout=Layout(margin='10px 0px 10px 0px'))…" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "from ipywidgets import Checkbox, VBox, HBox, Output, Label, Layout\n", - "from IPython.display import display\n", - "import textwrap\n", - "\n", - "# --- 
Selection state ---\n", - "approach = None\n", - "suppress_update = False \n", - "\n", - "# --- Define options ---\n", - "flag_options = [\n", - " \"prompt_registry\",\n", - " \"orchestration_registry\"\n", - "]\n", - "\n", - "# --- Output widget to show current selection ---\n", - "\n", - "output = Output(layout=Layout(border=\"1px solid black\", height=\"70px\", overflow=\"auto\", width=\"900px\"))\n", - "\n", - "\n", - "# --- Handler for checkbox changes ---\n", - "def on_flag_change(change):\n", - " global approach, suppress_update\n", - " if suppress_update:\n", - " return\n", - "\n", - " if change[\"new\"]: # A checkbox was checked\n", - " suppress_update = True\n", - " # Uncheck all other checkboxes\n", - " for cb in checkboxes:\n", - " if cb.description != change[\"owner\"].description:\n", - " cb.value = False\n", - " suppress_update = False\n", - " approach = change[\"owner\"].description\n", - " else:\n", - " # Only clear if the unchecked one was the currently selected\n", - " if approach == change[\"owner\"].description:\n", - " approach = None\n", - "\n", - " # Update display once per action\n", - " with output:\n", - " output.clear_output(wait=True)\n", - " msg = f\"Selected approach: {approach or 'None'}\"\n", - " wrapped = textwrap.fill(msg, width=60)\n", - " output.append_stdout(wrapped + \"\\n\")\n", - "\n", - "# --- Create checkboxes ---\n", - "checkboxes = [\n", - " Checkbox(value=False, description=option, layout=Layout(width=\"250px\"))\n", - " for option in flag_options\n", - "]\n", - "\n", - "# --- Attach event handler ---\n", - "for cb in checkboxes:\n", - " cb.observe(on_flag_change, names=\"value\")\n", - "\n", - "# --- Display UI ---\n", - "header = Label(\n", - " value=\"Please select the configuration mode:\",\n", - " layout=Layout(margin=\"10px 0px 10px 0px\")\n", - ")\n", - "ui = VBox([header, HBox(checkboxes), output])\n", - "display(ui)\n" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ 
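The approach selection above is mutually exclusive: checking one box unchecks the other, and a `suppress_update` guard flag keeps the observer from firing recursively while that happens. The same single-selection logic can be sketched without ipywidgets as plain functions (the names `make_exclusive_selector`, `select`, and `deselect` are illustrative, not part of the notebook):

```python
# Minimal sketch of the exclusive-selection behavior of the widget above.
# make_exclusive_selector/select/deselect are illustrative names only.

def make_exclusive_selector(options):
    """Track a single selected option, mimicking the checkbox handler."""
    state = {"selected": None}

    def select(option):
        if option not in options:
            raise ValueError(f"unknown option: {option}")
        # Checking one box implicitly clears any other selection.
        state["selected"] = option
        return state["selected"]

    def deselect(option):
        # Only clear if the unchecked option was the currently selected one.
        if state["selected"] == option:
            state["selected"] = None
        return state["selected"]

    return select, deselect

select, deselect = make_exclusive_selector(
    ["prompt_registry", "orchestration_registry"]
)
print(select("prompt_registry"))           # prompt_registry
print(select("orchestration_registry"))    # orchestration_registry (replaces previous)
print(deselect("prompt_registry"))         # orchestration_registry (no-op)
print(deselect("orchestration_registry"))  # None
```

The widget needs the guard flag because programmatically setting `cb.value = False` re-triggers the observer; the pure-function version has no such feedback loop.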
- { - "name": "stdout", - "output_type": "stream", - "text": [ - "orchestration_registry\n" - ] - } - ], - "source": [ - "# Select the approach, either \"prompt_registry\" or \"orchestration_registry\"\n", - "print(approach)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Create a Prompt Template in Prompt Registry (Prompt Registry Approach)\n", - "\n", - "The following code defines a function `create_prompt_template()` that creates a new **Prompt Template** in the SAP AI Core **Prompt Registry**.\n", - "\n", - "**Note**: If you wish to use a prompt template that already exists in the Prompt Registry, you can manually set `prompt_template_id` in the next cell and skip executing this cell." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'message': 'Prompt updated successfully.', 'id': 'cf1908d7-c793-497d-ab8c-4e220f41102b', 'scenario': 'genai-evaluations', 'name': 'prompt-registry-eval-acc-test', 'version': '1.0.0'}\n" - ] - } - ], - "source": [ - "def create_prompt_template():\n", - " headers = _get_headers()\n", - " GET_PROMPT_TEMPLATES_ENDPOINT = '/v2/lm/promptTemplates'\n", - " request_url = f\"{AICORE_BASE_URL}{GET_PROMPT_TEMPLATES_ENDPOINT}\"\n", - " \n", - " request_body = {\n", - " \"name\": \"prompt-registry-eval-acc-test\",\n", - " \"version\": \"1.0.0\",\n", - " \"scenario\": \"genai-evaluations\",\n", - " \"spec\": PROMPT_TEMPLATE\n", - " }\n", - " try:\n", - " response = requests.post(\n", - " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", - " )\n", - " if response.status_code != 200:\n", - " raise Exception(f\"Prompt template creation failed: {response.status_code} - {response.text}\")\n", - " result = response.json()\n", - " print(result)\n", - " return result['id']\n", - " except Exception:\n", - " logging.error(\"Error occurred while attempting to create a prompt template\")\n", - " raise\n", - "\n", - "prompt_template_id = create_prompt_template()" - ] - }, - { - "cell_type": 
"code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Manually set prompt_template_id here if you wish to use a pre-existing prompt template\n", - "# prompt_template_id=\"\"" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Select your metrics\n", - " \n", - "Tick the metrics you wish to use. If the widget does not load properly, you can manually fill in the string `selected_metrics_str`.\n", - "\n", - "**Note: If your dataset does not have a reference column, DO NOT select metrics where a reference is required.**" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "4a1db8c29df04fc3b03700fca7ba83a6", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "VBox(children=(Label(value='Please choose which metrics you want to run', layout=Layout(margin='10px 0px 10px …" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "from ipywidgets import Checkbox, VBox, HBox, Output, Label, Layout\n", - "import textwrap\n", - "\n", - "\n", - "# List to store selected values\n", - "selected_values = []\n", - "\n", - "# Original list of checkbox options\n", - "checkbox_options = [\n", - " \"Content Filter on Input\",\n", - " \"Content Filter on Output\",\n", - " \"Pointwise Instruction Following\",\n", - " \"Pointwise Answer Relevance\",\n", - " \"Pointwise Conciseness\"\n", - "]\n", - "\n", - "# Additional options with \"(reference required)\" in their description\n", - "additional_options = [\n", - " f\"{option} (reference required)\" for option in [\"BLEU\", \"ROUGE\", \"BERT Score\", \"Pointwise Correctness\"]\n", - "]\n", - "\n", - "# Combine both lists\n", - "all_checkbox_options = checkbox_options + additional_options\n", - "\n", - "# Function to handle checkbox changes\n", - "def on_checkbox_change(change):\n", - " global selected_metrics_str # Declare 
the string version of selected_values as global\n", - " if change['new']: # If the checkbox is checked\n", - " # Remove \"(reference required)\" before adding to the list\n", - " metric_name = change['owner'].description.replace(\" (reference required)\", \"\")\n", - " if metric_name not in selected_values:\n", - " selected_values.append(metric_name)\n", - " else: # If the checkbox is unchecked\n", - " # Remove \"(reference required)\" before removing from the list\n", - " metric_name = change['owner'].description.replace(\" (reference required)\", \"\")\n", - " if metric_name in selected_values:\n", - " selected_values.remove(metric_name)\n", - " # Convert the list to a comma-separated string\n", - " selected_metrics_str = \",\".join(selected_values)\n", - " # Display the updated list with wrapped text\n", - " with output:\n", - " output.clear_output(wait=True) # Clear the output before printing\n", - " wrapped_text = textwrap.fill(f\"Selected values: {selected_values}\", width=80)\n", - " output.append_stdout(wrapped_text + \"\\n\") # Write directly to the output widget\n", - "\n", - "# Create checkboxes for the combined options with wider layout\n", - "checkboxes = [\n", - " Checkbox(value=False, description=option, layout=Layout(width=\"900px\")) for option in all_checkbox_options\n", - "]\n", - "\n", - "# Attach the change handler to each checkbox\n", - "for checkbox in checkboxes:\n", - " checkbox.observe(on_checkbox_change, names='value')\n", - "\n", - "# Group checkboxes into rows (2 per row)\n", - "rows = [HBox(checkboxes[i:i+2]) for i in range(0, len(checkboxes), 2)]\n", - "\n", - "# Output widget to display the selected values with a scrollable and wrapped area\n", - "output = Output(layout=Layout(border=\"1px solid black\", height=\"150px\", overflow=\"auto\", width=\"900px\"))\n", - "\n", - "# Header label\n", - "header = Label(value=\"Please choose which metrics you want to run\", layout=Layout(margin=\"10px 0px 10px 0px\"))\n", - "\n", - "# Display 
the header, checkboxes in rows, and the output\n", - "display(VBox([header] + rows + [output]))\n", - "\n", - "# Initialize the string version of selected_values\n", - "selected_metrics_str = \",\".join(selected_values)\n" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "BERT Score,Pointwise Conciseness\n" - ] - } - ], - "source": [ - "# Manual Selection of Metrics\n", - "#selected_metrics_str = \"rouge\"\n", - "print(selected_metrics_str)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "This script checks for an evaluation metric in SAP AI Core.\n", - "\n", - "1. You can provide metric IDs directly by setting the variable as a comma-separated string:\n", - " user_metric_ids = `\"\"`\n", - " - ✅ If the ID exists, it will be returned.\n", - " \n", - "2. The script reads all `.json` and `.jsonl` files from `CUSTOM_METRIC_FOLDER` to load the custom metrics.\n", - " - The loaded metrics are stored in `custom_metric_list`.\n", - " - The script will use the contents of `custom_metric_list`\n", - " to search for an existing metric by scenario + name + version.\n", - "\n", - "3. If no existing metric is found:\n", - " - A new metric will be created using the details in `custom_metric_list`.\n", - " - Required fields in each custom metric: scenario, name, version, evaluationMethod, metricType.\n", - "\n", - "4. 
At the end:\n", - " - The script prints the final Metric ID that was found or created.\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "✅ Metric already exists: genai-evaluations/groundedness v0.0.1, ID = 2b3cc135-a031-4d93-8641-1f3833797034\n", - "✅ Metric already exists: genai-evaluations/groundedness v0.1.6, ID = 9b349f8e-cd39-486d-809c-3dcfa3b15ac7\n", - "⚠️ User metric ID d1868b00-1601-407a-92cd-0b9065682d1f does not exist in AI Core, skipping.\n", - "⚠️ User metric ID dbf56851-8444-45d3-a0c1-adbe210c7e771 does not exist in AI Core, skipping.\n", - "✅ All processed metric IDs: 2b3cc135-a031-4d93-8641-1f3833797034,9b349f8e-cd39-486d-809c-3dcfa3b15ac7\n" - ] - } - ], - "source": [ - "import os\n", - "import json\n", - "import requests\n", - "\n", - "# --- Load JSON / JSONL files ---\n", - "def load_all_metrics(folder_path):\n", - " \"\"\"\n", - " Loads all JSON and JSONL files from a folder into a single list of dicts.\n", - " \"\"\"\n", - " metrics = []\n", - " files = [f for f in os.listdir(folder_path) if f.endswith((\".json\", \".jsonl\"))]\n", - "\n", - " if not files:\n", - " print(f\"No JSON/JSONL files found in {folder_path}\")\n", - " return metrics\n", - "\n", - " for file_name in files:\n", - " file_path = os.path.join(folder_path, file_name)\n", - " try:\n", - " with open(file_path, \"r\", encoding=\"utf-8\") as f:\n", - " content = f.read().strip()\n", - " try:\n", - " data = json.loads(content)\n", - " if isinstance(data, list):\n", - " metrics.extend(data)\n", - " elif isinstance(data, dict):\n", - " metrics.append(data)\n", - " except json.JSONDecodeError:\n", - " # Attempt to parse as JSONL line by line\n", - " for line in content.splitlines():\n", - " line = line.strip()\n", - " if not line:\n", - " continue\n", - " try:\n", - " metrics.append(json.loads(line))\n", - " except json.JSONDecodeError:\n", - " print(f\"Skipping 
invalid JSON line in {file_name}: {line[:50]}...\")\n", - " except Exception as e:\n", - " print(f\"Error reading {file_name}: {e}\")\n", - " return metrics\n", - "\n", - "# --- Fetch all metrics from SAP AI Core ---\n", - "def fetch_all_metrics():\n", - " request_url = f\"{AICORE_BASE_URL}/v2/lm/evaluationMetrics\"\n", - " resp = requests.get(request_url, headers=_get_headers())\n", - " resp.raise_for_status()\n", - " return resp.json().get(\"resources\", [])\n", - "\n", - "# --- Create or fetch a metric ---\n", - "def create_or_get_metric(custom_metric, user_metric_id=None):\n", - " all_metrics = fetch_all_metrics()\n", - "\n", - " # 1️⃣ User-supplied ID lookup\n", - " if user_metric_id:\n", - " for m in all_metrics:\n", - " if m.get(\"id\") == user_metric_id:\n", - " print(f\"✅ Metric already exists by ID: {user_metric_id}\")\n", - " return user_metric_id\n", - " print(f\"⚠️ User metric ID {user_metric_id} not found, will only include if valid later\")\n", - "\n", - " # 2️⃣ Check by scenario, name, version\n", - " scenario = custom_metric.get(\"scenario\")\n", - " name = custom_metric.get(\"name\")\n", - " version = custom_metric.get(\"version\")\n", - " if not all([scenario, name, version]):\n", - " raise ValueError(\"Metric must include 'scenario', 'name', and 'version'\")\n", - "\n", - " for m in all_metrics:\n", - " if (m.get(\"scenario\") == scenario and\n", - " m.get(\"name\") == name and\n", - " m.get(\"version\") == version):\n", - " metric_id = m.get(\"id\")\n", - " print(f\"✅ Metric already exists: {scenario}/{name} v{version}, ID = {metric_id}\")\n", - " return metric_id\n", - "\n", - " # 3️⃣ Create metric if not found\n", - " request_url = f\"{AICORE_BASE_URL}/v2/lm/evaluationMetrics\"\n", - " required_fields = [\"scenario\", \"name\", \"version\", \"evaluationMethod\", \"metricType\"]\n", - " for f in required_fields:\n", - " if f not in custom_metric:\n", - " raise ValueError(f\"❌ Missing required field: {f}\")\n", - "\n", - " resp = 
requests.post(request_url, headers=_get_headers(), json=custom_metric)\n", - " resp.raise_for_status()\n", - " metric_id = resp.json().get(\"id\")\n", - " print(f\"✅ Metric created successfully: {name} v{version}, ID = {metric_id}\")\n", - " return metric_id\n", - "\n", - "# --- Main pipeline ---\n", - "CUSTOM_METRIC_FOLDER = \"./PUT_YOUR_CUSTOM_METRIC_HERE\"\n", - "user_metric_ids = \"d1868b00-1601-407a-92cd-0b9065682d1f,dbf56851-8444-45d3-a0c1-adbe210c7e771\" # set by user if needed\n", - "\n", - "# 1️⃣ Load all metrics from JSON/JSONL\n", - "custom_metric_list = load_all_metrics(CUSTOM_METRIC_FOLDER)\n", - "\n", - "# 2️⃣ Create/fetch metrics from SAP AI Core\n", - "metric_ids = []\n", - "for metric in custom_metric_list:\n", - " try:\n", - " metric_id = create_or_get_metric(metric)\n", - " metric_ids.append(metric_id)\n", - " except ValueError as e:\n", - " print(f\"Skipping metric due to error: {e}\")\n", - "\n", - "# 3️⃣ Validate user_metric_ids separately if provided\n", - "if user_metric_ids and user_metric_ids.strip():\n", - " all_metrics = fetch_all_metrics()\n", - " # Split comma-separated IDs and strip whitespace\n", - " for uid in [uid.strip() for uid in user_metric_ids.split(\",\")]:\n", - " if any(m.get(\"id\") == uid for m in all_metrics):\n", - " metric_ids.append(uid)\n", - " else:\n", - " print(f\"⚠️ User metric ID {uid} does not exist in AI Core, skipping.\")\n", - "# 4️⃣ Convert to comma-separated string\n", - "custom_metric_ids_str = \",\".join(metric_ids)\n", - "print(\"✅ All processed metric IDs:\", custom_metric_ids_str)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Select your Models\n", - " \n", - "Tick the models you wish to use. 
If the widget does not load properly, you can manually fill in the string `selected_models_str`\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "application/vnd.jupyter.widget-view+json": { - "model_id": "1ab806f2179a4deab49fd0cd761a863a", - "version_major": 2, - "version_minor": 0 - }, - "text/plain": [ - "VBox(children=(Label(value='Please choose which LLM models you want to run:', layout=Layout(margin='10px 0px 1…" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "import requests\n", - "import textwrap\n", - "from functools import partial\n", - "from ipywidgets import Checkbox, VBox, HBox, Output, Label, Layout\n", - "from IPython.display import display\n", - "\n", - "\n", - "# --- Call the API ---\n", - "GET_MODELS_ENDPOINT = '/v2/lm/scenarios/foundation-models/models'\n", - "request_url = f\"{AICORE_BASE_URL}{GET_MODELS_ENDPOINT}\"\n", - "headers = _get_headers() # your existing function\n", - "response = requests.get(request_url, headers=headers)\n", - "models_data = response.json()\n", - "# --- Extract model options ---\n", - "llm_options = []\n", - "resources = models_data.get(\"resources\", [])\n", - "for m in resources:\n", - " display_name = m.get(\"displayName\") or m.get(\"name\") or m.get(\"model\", \"Unknown\")\n", - " model_id = m.get(\"model\", m.get(\"id\", \"unknown\"))\n", - "\n", - " versions = m.get(\"versions\", [])\n", - " version = \"latest\"\n", - " if versions:\n", - " latest = next((v for v in versions if v.get(\"isLatest\")), versions[0])\n", - " version = latest.get(\"name\") or \"latest\"\n", - "\n", - " label = f\"{display_name} ({model_id}, v:{version})\"\n", - " value = f\"{model_id}:{version}\"\n", - " llm_options.append({\"label\": label, \"value\": value})\n", - "\n", - "# --- Selection state ---\n", - "selected_models = []\n", - "selected_models_str = \"\" # <-- your comma-separated string\n", - "output = 
Output(layout=Layout(border=\"1px solid black\", height=\"150px\", overflow=\"auto\", width=\"900px\"))\n", - "\n", - "def update_output():\n", - " global selected_models_str\n", - " selected_models_str = \",\".join(selected_models)\n", - " with output:\n", - " output.clear_output(wait=True)\n", - " if selected_models_str:\n", - " wrapped_text = textwrap.fill(f\"Selected models: {selected_models_str}\", width=80)\n", - " output.append_stdout(wrapped_text + \"\\n\")\n", - "\n", - "# --- Callback handler ---\n", - "def on_checkbox_change(model_value, change):\n", - " # Ignore redundant triggers\n", - " if change[\"old\"] == change[\"new\"]:\n", - " return\n", - "\n", - " if approach == \"prompt_registry\":\n", - " # Multiple selections allowed\n", - " if change[\"new\"]:\n", - " if model_value not in selected_models:\n", - " selected_models.append(model_value)\n", - " else:\n", - " if model_value in selected_models:\n", - " selected_models.remove(model_value)\n", - " else:\n", - " # Single selection only\n", - " if change[\"new\"]:\n", - " # Uncheck all other boxes\n", - " for cb in checkboxes:\n", - " if cb.model_value != model_value:\n", - " cb.unobserve_all()\n", - " cb.value = False\n", - " cb.observe(partial(on_checkbox_change, cb.model_value), names=\"value\")\n", - " selected_models.clear()\n", - " selected_models.append(model_value)\n", - " else:\n", - " if model_value in selected_models:\n", - " selected_models.remove(model_value)\n", - "\n", - " update_output()\n", - "\n", - "# --- Create checkboxes + labels ---\n", - "checkboxes = []\n", - "checkbox_rows = []\n", - "for opt in llm_options:\n", - " cb = Checkbox(value=False, indent=False, layout=Layout(width=\"30px\"))\n", - " cb.model_value = opt[\"value\"] # attach model identifier\n", - " cb.observe(partial(on_checkbox_change, opt[\"value\"]), names=\"value\")\n", - " lbl = Label(value=opt[\"label\"], layout=Layout(width=\"850px\"))\n", - " checkboxes.append(cb)\n", - " checkbox_rows.append(HBox([cb, 
lbl]))\n", - "\n", - "# --- Display ---\n", - "header = Label(value=\"Please choose which LLM models you want to run:\", layout=Layout(margin=\"10px 0px 10px 0px\"))\n", - "display(VBox([header] + checkbox_rows + [output]))\n", - "\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Selected models string: gpt-4o:2024-08-06\n" - ] - } - ], - "source": [ - "# Manual selection of models\n", - "# selected_models_str=\"gpt-4o:2024-08-06\"\n", - "print(\"Selected models string:\", selected_models_str)\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Create an orchestration registry config (only needed if you chose the orchestration registry approach)." - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "{'message': 'Orchestration config updated successfully.', 'id': '96f7424b-496d-4c47-8f14-0c6562e7744f', 'scenario': 'genai-evaluations', 'name': 'genai-eval-test', 'version': '1.0.0'}\n" - ] - } - ], - "source": [ - "def create_orchestration_registry_config():\n", - " headers = _get_headers()\n", - " CREATE_ORCHESTRATION_REGISTRY = '/v2/registry/v2/orchestrationConfigs'\n", - " request_url = f\"{AICORE_BASE_URL}{CREATE_ORCHESTRATION_REGISTRY}\"\n", - " model_name, model_version = selected_models_str.split(\":\")\n", - " request_body = {\n", - " \"name\": \"genai-eval-test\",\n", - " \"version\": \"1.0.0\",\n", - " \"scenario\": \"genai-evaluations\",\n", - " \"spec\": {\n", - " \"modules\": {\n", - " \"prompt_templating\": {\n", - " \"model\": {\n", - " \"name\": model_name,\n", - " \"version\": model_version\n", - " },\n", - " \"prompt\": PROMPT_TEMPLATE\n", - " }\n", - " }\n", - " }\n", - " }\n", - " try:\n", - " response = requests.post(\n", - " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", - " 
)\n", - " if(response.status_code != 200):\n", - " print(response.json())\n", - " raise\n", - " result = response.json()\n", - " print(result)\n", - " return result['id']\n", - " except:\n", - " logging.error(\"Error occurred while attempting to create an orchestration registry config\")\n", - " raise\n", - "orchestration_registry_id = create_orchestration_registry_config()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Start Evaluation Run (Step 3)" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Selected metrics: BERT Score,Pointwise Conciseness,2b3cc135-a031-4d93-8641-1f3833797034,9b349f8e-cd39-486d-809c-3dcfa3b15ac7\n", - "Selected models: gpt-4o:2024-08-06\n" - ] - } - ], - "source": [ - "\n", - "import json\n", - "test_data_path = f\"testdata/{DATASET_NAME}\" # specify the test data path here. To use the full folder, specifying just testdata will work\n", - "test_datasets = json.dumps({'path': test_data_path, 'type': 'csv'})\n", - "metrics_list = \",\".join([selected_metrics_str, custom_metric_ids_str])\n", - "models_list = selected_models_str\n", - "print(f\"Selected metrics: {metrics_list}\")\n", - "print(f\"Selected models: {models_list}\")\n", - "#variable_mapping = json.dumps({'prompt/question': 'data/topic'}) # to map the question prompt variable to the entry in the dataset.\n", - "# orchestration_deployment_url = deployment_url # set this to use a specific deployment\n", - "orchestration_deployment_url = \"https://api.ai.internalprod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d92895b42bdd6175\"\n", - "repetitions = \"1\"" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "{'id': 'fc21348f-4232-4579-8d21-062b88328ef0', 'message': 'Configuration created'}\n" - ] - } - ], - 
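One caveat in the metrics cell above: `",".join([selected_metrics_str, custom_metric_ids_str])` produces a stray leading or trailing comma whenever either string is empty, for example when no custom metrics were loaded. A defensive join that drops empty parts might look like this (a sketch; `build_metrics_list` is an illustrative helper name, not part of the notebook):

```python
def build_metrics_list(*parts):
    """Join metric name/ID strings, skipping empty or whitespace-only parts."""
    return ",".join(p.strip() for p in parts if p and p.strip())

# With both parts present, this behaves like the plain join:
print(build_metrics_list("BERT Score,Pointwise Conciseness", "custom-id-1"))
# → BERT Score,Pointwise Conciseness,custom-id-1

# With no custom metric IDs, no trailing comma is produced:
print(build_metrics_list("BERT Score,Pointwise Conciseness", ""))
# → BERT Score,Pointwise Conciseness
```

The same pattern also guards the `promptTemplate` / `models` / `orchestrationRegistryIds` parameter bindings, which are intentionally blank for the approach that was not selected.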
"source": [ - "# creating an AICORE Configuration.\n", - "import requests\n", - "\n", - "request_body = {\n", - " \"name\": \"genai-eval-conf\",\n", - " \"scenarioId\": \"genai-evaluations\",\n", - " \"executableId\": \"genai-evaluations-simplified\",\n", - " \"inputArtifactBindings\": [\n", - " {\n", - " \"key\": \"datasetFolder\",\n", - " \"artifactId\": artifact_id\n", - " }\n", - " ],\n", - " \"parameterBindings\": [\n", - " {\n", - " \"key\": \"repetitions\",\n", - " \"value\": repetitions\n", - " },\n", - " {\n", - " \"key\": \"orchestrationDeploymentURL\",\n", - " \"value\": orchestration_deployment_url\n", - " },\n", - " {\n", - " \"key\": \"metrics\",\n", - " \"value\": metrics_list\n", - " },\n", - " {\n", - " \"key\": \"testDataset\",\n", - " \"value\": test_datasets\n", - " },\n", - " {\n", - " \"key\": \"promptTemplate\",\n", - " \"value\": prompt_template_id if approach == \"prompt_registry\" else \"\"\n", - " },\n", - " {\n", - " \"key\": \"models\",\n", - " \"value\": models_list if approach == \"prompt_registry\" else \"\"\n", - " },\n", - " {\n", - " \"key\": \"orchestrationRegistryIds\",\n", - " \"value\": orchestration_registry_id if approach == \"orchestration_registry\" else \"\"\n", - " }\n", - " ]\n", - "}\n", - "\n", - "def create_aicore_configuration():\n", - " headers = _get_headers()\n", - " GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'\n", - " request_url = f\"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}\"\n", - " try:\n", - " response = requests.post(\n", - " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", - " )\n", - " print(response)\n", - " if(response.status_code != 201):\n", - " raise\n", - " result = response.json()\n", - " print(result)\n", - " return result['id']\n", - " except:\n", - " logging.error(\"Error occurred while attempting to create a Configuration\")\n", - " raise\n", - " \n", - "configuration_id = create_aicore_configuration()" - ] - }, - { - "cell_type": "markdown", - 
"metadata": {}, - "source": [ - "## Evaluation Execution Creation\n", - "Once the configuration is created, we create the AI Core execution, which triggers the evaluation workload.\n" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "response received is \n", - "{'id': 'e8d0adb660289477', 'message': 'Execution scheduled', 'status': 'UNKNOWN', 'targetStatus': 'COMPLETED'}\n" - ] - } - ], - "source": [ - "# create an execution with the created configuration.\n", - "\n", - "import requests\n", - "def create_execution():\n", - " headers = _get_headers()\n", - " GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions'\n", - " request_url = f\"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}\"\n", - " request_body = {\"configurationId\" : configuration_id} \n", - " try:\n", - " response = requests.post(\n", - " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", - " )\n", - " print(\"response received is \", response)\n", - " result = response.json()\n", - " print(result)\n", - " return result['id']\n", - " except:\n", - " logging.error(\"Error occurred while attempting to create an execution\")\n", - " raise\n", - " \n", - "\n", - "execution_id = create_execution()" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "response received is \n" - ] - }, - { - "data": { - "text/plain": [ - "{'id': 'e8d0adb660289477',\n", - " 'createdAt': '2025-11-14T05:36:34Z',\n", - " 'modifiedAt': '2025-11-14T05:36:34Z',\n", - " 'status': 'COMPLETED',\n", - " 'scenarioId': 'genai-evaluations',\n", - " 'configurationId': 'fc21348f-4232-4579-8d21-062b88328ef0',\n", - " 'targetStatus': 'COMPLETED',\n", - " 'submissionTime': '2025-11-14T05:36:56Z',\n", - " 'startTime': '2025-11-14T05:36:56Z',\n", - " 'completionTime': '2025-11-14T05:52:15Z',\n", - " 'configurationName': 
'genai-eval-conf',\n", - " 'executableId': 'genai-evaluations-simplified',\n", - " 'outputArtifacts': [{'id': '80dfbccc-b559-496c-a03a-4d21b18c11be',\n", - " 'createdAt': '2025-11-14T05:52:06Z',\n", - " 'modifiedAt': '2025-11-14T05:52:06Z',\n", - " 'url': 'ai://default/e8d0adb660289477/evaluation_result',\n", - " 'name': 'evaluation_result',\n", - " 'kind': 'resultset',\n", - " 'description': '',\n", - " 'scenarioId': 'genai-evaluations',\n", - " 'executionId': 'e8d0adb660289477'},\n", - " {'id': '2cb535ce-33d0-482d-a673-ba66b0e1df6f',\n", - " 'createdAt': '2025-11-14T05:50:23Z',\n", - " 'modifiedAt': '2025-11-14T05:50:23Z',\n", - " 'url': 'ai://default/e8d0adb660289477/debug_files',\n", - " 'name': 'debug_files',\n", - " 'kind': 'resultset',\n", - " 'description': '',\n", - " 'scenarioId': 'genai-evaluations',\n", - " 'executionId': 'e8d0adb660289477'}]}" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# get execution status\n", - "import requests\n", - "def get_execution_status(execution_id):\n", - " headers = _get_headers()\n", - " LOG_EXECUTIONS_ENDPOINT = f'/v2/lm/executions/{execution_id}'\n", - " request_url = f\"{AICORE_BASE_URL}{LOG_EXECUTIONS_ENDPOINT}\"\n", - " try:\n", - " response = requests.get(\n", - " request_url, headers=headers, timeout=120\n", - " )\n", - " print(\"response received is \", response)\n", - " result = response.json()\n", - " return result\n", - " except:\n", - " logging.error(\"Error occurred while attempting to get execution status\")\n", - " raise\n", - " \n", - "\n", - "get_execution_status(execution_id)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "1. Run the following cells only when the status field in the Execution response is \"COMPLETED\" to view the results.\n", - "2. The status field progresses through different states over time: UNKNOWN → PENDING → RUNNING → COMPLETED. 
Ensure it reaches COMPLETED before proceeding.\n", - "\n", - "\n", - "Note: The targetStatus will always be COMPLETED from the start, as it represents the intended final state of the Execution. Do not confuse it with the actual status field.\n", - "" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Evaluation Result (Step 4)\n", - "The evaluation job produces two outputs:\n", - "1. A SQLite DB file which stores the orchestration input, the orchestration output, the values of all the metrics calculated for this orchestration output, and statistics such as latency. These metric values are called raw metric values. This SQLite DB file is stored in the object store as an AI Core output artifact.\n", - "2. A set of metrics whose values are aggregated from the raw metric values. The aggregate metrics are stored in the tracking service. The user-defined tags, along with the run names, are stored with the metrics.\n", - "After the execution completes, the user can see the runs generated by the workload, along with the aggregate metrics, by calling the tracking API as shown below." - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "response received is \n" - ] - } - ], - "source": [ - "# Get aggregate metrics using execution id\n", - "import requests\n", - "def retrieve_aggregate_metrics(execution_id):\n", - " headers = _get_headers()\n", - " GET_METRICS_ENDPOINT = f'/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={execution_id}'\n", - " request_url = f\"{AICORE_BASE_URL}{GET_METRICS_ENDPOINT}\"\n", - " try:\n", - " response = requests.get(request_url, headers=headers, timeout=120)\n", - " print(\"response received is \", response)\n", - " result = response.json()\n", - " return result\n", - " except:\n", - " logging.error(\"Error occurred while attempting to retrieve aggregate metrics for the run\")\n", - 
"\n", - "runs_data = retrieve_aggregate_metrics(execution_id)\n" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
model: gpt-4o
BERT Score/F1:         mean 0.466545, median 0.486934, p90 0.550116, p95 0.561422, stddev 0.077235
BERT Score/Precision:  mean 0.414502, median 0.445725, p90 0.498497, p95 0.505158, stddev 0.081079
BERT Score/Recall:     mean 0.539436, median 0.548189, p90 0.638810, p95 0.662922, stddev 0.081068
Pointwise Conciseness: counts 1-5 = 0, 0, 13, 22, 14; mean 4.020408, median 4.0, p90 5.0, p95 5.0, stddev 0.749716
groundedness:          counts 0-5 all 0
completion_tokens/sum 19553.0, prompt_tokens/sum 1608.0, submission/sum 49.0, latency/average 105.284739
" - ], - "text/plain": [ - "" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import pandas as pd\n", - "from IPython.display import HTML\n", - "\n", - "def get_model_from_run(run):\n", - " for tag in run.get(\"tags\", []):\n", - " if tag.get(\"name\") == \"evaluation.ai.sap.com/model\":\n", - " return tag.get(\"value\")\n", - "\n", - "def aggregate_metrics_by_model(runs_list):\n", - " transformed_data = []\n", - " for run in runs_list:\n", - " model = get_model_from_run(run)\n", - " for metric in run[\"metrics\"]:\n", - " output_json = {\n", - " \"model\": model,\n", - " \"metrics_name\": metric.get(\"name\"),\n", - " \"metric_value\": metric.get(\"value\")\n", - " }\n", - " transformed_data.append(output_json)\n", - " return transformed_data\n", - "\n", - "\n", - "def create_metrics_pivot_table(transformed_data):\n", - " \"\"\"\n", - " Creates a pivot table where rows are models and columns are metrics.\n", - " \n", - " Args:\n", - " transformed_data: List of dictionaries with 'model', 'metrics_name', 'metric_value'\n", - " \n", - " Returns:\n", - " DataFrame with models as rows and metrics as columns\n", - " \"\"\"\n", - " # Convert list of dictionaries to DataFrame\n", - " df = pd.DataFrame(transformed_data)\n", - " \n", - " # Create pivot table\n", - " pivot_table = df.pivot_table(\n", - " index='model',\n", - " columns='metrics_name',\n", - " values='metric_value',\n", - " aggfunc='first' # Use 'first' to get the single value, or 'mean' if there are duplicates\n", - " )\n", - " \n", - " return pivot_table\n", - "\n", - "transformed_data = aggregate_metrics_by_model(runs_data['resources'])\n", - "metrics_pivot = create_metrics_pivot_table(transformed_data)\n", - "\n", - "HTML(metrics_pivot.to_html())" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Process Your Execution Results\n", - "The next steps process your evaluation run to give 
you a point based ranking of which particular model did the best based on the metrics you selected" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
BERT Score_scorePointwise Conciseness_score
model
gpt-4o-0.0530110.510204
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/html": [ - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
modeltotal_scorerank
0gpt-4o0.2285961
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "import pandas as pd\n", - "from IPython.display import HTML\n", - "\n", - "# Scoring logic depends on \"scoring_type\"\n", - "# \"weight\" represents the relative weight of this metric to all SELECTED metrics\n", - "METRICS_SCORING_TYPE_MAPPING = {\n", - " \"Content Filter on Input\": {\n", - " \"scoring_type\": \"bool-false\", # False is good\n", - " \"weight\": 1\n", - " },\n", - " \"Content Filter on Output\": {\n", - " \"scoring_type\": \"bool-false\", # False is good\n", - " \"weight\": 1\n", - " },\n", - " \"Pointwise Instruction Following\": {\n", - " \"scoring_type\": \"num_1_to_5\",\n", - " \"weight\": 1\n", - " },\n", - " \"Pointwise Answer Relevance\": {\n", - " \"scoring_type\": \"num_1_to_5\",\n", - " \"weight\": 1\n", - " },\n", - " \"Pointwise Conciseness\": {\n", - " \"scoring_type\": \"num_1_to_5\",\n", - " \"weight\": 1\n", - " },\n", - " \"Pointwise Correctness\": {\n", - " \"scoring_type\": \"num_1_to_5\",\n", - " \"weight\": 1\n", - " },\n", - " \"BLEU\": {\n", - " \"scoring_type\": \"num_0_to_1\",\n", - " \"weight\": 1\n", - " },\n", - " \"ROUGE\": {\n", - " \"scoring_type\": \"num_0_to_1\",\n", - " \"weight\": 1\n", - " },\n", - " \"BERT Score\": {\n", - " \"scoring_type\": \"F1/Precision/Recall num_0_to_1\",\n", - " \"weight\": 1\n", - " }\n", - "}\n", - "\n", - "def calculate_bool_metric_score(pivot_df, metric_base_name, true_is_good):\n", - " \"\"\"\n", - " Calculate scores for boolean metrics based on False/True counts.\n", - " \n", - " Args:\n", - " pivot_df: DataFrame with models as rows and metrics as columns\n", - " metric_base_name: Base name of the metric (without /False/count or /True/count)\n", - " true_is_good: Boolean indicating if True is considered a good outcome\n", - " \n", - " Returns:\n", - " Series with boolean metric scores per model (scaled to -1 to 1)\n", - " \"\"\"\n", - " false_col = 
f\"{metric_base_name}/False/count\"\n", - " true_col = f\"{metric_base_name}/True/count\"\n", - " \n", - " false_values = pivot_df[false_col] if false_col in pivot_df.columns else 0\n", - " true_values = pivot_df[true_col] if true_col in pivot_df.columns else 0\n", - " total_values = true_values + false_values\n", - "\n", - " score = ((false_values * 1) + (true_values * -1)) / total_values\n", - "\n", - " if true_is_good:\n", - " score = 0 - score\n", - "\n", - " return score\n", - "\n", - "def calculate_numeric_metric_score(pivot_df, metric_base_name, range_min=0, range_max=1):\n", - " \"\"\"\n", - " Calculate scores for numeric metrics with /mean\n", - " The mean is normalized to a score between -1 and 1 using the provided range.\n", - " \n", - " Args:\n", - " pivot_df: DataFrame with models as rows and metrics as columns\n", - " metric_base_name: Base name of the metric (without suffixes)\n", - " range_min: Minimum possible value of the metric\n", - " range_max: Maximum possible value of the metric\n", - " \n", - " Returns:\n", - " Series with numeric metric scores per model (scaled to -1 to 1)\n", - " \"\"\"\n", - " mean_col = f\"{metric_base_name}/mean\"\n", - " \n", - " if mean_col not in pivot_df.columns:\n", - " return pd.Series(0.0, index=pivot_df.index)\n", - " \n", - " mean_values = pivot_df[mean_col]\n", - " \n", - " # Linear normalization from [range_min, range_max] to [0, 1]\n", - " normalized = (mean_values - range_min) / (range_max - range_min)\n", - " \n", - " # Scale to [-1, 1]\n", - " score = (normalized * 2) - 1\n", - " \n", - " return score\n", - "\n", - "def calculate_bert_score(pivot_df, metric_base_name):\n", - " \"\"\"\n", - " Calculate BERT Score by averaging F1, Precision, and Recall scores.\n", - " \n", - " Args:\n", - " pivot_df: DataFrame with models as rows and metrics as columns\n", - " metric_base_name: Base name \"BERT Score\"\n", - " \n", - " Returns:\n", - " Series with BERT scores per model (scaled to -1 to 1)\n", - " \"\"\"\n", 
- " f1_col = f\"{metric_base_name}/F1/mean\"\n", - " precision_col = f\"{metric_base_name}/Precision/mean\"\n", - " recall_col = f\"{metric_base_name}/Recall/mean\"\n", - " \n", - " scores = []\n", - " for col in [f1_col, precision_col, recall_col]:\n", - " if col in pivot_df.columns:\n", - " scores.append(pivot_df[col])\n", - " \n", - " if not scores:\n", - " return pd.Series(0.0, index=pivot_df.index)\n", - " \n", - " # Average the three metrics (already in 0 to 1 range)\n", - " avg_score = sum(scores) / len(scores)\n", - " \n", - " # Scale to [-1, 1]\n", - " score = (avg_score * 2) - 1\n", - " \n", - " return score\n", - "\n", - "def find_unique_metrics_in_pivot(pivot_df):\n", - " \"\"\"\n", - " Identify unique metric base names present in the pivot table.\n", - " \n", - " Args:\n", - " pivot_df: DataFrame with models as rows and metrics as columns\n", - " \"\"\"\n", - " # Extract unique metric names from pivot table columns\n", - " unique_metrics = set()\n", - " for col in pivot_df.columns:\n", - " # Extract base metric name by removing suffixes\n", - " base_name = col\n", - " for suffix in ['/False/count', '/True/count', '/F1_score/mean','/Precision_score/mean', \n", - " '/Recall_score/mean','/mean','/median', '/p90', '/p95', '/stddev']:\n", - " if suffix in base_name and \"BERT Score\" not in base_name:\n", - " base_name = base_name.replace(suffix, '')\n", - " unique_metrics.add(base_name)\n", - " break\n", - " if base_name.startswith(\"BERT Score/\"):\n", - " base_name = \"BERT Score\"\n", - " unique_metrics.add(base_name)\n", - " if not unique_metrics:\n", - " raise ValueError(\"No valid metrics found in pivot table\")\n", - " return unique_metrics\n", - "\n", - "\n", - "def rank_models(pivot_df, unique_metrics=None):\n", - " \"\"\"\n", - " Rank models based on metrics present in the pivot table.\n", - " \n", - " Args:\n", - " pivot_df: DataFrame with models as rows (index) and metrics as columns\n", - " \n", - " Returns:\n", - " DataFrame with model 
rankings and scores\n", - " \"\"\" \n", - " # Calculate total weight for metrics present in pivot table\n", - " total_weight = sum(METRICS_SCORING_TYPE_MAPPING[m][\"weight\"] for m in unique_metrics)\n", - " \n", - " # Initialize total score\n", - " total_scores = pd.Series(0.0, index=pivot_df.index)\n", - " \n", - " # Process each metric found in the pivot table\n", - " for metric_name in unique_metrics:\n", - " config = METRICS_SCORING_TYPE_MAPPING[metric_name]\n", - " scoring_type = config[\"scoring_type\"]\n", - " weight = config[\"weight\"] / total_weight\n", - " \n", - " if scoring_type == \"bool-false\":\n", - " # False is good (True is bad)\n", - " metric_score = calculate_bool_metric_score(pivot_df, metric_name, true_is_good=False)\n", - " total_scores += metric_score * weight\n", - " \n", - " elif scoring_type == \"bool-true\":\n", - " # True is good (False is bad)\n", - " metric_score = calculate_bool_metric_score(pivot_df, metric_name, true_is_good=True)\n", - " total_scores += metric_score * weight\n", - " \n", - " elif scoring_type == \"num_1_to_5\":\n", - " metric_score = calculate_numeric_metric_score(pivot_df, metric_name, range_min=1, range_max=5)\n", - " total_scores += metric_score * weight\n", - " \n", - " elif scoring_type == \"num_0_to_1\":\n", - " metric_score = calculate_numeric_metric_score(pivot_df, metric_name, range_min=0, range_max=1)\n", - " total_scores += metric_score * weight\n", - " \n", - " elif scoring_type == \"F1/Precision/Recall num_0_to_1\":\n", - " # BERT Score\n", - " metric_score = calculate_bert_score(pivot_df, metric_name)\n", - " total_scores += metric_score * weight\n", - " \n", - " # Create results DataFrame\n", - " results_df = pd.DataFrame({\n", - " 'model': pivot_df.index,\n", - " 'total_score': total_scores.values\n", - " })\n", - " \n", - " # Rank models (higher score = better rank)\n", - " results_df['rank'] = results_df['total_score'].rank(ascending=False, method='min').astype(int)\n", - " results_df = 
results_df.sort_values('rank')\n", - " \n", - " return results_df\n", - "\n", - "def get_detailed_scores(pivot_df, unique_metrics):\n", - " \"\"\"\n", - " Get detailed breakdown of scores per metric for each model.\n", - " \n", - " Args:\n", - " pivot_df: DataFrame with models as rows and metrics as columns\n", - " \n", - " Returns:\n", - " DataFrame with detailed scores per metric\n", - " \"\"\"\n", - " detailed_scores = pd.DataFrame(index=pivot_df.index)\n", - " \n", - " # Process each metric in the mapping\n", - " for metric_name in unique_metrics:\n", - " scoring_type = METRICS_SCORING_TYPE_MAPPING[metric_name][\"scoring_type\"]\n", - " \n", - " if scoring_type == \"bool-false\":\n", - " detailed_scores[f\"{metric_name}_score\"] = calculate_bool_metric_score(pivot_df, metric_name, true_is_good=False)\n", - " \n", - " elif scoring_type == \"bool-true\":\n", - " detailed_scores[f\"{metric_name}_score\"] = calculate_bool_metric_score(pivot_df, metric_name, true_is_good=True)\n", - " \n", - " elif scoring_type == \"num_1_to_5\":\n", - " detailed_scores[f\"{metric_name}_score\"] = calculate_numeric_metric_score(pivot_df, metric_name, range_min=1, range_max=5)\n", - " \n", - " elif scoring_type == \"num_0_to_1\":\n", - " detailed_scores[f\"{metric_name}_score\"] = calculate_numeric_metric_score(pivot_df, metric_name, range_min=0, range_max=1)\n", - " \n", - " elif scoring_type == \"F1/Precision/Recall num_0_to_1\":\n", - " detailed_scores[f\"{metric_name}_score\"] = calculate_bert_score(pivot_df, metric_name)\n", - " \n", - " return detailed_scores\n", - "\n", - "unique_metrics = find_unique_metrics_in_pivot(metrics_pivot)\n", - "\n", - "# Get detailed scores breakdown\n", - "detailed = get_detailed_scores(metrics_pivot, unique_metrics)\n", - "display(HTML(detailed.to_html()))\n", - "\n", - "# Rank models\n", - "ranking = rank_models(metrics_pivot, unique_metrics)\n", - "display(HTML(ranking.to_html()))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - 
"source": [ - "To drill down further, you can also download the SQLite DB file from object storage and analyse the results (instance-level metrics, logs, etc.) locally." - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Downloading e8d0adb660289477/evaluation_result/results.db to results-new\\results.db\n" - ] - } - ], - "source": [ - "# download the result artifacts from Object store.\n", - "import boto3\n", - "\n", - "def download_all_objects(prefix, destination_folder):\n", - " \"\"\"\n", - " Recursively download all objects from the S3 bucket starting with a specific prefix.\n", - "\n", - " :param prefix: Prefix to filter objects in the bucket.\n", - " :param destination_folder: Local folder to save the downloaded files.\n", - " \"\"\"\n", - " s3_client = boto3.client(\n", - " 's3',\n", - " aws_access_key_id=AWS_ACCESS_KEY,\n", - " aws_secret_access_key=AWS_SECRET_ACCESS_KEY,\n", - " region_name=AWS_REGION\n", - " )\n", - "\n", - " # Ensure the destination folder exists\n", - " if not os.path.exists(destination_folder):\n", - " os.makedirs(destination_folder)\n", - "\n", - " # Paginate through objects\n", - " paginator = s3_client.get_paginator('list_objects_v2')\n", - " pages = paginator.paginate(Bucket=AWS_BUCKET_ID, Prefix=prefix)\n", - "\n", - " for page in pages:\n", - " if 'Contents' in page:\n", - " for obj in page['Contents']:\n", - " key = obj['Key']\n", - " local_file_path = os.path.join(destination_folder, os.path.relpath(key, prefix))\n", - "\n", - " # Ensure the local directory structure exists\n", - " local_directory = os.path.dirname(local_file_path)\n", - " if not os.path.exists(local_directory):\n", - " os.makedirs(local_directory)\n", - "\n", - " # Download the object\n", - " print(f\"Downloading {key} to {local_file_path}\")\n", - " s3_client.download_file(AWS_BUCKET_ID, key, 
local_file_path)\n", - "\n", - "\n", - "# Download the evaluation results from the object store. Look at the execution status under the \"outputArtifacts\" key to see the 'url',\n", - "# which shows the data path where your output results are stored\n", - "EXECUTION_ID = execution_id\n", - "sqlite_db_prefix = f'{EXECUTION_ID}/evaluation_result/' # change the prefix based on where your output artifact is stored in the bucket.\n", - "destination_folder = 'results-new'\n", - "\n", - "download_all_objects(sqlite_db_prefix, destination_folder)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "NOTE: The cell below shows the top 10 rows of the evaluation results for each SQLite table. If you wish to see all entries, comment out the df.head(10) line in the cell below, or adjust the number accordingly." - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "\n", - "\n", - "
\n", - "\n", - "
\n", - "

Table: run

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
idnameconfigtagscreated_atupdated_at
960997700d79409680974700c0d67825Run-genai-eval-test-gpt-4o-2024-08-06{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{}2025-11-14 05:38:21.9641282025-11-14 05:38:21.964132
\n", - "
\n", - " \n", - "
\n", - "

Table: configuration

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
idtest_datasetsmetricsvariable_mappingtagsorchestration_deployment_urlrepetitionsmetric_templatescreated_atupdated_at
076096962816435e8e03c0aed7fa4743{\"path\": \"testdata/medicalqna_dataset.csv\", \"type\": \"csv\"}[\"BERT Score\", \"Pointwise Conciseness\", \"2b3cc135-a031-4d93-8641-1f3833797034\", \"9b349f8e-cd39-486d-809c-3dcfa3b15ac7\"]{}{}https://api.ai.internalprod.eu-central-1.aws.ml.hana.ondemand.com/v2/inference/deployments/d92895b42bdd61751[{\"evaluationMethod\": \"computed\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"0001-01-01 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"93a16045-d577-4132-8481-9497cb205961\", \"name\": \"BERT Score\", \"description\": \"Bertscore is a metric for evaluating the quality of text generation by comparing it to reference texts. It leverages BERT, a pre-trained transformer model, to compute contextual embeddings for each token in both the candidate and reference sentences.\", \"version\": \"1.0.0\", \"includeProperties\": [\"reference\"], \"additionalProperties\": {\"variables\": [], \"output_type\": \"numerical\", \"supported_values\": [0, 1], \"experimental\": false}}, {\"evaluationMethod\": \"llm-as-a-judge\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"0001-01-01 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"95c03e1b-3938-42dd-bc69-3ec5cd0e5e18\", \"name\": \"Pointwise Conciseness\", \"description\": \"Assess the model's response is a short and concise answer to user prompt.\", \"version\": \"1.0.0\", \"spec\": {\"promptType\": \"structured\", \"configuration\": {\"modelConfiguration\": {\"name\": \"gpt-4.1\", \"version\": \"2025-04-14\", \"parameters\": [{\"key\": \"temperature\", \"value\": \"0\"}]}, \"promptConfiguration\": {\"evaluationTask\": \"You are an expert evaluator. 
Your task is to evaluate the conciseness of responses generated by AI models.\\nWe will provide you with the user input and an AI-generated response.\\nYou should first read the user input carefully to understand the context and intention, and then evaluate the conciseness of the response based on the criteria provided in the Evaluation section below.\\nYou will assign the response a rating following the Rating Rubric and Evaluation Steps.\\nGive step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\", \"definition\": \"You will be assessing conciseness, which measures the ability to convey the necessary information in a clear and succinct manner.\", \"criteria\": \"Conciseness: Does the response deliver the essential information without unnecessary words or redundancy?\", \"ratingRubric\": [{\"rating\": 1, \"rule\": \"(Not concise). The response is not concise and is filled with unnecessary or redundant content that obscures the main points.\"}, {\"rating\": 2, \"rule\": \"(Slightly concise). The response is slightly concise and contains a significant amount of unnecessary or redundant information.\"}, {\"rating\": 3, \"rule\": \"(Somewhat concise). The response is somewhat concise but may include some unnecessary words or slightly redundant information.\"}, {\"rating\": 4, \"rule\": \"(Mostly concise). The response is mostly concise and generally avoids unnecessary words while covering the essential information.\"}, {\"rating\": 5, \"rule\": \"(Highly concise). The response is very concise, delivering all necessary information in a succinct manner without any superfluous content.\"}], \"evaluationSteps\": [\"Assess the response in terms of Conciseness. Identify how effectively the response communicates essential information without unnecessary words according to the Criteria.\", \"Score based on the rating rubric. 
Give a brief rationale to explain your evaluation considering Conciseness.\"]}}}, \"additionalProperties\": {\"variables\": [], \"output_type\": \"numerical\", \"supported_values\": [1, 5], \"experimental\": true}}, {\"id\": \"2b3cc135-a031-4d93-8641-1f3833797034\", \"createdAt\": \"2025-11-14 05:10:58.078072\", \"name\": \"groundedness\", \"scenario\": \"genai-evaluations\", \"version\": \"0.0.1\", \"evaluationMethod\": \"llm-as-a-judge\", \"managedBy\": \"imperative\", \"systemPredefined\": false, \"metricType\": \"evaluation\", \"spec\": {\"promptType\": \"free-form\", \"configuration\": {\"modelConfiguration\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"parameters\": [{\"key\": \"temperature\", \"value\": \"0.1\"}, {\"key\": \"max_tokens\", \"value\": \"110\"}]}, \"promptConfiguration\": {\"systemPrompt\": \"You should strictly follow the instruction given to you. Please act as an impartial judge and evaluate the quality of the responses based on the prompt and following criteria:\", \"userPrompt\": \"You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\\n\\n## Metric Definition\\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\\n\\n## Criteria\\nGroundedness: The of information between a response generated by AI models and provided reference.\\n\\n## Rating Rubric\\n5: (Fully grounded). The response and the reference are fully overlapped.\\n4: (Mostly grounded). 
The response and the reference are mostly overlapped.\\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\\n2: (Poorly grounded). The response and the reference are slightly overlapped.\\n1: (Not grounded). There is no overlap between the response and the reference.\\n\\n## Evaluation Steps\\nSTEP 1: Assess the response in aspects of Groundedness. Identify any information in the response and provide assessment according to the Criteria.\\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\\n\\nReference: {{?reference}}\\nResponse: {{?aicore_llm_completion}}\\n\\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and outputs STRICTLY following this JSON format:\\n\\n{ \\\"explanation\\\": string, \\\"rating\\\": integer }\\n\\nOutput:\\n\", \"dataType\": \"numeric\"}}}}, {\"id\": \"9b349f8e-cd39-486d-809c-3dcfa3b15ac7\", \"createdAt\": \"2025-11-14 05:11:00.137979\", \"name\": \"groundedness\", \"scenario\": \"genai-evaluations\", \"version\": \"0.1.6\", \"evaluationMethod\": \"llm-as-a-judge\", \"managedBy\": \"imperative\", \"systemPredefined\": false, \"metricType\": \"evaluation\", \"spec\": {\"promptType\": \"free-form\", \"configuration\": {\"modelConfiguration\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"parameters\": [{\"key\": \"temperature\", \"value\": \"0.1\"}, {\"key\": \"max_tokens\", \"value\": \"110\"}]}, \"promptConfiguration\": {\"systemPrompt\": \"You should strictly follow the instruction given to you. Please act as an impartial judge and evaluate the quality of the responses based on the prompt and following criteria:\", \"userPrompt\": \"You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models. We will provide you with a reference and an AI-generated response. 
You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the criteria provided in the Evaluation section below. You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\\n\\n## Metric Definition\\nYou are an INFORMATION OVERLAP classifier providing the overlap of information between a response and reference.\\n\\n## Criteria\\nGroundedness: The of information between a response generated by AI models and provided reference.\\n\\n## Rating Rubric\\n5: (Fully grounded). The response and the reference are fully overlapped.\\n4: (Mostly grounded). The response and the reference are mostly overlapped.\\n3: (Somewhat grounded). The response and the reference are somewhat overlapped.\\n2: (Poorly grounded). The response and the reference are slightly overlapped.\\n1: (Not grounded). There is no overlap between the response and the reference.\\n\\n## Evaluation Steps\\nSTEP 1: Assess the response in aspects of Groundedness. Identify any information in the response and provide assessment according to the Criteria.\\nSTEP 2: Score based on the rating rubric. Give a brief rationale to explain your evaluation considering Groundedness.\\n\\nReference: {{?reference}}\\nResponse: {{?aicore_llm_completion}}\\n\\nBegin your evaluation by providing a short explanation. Be as unbiased as possible. After providing your explanation, please rate the response according to the rubric and outputs STRICTLY following this JSON format:\\n\\n{ \\\"explanation\\\": string, \\\"rating\\\": integer }\\n\\nOutput:\\n\", \"dataType\": \"numeric\"}}}}]2025-11-14 05:38:21.9548642025-11-14 05:38:21.954869
\n", - "
\n", - " \n", - "
\n", - "

Table: submission

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
idrun_idorchestration_configurationtemplate_variablescreated_atupdated_at
f1132352cb524392b733ca0773c6bef0960997700d79409680974700c0d67825{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"how does rivatigmine and otc sleep medicine interact\", \"sentiment\": \"Interaction\", \"reference\": \"tell your doctor and pharmacist what prescription and nonprescription medications, vitamins, nutritional supplements, and herbal products you are taking or plan to take. Be sure to mention any of the following: antihistamines; aspirin and other nonsteroidal anti-inflammatory medications (NSAIDs) such as ibuprofen (Advil, Motrin) and naproxen (Aleve, Naprosyn); bethanechol (Duvoid, Urecholine); ipratropium (Atrovent, in Combivent, DuoNeb); and medications for Alzheimer's disease, glaucoma, irritable bowel disease, motion sickness, ulcers, or urinary problems. Your doctor may need to change the doses of your medications or monitor you carefully for side effects.\"}2025-11-14 05:38:21.9750022025-11-14 05:38:21.975004
ce3fd3f1f8f04499847153fb1d3a2b0f960997700d79409680974700c0d67825{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"how does valium affect the brain\", \"sentiment\": \"Action\", \"reference\": \"Diazepam is a benzodiazepine that exerts anxiolytic, sedative, muscle-relaxant, anticonvulsant and amnestic effects. Most of these effects are thought to result from a facilitation of the action of gamma aminobutyric acid (GABA), an inhibitory neurotransmitter in the central nervous system.\"}2025-11-14 05:38:21.9750092025-11-14 05:38:21.975010
e0a2c70183dd4ce289772b5ae74957b6960997700d79409680974700c0d67825{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"what is morphine\", \"sentiment\": \"Information\", \"reference\": \"Morphine is a pain medication of the opiate family which is found naturally in a number of plants and animals.[5][7] It acts directly on the central nervous system (CNS) to decrease the feeling of pain.\"}2025-11-14 05:38:21.9750142025-11-14 05:38:21.975015
61ca98967e5b45f4865db10d81de298b960997700d79409680974700c0d67825{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"what are the milligrams for oxycodone e\", \"sentiment\": \"Dose\", \"reference\": \"\\ufffd 10 mg \\ufffd 20 mg \\ufffd 40 mg \\ufffd 80 mg ...\"}2025-11-14 05:38:21.9750182025-11-14 05:38:21.975019
9f84346869014412a1fd55ad52bd23b1960997700d79409680974700c0d67825{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"81% aspirin contain resin and shellac in it. ?\", \"sentiment\": \"Ingredient\", \"reference\": \"Inactive Ingredients Ingredient Name\"}2025-11-14 05:38:21.9750222025-11-14 05:38:21.975023
98a14a854ba94f09b7a7b48c30878f3b960997700d79409680974700c0d67825{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"what is desonide ointment used for\", \"sentiment\": \"Indication\", \"reference\": \"Desonide is used to treat the redness, swelling, itching, and discomfort of various skin conditions, including psoriasis (a skin disease in which red, scaly patches form on some areas of the body and eczema (a skin disease that causes the skin to be dry and itchy and to sometimes develop red, scaly rashes).\"}2025-11-14 05:38:21.9750272025-11-14 05:38:21.975027
6c34dd7a4bba4f4e8bd9bff5ab9afa1f960997700d79409680974700c0d67825{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"how soon can tylenol be taken after a cocktail?\", \"sentiment\": \"Interaction\", \"reference\": \"According to the National Health Service (NHS) in the UK, it is usually safe to drink a small amount of alcohol while taking this pain reliever. ... However, when people take acetaminophen at high doses or together with alcohol, it can cause side effects ranging from minor to severe, with the possibility of fatal liver damage. This risk may be higher for people with alcohol use disorder (AUD), which was previously known as alcoholism.... According to the U.S. National Library of Medicine, taking acetaminophen can be dangerous for people who regularly drink alcohol. Manufacturers currently recommend that people who have more than 3 alcoholic drinks per day should ask their doctor before taking acetaminophen.\"}2025-11-14 05:38:21.9750322025-11-14 05:38:21.975032
b8c446bc2c2c423e9d60b1e7511f5c36960997700d79409680974700c0d67825{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"breo inhaler how it works\", \"sentiment\": \"Action\", \"reference\": \"The combination of fluticasone and vilanterol is used to control wheezing, shortness of breath, coughing, and chest tightness caused by asthma and chronic obstructive pulmonary (COPD; a group of diseases that affect the lungs and airways, that includes chronic bronchitis and emphysema). Fluticasone is in a class of medications called steroids. It works by reducing swelling in the airways. Vilanterol is in a class of medications called long-acting beta-agonists (LABAs). It works by relaxing and opening air passages in the lungs, making it easier to breathe.\"}2025-11-14 05:38:21.9750362025-11-14 05:38:21.975036
4b6e2f1ddefa4f1585977e1cbe2fa900960997700d79409680974700c0d67825{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"breo inhaler how it works\", \"sentiment\": \"Usage\", \"reference\": \"To use the inhaler, follow these steps:\\n\\t1\\tIf you will be using a new inhaler for the first time, remove it from the box and the foil wrapper. Fill in the \\\"Tray opened\\\" and \\\"Discard\\\" blanks on the inhaler label with the date that you opened the pouch and the date 6 weeks later when you must replace the inhaler.\\n\\t2\\tWhen you are ready to inhale your dose, slide the cover down to expose the mouthpiece until it clicks. If you open and close the inhaler without using your dose, you will waste the medication.\\n\\t3\\tThe counter will count down by 1 each time you open the cover. If the counter does not count down, your inhaler will not provide the medicine. If your inhaler does not count down, call your pharmacist or doctor.\\n\\t4\\tHold the inhaler away from your mouth and breathe out as far as you comfortably can. Do not breathe out into the mouthpiece.\\n\\t5\\tPut the mouthpiece between your lips, and close your lips firmly around it. Take a long, steady, deep breath in through your mouth. Do not breathe in through your nose. Be careful not block the air vent with your fingers.\\n\\t6\\tRemove the inhaler from your mouth, and hold your breath for about 3 to 4 seconds or as long as you comfortably can. Breathe out slowly.\\n\\t7\\tYou may or may not taste or feel the medicine released by the inhaler. Even if you do not, do not inhale another dose. 
If you are not sure you are getting your dose of fluticasone and vilanterol, call your doctor or pharmacist.\\n\\t8\\tYou may clean the mouthpiece with a dry tissue, if needed. Slide the cover up over the mouthpiece as far as it will go to close the inhaler.\\n\\t9\\tRinse your mouth with water, but do not swallow.\\nAsk your pharmacist or doctor for a copy of the manufacturer's information for the patient.\"}2025-11-14 05:38:21.9750402025-11-14 05:38:21.975040
ac009b3ba51c470ea3d672e1dc0fc290960997700d79409680974700c0d67825{\"modules\": {\"prompt_templating\": {\"prompt\": {\"template\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\", \"role\": \"user\"}]}, \"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\", \"timeout\": 600, \"max_retries\": 2}}}}{\"question\": \"qvar 40mg what is it for\", \"sentiment\": \"Indication\", \"reference\": \"QVAR is indicated in the maintenance treatment of asthma as prophylactic therapy in patients 5 years of age and older. QVAR is also indicated for asthma patients who require systemic corticosteroid administration, where adding QVAR may reduce or eliminate the need for the systemic corticosteroids.\"}2025-11-14 05:38:21.9750442025-11-14 05:38:21.975045
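Every row above stores the same orchestration module configuration alongside the per-question input. As a minimal sketch (field names and values copied from the rows; the client call that would submit this config is omitted), it can be assembled in Python — the `{{?question}}` placeholder is filled from the dataset's `question` column at run time:

```python
import json

# Orchestration module config as it appears in the rows above.
# The {{?question}} placeholder is substituted per dataset row.
module_config = {
    "modules": {
        "prompt_templating": {
            "prompt": {
                "template": [{
                    "role": "user",
                    "content": ("List the benefits and side effects of the drug "
                                "in the following consumer health question: "
                                "{{?question}}."),
                }]
            },
            "model": {
                "name": "gpt-4o",
                "version": "2024-08-06",
                "timeout": 600,       # seconds
                "max_retries": 2,
            },
        }
    }
}

# Serialize for submission / inspection.
print(json.dumps(module_config, indent=2))
```

Each submission row then pairs this config with one input record such as `{"question": ..., "sentiment": ..., "reference": ...}`.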
Table: submission_result

submission_id | run_id | repetition_count | completion_result | latency | created_at | updated_at
f1132352cb524392b733ca0773c6bef0960997700d79409680974700c0d678251{\"request_id\": \"a84d63c1-35ad-9e6e-aa8d-bc27f6c22e5e\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: how does rivatigmine and otc sleep medicine interact.\", \"role\": \"user\"}], \"llm\": {\"id\": \"chatcmpl-CbgpKlbqIxb3V5NHK52VUybEb7B5T\", \"object\": \"chat.completion\", \"created\": 1763098798, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Rivastigmine is a pharmaceutical drug primarily used for the treatment of mild to moderate dementia associated with Alzheimer's and Parkinson's disease. It belongs to a class of drugs known as cholinesterase inhibitors. Over-the-counter (OTC) sleep medicines often contain antihistamines, such as diphenhydramine or doxylamine, which can cause sedation.\\n\\n**Benefits of Rivastigmine:**\\n- Improvement in cognitive function in patients with Alzheimer's and Parkinson's disease-related dementia.\\n- May help with daily living activities by slowing the progression of symptoms.\\n- Possible benefits in improving attention and memory.\\n\\n**Side Effects of Rivastigmine:**\\n- Common side effects include nausea, vomiting, and diarrhea.\\n- Loss of appetite and weight loss.\\n- Dizziness or headache.\\n- Excessive sweating.\\n- More severe side effects can include heart problems such as bradycardia, peptic ulcers, or exacerbation of respiratory conditions.\\n\\n**Interactions between Rivastigmine and OTC Sleep Medicine:**\\n- **Enhanced Sedation:** Rivastigmine can enhance the sedative effects of antihistamines found in OTC sleep aids, leading to increased drowsiness or dizziness.\\n- **Cognitive Effects:** The combination could lead to worsened cognitive impairment in patients already dealing with cognitive decline, due to the sedative effects of 
sleep medicines.\\n- **Anticholinergic Effects:** Many OTC sleep aids have anticholinergic properties, which can counteract the effects of cholinesterase inhibitors like rivastigmine, potentially reducing its efficacy.\\n- **Risk of Falls:** Increased sedation and dizziness can raise the risk of falls, especially in elderly patients.\\n\\nIt is important for individuals considering these medications to consult a healthcare provider to carefully evaluate the risks, benefits, and potential interactions specific to their health needs.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 367, \"prompt_tokens\": 34, \"total_tokens\": 401}}}, \"final_result\": {\"id\": \"chatcmpl-CbgpKlbqIxb3V5NHK52VUybEb7B5T\", \"object\": \"chat.completion\", \"created\": 1763098798, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Rivastigmine is a pharmaceutical drug primarily used for the treatment of mild to moderate dementia associated with Alzheimer's and Parkinson's disease. It belongs to a class of drugs known as cholinesterase inhibitors. 
Over-the-counter (OTC) sleep medicines often contain antihistamines, such as diphenhydramine or doxylamine, which can cause sedation.\\n\\n**Benefits of Rivastigmine:**\\n- Improvement in cognitive function in patients with Alzheimer's and Parkinson's disease-related dementia.\\n- May help with daily living activities by slowing the progression of symptoms.\\n- Possible benefits in improving attention and memory.\\n\\n**Side Effects of Rivastigmine:**\\n- Common side effects include nausea, vomiting, and diarrhea.\\n- Loss of appetite and weight loss.\\n- Dizziness or headache.\\n- Excessive sweating.\\n- More severe side effects can include heart problems such as bradycardia, peptic ulcers, or exacerbation of respiratory conditions.\\n\\n**Interactions between Rivastigmine and OTC Sleep Medicine:**\\n- **Enhanced Sedation:** Rivastigmine can enhance the sedative effects of antihistamines found in OTC sleep aids, leading to increased drowsiness or dizziness.\\n- **Cognitive Effects:** The combination could lead to worsened cognitive impairment in patients already dealing with cognitive decline, due to the sedative effects of sleep medicines.\\n- **Anticholinergic Effects:** Many OTC sleep aids have anticholinergic properties, which can counteract the effects of cholinesterase inhibitors like rivastigmine, potentially reducing its efficacy.\\n- **Risk of Falls:** Increased sedation and dizziness can raise the risk of falls, especially in elderly patients.\\n\\nIt is important for individuals considering these medications to consult a healthcare provider to carefully evaluate the risks, benefits, and potential interactions specific to their health needs.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 367, \"prompt_tokens\": 34, \"total_tokens\": 401}}}9.1126042025-11-14 05:43:14.0161692025-11-14 05:43:14.016173
ce3fd3f1f8f04499847153fb1d3a2b0f960997700d79409680974700c0d678251{\"request_id\": \"0e717c24-58c4-9c75-8b79-487f11da96c3\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: how does valium affect the brain.\", \"role\": \"user\"}], \"llm\": {\"id\": \"chatcmpl-CbgpOcNTGm3nqeh2uGJBEijIUDYbC\", \"object\": \"chat.completion\", \"created\": 1763098802, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Valium, also known by its generic name diazepam, is a medication belonging to the benzodiazepine class. It is commonly used to treat anxiety, muscle spasms, and seizures, and it can also be used in certain situations to provide sedation before medical procedures. Here's how Valium affects the brain, including its benefits and potential side effects:\\n\\n### Benefits:\\n1. **Anxiety Relief**: Valium acts on the central nervous system to produce calming effects, thus reducing symptoms of anxiety.\\n2. **Muscle Relaxation**: It can help relieve muscle spasms by promoting relaxation.\\n3. **Seizure Control**: Valium can be used to manage certain types of seizures, especially when used in conjunction with other medications.\\n4. **Sedation**: It may be used to sedate patients before surgery or medical procedures, helping them relax.\\n5. **Alcohol Withdrawal**: Valium is sometimes used to manage symptoms associated with acute alcohol withdrawal.\\n\\n### Side Effects:\\n1. **Drowsiness and Fatigue**: One of the most common side effects, which can affect concentration and coordination.\\n2. **Dizziness**: Users might experience light-headedness or a sensation of spinning.\\n3. **Cognitive Effects**: Can include confusion or forgetfulness.\\n4. 
**Dependence and Withdrawal**: Long-term use can lead to dependence, with withdrawal symptoms occurring if the medication is abruptly stopped.\\n5. **Coordination Problems**: There may be issues with balance and physical coordination.\\n6. **Blurred Vision**: Some users report changes in their vision.\\n7. **Mood Changes**: Can include feelings of depression or euphoria.\\n8. **Gastrointestinal Issues**: Such as nausea, constipation, or dry mouth.\\n9. **Respiratory Depression**: High doses can depress breathing, especially when combined with other CNS depressants, such as alcohol.\\n\\n### Important Considerations:\\n- **Addiction Risk**: Because it has the potential for abuse and addiction, Valium should be used only as prescribed by a healthcare professional.\\n- **Interactions**: Valium can interact with other medications, including alcohol, further increasing sedative effects.\\n- **Not Suitable for Everyone**: It might not be appropriate for people with certain conditions, such as severe lung conditions, sleep apnea, or a history of substance abuse.\\n\\nAlways follow the advice of a healthcare provider when using Valium, and report any side effects or concerns you have while taking the medication.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 507, \"prompt_tokens\": 31, \"total_tokens\": 538}}}, \"final_result\": {\"id\": \"chatcmpl-CbgpOcNTGm3nqeh2uGJBEijIUDYbC\", \"object\": \"chat.completion\", \"created\": 1763098802, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Valium, also known by its generic name diazepam, is a medication belonging to the benzodiazepine class. It is commonly used to treat anxiety, muscle spasms, and seizures, and it can also be used in certain situations to provide sedation before medical procedures. 
Here's how Valium affects the brain, including its benefits and potential side effects:\\n\\n### Benefits:\\n1. **Anxiety Relief**: Valium acts on the central nervous system to produce calming effects, thus reducing symptoms of anxiety.\\n2. **Muscle Relaxation**: It can help relieve muscle spasms by promoting relaxation.\\n3. **Seizure Control**: Valium can be used to manage certain types of seizures, especially when used in conjunction with other medications.\\n4. **Sedation**: It may be used to sedate patients before surgery or medical procedures, helping them relax.\\n5. **Alcohol Withdrawal**: Valium is sometimes used to manage symptoms associated with acute alcohol withdrawal.\\n\\n### Side Effects:\\n1. **Drowsiness and Fatigue**: One of the most common side effects, which can affect concentration and coordination.\\n2. **Dizziness**: Users might experience light-headedness or a sensation of spinning.\\n3. **Cognitive Effects**: Can include confusion or forgetfulness.\\n4. **Dependence and Withdrawal**: Long-term use can lead to dependence, with withdrawal symptoms occurring if the medication is abruptly stopped.\\n5. **Coordination Problems**: There may be issues with balance and physical coordination.\\n6. **Blurred Vision**: Some users report changes in their vision.\\n7. **Mood Changes**: Can include feelings of depression or euphoria.\\n8. **Gastrointestinal Issues**: Such as nausea, constipation, or dry mouth.\\n9. 
**Respiratory Depression**: High doses can depress breathing, especially when combined with other CNS depressants, such as alcohol.\\n\\n### Important Considerations:\\n- **Addiction Risk**: Because it has the potential for abuse and addiction, Valium should be used only as prescribed by a healthcare professional.\\n- **Interactions**: Valium can interact with other medications, including alcohol, further increasing sedative effects.\\n- **Not Suitable for Everyone**: It might not be appropriate for people with certain conditions, such as severe lung conditions, sleep apnea, or a history of substance abuse.\\n\\nAlways follow the advice of a healthcare provider when using Valium, and report any side effects or concerns you have while taking the medication.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 507, \"prompt_tokens\": 31, \"total_tokens\": 538}}}16.7714402025-11-14 05:43:14.0161742025-11-14 05:43:14.016174
e0a2c70183dd4ce289772b5ae74957b6960997700d79409680974700c0d678251{\"request_id\": \"7210615f-2b11-9523-a798-ed1af505e05f\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: what is morphine.\", \"role\": \"user\"}], \"llm\": {\"id\": \"chatcmpl-CbgpSRd0RQJNQTjNWQs53chmAduGo\", \"object\": \"chat.completion\", \"created\": 1763098806, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Morphine is a powerful opioid pain medication used to treat moderate to severe pain. It's often used in a hospital setting for pain relief after surgery or for chronic pain management. Here are the benefits and side effects of morphine:\\n\\n### Benefits:\\n1. **Pain Relief**: Morphine is highly effective in reducing severe pain, providing relief for patients who are suffering from acute or chronic pain conditions.\\n2. **Reduced Anxiety and Stress**: By alleviating pain, morphine can also help reduce anxiety and stress associated with severe pain.\\n3. **Sedation**: It may provide a calming effect which can be beneficial for patients in extreme discomfort.\\n4. **Improves Quality of Life**: For those with chronic pain, it can improve daily functioning and overall quality of life.\\n5. **Flexible Administration**: Morphine can be administered in various forms, including orally, intravenously, and via injection, providing flexibility in its use.\\n\\n### Side Effects:\\n1. **Respiratory Depression**: One of the most serious side effects, morphine can slow down breathing, which can be dangerous, requiring careful monitoring especially when first starting the medication or adjusting doses.\\n2. **Nausea and Vomiting**: Common side effects, especially when first starting treatment.\\n3. 
**Drowsiness and Sedation**: Can limit activities such as driving or operating heavy machinery.\\n4. **Constipation**: A very common issue with opioid use; patients may need to use laxatives or other remedies to counteract this effect.\\n5. **Dizziness**: This may occur, leading to risks of falls, especially in older adults.\\n6. **Tolerance and Dependence**: Long-term use can lead to tolerance (requiring higher doses to achieve the same effect) and physical dependence, meaning that withdrawal symptoms may occur if the medication is abruptly stopped.\\n7. **Potential for Abuse**: As an opioid, morphine has a potential for addiction and abuse, which necessitates careful monitoring and regulation of its use.\\n8. **Allergic Reactions**: Although rare, some individuals may experience allergic reactions such as skin rashes, itching, or more severe reactions.\\n\\nGiven its potent effects and risks, morphine should be used under strict medical supervision to manage its efficacy and safety.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 463, \"prompt_tokens\": 28, \"total_tokens\": 491}}}, \"final_result\": {\"id\": \"chatcmpl-CbgpSRd0RQJNQTjNWQs53chmAduGo\", \"object\": \"chat.completion\", \"created\": 1763098806, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Morphine is a powerful opioid pain medication used to treat moderate to severe pain. It's often used in a hospital setting for pain relief after surgery or for chronic pain management. Here are the benefits and side effects of morphine:\\n\\n### Benefits:\\n1. **Pain Relief**: Morphine is highly effective in reducing severe pain, providing relief for patients who are suffering from acute or chronic pain conditions.\\n2. **Reduced Anxiety and Stress**: By alleviating pain, morphine can also help reduce anxiety and stress associated with severe pain.\\n3. 
**Sedation**: It may provide a calming effect which can be beneficial for patients in extreme discomfort.\\n4. **Improves Quality of Life**: For those with chronic pain, it can improve daily functioning and overall quality of life.\\n5. **Flexible Administration**: Morphine can be administered in various forms, including orally, intravenously, and via injection, providing flexibility in its use.\\n\\n### Side Effects:\\n1. **Respiratory Depression**: One of the most serious side effects, morphine can slow down breathing, which can be dangerous, requiring careful monitoring especially when first starting the medication or adjusting doses.\\n2. **Nausea and Vomiting**: Common side effects, especially when first starting treatment.\\n3. **Drowsiness and Sedation**: Can limit activities such as driving or operating heavy machinery.\\n4. **Constipation**: A very common issue with opioid use; patients may need to use laxatives or other remedies to counteract this effect.\\n5. **Dizziness**: This may occur, leading to risks of falls, especially in older adults.\\n6. **Tolerance and Dependence**: Long-term use can lead to tolerance (requiring higher doses to achieve the same effect) and physical dependence, meaning that withdrawal symptoms may occur if the medication is abruptly stopped.\\n7. **Potential for Abuse**: As an opioid, morphine has a potential for addiction and abuse, which necessitates careful monitoring and regulation of its use.\\n8. **Allergic Reactions**: Although rare, some individuals may experience allergic reactions such as skin rashes, itching, or more severe reactions.\\n\\nGiven its potent effects and risks, morphine should be used under strict medical supervision to manage its efficacy and safety.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 463, \"prompt_tokens\": 28, \"total_tokens\": 491}}}17.6568122025-11-14 05:43:14.0161752025-11-14 05:43:14.016177
61ca98967e5b45f4865db10d81de298b960997700d79409680974700c0d678251{\"request_id\": \"34e8b9ee-890d-957d-89f4-53dfcad18a20\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: what are the milligrams for oxycodone e.\", \"role\": \"user\"}], \"llm\": {\"id\": \"chatcmpl-CbgpWCiud9mjIG1PkXPDcHc9HFIrQ\", \"object\": \"chat.completion\", \"created\": 1763098810, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Oxycodone is a prescription opioid medication used to treat moderate to severe pain. It is important to use this medication exactly as prescribed by a healthcare provider. Here are some of the benefits and side effects of oxycodone:\\n\\n### Benefits\\n1. **Pain Relief**: Oxycodone is effective in reducing or eliminating acute and chronic pain.\\n2. **Improved Quality of Life**: For individuals with chronic pain, oxycodone can improve the ability to perform daily activities and overall quality of life.\\n3. **Various Formulations**: Available in immediate-release and extended-release formulations for flexible pain management.\\n\\n### Side Effects\\nCommon side effects include:\\n1. **Nausea and Vomiting**: These can occur when starting the medication or adjusting the dose.\\n2. **Constipation**: A very common side effect that often requires dietary changes or laxatives to manage.\\n3. **Drowsiness and Dizziness**: May impair the ability to drive or operate machinery.\\n4. **Sweating**: Increased sweating is a possible side effect.\\n5. **Dry Mouth**: Can occur with opioid use.\\n\\nSerious side effects may include:\\n1. **Respiratory Depression**: Slowed breathing can be life-threatening and requires immediate medical attention.\\n2. **Addiction, Abuse, and Misuse**: Oxycodone has a high potential for addiction.\\n3. 
**Withdrawal Symptoms**: These can occur if the medication is stopped abruptly after prolonged use.\\n4. **Interactions with Other Medications**: Can have harmful interactions with other drugs, especially other CNS depressants.\\n5. **Severe Allergic Reactions**: Symptoms such as rash, itching, swelling, severe dizziness, or trouble breathing require immediate medical attention.\\n\\nAlways consult a healthcare provider for personalized advice and to discuss the risks and benefits before starting any new medication.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 379, \"prompt_tokens\": 34, \"total_tokens\": 413}}}, \"final_result\": {\"id\": \"chatcmpl-CbgpWCiud9mjIG1PkXPDcHc9HFIrQ\", \"object\": \"chat.completion\", \"created\": 1763098810, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Oxycodone is a prescription opioid medication used to treat moderate to severe pain. It is important to use this medication exactly as prescribed by a healthcare provider. Here are some of the benefits and side effects of oxycodone:\\n\\n### Benefits\\n1. **Pain Relief**: Oxycodone is effective in reducing or eliminating acute and chronic pain.\\n2. **Improved Quality of Life**: For individuals with chronic pain, oxycodone can improve the ability to perform daily activities and overall quality of life.\\n3. **Various Formulations**: Available in immediate-release and extended-release formulations for flexible pain management.\\n\\n### Side Effects\\nCommon side effects include:\\n1. **Nausea and Vomiting**: These can occur when starting the medication or adjusting the dose.\\n2. **Constipation**: A very common side effect that often requires dietary changes or laxatives to manage.\\n3. **Drowsiness and Dizziness**: May impair the ability to drive or operate machinery.\\n4. **Sweating**: Increased sweating is a possible side effect.\\n5. 
**Dry Mouth**: Can occur with opioid use.\\n\\nSerious side effects may include:\\n1. **Respiratory Depression**: Slowed breathing can be life-threatening and requires immediate medical attention.\\n2. **Addiction, Abuse, and Misuse**: Oxycodone has a high potential for addiction.\\n3. **Withdrawal Symptoms**: These can occur if the medication is stopped abruptly after prolonged use.\\n4. **Interactions with Other Medications**: Can have harmful interactions with other drugs, especially other CNS depressants.\\n5. **Severe Allergic Reactions**: Symptoms such as rash, itching, swelling, severe dizziness, or trouble breathing require immediate medical attention.\\n\\nAlways consult a healthcare provider for personalized advice and to discuss the risks and benefits before starting any new medication.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 379, \"prompt_tokens\": 34, \"total_tokens\": 413}}}20.1365062025-11-14 05:43:14.0161782025-11-14 05:43:14.016179
9f84346869014412a1fd55ad52bd23b1960997700d79409680974700c0d678251{\"request_id\": \"38ddc04e-b9df-92d8-936a-fbd4b4b5f407\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: 81% aspirin contain resin and shellac in it. ?.\", \"role\": \"user\"}], \"llm\": {\"id\": \"chatcmpl-Cbgpaso97P0FcGhMa2DzrKllg1yEq\", \"object\": \"chat.completion\", \"created\": 1763098814, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Aspirin, including low-dose formulations like 81 mg aspirin, is widely used for its anti-inflammatory, analgesic, and antipyretic properties and its potential protective effects against heart attack and stroke. Here's a detailed list of benefits and side effects associated with aspirin:\\n\\n### Benefits:\\n1. **Cardiovascular Protection:** Low-dose aspirin is commonly prescribed to reduce the risk of heart attack and stroke, particularly in individuals with cardiovascular disease or risk factors.\\n2. **Pain Relief:** Aspirin provides relief from mild to moderate pain, such as headaches, muscle aches, toothaches, and menstrual cramps.\\n3. **Anti-Inflammatory Effects:** It helps reduce inflammation from conditions such as arthritis.\\n4. **Antipyretic Effects:** Aspirin can help reduce fever.\\n\\n### Side Effects:\\n1. **Gastrointestinal Issues:** These can include stomach pain, heartburn, nausea, and more serious effects like gastrointestinal bleeding or ulcers.\\n2. **Increased Bleeding Risk:** Aspirin affects blood clotting, which can lead to an increased risk of bleeding, especially if used long term.\\n3. **Allergic Reactions:** Some individuals may experience allergic reactions to aspirin, including hives, facial swelling, asthma, or anaphylaxis.\\n4. 
**Kidney and Liver Effects:** Prolonged use of aspirin can potentially impact kidney and liver function, though this is more common with high doses or long-term use.\\n5. **Reye's Syndrome:** Aspirin is linked to Reye's syndrome when given to children with viral infections; thus, it is generally avoided in pediatric medicine for viral illnesses.\\n6. **Ringing in the Ears:** Also known as tinnitus, this can occur at higher doses of aspirin.\\n\\nRegarding the mention of resin and shellac, these are commonly used as excipients or coating materials in the manufacturing of pills, including some aspirin products. They generally serve to protect the pill, control its release into the system, or enhance its appearance. These components are usually considered safe, but people with specific allergies or sensitivities might need to be cautious.\\n\\nAlways consult a healthcare provider before starting or changing a medication regimen to ensure it aligns with your health needs and conditions.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 447, \"prompt_tokens\": 36, \"total_tokens\": 483}}}, \"final_result\": {\"id\": \"chatcmpl-Cbgpaso97P0FcGhMa2DzrKllg1yEq\", \"object\": \"chat.completion\", \"created\": 1763098814, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Aspirin, including low-dose formulations like 81 mg aspirin, is widely used for its anti-inflammatory, analgesic, and antipyretic properties and its potential protective effects against heart attack and stroke. Here's a detailed list of benefits and side effects associated with aspirin:\\n\\n### Benefits:\\n1. **Cardiovascular Protection:** Low-dose aspirin is commonly prescribed to reduce the risk of heart attack and stroke, particularly in individuals with cardiovascular disease or risk factors.\\n2. 
**Pain Relief:** Aspirin provides relief from mild to moderate pain, such as headaches, muscle aches, toothaches, and menstrual cramps.\\n3. **Anti-Inflammatory Effects:** It helps reduce inflammation from conditions such as arthritis.\\n4. **Antipyretic Effects:** Aspirin can help reduce fever.\\n\\n### Side Effects:\\n1. **Gastrointestinal Issues:** These can include stomach pain, heartburn, nausea, and more serious effects like gastrointestinal bleeding or ulcers.\\n2. **Increased Bleeding Risk:** Aspirin affects blood clotting, which can lead to an increased risk of bleeding, especially if used long term.\\n3. **Allergic Reactions:** Some individuals may experience allergic reactions to aspirin, including hives, facial swelling, asthma, or anaphylaxis.\\n4. **Kidney and Liver Effects:** Prolonged use of aspirin can potentially impact kidney and liver function, though this is more common with high doses or long-term use.\\n5. **Reye's Syndrome:** Aspirin is linked to Reye's syndrome when given to children with viral infections; thus, it is generally avoided in pediatric medicine for viral illnesses.\\n6. **Ringing in the Ears:** Also known as tinnitus, this can occur at higher doses of aspirin.\\n\\nRegarding the mention of resin and shellac, these are commonly used as excipients or coating materials in the manufacturing of pills, including some aspirin products. They generally serve to protect the pill, control its release into the system, or enhance its appearance. These components are usually considered safe, but people with specific allergies or sensitivities might need to be cautious.\\n\\nAlways consult a healthcare provider before starting or changing a medication regimen to ensure it aligns with your health needs and conditions.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 447, \"prompt_tokens\": 36, \"total_tokens\": 483}}}26.6376532025-11-14 05:43:14.0161802025-11-14 05:43:14.016180
98a14a854ba94f09b7a7b48c30878f3b960997700d79409680974700c0d678251{\"request_id\": \"b3487dc5-1c06-9c24-8493-fcc226b751a3\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: what is desonide ointment used for.\", \"role\": \"user\"}], \"llm\": {\"id\": \"chatcmpl-CbgpeWG1wvBScw4hRug9GmhLBfQYr\", \"object\": \"chat.completion\", \"created\": 1763098818, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Desonide ointment is a topical corticosteroid used primarily to treat various skin conditions by reducing inflammation, redness, and itching. Here are the benefits and potential side effects of using desonide ointment:\\n\\n### Benefits\\n1. **Reduces Inflammation:** It helps alleviate inflammation associated with conditions like eczema, dermatitis, and psoriasis.\\n2. **Decreases Itching:** By calming the skin, desonide reduces the urge to scratch, which can improve comfort.\\n3. **Reduces Redness:** The ointment helps in minimizing redness and swelling of the affected areas.\\n4. **Promotes Healing:** By managing symptoms like inflammation and itching, it supports faster healing of the skin.\\n5. **Mild Formulation:** As a low-potency steroid, it is often recommended for sensitive skin areas and for use in children.\\n\\n### Side Effects\\nWhile desonide is considered low-potency and generally well-tolerated, some potential side effects may occur:\\n\\n1. **Skin Irritation:** This can include burning, itching, or dryness at the application site.\\n2. **Contact Dermatitis:** Some individuals may experience an allergic reaction leading to a rash.\\n3. **Hypopigmentation:** Prolonged use might cause lightening of the skin.\\n4. **Thinning of Skin:** Extended use, especially under occlusive dressings, can lead to skin thinning.\\n5. 
**Stretch Marks:** May occur with long-term use in certain areas.\\n6. **Increased Hair Growth:** Some people may notice more hair growth in the treated area.\\n7. **Systemic Absorption:** Though rare, especially with topical use, large amounts or prolonged use can lead to systemic absorption with corticosteroid-related side effects.\\n\\nIt is important to use desonide ointment as directed by a healthcare provider. If you experience severe side effects or signs of an allergic reaction, such as rash, swelling, or difficulty breathing, seek medical attention promptly.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 395, \"prompt_tokens\": 33, \"total_tokens\": 428}}}, \"final_result\": {\"id\": \"chatcmpl-CbgpeWG1wvBScw4hRug9GmhLBfQYr\", \"object\": \"chat.completion\", \"created\": 1763098818, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Desonide ointment is a topical corticosteroid used primarily to treat various skin conditions by reducing inflammation, redness, and itching. Here are the benefits and potential side effects of using desonide ointment:\\n\\n### Benefits\\n1. **Reduces Inflammation:** It helps alleviate inflammation associated with conditions like eczema, dermatitis, and psoriasis.\\n2. **Decreases Itching:** By calming the skin, desonide reduces the urge to scratch, which can improve comfort.\\n3. **Reduces Redness:** The ointment helps in minimizing redness and swelling of the affected areas.\\n4. **Promotes Healing:** By managing symptoms like inflammation and itching, it supports faster healing of the skin.\\n5. **Mild Formulation:** As a low-potency steroid, it is often recommended for sensitive skin areas and for use in children.\\n\\n### Side Effects\\nWhile desonide is considered low-potency and generally well-tolerated, some potential side effects may occur:\\n\\n1. 
**Skin Irritation:** This can include burning, itching, or dryness at the application site.\\n2. **Contact Dermatitis:** Some individuals may experience an allergic reaction leading to a rash.\\n3. **Hypopigmentation:** Prolonged use might cause lightening of the skin.\\n4. **Thinning of Skin:** Extended use, especially under occlusive dressings, can lead to skin thinning.\\n5. **Stretch Marks:** May occur with long-term use in certain areas.\\n6. **Increased Hair Growth:** Some people may notice more hair growth in the treated area.\\n7. **Systemic Absorption:** Though rare, especially with topical use, large amounts or prolonged use can lead to systemic absorption with corticosteroid-related side effects.\\n\\nIt is important to use desonide ointment as directed by a healthcare provider. If you experience severe side effects or signs of an allergic reaction, such as rash, swelling, or difficulty breathing, seek medical attention promptly.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 395, \"prompt_tokens\": 33, \"total_tokens\": 428}}}30.8816902025-11-14 05:43:14.0161812025-11-14 05:43:14.016181
6c34dd7a4bba4f4e8bd9bff5ab9afa1f960997700d79409680974700c0d678251{\"request_id\": \"fd5cbafd-b66b-9b97-93c2-eebcdc27661d\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: how soon can tylenol be taken after a cocktail?.\", \"role\": \"user\"}], \"llm\": {\"id\": \"chatcmpl-CbgpiVkYdzlpHQQsRHqvnGJfohWuk\", \"object\": \"chat.completion\", \"created\": 1763098822, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"### Benefits of Tylenol (Acetaminophen)\\n\\n1. **Pain Relief:** \\n - Tylenol is commonly used to relieve mild to moderate pain from headaches, muscle aches, menstrual periods, colds and sore throats, toothaches, and backaches.\\n\\n2. **Fever Reduction:**\\n - It is effective in reducing fever.\\n\\n3. **Generally Safe:**\\n - When taken at recommended doses, Tylenol is generally safe for most people and can be used by those who cannot take NSAIDs (non-steroidal anti-inflammatory drugs) like aspirin, ibuprofen, or naproxen.\\n\\n4. **Well-Tolerated:**\\n - It generally causes fewer gastrointestinal side effects than NSAIDs.\\n\\n### Side Effects of Tylenol (Acetaminophen)\\n\\n1. **Liver Damage:**\\n - High doses or prolonged use can cause severe liver damage. This risk is increased with excessive alcohol consumption.\\n\\n2. **Allergic Reactions:**\\n - Rarely, some people may experience allergic reactions like rash, itching, swelling, severe dizziness, or trouble breathing.\\n\\n3. **Kidney Damage:**\\n - Long-term use can potentially lead to kidney damage, especially at high doses.\\n\\n### Regarding Alcohol Consumption\\n\\n- **Timing:** \\n - It is generally advised to wait at least 3-4 hours after consuming a cocktail before taking Tylenol. This allows your body to process the alcohol and reduces the risk of liver damage. 
However, depending on the amount of alcohol consumed and individual health considerations, it might be better to consult a healthcare professional for personalized advice.\\n\\n- **Risk of Liver Damage:**\\n - Both alcohol and acetaminophen are processed via the liver, and taking them together, especially in large amounts or over a long period, can increase the risk of liver damage.\\n\\nIt is always important to follow dosing instructions on the medication label and to consult with a healthcare professional if you have specific concerns or health conditions that might affect how you should use Tylenol.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 414, \"prompt_tokens\": 35, \"total_tokens\": 449}}}, \"final_result\": {\"id\": \"chatcmpl-CbgpiVkYdzlpHQQsRHqvnGJfohWuk\", \"object\": \"chat.completion\", \"created\": 1763098822, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"### Benefits of Tylenol (Acetaminophen)\\n\\n1. **Pain Relief:** \\n - Tylenol is commonly used to relieve mild to moderate pain from headaches, muscle aches, menstrual periods, colds and sore throats, toothaches, and backaches.\\n\\n2. **Fever Reduction:**\\n - It is effective in reducing fever.\\n\\n3. **Generally Safe:**\\n - When taken at recommended doses, Tylenol is generally safe for most people and can be used by those who cannot take NSAIDs (non-steroidal anti-inflammatory drugs) like aspirin, ibuprofen, or naproxen.\\n\\n4. **Well-Tolerated:**\\n - It generally causes fewer gastrointestinal side effects than NSAIDs.\\n\\n### Side Effects of Tylenol (Acetaminophen)\\n\\n1. **Liver Damage:**\\n - High doses or prolonged use can cause severe liver damage. This risk is increased with excessive alcohol consumption.\\n\\n2. 
**Allergic Reactions:**\\n - Rarely, some people may experience allergic reactions like rash, itching, swelling, severe dizziness, or trouble breathing.\\n\\n3. **Kidney Damage:**\\n - Long-term use can potentially lead to kidney damage, especially at high doses.\\n\\n### Regarding Alcohol Consumption\\n\\n- **Timing:** \\n - It is generally advised to wait at least 3-4 hours after consuming a cocktail before taking Tylenol. This allows your body to process the alcohol and reduces the risk of liver damage. However, depending on the amount of alcohol consumed and individual health considerations, it might be better to consult a healthcare professional for personalized advice.\\n\\n- **Risk of Liver Damage:**\\n - Both alcohol and acetaminophen are processed via the liver, and taking them together, especially in large amounts or over a long period, can increase the risk of liver damage.\\n\\nIt is always important to follow dosing instructions on the medication label and to consult with a healthcare professional if you have specific concerns or health conditions that might affect how you should use Tylenol.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 414, \"prompt_tokens\": 35, \"total_tokens\": 449}}}34.0473912025-11-14 05:43:14.0161822025-11-14 05:43:14.016182
b8c446bc2c2c423e9d60b1e7511f5c36960997700d79409680974700c0d678251{\"request_id\": \"ad1459b8-f922-9fd0-ae8b-beb41d2bfc9a\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: breo inhaler how it works.\", \"role\": \"user\"}], \"llm\": {\"id\": \"chatcmpl-Cbgpm1fLXLizRqds95YiTnF1MgSfk\", \"object\": \"chat.completion\", \"created\": 1763098826, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"The Breo inhaler, also known by its generic components fluticasone furoate and vilanterol, is used primarily to treat asthma and chronic obstructive pulmonary disease (COPD). Here's how it works and its potential benefits and side effects:\\n\\n### How It Works:\\n- **Fluticasone Furoate**: This is a corticosteroid that reduces inflammation in the airways, helping to prevent asthma attacks and improve breathing.\\n- **Vilanterol**: This is a long-acting beta-agonist (LABA) that relaxes muscles in the airways to improve airflow and breathing.\\n\\n### Benefits:\\n- **Improved Breathing**: Helps ease breathing by reducing airway inflammation and relaxing airway muscles.\\n- **Reduced Symptoms**: Decreases the frequency and severity of asthma attacks and COPD flare-ups.\\n- **Long-Lasting Relief**: Provides relief that can last for 24 hours, meaning it typically needs to be used once a day.\\n- **Convenience**: Combines two medications in a single inhaler, simplifying treatment.\\n\\n### Potential Side Effects:\\n- **Common Side Effects**:\\n - Throat irritation or hoarseness\\n - Headache\\n - Cough\\n - Back pain\\n\\n- **Serious Side Effects** (require immediate medical attention):\\n - Difficulty breathing or swallowing\\n - Signs of an allergic reaction (rash, itching, swelling, severe dizziness)\\n - High blood pressure\\n - Increased heart rate\\n - Chest pain\\n - 
Eye problems like blurred vision or eye pain\\n\\n- **Long-term Use Side Effects**:\\n - Potential for increased risk of pneumonia in COPD patients\\n - Possible risk of osteoporosis\\n - Oral thrush (a fungal infection in the mouth) \\u2013 to reduce this risk, rinse your mouth with water after each use\\n\\nIt's important for users to follow their healthcare provider's instructions and not to use the Breo inhaler to treat sudden breathing problems. This medication is meant for long-term management, and a rescue inhaler should be used for acute symptoms. Always consult a healthcare professional for personalized advice and guidance.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 435, \"prompt_tokens\": 31, \"total_tokens\": 466}}}, \"final_result\": {\"id\": \"chatcmpl-Cbgpm1fLXLizRqds95YiTnF1MgSfk\", \"object\": \"chat.completion\", \"created\": 1763098826, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"The Breo inhaler, also known by its generic components fluticasone furoate and vilanterol, is used primarily to treat asthma and chronic obstructive pulmonary disease (COPD). 
Here's how it works and its potential benefits and side effects:\\n\\n### How It Works:\\n- **Fluticasone Furoate**: This is a corticosteroid that reduces inflammation in the airways, helping to prevent asthma attacks and improve breathing.\\n- **Vilanterol**: This is a long-acting beta-agonist (LABA) that relaxes muscles in the airways to improve airflow and breathing.\\n\\n### Benefits:\\n- **Improved Breathing**: Helps ease breathing by reducing airway inflammation and relaxing airway muscles.\\n- **Reduced Symptoms**: Decreases the frequency and severity of asthma attacks and COPD flare-ups.\\n- **Long-Lasting Relief**: Provides relief that can last for 24 hours, meaning it typically needs to be used once a day.\\n- **Convenience**: Combines two medications in a single inhaler, simplifying treatment.\\n\\n### Potential Side Effects:\\n- **Common Side Effects**:\\n - Throat irritation or hoarseness\\n - Headache\\n - Cough\\n - Back pain\\n\\n- **Serious Side Effects** (require immediate medical attention):\\n - Difficulty breathing or swallowing\\n - Signs of an allergic reaction (rash, itching, swelling, severe dizziness)\\n - High blood pressure\\n - Increased heart rate\\n - Chest pain\\n - Eye problems like blurred vision or eye pain\\n\\n- **Long-term Use Side Effects**:\\n - Potential for increased risk of pneumonia in COPD patients\\n - Possible risk of osteoporosis\\n - Oral thrush (a fungal infection in the mouth) \\u2013 to reduce this risk, rinse your mouth with water after each use\\n\\nIt's important for users to follow their healthcare provider's instructions and not to use the Breo inhaler to treat sudden breathing problems. This medication is meant for long-term management, and a rescue inhaler should be used for acute symptoms. 
Always consult a healthcare professional for personalized advice and guidance.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 435, \"prompt_tokens\": 31, \"total_tokens\": 466}}}38.3906542025-11-14 05:43:14.0161832025-11-14 05:43:14.016183
4b6e2f1ddefa4f1585977e1cbe2fa900960997700d79409680974700c0d678251{\"request_id\": \"7ed2ae00-47ea-97fc-8d72-69f143a42abc\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: breo inhaler how it works.\", \"role\": \"user\"}], \"llm\": {\"id\": \"chatcmpl-Cbgpq82kWg8SHVgUnPQhAFvm7VaIR\", \"object\": \"chat.completion\", \"created\": 1763098830, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Breo Ellipta is an inhaler that contains two active ingredients, fluticasone furoate and vilanterol. It is commonly used for managing asthma and chronic obstructive pulmonary disease (COPD). Here's how it works, along with its benefits and potential side effects:\\n\\n### How It Works\\n- **Fluticasone Furoate**: This is a corticosteroid that helps reduce inflammation in the lungs, making it easier to breathe.\\n- **Vilanterol**: This is a long-acting beta-agonist (LABA) that relaxes muscles in the airways to improve airflow and reduce symptoms like wheezing and shortness of breath.\\n\\n### Benefits\\n- **Improved Breathing**: By reducing inflammation and relaxing airway muscles, the inhaler helps improve overall breathing.\\n- **Symptom Control**: Helps control and prevent symptoms such as wheezing, coughing, chest tightness, and shortness of breath.\\n- **Convenience**: Once-daily dosing can improve adherence and make it easier for patients to manage their condition.\\n- **Improved Quality of Life**: Better management of asthma or COPD symptoms can lead to an improved ability to carry out daily activities.\\n\\n### Potential Side Effects\\n- **Common Side Effects**: These may include headaches, sore throat, cough, and oral thrush (a fungal infection in the mouth).\\n- **Muscle Pain**: Some people may experience muscle and joint pain.\\n- **Increased Heart Rate**: 
Vilanterol may cause an increased heart rate or palpitations.\\n- **Risk of Infections**: Corticosteroids can suppress the immune system, increasing the risk of infections, such as pneumonia.\\n- **Hoarseness or Voice Changes**: Inhaled corticosteroids can affect the vocal cords.\\n- **Paradoxical Bronchospasm**: Rarely, inhalers can cause bronchospasm, which is a tightening of the muscles around the airways.\\n\\nIt's important for patients to discuss their symptoms and any side effects they experience with their healthcare provider, who can offer guidance tailored to individual health needs.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 431, \"prompt_tokens\": 31, \"total_tokens\": 462}}}, \"final_result\": {\"id\": \"chatcmpl-Cbgpq82kWg8SHVgUnPQhAFvm7VaIR\", \"object\": \"chat.completion\", \"created\": 1763098830, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Breo Ellipta is an inhaler that contains two active ingredients, fluticasone furoate and vilanterol. It is commonly used for managing asthma and chronic obstructive pulmonary disease (COPD). 
Here's how it works, along with its benefits and potential side effects:\\n\\n### How It Works\\n- **Fluticasone Furoate**: This is a corticosteroid that helps reduce inflammation in the lungs, making it easier to breathe.\\n- **Vilanterol**: This is a long-acting beta-agonist (LABA) that relaxes muscles in the airways to improve airflow and reduce symptoms like wheezing and shortness of breath.\\n\\n### Benefits\\n- **Improved Breathing**: By reducing inflammation and relaxing airway muscles, the inhaler helps improve overall breathing.\\n- **Symptom Control**: Helps control and prevent symptoms such as wheezing, coughing, chest tightness, and shortness of breath.\\n- **Convenience**: Once-daily dosing can improve adherence and make it easier for patients to manage their condition.\\n- **Improved Quality of Life**: Better management of asthma or COPD symptoms can lead to an improved ability to carry out daily activities.\\n\\n### Potential Side Effects\\n- **Common Side Effects**: These may include headaches, sore throat, cough, and oral thrush (a fungal infection in the mouth).\\n- **Muscle Pain**: Some people may experience muscle and joint pain.\\n- **Increased Heart Rate**: Vilanterol may cause an increased heart rate or palpitations.\\n- **Risk of Infections**: Corticosteroids can suppress the immune system, increasing the risk of infections, such as pneumonia.\\n- **Hoarseness or Voice Changes**: Inhaled corticosteroids can affect the vocal cords.\\n- **Paradoxical Bronchospasm**: Rarely, inhalers can cause bronchospasm, which is a tightening of the muscles around the airways.\\n\\nIt's important for patients to discuss their symptoms and any side effects they experience with their healthcare provider, who can offer guidance tailored to individual health needs.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 431, \"prompt_tokens\": 31, \"total_tokens\": 462}}}41.6401132025-11-14 05:43:14.0161842025-11-14 05:43:14.016184
ac009b3ba51c470ea3d672e1dc0fc290960997700d79409680974700c0d678251{\"request_id\": \"b2b05a40-a5a0-9a34-933b-df8af63c91a7\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: qvar 40mg what is it for.\", \"role\": \"user\"}], \"llm\": {\"id\": \"chatcmpl-CbgpuOliFS4lXR3FtBVPP7oeSZOho\", \"object\": \"chat.completion\", \"created\": 1763098834, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Qvar (beclomethasone dipropionate) is an inhaled corticosteroid medication commonly used for the maintenance treatment of asthma. It helps to prevent and control symptoms such as wheezing and shortness of breath by reducing inflammation in the airways. It is not intended for the relief of acute asthma attacks.\\n\\n### Benefits:\\n1. **Reduces Inflammation**: Decreases swelling and irritation in the airways, helping to prevent asthma symptoms.\\n2. **Improves Breathing**: Helps to open airways, making it easier to breathe.\\n3. **Fewer Asthma Attacks**: Regular use can lead to fewer and milder asthma attacks.\\n4. **Maintenance of Asthma Control**: Helps in maintaining long-term control of asthma symptoms.\\n\\n### Potential Side Effects:\\n1. **Throat Irritation**: May cause a sore throat or hoarseness.\\n2. **Oral Thrush**: A fungal infection in the mouth; can be minimized by rinsing the mouth after use.\\n3. **Cough**: Some users may experience a cough immediately after use.\\n4. **Nasal Congestion**: Stuffy nose might occur in some individuals.\\n5. **Headache**: A common mild side effect.\\n6. **Potential for Delayed Growth in Children**: Long-term use can affect growth rates in children, so growth should be monitored.\\n7. 
**Other Systemic Effects**: While rare with inhaled steroids, potential side effects could include changes in mood, adrenal suppression, and bone density reduction if used at high doses for prolonged periods.\\n\\nIt is essential for users to follow their healthcare provider\\u2019s instructions and discuss any concerns or side effects they experience.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 348, \"prompt_tokens\": 33, \"total_tokens\": 381}}}, \"final_result\": {\"id\": \"chatcmpl-CbgpuOliFS4lXR3FtBVPP7oeSZOho\", \"object\": \"chat.completion\", \"created\": 1763098834, \"model\": \"gpt-4o-2024-08-06\", \"system_fingerprint\": \"fp_4a331a0222\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Qvar (beclomethasone dipropionate) is an inhaled corticosteroid medication commonly used for the maintenance treatment of asthma. It helps to prevent and control symptoms such as wheezing and shortness of breath by reducing inflammation in the airways. It is not intended for the relief of acute asthma attacks.\\n\\n### Benefits:\\n1. **Reduces Inflammation**: Decreases swelling and irritation in the airways, helping to prevent asthma symptoms.\\n2. **Improves Breathing**: Helps to open airways, making it easier to breathe.\\n3. **Fewer Asthma Attacks**: Regular use can lead to fewer and milder asthma attacks.\\n4. **Maintenance of Asthma Control**: Helps in maintaining long-term control of asthma symptoms.\\n\\n### Potential Side Effects:\\n1. **Throat Irritation**: May cause a sore throat or hoarseness.\\n2. **Oral Thrush**: A fungal infection in the mouth; can be minimized by rinsing the mouth after use.\\n3. **Cough**: Some users may experience a cough immediately after use.\\n4. **Nasal Congestion**: Stuffy nose might occur in some individuals.\\n5. **Headache**: A common mild side effect.\\n6. 
**Potential for Delayed Growth in Children**: Long-term use can affect growth rates in children, so growth should be monitored.\\n7. **Other Systemic Effects**: While rare with inhaled steroids, potential side effects could include changes in mood, adrenal suppression, and bone density reduction if used at high doses for prolonged periods.\\n\\nIt is essential for users to follow their healthcare provider\\u2019s instructions and discuss any concerns or side effects they experience.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 348, \"prompt_tokens\": 33, \"total_tokens\": 381}}}44.5463812025-11-14 05:43:14.0161852025-11-14 05:43:14.016185
\n", - "
\n", - " \n", - "
\n", - "

Table: evaluation_result

\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
submission_idrun_idrepetition_countmetricaggregating_valuemetric_resulterrorcreated_atupdated_at
f1132352cb524392b733ca0773c6bef0960997700d79409680974700c0d678251\"BERT Score/F1\"0.522608{\"precision\": 0.4782029390335083, \"recall\": 0.5761047601699829, \"f1\": 0.5226083397865295}None2025-11-14 05:52:03.1291652025-11-14 05:52:03.129169
ce3fd3f1f8f04499847153fb1d3a2b0f960997700d79409680974700c0d678251\"BERT Score/F1\"0.490477{\"precision\": 0.4271319508552551, \"recall\": 0.5758814811706543, \"f1\": 0.4904767870903015}None2025-11-14 05:52:03.1291702025-11-14 05:52:03.129171
e0a2c70183dd4ce289772b5ae74957b6960997700d79409680974700c0d678251\"BERT Score/F1\"0.479421{\"precision\": 0.4179512560367584, \"recall\": 0.5620903968811035, \"f1\": 0.4794212579727173}None2025-11-14 05:52:03.1291712025-11-14 05:52:03.129174
61ca98967e5b45f4865db10d81de298b960997700d79409680974700c0d678251\"BERT Score/F1\"0.343647{\"precision\": 0.31056681275367737, \"recall\": 0.3846154510974884, \"f1\": 0.34364742040634155}None2025-11-14 05:52:03.1291752025-11-14 05:52:03.129176
9f84346869014412a1fd55ad52bd23b1960997700d79409680974700c0d678251\"BERT Score/F1\"0.331058{\"precision\": 0.27349621057510376, \"recall\": 0.4193095266819, \"f1\": 0.3310583531856537}None2025-11-14 05:52:03.1291762025-11-14 05:52:03.129177
98a14a854ba94f09b7a7b48c30878f3b960997700d79409680974700c0d678251\"BERT Score/F1\"0.532768{\"precision\": 0.457075834274292, \"recall\": 0.6385050415992737, \"f1\": 0.5327680706977844}None2025-11-14 05:52:03.1291772025-11-14 05:52:03.129178
6c34dd7a4bba4f4e8bd9bff5ab9afa1f960997700d79409680974700c0d678251\"BERT Score/F1\"0.549074{\"precision\": 0.5064661502838135, \"recall\": 0.5995101928710938, \"f1\": 0.5490743517875671}None2025-11-14 05:52:03.1291782025-11-14 05:52:03.129179
b8c446bc2c2c423e9d60b1e7511f5c36960997700d79409680974700c0d678251\"BERT Score/F1\"0.585487{\"precision\": 0.5031953454017639, \"recall\": 0.6999572515487671, \"f1\": 0.5854871869087219}None2025-11-14 05:52:03.1291802025-11-14 05:52:03.129180
4b6e2f1ddefa4f1585977e1cbe2fa900960997700d79409680974700c0d678251\"BERT Score/F1\"0.506072{\"precision\": 0.49769705533981323, \"recall\": 0.5147332549095154, \"f1\": 0.5060718655586243}None2025-11-14 05:52:03.1291812025-11-14 05:52:03.129181
ac009b3ba51c470ea3d672e1dc0fc290960997700d79409680974700c0d678251\"BERT Score/F1\"0.509633{\"precision\": 0.4473775625228882, \"recall\": 0.5920162200927734, \"f1\": 0.5096331238746643}None2025-11-14 05:52:03.1291822025-11-14 05:52:03.129182
\n", - "
\n", - "
" - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "# viewing the results from sqlite db in tabular format..\n", - "import sqlite3\n", - "import pandas as pd\n", - "from IPython.display import display, HTML\n", - "\n", - "# Path to your SQLite database file\n", - "db_file = 'results-new/results.db'\n", - "\n", - "connection = sqlite3.connect(db_file)\n", - "\n", - "# Specify the table names you want to display\n", - "table_names = ['run','configuration', 'submission', 'submission_result', 'evaluation_result'] \n", - "\n", - "# Create the CSS and HTML container\n", - "html_content = \"\"\"\n", - "\n", - "
\n", - "\"\"\"\n", - "\n", - "for table_name in table_names:\n", - " query = f\"SELECT * FROM {table_name};\"\n", - " df = pd.read_sql_query(query, connection)\n", - " # If you want to see all the rows across all tables, remove/comment the next line\n", - " df = df.head(10) # Limiting the number of rows displayed\n", - " table_html = df.to_html(classes='table-container', index=False)\n", - " html_content += f\"\"\"\n", - "
\n", - "

Table: {table_name}

\n", - " {table_html}\n", - "
\n", - " \"\"\"\n", - "\n", - "html_content += \"
\"\n", - "\n", - "display(HTML(html_content))\n", - "\n", - "# Close the connection\n", - "connection.close()" - ] - }, - { - "cell_type": "code", - "execution_count": 94, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "<Response [202]>\n", - "{'id': 'ef8963f68ec99242', 'message': 'Deletion scheduled', 'targetStatus': 'DELETED'}\n" - ] - } - ], - "source": [ - "# Delete the execution by its ID\n", - "def delete_execution():\n", - " headers = _get_headers()\n", - " EXEC_ID = execution_id\n", - " GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions/'\n", - " request_url = f\"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}{EXEC_ID}\"\n", - " try:\n", - " response = requests.delete(\n", - " request_url, headers=headers, params={\"AI-Resource-Group\":AICORE_RESOURCE_GROUP}, timeout=120\n", - " )\n", - " print(response)\n", - " if response.status_code != 202:\n", - " raise RuntimeError(f\"Unexpected status code: {response.status_code}\")\n", - " result = response.json()\n", - " print(result)\n", - " except Exception:\n", - " logging.error(\"Error occurred while attempting to delete the execution\")\n", - " raise\n", - " \n", - "delete_execution()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.12.4" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/tutorials/ai-core-genaihub-evaluation/img/AI_Core.json b/tutorials/ai-core-genaihub-evaluation/img/AI_Core.json index 56a807a86..bb30bf61b 100644 --- a/tutorials/ai-core-genaihub-evaluation/img/AI_Core.json +++ b/tutorials/ai-core-genaihub-evaluation/img/AI_Core.json @@ -1,1578 +1,1578 @@ -{ - "name": "AI Core", - "version": "1", - "items": [ - { - "type": "http", - "name": "get_token", - "filename": "get_token.bru", - "seq": 1, - "request": { - "url": 
"{{ai_auth_url}}/oauth/token", - "method": "POST", - "headers": [ - { - "name": "Content-Type", - "value": "application/x-www-form-urlencoded", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "formUrlEncoded", - "formUrlEncoded": [ - { - "name": "grant_type", - "value": "client_credentials", - "enabled": true - }, - { - "name": "client_id", - "value": "{{client_id}}", - "enabled": true - }, - { - "name": "client_secret", - "value": "{{client_secret}}", - "enabled": true - } - ], - "multipartForm": [], - "file": [] - }, - "script": { - "res": "if (res.getStatus() == 200) {\n bru.setEnvVar(\"access_token\", res.body.access_token);\n}" - }, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "", - "auth": { - "mode": "none" - } - } - }, - { - "type": "folder", - "name": "admin", - "filename": "admin", - "root": { - "meta": { - "name": "admin" - } - }, - "items": [ - { - "type": "folder", - "name": "objectStoreSecrets", - "filename": "objectStoreSecrets", - "root": { - "meta": { - "name": "objectStoreSecrets" - } - }, - "items": [ - { - "type": "http", - "name": "Create a secret", - "filename": "Create a secret.bru", - "seq": 1, - "request": { - "url": "{{baseUrl}}/v2/admin/objectStoreSecrets", - "method": "POST", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "", - "enabled": true - }, - { - "name": "Content-Type", - "value": "application/json", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - }, - { - "name": "Authorization", - "value": "", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "json", - "json": "{\n \"name\": \"genai-data\",\n \"data\": {\n \"AWS_ACCESS_KEY_ID\": \"\",\n \"AWS_SECRET_ACCESS_KEY\": \"\"\n },\n \"type\": \"S3\",\n \"bucket\": \"\",\n \"endpoint\": \"https://s3.eu-central-1.amazonaws.com\",\n \"region\": \"\",\n \"pathPrefix\": \"\" \n }", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - 
"assertions": [], - "tests": "", - "docs": "Create a secret based on the configuration in the request body\n", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "http", - "name": "Get a list of metadata of available secrets.", - "filename": "Get a list of metadata of available secrets.bru", - "seq": 2, - "request": { - "url": "{{baseUrl}}/admin/objectStoreSecrets?$top=&$skip=&$count=", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [ - { - "name": "$top", - "value": "", - "type": "query", - "enabled": true - }, - { - "name": "$skip", - "value": "", - "type": "query", - "enabled": true - }, - { - "name": "$count", - "value": "", - "type": "query", - "enabled": true - } - ], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Retrieve a list of metadata of the stored secrets.\n", - "auth": { - "mode": "oauth2", - "oauth2": { - "grantType": "authorization_code", - "callbackUrl": "", - "authorizationUrl": "", - "accessTokenUrl": "", - "refreshTokenUrl": "", - "clientId": "", - "clientSecret": "", - "scope": "", - "credentialsPlacement": "basic_auth_header", - "pkce": false, - "credentialsId": "credentials", - "tokenPlacement": "header", - "tokenHeaderPrefix": "Bearer", - "tokenQueryKey": "access_token", - "autoFetchToken": true, - "autoRefreshToken": false - } - } - } - }, - { - "type": "folder", - "name": "{objectStoreName}", - "filename": "{objectStoreName}", - "root": { - "meta": { - "name": "{objectStoreName}" - } - }, - "items": [ - { - "type": "http", - "name": "Delete object store secret", - "filename": "Delete object store secret.bru", - "seq": 2, - "request": { - "url": "{{baseUrl}}/admin/objectStoreSecrets/:objectStoreName", - 
"method": "DELETE", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [ - { - "name": "objectStoreName", - "value": "qKoZ-aHSe", - "type": "path", - "enabled": true - } - ], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Delete a secret with the name of objectStoreName if it exists.", - "auth": { - "mode": "oauth2", - "oauth2": { - "grantType": "authorization_code", - "callbackUrl": "", - "authorizationUrl": "", - "accessTokenUrl": "", - "refreshTokenUrl": "", - "clientId": "", - "clientSecret": "", - "scope": "", - "credentialsPlacement": "basic_auth_header", - "pkce": false, - "credentialsId": "credentials", - "tokenPlacement": "header", - "tokenHeaderPrefix": "Bearer", - "tokenQueryKey": "access_token", - "autoFetchToken": true, - "autoRefreshToken": false - } - } - } - }, - { - "type": "http", - "name": "Returns the of metadata of secrets which match the query parameter.", - "filename": "Returns the of metadata of secrets which match the query parameter.bru", - "seq": 1, - "request": { - "url": "{{baseUrl}}/v2/admin/objectStoreSecrets", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "This retrieves the metadata of the stored secret which match the parameter objectStoreName.\nThe fetched secret is constructed like objectStoreName-object-store-secret\nThe base64 encoded field for the stored secret is not returned.\n", - "auth": { - "mode": "oauth2", - "oauth2": { - "grantType": 
"authorization_code", - "callbackUrl": "", - "authorizationUrl": "", - "accessTokenUrl": "", - "refreshTokenUrl": "", - "clientId": "", - "clientSecret": "", - "scope": "", - "credentialsPlacement": "basic_auth_header", - "pkce": false, - "credentialsId": "credentials", - "tokenPlacement": "header", - "tokenHeaderPrefix": "Bearer", - "tokenQueryKey": "access_token", - "autoFetchToken": true, - "autoRefreshToken": false - } - } - } - }, - { - "type": "http", - "name": "Update object store secret", - "filename": "Update object store secret.bru", - "seq": 3, - "request": { - "url": "{{baseUrl}}/admin/objectStoreSecrets/:objectStoreName", - "method": "PATCH", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "", - "enabled": true - }, - { - "name": "Content-Type", - "value": "application/json", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [ - { - "name": "objectStoreName", - "value": "qKoZ-aHSe", - "type": "path", - "enabled": true - } - ], - "body": { - "mode": "json", - "json": "{\n \"name\": \"\",\n \"type\": \"\",\n \"data\": {},\n \"bucket\": \"\",\n \"endpoint\": \"\",\n \"region\": \"\",\n \"pathPrefix\": \"\",\n \"verifyssl\": \"\",\n \"usehttps\": \"1\"\n}", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Update a secret with name of objectStoreName if it exists.\n", - "auth": { - "mode": "oauth2", - "oauth2": { - "grantType": "authorization_code", - "callbackUrl": "", - "authorizationUrl": "", - "accessTokenUrl": "", - "refreshTokenUrl": "", - "clientId": "", - "clientSecret": "", - "scope": "", - "credentialsPlacement": "basic_auth_header", - "pkce": false, - "credentialsId": "credentials", - "tokenPlacement": "header", - "tokenHeaderPrefix": "Bearer", - "tokenQueryKey": "access_token", - "autoFetchToken": true, - "autoRefreshToken": false - } - } - } - } - ] - } - ] - } - ] - }, - { - 
"type": "folder", - "name": "lm", - "filename": "lm", - "root": { - "meta": { - "name": "lm" - } - }, - "items": [ - { - "type": "folder", - "name": "configurations", - "filename": "configurations", - "root": { - "meta": { - "name": "configurations" - } - }, - "items": [ - { - "type": "http", - "name": "Create configuration Copy", - "filename": "Create configuration Copy.bru", - "seq": 3, - "request": { - "url": "{{baseUrl}}/v2/lm/configurations", - "method": "DELETE", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "{{resource_group}}", - "enabled": true - }, - { - "name": "Content-Type", - "value": "application/json", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "json", - "json": "{\n \"id\": \"\"\n}", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Create a new configuration linked to a specific scenario and executable for use in an execution\nor deployment.\n", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "http", - "name": "Create configuration", - "filename": "Create configuration.bru", - "seq": 2, - "request": { - "url": "{{baseUrl}}/v2/lm/configurations", - "method": "POST", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "{{resource_group}}", - "enabled": true - }, - { - "name": "Content-Type", - "value": "application/json", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "json", - "json": "{\n \"name\": \"genai-eval-conf\",\n \"scenarioId\": \"genai-evaluations\",\n \"executableId\": \"genai-evaluations-simplified\",\n \"inputArtifactBindings\": [\n {\n \"key\": \"datasetFolder\",\n \"artifactId\": \"\"\n }\n ],\n \"parameterBindings\": [\n {\n \"key\": \"repetitions\",\n \"value\": 
\"1\"\n },\n {\n \"key\": \"orchestrationDeploymentURL\",\n \"value\": \"\"\n\n },\n {\n \"key\": \"metrics\",\n \"value\": \"language_match\"\n },\n {\n \"key\": \"testDataset\",\n \"value\": \"{\\\"path\\\": \\\"testdata/global_customer_queries.csv\\\", \\\"type\\\": \\\"csv\\\"}\"\n },\n {\n \"key\": \"promptTemplate\",\n \"value\": \"\"\n },\n {\n \"key\": \"models\",\n \"value\": \"gpt-4.1:latest\"\n }\n ]\n}\n", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Create a new configuration linked to a specific scenario and executable for use in an execution\nor deployment.\n", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "http", - "name": "Get list of configurations", - "filename": "Get list of configurations.bru", - "seq": 1, - "request": { - "url": "{{baseUrl}}/v2/lm/configurations", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "{{resource_group}}", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Retrieve a list of configurations. 
Filter results by scenario ID or a list of executable IDs.\nSearch for configurations containing the search string as substring in the configuration name.\n", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "folder", - "name": "{configurationId}", - "filename": "{configurationId}", - "root": { - "meta": { - "name": "{configurationId}" - } - }, - "items": [ - { - "type": "http", - "name": "Get configuration by ID", - "filename": "Get configuration by ID.bru", - "seq": 1, - "request": { - "url": "{{baseUrl}}/v2/lm/configurations", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "{{resource_group}}", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Retrieve details for configuration with configurationId.", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - } - ] - }, - { - "type": "folder", - "name": "$count", - "filename": "$count", - "root": { - "meta": { - "name": "$count" - } - }, - "items": [ - { - "type": "http", - "name": "Get number of configurations", - "filename": "Get number of configurations.bru", - "seq": 1, - "request": { - "url": "{{baseUrl}}/lm/configurations/$count?scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&$search=}\"NI2Kn!V&searchCaseInsensitive=false&executableIds=T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "", - "enabled": true - }, - { - "name": "Accept", - "value": "text/plain", - "enabled": true - } - ], - "params": [ - { - "name": "scenarioId", - "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", - "type": "query", - "enabled": true - }, - { - 
"name": "$search", - "value": "}\"NI2Kn!V", - "type": "query", - "enabled": true - }, - { - "name": "searchCaseInsensitive", - "value": "false", - "type": "query", - "enabled": true - }, - { - "name": "executableIds", - "value": "T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", - "type": "query", - "enabled": true - } - ], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Retrieve the number of available configurations that match the specified filter criteria.\nFilter criteria include a scenarioId or executableIdsList. Search by substring of configuration name is also possible.\n", - "auth": { - "mode": "oauth2", - "oauth2": { - "grantType": "authorization_code", - "callbackUrl": "", - "authorizationUrl": "", - "accessTokenUrl": "", - "refreshTokenUrl": "", - "clientId": "", - "clientSecret": "", - "scope": "", - "credentialsPlacement": "basic_auth_header", - "pkce": false, - "credentialsId": "credentials", - "tokenPlacement": "header", - "tokenHeaderPrefix": "Bearer", - "tokenQueryKey": "access_token", - "autoFetchToken": true, - "autoRefreshToken": false - } - } - } - } - ] - } - ] - }, - { - "type": "folder", - "name": "artifacts", - "filename": "artifacts", - "root": { - "meta": { - "name": "artifacts" - } - }, - "items": [ - { - "type": "http", - "name": "Get list of artifacts", - "filename": "Get list of artifacts.bru", - "seq": 2, - "request": { - "url": "{{baseUrl}}/v2/lm/artifacts", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [ - { - "name": "scenarioId", - "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", - "type": "query", - "enabled": false - }, - { - "name": "executionId", - "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", - "type": 
"query", - "enabled": false - }, - { - "name": "name", - "value": "[G7 ovyt8i", - "type": "query", - "enabled": false - }, - { - "name": "kind", - "value": "other", - "type": "query", - "enabled": false - }, - { - "name": "artifactLabelSelector", - "value": "ext.ai.sap.com/bXN1EAk=D*", - "type": "query", - "enabled": false - }, - { - "name": "$top", - "value": "10000", - "type": "query", - "enabled": false - }, - { - "name": "$skip", - "value": "", - "type": "query", - "enabled": false - }, - { - "name": "$search", - "value": "}\"NI2Kn!V", - "type": "query", - "enabled": false - }, - { - "name": "searchCaseInsensitive", - "value": "false", - "type": "query", - "enabled": false - }, - { - "name": "$expand", - "value": "scenario", - "type": "query", - "enabled": false - } - ], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Retrieve a list of artifacts that matches the specified filter criteria.\nFilter criteria include scenario ID, execution ID, an artifact name, artifact kind, or artifact labels.\nUse top/skip parameters to paginate the result list.\nSearch by substring of artifact name or description, if required.\n", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "http", - "name": "Register artifact", - "filename": "Register artifact.bru", - "seq": 1, - "request": { - "url": "{{baseUrl}}/v2/lm/artifacts", - "method": "POST", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "{{resource_group}}", - "enabled": true - }, - { - "name": "Content-Type", - "value": "application/json", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "json", - "json": "{\n \"name\": \"aiconfig\",\n \"kind\": \"dataset\",\n \"url\": \"ai://genai-data/genaiEvaluation/14af1af80b974edb8731632d17286343\",\n 
\"scenarioId\": \"genai-evaluations\"\n}\n", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Register an artifact for use in a configuration, for example a model or a dataset.", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "folder", - "name": "$count", - "filename": "$count", - "root": { - "meta": { - "name": "$count" - } - }, - "items": [ - { - "type": "http", - "name": "Get number of artifacts", - "filename": "Get number of artifacts.bru", - "seq": 1, - "request": { - "url": "{{baseUrl}}/lm/artifacts/$count?scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&executionId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&name=[G7 ovyt8i&kind=other&$search=}\"NI2Kn!V&searchCaseInsensitive=false&artifactLabelSelector=ext.ai.sap.com/bXN1EAk=D*", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "", - "enabled": true - }, - { - "name": "Accept", - "value": "text/plain", - "enabled": true - } - ], - "params": [ - { - "name": "scenarioId", - "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", - "type": "query", - "enabled": true - }, - { - "name": "executionId", - "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", - "type": "query", - "enabled": true - }, - { - "name": "name", - "value": "[G7 ovyt8i", - "type": "query", - "enabled": true - }, - { - "name": "kind", - "value": "other", - "type": "query", - "enabled": true - }, - { - "name": "$search", - "value": "}\"NI2Kn!V", - "type": "query", - "enabled": true - }, - { - "name": "searchCaseInsensitive", - "value": "false", - "type": "query", - "enabled": true - }, - { - "name": "artifactLabelSelector", - "value": "ext.ai.sap.com/bXN1EAk=D*", - "type": "query", - "enabled": true - } - ], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - 
"script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Retrieve the number of available artifacts that match the specified filter criteria.\nFilter criteria include a scenarioId, executionId, an artifact name, artifact kind, or artifact labels.\nSearch by substring of artifact name or description is also possible.\n", - "auth": { - "mode": "oauth2", - "oauth2": { - "grantType": "authorization_code", - "callbackUrl": "", - "authorizationUrl": "", - "accessTokenUrl": "", - "refreshTokenUrl": "", - "clientId": "", - "clientSecret": "", - "scope": "", - "credentialsPlacement": "basic_auth_header", - "pkce": false, - "credentialsId": "credentials", - "tokenPlacement": "header", - "tokenHeaderPrefix": "Bearer", - "tokenQueryKey": "access_token", - "autoFetchToken": true, - "autoRefreshToken": false - } - } - } - } - ] - } - ] - }, - { - "type": "folder", - "name": "executions", - "filename": "executions", - "root": { - "meta": { - "name": "executions" - } - }, - "items": [ - { - "type": "http", - "name": "Create execution", - "filename": "Create execution.bru", - "seq": 2, - "request": { - "url": "{{baseUrl}}/v2/lm/executions", - "method": "POST", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "{{resource_group}}", - "enabled": true - }, - { - "name": "Content-Type", - "value": "application/json", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "json", - "json": "{\n \"configurationId\": \"\"\n}", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Create an execution using the configuration specified by configurationId.", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "http", - "name": "Get list of executions", - "filename": "Get list of executions.bru", - "seq": 1, - "request": { - "url": 
"{{baseUrl}}/v2/lm/executions/", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "{{resource_group}}", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [ - { - "name": "scenarioId", - "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", - "type": "query", - "enabled": false - }, - { - "name": "executionScheduleId", - "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", - "type": "query", - "enabled": false - }, - { - "name": "status", - "value": "DEAD", - "type": "query", - "enabled": false - }, - { - "name": "$top", - "value": "10000", - "type": "query", - "enabled": false - }, - { - "name": "$skip", - "value": "", - "type": "query", - "enabled": false - }, - { - "name": "$select", - "value": "status", - "type": "query", - "enabled": false - } - ], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Retrieve a list of executions that match the specified filter criteria.\nFilter criteria include a list of executableIds, a scenarioId, a configurationId, or a execution status.\nWith top/skip parameters it is possible to paginate the result list.\nWith select parameter it is possible to select only status.\n", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "folder", - "name": "$count", - "filename": "$count", - "root": { - "meta": { - "name": "$count" - } - } - } - ] - }, - { - "type": "folder", - "name": "deployments", - "filename": "deployments", - "root": { - "meta": { - "name": "deployments" - } - }, - "items": [ - { - "type": "http", - "name": "Create deployment", - "filename": "Create deployment.bru", - "seq": 2, - "request": { - "url": "{{baseUrl}}/v2/lm/deployments", - "method": "POST", - "headers": [ - { - "name": "AI-Resource-Group", - "value": 
"{{resource_group}}", - "enabled": true - }, - { - "name": "Content-Type", - "value": "application/json", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "json", - "json": "{\n \"configurationId\": \"\"\n}", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Create a deployment using the configuration specified by configurationId after synchronously checking the\ncorrectness of the configuration.\n", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "http", - "name": "Get list of deployments", - "filename": "Get list of deployments.bru", - "seq": 1, - "request": { - "url": "{{baseUrl}}/v2/lm/deployments", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "{{resource_group}}", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Retrieve a list of deployments that match the specified filter criteria.\nFilter criteria include a list of executableIds, a scenarioId, a configurationId, or a deployment status.\nWith top/skip parameters it is possible to paginate the result list.\nWith select parameter it is possible to select only status.\n", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "folder", - "name": "$count", - "filename": "$count", - "root": { - "meta": { - "name": "$count" - } - }, - "items": [ - { - "type": "http", - "name": "Get number of deployments", - "filename": "Get number of deployments.bru", - "seq": 1, - "request": { - "url": 
"{{baseUrl}}/lm/deployments/$count?executableIds=T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE&configurationId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&status=DEAD", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "", - "enabled": true - }, - { - "name": "Accept", - "value": "text/plain", - "enabled": true - } - ], - "params": [ - { - "name": "executableIds", - "value": "T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", - "type": "query", - "enabled": true - }, - { - "name": "configurationId", - "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", - "type": "query", - "enabled": true - }, - { - "name": "scenarioId", - "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", - "type": "query", - "enabled": true - }, - { - "name": "status", - "value": "DEAD", - "type": "query", - "enabled": true - } - ], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "Retrieve the number of available deployments. 
The number can be filtered by\nscenarioId, configurationId, executableIdsList or by deployment status.\n", - "auth": { - "mode": "oauth2", - "oauth2": { - "grantType": "authorization_code", - "callbackUrl": "", - "authorizationUrl": "", - "accessTokenUrl": "", - "refreshTokenUrl": "", - "clientId": "", - "clientSecret": "", - "scope": "", - "credentialsPlacement": "basic_auth_header", - "pkce": false, - "credentialsId": "credentials", - "tokenPlacement": "header", - "tokenHeaderPrefix": "Bearer", - "tokenQueryKey": "access_token", - "autoFetchToken": true, - "autoRefreshToken": false - } - } - } - } - ] - } - ] - }, - { - "type": "folder", - "name": "metrics", - "filename": "metrics", - "root": { - "meta": { - "name": "metrics" - } - }, - "items": [ - { - "type": "http", - "name": "Evaluation Metrics via Execution ID", - "filename": "Evaluation Metrics via Execution ID.bru", - "seq": 4, - "request": { - "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of=", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "{{resource_group}}", - "enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [ - { - "name": "tagFilters", - "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of=", - "value": "evaluation.ai.sap.com/child-of=", - "type": "query", - "enabled": true - } - ], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - }, - { - "type": "http", - "name": "Metrics by Run Name", - "filename": "Metrics by Run Name.bru", - "seq": 5, - "request": { - "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/run-name=run1", - "method": "GET", - "headers": [ - { - "name": "AI-Resource-Group", - "value": "{{resource_group}}", - 
"enabled": true - }, - { - "name": "Accept", - "value": "application/json", - "enabled": true - } - ], - "params": [ - { - "name": "tagFilters", - "value": "evaluation.ai.sap.com/run-name=run1", - "type": "query", - "enabled": true - } - ], - "body": { - "mode": "none", - "formUrlEncoded": [], - "multipartForm": [], - "file": [] - }, - "script": {}, - "vars": {}, - "assertions": [], - "tests": "", - "docs": "", - "auth": { - "mode": "bearer", - "bearer": { - "token": "{{access_token}}" - } - } - } - } - ] - } - ] - } - ], - "activeEnvironmentUid": "lWUmIcEkGnkMxwNBILLmY", - "environments": [ - { - "variables": [ - { - "name": "ai_auth_url", - "value": "", - "enabled": true, - "secret": false, - "type": "text" - }, - { - "name": "ai_api_url", - "value": "", - "enabled": true, - "secret": false, - "type": "text" - }, - { - "name": "client_id", - "value": "", - "enabled": true, - "secret": false, - "type": "text" - }, - { - "name": "client_secret", - "value": "", - "enabled": true, - "secret": false, - "type": "text" - }, - { - "name": "resource_group", - "value": "", - "enabled": true, - "secret": false, - "type": "text" - }, - { - "name": "orchestration_service_url", - "value": "", - "enabled": true, - "secret": false, - "type": "text" - }, - { - "name": "access_token", - "value": "", - "enabled": true, - "secret": true, - "type": "text" - } - ], - "name": "intprod" - } - ], - "root": { - "request": { - "auth": { - "mode": "oauth2", - "oauth2": { - "grantType": "authorization_code", - "callbackUrl": "", - "authorizationUrl": "", - "accessTokenUrl": "", - "refreshTokenUrl": "", - "clientId": "", - "clientSecret": "", - "scope": "", - "state": "", - "pkce": false, - "credentialsPlacement": "basic_auth_header", - "credentialsId": "credentials", - "tokenPlacement": "header", - "tokenHeaderPrefix": "Bearer", - "tokenQueryKey": "access_token", - "autoFetchToken": true, - "autoRefreshToken": false - } - }, - "vars": { - "req": [ - { - "name": "region", - "value": 
"prod.eu-central-1.aws", - "enabled": true, - "local": false, - "uid": "oYVk4DuVpyYqqP2roBVjE" - }, - { - "name": "baseUrl", - "value": "", - "enabled": true, - "local": false, - "uid": "I4KjDm7FxpSRwUYzjwfPG" - }, - { - "name": "auth_url", - "value": "", - "enabled": true, - "local": false, - "uid": "zuftvyCURtA9XYErCYDgo" - }, - { - "name": "client_id", - "value": "", - "enabled": true, - "local": false, - "uid": "JfGEVKm71BYTgR8UkQUGv" - }, - { - "name": "client_secret", - "value": "", - "enabled": true, - "local": false, - "uid": "ls3RYTJ40baTl8eYmilGt" - }, - { - "name": "AWS_ACCESS_KEY_ID", - "value": "", - "enabled": true, - "local": false, - "uid": "2O0YTTAdmYltm5XiHMhP2" - }, - { - "name": "AWS_SECRET_ACCESS_KEY", - "value": "", - "enabled": true, - "local": false, - "uid": "8rc4RYyPcHXyTkAnnI981" - }, - { - "name": "BUCKET_NAME", - "value": "", - "enabled": true, - "local": false, - "uid": "HqFIe8Rvc14i41WIAGGkl" - }, - { - "name": "DATABASE_URL", - "value": "https://s3-eu-central-1.amazonaws.com", - "enabled": true, - "local": false, - "uid": "aWIwuJZH5XQ5Guu2D69Sq" - } - ] - } - }, - "docs": "Provides tools to manage your scenarios and workflows in SAP AI Core. Execute pipelines as a batch job, for example to pre-process or train your models, or perform batch inference. Serve inference requests of trained models. Deploy а trained machine learning model as a web service to serve inference requests with high performance. 
Register your own Docker registry, synchronize your AI content from your own git repository, and register your own object store for training data and trained models.\n", - "meta": { - "name": "AI Core" - } - }, - "brunoConfig": { - "version": "1", - "name": "AI Core", - "type": "collection", - "ignore": [ - "node_modules", - ".git" - ], - "size": 0.10747432708740234, - "filesCount": 151 - } -} +{ + "name": "AI Core", + "version": "1", + "items": [ + { + "type": "http", + "name": "get_token", + "filename": "get_token.bru", + "seq": 1, + "request": { + "url": "{{ai_auth_url}}/oauth/token", + "method": "POST", + "headers": [ + { + "name": "Content-Type", + "value": "application/x-www-form-urlencoded", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "formUrlEncoded", + "formUrlEncoded": [ + { + "name": "grant_type", + "value": "client_credentials", + "enabled": true + }, + { + "name": "client_id", + "value": "{{client_id}}", + "enabled": true + }, + { + "name": "client_secret", + "value": "{{client_secret}}", + "enabled": true + } + ], + "multipartForm": [], + "file": [] + }, + "script": { + "res": "if (res.getStatus() == 200) {\n bru.setEnvVar(\"access_token\", res.body.access_token);\n}" + }, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "", + "auth": { + "mode": "none" + } + } + }, + { + "type": "folder", + "name": "admin", + "filename": "admin", + "root": { + "meta": { + "name": "admin" + } + }, + "items": [ + { + "type": "folder", + "name": "objectStoreSecrets", + "filename": "objectStoreSecrets", + "root": { + "meta": { + "name": "objectStoreSecrets" + } + }, + "items": [ + { + "type": "http", + "name": "Create a secret", + "filename": "Create a secret.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/admin/objectStoreSecrets", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + 
"name": "Accept", + "value": "application/json", + "enabled": true + }, + { + "name": "Authorization", + "value": "", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"name\": \"genai-data\",\n \"data\": {\n \"AWS_ACCESS_KEY_ID\": \"\",\n \"AWS_SECRET_ACCESS_KEY\": \"\"\n },\n \"type\": \"S3\",\n \"bucket\": \"\",\n \"endpoint\": \"https://s3.eu-central-1.amazonaws.com\",\n \"region\": \"\",\n \"pathPrefix\": \"\" \n }", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a secret based on the configuration in the request body\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get a list of metadata of available secrets.", + "filename": "Get a list of metadata of available secrets.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/admin/objectStoreSecrets?$top=&$skip=&$count=", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "$top", + "value": "", + "type": "query", + "enabled": true + }, + { + "name": "$skip", + "value": "", + "type": "query", + "enabled": true + }, + { + "name": "$count", + "value": "", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of metadata of the stored secrets.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": 
"credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + }, + { + "type": "folder", + "name": "{objectStoreName}", + "filename": "{objectStoreName}", + "root": { + "meta": { + "name": "{objectStoreName}" + } + }, + "items": [ + { + "type": "http", + "name": "Delete object store secret", + "filename": "Delete object store secret.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/admin/objectStoreSecrets/:objectStoreName", + "method": "DELETE", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "objectStoreName", + "value": "qKoZ-aHSe", + "type": "path", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Delete a secret with the name of objectStoreName if it exists.", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + }, + { + "type": "http", + "name": "Returns the of metadata of secrets which match the query parameter.", + "filename": "Returns the of metadata of secrets which match the query parameter.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/admin/objectStoreSecrets", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", 
+ "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "This retrieves the metadata of the stored secret which match the parameter objectStoreName.\nThe fetched secret is constructed like objectStoreName-object-store-secret\nThe base64 encoded field for the stored secret is not returned.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + }, + { + "type": "http", + "name": "Update object store secret", + "filename": "Update object store secret.bru", + "seq": 3, + "request": { + "url": "{{baseUrl}}/admin/objectStoreSecrets/:objectStoreName", + "method": "PATCH", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "objectStoreName", + "value": "qKoZ-aHSe", + "type": "path", + "enabled": true + } + ], + "body": { + "mode": "json", + "json": "{\n \"name\": \"\",\n \"type\": \"\",\n \"data\": {},\n \"bucket\": \"\",\n \"endpoint\": \"\",\n \"region\": \"\",\n \"pathPrefix\": \"\",\n \"verifyssl\": \"\",\n \"usehttps\": \"1\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Update a secret with name of objectStoreName if it 
exists.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + } + ] + }, + { + "type": "folder", + "name": "lm", + "filename": "lm", + "root": { + "meta": { + "name": "lm" + } + }, + "items": [ + { + "type": "folder", + "name": "configurations", + "filename": "configurations", + "root": { + "meta": { + "name": "configurations" + } + }, + "items": [ + { + "type": "http", + "name": "Create configuration Copy", + "filename": "Create configuration Copy.bru", + "seq": 3, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "DELETE", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"id\": \"\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a new configuration linked to a specific scenario and executable for use in an execution\nor deployment.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Create configuration", + "filename": "Create configuration.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + 
"enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"name\": \"genai-eval-conf\",\n \"scenarioId\": \"genai-evaluations\",\n \"executableId\": \"genai-evaluations-simplified\",\n \"inputArtifactBindings\": [\n {\n \"key\": \"datasetFolder\",\n \"artifactId\": \"\"\n }\n ],\n \"parameterBindings\": [\n {\n \"key\": \"repetitions\",\n \"value\": \"1\"\n },\n {\n \"key\": \"orchestrationDeploymentURL\",\n \"value\": \"\"\n\n },\n {\n \"key\": \"metrics\",\n \"value\": \"language_match\"\n },\n {\n \"key\": \"testDataset\",\n \"value\": \"{\\\"path\\\": \\\"testdata/global_customer_queries.csv\\\", \\\"type\\\": \\\"csv\\\"}\"\n },\n {\n \"key\": \"promptTemplate\",\n \"value\": \"\"\n },\n {\n \"key\": \"models\",\n \"value\": \"gpt-4.1:latest\"\n }\n ]\n}\n", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a new configuration linked to a specific scenario and executable for use in an execution\nor deployment.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get list of configurations", + "filename": "Get list of configurations.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of configurations. 
Filter results by scenario ID or a list of executable IDs.\nSearch for configurations containing the search string as a substring in the configuration name.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "{configurationId}", + "filename": "{configurationId}", + "root": { + "meta": { + "name": "{configurationId}" + } + }, + "items": [ + { + "type": "http", + "name": "Get configuration by ID", + "filename": "Get configuration by ID.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/configurations/:configurationId", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "configurationId", + "value": "", + "type": "path", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve details for configuration with configurationId.", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + } + ] + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + }, + "items": [ + { + "type": "http", + "name": "Get number of configurations", + "filename": "Get number of configurations.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/lm/configurations/$count?scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&$search=}\"NI2Kn!V&searchCaseInsensitive=false&executableIds=T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "text/plain", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + 
"name": "$search", + "value": "}\"NI2Kn!V", + "type": "query", + "enabled": true + }, + { + "name": "searchCaseInsensitive", + "value": "false", + "type": "query", + "enabled": true + }, + { + "name": "executableIds", + "value": "T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve the number of available configurations that match the specified filter criteria.\nFilter criteria include a scenarioId or executableIdsList. Search by substring of configuration name is also possible.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + }, + { + "type": "folder", + "name": "artifacts", + "filename": "artifacts", + "root": { + "meta": { + "name": "artifacts" + } + }, + "items": [ + { + "type": "http", + "name": "Get list of artifacts", + "filename": "Get list of artifacts.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/artifacts", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": false + }, + { + "name": "executionId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": 
"query", + "enabled": false + }, + { + "name": "name", + "value": "[G7 ovyt8i", + "type": "query", + "enabled": false + }, + { + "name": "kind", + "value": "other", + "type": "query", + "enabled": false + }, + { + "name": "artifactLabelSelector", + "value": "ext.ai.sap.com/bXN1EAk=D*", + "type": "query", + "enabled": false + }, + { + "name": "$top", + "value": "10000", + "type": "query", + "enabled": false + }, + { + "name": "$skip", + "value": "", + "type": "query", + "enabled": false + }, + { + "name": "$search", + "value": "}\"NI2Kn!V", + "type": "query", + "enabled": false + }, + { + "name": "searchCaseInsensitive", + "value": "false", + "type": "query", + "enabled": false + }, + { + "name": "$expand", + "value": "scenario", + "type": "query", + "enabled": false + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of artifacts that matches the specified filter criteria.\nFilter criteria include scenario ID, execution ID, an artifact name, artifact kind, or artifact labels.\nUse top/skip parameters to paginate the result list.\nSearch by substring of artifact name or description, if required.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Register artifact", + "filename": "Register artifact.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/artifacts", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"name\": \"aiconfig\",\n \"kind\": \"dataset\",\n \"url\": \"ai://genai-data/genaiEvaluation/14af1af80b974edb8731632d17286343\",\n 
\"scenarioId\": \"genai-evaluations\"\n}\n", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Register an artifact for use in a configuration, for example a model or a dataset.", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + }, + "items": [ + { + "type": "http", + "name": "Get number of artifacts", + "filename": "Get number of artifacts.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/lm/artifacts/$count?scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&executionId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&name=[G7 ovyt8i&kind=other&$search=}\"NI2Kn!V&searchCaseInsensitive=false&artifactLabelSelector=ext.ai.sap.com/bXN1EAk=D*", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "text/plain", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "executionId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "name", + "value": "[G7 ovyt8i", + "type": "query", + "enabled": true + }, + { + "name": "kind", + "value": "other", + "type": "query", + "enabled": true + }, + { + "name": "$search", + "value": "}\"NI2Kn!V", + "type": "query", + "enabled": true + }, + { + "name": "searchCaseInsensitive", + "value": "false", + "type": "query", + "enabled": true + }, + { + "name": "artifactLabelSelector", + "value": "ext.ai.sap.com/bXN1EAk=D*", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + 
"script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve the number of available artifacts that match the specified filter criteria.\nFilter criteria include a scenarioId, executionId, an artifact name, artifact kind, or artifact labels.\nSearch by substring of artifact name or description is also possible.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + }, + { + "type": "folder", + "name": "executions", + "filename": "executions", + "root": { + "meta": { + "name": "executions" + } + }, + "items": [ + { + "type": "http", + "name": "Create execution", + "filename": "Create execution.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/executions", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"configurationId\": \"\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create an execution using the configuration specified by configurationId.", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get list of executions", + "filename": "Get list of executions.bru", + "seq": 1, + "request": { + "url": 
"{{baseUrl}}/v2/lm/executions/", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": false + }, + { + "name": "executionScheduleId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": false + }, + { + "name": "status", + "value": "DEAD", + "type": "query", + "enabled": false + }, + { + "name": "$top", + "value": "10000", + "type": "query", + "enabled": false + }, + { + "name": "$skip", + "value": "", + "type": "query", + "enabled": false + }, + { + "name": "$select", + "value": "status", + "type": "query", + "enabled": false + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of executions that match the specified filter criteria.\nFilter criteria include a list of executableIds, a scenarioId, a configurationId, or a execution status.\nWith top/skip parameters it is possible to paginate the result list.\nWith select parameter it is possible to select only status.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + } + } + ] + }, + { + "type": "folder", + "name": "deployments", + "filename": "deployments", + "root": { + "meta": { + "name": "deployments" + } + }, + "items": [ + { + "type": "http", + "name": "Create deployment", + "filename": "Create deployment.bru", + "seq": 2, + "request": { + "url": "{{baseUrl}}/v2/lm/deployments", + "method": "POST", + "headers": [ + { + "name": "AI-Resource-Group", + "value": 
"{{resource_group}}", + "enabled": true + }, + { + "name": "Content-Type", + "value": "application/json", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "json", + "json": "{\n \"configurationId\": \"\"\n}", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Create a deployment using the configuration specified by configurationId after synchronously checking the\ncorrectness of the configuration.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Get list of deployments", + "filename": "Get list of deployments.bru", + "seq": 1, + "request": { + "url": "{{baseUrl}}/v2/lm/deployments", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve a list of deployments that match the specified filter criteria.\nFilter criteria include a list of executableIds, a scenarioId, a configurationId, or a deployment status.\nWith top/skip parameters it is possible to paginate the result list.\nWith select parameter it is possible to select only status.\n", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "folder", + "name": "$count", + "filename": "$count", + "root": { + "meta": { + "name": "$count" + } + }, + "items": [ + { + "type": "http", + "name": "Get number of deployments", + "filename": "Get number of deployments.bru", + "seq": 1, + "request": { + "url": 
"{{baseUrl}}/lm/deployments/$count?executableIds=T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE&configurationId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&scenarioId=iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN&status=DEAD", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "", + "enabled": true + }, + { + "name": "Accept", + "value": "text/plain", + "enabled": true + } + ], + "params": [ + { + "name": "executableIds", + "value": "T_jtbUJzwg0e.okSV667jeZejqVb,3e0cmfc4c-6YavNz92uztZE", + "type": "query", + "enabled": true + }, + { + "name": "configurationId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "scenarioId", + "value": "iiwMZ8.BjeF0SgmlZJM11XXkDUxP7Sg5GQLKEEsaWb.om5wMy1gN3AtN", + "type": "query", + "enabled": true + }, + { + "name": "status", + "value": "DEAD", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "Retrieve the number of available deployments. 
The number can be filtered by\nscenarioId, configurationId, executableIdsList or by deployment status.\n", + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "credentialsPlacement": "basic_auth_header", + "pkce": false, + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + } + } + } + ] + } + ] + }, + { + "type": "folder", + "name": "metrics", + "filename": "metrics", + "root": { + "meta": { + "name": "metrics" + } + }, + "items": [ + { + "type": "http", + "name": "Evaluation Metrics via Execution ID", + "filename": "Evaluation Metrics via Execution ID.bru", + "seq": 4, + "request": { + "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of=", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + "enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "tagFilters", + "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of=", + "value": "evaluation.ai.sap.com/child-of=", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + }, + { + "type": "http", + "name": "Metrics by Run Name", + "filename": "Metrics by Run Name.bru", + "seq": 5, + "request": { + "url": "{{baseUrl}}/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/run-name=run1", + "method": "GET", + "headers": [ + { + "name": "AI-Resource-Group", + "value": "{{resource_group}}", + 
"enabled": true + }, + { + "name": "Accept", + "value": "application/json", + "enabled": true + } + ], + "params": [ + { + "name": "tagFilters", + "value": "evaluation.ai.sap.com/run-name=run1", + "type": "query", + "enabled": true + } + ], + "body": { + "mode": "none", + "formUrlEncoded": [], + "multipartForm": [], + "file": [] + }, + "script": {}, + "vars": {}, + "assertions": [], + "tests": "", + "docs": "", + "auth": { + "mode": "bearer", + "bearer": { + "token": "{{access_token}}" + } + } + } + } + ] + } + ] + } + ], + "activeEnvironmentUid": "lWUmIcEkGnkMxwNBILLmY", + "environments": [ + { + "variables": [ + { + "name": "ai_auth_url", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "ai_api_url", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "client_id", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "client_secret", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "resource_group", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "orchestration_service_url", + "value": "", + "enabled": true, + "secret": false, + "type": "text" + }, + { + "name": "access_token", + "value": "", + "enabled": true, + "secret": true, + "type": "text" + } + ], + "name": "intprod" + } + ], + "root": { + "request": { + "auth": { + "mode": "oauth2", + "oauth2": { + "grantType": "authorization_code", + "callbackUrl": "", + "authorizationUrl": "", + "accessTokenUrl": "", + "refreshTokenUrl": "", + "clientId": "", + "clientSecret": "", + "scope": "", + "state": "", + "pkce": false, + "credentialsPlacement": "basic_auth_header", + "credentialsId": "credentials", + "tokenPlacement": "header", + "tokenHeaderPrefix": "Bearer", + "tokenQueryKey": "access_token", + "autoFetchToken": true, + "autoRefreshToken": false + } + }, + "vars": { + "req": [ + { + "name": "region", + "value": 
"prod.eu-central-1.aws", + "enabled": true, + "local": false, + "uid": "oYVk4DuVpyYqqP2roBVjE" + }, + { + "name": "baseUrl", + "value": "", + "enabled": true, + "local": false, + "uid": "I4KjDm7FxpSRwUYzjwfPG" + }, + { + "name": "auth_url", + "value": "", + "enabled": true, + "local": false, + "uid": "zuftvyCURtA9XYErCYDgo" + }, + { + "name": "client_id", + "value": "", + "enabled": true, + "local": false, + "uid": "JfGEVKm71BYTgR8UkQUGv" + }, + { + "name": "client_secret", + "value": "", + "enabled": true, + "local": false, + "uid": "ls3RYTJ40baTl8eYmilGt" + }, + { + "name": "AWS_ACCESS_KEY_ID", + "value": "", + "enabled": true, + "local": false, + "uid": "2O0YTTAdmYltm5XiHMhP2" + }, + { + "name": "AWS_SECRET_ACCESS_KEY", + "value": "", + "enabled": true, + "local": false, + "uid": "8rc4RYyPcHXyTkAnnI981" + }, + { + "name": "BUCKET_NAME", + "value": "", + "enabled": true, + "local": false, + "uid": "HqFIe8Rvc14i41WIAGGkl" + }, + { + "name": "DATABASE_URL", + "value": "https://s3-eu-central-1.amazonaws.com", + "enabled": true, + "local": false, + "uid": "aWIwuJZH5XQ5Guu2D69Sq" + } + ] + } + }, + "docs": "Provides tools to manage your scenarios and workflows in SAP AI Core. Execute pipelines as a batch job, for example to pre-process or train your models, or perform batch inference. Serve inference requests of trained models. Deploy а trained machine learning model as a web service to serve inference requests with high performance. 
Register your own Docker registry, synchronize your AI content from your own git repository, and register your own object store for training data and trained models.\n", + "meta": { + "name": "AI Core" + } + }, + "brunoConfig": { + "version": "1", + "name": "AI Core", + "type": "collection", + "ignore": [ + "node_modules", + ".git" + ], + "size": 0.10747432708740234, + "filesCount": 151 + } +} diff --git a/tutorials/ai-core-genaihub-evaluation/img/image-br03.png b/tutorials/ai-core-genaihub-evaluation/img/image-br03.png index 48b1474ce..cbfd0b4c1 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image-br03.png and b/tutorials/ai-core-genaihub-evaluation/img/image-br03.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_008.png b/tutorials/ai-core-genaihub-evaluation/img/image_008.png index 2f12f021a..0582d66f2 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_008.png and b/tutorials/ai-core-genaihub-evaluation/img/image_008.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_10.png b/tutorials/ai-core-genaihub-evaluation/img/image_10.png index 275de8254..f5a0fec8e 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_10.png and b/tutorials/ai-core-genaihub-evaluation/img/image_10.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_19.png b/tutorials/ai-core-genaihub-evaluation/img/image_19.png index 3e302e4c7..91498a203 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_19.png and b/tutorials/ai-core-genaihub-evaluation/img/image_19.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_20.png b/tutorials/ai-core-genaihub-evaluation/img/image_20.png deleted file mode 100644 index ebf5c705e..000000000 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_20.png and /dev/null differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_25.png b/tutorials/ai-core-genaihub-evaluation/img/image_25.png index 
8b9af206b..afdb0e197 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_25.png and b/tutorials/ai-core-genaihub-evaluation/img/image_25.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_27.png b/tutorials/ai-core-genaihub-evaluation/img/image_27.png index c3472efdb..5bd8e53b7 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_27.png and b/tutorials/ai-core-genaihub-evaluation/img/image_27.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_29.png b/tutorials/ai-core-genaihub-evaluation/img/image_29.png index 09c845eea..72d40ecdf 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_29.png and b/tutorials/ai-core-genaihub-evaluation/img/image_29.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_33.png b/tutorials/ai-core-genaihub-evaluation/img/image_33.png index 1ee321aa5..546d43b52 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_33.png and b/tutorials/ai-core-genaihub-evaluation/img/image_33.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_34.png b/tutorials/ai-core-genaihub-evaluation/img/image_34.png index 47498a7b0..4fa0960a1 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_34.png and b/tutorials/ai-core-genaihub-evaluation/img/image_34.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_40.png b/tutorials/ai-core-genaihub-evaluation/img/image_40.png index 87af4a4e8..1498cb5cb 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_40.png and b/tutorials/ai-core-genaihub-evaluation/img/image_40.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_41.png b/tutorials/ai-core-genaihub-evaluation/img/image_41.png index 28e12a546..1bb7780d5 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_41.png and b/tutorials/ai-core-genaihub-evaluation/img/image_41.png differ diff --git 
a/tutorials/ai-core-genaihub-evaluation/img/image_46.png b/tutorials/ai-core-genaihub-evaluation/img/image_46.png index eab460b4f..ef67d82f2 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_46.png and b/tutorials/ai-core-genaihub-evaluation/img/image_46.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_46_01.png b/tutorials/ai-core-genaihub-evaluation/img/image_46_01.png new file mode 100644 index 000000000..131317edd Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation/img/image_46_01.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_46a.png b/tutorials/ai-core-genaihub-evaluation/img/image_46a.png new file mode 100644 index 000000000..c493e2a5d Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation/img/image_46a.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_47.png b/tutorials/ai-core-genaihub-evaluation/img/image_47.png index fc729b5ea..861ec6d0a 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_47.png and b/tutorials/ai-core-genaihub-evaluation/img/image_47.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_48.png b/tutorials/ai-core-genaihub-evaluation/img/image_48.png index a7d8b132f..78731db09 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_48.png and b/tutorials/ai-core-genaihub-evaluation/img/image_48.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_49.png b/tutorials/ai-core-genaihub-evaluation/img/image_49.png index ec44bfad3..2a2bbcd75 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_49.png and b/tutorials/ai-core-genaihub-evaluation/img/image_49.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_49a.png b/tutorials/ai-core-genaihub-evaluation/img/image_49a.png new file mode 100644 index 000000000..07bcac05c Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation/img/image_49a.png differ diff --git 
a/tutorials/ai-core-genaihub-evaluation/img/image_5.png b/tutorials/ai-core-genaihub-evaluation/img/image_5.png index e30beaf9b..b3a46a40e 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_5.png and b/tutorials/ai-core-genaihub-evaluation/img/image_5.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_6.png b/tutorials/ai-core-genaihub-evaluation/img/image_6.png index 4a96b45b9..d2936a402 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_6.png and b/tutorials/ai-core-genaihub-evaluation/img/image_6.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image__py_pmtreg.png b/tutorials/ai-core-genaihub-evaluation/img/image__py_pmtreg.png new file mode 100644 index 000000000..f0d907cf0 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation/img/image__py_pmtreg.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_ail_26.png b/tutorials/ai-core-genaihub-evaluation/img/image_ail_26.png new file mode 100644 index 000000000..753e25505 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation/img/image_ail_26.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_objsec.png b/tutorials/ai-core-genaihub-evaluation/img/image_objsec.png new file mode 100644 index 000000000..cccf2d1b4 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation/img/image_objsec.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_py03.png b/tutorials/ai-core-genaihub-evaluation/img/image_py03.png index cace1aedb..44de78ff6 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_py03.png and b/tutorials/ai-core-genaihub-evaluation/img/image_py03.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_py_con.png b/tutorials/ai-core-genaihub-evaluation/img/image_py_con.png index 12bf2650b..b929a58a2 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_py_con.png and 
b/tutorials/ai-core-genaihub-evaluation/img/image_py_con.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_py_dtst.png b/tutorials/ai-core-genaihub-evaluation/img/image_py_dtst.png new file mode 100644 index 000000000..ec0dca7c3 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation/img/image_py_dtst.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_py_rk.png b/tutorials/ai-core-genaihub-evaluation/img/image_py_rk.png index d7e2b38c4..f38fe6241 100644 Binary files a/tutorials/ai-core-genaihub-evaluation/img/image_py_rk.png and b/tutorials/ai-core-genaihub-evaluation/img/image_py_rk.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/image_py_rnk1.png b/tutorials/ai-core-genaihub-evaluation/img/image_py_rnk1.png new file mode 100644 index 000000000..12b49a4d5 Binary files /dev/null and b/tutorials/ai-core-genaihub-evaluation/img/image_py_rnk1.png differ diff --git a/tutorials/ai-core-genaihub-evaluation/img/requirements.txt b/tutorials/ai-core-genaihub-evaluation/img/requirements.txt index c63e2f289..2c0a06e40 100644 --- a/tutorials/ai-core-genaihub-evaluation/img/requirements.txt +++ b/tutorials/ai-core-genaihub-evaluation/img/requirements.txt @@ -1,7 +1,7 @@ -generative-ai-hub-sdk==4.4.3 -python-dotenv==1.0.1 -boto3==1.37.4 -pandas==2.2.3 -json2html==1.3.0 -numpy==1.26.4 -ipywidgets==8.1.0 +generative-ai-hub-sdk==4.4.3 +python-dotenv==1.0.1 +boto3==1.37.4 +pandas==2.2.3 +json2html==1.3.0 +numpy==1.26.4 +ipywidgets==8.1.0 diff --git a/tutorials/ai-core-genaihub-evaluation/quick_start.ipynb b/tutorials/ai-core-genaihub-evaluation/quick_start.ipynb new file mode 100644 index 000000000..b2a5e24d0 --- /dev/null +++ b/tutorials/ai-core-genaihub-evaluation/quick_start.ipynb @@ -0,0 +1,2402 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Generative AI Custom Evaluation\n", + "This is an example notebook which showcases how a user can use AI Core custom evaluation 
to benchmark their large language models and evaluate orchestration configurations or prompts for their use case.\n", + "It uses the publicly available [MedicationQA dataset](https://langtest.org/docs/pages/benchmarks/medical/medicationqa/), which consists of commonly asked consumer questions about medications. The workload computes industry-standard metrics to check the reliability of the responses generated by the LLM.\n", + "
**Note: For detailed instructions please refer to [Readme](./Readme.md)**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# SetUp (Step 1)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "! pip install -r ../requirements.txt" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Load your environment variables\n", + "\n", + "Ensure that your environment variables are set in a `.env` file (see sample.env for an example). If there is a missing field the notebook will prompt you for a value." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n", + "\n", + "# Loading the credentials from the env file\n", + "from gen_ai_hub.proxy.gen_ai_hub_proxy import GenAIHubProxyClient\n", + "from dotenv import load_dotenv\n", + "import os\n", + "\n", + "load_dotenv(override=True)\n", + "\n", + "\n", + "# Fetching environment variables or prompting the user if missing\n", + "AICORE_BASE_URL = os.getenv(\"AICORE_BASE_URL\") or input(\"AICORE_BASE_URL is missing. Please enter it: \")\n", + "AICORE_RESOURCE_GROUP = os.getenv(\"AICORE_RESOURCE_GROUP\") or input(\"AICORE_RESOURCE_GROUP is missing. Please enter it (default: 'default'): \") or \"default\"\n", + "AICORE_AUTH_URL = os.getenv(\"AICORE_AUTH_URL\") or input(\"AICORE_AUTH_URL is missing. Please enter it: \")\n", + "AICORE_CLIENT_ID = os.getenv(\"AICORE_CLIENT_ID\") or input(\"AICORE_CLIENT_ID is missing. Please enter it: \")\n", + "AICORE_CLIENT_SECRET = os.getenv(\"AICORE_CLIENT_SECRET\") or input(\"AICORE_CLIENT_SECRET is missing. Please enter it: \")\n", + "\n", + "AWS_ACCESS_KEY = os.getenv(\"AWS_ACCESS_KEY\") or input(\"AWS_ACCESS_KEY is missing. Please enter it: \")\n", + "AWS_BUCKET_ID = os.getenv(\"AWS_BUCKET_ID\") or input(\"AWS_BUCKET_ID is missing. 
Please enter it: \")\n", + "AWS_REGION = os.getenv(\"AWS_REGION\") or input(\"AWS_REGION is missing. Please enter it: \")\n", + "AWS_SECRET_ACCESS_KEY = os.getenv(\"AWS_SECRET_ACCESS_KEY\") or input(\"AWS_SECRET_ACCESS_KEY is missing. Please enter it: \")\n", + "DEPLOYMENT_URL = os.getenv(\"DEPLOYMENT_URL\", None)\n", + "\n", + "# Initializing the GenAIHubProxyClient\n", + "client = GenAIHubProxyClient(\n", + " base_url=AICORE_BASE_URL,\n", + " auth_url=AICORE_AUTH_URL,\n", + " client_id=AICORE_CLIENT_ID,\n", + " client_secret=AICORE_CLIENT_SECRET,\n", + " resource_group=AICORE_RESOURCE_GROUP\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Dependencies and Helper Functions (Step 2)" + ] + }, + { + "cell_type": "code", + "execution_count": 193, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dataset name: medicalqna_dataset.csv\n" + ] + } + ], + "source": [ + "import os\n", + "import json\n", + "\n", + "\n", + "def get_dataset_file_name(folder_path):\n", + " \"\"\"\n", + " Retrieves the name of the first file in the specified folder.\n", + " \"\"\"\n", + " if not os.path.isdir(folder_path):\n", + " print(f\"The folder path '{folder_path}' does not exist.\")\n", + " return None\n", + "\n", + " items_in_folder = os.listdir(folder_path)\n", + "\n", + " for item in items_in_folder:\n", + " item_path = os.path.join(folder_path, item)\n", + " if os.path.isfile(item_path):\n", + " return item\n", + "\n", + " print(f\"No files were found in the folder '{folder_path}'.\")\n", + " return None\n", + "\n", + "\n", + "# --- MAIN EXECUTION ---\n", + "DATASET_FOLDER = \"../DATASET\"\n", + "\n", + "DATASET_NAME = get_dataset_file_name(DATASET_FOLDER)\n", + "\n", + "if DATASET_NAME:\n", + " print(f\"Dataset name: {DATASET_NAME}\")\n", + "else:\n", + " print(\"Missing run or dataset file.\")\n", + " raise SystemExit(\"Exiting due to missing run/dataset file.\")\n" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "### Register an Object Store Secret\n", + "To use the evaluations service, you must register an object store with the name default. Optionally, you can register an additional object store with a name of your choice." + ] + }, + { + "cell_type": "code", + "execution_count": 194, + "metadata": {}, + "outputs": [], + "source": [ + "# setup authentication and headers needed for AI Core requests\n", + "def _get_headers():\n", + " headers = {\n", + " \"Authorization\": client.get_ai_core_token(),\n", + " \"AI-Resource-Group\": AICORE_RESOURCE_GROUP,\n", + " \"Content-Type\": \"application/json\",\n", + " }\n", + " return headers" + ] + }, + { + "cell_type": "code", + "execution_count": 195, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Successfully deleted object store secret: default\n", + "Successfully deleted object store secret: genai-quick-data-notebook\n" + ] + }, + { + "data": { + "text/plain": [ + "{'message': 'secret has been created'}" + ] + }, + "execution_count": 195, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Register S3 secret with AI Core which will be used an input source \n", + "import requests\n", + "import json\n", + "import logging\n", + "\n", + "def delete_oss_secret(oss_name=\"\"):\n", + " headers = _get_headers()\n", + " \n", + " DELETE_SECRETS_ENDPOINT = f'/v2/admin/objectStoreSecrets/{oss_name}'\n", + " request_url = f\"{AICORE_BASE_URL}{DELETE_SECRETS_ENDPOINT}\"\n", + " \n", + " try:\n", + " response = requests.delete(request_url, headers=headers, timeout=120)\n", + " if response.status_code == 202:\n", + " print(f\"Successfully deleted object store secret: {oss_name}\")\n", + " elif response.status_code == 404:\n", + " print(f\"Object store secret not found: {oss_name}. 
It may not exist.\")\n", + " else:\n", + " logging.error(f\"Failed to delete object store secret: {oss_name}, Status Code: {response.status_code}\")\n", + " except Exception as e:\n", + " logging.error(f\"Error occurred while attempting to delete object store secret: {e}\")\n", + " raise\n", + "\n", + "def register_oss_secret(oss_name=\"\", path_prefix=\"\"):\n", + " headers = _get_headers()\n", + " \n", + " POST_SECRETS_ENDPOINT = '/v2/admin/objectStoreSecrets'\n", + " request_url = f\"{AICORE_BASE_URL}{POST_SECRETS_ENDPOINT}\"\n", + " \n", + " request_body = {\n", + " \"name\": oss_name,\n", + " \"data\": {\n", + " \"AWS_ACCESS_KEY_ID\": AWS_ACCESS_KEY,\n", + " \"AWS_SECRET_ACCESS_KEY\": AWS_SECRET_ACCESS_KEY\n", + " },\n", + " \"type\": \"S3\",\n", + " \"bucket\": AWS_BUCKET_ID,\n", + " \"endpoint\": \"s3-eu-central-1.amazonaws.com\",\n", + " \"region\": AWS_REGION,\n", + " \"pathPrefix\": path_prefix,\n", + " \"verifyssl\": \"0\",\n", + " \"usehttps\": \"1\",\n", + " }\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " result = response.json()\n", + " return result\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create object store secret\")\n", + " raise\n", + " \n", + "delete_oss_secret(oss_name=\"default\")\n", + "delete_oss_secret(oss_name=\"genai-quick-data-notebook\")\n", + " \n", + "register_oss_secret(oss_name=\"default\", path_prefix=\"\")\n", + "register_oss_secret(oss_name=\"genai-quick-data-notebook\", path_prefix=\"\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The user stores the input files in the object store and registers the root folder as artifact with AI Core. The File Upload and Artifact endpoints of AI Core API may be used for this purpose. In this example eval-data is the root folder containing the orchestration configurations and test data which is registered as AI Core artifact." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# uploading these files to Object store to register as an artifact inside ai core\n", + "\n", + "import boto3\n", + "import os\n", + "import uuid\n", + "\n", + "def upload_folder_to_s3(folder_path, bucket_name, s3_prefix=\"\"):\n", + " \"\"\"\n", + " Upload a folder to an S3 bucket recursively.\n", + "\n", + " :param folder_path: The local folder path to upload.\n", + " :param bucket_name: The name of the S3 bucket.\n", + " :param s3_prefix: Optional prefix to use for the S3 keys (e.g., subfolder in the bucket).\n", + " \"\"\"\n", + " s3_client = boto3.client(\n", + " 's3',\n", + " aws_access_key_id=AWS_ACCESS_KEY,\n", + " aws_secret_access_key=AWS_SECRET_ACCESS_KEY,\n", + " region_name=AWS_REGION\n", + " )\n", + "\n", + " for root, dirs, files in os.walk(folder_path):\n", + " for file_name in files:\n", + " print(\"val of root is \", file_name)\n", + " local_path = os.path.join(root, file_name)\n", + " # Compute the relative path for the S3 key\n", + " relative_path = os.path.relpath(local_path, folder_path)\n", + " s3_key = os.path.join(s3_prefix, relative_path).replace(\"\\\\\", \"/\") # Ensure S3-compatible paths\n", + " print(\"val of s3 key is \", s3_key)\n", + " print(f\"Uploading {local_path} to s3://{bucket_name}/{s3_key}\")\n", + " \n", + " # Upload the file\n", + " s3_client.upload_file(local_path, bucket_name, s3_key)\n", + "\n", + "# Example usage\n", + "folder_to_upload_testdata = \"../DATASET\"\n", + "user_directory_prefix = \"\" # replace with your i-number as string here\n", + "prefix_guid = user_directory_prefix if user_directory_prefix is not None else str(uuid.uuid4().hex)\n", + "s3_testdata_prefix = f\"genaiEvaluation/{prefix_guid}/testdata\" # Leave empty for root of the bucket\n", + "\n", + "\n", + "upload_folder_to_s3(folder_to_upload_testdata, AWS_BUCKET_ID, s3_testdata_prefix)\n", + "input_artifact_path = 
f\"ai://genai-quick-data-notebook/genaiEvaluation/{prefix_guid}\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The user stores the input files in the object store and registers the root folder as artifact with AI Core. The File Upload and Artifact endpoints of AI Core API may be used for this purpose. In this example `genaiEvaluation\\{prefix_guid}` is the root folder containing the orchestration configurations and test data which is registered as AI Core artifact." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "import logging\n", + "# Registering the uploaded files from AWS as artifacts to use inside configuration.\n", + "\n", + "def register_artifact():\n", + " headers = _get_headers()\n", + " \n", + " GET_ARTIFACTS_ENDPOINT = '/v2/lm/artifacts'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_ARTIFACTS_ENDPOINT}\"\n", + " \n", + " request_body = {\n", + " \"labels\": [\n", + " {\n", + " \"key\": \"ext.ai.sap.com/prompt-evaluation\",\n", + " \"value\": \"true\"\n", + " }\n", + " ],\n", + " \"name\": \"genai-eval-simplified-test-data\",\n", + " \"kind\": \"other\",\n", + " \"url\": input_artifact_path, # input artifact path\n", + " \"description\": \"demo artifacts for evaluation flow.\",\n", + " \"scenarioId\": \"genai-evaluations\"\n", + " }\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except:\n", + " print(\"Error occurred while attempting to register artifact\")\n", + " raise\n", + " \n", + "\n", + "artifact_id = register_artifact()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create Orchestration Deployment\n", + "An orchestration Deployment URL is required for us to run our evaluation. 
Once created, we need to wait until the deployment is running and provides a deployment URL, which will be added to our configuration file in the next step. \n", + "\n", + "**Note**: You can skip this step if you already have an orchestration deployment running; set the deployment URL in the next cell." + ] + }, + { + "cell_type": "code", + "execution_count": 198, + "metadata": {}, + "outputs": [], + "source": [ + "import requests\n", + "import json\n", + "import time\n", + "\n", + "\n", + "\n", + "def create_orchestration_configuration():\n", + " headers = _get_headers()\n", + " request_body = {\n", + " \"name\": \"orchestrationDeployment\",\n", + " \"executableId\": \"orchestration\",\n", + " \"scenarioId\": \"orchestration\",\n", + " \"parameterBindings\": [\n", + " {\n", + " \"key\": \"modelFilterList\",\n", + " \"value\": \"null\"\n", + " },\n", + " {\n", + " \"key\": \"modelFilterListType\",\n", + " \"value\": \"allow\"\n", + " }\n", + " ],\n", + " \"inputArtifactBindings\": []\n", + " }\n", + " \n", + " GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}\"\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(response)\n", + " if(response.status_code != 201):\n", + " raise\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create a Configuration\")\n", + " raise\n", + " \n", + "def execute_orchestration_deployment(configuration_id):\n", + " headers = _get_headers()\n", + " GET_DEPLOYMENTS_ENDPOINT = '/v2/lm/deployments'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_DEPLOYMENTS_ENDPOINT}\"\n", + " \n", + " request_body = {\n", + " \"configurationId\": configuration_id\n", + " }\n", + " \n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, 
data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(response)\n", + " if(response.status_code != 202):\n", + " print(\"Deployment execution failed\")\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " \n", + " except:\n", + " logging.error(\"Error occurred while attempting to create an execution\")\n", + " raise\n", + "\n", + "def get_deployment_status(orchestration_deployment_id):\n", + " headers = _get_headers()\n", + " api_url = f\"{AICORE_BASE_URL}/v2/lm/deployments/{orchestration_deployment_id}?$select=status\"\n", + " timeout = 400 \n", + " initial_interval = 30 \n", + " pending_interval = 10\n", + " start = time.time()\n", + "\n", + " status = None\n", + " current_interval = initial_interval\n", + "\n", + " while time.time() - start < timeout:\n", + " response = requests.get(api_url, headers=headers)\n", + " if response.status_code == 200:\n", + " status = response.json().get('status')\n", + " print(f\"Deployment {orchestration_deployment_id} status: {status}\")\n", + " # Adjust polling interval based on status\n", + " if status == 'RUNNING':\n", + " return True\n", + " elif status == 'UNKNOWN':\n", + " current_interval = initial_interval\n", + " elif status == 'PENDING':\n", + " current_interval = pending_interval\n", + "\n", + " else:\n", + " print(f\"Failed to fetch deployment status. 
HTTP {response.status_code}\")\n", + " return False\n", + "\n", + " # Waiting according to status for API call\n", + " time.sleep(current_interval)\n", + "\n", + "def get_deployment_url(orchestration_deployment_id):\n", + " headers = _get_headers()\n", + " response = requests.get(f\"{AICORE_BASE_URL}/v2/lm/deployments/{orchestration_deployment_id}\", headers=headers)\n", + " if response.status_code != 200:\n", + " raise Exception(f\"Failed to get deployment URL: {response.status_code} - {response.text}\")\n", + " return response.json().get('deploymentUrl')\n", + "\n", + "# You can skip this step if you already have a orchestration deployment running\n", + "deployment_url = DEPLOYMENT_URL\n", + "if not deployment_url:\n", + " configuration_id = create_orchestration_configuration()\n", + " orchestration_deployment_id = execute_orchestration_deployment(configuration_id)\n", + " is_running = get_deployment_status(orchestration_deployment_id) \n", + " if is_running:\n", + " deployment_url = get_deployment_url(orchestration_deployment_id)\n", + " print(f\"Deployment URL: {deployment_url}\")\n", + " else:\n", + " print(\"Deployment is not running or failed.\")" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [], + "source": [ + "# Manually set the orchestration deployment url\n", + "# deployment_url=\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Create a Prompt Template in Prompt Registry \n", + "\n", + "The following code defines a function `create_prompt_template()` that creates a new **Prompt Template** in the SAP AI Core **Prompt Registry**.\n", + "\n", + "**Note** : If you wish to use a prompt template that already exists in prompt registry, you can manually set `prompt_template_id` in the next cell and skip executing this cell" + ] + }, + { + "cell_type": "code", + "execution_count": 199, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"{'message': 'Prompt updated successfully.', 'id': '29deed9b-6678-4548-94c1-e68e1fc2e2bd', 'scenario': 'genai-evaluations', 'name': 'prompt-registry-eval-demo', 'version': '1.0.0'}\n" + ] + } + ], + "source": [ + "def create_prompt_template():\n", + " headers = _get_headers()\n", + " GET_PROMPT_TEMPLATES_ENDPOINT = '/v2/lm/promptTemplates'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_PROMPT_TEMPLATES_ENDPOINT}\"\n", + " \n", + " \n", + " prompt_template = {\n", + " \"template\": [\n", + " {\n", + " \"role\": \"user\",\n", + " \"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\"\n", + " }\n", + " ]\n", + " }\n", + "\n", + " request_body = {\n", + " \"name\": \"prompt-registry-eval-demo\",\n", + " \"version\": \"1.0.0\",\n", + " \"scenario\": \"genai-evaluations\",\n", + " \"spec\": prompt_template\n", + " }\n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " if(response.status_code != 200):\n", + " raise\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create a prompt template\")\n", + " raise\n", + "\n", + "prompt_template_id = create_prompt_template()" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [], + "source": [ + "# Manually set prompt_template_id here if you wish to use a pre-existing prompt template\n", + "# prompt_template_id=\"\"" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Select your metrics\n", + " \n", + "Add the metrics that need to be evaluated in `selected_metrics_str`.\n", + "\n", + "**Note: If your dataset does not have a reference column, DO NOT select metrics that require a reference.**" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": 
"stdout", + "output_type": "stream", + "text": [ + "Pointwise Conciseness,Pointwise Instruction Following,Pointwise Correctness,Pointwise Answer Relevance,Exact Match,BLEU,ROUGE,Content Filter on Input,Content Filter on Output\n" + ] + } + ], + "source": [ + "# Manual Selection of Metrics\n", + "selected_metrics_str = \"Pointwise Conciseness,Pointwise Instruction Following,Pointwise Correctness,Pointwise Answer Relevance,Exact Match,BLEU,ROUGE,Content Filter on Input,Content Filter on Output\"\n", + "print(selected_metrics_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Select your Models\n", + " \n", + "Add the models you wish to use in the string `selected_models_str`\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Selected models string: gemini-2.5-pro:001,gpt-4o:2024-08-06,gpt-5:2025-08-07\n" + ] + } + ], + "source": [ + "# Manual selection of models\n", + "selected_models_str=\"gemini-2.5-pro:001,gpt-4o:2024-08-06,gpt-5:2025-08-07\"\n", + "print(\"Selected models string:\", selected_models_str)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Start Evaluation Run (Step 3)" + ] + }, + { + "cell_type": "code", + "execution_count": 217, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Selected metrics: Pointwise Conciseness,Pointwise Instruction Following,Pointwise Correctness,Pointwise Answer Relevance,Exact Match,BLEU,ROUGE,Content Filter on Input,Content Filter on Output\n", + "Selected models: gemini-2.5-pro:001,gpt-4o:2024-08-06,gpt-5:2025-08-07\n" + ] + } + ], + "source": [ + "\n", + "import json\n", + "test_data_path = f\"testdata/{DATASET_NAME}\" # specify the test data path here. 
For the full folder just specifying testdata will work\n", + "test_datasets = json.dumps({'path': test_data_path, 'type': 'csv'})\n", + "metrics_list = selected_metrics_str\n", + "models_list = selected_models_str\n", + "print(f\"Selected metrics: {metrics_list}\")\n", + "print(f\"Selected models: {models_list}\")\n", + "#variable_mapping = json.dumps({'prompt/question': 'data/topic'}) # to map the question prompt variable to the entry in dataset.\n", + "orchestration_deployment_url = deployment_url # needs to specify this to use a specific deployment id\n", + "repetitions = \"1\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# creating an AICORE Configuration.\n", + "import requests\n", + "\n", + "request_body = {\n", + " \"name\": \"genai-eval-conf\",\n", + " \"scenarioId\": \"genai-evaluations\",\n", + " \"executableId\": \"genai-evaluations-simplified\",\n", + " \"inputArtifactBindings\": [\n", + " {\n", + " \"key\": \"datasetFolder\",\n", + " \"artifactId\": artifact_id\n", + " }\n", + " ],\n", + " \"parameterBindings\": [\n", + " {\n", + " \"key\": \"repetitions\",\n", + " \"value\": repetitions\n", + " },\n", + " {\n", + " \"key\": \"orchestrationDeploymentURL\",\n", + " \"value\": orchestration_deployment_url\n", + " },\n", + " {\n", + " \"key\": \"metrics\",\n", + " \"value\": metrics_list\n", + " },\n", + " {\n", + " \"key\": \"testDataset\",\n", + " \"value\": test_datasets\n", + " },\n", + " {\n", + " \"key\": \"promptTemplate\",\n", + " \"value\": prompt_template_id\n", + " },\n", + " {\n", + " \"key\": \"models\",\n", + " \"value\": models_list\n", + " }\n", + " ]\n", + "}\n", + "\n", + "def create_aicore_configuration():\n", + " headers = _get_headers()\n", + " GET_CONFIGURATIONS_ENDPOINT = '/v2/lm/configurations'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_CONFIGURATIONS_ENDPOINT}\"\n", + " try:\n", + " print(request_body)\n", + " response = requests.post(\n", + " request_url, 
headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(response)\n", + " if(response.status_code != 201):\n", + " raise Exception(f\"Failed to create configuration: {response.status_code} - {response.text}\")\n", + " result = response.json()\n", + " print(result)\n", + " print(request_body)\n", + " return result['id']\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create a Configuration\")\n", + " raise\n", + " \n", + "configuration_id = create_aicore_configuration()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Evaluation Execution Creation\n", + "Once the configuration is created, we create the AI Core execution that triggers the evaluation workload.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# create an execution with the created configuration.\n", + "\n", + "import requests\n", + "def create_execution():\n", + " headers = _get_headers()\n", + " GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}\"\n", + " request_body = {\"configurationId\" : configuration_id} \n", + " try:\n", + " response = requests.post(\n", + " request_url, headers=headers, data=json.dumps(request_body), timeout=120\n", + " )\n", + " print(\"response received is \", response)\n", + " result = response.json()\n", + " print(result)\n", + " return result['id']\n", + " except:\n", + " logging.error(\"Error occurred while attempting to create an execution\")\n", + " raise\n", + " \n", + "\n", + "execution_id = create_execution()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# get execution status\n", + "import requests\n", + "def get_execution_status(execution_id):\n", + " headers = _get_headers()\n", + " LOG_EXECUTIONS_ENDPOINT = f'/v2/lm/executions/{execution_id}'\n", + " request_url = 
f\"{AICORE_BASE_URL}{LOG_EXECUTIONS_ENDPOINT}\"\n", + "    try:\n", + "        response = requests.get(\n", + "            request_url, headers=headers, timeout=120\n", + "        )\n", + "        print(\"response received is \", response)\n", + "        result = response.json()\n", + "        return result\n", + "    except Exception:\n", + "        logging.error(\"Error occurred while attempting to get execution status\")\n", + "        raise\n", + "\n", + "get_execution_status(execution_id)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the following cells only when the status field in the execution response is \"COMPLETED\", to view the results.\n", + "\n", + "The status field progresses through different states over time: UNKNOWN → PENDING → RUNNING → COMPLETED. Ensure it reaches COMPLETED before proceeding.\n", + "\n", + "Note: The targetStatus will always be COMPLETED from the start, as it represents the intended final state of the execution. Do not confuse it with the actual status field.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Evaluation Result (Step 4)\n", + "The evaluation job produces two outputs:\n", + "1. A SQLite DB file which stores the orchestration input and output, the values of all metrics calculated for that output, and statistics such as latency. These metric values are called raw metric values. The SQLite DB file is stored in the object store as an AI Core output artifact.\n", + "2. A set of metrics whose values are aggregated from the raw metric values. The aggregate metrics are stored in the tracking service.
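The status check above can be wrapped in a small polling loop so the notebook simply waits until the execution finishes. This is a sketch, not part of the official workflow: it would be driven by the `get_execution_status` function and `execution_id` defined in the cells above, and the terminal failure states (`DEAD`, `STOPPED`) and polling interval are assumptions you may want to adjust. A stubbed status sequence is used here so the example runs standalone.

```python
import time

def wait_for_completion(get_status, timeout_s=3600, interval_s=30):
    """Poll until the execution status reaches COMPLETED, fails, or times out."""
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = get_status().get("status", "UNKNOWN")
        if status == "COMPLETED":
            return status
        if status in ("DEAD", "STOPPED"):  # assumed terminal failure states
            raise RuntimeError(f"Execution ended in state {status}")
        time.sleep(interval_s)
    raise TimeoutError("Execution did not reach COMPLETED in time")

# In the notebook you would call:
#   wait_for_completion(lambda: get_execution_status(execution_id))
# Demonstrated here with a stubbed status sequence:
_states = iter([{"status": "PENDING"}, {"status": "RUNNING"}, {"status": "COMPLETED"}])
print(wait_for_completion(lambda: next(_states), interval_s=0))
```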
The user-defined tags and the run names are stored with the metrics.\n", + "After the execution completes, you can view the runs generated by the workload, along with the aggregate metrics, by calling the tracking API as shown below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "response received is \n" + ] + } + ], + "source": [ + "# Get aggregate metrics using the execution id.\n", + "import requests\n", + "def retrieve_aggregate_metrics(execution_id):\n", + "    headers = _get_headers()\n", + "    GET_METRICS_ENDPOINT = f'/v2/lm/metrics?tagFilters=evaluation.ai.sap.com/child-of={execution_id}'\n", + "    request_url = f\"{AICORE_BASE_URL}{GET_METRICS_ENDPOINT}\"\n", + "    try:\n", + "        response = requests.get(request_url, headers=headers, timeout=120)\n", + "        print(\"response received is \", response)\n", + "        result = response.json()\n", + "        return result\n", + "    except Exception:\n", + "        logging.error(\"Error occurred while attempting to retrieve aggregate metrics for the run\")\n", + "        raise\n", + "\n", + "runs_data = retrieve_aggregate_metrics(execution_id)" + ] + }, + { + "cell_type": "code", + "execution_count": 229, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
run_namerunIdmetrics_namemetric_value
Run-prompt-registry-eval-demo-gpt-4o-2024-05-13007d20838a084869a00767bbe2b39667Pointwise Conciseness/1/count0.0
Run-prompt-registry-eval-demo-gpt-4o-2024-05-13007d20838a084869a00767bbe2b39667Pointwise Conciseness/1/relative_frequency0.0
Run-prompt-registry-eval-demo-gpt-4o-2024-05-13007d20838a084869a00767bbe2b39667Pointwise Conciseness/2/count0.0
Run-prompt-registry-eval-demo-gpt-4o-2024-05-13007d20838a084869a00767bbe2b39667Pointwise Conciseness/2/relative_frequency0.0
Run-prompt-registry-eval-demo-gpt-4o-2024-05-13007d20838a084869a00767bbe2b39667Pointwise Conciseness/3/count7.0
" + ], + "text/plain": [ + "" + ], + "execution_count": 229, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# A sample method to transform the aggregate metric data returned by the tracking service, as needed for the use case.\n", + "\n", + "from json2html import json2html\n", + "from IPython.display import HTML\n", + "def transform_run_data(runs_list):\n", + "    transformed_data = []\n", + "    for run in runs_list:\n", + "        try:\n", + "            # Extract run_name from the tags\n", + "            run_name = None\n", + "            for tag in run.get(\"tags\", []):\n", + "                if tag.get(\"name\") == \"evaluation.ai.sap.com/run-name\":\n", + "                    run_name = tag.get(\"value\")\n", + "                    break\n", + "            if run_name is None:\n", + "                continue\n", + "\n", + "            # Rename executionId to runId\n", + "            run_id = run.get(\"executionId\")\n", + "            if run_id is None:\n", + "                continue\n", + "\n", + "            # Extract metrics_name and metric_value from the metrics\n", + "            metrics = run.get(\"metrics\", [])\n", + "            if not metrics:\n", + "                continue\n", + "            for metric in metrics:\n", + "                output_json = {\n", + "                    \"run_name\": run_name,\n", + "                    \"runId\": run_id,\n", + "                    \"metrics_name\": metric.get(\"name\"),\n", + "                    \"metric_value\": metric.get(\"value\")\n", + "                }\n", + "                transformed_data.append(output_json)\n", + "\n", + "        except (TypeError, AttributeError):  # Skip entries that are not in the expected format\n", + "            continue\n", + "    return transformed_data\n", + "\n", + "# Transform the run data first\n", + "transformed_data = transform_run_data(runs_data['resources'])\n", + "# Fetch the unique run names\n", + "unique_run_names = list({entry['run_name'] for entry in transformed_data})\n", + "\n", + "HTML(json2html.convert(json=transformed_data[:5]))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To view this data in a more comparison-friendly format, we transform it into a tabular layout, with rows being the different models evaluated, and columns being the different metrics
calculated." + ] + }, + { + "cell_type": "code", + "execution_count": 255, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " 
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
metrics_nameBLEU/meanBLEU/medianBLEU/p90BLEU/p95BLEU/stddevContent Filter on Input/0/countContent Filter on Input/0/relative_frequencyContent Filter on Input/1/countContent Filter on Input/1/relative_frequencyContent Filter on Input/entropyContent Filter on Input/modeContent Filter on Output/0/countContent Filter on Output/0/relative_frequencyContent Filter on Output/1/countContent Filter on Output/1/relative_frequencyContent Filter on Output/entropyContent Filter on Output/modeExact Match/0/countExact Match/0/relative_frequencyExact Match/1/countExact Match/1/relative_frequencyExact Match/entropyExact Match/modePointwise Answer Relevance/1/countPointwise Answer Relevance/1/relative_frequencyPointwise Answer Relevance/2/countPointwise Answer Relevance/2/relative_frequencyPointwise Answer Relevance/3/countPointwise Answer Relevance/3/relative_frequencyPointwise Answer Relevance/4/countPointwise Answer Relevance/4/relative_frequencyPointwise Answer Relevance/5/countPointwise Answer Relevance/5/relative_frequencyPointwise Answer Relevance/entropyPointwise Answer Relevance/modePointwise Conciseness/1/countPointwise Conciseness/1/relative_frequencyPointwise Conciseness/2/countPointwise Conciseness/2/relative_frequencyPointwise Conciseness/3/countPointwise Conciseness/3/relative_frequencyPointwise Conciseness/4/countPointwise Conciseness/4/relative_frequencyPointwise Conciseness/5/countPointwise Conciseness/5/relative_frequencyPointwise Conciseness/entropyPointwise Conciseness/modePointwise Correctness/1/countPointwise Correctness/1/relative_frequencyPointwise Correctness/2/countPointwise Correctness/2/relative_frequencyPointwise Correctness/3/countPointwise Correctness/3/relative_frequencyPointwise Correctness/4/countPointwise Correctness/4/relative_frequencyPointwise Correctness/5/countPointwise Correctness/5/relative_frequencyPointwise Correctness/entropyPointwise Correctness/modePointwise Instruction Following/1/countPointwise Instruction 
Following/1/relative_frequencyPointwise Instruction Following/2/countPointwise Instruction Following/2/relative_frequencyPointwise Instruction Following/3/countPointwise Instruction Following/3/relative_frequencyPointwise Instruction Following/4/countPointwise Instruction Following/4/relative_frequencyPointwise Instruction Following/5/countPointwise Instruction Following/5/relative_frequencyPointwise Instruction Following/entropyPointwise Instruction Following/modeROUGE/meanROUGE/medianROUGE/p90ROUGE/p95ROUGE/stddevcompletion_tokens/sumlatency/averageprompt_tokens/sumsubmission/sum
model
gemini-2.5-pro0.0030340.00.0095820.0140530.00625349.01.00.00.00.0049.01.00.00.00.0049.01.00.00.00.002.00.0408161.00.0204082.00.0408161.00.02040843.00.8775510.77125350.00.00.00.027.00.5510222.00.448980.00.00.99247632.00.0408161.00.0204080.00.00.00.046.00.9387760.38850950.00.00.00.02.00.0408161.00.02040846.00.9387760.38850950.0793560.0710480.1623270.1721740.063277118298.0124.846381250.049.0
gpt-4o0.0041790.00.0167570.026590.01105449.01.00.00.00.0049.01.00.00.00.0049.01.00.00.00.001.00.0204080.00.03.00.0612246.00.12244939.00.7959180.99439750.00.00.00.00.00.027.00.5510222.00.448980.99247641.00.0204080.00.01.00.0204082.00.04081645.00.9183670.53035550.00.00.00.01.00.0204082.00.04081646.00.9387760.38850950.1078620.1117020.1880920.2224620.06887920490.0302.0736281608.049.0
gpt-50.0021450.00.0087390.0111190.00633549.01.00.00.00.0049.01.00.00.00.0049.01.00.00.00.002.00.0408162.00.0408163.00.0612243.00.06122439.00.7959181.13225150.00.00.00.03.00.06122421.00.42857125.00.5102041.26593554.00.0816332.00.0408161.00.0204081.00.02040841.00.8367350.92777951.00.0204081.00.0204083.00.0612243.00.06122441.00.8367350.93778350.0929840.1009170.1645940.2022170.06209198856.0511.5321931559.049.0
" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 255, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "def get_model_from_run(run):\n", + " for tag in run.get(\"tags\", []):\n", + " if tag.get(\"name\") == \"evaluation.ai.sap.com/model\":\n", + " return tag.get(\"value\")\n", + "\n", + "def aggregate_metrics_by_model(runs_list):\n", + " transformed_data = []\n", + " for run in runs_list:\n", + " model = get_model_from_run(run)\n", + " for metric in run[\"metrics\"]:\n", + " metric_value = metric.get(\"value\")\n", + "\n", + " # Override only for /mode\n", + " if metric.get(\"name\").endswith(\"/mode\"):\n", + " for label in metric.get(\"labels\", []):\n", + " if label.get(\"name\") == \"evaluation.ai.sap.com/mode_category\":\n", + " metric_value = label.get(\"value\")\n", + " break\n", + " output_json = {\n", + " \"model\": model,\n", + " \"metrics_name\": metric.get(\"name\"),\n", + " \"metric_value\": metric_value\n", + " }\n", + " transformed_data.append(output_json)\n", + " return transformed_data\n", + "\n", + "\n", + "def create_metrics_pivot_table(transformed_data):\n", + " \"\"\"\n", + " Creates a pivot table where rows are models and columns are metrics.\n", + " \n", + " Args:\n", + " transformed_data: List of dictionaries with 'model', 'metrics_name', 'metric_value'\n", + " \n", + " Returns:\n", + " DataFrame with models as rows and metrics as columns\n", + " \"\"\"\n", + " # Convert list of dictionaries to DataFrame\n", + " df = pd.DataFrame(transformed_data)\n", + " \n", + " # Create pivot table\n", + " pivot_table = df.pivot_table(\n", + " index='model',\n", + " columns='metrics_name',\n", + " values='metric_value',\n", + " aggfunc='first' # Use 'first' to get the single value, or 'mean' if there are duplicates\n", + " )\n", + " \n", + " return pivot_table\n", + "\n", + "transformed_data = aggregate_metrics_by_model(runs_data['resources'])\n", + "metrics_pivot = 
create_metrics_pivot_table(transformed_data)\n", + "\n", + "HTML(metrics_pivot.to_html())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To drill down further, you can also download the SQLite DB file from the object store and analyse the results (instance-level metrics, logs, etc.) locally." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Download the result artifacts from the object store.\n", + "import boto3\n", + "\n", + "def download_all_objects(prefix, destination_folder):\n", + "    \"\"\"\n", + "    Recursively download all objects from the S3 bucket starting with a specific prefix.\n", + "\n", + "    :param prefix: Prefix to filter objects in the bucket.\n", + "    :param destination_folder: Local folder to save the downloaded files.\n", + "    \"\"\"\n", + "    s3_client = boto3.client(\n", + "        's3',\n", + "        aws_access_key_id=AWS_ACCESS_KEY,\n", + "        aws_secret_access_key=AWS_SECRET_ACCESS_KEY,\n", + "        region_name=AWS_REGION\n", + "    )\n", + "\n", + "    # Ensure the destination folder exists\n", + "    if not os.path.exists(destination_folder):\n", + "        os.makedirs(destination_folder)\n", + "\n", + "    # Paginate through the objects\n", + "    paginator = s3_client.get_paginator('list_objects_v2')\n", + "    pages = paginator.paginate(Bucket=AWS_BUCKET_ID, Prefix=prefix)\n", + "\n", + "    for page in pages:\n", + "        if 'Contents' in page:\n", + "            for obj in page['Contents']:\n", + "                key = obj['Key']\n", + "                local_file_path = os.path.join(destination_folder, os.path.relpath(key, prefix))\n", + "\n", + "                # Ensure the local directory structure exists\n", + "                local_directory = os.path.dirname(local_file_path)\n", + "                if not os.path.exists(local_directory):\n", + "                    os.makedirs(local_directory)\n", + "\n", + "                # Download the object\n", + "                print(f\"Downloading {key} to {local_file_path}\")\n", + "                s3_client.download_file(AWS_BUCKET_ID, key, local_file_path)\n", +
"\n", + "\n", + "# Download the evaluation results from the object store. Look at the execution status under the \"outputArtifacts\" key to see the 'url',\n", + "# which shows the data path where your output results are stored.\n", + "EXECUTION_ID = execution_id\n", + "sqlite_db_prefix = f'{EXECUTION_ID}/tmp/'  # change the prefix based on where your output artifact is stored in the bucket.\n", + "destination_folder = 'results-new'\n", + "\n", + "download_all_objects(sqlite_db_prefix, destination_folder)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "NOTE: The cell below shows the first few rows of the evaluation results for every SQLite table. If you wish to see all the entries, comment out the line saying df.head(10) in the cell below, or adjust the number accordingly." + ] + }, + { + "cell_type": "code", + "execution_count": 256, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
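Once the artifacts are downloaded, the results DB can be inspected with Python's built-in `sqlite3` module. The helper below is a sketch: the database file name under `results-new` depends on your execution output, so the path in the comment is a placeholder, and the example exercises the helper on a throwaway database rather than real results.

```python
import sqlite3
import tempfile
import os

def preview_tables(db_path, n=5):
    """Return {table_name: first n rows} for every table in a SQLite DB file."""
    with sqlite3.connect(db_path) as conn:
        cur = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
        tables = [row[0] for row in cur.fetchall()]
        return {t: conn.execute(f"SELECT * FROM {t} LIMIT {n}").fetchall() for t in tables}

# In the notebook, point this at the downloaded DB, e.g.:
#   preview_tables('results-new/<your-db-file>')  # placeholder file name
# Demonstrated here on a temporary database:
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
with sqlite3.connect(db_path) as conn:  # context manager commits on exit
    conn.execute("CREATE TABLE run (id TEXT, name TEXT)")
    conn.execute("INSERT INTO run VALUES ('83d31ab3', 'Run-demo')")

for table, rows in preview_tables(db_path).items():
    print(f"Table: {table}")
    print(rows)
```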
\n", + "\n", + "
\n", + "

Table: run

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
idnameconfigtagscreated_atupdated_at
83d31ab3552644d2bccddc73d9c9f30dRun-prompt-registry-eval-demo-gemini-2.5-pro-001{\"modules\": {\"prompt_templating\": {\"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\"}, \"prompt\": {\"template\": [{\"role\": \"user\", \"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\"}], \"defaults\": {}}}}}{\"promptTemplateId\": \"ee5205b4-a3c4-4217-9292-e663e6df0012\"}2026-02-04 15:25:39.1047282026-02-04 15:25:39.104731
2663332bb4da43c089217193cbae88ceRun-prompt-registry-eval-demo-gpt-4o-2024-08-06{\"modules\": {\"prompt_templating\": {\"model\": {\"name\": \"gpt-4o\", \"version\": \"2024-08-06\"}, \"prompt\": {\"template\": [{\"role\": \"user\", \"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\"}], \"defaults\": {}}}}}{\"promptTemplateId\": \"ee5205b4-a3c4-4217-9292-e663e6df0012\"}2026-02-04 15:25:39.1047352026-02-04 15:25:39.104735
a5be2752cae64582922f96b80c890dc8Run-prompt-registry-eval-demo-gpt-5-2025-08-07{\"modules\": {\"prompt_templating\": {\"model\": {\"name\": \"gpt-5\", \"version\": \"2025-08-07\"}, \"prompt\": {\"template\": [{\"role\": \"user\", \"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\"}], \"defaults\": {}}}}}{\"promptTemplateId\": \"ee5205b4-a3c4-4217-9292-e663e6df0012\"}2026-02-04 15:25:39.1047392026-02-04 15:25:39.104740
\n", + "
\n", + " \n", + "
\n", + "

Table: configuration

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
idtest_datasetsmetricsvariable_mappingtagsorchestration_deployment_urlrepetitionsmetric_templatescreated_atupdated_at
e7c519079615412c820ef465baa88905{\"path\": \"testdata/medicalqna_dataset.csv\", \"type\": \"csv\"}[\"Pointwise Conciseness\", \"Pointwise Instruction Following\", \"Pointwise Correctness\", \"Pointwise Answer Relevance\", \"Exact Match\", \"BLEU\", \"ROUGE\", \"Content Filter on Input\", \"Content Filter on Output\"]{}{}https://api.ai.aicore-pr.eu-west-1.mlf-aws-dev.com/v2/inference/deployments/d0d6f232abfea6721[{\"evaluationMethod\": \"llm-as-a-judge\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"95c03e1b-3938-42dd-bc69-3ec5cd0e5e18\", \"name\": \"Pointwise Conciseness\", \"description\": \"Measures how short and concise the model\\u2019s response is. Scores range from 1 to 5, with higher values indicating a more concise answer.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"categorical\", \"promptType\": \"structured\", \"configuration\": {\"modelConfiguration\": {\"name\": \"gpt-4.1\", \"version\": \"2025-04-14\", \"parameters\": [{\"key\": \"temperature\", \"value\": \"0\"}]}, \"promptConfiguration\": {\"evaluationTask\": \"You are an expert evaluator. 
Your task is to evaluate the conciseness of responses generated by AI models.\\nWe will provide you with the user input and an AI-generated response.\\nYou should first read the user input carefully to understand the context and intention, and then evaluate the conciseness of the response based on the criteria provided in the Evaluation section below.\\nYou will assign the response a rating following the Rating Rubric and Evaluation Steps.\\nGive step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\", \"definition\": \"You will be assessing conciseness, which measures the ability to convey the necessary information in a clear and succinct manner.\", \"criteria\": \"Conciseness: Does the response deliver the essential information without unnecessary words or redundancy?\", \"ratingRubric\": [{\"rating\": \"1\", \"rule\": \"(Not concise). The response is not concise and is filled with unnecessary or redundant content that obscures the main points.\"}, {\"rating\": \"2\", \"rule\": \"(Slightly concise). The response is slightly concise and contains a significant amount of unnecessary or redundant information.\"}, {\"rating\": \"3\", \"rule\": \"(Somewhat concise). The response is somewhat concise but may include some unnecessary words or slightly redundant information.\"}, {\"rating\": \"4\", \"rule\": \"(Mostly concise). The response is mostly concise and generally avoids unnecessary words while covering the essential information.\"}, {\"rating\": \"5\", \"rule\": \"(Highly concise). The response is very concise, delivering all necessary information in a succinct manner without any superfluous content.\"}], \"evaluationSteps\": [\"Assess the response in terms of Conciseness. Identify how effectively the response communicates essential information without unnecessary words according to the Criteria.\", \"Score based on the rating rubric. 
Give a brief rationale to explain your evaluation considering Conciseness.\"]}}}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [], \"supported_values\": [1, 5], \"experimental\": true}}, {\"evaluationMethod\": \"llm-as-a-judge\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"cd3ffd21-faae-4f06-8184-52541182d9a5\", \"name\": \"Pointwise Instruction Following\", \"description\": \"Evaluates the model\\u2019s ability to follow the instructions provided in the user prompt. Scores range from 1 to 5, with 1 indicating no fulfillment and 5 indicating complete fulfillment.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"categorical\", \"promptType\": \"structured\", \"configuration\": {\"modelConfiguration\": {\"name\": \"gpt-4.1\", \"version\": \"2025-04-14\", \"parameters\": [{\"key\": \"temperature\", \"value\": \"0\"}]}, \"promptConfiguration\": {\"evaluationTask\": \"Please act as an impartial judge and evaluate the quality of the responses based on the prompt and following criteria:\", \"definition\": \"You will be assessing model's the ability to follow instructions provided in the user prompt.\", \"criteria\": \"Instruction following: The response demonstrates a clear understanding of the instructions in the user prompt, satisfying all of the instruction's requirements. Evaluate the responses STRICTLY on the ability to follow instruction ONLY.\", \"ratingRubric\": [{\"rating\": \"1\", \"rule\": \"(No fulfillment). Response does not address the most important aspects of the instruction. The user would feel like their request was not at all understood.\"}, {\"rating\": \"2\", \"rule\": \"(Poor fulfillment). Response addresses some aspects of the instruction but misses key requirements or major components. 
The user would feel like their instruction was misunderstood in significant ways.\"}, {\"rating\": \"3\", \"rule\": \"(Some fulfillment). Response does not address some minor aspects and/or ignores some requirements of the instruction. The user would feel like their instruction was partially understood.\"}, {\"rating\": \"4\", \"rule\": \"(Good fulfillment). Response addresses most aspects and requirements of the instruction. It might miss very minor details or have slight deviations from requirements. The user would feel like their instruction was well understood.\"}, {\"rating\": \"5\", \"rule\": \"(Complete fulfillment). Response addresses all aspects and adheres to all requirements of the instruction. The user would feel like their instruction was completely understood.\"}]}}}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [], \"supported_values\": [1, 5], \"experimental\": false}}, {\"evaluationMethod\": \"llm-as-a-judge\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"36d1abca-cf01-48f6-9bd1-8d5e1272a374\", \"name\": \"Pointwise Correctness\", \"description\": \"Evaluates whether an LLM response is correct, accurate, and factual using a user-provided reference, for both general and retrieval-augmented (RAG) use cases. Scores range from 1 to 5, with 1 indicating completely incorrect and 5 indicating fully correct.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"categorical\", \"promptType\": \"structured\", \"configuration\": {\"modelConfiguration\": {\"name\": \"gpt-4.1\", \"version\": \"2025-04-14\", \"parameters\": [{\"key\": \"temperature\", \"value\": \"0\"}]}, \"promptConfiguration\": {\"evaluationTask\": \"You are an expert evaluator. 
Your task is to evaluate the correctness of responses generated by AI models.\\nWe will provide you with the user input, an AI-generated response, and a reference answer.\\nYou should first read the user input carefully to understand the task and intent, then evaluate the correctness of the response based on the criteria and rubric below.\\nAssign the response a rating using the Rating Rubric and Evaluation Steps.\\nGive step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\", \"definition\": \"You will be assessing correctness, which measures whether the response is factually accurate, complete, and directly answers the user's query as intended, using the reference as the main authority.\", \"criteria\": \"Correctness:\\n - Is the response factually accurate and free from errors?\\n- Does the response address all parts of the user's question?\\n- Does the answer avoid missing key information?\\n- Does the response avoid introducing incorrect, misleading, or unrelated information?\\n- If the question is ambiguous or lacks context, is an appropriate clarification or expression of uncertainty provided?\\n- If the reference is a refusal, clarification, or contains specific instructions (such as links or attributions), does the response follow this appropriately?\\n- Are alternative correct answers recognized, not just verbatim matches to the reference?\", \"ratingRubric\": [{\"rating\": \"1\", \"rule\": \"(Incorrect). The response is fundamentally incorrect, misleading, irrelevant, or an inappropriate refusal.\"}, {\"rating\": \"2\", \"rule\": \"(Somewhat incorrect). The response contains significant inaccuracies, omissions, or context mismatch; unreliable as an answer.\"}, {\"rating\": \"3\", \"rule\": \"(Somewhat correct). The response is partially correct, but with notable errors, missing key aspects, or context insensitivity.\"}, {\"rating\": \"4\", \"rule\": \"(Mostly correct). 
The response is mostly correct, with only minor omissions, ambiguities, or context mismatches.\"}, {\"rating\": \"5\", \"rule\": \"(Completely correct). The response is fully correct, complete, and directly answers the user's query as intended.\"}], \"evaluationSteps\": [\"Assess the response for factual accuracy, completeness, and directness in answering the user's query. Identify any errors, omissions, or irrelevant information.\", \"Consider if the response appropriately handles ambiguity, context, or special instructions from the reference.\", \"Score based on the rating rubric. Give a concise, unbiased rationale for your evaluation, focusing on correctness.\"]}}}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [], \"supported_values\": [1, 5], \"experimental\": false}}, {\"evaluationMethod\": \"llm-as-a-judge\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"0ae30283-0140-451e-8a88-267ef801f35c\", \"name\": \"Pointwise Answer Relevance\", \"description\": \"Measures how closely the model\\u2019s response relates to the user prompt, for both general and RAG use cases. Scores range from 1 to 5, with higher values indicating greater relevance.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"categorical\", \"promptType\": \"structured\", \"configuration\": {\"modelConfiguration\": {\"name\": \"gpt-4.1\", \"version\": \"2025-04-14\", \"parameters\": [{\"key\": \"temperature\", \"value\": \"0\"}]}, \"promptConfiguration\": {\"evaluationTask\": \"You are an expert evaluator. 
Your task is to evaluate the relevance of responses generated by AI models.\\nWe will provide you with the user input and an AI-generated response.\\nYou should first read the user input carefully to understand the context and intention, and then evaluate the relevance of the response based on the criteria provided in the Evaluation section below.\\nYou will assign the response a rating following the Rating Rubric and Evaluation Steps.\\nGive step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.\", \"definition\": \"You will be assessing relevance, which measures the ability to provide a response that is pertinent and useful based on the user prompt and the context provided.\", \"criteria\": \"Relevance: Does the response address the user's query appropriately and provide pertinent information?\", \"ratingRubric\": [{\"rating\": \"1\", \"rule\": \"(Irrelevant). The response is irrelevant and does not address the user's query.\"}, {\"rating\": \"2\", \"rule\": \"(Slightly relevant). The response is slightly relevant and largely misses the user's query.\"}, {\"rating\": \"3\", \"rule\": \"(Somewhat relevant). The response is somewhat relevant but may miss key aspects of the user's query.\"}, {\"rating\": \"4\", \"rule\": \"(Mostly relevant). The response is mostly relevant and generally addresses the user's query with useful information.\"}, {\"rating\": \"5\", \"rule\": \"(Highly relevant). The response is highly relevant, directly addresses the user's query, and provides useful information.\"}], \"evaluationSteps\": [\"Assess the response in terms of Relevance. Identify how well the response aligns with the user's query and context according to the Criteria.\", \"Score based on the rating rubric. 
Give a brief rationale to explain your evaluation considering Relevance.\"]}}}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [], \"supported_values\": [1, 5], \"experimental\": true}}, {\"evaluationMethod\": \"computed\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"39f4a0ba-8a21-4cda-be95-703dba47e4f1\", \"name\": \"Exact Match\", \"description\": \"Boolean indicating whether the output exactly matches the reference.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"boolean\"}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [\"reference\"], \"supported_values\": [0, 1], \"experimental\": false}}, {\"evaluationMethod\": \"computed\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"3ea07c1f-5b10-4b12-bf46-6d429faf8010\", \"name\": \"BLEU\", \"description\": \"BLEU (Bilingual Evaluation Understudy) evaluates machine-translated text quality by calculating n-gram precision between candidate and reference translations. 
Scores range from 0 to 1, with higher values indicating greater similarity.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"numerical\"}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [\"reference\"], \"supported_values\": [0, 1], \"experimental\": false}}, {\"evaluationMethod\": \"computed\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"3904208a-b886-41b1-8448-d363245d5397\", \"name\": \"ROUGE\", \"description\": \"ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics for evaluating summarization and machine translation by measuring overlap in n-grams, word sequences, and word pairs between generated and reference texts. This implementation is case-insensitive.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"numerical\"}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [\"reference\"], \"supported_values\": [0, 1], \"experimental\": false}}, {\"evaluationMethod\": \"computed\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"ba2ece64-d4ed-4645-96d3-8728af12515f\", \"name\": \"Content Filter on Input\", \"description\": \"Boolean indicating whether the input content filter was invoked.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"boolean\"}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [], \"supported_values\": [0, 1], \"experimental\": false}}, {\"evaluationMethod\": \"computed\", \"scenario\": \"genai-evaluations\", \"createdAt\": \"2025-11-19 00:00:00+00:00\", \"managedBy\": \"imperative\", \"metricType\": \"evaluation\", \"systemPredefined\": true, \"id\": \"e677ac61-5c39-4f32-8feb-4460ff6b3c23\", \"name\": \"Content Filter on Output\", \"description\": 
\"Boolean indicating whether the output content filter was invoked.\", \"version\": \"1.0.0\", \"spec\": {\"outputType\": \"boolean\"}, \"usageType\": [\"evaluation\"], \"additionalProperties\": {\"variables\": [], \"supported_values\": [0, 1], \"experimental\": false}}] | 2026-02-04 15:25:39.096117 | 2026-02-04 15:25:39.096121
Table: submission
id | run_id | orchestration_configuration | template_variables | created_at | updated_at
7516fcd4de96441a8ae9e8f68bdb3f5c83d31ab3552644d2bccddc73d9c9f30d{\"modules\": {\"prompt_templating\": {\"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\"}, \"prompt\": {\"template\": [{\"role\": \"user\", \"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\"}], \"defaults\": {}}}}}{\"question\": \"how does rivatigmine and otc sleep medicine interact\", \"sentiment\": \"Interaction\", \"reference\": \"tell your doctor and pharmacist what prescription and nonprescription medications, vitamins, nutritional supplements, and herbal products you are taking or plan to take. Be sure to mention any of the following: antihistamines; aspirin and other nonsteroidal anti-inflammatory medications (NSAIDs) such as ibuprofen (Advil, Motrin) and naproxen (Aleve, Naprosyn); bethanechol (Duvoid, Urecholine); ipratropium (Atrovent, in Combivent, DuoNeb); and medications for Alzheimer's disease, glaucoma, irritable bowel disease, motion sickness, ulcers, or urinary problems. Your doctor may need to change the doses of your medications or monitor you carefully for side effects.\"}2026-02-04 15:25:39.1149732026-02-04 15:25:39.114975
fbe9fd3eaa1945bfa97204593045861483d31ab3552644d2bccddc73d9c9f30d{\"modules\": {\"prompt_templating\": {\"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\"}, \"prompt\": {\"template\": [{\"role\": \"user\", \"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\"}], \"defaults\": {}}}}}{\"question\": \"how does valium affect the brain\", \"sentiment\": \"Action\", \"reference\": \"Diazepam is a benzodiazepine that exerts anxiolytic, sedative, muscle-relaxant, anticonvulsant and amnestic effects. Most of these effects are thought to result from a facilitation of the action of gamma aminobutyric acid (GABA), an inhibitory neurotransmitter in the central nervous system.\"}2026-02-04 15:25:39.1149782026-02-04 15:25:39.114979
c64e5a8aef454b1f864d90611e2c1a9b83d31ab3552644d2bccddc73d9c9f30d{\"modules\": {\"prompt_templating\": {\"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\"}, \"prompt\": {\"template\": [{\"role\": \"user\", \"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\"}], \"defaults\": {}}}}}{\"question\": \"what is morphine\", \"sentiment\": \"Information\", \"reference\": \"Morphine is a pain medication of the opiate family which is found naturally in a number of plants and animals.[5][7] It acts directly on the central nervous system (CNS) to decrease the feeling of pain.\"}2026-02-04 15:25:39.1149812026-02-04 15:25:39.114981
97a2a28dcc9b49029b3101c342461c8f83d31ab3552644d2bccddc73d9c9f30d{\"modules\": {\"prompt_templating\": {\"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\"}, \"prompt\": {\"template\": [{\"role\": \"user\", \"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\"}], \"defaults\": {}}}}}{\"question\": \"what are the milligrams for oxycodone e\", \"sentiment\": \"Dose\", \"reference\": \"\\ufffd 10 mg \\ufffd 20 mg \\ufffd 40 mg \\ufffd 80 mg ...\"}2026-02-04 15:25:39.1149842026-02-04 15:25:39.114984
a14b6ad38ea7484b9b48a486a6d1b7d783d31ab3552644d2bccddc73d9c9f30d{\"modules\": {\"prompt_templating\": {\"model\": {\"name\": \"gemini-2.5-pro\", \"version\": \"001\"}, \"prompt\": {\"template\": [{\"role\": \"user\", \"content\": \"List the benefits and side effects of the drug in the following consumer health question: {{?question}}.\"}], \"defaults\": {}}}}}{\"question\": \"81% aspirin contain resin and shellac in it. ?\", \"sentiment\": \"Ingredient\", \"reference\": \"Inactive Ingredients Ingredient Name\"}2026-02-04 15:25:39.1149862026-02-04 15:25:39.114987
Table: submission_result

submission_id | run_id | repetition_count | completion_result | latency | created_at | updated_at
7516fcd4de96441a8ae9e8f68bdb3f5c83d31ab3552644d2bccddc73d9c9f30d1{\"request_id\": \"37930f04-8112-9067-b59e-696e43a24410\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: how does rivatigmine and otc sleep medicine interact.\", \"role\": \"user\"}], \"llm\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770218849, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. Here is a breakdown of the benefits and side effects for each drug, followed by an explanation of their interaction, based on the consumer health question.\\n\\n***\\n\\n### 1. Rivastigmine (e.g., brand name Exelon)\\n\\nThis is a prescription medication used to treat dementia.\\n\\n**Benefits:**\\n* **Improves Cognitive Function:** It is used to treat mild to moderate dementia associated with Alzheimer's disease and Parkinson's disease.\\n* **Slows Symptom Progression:** It can help improve or stabilize functions like memory, thinking, and language. It does not cure the disease but can help manage the symptoms.\\n* **Mechanism of Action:** It works by increasing the levels of a chemical in the brain called acetylcholine, which is important for memory and thought processes.\\n\\n**Common Side Effects:**\\n* Nausea, vomiting, diarrhea\\n* Loss of appetite and weight loss\\n* Dizziness, headache, or fatigue\\n* Stomach upset or pain\\n\\n### 2. 
Common OTC (Over-the-Counter) Sleep Medicines\\n\\nThis category most often includes drugs containing **diphenhydramine** (found in Benadryl, ZzzQuil, Advil PM) or **doxylamine** (found in Unisom SleepTabs).\\n\\n**Benefits:**\\n* **Induces Drowsiness:** Helps individuals fall asleep and manage occasional insomnia.\\n* **Widely Accessible:** Available without a prescription for short-term use.\\n\\n**Common Side Effects:**\\n* Next-day drowsiness or a \\\"hangover\\\" effect\\n* Dizziness and confusion (especially in older adults)\\n* Dry mouth, blurred vision\\n* Constipation and difficulty urinating\\n* **Mechanism of Action:** These drugs have strong **anticholinergic** properties, which means they work by **blocking** the action of acetylcholine. This blocking effect causes drowsiness, but also the other side effects listed.\\n\\n---\\n\\n### The Interaction Between Rivastigmine and OTC Sleep Medicine\\n\\nThis is a significant interaction that should be avoided. The two types of drugs have opposite effects on the brain, creating a direct conflict.\\n\\n**1. Reduced Effectiveness of Rivastigmine:**\\n* Rivastigmine works by **increasing** acetylcholine to help with dementia symptoms.\\n* OTC sleep aids work by **blocking** acetylcholine.\\n* **Result:** Taking an OTC sleep aid can directly counteract the intended benefit of rivastigmine, making the dementia medication less effective.\\n\\n**2. Increased Risk of Negative Side Effects:**\\n* Both medications can cause dizziness and confusion on their own. 
Taking them together significantly increases the risk and severity of these side effects.\\n* For an older adult, this heightened confusion and dizziness can lead to a much higher risk of **falls, injury, and accidents**.\\n* The anticholinergic side effects from the sleep aid (dry mouth, constipation, blurry vision) can become more pronounced and problematic.\\n\\n### Summary and Recommendation\\n\\n**Do not combine rivastigmine with common OTC sleep medicines like ZzzQuil, Benadryl, or Unisom.** The sleep medicine is likely to make the rivastigmine less effective and increase the risk of dangerous side effects like severe confusion and falls.\\n\\nIf you or someone you care for is taking rivastigmine and having trouble sleeping, it is essential to **speak with their doctor or a pharmacist**. They can recommend safer alternatives, which may include non-drug strategies (like improving sleep hygiene) or a different type of medication that does not conflict with rivastigmine.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2332, \"prompt_tokens\": 29, \"total_tokens\": 2361, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1581}}}}, \"final_result\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770218849, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. Here is a breakdown of the benefits and side effects for each drug, followed by an explanation of their interaction, based on the consumer health question.\\n\\n***\\n\\n### 1. Rivastigmine (e.g., brand name Exelon)\\n\\nThis is a prescription medication used to treat dementia.\\n\\n**Benefits:**\\n* **Improves Cognitive Function:** It is used to treat mild to moderate dementia associated with Alzheimer's disease and Parkinson's disease.\\n* **Slows Symptom Progression:** It can help improve or stabilize functions like memory, thinking, and language. 
It does not cure the disease but can help manage the symptoms.\\n* **Mechanism of Action:** It works by increasing the levels of a chemical in the brain called acetylcholine, which is important for memory and thought processes.\\n\\n**Common Side Effects:**\\n* Nausea, vomiting, diarrhea\\n* Loss of appetite and weight loss\\n* Dizziness, headache, or fatigue\\n* Stomach upset or pain\\n\\n### 2. Common OTC (Over-the-Counter) Sleep Medicines\\n\\nThis category most often includes drugs containing **diphenhydramine** (found in Benadryl, ZzzQuil, Advil PM) or **doxylamine** (found in Unisom SleepTabs).\\n\\n**Benefits:**\\n* **Induces Drowsiness:** Helps individuals fall asleep and manage occasional insomnia.\\n* **Widely Accessible:** Available without a prescription for short-term use.\\n\\n**Common Side Effects:**\\n* Next-day drowsiness or a \\\"hangover\\\" effect\\n* Dizziness and confusion (especially in older adults)\\n* Dry mouth, blurred vision\\n* Constipation and difficulty urinating\\n* **Mechanism of Action:** These drugs have strong **anticholinergic** properties, which means they work by **blocking** the action of acetylcholine. This blocking effect causes drowsiness, but also the other side effects listed.\\n\\n---\\n\\n### The Interaction Between Rivastigmine and OTC Sleep Medicine\\n\\nThis is a significant interaction that should be avoided. The two types of drugs have opposite effects on the brain, creating a direct conflict.\\n\\n**1. Reduced Effectiveness of Rivastigmine:**\\n* Rivastigmine works by **increasing** acetylcholine to help with dementia symptoms.\\n* OTC sleep aids work by **blocking** acetylcholine.\\n* **Result:** Taking an OTC sleep aid can directly counteract the intended benefit of rivastigmine, making the dementia medication less effective.\\n\\n**2. Increased Risk of Negative Side Effects:**\\n* Both medications can cause dizziness and confusion on their own. 
Taking them together significantly increases the risk and severity of these side effects.\\n* For an older adult, this heightened confusion and dizziness can lead to a much higher risk of **falls, injury, and accidents**.\\n* The anticholinergic side effects from the sleep aid (dry mouth, constipation, blurry vision) can become more pronounced and problematic.\\n\\n### Summary and Recommendation\\n\\n**Do not combine rivastigmine with common OTC sleep medicines like ZzzQuil, Benadryl, or Unisom.** The sleep medicine is likely to make the rivastigmine less effective and increase the risk of dangerous side effects like severe confusion and falls.\\n\\nIf you or someone you care for is taking rivastigmine and having trouble sleeping, it is essential to **speak with their doctor or a pharmacist**. They can recommend safer alternatives, which may include non-drug strategies (like improving sleep hygiene) or a different type of medication that does not conflict with rivastigmine.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2332, \"prompt_tokens\": 29, \"total_tokens\": 2361, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1581}}}}25.2656732026-02-04 15:37:07.9352152026-02-04 15:37:07.935219
fbe9fd3eaa1945bfa97204593045861483d31ab3552644d2bccddc73d9c9f30d1{\"request_id\": \"5843d2e6-ff62-9ede-aaff-f759bdac5bab\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: how does valium affect the brain.\", \"role\": \"user\"}], \"llm\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770218857, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. Here is a breakdown of the benefits and side effects of Valium (diazepam) in the context of how it affects the brain, presented in a clear, consumer-friendly format.\\n\\n**IMPORTANT:** This information is for educational purposes only and is not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your physician or another qualified health provider with any questions you may have regarding a medical condition or medication.\\n\\n***\\n\\n### How Valium (Diazepam) Affects the Brain\\n\\nValium belongs to a class of drugs called benzodiazepines. Its primary effect on the brain is to enhance the activity of a neurotransmitter called **GABA (gamma-aminobutyric acid)**.\\n\\nThink of GABA as the brain's main **calming agent or \\\"brake pedal.\\\"** It works by slowing down nerve cell activity. Valium essentially makes this \\\"brake pedal\\\" more effective, leading to a widespread calming and slowing of the central nervous system. 
This action is what produces both its therapeutic benefits and its side effects.\\n\\n---\\n\\n### Benefits (Therapeutic Uses)\\n\\nBy slowing down brain activity, Valium is prescribed to treat several conditions:\\n\\n* **Anxiety Disorders:** It effectively reduces the physical and emotional symptoms of anxiety, such as excessive worry, tension, panic attacks, and fear.\\n* **Muscle Spasms:** It acts as a potent muscle relaxant, helpful for conditions caused by muscle spasms, such as chronic back pain or certain neurological disorders like cerebral palsy.\\n* **Seizure Disorders:** By reducing excessive electrical activity in the brain, it can be used to stop an active seizure (status epilepticus) or as an add-on therapy to prevent seizures.\\n* **Alcohol Withdrawal:** It is used to manage and prevent severe symptoms of alcohol withdrawal, including agitation, tremors, and life-threatening seizures.\\n* **Sedation:** It is often administered before surgery or other medical procedures to reduce anxiety and induce sedation.\\n\\n---\\n\\n### Side Effects & Risks\\n\\nThe same brain-slowing mechanism that provides benefits can also cause a range of side effects.\\n\\n#### Common Side Effects:\\nThese are often related to the sedative effects of the drug.\\n\\n* **Drowsiness, Fatigue, and Dizziness:** Feeling tired, sleepy, or unsteady on your feet.\\n* **Impaired Coordination and Unsteadiness:** Difficulty with balance and fine motor skills, which increases the risk of falls. 
This is why driving or operating heavy machinery is strongly discouraged.\\n* **Confusion and Memory Problems:** Difficulty concentrating and, notably, **anterograde amnesia**, which is trouble forming new memories while the drug is active.\\n* **Muscle Weakness:** A feeling of general weakness, separate from the therapeutic muscle relaxation.\\n* **Slurred Speech.**\\n\\n#### Serious Side Effects (Require Immediate Medical Attention):\\n\\n* **Severe Respiratory Depression:** Dangerously slow or shallow breathing, especially when mixed with alcohol, opioids, or other depressants. This can be fatal.\\n* **Paradoxical Reactions:** Instead of calming, the drug can cause the opposite effect, such as increased agitation, aggression, hallucinations, or rage.\\n* **Severe Drowsiness or Unresponsiveness:** Difficulty waking up.\\n* **Suicidal Thoughts or Worsening Depression.**\\n\\n#### Risks of Long-Term Use:\\n\\n* **Tolerance:** Over time, the body adapts to the drug, and higher doses are needed to achieve the same effect.\\n* **Dependence:** The brain and body become reliant on Valium to function normally. Stopping the drug abruptly after long-term use can lead to a severe and potentially dangerous withdrawal syndrome.\\n* **Withdrawal Symptoms:** Can include rebound anxiety, insomnia, tremors, sweating, agitation, and in severe cases, seizures. 
Withdrawal should always be managed by a doctor through a gradual tapering schedule.\\n* **Addiction (Substance Use Disorder):** Characterized by compulsive drug-seeking and use despite harmful consequences.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2397, \"prompt_tokens\": 24, \"total_tokens\": 2421, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1562}}}}, \"final_result\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770218857, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. Here is a breakdown of the benefits and side effects of Valium (diazepam) in the context of how it affects the brain, presented in a clear, consumer-friendly format.\\n\\n**IMPORTANT:** This information is for educational purposes only and is not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your physician or another qualified health provider with any questions you may have regarding a medical condition or medication.\\n\\n***\\n\\n### How Valium (Diazepam) Affects the Brain\\n\\nValium belongs to a class of drugs called benzodiazepines. Its primary effect on the brain is to enhance the activity of a neurotransmitter called **GABA (gamma-aminobutyric acid)**.\\n\\nThink of GABA as the brain's main **calming agent or \\\"brake pedal.\\\"** It works by slowing down nerve cell activity. Valium essentially makes this \\\"brake pedal\\\" more effective, leading to a widespread calming and slowing of the central nervous system. 
This action is what produces both its therapeutic benefits and its side effects.\\n\\n---\\n\\n### Benefits (Therapeutic Uses)\\n\\nBy slowing down brain activity, Valium is prescribed to treat several conditions:\\n\\n* **Anxiety Disorders:** It effectively reduces the physical and emotional symptoms of anxiety, such as excessive worry, tension, panic attacks, and fear.\\n* **Muscle Spasms:** It acts as a potent muscle relaxant, helpful for conditions caused by muscle spasms, such as chronic back pain or certain neurological disorders like cerebral palsy.\\n* **Seizure Disorders:** By reducing excessive electrical activity in the brain, it can be used to stop an active seizure (status epilepticus) or as an add-on therapy to prevent seizures.\\n* **Alcohol Withdrawal:** It is used to manage and prevent severe symptoms of alcohol withdrawal, including agitation, tremors, and life-threatening seizures.\\n* **Sedation:** It is often administered before surgery or other medical procedures to reduce anxiety and induce sedation.\\n\\n---\\n\\n### Side Effects & Risks\\n\\nThe same brain-slowing mechanism that provides benefits can also cause a range of side effects.\\n\\n#### Common Side Effects:\\nThese are often related to the sedative effects of the drug.\\n\\n* **Drowsiness, Fatigue, and Dizziness:** Feeling tired, sleepy, or unsteady on your feet.\\n* **Impaired Coordination and Unsteadiness:** Difficulty with balance and fine motor skills, which increases the risk of falls. 
This is why driving or operating heavy machinery is strongly discouraged.\\n* **Confusion and Memory Problems:** Difficulty concentrating and, notably, **anterograde amnesia**, which is trouble forming new memories while the drug is active.\\n* **Muscle Weakness:** A feeling of general weakness, separate from the therapeutic muscle relaxation.\\n* **Slurred Speech.**\\n\\n#### Serious Side Effects (Require Immediate Medical Attention):\\n\\n* **Severe Respiratory Depression:** Dangerously slow or shallow breathing, especially when mixed with alcohol, opioids, or other depressants. This can be fatal.\\n* **Paradoxical Reactions:** Instead of calming, the drug can cause the opposite effect, such as increased agitation, aggression, hallucinations, or rage.\\n* **Severe Drowsiness or Unresponsiveness:** Difficulty waking up.\\n* **Suicidal Thoughts or Worsening Depression.**\\n\\n#### Risks of Long-Term Use:\\n\\n* **Tolerance:** Over time, the body adapts to the drug, and higher doses are needed to achieve the same effect.\\n* **Dependence:** The brain and body become reliant on Valium to function normally. Stopping the drug abruptly after long-term use can lead to a severe and potentially dangerous withdrawal syndrome.\\n* **Withdrawal Symptoms:** Can include rebound anxiety, insomnia, tremors, sweating, agitation, and in severe cases, seizures. Withdrawal should always be managed by a doctor through a gradual tapering schedule.\\n* **Addiction (Substance Use Disorder):** Characterized by compulsive drug-seeking and use despite harmful consequences.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2397, \"prompt_tokens\": 24, \"total_tokens\": 2421, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1562}}}}33.0435612026-02-04 15:37:07.9352202026-02-04 15:37:07.935220
c64e5a8aef454b1f864d90611e2c1a9b83d31ab3552644d2bccddc73d9c9f30d1{\"request_id\": \"16cb6606-07ba-9462-88cd-62e5fe83f39a\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: what is morphine.\", \"role\": \"user\"}], \"llm\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770218856, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Based on the consumer health question \\\"what is morphine,\\\" here is a list of the drug's benefits and side effects.\\n\\n***\\n\\n### **Benefits of Morphine**\\n\\nMorphine is a powerful opioid pain medication prescribed for the management of severe pain. Its primary benefits are related to its potent analgesic (pain-relieving) effects.\\n\\n* **Management of Severe Acute Pain:** It is highly effective for treating intense, short-term pain, such as pain experienced after major surgery, a serious injury (like a broken bone or severe burn), or a heart attack.\\n* **Cancer-Related Pain:** Morphine is a cornerstone of pain management for patients with moderate to severe pain caused by cancer or its treatments.\\n* **Palliative and End-of-Life Care:** It is used to provide comfort to patients with terminal illnesses by relieving persistent pain and the sensation of shortness of breath (dyspnea).\\n* **Chronic Pain Management:** In some specific and carefully monitored cases, it may be prescribed for severe, long-term chronic pain when other treatments have failed.\\n\\n### **Side Effects of Morphine**\\n\\nMorphine has a significant risk of side effects, ranging from common and manageable to severe and life-threatening.\\n\\n#### **Common Side Effects:**\\n\\n* **Drowsiness and Sedation:** Feeling sleepy, tired, or mentally \\\"foggy.\\\"\\n* **Constipation:** This is a very common and often persistent side effect.\\n* **Nausea and Vomiting:** Especially common 
when first starting the medication.\\n* **Dizziness and Lightheadedness:** Can increase the risk of falls, particularly in the elderly.\\n* **Confusion:** Difficulty thinking clearly or remembering things.\\n* **Itching or Sweating.**\\n\\n#### **Serious and Potentially Life-Threatening Side Effects:**\\n\\n**Seek immediate medical attention if you experience any of the following:**\\n\\n* **Respiratory Depression (Slowed or Stopped Breathing):** This is the most dangerous side effect of morphine and can be fatal. Signs include shallow, difficult, or very slow breathing.\\n* **Extreme Drowsiness or Inability to Wake Up:** Progressing from sedation to unresponsiveness.\\n* **Low Blood Pressure and Slow Heart Rate:** Causing faintness, dizziness, or confusion.\\n* **Severe Confusion, Hallucinations, or Agitation.**\\n* **Allergic Reaction:** Signs include rash, hives, swelling of the face, lips, or throat, and difficulty breathing.\\n\\n#### **Long-Term Risks and Other Important Considerations:**\\n\\n* **Addiction, Dependence, and Withdrawal:** Morphine has a high potential for abuse and addiction (the compulsive use of a drug despite harmful consequences). 
Physical dependence can also occur, meaning the body adapts to the drug and will experience withdrawal symptoms (e.g., anxiety, muscle aches, nausea, diarrhea) if the medication is stopped suddenly.\\n* **Tolerance:** Over time, the body may require higher doses of morphine to achieve the same level of pain relief.\\n* **Risk of Overdose:** Taking too much morphine, or combining it with other substances like **alcohol, benzodiazepines (e.g., Xanax, Valium), or other sedatives**, can lead to a fatal overdose, primarily by stopping breathing.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2131, \"prompt_tokens\": 20, \"total_tokens\": 2151, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1445}}}}, \"final_result\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770218856, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Based on the consumer health question \\\"what is morphine,\\\" here is a list of the drug's benefits and side effects.\\n\\n***\\n\\n### **Benefits of Morphine**\\n\\nMorphine is a powerful opioid pain medication prescribed for the management of severe pain. 
Its primary benefits are related to its potent analgesic (pain-relieving) effects.\\n\\n* **Management of Severe Acute Pain:** It is highly effective for treating intense, short-term pain, such as pain experienced after major surgery, a serious injury (like a broken bone or severe burn), or a heart attack.\\n* **Cancer-Related Pain:** Morphine is a cornerstone of pain management for patients with moderate to severe pain caused by cancer or its treatments.\\n* **Palliative and End-of-Life Care:** It is used to provide comfort to patients with terminal illnesses by relieving persistent pain and the sensation of shortness of breath (dyspnea).\\n* **Chronic Pain Management:** In some specific and carefully monitored cases, it may be prescribed for severe, long-term chronic pain when other treatments have failed.\\n\\n### **Side Effects of Morphine**\\n\\nMorphine has a significant risk of side effects, ranging from common and manageable to severe and life-threatening.\\n\\n#### **Common Side Effects:**\\n\\n* **Drowsiness and Sedation:** Feeling sleepy, tired, or mentally \\\"foggy.\\\"\\n* **Constipation:** This is a very common and often persistent side effect.\\n* **Nausea and Vomiting:** Especially common when first starting the medication.\\n* **Dizziness and Lightheadedness:** Can increase the risk of falls, particularly in the elderly.\\n* **Confusion:** Difficulty thinking clearly or remembering things.\\n* **Itching or Sweating.**\\n\\n#### **Serious and Potentially Life-Threatening Side Effects:**\\n\\n**Seek immediate medical attention if you experience any of the following:**\\n\\n* **Respiratory Depression (Slowed or Stopped Breathing):** This is the most dangerous side effect of morphine and can be fatal. 
Signs include shallow, difficult, or very slow breathing.\\n* **Extreme Drowsiness or Inability to Wake Up:** Progressing from sedation to unresponsiveness.\\n* **Low Blood Pressure and Slow Heart Rate:** Causing faintness, dizziness, or confusion.\\n* **Severe Confusion, Hallucinations, or Agitation.**\\n* **Allergic Reaction:** Signs include rash, hives, swelling of the face, lips, or throat, and difficulty breathing.\\n\\n#### **Long-Term Risks and Other Important Considerations:**\\n\\n* **Addiction, Dependence, and Withdrawal:** Morphine has a high potential for abuse and addiction (the compulsive use of a drug despite harmful consequences). Physical dependence can also occur, meaning the body adapts to the drug and will experience withdrawal symptoms (e.g., anxiety, muscle aches, nausea, diarrhea) if the medication is stopped suddenly.\\n* **Tolerance:** Over time, the body may require higher doses of morphine to achieve the same level of pain relief.\\n* **Risk of Overdose:** Taking too much morphine, or combining it with other substances like **alcohol, benzodiazepines (e.g., Xanax, Valium), or other sedatives**, can lead to a fatal overdose, primarily by stopping breathing.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2131, \"prompt_tokens\": 20, \"total_tokens\": 2151, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1445}}}}32.1152832026-02-04 15:37:07.9352212026-02-04 15:37:07.935223
97a2a28dcc9b49029b3101c342461c8f83d31ab3552644d2bccddc73d9c9f30d1{\"request_id\": \"c0c31906-4033-9321-affd-947bcb4bdb5b\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: what are the milligrams for oxycodone e.\", \"role\": \"user\"}], \"llm\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770218864, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. It is extremely important to understand that **only a qualified healthcare provider can determine the correct and safe dosage of oxycodone for an individual.** The information below is for educational purposes only.\\n\\nBased on your question about oxycodone, here is a summary of its medical uses (benefits), common side effects, and important warnings.\\n\\n***\\n\\n### **Medical Use (Benefit) of Oxycodone**\\n\\nThe primary benefit of oxycodone is to manage pain that is severe enough to require an opioid pain reliever and when other pain treatments (like non-opioid medications or combination products) have not worked or cannot be tolerated.\\n\\n* **How it works:** Oxycodone is an opioid analgesic. It works by changing the way the brain and nervous system respond to pain signals.\\n* **Types:** It comes in two main forms:\\n * **Immediate-release (IR):** Used for managing acute or \\\"breakthrough\\\" pain. Dosages often start at **5 mg to 15 mg** and are taken every 4 to 6 hours as needed.\\n * **Extended-release (ER):** Used for managing chronic, around-the-clock pain. These tablets are taken once or twice a day and should never be crushed, chewed, or dissolved. Dosages are higher (e.g., **10 mg, 20 mg, 40 mg, 80 mg**) because the medication is released slowly over time.\\n\\n### **Common Side Effects**\\n\\nThese are side effects that many people experience. 
While common, you should still discuss them with your doctor.\\n\\n* Drowsiness, dizziness, or feeling lightheaded\\n* Nausea and vomiting\\n* Constipation (very common with long-term use)\\n* Headache\\n* Dry mouth\\n* Sweating\\n* Itching\\n\\n### **Serious Side Effects and Important Warnings**\\n\\nThese side effects can be dangerous and require immediate medical attention. **Call 911 if you experience signs of an overdose.**\\n\\n* **Slowed or Shallow Breathing (Respiratory Depression):** This is the most dangerous side effect and can be fatal.\\n* **Extreme Drowsiness or Fainting:** Difficulty waking up or feeling faint.\\n* **Confusion, Hallucinations, or unusual thoughts.**\\n* **Signs of an Allergic Reaction:** Rash, hives, swelling of the face, lips, or throat, and difficulty breathing.\\n* **Seizures.**\\n\\n#### **BOXED WARNING: Addiction, Abuse, and Overdose**\\n\\nOxycodone carries a high risk of addiction, physical dependence, and overdose, which can lead to death.\\n\\n* **Addiction and Misuse:** Oxycodone has a high potential for abuse and can lead to psychological and physical addiction, even when taken as prescribed.\\n* **Physical Dependence and Withdrawal:** Your body can become dependent on the drug, meaning you will experience withdrawal symptoms (like anxiety, muscle aches, sweating, nausea) if you stop taking it suddenly. Your doctor will help you taper off the medication slowly to avoid this.\\n* **Overdose:** Taking too much oxycodone, or combining it with alcohol or other central nervous system depressants (like benzodiazepines, sedatives, or other opioids), can cause a fatal overdose.\\n\\n> **Important Medical Disclaimer:**\\n> This information is for educational purposes and is not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your doctor or another qualified health provider with any questions you may have regarding a medical condition or medication. 
Never use oxycodone that was not prescribed for you, and never share your prescription with others.\\n>\\n> If you or someone you know is struggling with substance use, please contact the **SAMHSA National Helpline at 1-800-662-4357**.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2349, \"prompt_tokens\": 25, \"total_tokens\": 2374, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1544}}}}, \"final_result\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770218864, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. It is extremely important to understand that **only a qualified healthcare provider can determine the correct and safe dosage of oxycodone for an individual.** The information below is for educational purposes only.\\n\\nBased on your question about oxycodone, here is a summary of its medical uses (benefits), common side effects, and important warnings.\\n\\n***\\n\\n### **Medical Use (Benefit) of Oxycodone**\\n\\nThe primary benefit of oxycodone is to manage pain that is severe enough to require an opioid pain reliever and when other pain treatments (like non-opioid medications or combination products) have not worked or cannot be tolerated.\\n\\n* **How it works:** Oxycodone is an opioid analgesic. It works by changing the way the brain and nervous system respond to pain signals.\\n* **Types:** It comes in two main forms:\\n * **Immediate-release (IR):** Used for managing acute or \\\"breakthrough\\\" pain. Dosages often start at **5 mg to 15 mg** and are taken every 4 to 6 hours as needed.\\n * **Extended-release (ER):** Used for managing chronic, around-the-clock pain. These tablets are taken once or twice a day and should never be crushed, chewed, or dissolved. 
Dosages are higher (e.g., **10 mg, 20 mg, 40 mg, 80 mg**) because the medication is released slowly over time.\\n\\n### **Common Side Effects**\\n\\nThese are side effects that many people experience. While common, you should still discuss them with your doctor.\\n\\n* Drowsiness, dizziness, or feeling lightheaded\\n* Nausea and vomiting\\n* Constipation (very common with long-term use)\\n* Headache\\n* Dry mouth\\n* Sweating\\n* Itching\\n\\n### **Serious Side Effects and Important Warnings**\\n\\nThese side effects can be dangerous and require immediate medical attention. **Call 911 if you experience signs of an overdose.**\\n\\n* **Slowed or Shallow Breathing (Respiratory Depression):** This is the most dangerous side effect and can be fatal.\\n* **Extreme Drowsiness or Fainting:** Difficulty waking up or feeling faint.\\n* **Confusion, Hallucinations, or unusual thoughts.**\\n* **Signs of an Allergic Reaction:** Rash, hives, swelling of the face, lips, or throat, and difficulty breathing.\\n* **Seizures.**\\n\\n#### **BOXED WARNING: Addiction, Abuse, and Overdose**\\n\\nOxycodone carries a high risk of addiction, physical dependence, and overdose, which can lead to death.\\n\\n* **Addiction and Misuse:** Oxycodone has a high potential for abuse and can lead to psychological and physical addiction, even when taken as prescribed.\\n* **Physical Dependence and Withdrawal:** Your body can become dependent on the drug, meaning you will experience withdrawal symptoms (like anxiety, muscle aches, sweating, nausea) if you stop taking it suddenly. 
Your doctor will help you taper off the medication slowly to avoid this.\\n* **Overdose:** Taking too much oxycodone, or combining it with alcohol or other central nervous system depressants (like benzodiazepines, sedatives, or other opioids), can cause a fatal overdose.\\n\\n> **Important Medical Disclaimer:**\\n> This information is for educational purposes and is not a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your doctor or another qualified health provider with any questions you may have regarding a medical condition or medication. Never use oxycodone that was not prescribed for you, and never share your prescription with others.\\n>\\n> If you or someone you know is struggling with substance use, please contact the **SAMHSA National Helpline at 1-800-662-4357**.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2349, \"prompt_tokens\": 25, \"total_tokens\": 2374, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1544}}}}40.1377342026-02-04 15:37:07.9352242026-02-04 15:37:07.935224
a14b6ad38ea7484b9b48a486a6d1b7d783d31ab3552644d2bccddc73d9c9f30d1{\"request_id\": \"cf6c8a62-85ee-9d9a-9fab-aad2cce5a825\", \"intermediate_results\": {\"templating\": [{\"content\": \"List the benefits and side effects of the drug in the following consumer health question: 81% aspirin contain resin and shellac in it. ?.\", \"role\": \"user\"}], \"llm\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770218867, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. It's important to clarify the user's question first, as it contains a common misunderstanding.\\n\\nThe question \\\"81% aspirin contain resin and shellac in it. ?.\\\" likely has two points of confusion:\\n\\n1. **\\\"81%\\\" is almost certainly a typo for \\\"81 mg\\\"**. 81 mg is the standard \\\"low-dose\\\" or \\\"baby aspirin\\\" dosage used for cardiovascular protection.\\n2. **Resin and Shellac** are inactive ingredients used to create the \\\"enteric coating\\\" on some aspirin tablets. This special coating is designed to prevent the aspirin from dissolving in the stomach, allowing it to pass into the small intestine before being absorbed. This helps reduce stomach irritation and bleeding.\\n\\nBased on this clarification, the drug in question is **Enteric-Coated 81 mg Aspirin**. Here are its benefits and side effects.\\n\\n---\\n\\n### Benefits of Low-Dose (81 mg) Aspirin\\n\\nLow-dose aspirin is primarily used for its antiplatelet effects, meaning it helps prevent blood clots from forming. Its main benefits are related to cardiovascular health, typically under a doctor's recommendation.\\n\\n* **Prevention of Heart Attack:** For individuals who have already had a heart attack, daily low-dose aspirin can significantly reduce the risk of having a second one. 
It is also used in people with a very high risk of a first heart attack.\\n* **Prevention of Ischemic Stroke:** By preventing blood clots, aspirin can lower the risk of an ischemic stroke (a stroke caused by a blockage in a blood vessel supplying the brain). It is often prescribed to patients who have previously had this type of stroke or a transient ischemic attack (TIA or \\\"mini-stroke\\\").\\n* **Management of Cardiovascular Disease:** It is a cornerstone of therapy for people with known coronary artery disease, angina (chest pain from heart disease), or who have had procedures like bypass surgery or stent placement.\\n* **Potential Cancer Prevention:** Some studies suggest that long-term daily aspirin use may lower the risk of certain cancers, particularly colorectal cancer. However, this is not its primary approved use and should be discussed with a doctor.\\n\\n### Side Effects of Low-Dose (81 mg) Aspirin\\n\\nEven at a low dose, aspirin is a powerful medication with potential risks and side effects. The enteric coating helps reduce some risks but does not eliminate them.\\n\\n#### **Common Side Effects:**\\n\\n* **Stomach Upset:** Heartburn, indigestion, or nausea.\\n* **Easy Bruising or Minor Bleeding:** Since aspirin thins the blood, you may notice more bruising or that small cuts take longer to stop bleeding.\\n\\n#### **Serious Side Effects (Require Immediate Medical Attention):**\\n\\n* **Gastrointestinal (GI) Bleeding:** This is the most significant risk. The enteric coating reduces but does not eliminate this risk. **Signs include:**\\n * Black, bloody, or tarry stools.\\n * Vomiting blood or a substance that looks like coffee grounds.\\n * Severe stomach pain that doesn't go away.\\n* **Hemorrhagic Stroke (Bleeding in the Brain):** While aspirin helps prevent strokes caused by clots, it can increase the risk of strokes caused by bleeding. 
**Signs include:**\\n * Sudden severe headache, confusion, or vision problems.\\n * Numbness or weakness, especially on one side of the body.\\n* **Allergic Reaction:** Although rare, some people are allergic to aspirin. **Signs include:**\\n * Hives, rash, or itching.\\n * Swelling of the face, lips, or tongue.\\n * Wheezing or difficulty breathing.\\n* **Tinnitus:** Ringing or buzzing in the ears, which can be a sign of taking too much aspirin.\\n\\n### **Important Warnings**\\n\\n* **Reye's Syndrome:** Aspirin should **NEVER** be given to children or teenagers recovering from a viral infection (like the flu or chickenpox) as it can cause Reye's syndrome, a rare but life-threatening condition that causes swelling in the liver and brain.\\n* **Drug Interactions:** Aspirin can interact with other medications, especially other blood thinners (like warfarin, clopidogrel), NSAIDs (like ibuprofen, naproxen), and some antidepressants.\\n* **Alcohol:** Drinking alcohol while taking daily aspirin can increase your risk of stomach bleeding.\\n\\n> **Disclaimer:** This information is for educational purposes only and is not a substitute for professional medical advice. The decision to start, stop, or continue taking aspirin should only be made in consultation with a qualified healthcare provider who can assess your individual health risks and benefits.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2362, \"prompt_tokens\": 30, \"total_tokens\": 2392, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1367}}}}, \"final_result\": {\"id\": \"\", \"object\": \"chat.completion\", \"created\": 1770218867, \"model\": \"gemini-2.5-pro\", \"choices\": [{\"index\": 0, \"message\": {\"role\": \"assistant\", \"content\": \"Of course. It's important to clarify the user's question first, as it contains a common misunderstanding.\\n\\nThe question \\\"81% aspirin contain resin and shellac in it. 
?.\\\" likely has two points of confusion:\\n\\n1. **\\\"81%\\\" is almost certainly a typo for \\\"81 mg\\\"**. 81 mg is the standard \\\"low-dose\\\" or \\\"baby aspirin\\\" dosage used for cardiovascular protection.\\n2. **Resin and Shellac** are inactive ingredients used to create the \\\"enteric coating\\\" on some aspirin tablets. This special coating is designed to prevent the aspirin from dissolving in the stomach, allowing it to pass into the small intestine before being absorbed. This helps reduce stomach irritation and bleeding.\\n\\nBased on this clarification, the drug in question is **Enteric-Coated 81 mg Aspirin**. Here are its benefits and side effects.\\n\\n---\\n\\n### Benefits of Low-Dose (81 mg) Aspirin\\n\\nLow-dose aspirin is primarily used for its antiplatelet effects, meaning it helps prevent blood clots from forming. Its main benefits are related to cardiovascular health, typically under a doctor's recommendation.\\n\\n* **Prevention of Heart Attack:** For individuals who have already had a heart attack, daily low-dose aspirin can significantly reduce the risk of having a second one. It is also used in people with a very high risk of a first heart attack.\\n* **Prevention of Ischemic Stroke:** By preventing blood clots, aspirin can lower the risk of an ischemic stroke (a stroke caused by a blockage in a blood vessel supplying the brain). It is often prescribed to patients who have previously had this type of stroke or a transient ischemic attack (TIA or \\\"mini-stroke\\\").\\n* **Management of Cardiovascular Disease:** It is a cornerstone of therapy for people with known coronary artery disease, angina (chest pain from heart disease), or who have had procedures like bypass surgery or stent placement.\\n* **Potential Cancer Prevention:** Some studies suggest that long-term daily aspirin use may lower the risk of certain cancers, particularly colorectal cancer. 
However, this is not its primary approved use and should be discussed with a doctor.\\n\\n### Side Effects of Low-Dose (81 mg) Aspirin\\n\\nEven at a low dose, aspirin is a powerful medication with potential risks and side effects. The enteric coating helps reduce some risks but does not eliminate them.\\n\\n#### **Common Side Effects:**\\n\\n* **Stomach Upset:** Heartburn, indigestion, or nausea.\\n* **Easy Bruising or Minor Bleeding:** Since aspirin thins the blood, you may notice more bruising or that small cuts take longer to stop bleeding.\\n\\n#### **Serious Side Effects (Require Immediate Medical Attention):**\\n\\n* **Gastrointestinal (GI) Bleeding:** This is the most significant risk. The enteric coating reduces but does not eliminate this risk. **Signs include:**\\n * Black, bloody, or tarry stools.\\n * Vomiting blood or a substance that looks like coffee grounds.\\n * Severe stomach pain that doesn't go away.\\n* **Hemorrhagic Stroke (Bleeding in the Brain):** While aspirin helps prevent strokes caused by clots, it can increase the risk of strokes caused by bleeding. **Signs include:**\\n * Sudden severe headache, confusion, or vision problems.\\n * Numbness or weakness, especially on one side of the body.\\n* **Allergic Reaction:** Although rare, some people are allergic to aspirin. 
**Signs include:**\\n * Hives, rash, or itching.\\n * Swelling of the face, lips, or tongue.\\n * Wheezing or difficulty breathing.\\n* **Tinnitus:** Ringing or buzzing in the ears, which can be a sign of taking too much aspirin.\\n\\n### **Important Warnings**\\n\\n* **Reye's Syndrome:** Aspirin should **NEVER** be given to children or teenagers recovering from a viral infection (like the flu or chickenpox) as it can cause Reye's syndrome, a rare but life-threatening condition that causes swelling in the liver and brain.\\n* **Drug Interactions:** Aspirin can interact with other medications, especially other blood thinners (like warfarin, clopidogrel), NSAIDs (like ibuprofen, naproxen), and some antidepressants.\\n* **Alcohol:** Drinking alcohol while taking daily aspirin can increase your risk of stomach bleeding.\\n\\n> **Disclaimer:** This information is for educational purposes only and is not a substitute for professional medical advice. The decision to start, stop, or continue taking aspirin should only be made in consultation with a qualified healthcare provider who can assess your individual health risks and benefits.\"}, \"finish_reason\": \"stop\"}], \"usage\": {\"completion_tokens\": 2362, \"prompt_tokens\": 30, \"total_tokens\": 2392, \"prompt_tokens_details\": {\"cached_tokens\": 0}, \"completion_tokens_details\": {\"reasoning_tokens\": 1367}}}}43.8727362026-02-04 15:37:07.9352252026-02-04 15:37:07.935225
\n", + "
\n", + " \n", + "
\n", + "

Table: evaluation_result

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| submission_id | run_id | repetition_count | metric | aggregating_value | metric_result | error | created_at | updated_at |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7516fcd4de96441a8ae9e8f68bdb3f5c | 83d31ab3552644d2bccddc73d9c9f30d | 1 | \"Pointwise Conciseness\" | 4.0 | {\"explanation\": \"The response is thorough and clearly structured, providing the benefits and side effects of both rivastigmine and common OTC sleep medicines, followed by a detailed explanation of their interaction. However, the response includes some redundant information (e.g., repeating the mechanism of action for both drugs, restating the risks in multiple sections) and uses more words than necessary to convey the essential points. While the information is accurate and helpful, the response could be made more concise by removing some repetition and condensing explanations. Therefore, it is mostly concise but not highly concise.\", \"rating\": 4} | None | 2026-02-04 15:51:18.669813 | 2026-02-04 15:51:18.669817 |
| fbe9fd3eaa1945bfa972045930458614 | 83d31ab3552644d2bccddc73d9c9f30d | 1 | \"Pointwise Conciseness\" | 4.0 | {\"explanation\": \"The response provides a thorough and well-organized explanation of how Valium affects the brain, including both benefits and side effects. It avoids excessive repetition and presents information in a clear, consumer-friendly manner. However, there is some introductory and cautionary language (e.g., the medical disclaimer and the 'clear, consumer-friendly format' statement) that, while helpful, is not strictly necessary for conciseness. The main content is detailed but not overly verbose, and each point is relevant to the user's question. There is some slight elaboration (such as analogies and expanded explanations) that could be trimmed for maximum conciseness, but overall, the response is mostly concise and covers the essential information without significant superfluous content.\", \"rating\": 4} | None | 2026-02-04 15:51:18.669818 | 2026-02-04 15:51:18.669818 |
| c64e5a8aef454b1f864d90611e2c1a9b | 83d31ab3552644d2bccddc73d9c9f30d | 1 | \"Pointwise Conciseness\" | 4.0 | {\"explanation\": \"The response provides a thorough and well-organized list of both benefits and side effects of morphine, clearly separated into categories. While the information is comprehensive and relevant, there is some redundancy and elaboration that could be condensed. For example, some side effects are explained in detail with examples or additional warnings, which, while helpful, add to the length. The response could be more succinct by listing the benefits and side effects without the extra explanatory sentences and examples. Therefore, it is mostly concise but not highly concise.\", \"rating\": 4} | None | 2026-02-04 15:51:18.669818 | 2026-02-04 15:51:18.669821 |
| 97a2a28dcc9b49029b3101c342461c8f | 83d31ab3552644d2bccddc73d9c9f30d | 1 | \"Pointwise Conciseness\" | 3.0 | {\"explanation\": \"The response provides a thorough overview of oxycodone's benefits, side effects, and important warnings, including dosage information for different formulations. However, it contains a significant amount of extra information, such as repeated medical disclaimers, detailed warnings, and explanations about how the drug works, which, while helpful, go beyond the essential request to list benefits and side effects. The response could be made more concise by focusing strictly on the benefits and side effects, with a brief mention of dosage forms if relevant. The inclusion of multiple warnings and disclaimers, while important for safety, adds to the length and reduces conciseness. Therefore, the response is somewhat concise but includes some unnecessary and slightly redundant information.\", \"rating\": 3} | None | 2026-02-04 15:51:18.669821 | 2026-02-04 15:51:18.669822 |
| a14b6ad38ea7484b9b48a486a6d1b7d7 | 83d31ab3552644d2bccddc73d9c9f30d | 1 | \"Pointwise Conciseness\" | 4.0 | {\"explanation\": \"The response is thorough and provides all the essential information regarding the benefits and side effects of low-dose (81 mg) enteric-coated aspirin. It also clarifies the user's question, which is helpful for understanding. However, the response is quite lengthy and includes some extended explanations, such as detailed symptom lists for side effects, background on enteric coating, and a disclaimer. While these details are informative, they go beyond the essential information requested (a list of benefits and side effects). The response could be made more concise by summarizing or omitting some of the explanatory content and focusing more directly on the requested lists. Therefore, it is mostly concise but not highly concise.\", \"rating\": 4} | None | 2026-02-04 15:51:18.669823 | 2026-02-04 15:51:18.669823 |
\n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "# View the results from the SQLite database in tabular format\n", + "import sqlite3\n", + "import pandas as pd\n", + "from IPython.display import display, HTML\n", + "\n", + "# Path to your SQLite database file\n", + "db_file = 'results-new/results.db'\n", + "\n", + "connection = sqlite3.connect(db_file)\n", + "\n", + "# Specify the table names you want to display\n", + "table_names = ['run', 'configuration', 'submission', 'submission_result', 'evaluation_result']\n", + "\n", + "# Create the CSS and HTML container\n", + "html_content = \"\"\"\n", + "\n", + "
\n", + "\"\"\"\n", + "\n", + "for table_name in table_names:\n", + " query = f\"SELECT * FROM {table_name};\"\n", + " df = pd.read_sql_query(query, connection)\n", + " # If you want to see all the rows across all tables, remove/comment the next line\n", + " df = df.head(5) # Limiting the number of rows displayed\n", + " table_html = df.to_html(classes='table-container', index=False)\n", + " html_content += f\"\"\"\n", + "
\n", + "

Table: {table_name}

\n", + " {table_html}\n", + "
\n", + " \"\"\"\n", + "\n", + "html_content += \"
\"\n", + "\n", + "display(HTML(html_content))\n", + "\n", + "# Close the connection\n", + "connection.close()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "\n", + "
\n", + "\n", + "
\n", + "

Categorical Comparison

\n", + "

Values: Weighted Average (1-5 scale). Win Rate based on head-to-head performance.

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| run_id | run_name | model | Win Rate | Final Rank | Pointwise Conciseness | Pointwise Instruction Following | Pointwise Correctness | Pointwise Answer Relevance |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2663332bb4da43c089217193cbae88ce | Run-prompt-registry-eval-demo-gpt-4o-2024-08-06 | gpt-4o | 0.750 | 1 | 4.449 | 4.9184 | 4.8367 | 4.6735 |
| 83d31ab3552644d2bccddc73d9c9f30d | Run-prompt-registry-eval-demo-gemini-2.5-pro-001 | gemini-2.5-pro | 0.375 | 2 | 3.449 | 4.8980 | 4.7755 | 4.6735 |
| a5be2752cae64582922f96b80c890dc8 | Run-prompt-registry-eval-demo-gpt-5-2025-08-07 | gpt-5 | 0.125 | 3 | 4.449 | 4.6735 | 4.4898 | 4.5306 |
\n", + "
\n", + " \n", + "
\n", + "

Boolean Comparison

\n", + "

Values: Pass Rate (0-1 scale). Win Rate based on head-to-head performance.

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| run_id | run_name | model | Win Rate | Final Rank | Exact Match | Content Filter on Input | Content Filter on Output |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 83d31ab3552644d2bccddc73d9c9f30d | Run-prompt-registry-eval-demo-gemini-2.5-pro-001 | gemini-2.5-pro | 0.0 | 1 | 0.0 | 0.0 | 0.0 |
| 2663332bb4da43c089217193cbae88ce | Run-prompt-registry-eval-demo-gpt-4o-2024-08-06 | gpt-4o | 0.0 | 1 | 0.0 | 0.0 | 0.0 |
| a5be2752cae64582922f96b80c890dc8 | Run-prompt-registry-eval-demo-gpt-5-2025-08-07 | gpt-5 | 0.0 | 1 | 0.0 | 0.0 | 0.0 |
\n", + "
\n", + " \n", + "
\n", + "

Numerical Comparison

\n", + "

Values: Mean Value. Win Rate based on head-to-head performance.

\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
| run_id | run_name | model | Win Rate | Final Rank | BLEU | ROUGE |
| --- | --- | --- | --- | --- | --- | --- |
| 2663332bb4da43c089217193cbae88ce | Run-prompt-registry-eval-demo-gpt-4o-2024-08-06 | gpt-4o | 1.00 | 1 | 0.0042 | 0.1079 |
| 83d31ab3552644d2bccddc73d9c9f30d | Run-prompt-registry-eval-demo-gemini-2.5-pro-001 | gemini-2.5-pro | 0.25 | 2 | 0.0030 | 0.0794 |
| a5be2752cae64582922f96b80c890dc8 | Run-prompt-registry-eval-demo-gpt-5-2025-08-07 | gpt-5 | 0.25 | 2 | 0.0021 | 0.0930 |
\n", + "
\n", + "
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import sqlite3\n", + "import json\n", + "import os\n", + "from IPython.display import display, HTML\n", + "\n", + "# ==========================================\n", + "# 1. CONFIGURATION (Separated Groups)\n", + "# ==========================================\n", + "METRIC_GROUPS = {\n", + " \"Categorical\": {\n", + " \"type\": \"categorical\",\n", + " \"description\": \"Weighted Average (1-5 scale)\",\n", + " \"metrics\": [\n", + " \"Pointwise Conciseness\", \n", + " \"Pointwise Instruction Following\", \n", + " \"Pointwise Correctness\", \n", + " \"Pointwise Answer Relevance\"\n", + " ]\n", + " },\n", + " \"Boolean\": {\n", + " \"type\": \"categorical\", # Uses same weighted avg logic (0 or 1)\n", + " \"description\": \"Pass Rate (0-1 scale)\",\n", + " \"metrics\": [\n", + " \"Exact Match\",\n", + " \"Content Filter on Input\",\n", + " \"Content Filter on Output\",\n", + " \"Language Match\",\n", + " \"JSON Schema Match\"\n", + " ]\n", + " },\n", + " \"Numerical\": {\n", + " \"type\": \"numerical\",\n", + " \"description\": \"Mean Value\",\n", + " \"metrics\": [\n", + " \"BLEU\", \n", + " \"ROUGE\", \n", + " \"BERT Score\",\n", + " \"test-metric\"\n", + " ]\n", + " }\n", + "}\n", + "\n", + "# ==========================================\n", + "# 2. 
DATA EXTRACTION\n", + "# ==========================================\n", + "def extract_db_metadata(db_path):\n", + " if not os.path.exists(db_path): return pd.DataFrame()\n", + " conn = sqlite3.connect(db_path)\n", + " df_runs = pd.read_sql_query(\"SELECT id, name, tags, config FROM run\", conn)\n", + " conn.close()\n", + " \n", + " meta_data = []\n", + " for _, row in df_runs.iterrows():\n", + " run_id = str(row[\"id\"])\n", + " run_name = str(row[\"name\"])\n", + " tags = {}\n", + " config = {}\n", + " try: tags = json.loads(row[\"tags\"]) if isinstance(row[\"tags\"], str) else row[\"tags\"]\n", + " except: pass\n", + " try: config = json.loads(row[\"config\"]) if isinstance(row[\"config\"], str) else row[\"config\"]\n", + " except: pass\n", + "\n", + " model = \"Unknown\"\n", + " try: model = config[\"modules\"][\"prompt_templating\"][\"model\"][\"name\"]\n", + " except:\n", + " if isinstance(tags, dict): model = tags.get(\"evaluation.ai.sap.com/model\", \"Unknown\")\n", + " elif isinstance(tags, list):\n", + " for t in tags: \n", + " if t.get(\"key\") == \"evaluation.ai.sap.com/model\": model = t.get(\"value\")\n", + "\n", + " meta_data.append({\"run_id\": run_id, \"run_name\": run_name, \"model\": model})\n", + " return pd.DataFrame(meta_data)\n", + "\n", + "def extract_api_metrics(runs_data_resource):\n", + " flat_data = []\n", + " for run in runs_data_resource:\n", + " model = \"Unknown\"\n", + " for t in run.get(\"tags\", []):\n", + " if t.get(\"name\") == \"evaluation.ai.sap.com/model\":\n", + " model = t.get(\"value\")\n", + " break\n", + " for m in run.get(\"metrics\", []):\n", + " clean_name = m.get(\"name\", \"\").replace('\"', '').strip()\n", + " flat_data.append({\n", + " \"model\": model,\n", + " \"metrics_name_clean\": clean_name,\n", + " \"metric_value\": m.get(\"value\")\n", + " })\n", + " df = pd.DataFrame(flat_data)\n", + " df['metric_value'] = pd.to_numeric(df['metric_value'], errors='coerce')\n", + " return df\n", + "\n", + "# 
==========================================\n", + "# 3. SCORING & HELM LOGIC\n", + "# ==========================================\n", + "def calculate_weighted_avg_score(row, cols):\n", + " \"\"\" Returns a score based on counts. \n", + " Categorical: 1-5 scale. \n", + " Boolean: 0-1 scale (Pass Rate). \n", + " \"\"\"\n", + " total_score = 0\n", + " total_count = 0\n", + " # Check counts 0-5 (covers Boolean 0/1 and Categorical 1-5)\n", + " for rating in range(0, 6):\n", + " col_name = next((c for c in cols if f\"/{rating}/count\" in c), None)\n", + " if col_name and not pd.isna(row[col_name]):\n", + " count = row[col_name]\n", + " total_score += count * rating\n", + " total_count += count\n", + " return total_score / total_count if total_count > 0 else 0.0\n", + "\n", + "def get_metric_score_series(df_metrics, metric_name, group_type):\n", + " \"\"\" Returns a Series of SCORES (Scalar) for each model for a specific metric \"\"\"\n", + " subset = df_metrics[df_metrics['metrics_name_clean'].str.startswith(metric_name)]\n", + " if subset.empty: return None\n", + "\n", + " # Pivot to get columns for this metric\n", + " pivot = subset.pivot_table(index='model', columns='metrics_name_clean', values='metric_value', aggfunc='first')\n", + " cols = pivot.columns.tolist()\n", + " \n", + " if group_type == \"categorical\":\n", + " # Calculate Weighted Average (or Pass Rate for Boolean)\n", + " return pivot.apply(lambda row: calculate_weighted_avg_score(row, cols), axis=1)\n", + " else:\n", + " # Calculate Mean (Numerical)\n", + " c_mean = next((c for c in cols if \"mean\" in c), None)\n", + " if c_mean: return pivot[c_mean]\n", + " return None\n", + "\n", + "def calculate_group_win_rate(score_table):\n", + " \"\"\"\n", + " Calculates HELM Win Rate: % of times a model beats another model across all metrics in this group.\n", + " \"\"\"\n", + " models = score_table.index.tolist()\n", + " metrics = score_table.columns.tolist()\n", + " win_rates = {}\n", + "\n", + " for model_a in 
models:\n", + " wins = 0\n", + " comparisons = 0\n", + " \n", + " for model_b in models:\n", + " if model_a == model_b: continue\n", + " \n", + " # Compare across ALL metrics in this table\n", + " for metric in metrics:\n", + " score_a = score_table.at[model_a, metric]\n", + " score_b = score_table.at[model_b, metric]\n", + " \n", + " # Only compare valid scores\n", + " if pd.isna(score_a) or pd.isna(score_b): continue\n", + " \n", + " comparisons += 1\n", + " if score_a > score_b:\n", + " wins += 1\n", + " \n", + " win_rates[model_a] = wins / comparisons if comparisons > 0 else 0.0\n", + " \n", + " return pd.Series(win_rates)\n", + "\n", + "# ==========================================\n", + "# 4. EXECUTION\n", + "# ==========================================\n", + "db_file = 'results-new/results.db'\n", + "\n", + "# A. Metadata\n", + "df_db_meta = extract_db_metadata(db_file)\n", + "df_db_unique = df_db_meta.drop_duplicates(subset=['model'], keep='last')\n", + "\n", + "# B. CSS\n", + "html_content = \"\"\"\n", + "\n", + "
\n", + "\"\"\"\n", + "if 'runs_data' in locals() and runs_data:\n", + " df_metrics_all = extract_api_metrics(runs_data['resources'])\n", + " \n", + " for group_name, config in METRIC_GROUPS.items():\n", + " \n", + " # 1. Build Score Table\n", + " score_table = pd.DataFrame(index=df_db_unique['model'].unique())\n", + " score_table.index.name = 'model'\n", + " \n", + " valid_metrics = []\n", + " \n", + " # 2. Calculate Scores\n", + " for metric in config[\"metrics\"]:\n", + " scores = get_metric_score_series(df_metrics_all, metric, config[\"type\"])\n", + " if scores is not None:\n", + " score_table[metric] = scores\n", + " valid_metrics.append(metric)\n", + " \n", + " if not valid_metrics:\n", + " continue\n", + "\n", + " # 3. Calculate HELM Win Rate (Specific to this group)\n", + " score_table['Win Rate'] = calculate_group_win_rate(score_table[valid_metrics])\n", + " \n", + " # 4. Calculate Final Rank\n", + " score_table['Final Rank'] = score_table['Win Rate'].rank(ascending=False, method='min')\n", + " \n", + " # 5. Merge & Format\n", + " df_final = pd.merge(df_db_unique, score_table, on='model', how='inner')\n", + " df_final = df_final.sort_values('Final Rank')\n", + " \n", + " # Rounding\n", + " for c in valid_metrics: df_final[c] = df_final[c].fillna(0.0).astype(float).round(4)\n", + " df_final['Win Rate'] = df_final['Win Rate'].fillna(0.0).astype(float).round(4)\n", + " df_final['Final Rank'] = df_final['Final Rank'].fillna(0).astype(int)\n", + " \n", + " # Columns\n", + " meta_cols = ['run_id', 'run_name', 'model']\n", + " final_cols = meta_cols + ['Win Rate', 'Final Rank'] + valid_metrics\n", + " \n", + " # 6. Generate HTML\n", + " table_html = df_final[final_cols].to_html(classes='table-container', index=False)\n", + " \n", + " html_content += f\"\"\"\n", + "
\n", + "

{group_name} Comparison

\n", + "

Values: {config['description']}. Win Rate based on head-to-head performance.

\n", + " {table_html}\n", + "
\n", + " \"\"\"\n", + "\n", + " html_content += \"
\"\n", + " display(HTML(html_content))\n", + " \n", + "else:\n", + " print(\"'runs_data' missing.\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Delete Execution\n", + "def delete_execution():\n", + " headers = _get_headers()\n", + " EXEC_ID = execution_id\n", + " GET_EXECUTIONS_ENDPOINT = '/v2/lm/executions/'\n", + " request_url = f\"{AICORE_BASE_URL}{GET_EXECUTIONS_ENDPOINT}{EXEC_ID}\"\n", + " try:\n", + " response = requests.delete(\n", + " request_url, headers=headers, params={\"AI-Resource-Group\": AICORE_RESOURCE_GROUP}, timeout=120\n", + " )\n", + " print(response)\n", + " if response.status_code != 202:\n", + " raise RuntimeError(f\"Unexpected status code: {response.status_code}\")\n", + " result = response.json()\n", + " print(result)\n", + " except Exception:\n", + " logging.error(\"Error occurred while attempting to delete the Execution\")\n", + " raise\n", + " \n", + "delete_execution()" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.13" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/tutorials/ai-core-genaihub-evaluation/requirements.txt b/tutorials/ai-core-genaihub-evaluation/requirements.txt index 547492540..c63e2f289 100644 --- a/tutorials/ai-core-genaihub-evaluation/requirements.txt +++ b/tutorials/ai-core-genaihub-evaluation/requirements.txt @@ -1,4 +1,4 @@ -sap-ai-sdk-gen +generative-ai-hub-sdk==4.4.3 python-dotenv==1.0.1 boto3==1.37.4 pandas==2.2.3 diff --git a/tutorials/ai-core-genaihub-evaluation/results-new/results.db b/tutorials/ai-core-genaihub-evaluation/results-new/results.db deleted file mode 100644 index 24d338d4c..000000000 Binary files a/tutorials/ai-core-genaihub-evaluation/results-new/results.db and 
/dev/null differ diff --git a/tutorials/ai-core-genaihub-evaluation/sample.env b/tutorials/ai-core-genaihub-evaluation/sample.env index 0cf95eaef..09eeddf3f 100644 --- a/tutorials/ai-core-genaihub-evaluation/sample.env +++ b/tutorials/ai-core-genaihub-evaluation/sample.env @@ -3,10 +3,11 @@ AICORE_CLIENT_ID= AICORE_CLIENT_SECRET=AICORE CLIENT SECRET> AICORE_AUTH_URL= AICORE_BASE_URL= -AICORE_RESOURCE_GROUP =default # AWS CREDENTIALS AWS_ACCESS_KEY= AWS_BUCKET_ID=> AWS_REGION= -AWS_SECRET_ACCESS_KEY= \ No newline at end of file +AWS_SECRET_ACCESS_KEY= +AWS_USERNAME= +AWS_HOST= diff --git a/tutorials/ai-core-genaihub-prompt-optimization/ai-core-genaihub-prompt-optimization.md b/tutorials/ai-core-genaihub-prompt-optimization/ai-core-genaihub-prompt-optimization.md index 446b664e5..8f3745217 100644 --- a/tutorials/ai-core-genaihub-prompt-optimization/ai-core-genaihub-prompt-optimization.md +++ b/tutorials/ai-core-genaihub-prompt-optimization/ai-core-genaihub-prompt-optimization.md @@ -2,8 +2,8 @@ parser: v2 auto_validation: true time: 45 -primary_tag: software-product>sap-business-technology-platform -tags: [ tutorial>beginner, topic>artificial-intelligence, topic>machine-learning, software-product>sap-business-technology-platform ] +primary_tag: software-product>sap-ai-core +tags: [ tutorial>beginner, topic>artificial-intelligence, topic>machine-learning, software-product>sap-ai-core ] author_name: Smita Naik author_profile: https://github.com/I321506 ---