
HoC 2023 Script Files

Quickstart

Run HoC2023AiGenerateWeights.py to generate the output weights (stored in files such as cached_background_effects_map.json) that are used to calculate the final effects output for the HoC 2023 activity. From the code-dot-org root directory, run:

    python apps/script/HoC2023ScriptFiles/HoC2023AiGenerateWeights.py

Before running the script, adjust your local parameters for the model currently in use. Previous iterations used spaCy and OpenAI's Ada models, and the model "vendor" may well change again in the future.

As of 01/08/2024, this script uses AWS's Titan v1 LLM through the Bedrock API. For additional background and testing resources, see the Google Drive document here: https://docs.google.com/document/d/1beDoalfB1Y7BybN82YGhuzTNos_TE5l0dX4XdKPzNdw/edit?usp=sharing

Script Outputs

The script generates/updates 3 files in apps/static/dance/ai/model:

  • cached_background_effects_map.json
  • cached_foreground_effects_map.json
  • cached_palettes_map.json

These files store association values that measure the similarity between each emoji and each output, computed from embeddings generated by an LLM. The embeddings used to generate these maps are cached in pickle files such as foreground_embeddings.pkl so that the same LLM API call is not made twice.
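The similarity computation can be sketched as follows. This is a minimal sketch that assumes cosine similarity as the measure and a nested {emoji: {output: score}} layout; neither the metric nor the exact JSON structure of files like cached_background_effects_map.json is confirmed here, and build_association_map is a hypothetical name.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_association_map(emoji_embeddings, output_embeddings):
    """Hypothetical helper: score every emoji against every output option.

    Both arguments are assumed to be dicts mapping a name to its embedding vector.
    Returns {emoji: {output: similarity}}.
    """
    return {
        emoji: {
            output: cosine_similarity(emoji_vec, output_vec)
            for output, output_vec in output_embeddings.items()
        }
        for emoji, emoji_vec in emoji_embeddings.items()
    }
```

The resulting dict can be written out with json.dump to produce a cached map file.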

Expected Runtime Behavior

At runtime, DanceAI uses the three maps to look up the scores for each output type. For each candidate output, it sums the scores for the three inputs (SUM(Input1Scores, Input2Scores, Input3Scores)) and then randomly selects one of the three highest-scoring candidates as the final palette, foreground, or background displayed to the user. The maps are stored as a local cache rather than generated at runtime, which removes the cost of querying an LLM and improves runtime performance.
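The selection step can be sketched in Python as below. It assumes a hypothetical score_map of the form {emoji: {candidate: score}} in which every emoji scores the same set of candidates; the actual DanceAI runtime code and map layout may differ.

```python
import random


def select_output(score_map, inputs, top_n=3, rng=random):
    """Hypothetical sketch: sum each candidate's scores across the input emojis,
    then pick one of the top_n highest totals uniformly at random."""
    # Assumes every input emoji scores the same set of candidates.
    candidates = score_map[inputs[0]].keys()
    totals = {
        candidate: sum(score_map[emoji][candidate] for emoji in inputs)
        for candidate in candidates
    }
    # Highest summed score first; keep only the top_n candidates.
    top = sorted(totals, key=totals.get, reverse=True)[:top_n]
    return rng.choice(top)
```

Passing a seeded random.Random instance as rng makes the choice reproducible, which is convenient for testing.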

Note that HoC2023AiGenerateWeights.py does NOT execute at runtime; it must be run, and the associated map values generated, PRIOR to runtime.

If the generate script is given a value that does not already exist in the maps or pickle files, it sends an HTTP request to the current LLM API (e.g. AWS Bedrock) for an embedding. This is the only step where CodeAI can be charged under the LLM's pricing structure, so take care to avoid excessive billing from verbose queries.
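The cache-first behavior can be sketched like this. get_embeddings and embed_fn are hypothetical names, and the pickle file is assumed to hold a plain dict mapping input text to its embedding vector; the real script's cache format may differ.

```python
import os
import pickle


def get_embeddings(texts, cache_path, embed_fn):
    """Hypothetical sketch: return embeddings for texts, calling embed_fn
    (the paid LLM request) only for texts missing from the pickle cache."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cache = pickle.load(f)
    # dict.fromkeys de-duplicates while preserving order, so each new text is requested once.
    missing = [t for t in dict.fromkeys(texts) if t not in cache]
    for text in missing:
        cache[text] = embed_fn(text)  # the only step that incurs LLM API charges
    if missing:
        # Persist new embeddings so later runs skip these requests.
        with open(cache_path, "wb") as f:
            pickle.dump(cache, f)
    return {t: cache[t] for t in texts}
```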

AWS Titan Setup

Review the AWS documentation at https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html, https://github.com/aws-samples/amazon-bedrock-workshop, and https://github.com/aws-samples/amazon-bedrock-workshop/blob/main/00_Intro/bedrock_boto3_setup.ipynb before starting, since you will need multiple permissions before you can query Bedrock for embeddings. If you do not already have AWS tokens and permissions set up in your console, additional tokens or roles may be required to run the script in your local environment.
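Once model access is granted, a single Titan embedding request through boto3 might look like the sketch below. It assumes the bedrock-runtime client and the amazon.titan-embed-text-v1 model ID; confirm the model ID and region available to your account. titan_embedding is a hypothetical helper, and the client is passed in so the function can be exercised without AWS credentials.

```python
import json

# Assumed model ID for Titan Embeddings v1; verify against your Bedrock model access.
TITAN_EMBED_MODEL_ID = "amazon.titan-embed-text-v1"


def titan_embedding(client, text):
    """Hypothetical helper: request one embedding vector from Titan via Bedrock.

    client is expected to be a boto3 "bedrock-runtime" client, e.g.
    boto3.client("bedrock-runtime", region_name="us-east-1")  # region is an assumption
    """
    response = client.invoke_model(
        modelId=TITAN_EMBED_MODEL_ID,
        body=json.dumps({"inputText": text}),
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a stream containing JSON with an "embedding" list.
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

Each call is billed, so in practice it would sit behind a cache lookup rather than being called directly for every input.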

Lastly, reach out to the Infrastructure team for best practices on the IAM permissions and role creation you will need for Bedrock model access. Previously, we used the recommended SageMaker service and role to simplify permissions, but better options (for both cost and organization) may have since been identified.