
Updating to latest repo version with LLM Monitoring metrics #1

Merged: juanroesel merged 16 commits into main from llm-monitoring on May 9, 2024

@juanroesel juanroesel commented May 7, 2024

Closes ZenHubHQ/devops#2205

It also includes the following:

  • A new metric kv_cache_usage_ratio, which measures how much KV cache is being used.
  • Synced commits with the parent repo (not relevant for the PR review).
  • A Llama 3 8B model baked into the image.
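For reviewers unfamiliar with the metric, here is a minimal sketch of what a KV cache usage ratio could look like. This is an illustration under stated assumptions, not the code in this PR: the helper names `kv_cache_usage_ratio` and `render_gauge` and the `used_cells` / `total_cells` inputs are hypothetical, and the ratio is assumed to be occupied cache cells divided by total cache capacity. The output follows the Prometheus text exposition format for a gauge.

```python
def kv_cache_usage_ratio(used_cells: int, total_cells: int) -> float:
    """Fraction of KV cache cells in use, clamped to [0.0, 1.0].

    Assumption: usage is measured as occupied cells over total capacity.
    """
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return min(max(used_cells / total_cells, 0.0), 1.0)


def render_gauge(name: str, value: float) -> str:
    """Render a single gauge sample in the Prometheus text exposition format."""
    return f"# TYPE {name} gauge\n{name} {value}\n"


if __name__ == "__main__":
    # Hypothetical numbers: 512 of 2048 cache cells occupied.
    ratio = kv_cache_usage_ratio(used_cells=512, total_cells=2048)
    print(render_gauge("kv_cache_usage_ratio", ratio), end="")
```

A ratio close to 1.0 would suggest the context window is nearly full, which is the kind of signal a monitoring dashboard or alert could be built on.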

The new image us.gcr.io/zenhub-ops/llama_cpp_python-llama3_8b_f16:v0.3.1 was successfully deployed to staging.

abetlen and others added 15 commits May 2, 2024 11:32
* set up streaming for v2

* assert v2 streaming, fix tool_call vs function_call

* fix streaming with tool_choice/function_call

* make functions return 1 function call only when 'auto'

* fix

---------

Co-authored-by: Andrei <abetlen@gmail.com>
…ing space (abetlen#1375)

* Fix tokenization edge case where llama output does not start with a space

See this notebook:
https://colab.research.google.com/drive/1Ooz11nFPk19zyJdMDx42CeesU8aWZMdI#scrollTo=oKpHw5PZ30uC

* Update _internals.py

Fixing to compare to b' ' instead of (str)' '

---------

Co-authored-by: Andrei <abetlen@gmail.com>
)

* Update dependabot.yml

Add github-actions update

* Update dependabot.yml

* Update dependabot.yml
@juanroesel juanroesel requested review from cwarje and m62534 May 7, 2024 02:32
@juanroesel

NOTE: The GitHub Actions in this repo need to be updated. I will create a ticket for this soon.

@juanroesel juanroesel requested a review from blacklander May 7, 2024 17:17

@cwarje cwarje left a comment

LGTM

@juanroesel juanroesel commented May 9, 2024

@m62534 @cwarje Just FYI, given today's events with Llama3, I built a new image us.gcr.io/zenhub-ops/llama_cpp_python_zh-mistral7b_f16:v0.2.1 containing these code changes plus the Mistral model and redeployed it in staging.

@juanroesel juanroesel merged commit 8cd638c into main May 9, 2024