#1 Updating to latest repo version with LLM Monitoring metrics by juanroesel · Pull Request #1 · ZenHubHQ/llama-cpp-python

juanroesel · 2024-05-07T02:32:16Z

Closes ZenHubHQ/devops#2205

It also includes the following:

A new metric, kv_cache_usage_ratio, which measures the fraction of the KV cache currently in use.
Synced commits with the parent repo (not relevant for the PR review).
A Llama 3 8B model baked into the image.
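As a rough illustration only, assuming the ratio is defined as occupied KV cache cells divided by the total number of cells (for example, the context size n_ctx), the computation behind a metric like kv_cache_usage_ratio might be sketched as below. The helper name and its parameters are hypothetical; this is not the code merged in this PR.

```python
def kv_cache_usage_ratio(used_cells: int, total_cells: int) -> float:
    """Hypothetical helper: fraction of KV cache cells in use (0.0 to 1.0).

    Assumes total_cells is the KV cache capacity (e.g. n_ctx) and
    used_cells is the number of occupied cells; both are assumptions.
    """
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    if not 0 <= used_cells <= total_cells:
        raise ValueError("used_cells must be between 0 and total_cells")
    return used_cells / total_cells


# Example: 512 of 2048 cells occupied gives a ratio of 0.25.
print(kv_cache_usage_ratio(512, 2048))
```

Exposing a bounded ratio rather than a raw cell count keeps the value comparable across models with different context sizes, which is convenient for alerting thresholds.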

New image us.gcr.io/zenhub-ops/llama_cpp_python-llama3_8b_f16:v0.3.1 was successfully deployed to staging.

* set up streaming for v2
* assert v2 streaming, fix tool_call vs function_call
* fix streaming with tool_choice/function_call
* make functions return 1 function call only when 'auto'
* fix

Co-authored-by: Andrei <abetlen@gmail.com>

…ing space (abetlen#1375)

* Fix tokenization edge case where llama output does not start with a space
  See this notebook: https://colab.research.google.com/drive/1Ooz11nFPk19zyJdMDx42CeesU8aWZMdI#scrollTo=oKpHw5PZ30uC
* Update _internals.py
  Fixing to compare to b' ' instead of (str)' '

Co-authored-by: Andrei <abetlen@gmail.com>


juanroesel · 2024-05-07T02:40:23Z

NOTE: GH Actions need to be updated in this repo. I will create a ticket for this soon.

cwarje

LGTM

juanroesel · 2024-05-09T01:33:57Z

@m62534 @cwarje Just FYI, given today's events with Llama3, I built a new image us.gcr.io/zenhub-ops/llama_cpp_python_zh-mistral7b_f16:v0.2.1 containing these code changes plus the Mistral model and redeployed it in staging.

abetlen and others added 15 commits May 2, 2024 11:32

feat: Add llama-3-vision-alpha chat format

31b1d95

feat: Update llama.cpp

d75dea1

chore: Bump version

2117122

fix(server): Propagate flash_attn to model load. (abetlen#1424)

2138561

feat(server): Remove temperature bounds checks for server. Closes abetlen#1384

0a454be

fix: Use memmove to copy str_value kv_override. Closes abetlen#1417

9f7a855

Merge branch 'main' of github.com:abetlen/llama_cpp_python into main

f9b7221

feat: Update llama.cpp

3e2597e

feat(ci): Add docker checks and check deps more frequently (abetlen#1426)

3666833

* Update dependabot.yml
  Add github-actions update
* Update dependabot.yml
* Update dependabot.yml

feat(server): Add support for setting root_path. Closes abetlen#1420

0318702

Ported over prometheus implementation from previous repo

1972445

Added kv_cache_usage_ratio metric

edd0ec6

Merge branch 'abetlen:main' into llm-monitoring

e42c1d6

juanroesel requested review from cwarje and m62534 May 7, 2024 02:32

juanroesel requested a review from blacklander May 7, 2024 17:17

cwarje approved these changes May 7, 2024

Pulled synced commits locally and changed data type

bd84f3c

m62534 approved these changes May 8, 2024

juanroesel merged commit 8cd638c into main May 9, 2024

juanroesel merged 16 commits into main from llm-monitoring

8 participants
