Add OpenAI-Compatible APIs server#834

Closed
tsdocode wants to merge 13 commits into haotian-liu:main from tsdocode:feat/openai-api

Conversation

@tsdocode

Add OpenAI-Compatible API:

OpenAI SDK version: 1.3.3

Request format follows the GPT-4V documentation:

messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],

Support endpoints:

[Screenshot: list of supported endpoints]

Example:

from openai import OpenAI
import openai

api_key = ""
base_url = "http://localhost:8000/api/v1" 


client = OpenAI(
    api_key=api_key,
    base_url=base_url
)


response = client.chat.completions.create(
  model="llava-v1.5-7b",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
  stream=True
)

# With stream=True the response is an iterator of chunks:
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

@haotian-liu
Owner

This is a great feature to have! Is it still WIP or ready for review?

@tsdocode
Author

This PR is currently a work in progress!
One question: I can't find an official implementation for multi-image inference. My current approach is to concatenate the images horizontally. Is there a better way to handle this, or should we wait for the next pretrained model with better multi-image support?

@haotian-liu
Owner

Hi @tsdocode

I guess we can leave the multiple image inference for now, as the current model is not trained with that and the performance is not optimal. Thanks!

@RobitYadda

@tsdocode This feature is really great and I'd like to try it out. Any updates on its progress?

@tsdocode
Author

tsdocode commented Dec 1, 2023

@RobitYadda working on it; this will be ready for review soon!

@tsdocode
Author

tsdocode commented Dec 2, 2023

Hi @haotian-liu, this PR is ready for review now. I have made some changes:

1. Converting many-to-one GPT-4V messages to LLaVA format:

To my knowledge, LLaVA currently only accepts 1:1 message turns between the bot and the human, with at most one image per turn, while GPT-4V can handle multiple text and image parts in a single user message. The current logic to convert the GPT-4V format to LLaVA is:

Assume request messages from openai sdk contain:

[Text_1, Image_1, Image_2, Text_2, Image_3, Text_3]

It will be transformed into:

[
(Text_1, Image_1),
(_, Image_2),
(Text_2, Image_3),
(Text_3, _)
]
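The pairing step above can be sketched as follows; function and variable names are illustrative, not the PR's actual code:

```python
def pair_gpt4v_content(items):
    """Pair GPT-4V content parts into (text, image) turns for LLaVA.

    `items` is a list of ("text", value) or ("image", value) tuples.
    Each image consumes the most recent unattached text; images with no
    pending text get (None, image), and a trailing text gets (text, None).
    """
    pairs = []
    pending_text = None
    for kind, value in items:
        if kind == "text":
            if pending_text is not None:
                # Two texts in a row: flush the first one without an image.
                pairs.append((pending_text, None))
            pending_text = value
        else:  # "image"
            pairs.append((pending_text, value))
            pending_text = None
    if pending_text is not None:
        pairs.append((pending_text, None))
    return pairs

items = [("text", "Text_1"), ("image", "Image_1"), ("image", "Image_2"),
         ("text", "Text_2"), ("image", "Image_3"), ("text", "Text_3")]
print(pair_gpt4v_content(items))
# → [('Text_1', 'Image_1'), (None, 'Image_2'), ('Text_2', 'Image_3'), ('Text_3', None)]
```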

2. Multiple image inference:

Since the current model is not optimized for multi-image prediction, I added a workaround (concatenation) that may help the model understand multiple images at once. There are three options: horizontal, vertical, and none. The concatenated output image looks like this:

[Image: example of a concatenated output image]
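The concatenation trick can be sketched with Pillow roughly like this; the function name and option strings are assumptions for illustration, not the PR's actual implementation:

```python
from PIL import Image

def concat_images(images, mode="horizontal"):
    """Concatenate PIL images side by side ("horizontal") or stacked ("vertical")."""
    if mode == "horizontal":
        width = sum(im.width for im in images)
        height = max(im.height for im in images)
        canvas = Image.new("RGB", (width, height))
        x = 0
        for im in images:
            canvas.paste(im, (x, 0))
            x += im.width
    else:  # "vertical"
        width = max(im.width for im in images)
        height = sum(im.height for im in images)
        canvas = Image.new("RGB", (width, height))
        y = 0
        for im in images:
            canvas.paste(im, (0, y))
            y += im.height
    return canvas

a = Image.new("RGB", (100, 80))
b = Image.new("RGB", (120, 60))
print(concat_images([a, b], "horizontal").size)  # → (220, 80)
print(concat_images([a, b], "vertical").size)    # → (120, 140)
```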

Usage in openai sdk:

from openai import OpenAI
import openai

api_key = ""
base_url = "http://localhost:8001/api/v1"


client = OpenAI(
    api_key=api_key,
    base_url=base_url
)


response = client.chat.completions.create(
  model="llava-v1.5-7b",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Compare these images"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTBCh78hHqK9wbpQ3qnGXj9CKo4a-ZSZxYKlMHSt1w3zg&s",
          },
        },
          {
          "type": "image_url",
          "image_url": {
            "url": "https://t4.ftcdn.net/jpg/00/97/58/97/360_F_97589769_t45CqXyzjz0KXwoBZT9PRaWGHRk5hQqQ.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=500,
  stream=True,
  # extra_body={
  #    "concat": "vertical"
  # }
)

# With stream=True the response is an iterator of chunks:
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Experiment

  • Question: Compare these images

  • Images: [the two images from the request above: a dog photo and a cat photo]

  • Result:

  • No concat:
    The image showcases a brown and white cat sitting in a grassy field. The cat has a white face and is looking at the camera. The field appears to be covered in yellow leaves, giving a warm and natural atmosphere. The cat appears to be enjoying the outdoors, possibly playing or exploring the area.

  • Vertical concat:
    In the image, a dog and a cat are shown in close proximity to each other, sitting on the grass. They seem to be enjoying their time in the outdoors under the sun.

  • Horizontal concat:
    In the image, there are two pictures of dogs. One picture is of a brown and white dog laying in the grass, while the other picture is of a cat sitting on the grass. The cat appears to be looking with an intense or angry expression. The dog in the first picture is smiling and appears to be enjoying the grassy field. The two pictures are presented side-by-side, showcasing the differences between the two species.

@haotian-liu
Owner

Thank you! Before I start reviewing, I just want to quickly make sure that my understanding of the PR is correct. We'll rely on the model_worker for the actual inference, and openai_api_server will be the frontend, similar to the gradio_web_server, relaying user's requests to model_worker and its response back.

Also, we do not need the user to install the fastchat package, is that correct? (The current README makes it look like the user needs to install fastchat for serving the OpenAI API. If that is because most of the code is based on fastchat, we can, and should, give credit to them in both the README and the code, but make it clear that the llava package alone is sufficient for serving.)

Thanks!

@tsdocode
Author

tsdocode commented Dec 2, 2023

Hi @haotian-liu,

Regarding the first statement, it is correct that the openai_api_server acts as the frontend, similar to what the gradio_server does.

For the second statement, most of the implementation is based on the fastchat openai_api_server. I have also used some data models from fastchat, so users may need to install fastchat as well. Alternatively, I can re-define these data models (I think that would be the better option).

Apologies for the mistake in the README. The correct command should be:


python3 -m llava.serve.openai_api_server

Let me know if you have any further questions!

@tsdocode
Author

tsdocode commented Dec 3, 2023

Hi @haotian-liu, I have made an update to the code so that users no longer need to install fastchat separately. Now, installing llava is sufficient. I have also given credit to Fastchat in both the code and the README file.

As an appendix, the new serving architecture:

flowchart BT
    %% Declare Nodes
    gws("Gradio (UI Server)")
    openai("OpenAI (OpenAI API Server)")

    c("Controller (API Server):<br/>PORT: 10000")
    mw7b("Model Worker:<br/>llava-v1.5-7b<br/>PORT: 40000")
    mw13b("Model Worker:<br/>llava-v1.5-13b<br/>PORT: 40001")


    subgraph Demo Connections
        direction BT
        c<-->gws
        c<-->openai

        
        mw7b<-->c
        mw13b<-->c
    end
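Under this architecture, serving would look roughly like the following; the controller and model_worker commands mirror LLaVA's existing serving modules with the ports from the diagram above, and the exact flags of openai_api_server are an assumption based on this PR:

```shell
# Launch the controller (port from the diagram above)
python3 -m llava.serve.controller --host 0.0.0.0 --port 10000

# Launch a model worker and register it with the controller
python3 -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 \
    --worker http://localhost:40000 \
    --model-path liuhaotian/llava-v1.5-7b

# Launch the OpenAI-compatible API server (command from this PR's README)
python3 -m llava.serve.openai_api_server
```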

@tailyer

tailyer commented Jan 3, 2024

@haotian-liu: I tested this PR yesterday and it worked nicely with the OpenAI docs. One thing is that the role strings for "llava-v1" are all written in capitals, e.g. USER, ASSISTANT. This made streaming responses behave weirdly. Other than that, non-streaming responses work just fine.
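A fix along the lines tailyer describes could normalize the conversation-template roles before emitting OpenAI-style chunks; a minimal sketch, with illustrative names that are not the PR's actual code:

```python
# Map LLaVA's uppercase conversation roles to the lowercase role
# names the OpenAI chat schema expects in streaming deltas.
ROLE_MAP = {"USER": "user", "ASSISTANT": "assistant", "SYSTEM": "system"}

def normalize_role(role: str) -> str:
    """Return the OpenAI-style lowercase role for a conversation role."""
    return ROLE_MAP.get(role, role.lower())

print(normalize_role("ASSISTANT"))  # → assistant
```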
