Add OpenAI-Compatible APIs server#834

Closed
tsdocode wants to merge 13 commits into haotian-liu:main from tsdocode:feat/openai-api

Conversation

@tsdocode

Add OpenAI-Compatible API:

OpenAI SDK version: 1.3.3

Request format follows the GPT-4V documentation:

messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],

Support endpoints:

[Screenshot: list of supported endpoints]

Example:

from openai import OpenAI
import openai

api_key = ""
base_url = "http://localhost:8000/api/v1" 


client = OpenAI(
    api_key=api_key,
    base_url=base_url
)


response = client.chat.completions.create(
  model="llava-v1.5-7b",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What’s in this image?"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=300,
  stream=True
)

# With stream=True the response is an iterator of chunks:
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

@haotian-liu
Owner

This is a great feature to have! Is it still WIP or ready for review?

@tsdocode
Author

This PR is currently a work in progress!
One question: I can't find an official implementation for multi-image inference. My current approach is to concatenate the images horizontally. Is there a better way to handle this, or should we wait for the next pretrained model with better multi-image support?

@haotian-liu
Owner

Hi @tsdocode

I guess we can leave the multiple image inference for now, as the current model is not trained with that and the performance is not optimal. Thanks!

@RobitYadda

@tsdocode This feature is really great and I'd like to try it out. Any updates on its progress?

@tsdocode
Author

tsdocode commented Dec 1, 2023

@RobitYadda working on it; this will be ready for review soon!

@tsdocode
Author

tsdocode commented Dec 2, 2023

Hi @haotian-liu, this PR is ready for review now. I have made some changes:

1. Converting many-to-one GPT-4V messages to LLaVA format:

To my knowledge, LLaVA currently only accepts 1:1 message turns between the bot and the human, with at most one image per turn, while GPT-4V can handle multiple text and image parts in a single user message. The current logic to convert the GPT-4V format to LLaVA is:

Assume request messages from openai sdk contain:

[Text_1, Image_1, Image_2, Text_2, Image_3, Text_3]

It will be transformed into:

[
(Text_1, Image_1),
(_, Image_2),
(Text_2, Image_3),
(Text_3, _)
]
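The pairing step above can be sketched as follows; function and variable names are illustrative, not the PR's actual code:

```python
def pair_gpt4v_content(items):
    """Pair GPT-4V content parts into (text, image) turns for LLaVA.

    `items` is a list of ("text", value) or ("image", value) tuples.
    Each image consumes the most recent unattached text; images with no
    pending text get (None, image), and a trailing text gets (text, None).
    """
    pairs = []
    pending_text = None
    for kind, value in items:
        if kind == "text":
            if pending_text is not None:
                # Two texts in a row: flush the first one without an image.
                pairs.append((pending_text, None))
            pending_text = value
        else:  # "image"
            pairs.append((pending_text, value))
            pending_text = None
    if pending_text is not None:
        pairs.append((pending_text, None))
    return pairs

items = [("text", "Text_1"), ("image", "Image_1"), ("image", "Image_2"),
         ("text", "Text_2"), ("image", "Image_3"), ("text", "Text_3")]
print(pair_gpt4v_content(items))
# → [('Text_1', 'Image_1'), (None, 'Image_2'), ('Text_2', 'Image_3'), ('Text_3', None)]
```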

2. Multiple image inference:

Since the current model is not optimized for multi-image prediction, I added a workaround (concatenation) that may help the model understand multiple images at once. There are three options: horizontal, vertical, and none. The concatenated output image looks like this:

[Image: example of a concatenated output image]
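The concatenation trick can be sketched with Pillow roughly like this; the function name and option strings are assumptions for illustration, not the PR's actual implementation:

```python
from PIL import Image

def concat_images(images, mode="horizontal"):
    """Concatenate PIL images side by side ("horizontal") or stacked ("vertical")."""
    if mode == "horizontal":
        width = sum(im.width for im in images)
        height = max(im.height for im in images)
        canvas = Image.new("RGB", (width, height))
        x = 0
        for im in images:
            canvas.paste(im, (x, 0))
            x += im.width
    else:  # "vertical"
        width = max(im.width for im in images)
        height = sum(im.height for im in images)
        canvas = Image.new("RGB", (width, height))
        y = 0
        for im in images:
            canvas.paste(im, (0, y))
            y += im.height
    return canvas

a = Image.new("RGB", (100, 80))
b = Image.new("RGB", (120, 60))
print(concat_images([a, b], "horizontal").size)  # → (220, 80)
print(concat_images([a, b], "vertical").size)    # → (120, 140)
```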

Usage in openai sdk:

from openai import OpenAI
import openai

api_key = ""
base_url = "http://localhost:8001/api/v1"


client = OpenAI(
    api_key=api_key,
    base_url=base_url
)


response = client.chat.completions.create(
  model="llava-v1.5-7b",
  messages=[
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Compare these images"},
        {
          "type": "image_url",
          "image_url": {
            "url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTBCh78hHqK9wbpQ3qnGXj9CKo4a-ZSZxYKlMHSt1w3zg&s",
          },
        },
          {
          "type": "image_url",
          "image_url": {
            "url": "https://t4.ftcdn.net/jpg/00/97/58/97/360_F_97589769_t45CqXyzjz0KXwoBZT9PRaWGHRk5hQqQ.jpg",
          },
        },
      ],
    }
  ],
  max_tokens=500,
  stream=True,
  # extra_body={
  #    "concat": "vertical"
  # }
)

# With stream=True the response is an iterator of chunks:
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")

Experiment

  • Question: Compare these images

  • Images: [the two images from the request above: a dog photo and a cat photo]

  • Result:

  • No concat:
    The image showcases a brown and white cat sitting in a grassy field. The cat has a white face and is looking at the camera. The field appears to be covered in yellow leaves, giving a warm and natural atmosphere. The cat appears to be enjoying the outdoors, possibly playing or exploring the area.

  • Vertical concat:
    In the image, a dog and a cat are shown in close proximity to each other, sitting on the grass. They seem to be enjoying their time in the outdoors under the sun.

  • Horizontal concat:
    In the image, there are two pictures of dogs. One picture is of a brown and white dog laying in the grass, while the other picture is of a cat sitting on the grass. The cat appears to be looking with an intense or angry expression. The dog in the first picture is smiling and appears to be enjoying the grassy field. The two pictures are presented side-by-side, showcasing the differences between the two species.

@haotian-liu
Owner

Thank you! Before I start reviewing, I just want to quickly make sure that my understanding of the PR is correct. We'll rely on the model_worker for the actual inference, and openai_api_server will be the frontend, similar to the gradio_web_server, relaying user's requests to model_worker and its response back.

Also, we do not need the user to install the fastchat package, is that correct? (The current README makes it look like the user needs to install fastchat for serving the OpenAI API. If that is because most of the code is based on fastchat, we can, and should, give credit to them in both the README and the code, but make it clear that the llava package alone is sufficient for serving.)

Thanks!

@tsdocode
Author

tsdocode commented Dec 2, 2023

Hi @haotian-liu,

Regarding the first statement, it is correct that the openai_api_server acts as the frontend, similar to what the gradio_server does.

For the second statement, most of the implementation is based on the fastchat openai_api_server. I have also used some data models from fastchat, so users may need to install fastchat as well. Alternatively, I can re-define these data models (I think that would be the better option).

Apologies for the mistake in the README. The correct command should be:


python3 -m llava.serve.openai_api_server

Let me know if you have any further questions!

@tsdocode
Author

tsdocode commented Dec 3, 2023

Hi @haotian-liu, I have made an update to the code so that users no longer need to install fastchat separately. Now, installing llava is sufficient. I have also given credit to Fastchat in both the code and the README file.

As an appendix, the new serving architecture:

flowchart BT
    %% Declare Nodes
    gws("Gradio (UI Server)")
    openai("OpenAI (OpenAI API Server)")

    c("Controller (API Server):<br/>PORT: 10000")
    mw7b("Model Worker:<br/>llava-v1.5-7b<br/>PORT: 40000")
    mw13b("Model Worker:<br/>llava-v1.5-13b<br/>PORT: 40001")


    subgraph Demo Connections
        direction BT
        c<-->gws
        c<-->openai

        
        mw7b<-->c
        mw13b<-->c
    end
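Under this architecture, serving would look roughly like the following; the controller and model_worker commands mirror LLaVA's existing serving modules with the ports from the diagram above, and the exact flags of openai_api_server are an assumption based on this PR:

```shell
# Launch the controller (port from the diagram above)
python3 -m llava.serve.controller --host 0.0.0.0 --port 10000

# Launch a model worker and register it with the controller
python3 -m llava.serve.model_worker --host 0.0.0.0 \
    --controller http://localhost:10000 --port 40000 \
    --worker http://localhost:40000 \
    --model-path liuhaotian/llava-v1.5-7b

# Launch the OpenAI-compatible API server (command from this PR's README)
python3 -m llava.serve.openai_api_server
```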

@tailyer

tailyer commented Jan 3, 2024

@haotian-liu: I tested this PR yesterday and it worked nicely with the OpenAI docs. One thing is that the role strings for "llava-v1" are all written in capitals, e.g. USER, ASSISTANT. This made streaming responses behave weirdly. Other than that, non-streaming responses work just fine.
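A fix along the lines tailyer describes could normalize the conversation-template roles before emitting OpenAI-style chunks; a minimal sketch, with illustrative names that are not the PR's actual code:

```python
# Map LLaVA's uppercase conversation roles to the lowercase role
# names the OpenAI chat schema expects in streaming deltas.
ROLE_MAP = {"USER": "user", "ASSISTANT": "assistant", "SYSTEM": "system"}

def normalize_role(role: str) -> str:
    """Return the OpenAI-style lowercase role for a conversation role."""
    return ROLE_MAP.get(role, role.lower())

print(normalize_role("ASSISTANT"))  # → assistant
```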
