Add OpenAI-Compatible APIs server #834
Conversation
This is a great feature to have! Is it still WIP or ready for review?
This PR is currently a work in progress!
Hi @tsdocode, I guess we can leave multiple-image inference for now, as the current model is not trained for it and the performance is not optimal. Thanks!
@tsdocode This feature is really great. I want to try it out. Any updates on its progress?
@RobitYadda I'm working on it; this will be ready for review soon!
Hi @haotian-liu, this PR is ready for review now. I have made some changes:

1. Converting many-to-one GPT-4V messages for LLaVA: To my knowledge, LLaVA currently only accepts a strict 1:1 alternation between the human and the bot, with a single image, whereas a GPT-4V request can contain multiple user messages, each with several text and image parts. The current logic therefore collapses consecutive GPT-4V-format user messages into a single LLaVA turn before inference (see the sketch after this list).

2. Multiple-image inference: Since we know that the current model is not optimized for multiple-image prediction, I added a workaround (concat) that may help the model understand multiple images at once. There are three options: horizontal, vertical, and none; with horizontal or vertical, the input images are stitched into one image along the chosen axis before being passed to the model (a sketch follows the usage example below).
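Here is a minimal sketch of the conversion in point 1, assuming a simple (role, text) turn format; the function name and details are illustrative, not the PR's actual implementation:

```python
# Hypothetical sketch: collapse GPT-4V-style messages (whose user content
# may be a list of text/image_url parts) into LLaVA's 1:1 turn format.
def gpt4v_to_llava(messages):
    turns, images, pending = [], [], []
    for msg in messages:
        if msg["role"] == "user":
            content = msg["content"]
            parts = [{"type": "text", "text": content}] if isinstance(content, str) else content
            for part in parts:
                if part["type"] == "text":
                    pending.append(part["text"])
                elif part["type"] == "image_url":
                    images.append(part["image_url"]["url"])
                    pending.append("<image>")  # placeholder token for LLaVA
        else:
            # An assistant reply closes the accumulated human turn, restoring
            # the strict human/assistant alternation LLaVA expects.
            turns.append(("human", "\n".join(pending)))
            turns.append(("assistant", msg["content"]))
            pending = []
    if pending:
        turns.append(("human", "\n".join(pending)))
    return turns, images
```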
Usage in the OpenAI SDK:

```python
from openai import OpenAI
api_key = ""
base_url = "http://localhost:8001/api/v1"

client = OpenAI(
    api_key=api_key,
    base_url=base_url,
)

response = client.chat.completions.create(
    model="llava-v1.5-7b",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Compare these images"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTBCh78hHqK9wbpQ3qnGXj9CKo4a-ZSZxYKlMHSt1w3zg&s",
                    },
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://t4.ftcdn.net/jpg/00/97/58/97/360_F_97589769_t45CqXyzjz0KXwoBZT9PRaWGHRk5hQqQ.jpg",
                    },
                },
            ],
        }
    ],
    max_tokens=500,
    stream=True,
    # extra_body={
    #     "concat": "vertical"
    # },
)
# print(response.choices[0])
```
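A minimal sketch of the concat workaround from point 2, using Pillow; the helper name and the white padding are assumptions, not the PR's exact code:

```python
# Illustrative horizontal/vertical concatenation of PIL images.
from PIL import Image

def concat_images(images, mode="horizontal"):
    if mode == "horizontal":
        canvas = Image.new(
            "RGB",
            (sum(im.width for im in images), max(im.height for im in images)),
            (255, 255, 255),
        )
        x = 0
        for im in images:
            canvas.paste(im, (x, 0))
            x += im.width
    else:  # "vertical"
        canvas = Image.new(
            "RGB",
            (max(im.width for im in images), sum(im.height for im in images)),
            (255, 255, 255),
        )
        y = 0
        for im in images:
            canvas.paste(im, (0, y))
            y += im.height
    return canvas
```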
Experiment
Thank you! Before I start reviewing, I just want to quickly make sure that my understanding of the PR is correct. We'll rely on the openai_api_server as a frontend, similar to the Gradio web server? Also, we do not need the user to install fastchat? Thanks!
Hi @haotian-liu,

Regarding the first statement, it is correct that the openai_api_server acts as the frontend, similar to what the gradio_server does.

For the second statement, most of the implementation is based on FastChat's openai_api_server. I have also reused some data models from FastChat, so users may need to install fastchat as well. Alternatively, I can re-define these data models here (I think that would be the better option).

Apologies for the mistake in the README. The correct command should be:

Let me know if you have any further questions!
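For reference, the FastChat-style data models mentioned above look roughly like this (a minimal sketch assuming Pydantic, which FastChat uses; field names follow the OpenAI chat-completions schema, and this is not the PR's exact code):

```python
# Sketch of OpenAI-compatible protocol models in the style of FastChat's
# openai_api_protocol; trimmed and illustrative, not the PR's definitions.
from typing import Any, Dict, List, Optional, Union
from pydantic import BaseModel

class ChatCompletionRequest(BaseModel):
    model: str
    messages: Union[str, List[Dict[str, Any]]]
    temperature: Optional[float] = 0.7
    top_p: Optional[float] = 1.0
    max_tokens: Optional[int] = None
    stream: Optional[bool] = False

class ChatMessage(BaseModel):
    role: str
    content: str

class ChatCompletionResponseChoice(BaseModel):
    index: int
    message: ChatMessage
    finish_reason: Optional[str] = None
```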
Hi @haotian-liu, I have made an update to the code so that users no longer need to install fastchat separately; installing llava is now sufficient. I have also credited FastChat in both the code and the README file. As an appendix, here is the new serving architecture:

```mermaid
flowchart BT
%% Declare Nodes
gws("Gradio (UI Server)")
openai("OpenAI (OpenAI API Server)")
c("Controller (API Server):<br/>PORT: 10000")
mw7b("Model Worker:<br/>llava-v1.5-7b<br/>PORT: 40000")
mw13b("Model Worker:<br/>llava-v1.5-13b<br/>PORT: 40001")
%% Declare Styles
classDef data fill:#3af,stroke:#48a,stroke-width:2px,color:#444
classDef success fill:#8f8,stroke:#0a0,stroke-width:2px,color:#444
classDef failure fill:#f88,stroke:#f00,stroke-width:2px,color:#444
%% Assign Styles
class id,od data;
class cimg,cs_s,scsim_s success;
class ncimg,cs_f,scsim_f failure;
subgraph Demo Connections
direction BT
c<-->gws
c<-->openai
mw7b<-->c
mw13b<-->c
end
```
@haotian-liu: I tested this PR yesterday and it worked nicely with the OpenAI docs. One thing: the role strings for "llava-v1" are all capitalized, e.g. USER and ASSISTANT, which made the streaming response behave weirdly. Besides that, the non-streaming response works just fine.
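For context, here is how the streamed response from the usage example would typically be consumed with OpenAI SDK 1.x (standard SDK usage, reusing the client defined above; not code from this PR):

```python
# Iterate over streamed chunks; each chunk carries an incremental delta.
stream = client.chat.completions.create(
    model="llava-v1.5-7b",
    messages=[{"role": "user", "content": "Describe the image."}],
    max_tokens=500,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
```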


Add OpenAI-compatible API:
- OpenAI SDK version: 1.3.3
- Request format follows the GPT-4V documentation
- Supported endpoints:
- Example:
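A minimal non-streaming call against the chat-completions route exercised above (base URL and model name taken from the usage example; an illustrative sketch, not the PR's full endpoint list):

```python
from openai import OpenAI

client = OpenAI(api_key="", base_url="http://localhost:8001/api/v1")
response = client.chat.completions.create(
    model="llava-v1.5-7b",
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=50,
)
print(response.choices[0].message.content)
```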