Add GPU tensor support to decoupled API by Tabrizian · Pull Request #154 · triton-inference-server/python_backend

Tabrizian · 2022-05-11T21:54:26Z

Add support for GPU outputs
Add support for GPU inputs
Add backend utility to collect input buffers into a single buffer
Add testing
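
The buffer-collection utility listed above is not shown in this thread. As a rough illustration of the idea only, the sketch below concatenates several host-memory chunks into one contiguous buffer. `BufferChunk` and `CollectIntoSingleBuffer` are hypothetical names, not the backend's API, and a real utility would also have to handle GPU memory, which this sketch does not attempt.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical description of one input chunk: a pointer plus its byte size.
struct BufferChunk {
  const void* data;
  std::size_t byte_size;
};

// Copies every chunk, in order, into one contiguous host buffer.
// Assumption: all chunks live in host memory.
std::vector<char>
CollectIntoSingleBuffer(const std::vector<BufferChunk>& chunks)
{
  // First pass: compute the total size so the output is allocated once.
  std::size_t total = 0;
  for (const auto& chunk : chunks) {
    total += chunk.byte_size;
  }

  // Second pass: copy each chunk at its running offset.
  std::vector<char> out(total);
  std::size_t offset = 0;
  for (const auto& chunk : chunks) {
    if (chunk.byte_size > 0) {
      std::memcpy(out.data() + offset, chunk.data, chunk.byte_size);
    }
    offset += chunk.byte_size;
  }
  return out;
}
```

Sizing the output in a first pass avoids repeated reallocation while copying.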

GuanLuo · 2022-05-13T23:48:41Z

+                  "GPU buffers size does not match the provided buffers: ") +
+              std::to_string(gpu_tensors.size()) +
+              " != " + std::to_string(*gpu_buffer_count));
+      return;


Send error?

This is more like an assert and should never be triggered. I can add a comment for it.

GuanLuo · 2022-05-13T23:50:04Z

+      // limitation in the legacy CUDA IPC API that doesn't allow getting the
+      // handle of an exported pointer. If the cuda handle exists, it indicates
+      // that the cuda shared memory was used and the input is in a single
+      // buffer. [FIXME] for the case where the input is in cuda shared memory


Newline for [FIXME]

GuanLuo · 2022-05-13T23:51:30Z

      reinterpret_cast<ResponseSendMessage*>(send_message.data_.get());
  std::unique_ptr<PbString> error_message;
-  ScopedDefer _([send_message_payload] {
+  ScopedDefer _([&send_message_payload] {


Why take a reference to a pointer?

No particular reason. I'll revert this change.

rmccorm4

Using Go-style defer is cool to see 🙂
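
For readers unfamiliar with the pattern, here is a minimal sketch of a scope guard in the spirit of `ScopedDefer`. The class `ScopedDeferSketch`, the `Payload` struct, and its `done` field are hypothetical stand-ins, not the backend's actual code. The sketch also shows why capturing a raw pointer by value is enough: the lambda copies the pointer, and the copy still refers to the same object.

```cpp
#include <functional>
#include <utility>

// Hypothetical scope guard: runs the stored task when it goes out of scope.
class ScopedDeferSketch {
 public:
  explicit ScopedDeferSketch(std::function<void()> task)
      : task_(std::move(task))
  {
  }
  ~ScopedDeferSketch()
  {
    if (task_) {
      task_();
    }
  }

  // Non-copyable so the task runs exactly once.
  ScopedDeferSketch(const ScopedDeferSketch&) = delete;
  ScopedDeferSketch& operator=(const ScopedDeferSketch&) = delete;

 private:
  std::function<void()> task_;
};

// Hypothetical stand-in for a message payload.
struct Payload {
  bool done = false;
};

// Marks the payload as done on every exit path.
void
HandleMessage(Payload* payload, bool fail_early)
{
  // The pointer is captured by value; the copy points at the same Payload.
  ScopedDeferSketch _([payload] { payload->done = true; });
  if (fail_early) {
    return;  // The deferred task still runs here.
  }
  // ... normal processing would go here ...
}
```

Because the guard fires in its destructor, early returns and the normal path are handled by the same cleanup code.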

GuanLuo · 2022-05-20T21:31:19Z

+        ++index;
+      }
+
+      // Additional round trip so that the stub can fill the GPU output buffers.


Why is there another round trip? Does the backend process signal back that there is a CUDA IPC handle and then wait for the stub process to copy the data into the memory behind that handle?

Do I summarize the workflow correctly:

1. The stub requests an output buffer (passing the tensor data via shared memory at the same time if it is a CPU tensor).

2. The backend acquires the output buffer and copies the data if it is a CPU tensor; otherwise, it passes the CUDA IPC handle back to the stub.

3. (Only for GPU tensors) The stub then copies the GPU tensor into the memory behind the CUDA IPC handle.

Exactly. The third bullet point is what requires the round trip.
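
The three steps confirmed above can be sketched as a single-process simulation. Everything here is an assumption made for illustration: `OutputRequest`, `BackendAcquireOutput`, and `StubFillGpuOutput` are hypothetical names, a plain pointer stands in for the CUDA IPC handle, and `std::memcpy` stands in for the device copy. The real exchange crosses a process boundary.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical message exchanged between stub and backend (step 1).
struct OutputRequest {
  bool is_gpu = false;
  const char* stub_data = nullptr;  // tensor data held by the stub
  std::size_t byte_size = 0;
  char* ipc_target = nullptr;  // stands in for a CUDA IPC handle
};

// Step 2: the backend acquires the output buffer. CPU tensors are copied
// right away; for GPU tensors only a handle is passed back to the stub.
// Returns true when the extra round trip is needed.
bool
BackendAcquireOutput(OutputRequest& req, std::vector<char>& output_buffer)
{
  output_buffer.resize(req.byte_size);
  if (!req.is_gpu) {
    if (req.byte_size > 0) {
      std::memcpy(output_buffer.data(), req.stub_data, req.byte_size);
    }
    return false;
  }
  req.ipc_target = output_buffer.data();
  return true;
}

// Step 3 (GPU only): the stub copies the tensor into the buffer behind
// the handle it received from the backend.
void
StubFillGpuOutput(const OutputRequest& req)
{
  if (req.byte_size > 0) {
    std::memcpy(req.ipc_target, req.stub_data, req.byte_size);
  }
}
```

In this sketch the CPU path finishes inside `BackendAcquireOutput`, while the GPU path leaves the buffer unfilled until `StubFillGpuOutput` runs, which is the additional round trip discussed above.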

Tabrizian added 3 commits May 11, 2022 17:53

Add GPU tensor support to decoupled API

88b45a2

Fix GPU output buffers for response send

74f766b

Add GPU input support for decoupled API

e677a70

Tabrizian force-pushed the imant-gpu-tensor branch from 87fb69c to e677a70 Compare May 13, 2022 16:22

Tabrizian marked this pull request as ready for review May 13, 2022 21:38

Tabrizian requested review from krishung5 and tanmayv25 May 13, 2022 21:38

GuanLuo reviewed May 13, 2022

tanmayv25 requested changes May 17, 2022

Comment thread src/infer_response.cc Outdated

Comment thread src/infer_response.cc

Review edit

76759f0

Tabrizian requested a review from tanmayv25 May 18, 2022 19:16

rmccorm4 requested changes May 20, 2022

Comment thread src/infer_response.cc

Review edit

f349db5

Tabrizian force-pushed the imant-gpu-tensor branch from 724b1ce to f349db5 Compare May 20, 2022 20:36

tanmayv25 approved these changes May 20, 2022

Tabrizian requested review from GuanLuo, rmccorm4 and tanmayv25 May 20, 2022 21:07

tanmayv25 approved these changes May 20, 2022

GuanLuo reviewed May 20, 2022

rmccorm4 approved these changes May 21, 2022

Tabrizian merged commit 92245a7 into main May 21, 2022

Tabrizian deleted the imant-gpu-tensor branch May 24, 2022 00:29

Tabrizian merged 5 commits into main from imant-gpu-tensor

4 participants
