Skip to content

Add GPU tensor support to decoupled API#154

Merged
Tabrizian merged 5 commits into
mainfrom
imant-gpu-tensor
May 21, 2022
Merged

Add GPU tensor support to decoupled API#154
Tabrizian merged 5 commits into
mainfrom
imant-gpu-tensor

Conversation

@Tabrizian

@Tabrizian Tabrizian commented May 11, 2022

Copy link
Copy Markdown
Member
  • Add support for GPU outputs
  • Add support for GPU inputs
  • Add backend utility to collect input buffers into a single buffer
  • Add testing

@Tabrizian Tabrizian marked this pull request as ready for review May 13, 2022 21:38
@Tabrizian Tabrizian requested review from krishung5 and tanmayv25 May 13, 2022 21:38
Comment thread src/response_sender.cc
"GPU buffers size does not match the provided buffers: ") +
std::to_string(gpu_tensors.size()) +
" != " + std::to_string(*gpu_buffer_count));
return;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Send error?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more like assert and will never be triggered. I can add a comment for it.

Comment thread src/python.cc Outdated
// limitation in the legacy CUDA IPC API that doesn't allow getting the
// handle of an exported pointer. If the cuda handle exists, it indicates
// that the cuda shared memory was used and the input is in a single
// buffer. [FIXME] for the case where the input is in cuda shared memory

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline for [FIXME]

Comment thread src/python.cc Outdated
reinterpret_cast<ResponseSendMessage*>(send_message.data_.get());
std::unique_ptr<PbString> error_message;
ScopedDefer _([send_message_payload] {
ScopedDefer _([&send_message_payload] {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why take reference of a pointer?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason. I'll revert this change.

Comment thread src/pb_stub.h
Comment thread src/infer_response.cc Outdated
Comment thread src/infer_response.cc
@Tabrizian Tabrizian requested a review from tanmayv25 May 18, 2022 19:16

@rmccorm4 rmccorm4 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Go-style defer is cool to see 🙂

Comment thread src/infer_response.cc
Comment thread src/infer_response.cc
Comment thread src/python.cc
++index;
}

// Additional round trip so that the stub can fill the GPU output buffers.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why there is another round trip? the backend process signals back that there is CUDA IPC handle and wait for stub process to copy the data to within the CUDA IPC handle?

Do I summarize the workflow correctly:

  1. stub requests for output buffer (and passing buffer via shared memory at the same time if CPU tensor)
  2. backend acquires output buffer, copy if CPU tensor, otherwise, passing the CUDA IPC handle back to stub.
  3. (only for GPU tensor) stub then copy GPU tensor into the CUDA IPC handle

@Tabrizian Tabrizian May 20, 2022

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly. The third bullet point is what requires the round trip.

@Tabrizian Tabrizian merged commit 92245a7 into main May 21, 2022
@Tabrizian Tabrizian deleted the imant-gpu-tensor branch May 24, 2022 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants