Do this inside a for loop with offsets. This will be needed for GFOR.
- [ ] CUDA
- [ ] OpenCL
- [ ] CPU
The first draft can be done inside a for loop using multiple queues / streams.
Note: the dot product function has been removed from the above list, since a batched dot product can easily be done using the `matmul` function.
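
A minimal CPU-only sketch of the "for loop with offsets" idea, assuming the batched operation is a matrix multiply over matrices stored back to back in column-major order. `batched_matmul` is a hypothetical helper written for illustration, not an existing library function, and the naive inner loops stand in for whatever per-batch BLAS call a real backend would make (on CUDA / OpenCL, each iteration could instead be issued on its own stream / queue).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a library API): multiplies `batch` pairs of
// column-major matrices stored contiguously. Each A is m x k, each B is
// k x n, and each C is m x n.
void batched_matmul(const float* a, const float* b, float* c,
                    std::size_t m, std::size_t n, std::size_t k,
                    std::size_t batch) {
    // Number of elements between consecutive matrices in each buffer.
    const std::size_t a_stride = m * k;
    const std::size_t b_stride = k * n;
    const std::size_t c_stride = m * n;

    for (std::size_t i = 0; i < batch; ++i) {
        // Offset the base pointers to the i-th matrix of the batch.
        const float* ai = a + i * a_stride;
        const float* bi = b + i * b_stride;
        float* ci = c + i * c_stride;

        // Naive GEMM, standing in for a per-batch BLAS call.
        for (std::size_t col = 0; col < n; ++col) {
            for (std::size_t row = 0; row < m; ++row) {
                float acc = 0.0f;
                for (std::size_t p = 0; p < k; ++p) {
                    acc += ai[row + p * m] * bi[p + col * k];
                }
                ci[row + col * m] = acc;
            }
        }
    }
}
```

Under the same assumptions, a batched dot product is the special case `m = 1`, `n = 1`: each batch entry multiplies a 1 x k row by a k x 1 column and yields a single value.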