Current implementation of join can be improved by performing the operation in a single call to the backend kernel instead of multiple calls.
This is a fairly easy kernel and may be a good issue for someone getting to know CUDA/ArrayFire internals. Ping me if you want additional info.
Current implementation of join can be improved by performing the operation in a single call to the backend kernel instead of multiple calls.
This is a fairly easy kernel and may be a good issue for someone getting to know CUDA/ArrayFire internals. Ping me if you want additional info.