Cleanup csrmv sparse-dense matmul OpenCL kernel wrapper code #3010
Merged
I am going to look into kernel improvements of NN and TN ops now - deferred for later.
a4e4f8b to 059fdd6
I was unable to reproduce the 6x performance difference with this code reorganization on the latest master with the latest versions of the dependencies. It seems that some driver or upstream fix addressed the problem of optimizing out the unused global counter being passed for

Nevertheless, the cleanup of the wrapper code is still valid. We don't need to pass the global counter when it is not used, and we most definitely don't need to compile an additional kernel when it is not used at all. I am going to reword the PR title to reflect the change.
umar456 approved these changes on May 10, 2021
Changes to Users
None.
Checklist
[ ] Functions added to unified API
[ ] Functions documented