Previous logic was to always use the first kernarg pool found to allocate
kernel args. This patch changes this to use only the kernarg pool which
has non-zero size. This logic is also reworked to not use any globals.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Took a while to make sense of this. I think you've preserved the existing strategy of building a collection of kernarg memory pools, while moving the global into a class member and adding some more tests on whether a pool should be added to the collection.
Some tests are done when adding a memory pool to the collection, and some are done when considering whether to take a pool out of the collection. Also, we only ever use a single memory pool. So I think things will be simpler if we go a step further and change the logic to 'try to find exactly one memory_pool that we will use for kernarg on this gpu', and put all the sanity checking (allocatable, size>0, fine grain, kernarg ok etc) behind that one call, which returns a single memory pool (or pair {pool, rc}) instead of a collection.
The proc->addMemory(new_mem); interface survives this change, so I'd guess some of the noise is from pulling kernarg allocation out of the ATMI infra while leaving GPU allocation within it.
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp | ||
---|---|---|
75 | not keen on extern declarations in source (I know there's a print kernel trace one there already...), can we move this global definition into rtl.cpp, or failing that put the extern declaration in some header shared with the file that does define it? |
Addressed review comments.
- Keeping only single memory pool as that was the only one being used.
- Removed extern in favor of already declared on in machine.h
That worked out cleanly, thanks. The temporary vector used to indicate failure with .empty is slightly unusual but obviously works.
not keen on extern declarations in source (I know there's a print kernel trace one there already...), can we move this global definition into rtl.cpp, or failing that put the extern declaration in some header shared with the file that does define it?