This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][Libomptarget] Rework logic for locating kernarg pools
ClosedPublic

Authored by pdhaliwal on Jun 3 2021, 2:10 AM.

Details

Summary

Previous logic was to always use the first kernarg pool found to allocate
kernel args. This patch changes this to use only the kernarg pool which
has non-zero size. This logic is also reworked to not use any globals.

Diff Detail

Event Timeline

pdhaliwal created this revision.Jun 3 2021, 2:10 AM
pdhaliwal requested review of this revision.Jun 3 2021, 2:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2021, 2:10 AM

Took a while to make sense of this. I think you've preserved the existing strategy of building a collection of kernarg memory pools, while moving the global into a class member and adding some more tests on whether a pool should be added to the collection.

Some tests are done when adding a memory pool to the collection, and some are done when considering whether to take a pool out of the collection. Also, we only ever use a single memory pool. So I think things will be simpler if we go a step further and change the logic to 'try to find exactly one memory_pool that we will use for kernarg on this gpu', and put all the sanity checking (allocatable, size>0, fine grain, kernarg ok etc) behind that one call, which returns a single memory pool (or pair {pool, rc}) instead of a collection.

The proc->addMemory(new_mem); interface survives this change, so I'd guess some of the noise is from pulling kernarg allocation out of the ATMI infra while leaving GPU allocation within it.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
75

not keen on extern declarations in source (I know there's a print kernel trace one there already...), can we move this global definition into rtl.cpp, or failing that put the extern declaration in some header shared with the file that does define it?

pdhaliwal updated this revision to Diff 349806.Jun 4 2021, 3:48 AM

Addressed review comments.

  • Keeping only single memory pool as that was the only one being used.
  • Removed extern in favor of already declared on in machine.h
pdhaliwal marked an inline comment as done.Jun 4 2021, 3:48 AM
pdhaliwal updated this revision to Diff 349807.Jun 4 2021, 3:49 AM

Remove comment.

JonChesterfield accepted this revision.Jun 4 2021, 5:24 AM

That worked out cleanly, thanks. The temporary vector used to indicate failure with .empty is slightly unusual but obviously works.

This revision is now accepted and ready to land.Jun 4 2021, 5:24 AM
This revision was landed with ongoing or failed builds.Jun 6 2021, 11:41 PM
This revision was automatically updated to reflect the committed changes.