This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][Libomptarget] Collect allocatable memory pools using HSA
ClosedPublic

Authored by pdhaliwal on Jun 22 2021, 3:00 AM.

Details

Summary

The logic is almost similar to that of system.cpp with one change that
instead of adding all the memory pools to a device struct it only
keeps a single pool. The existing approach also always allocated memory on
the first HSA pool found for a GPU.

This depends on D104691. The goal of this series of patches is to remove
_atl_machine global. The next patch will drop g_atl_machine entirely.

Diff Detail

Event Timeline

pdhaliwal created this revision.Jun 22 2021, 3:00 AM
pdhaliwal requested review of this revision.Jun 22 2021, 3:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2021, 3:00 AM
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
373

This looks like it can return a hsa_status_t directly

hsa_status_t Err = get_info();
if (Err != HSA_SUCCESS) return Err;
if (!AllocAllowed) return HSA_ERROR; // or a more specific enum
return HSA_SUCCESS;

384

control flow here is simpler if the bool Valid is dropped by isValidMemoryPool returning a hsa_status_t

478

Is this per-cpu-agent, or per-cpu-agent-with-nonzero-memory-pool? I wonder if we'd be better off with a scalar hsa_amd_memory_pool_t HostFineGrainedMemoryPool; for now

722

Ah, it includes the agents with zero available memory. Not sure distinguishing between CPU agents is helpful at present.

Also, doesn't this plus the (size > 0) check mean that a threadripper would have a vector of length 4 of the host pools, but only index 0 and 1 would be populated, and not correspond to the iteration order of CPU agents?

pdhaliwal added inline comments.Jun 22 2021, 4:28 AM
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
373

I was thinking that it would be a good idea to stop early if there any kind of error in hsa API.

478

It is per-cpu-agent. I haven't added any zero size check following the existing logic. We always use only one memory pool so using single scalar should not cause any issues.

722

There is no size >0 check so it would populate all the slots.

pdhaliwal updated this revision to Diff 353616.Jun 22 2021, 5:51 AM

Fix clang-tidy errors and use scalar for host memory pool.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
373

that's probably fair, OK

595

Can probably return the error without printing here, the caller will know it was addDeviceMemoryPool that failed anyway. Medium term it's probably worth consolidating all the misc print statements into rtl.cpp and routing them through the DP() macro

638

This looks like it should be returning a hsa_status_t, the caller may want to disable offloading if the memory pool query fails

pdhaliwal updated this revision to Diff 353869.Jun 23 2021, 1:03 AM

Addressed review comments.

pdhaliwal marked 2 inline comments as done.Jun 23 2021, 7:32 AM
This revision is now accepted and ready to land.Jun 25 2021, 8:42 AM
This revision was landed with ongoing or failed builds.Jun 28 2021, 4:28 AM
This revision was automatically updated to reflect the committed changes.