This is an archive of the discontinued LLVM Phabricator instance.

[Libomptarget] Configure the RPC port count from the plugin
ClosedPublic

Authored by jhuber6 on Jul 20 2023, 5:41 PM.

Details

Summary

This patch lets each plugin configure the RPC port count to match the
parallelism the specific card wants. For AMDGPU we need to size the port
count to the maximum hardware parallelism to avoid deadlocks. For NVPTX we
don't have this problem due to the friendlier scheduler, so we use the number
of warps active on an SM times the number of SMs as a good guess.

Note that the maximum port count is currently smaller than these
numbers. That will be improved in the future.
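
The sizing rule described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `DeviceInfo`, its fields, and `requestedPortCount` are hypothetical names, and the sketch assumes each warp or wavefront holds at most one port at a time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical device description. The field names are illustrative and do
// not come from the patch.
struct DeviceInfo {
  uint32_t NumComputeUnits; // SMs on NVIDIA hardware, CUs on AMD hardware.
  uint32_t MaxWavesPerUnit; // Warps or wavefronts resident on one SM or CU.
};

// Number of warps (or wavefronts) that can be resident on the device at once.
// Requesting one port per resident wave assumes that each wave holds at most
// one port at a time.
uint64_t requestedPortCount(const DeviceInfo &Device) {
  assert(Device.NumComputeUnits > 0 && "device must have a compute unit");
  // Widen before multiplying so the product cannot overflow 32 bits.
  return static_cast<uint64_t>(Device.NumComputeUnits) * Device.MaxWavesPerUnit;
}
```

For a hypothetical device with 80 compute units and 32 resident waves per unit, `requestedPortCount` returns 2560.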

Diff Detail

Event Timeline

jhuber6 created this revision. Jul 20 2023, 5:41 PM
Herald added a project: Restricted Project. Jul 20 2023, 5:41 PM
jhuber6 requested review of this revision. Jul 20 2023, 5:41 PM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
1660

This is valid if a wave opens at most one port at a time. I think an argument could be made that a wave could try to open one port per thread. Likely to be something we should add to the libc tests.

Async calls also compromise the sizing, though I currently think we should limit OpenMP to synchronous calls.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.h
770

Typos. Also, isn't num_teams the wrong constant here? It should be whatever OpenMP calls warps.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/RPC.cpp
64

This should probably be a hard error if the plugin wants more ports than it can have, especially on platforms where that implies deadlock risk.
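
A hard error of this kind could look roughly like the sketch below. It is only an illustration under stated assumptions: `checkedPortCount` is a hypothetical helper, and it reports the failure with a C++ exception purely to stay self-contained, whereas the plugin would use its own error handling.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: reject a port request that exceeds the supported
// maximum instead of silently clamping it. Clamping could leave fewer ports
// than resident waves, which is the deadlock risk raised in the comment.
uint64_t checkedPortCount(uint64_t Requested, uint64_t MaxPorts) {
  if (Requested > MaxPorts)
    throw std::runtime_error("plugin requested " + std::to_string(Requested) +
                             " RPC ports but at most " +
                             std::to_string(MaxPorts) + " are available");
  return Requested;
}
```

With a limit of 64, a request for 32 ports passes through unchanged, while a request for 2000 ports raises the error.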

openmp/libomptarget/plugins-nextgen/cuda/src/rtl.cpp
391

It'll affect performance.

Also, I'm not totally confident Nvidia has a fair scheduler on SMs; that would be a good thing to check. AMDGPU does not have a fair scheduler on CUs.

899

Typo. Also, this doesn't seem to match the comment.

jdoerfert added inline comments. Jul 20 2023, 6:59 PM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
2471

In a follow-up, can you please expose this via ompx_get_hardware_num_processing_elements(int device) (or whatever the non-device version is spelled)?

jhuber6 added inline comments. Jul 20 2023, 8:21 PM
openmp/libomptarget/plugins-nextgen/amdgpu/src/rtl.cpp
2471

Yeah, unfortunately I'm not sure how to calculate this on CUDA.

openmp/libomptarget/plugins-nextgen/common/PluginInterface/RPC.cpp
64

I was putting that off, since the limit is currently around 64, which is far below the roughly 2000 that most platforms will want.

jdoerfert accepted this revision. Aug 11 2023, 10:51 AM

LG, fix comments.

This revision is now accepted and ready to land. Aug 11 2023, 10:51 AM