Page MenuHomePhabricator

[OpenMP][DeviceRTL][AMDGPU] Support code object version 5
Needs ReviewPublic

Authored by saiislam on Dec 9 2022, 11:10 AM.

Details

Summary

Update DeviceRTL and the AMDGPU plugin to use code
object version 5. Default is code object version 4.

DeviceRTL uses rocm-device-libs instead of directly calling
amdgcn builtins for the functions which are affected by
cov5.

AMDGPU plugin queries the ELF for code object version
and then prepares various implicitargs accordingly.

Diff Detail

Event Timeline

saiislam created this revision.Dec 9 2022, 11:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2022, 11:10 AM
saiislam requested review of this revision.Dec 9 2022, 11:10 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 9 2022, 11:10 AM

Maybe we should wait until D138389 lands and we can update both, otherwise we'd need a second patch.

I'm not fully up-to-date, what's the main difference and advantage of the new code object version? What do all the new implicit arguments do.

clang/lib/Driver/ToolChains/AMDGPU.cpp
953

Unrelated?

openmp/libomptarget/DeviceRTL/include/Interface.h
169

This should probably use variants to match the rest of the style, also if you intend to read these outside of the library you'll need to put them in the exports file and set their visibility.

openmp/libomptarget/DeviceRTL/src/Mapping.cpp
19

What if this isn't defined? We should be able to use the OpenMP library without the AMD device libraries. Should it be extern weak?

openmp/libomptarget/DeviceRTL/src/State.cpp
73

Variants

openmp/libomptarget/plugins/amdgpu/impl/get_elf_mach_gfx_name.h
15

Unrelated, but is there any particular reason these aren't defined in the hsa_amd_ext.h?

yaxunl added inline comments.Dec 9 2022, 11:32 AM
clang/lib/Driver/ToolChains/Clang.cpp
7085

Any reason you need the original args? This will bypass the driver translation, which should not in normal cases.

7086

clang -cc1 needs this to be default value false to emit code object version module flag

Could we elaborate on the benefits, please. Now we support two versions?

Why is this helpful:

DeviceRTL uses rocm-device-libs instead of directly calling amdgcn builtins for the functions which are affected by cov5.

openmp/libomptarget/DeviceRTL/src/State.cpp
81

Why do we need the "external..." stuff anyway?

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
2233

What is this all about?

I am reluctant to add the dependency edge to rocm device libs to openmp's GPU runtime.

We currently require that library for libm, which I'm also not thrilled about, but at least you can currently build and run openmp programs (that don't use libm, like much of our tests) without it.

My problem with device libs is that it usually doesn't build with trunk. It follows a rolling dev model tied to rocm clang and when upstream does something that takes a long time to apply, device libs doesn't build until rocm catches up. I've literally never managed to compile any branch of device libs with trunk clang without modifying the source, generally to delete directories that don't look necessary for libm.

Further, selecting an ABI based on runtime code found in a library which is hopefully constant folded is a weird layering choice. The compiler knows what ABI it is emitting code for, and that's how it picks files from device libs to effect that choice, but it would make far more sense to me for the compiler back end to set this stuff up itself.

Also, if we handle ABI in the back end, then we don't get the inevitable problem of rocm device libs and trunk clang having totally different ideas of what the ABI is as they drift in and out of sync.

tianshilei1992 added inline comments.
openmp/libomptarget/DeviceRTL/src/Mapping.cpp
19

It should be put into AMD's declare variant.

Thanks everyone for your review and comments!
I am going to address all of them in a series of smaller patches starting with D140784.

saiislam added inline comments.Wed, Jan 4, 7:08 AM
openmp/libomptarget/DeviceRTL/src/Mapping.cpp
50

If we still don't want to depend on rocm-device-libs then we will have to do something like (haven't tried this code yet):

uint32_t getNumHardwareThreadsInBlock() {
   if (__oclc_ABI_version < 500) {
      return __builtin_amdgcn_workgroup_size_x();
   } else {
      void *implicitArgPtr = __builtin_amdgcn_implicitarg_ptr();
      return (ushort)implicitArgPtr[6];
}
80
uint32_t getNumberOfBlocks() {
   if (__oclc_ABI_version < 500) {
      return __builtin_amdgcn_grid_size_x() / __builtin_amdgcn_workgroup_size_x();
   } else {
      void *implicitArgPtr = __builtin_amdgcn_implicitarg_ptr();
      return (uint)implicitArgPtr[0];
}
saiislam marked an inline comment as done.Wed, Jan 18, 7:31 AM
saiislam added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
7085

We need derived args to look for mcode-object-version. I have created a separate review for this change. Please have a look at D142022