This is an archive of the discontinued LLVM Phabricator instance.

[clang][openmp][NFC] Remove arch-specific CGOpenMPRuntimeGPU files
ClosedPublic

Authored by atmnpatel on Nov 8 2021, 10:31 AM.

Details

Summary

The existing CGOpenMPRuntimeAMDGCN and CGOpenMPRuntimeNVPTX classes are
just code bloat. By removing them, the codebase gets a bit cleaner.

Diff Detail

Event Timeline

atmnpatel created this revision.Nov 8 2021, 10:31 AM
atmnpatel requested review of this revision.Nov 8 2021, 10:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 8 2021, 10:31 AM

I remember at some point we want to just emit function call to get those information like thread id.

I remember at some point we want to just emit function call to get those information like thread id.

Yes, no intrinsics please. Add __kmpc_get_warp_size and similar functions to both runtimes and then emit calls to them instead.

JonChesterfield added a comment.EditedNov 8 2021, 12:09 PM

Example of the function as opposed to intrinsics is __kmpc_get_hardware_num_threads_in_block from just above where you've modified. That corresponds to a function in the device runtime, e.g.

int __kmpc_get_hardware_num_threads_in_block() {
  return get_workgroup_dim(__builtin_amdgcn_workgroup_id_x(),
                           __builtin_amdgcn_grid_size_x(),
                           __builtin_amdgcn_workgroup_size_x());
}

and

int __kmpc_get_hardware_num_threads_in_block() {
  return __nvvm_read_ptx_sreg_ntid_x();
}

warp size is currently handled in two different ways in clang. Some is by:
RT.getGPUWarpSize(CGF)
and some by
CGF.getTarget().getGridValue().GV_Warp_Size

We may be stuck with the latter style in some places, in which case we may want getWarpSize() to be a function defined in CGOpenMPRuntimeGPU.cpp (which digs magic numbers out of that structure) instead of a call into the runtime, especially if:
auto *Ty = llvm::ArrayType::get(CGM.Int32Ty, WarpSize);
^ requires an unsigned for the array size, as opposed to a call to a function that will return unsigned

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
3966

This does work. The benefit of adding the functions to the device runtime (which contain these intrinsic calls) is we get uniformity of the generated IR, modulo the unfortunate addrspace casts, so we can do nice things like pattern match on the name of the device runtime function

clang/lib/CodeGen/CodeGenModule.cpp
245

Looks like we could fold these cases by renaming the assert

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptNov 8 2021, 1:07 PM
jdoerfert accepted this revision.Nov 8 2021, 1:17 PM

LG

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
23

needed? also the intrinsic include below. They should not be (maybe through a follow up commit) as we should get rid of all intrinsic uses here.

This revision is now accepted and ready to land.Nov 8 2021, 1:17 PM
atmnpatel updated this revision to Diff 385615.Nov 8 2021, 1:24 PM

Remove intrinsic includes

JonChesterfield added a comment.EditedNov 8 2021, 1:46 PM

I think __kmpc_get_warp_size needs an entry in Utils.cpp keepAlive(), and corresponding functions added to deviceRTLs/*/src/target_impl.*. Clang change looks great!

atmnpatel updated this revision to Diff 385627.Nov 8 2021, 2:00 PM

small fixes

JonChesterfield accepted this revision.Nov 8 2021, 2:44 PM

Thanks! Delighted to see the subclasses gone

atmnpatel updated this revision to Diff 385693.Nov 8 2021, 9:15 PM

fix failing clang tests

atmnpatel reopened this revision.Nov 9 2021, 12:08 PM
This revision is now accepted and ready to land.Nov 9 2021, 12:08 PM
atmnpatel updated this revision to Diff 385922.Nov 9 2021, 12:08 PM

Forgot to add definition in interface file

JonChesterfield accepted this revision.Nov 9 2021, 12:10 PM
This revision was landed with ongoing or failed builds.Nov 9 2021, 12:11 PM
This revision was automatically updated to reflect the committed changes.
clang/test/OpenMP/nvptx_target_parallel_reduction_codegen_tbaa_PR46146.cpp