This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL
ClosedPublic

Authored by ye-luo on Sep 23 2020, 3:14 PM.

Details

Summary

This allows customizing MAX_SM for non-flagship GPUs, which reduces GPU memory usage.

In addition, the size has so far been hard-coded only for CUDA_ARCH up to 700, which is already a hassle for 800.
This patch introduces a MAX_SM value for 800 and protects against future architectures.
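A minimal sketch of the selection logic described in the summary, under stated assumptions. It is not the patch itself. `SKETCH_CUDA_ARCH` is a hypothetical stand-in for `__CUDA_ARCH__` so the snippet compiles as host C++. The sketch also assumes the CMake option reaches the compiler as a `-DMAX_SM=<n>` define. The 700, 600 and fallback values are illustrative. 108 and 84 are the SM counts discussed in the review. Using `#error` for unknown architectures is one possible way to "protect future arch".

```cpp
// SKETCH_CUDA_ARCH is a hypothetical stand-in for __CUDA_ARCH__ so this
// compiles as ordinary host C++; device code would test __CUDA_ARCH__.
#ifndef SKETCH_CUDA_ARCH
#define SKETCH_CUDA_ARCH 800
#endif

// A value passed on the command line (e.g. -DMAX_SM=46, assumed here to be
// forwarded from LIBOMPTARGET_NVPTX_MAX_SM) takes precedence over defaults.
#ifndef MAX_SM
#if SKETCH_CUDA_ARCH >= 900
// Architecture newer than anything listed: stop the build rather than guess.
#error "Unknown CUDA_ARCH: set MAX_SM explicitly (e.g. via LIBOMPTARGET_NVPTX_MAX_SM)"
#elif SKETCH_CUDA_ARCH >= 800
#define MAX_SM 108  // A100 (GA100); GA102 (860) parts have at most 84
#elif SKETCH_CUDA_ARCH >= 700
#define MAX_SM 84   // illustrative default
#elif SKETCH_CUDA_ARCH >= 600
#define MAX_SM 56   // illustrative default
#else
#define MAX_SM 16   // illustrative default
#endif
#endif

// Compile-time constant that statically sized device structures would use.
constexpr int kSketchMaxSm = MAX_SM;
```

Compiling with `-DSKETCH_CUDA_ARCH=700` would select 84. Compiling with `-DMAX_SM=46` would bypass the per-architecture table entirely.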

Event Timeline

ye-luo created this revision. Sep 23 2020, 3:14 PM
ye-luo requested review of this revision. Sep 23 2020, 3:14 PM

The change seems reasonable. AMDGCN could benefit from the same, e.g. to get APU systems with about 8 CUs to run OpenMP code. I suggest we do that in a separate patch if someone asks for it.

I'd like to get rid of the structure this macro controls entirely, but I don't have a good time estimate for that. This looks like a good idea in the meantime.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
62

Can we distinguish between GA100 and GA102? This structure is large, so oversizing it wastes significant memory.
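The concern here is that reserved memory grows linearly with MAX_SM. Every SM slot beyond those the GPU physically has becomes allocated but unused memory. The following is a minimal sketch of that relationship. It assumes a hypothetical fixed-size per-slot state (`HypotheticalSlotState`) and a hypothetical 32 slots per SM. Neither reflects the real device-RTL layout.

```cpp
#include <cstddef>

// Hypothetical per-slot state standing in for the real device-RTL structure;
// only its size matters for this sketch.
struct HypotheticalSlotState {
  unsigned char payload[4096];  // illustrative size only
};

// Hypothetical number of slots reserved per SM.
constexpr std::size_t kSlotsPerSm = 32;

// Bytes reserved by a static array such as
//   HypotheticalSlotState slots[MAX_SM * kSlotsPerSm];
constexpr std::size_t reserved_bytes(std::size_t max_sm) {
  return max_sm * kSlotsPerSm * sizeof(HypotheticalSlotState);
}

// Bytes reserved but never usable when MAX_SM exceeds the GPU's actual SMs.
constexpr std::size_t wasted_bytes(std::size_t max_sm, std::size_t actual_sms) {
  return max_sm > actual_sms ? reserved_bytes(max_sm - actual_sms) : 0;
}
```

With these assumed sizes, `wasted_bytes(108, 84)` equals `reserved_bytes(24)`. The waste depends only on the gap between MAX_SM and the real SM count.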

ye-luo added inline comments. Sep 23 2020, 5:11 PM
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
62

GA100 is CUDA_ARCH 800; GA102 is 860.
There are also 700, 720, and 750.
I don't feel it's necessary to add finer resolution, because LIBOMPTARGET_NVPTX_MAX_SM can be used to set the value explicitly.

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
62

It could matter to someone with a GA102 who hasn't read the CMake options. Back-of-the-envelope math suggests there's a little under a gigabyte of allocated but unused memory between 84 and 108 SMs.

ye-luo added inline comments. Sep 23 2020, 6:08 PM
openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
62

On arch 600, my measurement with MAX_SM set to 56 versus 6 shows a difference of about 500 MB. So I expect a difference of about 200 MB between 84 and 108, which should matter little to GA102 owners. The RTX 3070 has 8 GB.
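This sketch spells out the linear extrapolation behind that estimate using only the arch-600 numbers from the comment. It assumes memory grows linearly with MAX_SM. It also assumes the per-SM cost on arch 800/860 matches arch 600, which was not measured. The constant names are hypothetical.

```cpp
// Linear extrapolation from the arch-600 measurement (unverified assumptions:
// linear growth in MAX_SM, same per-SM cost on newer architectures).
constexpr long kMeasuredDeltaMb = 500;     // MAX_SM = 56 vs MAX_SM = 6
constexpr long kMeasuredSmDelta = 56 - 6;  // 50 SMs

constexpr long kMbPerSm = kMeasuredDeltaMb / kMeasuredSmDelta;  // 10 MB per SM

// Memory reserved for SMs a GA102 (84 SMs) cannot use when MAX_SM is 108.
constexpr long kGa102UnusedMb = kMbPerSm * (108 - 84);  // 240 MB
```

The result, roughly 240 MB, is of the same order as the ~200 MB estimate. That is small next to the 8 GB of an RTX 3070.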

openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h
62

Measuring beats mental arithmetic against a different arch. AMDGPU was 2.1 GB with 64, so about 30 MB per SM. I'm sort of glad to hear nvptx is smaller per SM.
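The two per-SM figures in this thread can be compared directly. Below is a minimal sketch of that arithmetic. It treats 1 GB as 1000 MB and reads "with 64" as a count of 64 SMs/CUs, both assumptions the thread does not spell out. It also shows that the AMDGPU per-SM cost, applied to the 24 SMs between 84 and 108, gives the earlier "a little under a gigabyte" estimate. Constant names are hypothetical.

```cpp
// AMDGPU figure from the comment: about 2.1 GB with 64 (assumed 1 GB = 1000 MB).
constexpr double kAmdTotalMb = 2100.0;
constexpr double kAmdSmCount = 64.0;
constexpr double kAmdMbPerSm = kAmdTotalMb / kAmdSmCount;  // 32.8125, "about 30"

// Applying the AMDGPU per-SM cost to the 24 SMs between 84 and 108.
constexpr double kAmdBasedEstimateMb = kAmdMbPerSm * (108 - 84);  // 787.5 MB

// The nvptx arch-600 measurement in the thread: about 500 MB over 50 SMs.
constexpr double kNvptxMbPerSm = 500.0 / (56 - 6);  // 10 MB per SM
```

On these numbers nvptx costs roughly a third as much per SM as AMDGPU. That explains why the AMDGPU-based estimate overshot the measured nvptx figure.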

JonChesterfield accepted this revision. Sep 24 2020, 7:46 AM

LGTM. The change looks safe as-is, and we can add finer granularity later.

This revision is now accepted and ready to land. Sep 24 2020, 7:46 AM
Herald added a project: Restricted Project. Sep 24 2020, 9:40 AM