This patch allows customizing MAX_SM for non-flagship GPUs, which reduces GPU memory usage.
In addition, the size has so far been hard-coded only for CUDA_ARCH up to 700, which is already a hassle for 800.
Introduce MAX_SM for 800 and guard against future architectures.
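As a rough illustration of the idea (a sketch, not the patch's actual code), the selection can be pictured as a preprocessor ladder that honors a user-supplied MAX_SM first and refuses to guess for architectures it does not know. SKETCH_CUDA_ARCH is a hypothetical stand-in for __CUDA_ARCH__ so that the snippet compiles on the host, PerSMState is a hypothetical placeholder for the structure sized by the macro, and the per-architecture SM counts are illustrative assumptions.

```cpp
#include <cstddef>

// Hypothetical stand-in for __CUDA_ARCH__, so this sketch builds without nvcc.
#ifndef SKETCH_CUDA_ARCH
#define SKETCH_CUDA_ARCH 800
#endif

// A value supplied by the build (e.g. -DMAX_SM=...) takes precedence.
#ifndef MAX_SM
#if SKETCH_CUDA_ARCH >= 900
// Fail loudly on an unknown architecture instead of silently mis-sizing.
#error "Unknown architecture: define MAX_SM explicitly"
#elif SKETCH_CUDA_ARCH >= 800
#define MAX_SM 108 // illustrative value, assumed for this sketch
#elif SKETCH_CUDA_ARCH >= 700
#define MAX_SM 84  // illustrative value, assumed for this sketch
#elif SKETCH_CUDA_ARCH >= 600
#define MAX_SM 56  // illustrative value, assumed for this sketch
#else
#define MAX_SM 16  // illustrative value, assumed for this sketch
#endif
#endif

// Hypothetical per-SM state; the real structure and its size are not shown here.
struct PerSMState {
  char scratch[1024];
};

constexpr int max_sm() { return MAX_SM; }

// Total statically reserved bytes grow linearly with MAX_SM.
constexpr std::size_t state_bytes() { return sizeof(PerSMState) * MAX_SM; }

static_assert(max_sm() > 0, "MAX_SM must be positive");
```

Because the reserved size scales with MAX_SM, building with a smaller value for a GPU with fewer SMs shrinks the allocation proportionally. With CMake, a cache option such as LIBOMPTARGET_NVPTX_MAX_SM would typically be set on the command line as -DLIBOMPTARGET_NVPTX_MAX_SM=<n>.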
Differential D88185
[OpenMP] cmake option LIBOMPTARGET_NVPTX_MAX_SM for nvptx device RTL
Authored by ye-luo on Sep 23 2020, 3:14 PM.
Change seems reasonable. AMDGCN could benefit from the same, e.g. when trying to get APU systems with about 8 CUs to run OpenMP code. I suggest we do that in a separate patch if someone asks for it. I'd like to get rid of the structure this macro controls entirely, but I don't have a good time estimate for that. This looks like a good idea in the meantime.
Can we distinguish between GA100 and GA102? This structure is large so oversizing wastes significant memory.