I used https://github.com/zjin-lcf/HeCBench (with nvcc usage swapped to clang++), an adaptation of the classic Rodinia benchmarks aimed at the CUDA and SYCL programming models, to compare different values of the inlining threshold multiplier using both the clang++ CUDA and clang++ SYCL nvptx backends.
I find that the current value is too low in both cases. Qualitatively, the change in code execution time over a range of multiplier values from 5 to 1000 matches between the two variants of the HeCBench samples (CUDA with clang++ vs. SYCL with the CUDA backend using the intel/llvm clang++ compiler), and in most cases the quantitative agreement is also very close.
A value of 11 is optimal for clang++ CUDA in all cases I've investigated. I have not found a single case where performance is degraded by changing the value from 5 to 11. For one sample the SYCL CUDA backend preferred a higher value. However, we are happy to prioritize clang++ CUDA, and this value is close to ideal for both cases anyway.
It would be good to do some further investigation using clang++ OpenMP CUDA offload. However, since I do not know of an appropriate set of benchmarks for this case, and since we are now getting complaints about register spills related to insufficient inlining on a weekly basis, we have decided to propose this change and potentially seek more input from someone with more expertise in the OpenMP case.
Incidentally, this value coincides with the value used for the amd-gcn backend. We have also been able to use the AMD backend of the intel/llvm "dpc++" compiler to compare the inlining behavior of identical code when targeting AMD (compared to nvptx). Unsurprisingly, the AMD backend with a multiplier value of 11 performed better (with regard to inlining) than the nvptx backend with a value of 5. When the two backends use the same multiplier value, their inlining behaviors appear to align closely.
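To illustrate why the multiplier matters, here is a minimal sketch of how a per-target multiplier scales an inliner's cost threshold. It is not the actual LLVM code: the names scaledThreshold, wouldInline and kAssumedBaseThreshold are hypothetical, and the base value of 225 is an assumption about the default inline threshold rather than something measured for this change.

```cpp
// Illustrative sketch only. All names here are hypothetical and are not
// part of LLVM's API.

// Assumed base inline threshold before target scaling (an assumption).
constexpr unsigned kAssumedBaseThreshold = 225;

// The target multiplier scales the base threshold linearly.
constexpr unsigned scaledThreshold(unsigned base, unsigned multiplier) {
  return base * multiplier;
}

// Simplified decision: a call site is inlined when the callee's estimated
// cost is below the scaled threshold. A real inliner applies many more
// bonuses and penalties than this.
constexpr bool wouldInline(unsigned calleeCost, unsigned base,
                           unsigned multiplier) {
  return calleeCost < scaledThreshold(base, multiplier);
}
```

Under these assumptions, a callee with an estimated cost of 2000 would be rejected with a multiplier of 5 but accepted with a multiplier of 11, which is the kind of call site where the missing inlining can surface as register spills.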
This also considerably improves the performance of at least one of the most popular HPC applications: NWCHEMX.
Signed-off-by: JackAKirk <jack.kirk@codeplay.com>
Comment is out of date, and naming the value here is kind of pointless to begin with