This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][CUDA] Cache the maximal number of threads per block (per kernel)
ClosedPublic

Authored by jdoerfert on Aug 16 2020, 8:57 AM.

Details

Summary

Instead of calling cuFuncGetAttribute with
CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK for every kernel invocation,
we can do it for the first one and cache the result as part of the
KernelInfo struct. The only functional change is that we now expect
cuFuncGetAttribute to succeed; if it fails, the error is propagated
rather than ignored.
Ignoring any error seems like a slippery slope...
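
The caching idea described above can be sketched roughly as follows. This is not the code from the patch: to keep the example buildable without CUDA, the driver call is replaced by a hypothetical stub (fakeGetMaxThreadsPerBlock) standing in for cuFuncGetAttribute with CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK. The names getThreadLimit, QueryCount, SketchSuccess, and SketchFail, and the use of -1 as a "not queried yet" marker, are likewise assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical return codes for this sketch (not the plugin's real ones).
constexpr int SketchSuccess = 0;
constexpr int SketchFail = 1;

// Counts how often the stand-in "driver" is queried.
static int QueryCount = 0;

// Stand-in for cuFuncGetAttribute(&Value,
// CU_FUNC_ATTRIBUTE_MAX_THREADS_PER_BLOCK, Func). The Fail flag is an
// assumption that lets the sketch simulate a driver error.
int fakeGetMaxThreadsPerBlock(int *Value, bool Fail) {
  ++QueryCount;
  if (Fail)
    return SketchFail;
  *Value = 1024; // made-up limit for illustration
  return SketchSuccess;
}

// Per-kernel bookkeeping; only the cached field matters here.
struct KernelInfo {
  int MaxThreadsPerBlock = -1; // -1 means "not queried yet" (assumption)
};

// Returns the per-kernel thread limit, querying the driver only on the
// first call for a given kernel and propagating any query error.
int getThreadLimit(KernelInfo &KI, int &Limit, bool DriverFails = false) {
  if (KI.MaxThreadsPerBlock < 0) {
    int Value = 0;
    if (fakeGetMaxThreadsPerBlock(&Value, DriverFails) != SketchSuccess)
      return SketchFail; // propagate instead of silently ignoring
    assert(Value > 0 && "the stub always reports a positive limit");
    KI.MaxThreadsPerBlock = Value; // cache for later launches
  }
  Limit = KI.MaxThreadsPerBlock;
  return SketchSuccess;
}
```

With this shape, repeated launches of the same kernel reuse the cached value, and a failed query leaves the cache empty so the error surfaces to the caller.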

Event Timeline

jdoerfert created this revision. Aug 16 2020, 8:57 AM
jdoerfert requested review of this revision. Aug 16 2020, 8:57 AM
JonChesterfield accepted this revision. Aug 16 2020, 9:27 AM

LGTM

openmp/libomptarget/plugins/cuda/src/rtl.cpp
892

tab

This revision is now accepted and ready to land. Aug 16 2020, 9:27 AM
tianshilei1992 added inline comments. Aug 16 2020, 9:59 AM
openmp/libomptarget/plugins/cuda/src/rtl.cpp
79

Do we need to change the front end? This data structure should be generated by the FE, right?

openmp/libomptarget/plugins/cuda/src/rtl.cpp
79

Oh, my bad. It is not. The pointer is replaced during the initialization.

This revision was landed with ongoing or failed builds. Aug 16 2020, 12:40 PM
This revision was automatically updated to reflect the committed changes.