We now identify GPU kernels, that is, entry points into the GPU code.
These kernels can correspond to OpenMP target regions. With this patch,
we identify them and, on request, print them via remarks.
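A minimal sketch of the identification step, assuming the NVPTX convention in which a kernel is listed in the `nvvm.annotations` named metadata as a tuple {function, "kernel", 1}. To stay self-contained and runnable without LLVM, the metadata is modeled by a hypothetical `Annotation` struct instead of LLVM's metadata classes; `identifyKernels` here is an illustrative stand-in, not the patch's code.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one operand of the nvvm.annotations named
// metadata, e.g. !{void ()* @foo, !"kernel", i32 1}.
struct Annotation {
  std::string Function; // name of the annotated function
  std::string Key;      // annotation kind, e.g. "kernel" or "maxntidx"
  int Value;            // annotation value
};

// Collect the functions marked as kernels, i.e. entry points into GPU code.
// Non-kernel annotations (alignment, launch bounds, ...) are skipped.
std::set<std::string> identifyKernels(const std::vector<Annotation> &Annotations) {
  std::set<std::string> Kernels;
  for (const Annotation &A : Annotations)
    if (A.Key == "kernel" && A.Value == 1)
      Kernels.insert(A.Function);
  return Kernels;
}
```

For example, given a kernel entry and a `maxntidx` entry for the same function plus an unrelated runtime annotation, only the kernel function ends up in the returned set.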
Repository: rG LLVM Github Monorepo
LGTM, but I would like others to take a look.
Inline comment on llvm/lib/Transforms/IPO/OpenMPOpt.cpp, line 1186:
This change looks unrelated to this patch.
Reply on llvm/lib/Transforms/IPO/OpenMPOpt.cpp, line 1186:
It is related. I needed to get rid of the return statements, and I wanted to keep the "early exit" out of the if-cascade. Hence the else.
I think there's slightly more code here than is necessary.
Specifically, I think identifyKernels should return the kernel set by value (a SmallPtrSet<Kernel, N>; SmallPtrSetImpl itself cannot be returned by value) instead of populating a member variable that is accessed later. With a rename, I propose:
SmallPtrSet<Kernel, 4> getKernels(Module &M) { /* roughly the contents of the current identifyKernels */ }
The cache then stores the set by value instead of by reference. There is less state lying around, and we can't accidentally add multiple copies of the name to a single set. Depending on the control flow, we might look up the metadata more than once, but that seems fine given that the result usually goes in a cache.
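The proposal can be sketched outside LLVM as follows. This is a hypothetical model: a `std::set` of function names stands in for `SmallPtrSet<Kernel, N>`, and `Annotation`, `KernelSet`, and `InformationCache` are illustrative names, not OpenMPOpt's actual types. The set is computed by a free function and handed to the cache by value, so the cache never holds a reference to externally owned, mutable state.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of nvvm.annotations-style entries.
struct Annotation {
  std::string Function;
  std::string Key;
  int Value;
};

using KernelSet = std::set<std::string>;

// Free function: builds the set from scratch on every call and returns it
// by value, so no member variable is populated as a side effect.
KernelSet getKernels(const std::vector<Annotation> &Annotations) {
  KernelSet Kernels;
  for (const Annotation &A : Annotations)
    if (A.Key == "kernel" && A.Value == 1)
      Kernels.insert(A.Function);
  return Kernels;
}

// The cache owns its copy of the set; it is filled exactly once, when the
// cache is constructed, and is read-only afterwards.
struct InformationCache {
  explicit InformationCache(const std::vector<Annotation> &Annotations)
      : Kernels(getKernels(Annotations)) {}
  const KernelSet Kernels;
};
```

Making the member `const` is one way to encode "computed once, never appended to"; the cost is that the annotations are scanned again whenever a new cache is constructed.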
Thoughts?
We will end up looking at it once per SCC in the program, per invocation of the pass. I would prefer to cache module-wide information explicitly, and this was the "smallest" solution for now.
I can recompute it, but nvvm.annotations has ~100 (non-kernel) entries from the device runtime that we would have to go through every time.
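A minimal sketch of the trade-off, assuming a pass that queries the kernel set once per SCC: a lazily filled, module-wide cache walks the annotation list once instead of once per query. `ModuleKernelCache`, `NumScans`, and the `Annotation` struct are hypothetical names used only for illustration; `std::optional` (C++17) marks whether the set has been computed yet.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of nvvm.annotations-style entries.
struct Annotation {
  std::string Function;
  std::string Key;
  int Value;
};

using KernelSet = std::set<std::string>;

// Module-wide cache: the annotation list is scanned at most once, no matter
// how many SCCs ask for the kernel set.
class ModuleKernelCache {
public:
  explicit ModuleKernelCache(const std::vector<Annotation> &Annotations)
      : Annotations(Annotations) {}

  const KernelSet &getKernels() {
    if (!Kernels) {
      ++NumScans; // counts full passes over the annotation list
      KernelSet Result;
      for (const Annotation &A : Annotations)
        if (A.Key == "kernel" && A.Value == 1)
          Result.insert(A.Function);
      Kernels = std::move(Result);
    }
    return *Kernels;
  }

  unsigned NumScans = 0;

private:
  const std::vector<Annotation> &Annotations; // must outlive the cache
  std::optional<KernelSet> Kernels;           // empty until first query
};
```

With ~100 non-kernel runtime entries and ten SCC queries, recomputing would walk the list ten times, while this cache walks it once.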