This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Identify GPU kernels (aka. OpenMP target regions)
ClosedPublic

Authored by jdoerfert on Jul 6 2020, 6:07 PM.

Details

Summary

We now identify GPU kernels, that is entry points into the GPU code.
These kernels (can) correspond to OpenMP target regions. With this patch
we identify and on request print them via remarks.

Diff Detail

Event Timeline

jdoerfert created this revision.Jul 6 2020, 6:07 PM
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
okura removed a subscriber: okura.Jul 6 2020, 9:47 PM
tianshilei1992 added a comment.EditedJul 7 2020, 7:44 AM

LGTM but I would like others to take a look.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
1186

This line of change looks not related to this patch.

jdoerfert marked an inline comment as done.Jul 7 2020, 8:46 AM
jdoerfert added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
1186

It is. I needed to get rid of the return statements and I wanted to keep the "early exit" out of the if-cascade. Entrance: else.

I think there's slightly more code here than is necessary.

Specifically, I think identifyKernels should return SmallPtrSetImpl<Kernel> instead of populating a member variable which can later be accessed. With a rename, proposing:
SmallPtrSetImpl<Kernel> getKernels(Module &M){/*roughly contents of current identifyKernels */}

The cache then stores the set by value instead of by reference. Less state lying around, can't accidentally add multiple copies of the name to a single set. Depending on the control flow we might look up the metadata more than once, but that seems fine given it usually goes in a cache.

Thoughts?

I think there's slightly more code here than is necessary.

Specifically, I think identifyKernels should return SmallPtrSetImpl<Kernel> instead of populating a member variable which can later be accessed. With a rename, proposing:
SmallPtrSetImpl<Kernel> getKernels(Module &M){/*roughly contents of current identifyKernels */}

The cache then stores the set by value instead of by reference. Less state lying around, can't accidentally add multiple copies of the name to a single set. Depending on the control flow we might look up the metadata more than once, but that seems fine given it usually goes in a cache.

Thoughts?

We will end up looking at it once per SCC in the program, per invocation of the pass. I would prefer to cache module wide information explicitly and this was the "smallest" solution for this for now.
I can do recompute but the nvvm.annotations has ~100 (non-kernel) entries from the device runtime we'll have to go through every time.

JonChesterfield accepted this revision.Jul 10 2020, 8:28 AM

Fair enough, stateful it is then.

This revision is now accepted and ready to land.Jul 10 2020, 8:28 AM
This revision was automatically updated to reflect the committed changes.