
[OpenMP] Replace GPU globalization calls with shared memory in the middle-end
Accepted, Public

Authored by jhuber6 on Mar 2 2021, 4:34 PM.

Details

Summary

The changes introduced in D97680 create a simpler interface for code that needs to be globalized. This interface is used to simplify the globalization calls in the middle-end. We can check for any globalization call that is executed by only a single thread in the team and replace it with a static shared memory buffer.
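As a rough illustration of the idea in the summary, here is a minimal C++ sketch over a toy model rather than LLVM IR. The types GlobalizationCall and SharedBuffer and the function replaceWithShared are hypothetical. Whether a call is single-threaded is taken as a given input rather than computed; the actual optimization works on LLVM IR in OpenMPOpt.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of a globalization call site. The real
// optimization inspects LLVM IR; here the relevant facts are plain fields.
struct GlobalizationCall {
  std::string Name;     // illustrative call-site identifier
  uint64_t Bytes;       // constant size of the globalized allocation
  bool SingleThreaded;  // assumed known: only one thread in the team runs it
};

// A static shared-memory buffer standing in for one replaced call.
struct SharedBuffer {
  std::string Name;
  uint64_t Bytes;
};

// Give every single-threaded call its own static shared buffer of the same
// size. Calls that many threads may reach are left untouched and reported
// through Kept.
std::vector<SharedBuffer> replaceWithShared(
    const std::vector<GlobalizationCall> &Calls,
    std::vector<std::string> &Kept) {
  std::vector<SharedBuffer> Buffers;
  for (const GlobalizationCall &C : Calls) {
    if (C.SingleThreaded)
      Buffers.push_back({C.Name + "_shared", C.Bytes});
    else
      Kept.push_back(C.Name);
  }
  return Buffers;
}
```

One buffer per call keeps the sketch simple; a static buffer is only sound because a single thread owns it, which is why the single-thread condition gates the rewrite.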


Event Timeline

jhuber6 created this revision.Mar 2 2021, 4:34 PM
jhuber6 requested review of this revision.Mar 2 2021, 4:34 PM

Some initial comments.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
991
994

No cast needed

999

Prefer an early exit: if (!...) return false;

1002

No need for it to be a constant

1006

Don't assume an order. Check all users: one should be a free, the others can be anything. If you find bitcast users, remember the type; if they all agree, use that type for the alloca.
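A minimal sketch of the order-independent user check described above, again on a hypothetical toy model: UserKind, User, UseSummary and summarizeUsers are illustrative names, and types are plain strings instead of LLVM types. Every user is visited, frees are counted, and all bitcast destination types must agree before one is chosen for the alloca.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical kinds of users of a globalization allocation.
enum class UserKind { Free, BitCast, Other };

struct User {
  UserKind Kind;
  std::string DestType;  // destination type; only meaningful for BitCast
};

struct UseSummary {
  unsigned NumFrees = 0;                  // how many free calls use the alloc
  std::optional<std::string> AllocaType;  // the agreed bitcast type, if any
  bool TypesDisagree = false;             // two bitcasts named different types
};

// Visit all users without assuming any order: count frees, ignore other
// users, and require every bitcast to agree on a single destination type.
UseSummary summarizeUsers(const std::vector<User> &Users) {
  UseSummary S;
  for (const User &U : Users) {
    switch (U.Kind) {
    case UserKind::Free:
      ++S.NumFrees;
      break;
    case UserKind::BitCast:
      if (!S.AllocaType)
        S.AllocaType = U.DestType;
      else if (*S.AllocaType != U.DestType)
        S.TypesDisagree = true;
      break;
    case UserKind::Other:
      break;
    }
  }
  return S;
}
```

Under these assumptions, a caller would rewrite only when NumFrees == 1 and TypesDisagree is false, using AllocaType when present.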

1708

No need to go over the free calls. They have to be users of the alloc, and we remove them together with the alloc.

1715

No need, use *CI below.

We can reuse the Attributor's HeapToStack for this :)

jhuber6 updated this revision to Diff 330353.Mar 12 2021, 1:24 PM
jhuber6 retitled this revision from [OpenMP] Remove GPU globalization calls in the middle-end to [OpenMP] Replace GPU globalization calls with shared memory in the middle-end.
jhuber6 edited the summary of this revision.

Changing this optimization to replace the globalization calls with shared memory. Removing them entirely will be left to the Attributor's HeapToStack once we add the allocation calls and improve the Attributor.

jdoerfert added inline comments.Mar 17 2021, 9:05 AM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
68

I don't think we need this after all.

1010

For now, check isKernel(F) and only do this for kernel functions. Later we can be more aggressive, but for now that should limit it properly, also with regard to the lifetime of those allocations.

1068

You cannot recollect while looping.
Move the recollect for free after the loop and delete the other one. Return true from this function when it deleted the use; that will update the alloc use list.
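The hazard behind "you cannot recollect while looping" is mutating a use list while iterating over it. Below is a sketch of the callback-returns-true pattern, assuming a hypothetical UseList of plain integers in place of LLVM Use objects; forEachUse is an illustrative name. Deletions are recorded during the loop and applied only afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a per-function list of call-site uses.
using UseList = std::vector<int>;

// Visit every use; the callback returns true when it deleted the use.
// Deleted entries are removed only after the loop finishes, so the
// container is never modified while it is being iterated.
void forEachUse(UseList &Uses, const std::function<bool(int)> &CB) {
  std::vector<std::size_t> ToBeDeleted;
  for (std::size_t Idx = 0; Idx < Uses.size(); ++Idx)
    if (CB(Uses[Idx]))
      ToBeDeleted.push_back(Idx);

  // Remove in reverse index order: moving the last element into a freed
  // slot never disturbs a smaller index that is still pending removal.
  while (!ToBeDeleted.empty()) {
    std::size_t Idx = ToBeDeleted.back();
    ToBeDeleted.pop_back();
    Uses[Idx] = Uses.back();
    Uses.pop_back();
  }
}
```

The swap-and-pop removal does not preserve element order, which is acceptable for a use list whose order carries no meaning.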

1072

return true;

jhuber6 updated this revision to Diff 331375.Mar 17 2021, 2:21 PM
jhuber6 edited the summary of this revision.

Changing the condition to simply check whether we are in a non-SPMD kernel function. A more advanced analysis can be used in the future.
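A sketch of this conservative condition, assuming a hypothetical FunctionInfo record in which IsKernel stands in for a check such as isKernel(F) and ExecMode is an illustrative enum. The rationale in the comment (a generic-mode kernel body runs on a single main thread, and the kernel bounds the buffer's lifetime) is offered as an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical execution modes for an OpenMP target kernel.
enum class ExecMode { SPMD, Generic };

// Hypothetical, simplified view of a function in the module.
struct FunctionInfo {
  std::string Name;
  bool IsKernel;  // stands in for a check like isKernel(F)
  ExecMode Mode;  // only meaningful when IsKernel is true
};

// Conservative rule: only rewrite globalization calls that sit directly in a
// non-SPMD (generic-mode) kernel. Assumed rationale: the sequential kernel
// body runs on one main thread, and the kernel bounds the buffer's lifetime.
bool mayUseStaticSharedMemory(const FunctionInfo &F) {
  return F.IsKernel && F.Mode == ExecMode::Generic;
}
```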

jhuber6 updated this revision to Diff 331676.Mar 18 2021, 1:30 PM
jdoerfert accepted this revision.Mar 18 2021, 4:20 PM

LGTM, one nit.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
1031

Can you use an enum or a named variable here instead of "3"?
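One way to address this nit, sketched with a hypothetical AddressSpace enum and toAS helper; the names are illustrative and the patch's actual enum may differ. Address space 3 is shared memory on NVPTX and LDS on AMDGPU, so naming it documents the intent at the use site.

```cpp
#include <cassert>

// Hypothetical enum naming the GPU address spaces relevant here. The value 3
// is the shared-memory address space on NVPTX (LDS on AMDGPU).
enum class AddressSpace : unsigned {
  Generic = 0,
  Global = 1,
  Shared = 3,
};

// Illustrative helper: convert a named address space to the raw number an
// IR-building API takes, instead of writing the literal 3 at the call site.
constexpr unsigned toAS(AddressSpace AS) { return static_cast<unsigned>(AS); }
```

A scoped enum keeps the named constants from leaking into the enclosing namespace, at the cost of an explicit conversion at each use.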

This revision is now accepted and ready to land.Mar 18 2021, 4:20 PM
jhuber6 updated this revision to Diff 332837.Mar 23 2021, 5:48 PM

Making changes.

jhuber6 updated this revision to Diff 332842.Mar 23 2021, 6:27 PM

Adding enum for address space constant.