The changes introduced in D97680 create a simpler interface to code that needs to be globalized. This interface is used to simplify the globalization calls in the middle end. We can check any globalization call that is only called by a single thread in the team and replace it with a static shared memory buffer.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
some initial comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
1033 | ||
1036 | No cast needed | |
1041 | prefer early exit, if (!...) return false; | |
1044 | No need for it to be a constant | |
1048 | Don't assume an order. Check all users, one should be a free, others can be whatever. If you find bitcast users, remember the type, if they all agree, use that for the alloca. | |
1733 | no need to go over the free calls. they need to be users of the alloc and we remove them with the alloc. | |
1740 | No need, use *CI below. |
Changing this optimization to replace the globalization calls with shared memory. Removing them will be done by the attributor using HeapToStack once we add the allocation calls and improve the attributor.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
68 | I don't think we need this after all. | |
1010 | For now, check isKernel(F) and only do this for kernel functions. Later we can be more aggressive but for now that should limit it properly, also with regards to the lifetime of those allocations. | |
1068 | You cannot recollect while looping. | |
1072 | return true; |
Changing the test to simply check if we are in a non-SPMD kernel function. A more advanced approach can be used in the future.
LGTM, one nit.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
1073 | Can you use a enum here or variable instead of "3". |
Changing the method for determining valid allocation calls to use AAExecutionDomain introduced in D101578. Currently, this doesn't work with the code Clang generates but will soon.
Adding a pattern match check so this picks up on the code Clang currently generates. This is a temporary solution to improve performance until a full runtime rewrite can be landed.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
1036 | Nit: hoist this out of the lambda. | |
1061 | You have to check that a unique free call was found or give up otherwise. | |
1078 | You might be able to replace these two with just: ConstantExpr::getPointerCast | |
2467 | Check the Pred as well. Maybe try to split this in multiple matches or somehow to make it easier to read. |
I looked over this again because I think we will later move the logic into an AbstractAttribute. The reason is that we want to reuse some of the analysis in AAKernelInfo to determine if we can use SPMD mode. Basically, if the shared memory is only used in a nice way and fees into an actual parallel51, it is compatible with SPMD mode. If we go to SPMD mode we have (or better should) eliminate the shared memory indirection. Maybe we move all of this into AAHeapToStack and make a more general AAAllocationInfo out of it. Let's see.
That said, I noticed a problem below. Please add a test that misses the free, and/or that has a phi before the free, to make sure we do not globalize.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
1065 | if (!FC) return false; Also move the debug below the exists. |
Change the implementation to use the Attributor. This is because the attributor after registering HeapToStack in D102197 would change the call sites. Putting this pass inside an attributor instance allows both to run and now globalization will only be replaced if HeapToStack doesn't remove it first.
Changing alignment of new global variable, alignment of 8 caused errors in some applications.
I don't think we need this after all.