Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC later because it is not in the entry block.
I believe it is valid to insert the alloca in the entry block, but I'm not confident the way I accomplish it is correct.
Paths
| Differential D50104
[OpenCL] Always emit alloca in entry block for enqueue_kernel builtin ClosedPublic Authored by scott.linder on Jul 31 2018, 2:25 PM.
Details Summary Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC later because it is not in the entry block. I believe it is valid to insert the alloca in the entry block, but I'm not confident the way I accomplish it is correct.
Diff Detail Event TimelineComment Actions Address feedback; I hope I understood correctly what debug info to check for. I don't see where in CreateMemTemp and friends EmitLifetimeStart gets called, and I don't see any lifetime intrinsics in the IR even at -O1. Comment Actions You'll probably also need to update test/CodeGenOpenCL/cl20-device-side-enqueue.cl; please verify with make/ninja check-clang. This revision is now accepted and ready to land.Aug 2 2018, 1:34 AM Comment Actions
Emitting lifetime intrinsic is optional. In this case, since the life time of the temp var is just before and after the function call, emitting lifetime intrinsics can help optimizers. It can be done by code like this: if (auto *Size = EmitLifetimeStart( CGM.getDataLayout().getTypeAllocSize(Alloca.getElementType()), Alloca.getPointer())) { pushFullExprCleanup<CallLifetimeEnd>(NormalEHLifetimeMarker, Alloca, Size); } Then the lifetime.start should be emitted before the function call and lifetime.end should be emitted just after the function call. Comment Actions I still don't quite see what you describe; with that change all of the lifetime.end calls pile up just before the enclosing function returns, not after each call to enqueue_kernel. Looking at https://clang.llvm.org/doxygen/EHScopeStack_8h_source.html#l00078 I don't see any option which isn't based on scope. The lifetime.start calls do occur where I would expect, though, so I will update the patch. Comment Actions
Sorry my mistake. In this case, the full expressions seems to be the calling function, so using pushFullExprCleanup to emit lifetime.end does not work well here. You need to call EmitLifetimeEnd explicitly after emitting the function call. Closed by commit rL338899: [OpenCL] Always emit alloca in entry block for enqueue_kernel builtin (authored by scott.linder). · Explain WhyAug 3 2018, 8:51 AM This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 158816 lib/CodeGen/CGBuiltin.cpp
test/CodeGenOpenCL/cl20-device-side-enqueue.cl
test/CodeGenOpenCL/enqueue-kernel-non-entry-block.cl
|
You may try CreateMemTemp. It should handle the insert position and also debug info.