This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] Always emit alloca in entry block for enqueue_kernel builtin
ClosedPublic

Authored by scott.linder on Jul 31 2018, 2:25 PM.

Details

Summary

Ensures the statically sized alloca is not converted to DYNAMIC_STACKALLOC later because it is not in the entry block.

I believe it is valid to insert the alloca in the entry block, but I'm not confident the way I accomplish it is correct.

Diff Detail

Repository
rL LLVM

Event Timeline

scott.linder created this revision.Jul 31 2018, 2:25 PM
yaxunl added inline comments.Jul 31 2018, 2:51 PM
lib/CodeGen/CGBuiltin.cpp
3342 ↗(On Diff #158385)

You may try CreateMemTemp. It should handle the insert position and also debug info.

test/CodeGenOpenCL/enqueue-kernel-non-entry-block.cl
2 ↗(On Diff #158385)

Can we have a run line for debug info?

Should this also test for lifetime markers?

Address feedback; I hope I understood correctly what debug info to check for.

I don't see where in CreateMemTemp and friends EmitLifetimeStart gets called, and I don't see any lifetime intrinsics in the IR even at -O1.

Anastasia edited reviewers, added: svenvh; removed: Anastasia.Aug 1 2018, 9:16 AM
Anastasia added a subscriber: Anastasia.

You'll probably also need to update test/CodeGenOpenCL/cl20-device-side-enqueue.cl; please verify with make/ninja check-clang.

Update test

svenvh accepted this revision.Aug 2 2018, 1:34 AM

LGTM, thanks!

This revision is now accepted and ready to land.Aug 2 2018, 1:34 AM
yaxunl added a comment.Aug 2 2018, 6:50 AM

Address feedback; I hope I understood correctly what debug info to check for.

I don't see where in CreateMemTemp and friends EmitLifetimeStart gets called, and I don't see any lifetime intrinsics in the IR even at -O1.

Emitting lifetime intrinsic is optional. In this case, since the life time of the temp var is just before and after the function call, emitting lifetime intrinsics can help optimizers.

It can be done by code like this:

 if (auto *Size = EmitLifetimeStart(
         CGM.getDataLayout().getTypeAllocSize(Alloca.getElementType()),
         Alloca.getPointer())) {
   pushFullExprCleanup<CallLifetimeEnd>(NormalEHLifetimeMarker, Alloca,
                                          Size);
}

Then the lifetime.start should be emitted before the function call and lifetime.end should be emitted just after the function call.

I still don't quite see what you describe; with that change all of the lifetime.end calls pile up just before the enclosing function returns, not after each call to enqueue_kernel. Looking at https://clang.llvm.org/doxygen/EHScopeStack_8h_source.html#l00078 I don't see any option which isn't based on scope. The lifetime.start calls do occur where I would expect, though, so I will update the patch.

I still don't quite see what you describe; with that change all of the lifetime.end calls pile up just before the enclosing function returns, not after each call to enqueue_kernel. Looking at https://clang.llvm.org/doxygen/EHScopeStack_8h_source.html#l00078 I don't see any option which isn't based on scope. The lifetime.start calls do occur where I would expect, though, so I will update the patch.

Sorry my mistake. In this case, the full expressions seems to be the calling function, so using pushFullExprCleanup to emit lifetime.end does not work well here.

You need to call EmitLifetimeEnd explicitly after emitting the function call.

Emit lifetime intrinsics for the sizes temp, and update test

yaxunl accepted this revision.Aug 2 2018, 6:59 PM

LGTM. Thanks!

This revision was automatically updated to reflect the committed changes.