This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix lowering enqueue_kernel
Closed, Public

Authored by yaxunl on Apr 2 2018, 2:44 PM.

Details

Summary

Two issues were fixed:

  1. The runtime has difficulty allocating memory for an external symbol of a kernel and setting the address of that symbol. Therefore, make the runtime handle of an enqueued kernel an ordinary global variable. The runtime then only needs to store the address of the loaded kernel in the handle, and this approach has been verified to work on the runtime side.

  2. Handle the situation where __enqueue_kernel* gets inlined, in which case the enqueued kernel may be used through a constant expression instead of an instruction.
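For the first fix, a minimal host-side C++ sketch of the idea, assuming a simplified model in which the handle is a single address-sized slot. The names RuntimeHandle, child_kernel_runtime_handle, runtime_store_kernel_address and kernel_to_enqueue are hypothetical, and the real handle's layout is not specified in this revision. The point it shows is that, once the handle is an ordinary defined global with a zero initializer, the runtime's job reduces to one store into storage the module already provides.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the per-kernel runtime handle: an ordinary global
// with a definition and a zero initializer, so the loader never has to
// allocate storage for an undefined external symbol.
struct RuntimeHandle {
  std::uint64_t kernel_address = 0;  // 0 means "kernel not loaded yet"
};

// Hypothetical handle for one enqueued kernel.
RuntimeHandle child_kernel_runtime_handle;

// Hypothetical runtime step: after loading the code object, the runtime
// only stores the loaded kernel's address into the existing handle.
void runtime_store_kernel_address(RuntimeHandle &handle,
                                  std::uint64_t loaded_address) {
  assert(loaded_address != 0 && "a loaded kernel has a non-null address");
  handle.kernel_address = loaded_address;
}

// Hypothetical enqueue path: reads the handle at run time instead of
// referring to the kernel symbol directly.
std::uint64_t kernel_to_enqueue(const RuntimeHandle &handle) {
  return handle.kernel_address;
}
```

Before the runtime has stored anything, the handle reads as 0; after runtime_store_kernel_address, kernel_to_enqueue returns the stored address.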
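For the second fix, a simplified C++ model of the user walk. It uses hypothetical UserNode and collectFunctionUsers definitions rather than LLVM's real Instruction and ConstantExpr classes. An instruction user contributes the function that contains it. A constant-expression user, such as a cast of the kernel that survives after __enqueue_kernel* is inlined, has no parent function of its own, so the walk looks through it by recursing into its users.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of IR users (not LLVM's real classes).
struct UserNode {
  enum class Kind { Instruction, ConstantExpr } kind;
  std::string parent_function;          // meaningful when kind == Instruction
  std::vector<const UserNode *> users;  // meaningful when kind == ConstantExpr
};

// Collect every function that uses a value, whether the use is a direct
// instruction operand or goes through one or more constant expressions.
// Assumption: the chain of constant users in this model is acyclic.
void collectFunctionUsers(const UserNode *u, std::set<std::string> &funcs) {
  assert(u != nullptr);
  if (u->kind == UserNode::Kind::Instruction) {
    funcs.insert(u->parent_function);
    return;
  }
  // A constant expression is not inside any function: visit its users.
  for (const UserNode *uu : u->users)
    collectFunctionUsers(uu, funcs);
}
```

Starting from a cast of the kernel that is used both directly by an instruction and through a nested constant expression, the walk reports the functions containing both instructions.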


Event Timeline

yaxunl created this revision. Apr 2 2018, 2:44 PM
rampitec added inline comments. Apr 2 2018, 3:03 PM
lib/Target/AMDGPU/AMDGPUOpenCLEnqueuedBlockLowering.cpp, line 98

You can call collectCallers only if F was not already in the set, i.e. guard the call with if (Funcs.insert(F).second).
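A standalone sketch of the suggested guard, assuming a hypothetical name-keyed caller map (CallerMap) in place of LLVM's Function and CallInst types. std::set::insert returns a pair whose .second is true only when the element was newly added, so recursing only in that case expands each function at most once. This also stops the walk on recursive call chains, which would otherwise recurse forever.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: callee name -> names of functions that call it.
using CallerMap = std::map<std::string, std::vector<std::string>>;

// Collect direct and indirect callers of `f` into `callers`.
void collectCallers(const std::string &f, const CallerMap &graph,
                    std::set<std::string> &callers) {
  assert(!f.empty());
  auto it = graph.find(f);
  if (it == graph.end())
    return;  // no known callers
  for (const std::string &caller : it->second)
    if (callers.insert(caller).second)  // true only on first insertion
      collectCallers(caller, graph, callers);
}
```

In a graph where a helper is self-recursive, the helper is inserted once and its callers are still reached, so the walk terminates with every transitive caller.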

yaxunl updated this revision to Diff 140800. Apr 3 2018, 8:21 AM
yaxunl marked an inline comment as done.
yaxunl edited the summary of this revision.

Revised per Stas' comments, and added the linkage change that was accidentally omitted.

This revision is now accepted and ready to land. Apr 10 2018, 11:56 AM
This revision was automatically updated to reflect the committed changes.