This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] Mark group functions as convergent in opencl-c.h
ClosedPublic

Authored by yaxunl on Oct 6 2016, 1:00 PM.

Details

Summary

Certain OpenCL builtin functions are supposed to be executed by all threads in a work group or sub group. Such functions should not be made divergent during transformation. It makes sense to mark them with convergent attribute.

The adding of convergent attribute is based on Ettore Speziale's work and the original proposal and patch can be found at https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22271.html.

A test is added to demonstrate that convergent attribute prevents invalid transformations and allows valid loop unrolling whereas noduplicate does not.

Diff Detail

Event Timeline

yaxunl updated this revision to Diff 73839.Oct 6 2016, 1:00 PM
yaxunl retitled this revision from to [OpenCL] Add noduplicate to group functions opencl-c.h.
yaxunl updated this object.
yaxunl added reviewers: Anastasia, bader.
yaxunl retitled this revision from [OpenCL] Add noduplicate to group functions opencl-c.h to [OpenCL] Mark group functions as noduplicate in opencl-c.h.
yaxunl added subscribers: cfe-commits, b-sumner.
arsenm added a subscriber: arsenm.Oct 6 2016, 7:15 PM

These should be convergent instead

clang does not recognize convergent as a valid attribute. There was an attempt to add this, see https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22271.html but it hasn't had any result. Matt do you see "real uses" for this now?

yaxunl added a comment.Oct 7 2016, 7:53 AM

These should be convergent instead

Unfortunately convergent is not supported as Clang attribute. There was patch to add it but the author withdrew it.

Anastasia/Alexey,

Do you think it is a good idea to add attribute((convergent)) to clang and mark these group functions with it?

convergent put more restrictions on what transformations are allowed, whereas noduplicate only forbids duplicate of the function calls.

Thanks.

Anastasia edited edge metadata.Oct 8 2016, 4:52 AM

Do you have any code example where Clang/LLVM performs wrong optimizations with respect to the control flow of SPMD execution?

My understanding from the earlier discussion we have had: https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22643.html that noduplicate is essentially enough for the frontend to prevent erroneous optimizations. Because in general compiler can't do much with unknown function calls.

For LLVM intrinsics it is slightly different as I can deduce from this discussion:http://lists.llvm.org/pipermail/llvm-dev/2015-May/085558.html . It seems like by default it's assumed to be side effect free and can be optimized in various ways.

Do you have any code example where Clang/LLVM performs wrong optimizations with respect to the control flow of SPMD execution?

My understanding from the earlier discussion we have had: https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22643.html that noduplicate is essentially enough for the frontend to prevent erroneous optimizations. Because in general compiler can't do much with unknown function calls.

noduplicate is enough for correctness, but it prevents legal optimizations, like unrolling loops with barriers. The convergent attribute was added specifically for these kinds of builtins, so we should be using it here instead of noduplicate.

For LLVM intrinsics it is slightly different as I can deduce from this discussion:http://lists.llvm.org/pipermail/llvm-dev/2015-May/085558.html . It seems like by default it's assumed to be side effect free and can be optimized in various ways.

yaxunl updated this revision to Diff 74394.Oct 12 2016, 9:26 AM
yaxunl retitled this revision from [OpenCL] Mark group functions as noduplicate in opencl-c.h to [OpenCL] Mark group functions as convergent in opencl-c.h.
yaxunl updated this object.
yaxunl added a reviewer: aaron.ballman.
aaron.ballman added inline comments.Oct 12 2016, 11:26 AM
include/clang/Basic/AttrDocs.td
612

on a function declaration

613

s/to/into the
s/the call/that the call

621

s/since/because

629

Since this is a C block, perhaps that should be void convfunc(void) instead?

test/CodeGenOpenCL/convergent.cl
119

Missing the Sema tests for the attribute's semantics (applies only to functions, accepts no args, etc).

yaxunl updated this revision to Diff 74514.Oct 13 2016, 7:44 AM
yaxunl marked 5 inline comments as done.

Revised by Aaron's comments. Added a sema test.

aaron.ballman accepted this revision.Oct 13 2016, 9:59 AM
aaron.ballman edited edge metadata.

Aside from some minor nits, the attribute functionality looks fine to me. As to whether we think this is a worthy attribute to add or not, I leave that to people who know CUDA and OpenCL better than I do, so please wait for @tstellarAMD or @Anastasia to sign off before committing.

include/clang/Basic/AttrDocs.td
612

"declarations" should be singular (declaration).

test/SemaOpenCL/convergent.cl
2

Should add -fsyntax-only to the test.

This revision is now accepted and ready to land.Oct 13 2016, 9:59 AM
yaxunl updated this revision to Diff 74541.Oct 13 2016, 10:31 AM
yaxunl edited edge metadata.

Minor revision by Aaron's comments.

Do you have any code example where Clang/LLVM performs wrong optimizations with respect to the control flow of SPMD execution?

My understanding from the earlier discussion we have had: https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22643.html that noduplicate is essentially enough for the frontend to prevent erroneous optimizations. Because in general compiler can't do much with unknown function calls.

noduplicate is enough for correctness, but it prevents legal optimizations, like unrolling loops with barriers. The convergent attribute was added specifically for these kinds of builtins, so we should be using it here instead of noduplicate.

Should we deprecate noduplicate then as convergent should cover both use cases for OpenCL I believe? As far as I understand noduplicate was added specifically for SPMD use cases...

include/clang/Basic/AttrDocs.td
630

Did you mean "in an OpenCL program"?

test/CodeGenOpenCL/convergent.cl
18

-> non_convfun();

yaxunl updated this revision to Diff 75203.Oct 19 2016, 12:50 PM
yaxunl marked 4 inline comments as done.

Fix typo in test.

Should we deprecate noduplicate then as convergent should cover both use cases for OpenCL I believe? As far as I understand noduplicate was added specifically for SPMD use cases...

noduplicate has different semantics than convergent. Although it is proposed for SPMD originally, it could be used by non-SPMD programs to forbid duplicate of functions. There may be applications using this attribute.

I would suggest to leave this question for future. Probably ask llvm-dev first since the attribute is also in LLVM.

include/clang/Basic/AttrDocs.td
630

No.
In C++ program only, it is also valid to set it as C++11 attribute.
OpenCL does not support C++11 attribute.

Do you have any code example where Clang/LLVM performs wrong optimizations with respect to the control flow of SPMD execution?

My understanding from the earlier discussion we have had: https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22643.html that noduplicate is essentially enough for the frontend to prevent erroneous optimizations. Because in general compiler can't do much with unknown function calls.

noduplicate is enough for correctness, but it prevents legal optimizations, like unrolling loops with barriers. The convergent attribute was added specifically for these kinds of builtins, so we should be using it here instead of noduplicate.

For LLVM intrinsics it is slightly different as I can deduce from this discussion:http://lists.llvm.org/pipermail/llvm-dev/2015-May/085558.html . It seems like by default it's assumed to be side effect free and can be optimized in various ways.

Do you have any code example where Clang/LLVM performs wrong optimizations with respect to the control flow of SPMD execution?

My understanding from the earlier discussion we have had: https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22643.html that noduplicate is essentially enough for the frontend to prevent erroneous optimizations. Because in general compiler can't do much with unknown function calls.

noduplicate is enough for correctness, but it prevents legal optimizations, like unrolling loops with barriers. The convergent attribute was added specifically for these kinds of builtins, so we should be using it here instead of noduplicate.

For LLVM intrinsics it is slightly different as I can deduce from this discussion:http://lists.llvm.org/pipermail/llvm-dev/2015-May/085558.html . It seems like by default it's assumed to be side effect free and can be optimized in various ways.

Tom, as far as I understand a valid generalized use case with a barrier is as follows:

a = compute() 
barrier()
use(a)

Regarding unrolling I don't see how the barrier can be unrolled without breaking the correctness of OpenCL programs because all compute() calls of one WG have to complete before use(a) is invoked for the first time. I am not an expert in LLVM transformations but if I read the description of both noduplicate and convergent (http://llvm.org/docs/LangRef.html) none of those seem to make sense to me for the barrier because both allow reordering within the same function or BB respectively and the only valid way seems to mark it as a side-effect intrinsic to prevent any reordering of the barrier itself. I understand within compute() or use() independently there are not limitations on ordering of instructions.

If you have any specific example in mind where the unrolling would work could you please elaborate on it. I think it's essential for the community to undertsnad the use cases for noduplicate/convergent/sideeffect to make better use of them also across similar models i.e. OpenCL, CUDA, etc.

Anastasia accepted this revision.Oct 31 2016, 10:30 AM
Anastasia edited edge metadata.

LGTM! Thanks! Could you please address the last comment before committing?

test/CodeGenOpenCL/convergent.cl
54

Did you mean to check @convfun() here?

yaxunl added inline comments.Oct 31 2016, 11:11 AM
test/CodeGenOpenCL/convergent.cl
54

Yes. Will fix it when committing.

This revision was automatically updated to reflect the committed changes.