Details
Diff Detail
Event Timeline
Can you please add a reference to the document that described these extensions in the description of your patch. Thanks!
The specification of these extensions is https://gitlab.khronos.org/opencl/OpenCL-Docs/blob/2019-extensions/ext/cl_khr_subgroup_extensions.asciidoc
clang/lib/Headers/opencl-c.h | ||
---|---|---|
15594 | Ideally convergent attribute was intended to be used in the non-divergent scenarios. So I don't know if it's going to do what is needed here. Did you look into this already? If we look at the Clang documentation it says:
I remember @nhaehnle was looking at using convergent with operations in the divergent control flow some time ago https://reviews.llvm.org/D68994? I am not sure where this thread ended up and whether we can expect this to work currently? |
clang/lib/Headers/opencl-c.h | ||
---|---|---|
15594 | Thanks for pointing this out! My understanding of the convergent attribute was that it's for uniform control flow, as per the documentation you cited. |
clang/lib/Headers/opencl-c.h | ||
---|---|---|
15594 |
Did you check on the examples that @nhaehnle provided in the review?
I am not sure there is a solution to this in the upstream LLVM at present. I am hoping @nhaehnle can provide us more information. Alternatively, we could run his examples against the latest LLVM revision to see if the problem still remains or not. I am not saying that it should block this review, I am ok that we commit your patch but if we know already that this functionality can't be fully supported it would be good to at least create a bug to OpenCL component to record this. |
clang/lib/Headers/opencl-c.h | ||
---|---|---|
15594 | The following example: kernel void test(global int *data) { uint id = (uint) get_global_id(0); if (id < 4) data[id] = sub_group_elect(); else data[id] = sub_group_elect(); } with clang -S -emit-llvm emits invalid code both with and without the convergent attribute. We don't have this problem in our Intel Graphics Compiler though. That's because we replace the subgroup functions with internal intrinsics early enough. These intrinsics are marked with both the convergent and inaccessiblememonly attributes. Then in the SimplifyCFG pass we prevent optimization if both of these attributes are present: https://github.com/intel/intel-graphics-compiler/blob/master/IGC/Compiler/GenTTI.cpp#L397 I don't understand what can go wrong in the second example (with the jump threading pass) ? |
clang/lib/Headers/opencl-c.h | ||
---|---|---|
15527 | do we need to add vetor types too? Although the definition of gentype is missing in the documentation you have linked... so I am not certain what it means... | |
15574 | Can we move this above sub_group_reduce_add group please? | |
15594 | Thanks for checking. Do you mind creating a bug to clang for now under OpenCL component https://bugs.llvm.org/enter_bug.cgi?product=clang although we might reclassify this for a wider scope later on.
I see. Perhaps we need to come up with a new semantic of convergent and update LLVM passes... or maybe we have to introduce a new attribute. Not sure. I suggest we add a comment to https://reviews.llvm.org/D68994 and see if it gets picked up. | |
15597 | should this not be for gentype? Although the definition of gentype is missing in the documentation you have linked... | |
15618 | Ok, the grouping is somehow different here from the above where you have all version of the same element type first. | |
15689 | btw this is the spec question really but do you happen to know why those are particularly for vectors with 4 elements? | |
15779 | Interesting! Work group and the normal subroups functions don't seem to have this overload counterpart... | |
15975 | here we are also missing vector types? |
clang/lib/Headers/opencl-c.h | ||
---|---|---|
15527 | No, vector types are for broadcast and non_uniform_broadcast only. Search the documentation for "gentype type". | |
15574 | The convention (already present in this file) is that half and double overloads are specified last for every extension. The alternative of keeping the overloads together would result in more #ifs, especially for cl_khr_subgroups and cl_khr_subgroup_non_uniform_arithmetic. I would prefer consistency and not multiplying #ifs. | |
15594 | Do you mean a bug for the invalid optimizations of these new subgroup functions? | |
15597 |
| |
15618 | What are you referring to? | |
15689 | I don't know, but will ask the spec author. | |
15779 | If you mean mul, yes, that's unique here. | |
15975 | No, as I wrote above, vector types are for broadcast only. |
clang/lib/Headers/opencl-c.h | ||
---|---|---|
15689 | It's because the SPIR-V Specification says so. |
clang/lib/Headers/opencl-c.h | ||
---|---|---|
15594 |
yes, you can just copy/paste your test above:
If you can it would be good to highlight what is invalid in IR produced by upstream LLVM now. | |
15597 | Ok thanks I see now. "Summary of New OpenCL C Functions" made me confused as gentype appears to be the same for all functions there... maybe it makes sense to update it. | |
15618 | Above you first list all vector types then move to another element type: uchar __ovld __conv sub_group_broadcast( uchar value, uint index ); uchar2 __ovld __conv sub_group_broadcast( uchar2 value, uint index ); uchar3 __ovld __conv sub_group_broadcast( uchar3 value, uint index ); uchar4 __ovld __conv sub_group_broadcast( uchar4 value, uint index ); uchar8 __ovld __conv sub_group_broadcast( uchar8 value, uint index ); uchar16 __ovld __conv sub_group_broadcast( uchar16 value, uint index ); short __ovld __conv sub_group_broadcast( short value, uint index ); short2 __ovld __conv sub_group_broadcast( short2 value, uint index ); short3 __ovld __conv sub_group_broadcast( short3 value, uint index ); short4 __ovld __conv sub_group_broadcast( short4 value, uint index ); short8 __ovld __conv sub_group_broadcast( short8 value, uint index ); short16 __ovld __conv sub_group_broadcast( short16 value, uint index ); here you first list all scalars then all vecor of 2 elements then all vetors of 3 elements... | |
15689 | Thanks! |
Reordered the sub_group_non_uniform_broadcast overloads for consistency with sub_group_broadcast.
clang/lib/Headers/opencl-c.h | ||
---|---|---|
15594 | Shall I submit the bug before this change is merged? | |
15597 | I forwarded your request to the spec editor. | |
15618 | Looks like both orderings exist in this file. I updated the diff, reordering sub_group_non_uniform_broadcast to match the order of sub_group_broadcast. This ordering is more popular and reads better to me. |
LGTM! Thanks!
clang/include/clang/Basic/OpenCLExtensions.def | ||
---|---|---|
68 | I think it makes sense to allow this extension from the same OpenCL version as normal subgroup. However the version listed somewhere in the spec? | |
clang/lib/Headers/opencl-c.h | ||
15594 | The order is not important as long as we have it eventually. :) |
clang/include/clang/Basic/OpenCLExtensions.def | ||
---|---|---|
68 | The spec doesn't mention the OpenCL version. Quoting the editor (Ben Ashbaugh):
I agree that matching the version of cl_khr_subgroups makes sense. Updated together with the tests. |
Great! Thanks! Feel free to go ahead and commit this!
Can you please add a link to the bug describing incorrect optimization to this review once you create it.
clang/include/clang/Basic/OpenCLExtensions.def | ||
---|---|---|
68 | I would say in upstream Clang we try to follow the commonly accepted and well-documented behavior. We have no capacity to work out what the behavior of every possible customization should be. Also, we aim to implement already documented behavior that provides sufficient details to understand how things should be implemented. We can some times decide to deviate the implementation from the existing published documentation for which we should have a good reason and we should document anything that deviates any existing publicly documented feature. Ideally OpenCL standard should decide whether extensions are associated with OpenCL versions or not and this should be clearly documented. It would certainly help the implementation as it saves us time to work out what language version should be taken as a base. I believe that enabling extensions with incompatible language versions can cause the compiler failures without providing helpful information/diagnostic. Therefore, extensions should be aligned with a certain version of OpenCL C because they might require certain language features to be used. If they only require basic OpenCL language features then I guess it can say the extension is used with any language version starting from 1.0. |
Great! Thanks! Feel free to go ahead and commit this!
Thanks. Can someone commit this for me please? Also to 10.0 if possible?
Can you please add a link to the bug describing incorrect optimization to this review once you create it.
Sure. Now I'm waiting for my Bugzilla account to be created.
clang/include/clang/Basic/OpenCLExtensions.def | ||
---|---|---|
68 | Understood. I will check with the spec author. In the unlikely case that the minimum OpenCL version turns out to be different from 2.0, I will follow up with a fix. |
Ok, I can commit this. I am afraid the window for 10.0 is closed. It is still possible however to backport patches to 10.0.1. But this normally applies to bug fixes and not to new functionality. New functionality normally goes through regular development cycles to make sure there is enough time to discover and fix bugs.
Can you please add a link to the bug describing incorrect optimization to this review once you create it.
Sure. Now I'm waiting for my Bugzilla account to be created.
clang/lib/Headers/opencl-c.h | ||
---|---|---|
15594 |
I think it makes sense to allow this extension from the same OpenCL version as normal subgroup. However the version listed somewhere in the spec?