This is an archive of the discontinued LLVM Phabricator instance.

[OpenCL] always use SPIR address spaces for kernel_arg_addr_space MD
ClosedPublic

Authored by pekka.jaaskelainen on Oct 31 2016, 10:18 AM.

Details

Summary

While trying to get rid of the hackish way of pocl to retain information of OpenCL address spaces in the IR (via the fake address space map ids) by means of using the argument info metadata instead, I noticed that the MD currently uses the target's address space IDs. I think it doesn't make sense to use the target's address space ids in this context as this is metadata that should be referring to the "logical" OpenCL address spaces. For flat AS machines like all "CPUs" in general, the logical AS info gets lost as there's only one address space (0).

This patch changes the logic such that we always use the SPIR address space ids for the argument metadata. It thus allows implementing the clGetKernelArgInfo() and the other detection needs.

If possible I'd like to get this also to 3.9.1 which would allow pocl to get cleaner way to handle the address spaces for its next release which supports 3.9. Our use of fake address space ids seem to cause lots bugs which I'd like to get rid of.

Diff Detail

Repository
rL LLVM

Event Timeline

pekka.jaaskelainen retitled this revision from to [OpenCL] always use SPIR address spaces for kernel_arg_addr_space MD.
pekka.jaaskelainen updated this object.
pekka.jaaskelainen set the repository for this revision to rL LLVM.
Anastasia edited edge metadata.Oct 31 2016, 11:21 AM

It looks good generally. My only concern here if anyone could use this MD expecting target AS values which correspond to those put in the IR itself. On the other hand clGetKernelArgInfo is supposed to be used for obtaining the information about the original source which I agree we are not preserving any longer.

yaxunl edited edge metadata.Oct 31 2016, 11:31 AM

LGTM. Thanks.

yaxunl accepted this revision.Nov 1 2016, 10:58 AM
yaxunl edited edge metadata.
This revision is now accepted and ready to land.Nov 1 2016, 10:58 AM

Thanks. @tstellarAMD OK to commit for 3.9.1 as well?

bader added a subscriber: bader.Nov 2 2016, 4:21 AM

@pekka.jaaskelainen, I have related question: what is the problem with retaining OpenCL address space information in LLVM IR?
My understanding is that target CodeGen can ignore this information for the machines with 'flat' address space model.
On the other hand I would expect this information be useful for the Optimizer to resolve pointer aliasing.
Does it make any sense to keep OpenCL address space information in LLVM IR generated for the machines with 'flat' address space (e.g. CPU)?

@pekka.jaaskelainen, I have related question: what is the problem with retaining OpenCL address space information in LLVM IR?
My understanding is that target CodeGen can ignore this information for the machines with 'flat' address space model.
On the other hand I would expect this information be useful for the Optimizer to resolve pointer aliasing.
Does it make any sense to keep OpenCL address space information in LLVM IR generated for the machines with 'flat' address space (e.g. CPU)?

Is there nowadays such a thing as "standard OpenCL logical AS IDs" which could be retained down to code gen? I must say I haven't checked the current situation here. It used to be that the logical ids are assumed to be converted to the target ones already in Clang and I'm afraid changing this requires a major rework.

pocl has used the fake-address-space ids until now for exactly this -- retaining the logical AS info in the IR which can be exploited for disjoint address space AA, but is also required for handling the different AS kernel arguments as local arguments must be treated differently memory allocation wise.

However, as all backends are not expected to support mapping the fake address spaces to their internal ones (in single AS it's trivial to simply ignore the AS ids, but for multi-AS machines there has to be explicit mapping) we have had an IR pass that converts the address spaces to the target's before code gen. This pass we call TargetAddressSpaces has grown way too complex and is a major source of bugs all the time.

Also, another source of bugs is the fact that many passes simply ignore address spaces as they have been developed for single AS machines and only tested on them. This leads to bugs where the AS ID info is silently dropped (converted to 0) which makes them hard to catch. If the pointer creation APIs of LLVM were forced to include the AS ID in the construction, it might yield out majority of these issues -- as long as the coders respect the fact that there can be multiple ASs and not simply use 0 there all the time.

Also, some optimizations such as vectorization might get confused in case it sees non-0 address spaces for CPU targets (e.g. there might not be vectorized intrinsics available for non-0 ASs).

Etc. Thus, due to the limited time our group has available for hunting the bugs that stem from this, I decided it might be best to avoid the use of the "logical IDs" inside IR for now and think about how to implement the disjoint AA without them later on.

Is there nowadays such a thing as "standard OpenCL logical AS IDs" which could be retained down to code gen? I must say I haven't checked the current situation here. It used to be that the logical ids are assumed to be converted to the target ones already in Clang and I'm afraid changing this requires a major rework.

I think it's really hard topic and people probably have different interpretations. My understanding is that Clang AST contains source level AS (i.e. OpenCL and CUDA) and translates down to logical AS in IR (different AS in CUDA and OpenCL could map to the same AS in IR). I don't think those ASes in IR are fixed or agreed to have some defined semantic either. This is probably the source of major confusions. Clang translates to target (i.e. IR) address spaces which should be lowered by LLVM as late as possible into physical ones when the address space information is not needed any more.

But I think there are a lot of different implementations and shortcuts to either translate early or late at the moment. It's hard to say what's best. I guess as soon as it works and generic enough for everyone needs. But in a longer term some better understanding and common concepts should be found in my opinion.

I must say for this particular change it will probably be confusing to see different AS numbers between IR operations and MD in case some targets have different mapping than SPIR. Perhaps, we could discuss alternative strategies.

pocl has used the fake-address-space ids until now for exactly this -- retaining the logical AS info in the IR which can be exploited for disjoint address space AA, but is also required for handling the different AS kernel arguments as local arguments must be treated differently memory allocation wise.

However, as all backends are not expected to support mapping the fake address spaces to their internal ones (in single AS it's trivial to simply ignore the AS ids, but for multi-AS machines there has to be explicit mapping) we have had an IR pass that converts the address spaces to the target's before code gen. This pass we call TargetAddressSpaces has grown way too complex and is a major source of bugs all the time.

Also, another source of bugs is the fact that many passes simply ignore address spaces as they have been developed for single AS machines and only tested on them. This leads to bugs where the AS ID info is silently dropped (converted to 0) which makes them hard to catch. If the pointer creation APIs of LLVM were forced to include the AS ID in the construction, it might yield out majority of these issues -- as long as the coders respect the fact that there can be multiple ASs and not simply use 0 there all the time.

Also, some optimizations such as vectorization might get confused in case it sees non-0 address spaces for CPU targets (e.g. there might not be vectorized intrinsics available for non-0 ASs).

I think it's the same for the frontend actually. Clang hasn't been written with AS in mind and although with time most problems have been fixed, many commits still revolve around various AS issues.

Is it something that should be addressed by LLVM community? Something that we could discuss to be improved on llvm-dev list? I am guessing a lot of contributors that work with segmented memory architectures can benefit from this work.

Etc. Thus, due to the limited time our group has available for hunting the bugs that stem from this, I decided it might be best to avoid the use of the "logical IDs" inside IR for now and think about how to implement the disjoint AA without them later on.

yaxunl added a comment.Nov 3 2016, 9:06 AM

One purpose of kernel arg info metadata is to be passed to OpenCL runtime to fulfill clGetKernelArgInfo queries. (https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetKernelArgInfo.html). As such, the address space id needs to be at source level instead of IR level.

One purpose of kernel arg info metadata is to be passed to OpenCL runtime to fulfill clGetKernelArgInfo queries. (https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clGetKernelArgInfo.html). As such, the address space id needs to be at source level instead of IR level.

Right, but what do you propose the numbers of source AS IDs to be or should we put CL_KERNEL_ARG_ADDRESS_GLOBAL, etc, as strings instead? I am not sure it's a good idea to tie SPIR and OpenCL together. At the end SPIR is meant to be an IR to compile from different source languages and not just OpenCL.

The enum for address space is not necessarily bound to SPIR as long as the runtime knows it.

Since most OpenCL runtime already uses that definition (if they use clang as compiler), I don't see a reason to change that. I doubt this metadata can be used for anything else other than clGetKernelArgInfo, therefore other languages most likely do not need it.

I am wondering whether returning the strings with source level information would be an option instead? But yes this MD are OpenCL specific indeed. I am just not sure whether putting hard-coded arbitrary numbers would not cause any confusions in the future. I feel that entire problem is that documentation is missing about such important aspects which opens rooms for multiple interpretations.

Indeed, it requires wider scale discussion to get it right, and e.g. to pass the info to AA. But to be honest, I think OpenCL and CUDA are still considered 'minority' languages in Clang/LLVM which makes me usually lean towards least intrusive implementation solutions whenever possible.

The matter was discussed 5 years ago: http://clang-developers.42468.n3.nabble.com/OpenCL-Address-Spaces-and-Runtimes-td2814865.html

The current situation of having the target AS in the MD is clearly not the way to go due to the loss of info for flat AS machines, so I think this discussion should be mostly about what should those designated IDs be.

Sure, we could use strings such as "opencl.global" there instead, but I'm not sure how much that adds value to simply having known integer IDs instead. SPIR 1.2-2.0 is based on LLVM and designed to store info of OpenCL C kernels, therefore I thought the SPIR IDs are logical as we can just refer to SPIR specs in this case. Other alternative might be to retain the Clang's logical IDs, but I recall there was some special semantics/handling for AS IDs > 255. I might be wrong though.

Indeed, it requires wider scale discussion to get it right, and e.g. to pass the info to AA. But to be honest, I think OpenCL and CUDA are still considered 'minority' languages in Clang/LLVM which makes me usually lean towards least intrusive implementation solutions whenever possible.

The matter was discussed 5 years ago: http://clang-developers.42468.n3.nabble.com/OpenCL-Address-Spaces-and-Runtimes-td2814865.html

The current situation of having the target AS in the MD is clearly not the way to go due to the loss of info for flat AS machines, so I think this discussion should be mostly about what should those designated IDs be.

Sure, we could use strings such as "opencl.global" there instead, but I'm not sure how much that adds value to simply having known integer IDs instead. SPIR 1.2-2.0 is based on LLVM and designed to store info of OpenCL C kernels, therefore I thought the SPIR IDs are logical as we can just refer to SPIR specs in this case. Other alternative might be to retain the Clang's logical IDs, but I recall there was some special semantics/handling for AS IDs > 255. I might be wrong though.

Actually the solution from Ettore looks like a good compromise to me to what we've discussed so far being more generic than hard-coded numbers but less heavy than putting strings for each kernel.

However, I also see that the issue is in absence of common agreement and understanding of the compiler flow which perhaps is also due to absence of any documentation (I am hoping to improve). I still think that this particular problem originates from the fact that compiler removes relevant source information too early. It seems to me that the compiler has to keep the AS information as late as possible until it's no longer being used anywhere in the toolchain. In this case if this is being queried by RT, it expects that this information is relevant for some reason (i.e. for example to optimize for a particular target architecture). So if there are no memory segments, nothing can be done to optimize for this and therefore providing the "fake" segment information doesn't seem to be useful? I am just trying to understand the use case.

I am not too picky on the exact implementation details here. Perhaps well documented hard coded numbers should work too. I am just trying to see the use case for this and whether the current compilation flow is suboptimal to support it properly. But perhaps also the issue is that those MDs are OpenCL specific anyways, but the IR itself is supposed to be source language agnostic.

target architecture). So if there are no memory segments, nothing can be done to optimize for this and therefore providing
the "fake" segment information doesn't seem to be useful? I am just trying to understand the use case.

The uses for the OpenCL logical address spaces that I know of are:

  1. to differentiate local kernel arguments as their memory allocation is different (per WG if parallel WGs)
  2. alias analysis: as OpenCL address spaces are per definition disjoint (no overlap), it can be utilized to prove that pointers to different address spaces are not accessing the same areas, which helps e.g. vectorization and other optimizations that require code motion/parallelization
  3. clGetKernelArgInfo() implementation

The kernel arg MD is good enough for 1) and 3) and helps in 2). Actually having the address spaces in the pointers would be much better, but is harder to maintain.

I am not too picky on the exact implementation details here. Perhaps well documented hard coded numbers should work too. I am just trying to see the use case for this and whether the current compilation flow is suboptimal to support it properly. But perhaps also the issue is that those MDs are OpenCL specific anyways, but the IR itself is supposed to be source language agnostic.

Yes. Maybe I should just add to the docs about the MD in my patch? I think this patch should go in regardless of a possible heavier work related to the logical address spaces as the current MD is clearly broken.

yaxunl added a comment.Nov 8 2016, 7:29 AM

The uses for the OpenCL logical address spaces that I know of are:

  1. to differentiate local kernel arguments as their memory allocation is different (per WG if parallel WGs)
  2. alias analysis: as OpenCL address spaces are per definition disjoint (no overlap), it can be utilized to prove that pointers to different address spaces are not accessing the same areas, which helps e.g. vectorization and other optimizations that require code motion/parallelization
  3. clGetKernelArgInfo() implementation

The kernel arg MD is good enough for 1) and 3) and helps in 2). Actually having the address spaces in the pointers would be much better, but is harder to maintain.

I think 2) should be done through address space of IR since the MD is only available for kernel functions. For amdgpu targets this is not an issue since they preserve the OpenCL address spaces.

Yes. Maybe I should just add to the docs about the MD in my patch? I think this patch should go in regardless of a possible heavier work related to the logical address spaces as the current MD is clearly broken.

Agree.

Anastasia accepted this revision.Nov 8 2016, 9:49 AM
Anastasia edited edge metadata.

Your code comment seem to describe the issue quite well. Even though I am still inclined towards keeping the address spaces as long as possible and only converting into physical memory segments on the backend really. I believe there are a couple of flatten memory architectures that do that and it seems to work fine. But on the other hand it seems that there are other implementations that took the path of flattening directly in Clang and therefore are facing the issue you are solving now.

I will try to send some draft on OpenCL documentation in Clang hopefully the following weeks. Perhaps it will help us to converge to the common flow at some point.

Committed in r286819.