This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Recognize target address space in superset calculation
Needs RevisionPublic

Authored by jchlanda on Apr 25 2022, 5:47 AM.

Details

Summary

Use target's address space map to handle cases when both language and
target address spaces are provided. In such case, attempt language to
target translation and only then perform the calculation.
The main motivation is to be able to use language address spaces as
inputs for builtins, which are defined in terms of target address space
(as discussed here: https://reviews.llvm.org/D112718) and hence the
definition of builtins with generic address space pointers that would
allow any other address space pointers inputs (bar constant).

This patch attempts to find a happy medium between not recognising target
address spaces at all (current state) and allowing all uses of it, based on
the assumption that users must know better. What it does not to is to
provide a bidirectional translation mechanism, which I'm not sure could ever
be done, with the current address space implementation (use of 0, the value
of default, etc).

Based on OpenCL rules, this patch follows the conversion guidelines for
generic and constant address space pointers as described here:
https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_API.html#_memory_model
similarly it honours SYCL global, global device` and global host semantics.

Diff Detail

Event Timeline

jchlanda created this revision.Apr 25 2022, 5:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 5:47 AM
jchlanda requested review of this revision.Apr 25 2022, 5:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 25 2022, 5:47 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

@tra @Naghasan @t4c1 you might find it interesting, a follow up from the discussion here: https://reviews.llvm.org/D112718

tra added inline comments.Apr 25 2022, 12:11 PM
clang/include/clang/AST/Type.h
486

If A and B are both target AS, we fall through to the code which is dealing with language AS, which would not do us any good. If that's not expected to happen, we should have an assert to ensure it.

Next, I'm not particularly fond of IsSYCLOrOpenCL. Do we need it at all. If we do know that AS maps to OpenCL Constant or Generic, I would assume that those AS would follow the same semantics. Besides, will we ever see OpenCL language AS in non-OpenCL code?

Next, the function *is* OpenCL specific and would not work for CUDA or HIP. I think it needs to be generalized to provide language-specific AS mapping rules.

489

Is the check intended to tell if A is a target AS? If so, we do have isTargetAddressSpace() for that (and it uses '>= LangAS::FirstTargetAddressSpace', which suggests that > may be incorrect, too).

498–499

getLangASFromTargetAS((*ASMap)[static_cast<unsigned>(LangAS::opencl_constant)])

500

if (!IsBTargetAS) would be more directly related to what we're doing here.

508

Is the code above intended to ensure that both A and B are target AS at this point?
If so, then it could be simplified to something like this:

if (ASMap) {
  if (!isTargetAddressSpace(A))
    A = getLangASFromTargetAS((*ASMap)[static_cast<unsigned>(A)]);
  if (!isTargetAddressSpace(B))
    B = getLangASFromTargetAS((*ASMap)[static_cast<unsigned>(B)]);

  Generic = getLangASFromTargetAS((*ASMap)[static_cast<unsigned>(LangAS::opencl_generic)])
  Constant = getLangASFromTargetAS((*ASMap)[static_cast<unsigned>(LangAS::opencl_constant)]);

  // ... proceed inferring whether A is superset of B in target AS.
  return;
}
assert (isTargetAddressSpace(A) && isTargetAddressSpace(B));
clang/lib/Sema/SemaExpr.cpp
9204

Should you pass IsSYCLOrOpenCL to it too? The way isAddressSpaceSupersetOf is implemented now it may give you a different result without it.

Also, it may make sense to plumb ASMap into lhq.isAddressSpaceSupersetOf, too, and just use the old code + couple of new arguments.

9219

Do we need to pass ASMap and IsSYCLOrOpenCL here, too?

You should be able to provide an address space of the pointer using the number, see details in:
https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/Builtins.def#L65

However if language address spaces are needed I wonder if the best approach is just to extend
the Builtin definitions with language address spaces similar to other qualifiers?

This patch attempts to find a happy medium between not recognising target
address spaces at all (current state) and allowing all uses of it, based on
the assumption that users must know better. What it does not to is to
provide a bidirectional translation mechanism, which I'm not sure could ever
be done, with the current address space implementation (use of 0, the value
of default, etc).

Can you provide an example of where it could be useful? Note that I feel that
such functionality could be implemented on top of full implementation of
target specific address space proposed in https://reviews.llvm.org/D62574.

Can you provide an example of where it could be useful? Note that I feel that
such functionality could be implemented on top of full implementation of
target specific address space proposed in https://reviews.llvm.org/D62574.

The use case we had was when calling target builtin (that specifies address
space) from within OpenCL C. Currently this errors out, similarly, the explicit
type cast to address space yields an error
((__attribute__((address_space(3))) int *)&l_woof in the example below). This
is important for libclc, which is implemented in OpenCL C and deals directly
with target builtins.

__kernel void woof() {
  __local int l_woof;
  __nvvm_cp_async_mbarrier_arrive_shared(&l_woof);
}

I wasn't aware of that patch, sorry, I've not had a close look yet, but it
seems worryingly dated.

Can you provide an example of where it could be useful? Note that I feel that
such functionality could be implemented on top of full implementation of
target specific address space proposed in https://reviews.llvm.org/D62574.

The use case we had was when calling target builtin (that specifies address
space) from within OpenCL C. Currently this errors out, similarly, the explicit
type cast to address space yields an error
((__attribute__((address_space(3))) int *)&l_woof in the example below). This
is important for libclc, which is implemented in OpenCL C and deals directly
with target builtins.

__kernel void woof() {
  __local int l_woof;
  __nvvm_cp_async_mbarrier_arrive_shared(&l_woof);
}

I wasn't aware of that patch, sorry, I've not had a close look yet, but it
seems worryingly dated.

Ok, I think the current behavior of builtins is to work with any address space. The way it had worked so far is since the builtins are only intended to be used in toolchains (instead of arbitrary code), the toolchain developers were responsible for making sure that the address spaces are used adequately in those builtins. However the question of extending the clang builtin functions with the notion of language address spaces has popped up before. And I think we could add this feature in a very light way for example by reserving the numbers from the clang LangAS enum to be used with the language address spaces in the prototypes of builtins. Although we could think of more elegant alternatives too. My understanding is there was never a strong enough case to add this functionality.

Just to understand further - do you need the builtins to have a specific address space prototype? And do you need this to improve error handling in libclc code base?

I imagine you could also create some sort of the wrapper functions around the building with the right address spaces, i.e. something like

void __libclc_builtin1(local int* p){
   __builtin1(p);
}

So the prototype in Clang for __builtin1 would still be permissive wrt address space of the pointer but as you only use __libclc_builtin1 in the codebase, you can ensure the correct uses. While this is how this problem has been worked around I think extending Clang builtins definitions might be inevitable to avoid forcing toolchains to create wrapper functions. However if we are aiming for this goal, I think more targeted solutions would make more sense instead of solving this problem indirectly by allowing conversions between target and language address spaces as this for example won’t work for builtins that are shared between targets.

@Anastasia @tra apologies for a late reply, I'm catching up with the thread after holidays.

And I think we could add this feature in a very light way for example by reserving the numbers from the clang LangAS enum to be used with the language address spaces in the prototypes of builtins.

I'm not sure I understand how that would look, could you please elaborate?

Just to understand further - do you need the builtins to have a specific address space prototype? And do you need this to improve error handling in libclc code base?

With the wrappers suggested we could declare all the pointers to be in generic AS and that would get us around the target vs language AS problem. I don't think that would improve the situation, as from llvm perspective/use case all those builtins would be incorrect and there would be no way for users to tell that there is a specific AS requirement on them, nor would the compiler be able to warn/error. Then the only thing making it work would be those wrappers, embedded deeply in the source of libclc, which at the moment is not even shipped with upstream llvm.
The builtins in question are not exclusively OpenCL/SYCL related, say TARGET_BUILTIN(__nvvm_cp_async_ca_shared_global_16, "vv*3vC*1", "", AND(SM_80,PTX70)) would need to take both pointer in a generic address space here. It feels like explicitly providing AS in the prototype is needed.

this for example won’t work for builtins that are shared between targets.

While I agree in principle, I'm not sure if there are any target agnostic and AS specific builtins, sounds like a dangerous thing to introduce. And in any case, it's the target that provides the AS map.

jchlanda added inline comments.May 6 2022, 5:05 AM
clang/include/clang/AST/Type.h
486

I would only like to handle the mixed AS case, it feels like trying to walk back from both HW AS and potentially do the logic of global and constant would be against the intention of users. Asserting on only one HW AS could backfire, as I think it should be allowed to assign between different HW AS.

The reason I added IsSYCLOrOpenCL is because this code is also exercised by checkPointerTypesForAssignment which is not OpenCL specific, so I had to have a way of conditionally enabling the conversion to generic AS.

I agree, it is too restrictive now, especially that the AS map provides values for SYCL, OpenCL and CUDA, so perhaps I should extend IsSYCLOrOpenCL to be an enum specifying which language the function deals with and act accordingly?

489

Yeap, will update to isTargetAddressSpace.

498–499

OK.

500

You are right, will update.

508

Yes at the end of AS map accesses all address spaces have to be expressed in therms of HW values, but I want it to happen only in the case of mixed AS (Language and HW). I will add assert and use helpers, like you suggested in the snippet, but would like to keep the ^ condition.

clang/lib/Sema/SemaExpr.cpp
9204

Yes, will add.

I thought it would introduce much bigger diff, and there was already a handy static version of it, don't mind modifying the member if you'd prefer.

9219

I feel like it would be covered by the first if in this block.

tra added inline comments.May 6 2022, 11:11 AM
clang/include/clang/AST/Type.h
508

would like to keep the ^ condition.

OK. Adding a comment explaining what's going on would be helpful here.

jchlanda updated this revision to Diff 429956.May 17 2022, 1:24 AM
jchlanda edited the summary of this revision. (Show Details)
jchlanda added a reviewer: Anastasia.

Use helper functions when handling address space values.

jchlanda added inline comments.May 17 2022, 1:29 AM
clang/include/clang/AST/Type.h
486

Next, I'm not particularly fond of IsSYCLOrOpenCL. Do we need it at all. If we do know that AS maps to OpenCL Constant or Generic, I would assume that those AS would follow the same semantics. Besides, will we ever see OpenCL language AS in non-OpenCL code?

Next, the function *is* OpenCL specific and would not work for CUDA or HIP. I think it needs to be generalized to provide language-specific AS mapping rules.

I've changed that bool flag to be an enum specifying OpenCL/SYCL/None. The rational here is that the handling of AS values differs slightly (SYCL introduces globa device and global host). It would appear that CUDA follows completely different code-path and by the time isAddressSpaceSuperset called all of the language AS values are stripped and set to 0, which is why (along the fact that we don't have a valid use case for CUDA) I left it out, and only return true for an exact match.

tra added a comment.May 17 2022, 11:14 AM

My concerns have been addressed. I'll defer the final LGTM to @Anastasia.

And I think we could add this feature in a very light way for example by reserving the numbers from the clang LangAS enum to be used with the language address spaces in the prototypes of builtins.

I'm not sure I understand how that would look, could you please elaborate?

We could reserve the IDs from LangAS enums to be used in https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/Builtins.def#L65, but we could also extend the syntax more naturally i.e. numbers could be used for target address spaces and we could add some letter based syntax for language address spaces.

Just to understand further - do you need the builtins to have a specific address space prototype? And do you need this to improve error handling in libclc code base?

With the wrappers suggested we could declare all the pointers to be in generic AS and that would get us around the target vs language AS problem. I don't think that would improve the situation, as from llvm perspective/use case all those builtins would be incorrect and there would be no way for users to tell that there is a specific AS requirement on them, nor would the compiler be able to warn/error. Then the only thing making it work would be those wrappers, embedded deeply in the source of libclc, which at the moment is not even shipped with upstream llvm.
The builtins in question are not exclusively OpenCL/SYCL related, say TARGET_BUILTIN(__nvvm_cp_async_ca_shared_global_16, "vv*3vC*1", "", AND(SM_80,PTX70)) would need to take both pointer in a generic address space here. It feels like explicitly providing AS in the prototype is needed.

My understanding was that Clang low level builtins are not desirable for uses in the application code directly. They are more targeted at tooling developers and low level libraries uses. Which is why this problem has been worked around by declaring the wrapper overload with correct address spaces in the tooling projects.

My understanding was that you need those builtins in the libclc code?

this for example won’t work for builtins that are shared between targets.

While I agree in principle, I'm not sure if there are any target agnostic and AS specific builtins, sounds like a dangerous thing to introduce. And in any case, it's the target that provides the AS map.

In OpenCL we actually have quite some target agnostic builtins.

Anastasia requested changes to this revision.May 18 2022, 4:09 PM

I feel that to progress further on this change, it would be good to get details about the use cases and the limitations first.

However, if there is sufficient evidence of the need to extend clang builtins with language address spaces I am not convinced the approach here is suitable as:

  1. It doesn't allow uses of language builtins with language address spaces generically as mapping to the target address spaces is not portable. In general there can also be generic buitins that are normally mapped to native LLVM intrinsics (not target intrinsics!) shared among multiple targets that might also benefit from having this implemented in a target agnostic way. Such intrinsics could be shares for example between PTX and SPIR-V targets as there is quite some overlap in the functionality.
  2. It has much wider impact on the language semantics then just allowing language address spaces being used in builtins i.e. it results in implicit conversions more broadly. This might not be desirable evolution and we might need to reach some consensus with more languages or targets using the address spaces in order to proceed with such change. In fact current title and description doesn't adequately reflect the impact of the change.

Has extending the builtin definition syntax been considered for this problem? That seems like a more natural and fairly localized change. For example the syntax in https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Basic/Builtins.def#L65 could be changed to:

// * -> pointer (optionally followed by an address space number for target address space
//               or by 00 and a number for language address space as it is set in LangAS, if no
//               address space is specified than any address space will be accepted)

Note that we will likely need to set LangAS entry values explicitly which would make maintaing the enum slightly more painful but it doesn't seem like a concern.

If the only use case we have right now is ability to specify generic address space in kernel-like langauges we could also just reserve a special 00 number in address space field of the buitin prototype description for generic address space and leave full support as a future work.

If we understand the use case better we might be able to look at other alternatives too...

This revision now requires changes to proceed.May 18 2022, 4:09 PM