Page MenuHomePhabricator

[SYCL] Implement __builtin_unique_stable_name.
ClosedPublic

Authored by erichkeane on Mar 23 2020, 8:50 AM.

Details

Summary

In order to support non-user-named kernels, SYCL needs some way in the
integration headers to name the kernel object themselves. Initially, the
design considered just RTTI naming of the lambdas, this results in a
quite unstable situation in light of some device/host macros.
Additionally, this ends up needing to use RTTI, which is a burden on the
implementation and typically unsupported.

Instead, we've introduced a builtin, __builtin_unique_stable_name, which
takes a type or expression, and results in a constexpr constant
character array that uniquely represents the type (or type of the
expression) being passed to it.

The implementation accomplishes that simply by using a slightly modified
version of the Itanium Mangling. The one exception is when mangling
lambdas, instead of appending the index of the lambda in the function,
it appends the macro-expansion back-trace of the lambda itself in the
form LINE->COL[~LINE->COL...].

Diff Detail

Event Timeline

erichkeane created this revision.Mar 23 2020, 8:50 AM
bader accepted this revision.Mar 24 2020, 7:54 AM

LGTM. Thanks!

This revision is now accepted and ready to land.Mar 24 2020, 7:54 AM

LGTM. Thanks!

Thanks! I'll give this 24 hrs for other comments.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2020, 7:32 AM

Richard and I just found out about this commit, and we've decided to revert it. I apologize for the very late reversion, but the reality of Clang development is that it's very difficult for the code owners to watch literally every commit, and we rely on the community to follow policies and bring things to our attention when there's a question about the right thing to do. This is a new feature that was submitted without properly following any of the guidelines for new features: it was neither approved by a relevant standards body, added for compatibility with an existing frontend, or taken through the RFC process. If you had brought it to our attention as an RFC, we would've expressed serious concerns about the design.

Please start an RFC thread and CC Richard and me.

Richard and I just found out about this commit, and we've decided to revert it. I apologize for the very late reversion, but the reality of Clang development is that it's very difficult for the code owners to watch literally every commit, and we rely on the community to follow policies and bring things to our attention when there's a question about the right thing to do. This is a new feature that was submitted without properly following any of the guidelines for new features: it was neither approved by a relevant standards body, added for compatibility with an existing frontend, or taken through the RFC process. If you had brought it to our attention as an RFC, we would've expressed serious concerns about the design.

Please start an RFC thread and CC Richard and me.

The feature that this supports is a part of the SYCL 2020 Provisional Spec, I thought that was sufficient. We'll look into an RFC to re-submit in the future.

Thank you. If it were SYCL-namespaced and -restricted, that would be fine, but as is it's positioned as a generic language feature. (I think we might still have had implementation/design feedback, though.)

keryell added a subscriber: keryell.

Enabling this feature only with SYCL seems like an easy and quick mitigation, as SYCL compilers downstream can easily update their runtime to the new builtin name.
Otherwise, just removing a feature used for almost 6 months will cause some kind of forking pain...
Anyway, improving the feature and implementation at the same time with an RFC for a wider usage looks like a great idea.

rsmith added a subscriber: rsmith.Mon, Oct 12, 7:28 PM

The feature that this supports is a part of the SYCL 2020 Provisional Spec, I thought that was sufficient. We'll look into an RFC to re-submit in the future.

Does that cover only functionality that can be implemented in terms of this builtin, or the builtin itself? If so, what is the functionality that needs to be supported? That information will be a good starting point for the RFC.

Otherwise, just removing a feature used for almost 6 months will cause some kind of forking pain...

Well, there have at least not been any LLVM releases with this feature yet. (We're on the cusp of releasing LLVM 11, which would be the first release with this functionality.)

The feature that this supports is a part of the SYCL 2020 Provisional Spec, I thought that was sufficient. We'll look into an RFC to re-submit in the future.

Does that cover only functionality that can be implemented in terms of this builtin, or the builtin itself? If so, what is the functionality that needs to be supported? That information will be a good starting point for the RFC.

Functionality that is implemented in terms of this builtin. Basically: When compiling a SYCL program, we do so in 2 invocations, once for the Host and once for the Device. The 'sycl_kernel' is the interface function that both invocations of the compiler need to know the name of. For the most part, using the the mangled name of the functor would work, except when using lambdas, where the name is not sufficiently stable, and frequently changes across invocations (even further so when there is macros that introduce lambdas above/below it).

The earlier SYCL spec gets around this by requiring the user to specify a unique-name for the kernel name themselves, in a way that is otherwise unused, something like:
kernel_call<class ANameIGuaranteeIsUnique>([](){});

The 2020 spec supports removing the specific name (so that kernel_call([](){}) works). In order to do that, we need some interface that provides a name based on the functor that is both unique and stable. As I mentioned before, lambdas proved difficult, so we added a builtin to produce a string version of the name (that the host runtime can then use to lookup the kernel later), and the implementation of it in SEMA is used to generate the kernel's name.

Otherwise, just removing a feature used for almost 6 months will cause some kind of forking pain...

Well, there have at least not been any LLVM releases with this feature yet. (We're on the cusp of releasing LLVM 11, which would be the first release with this functionality.)

In fairness, all of the downstreams that I'm aware for SYCL of are working off of trunk. For my two downstreams, this is only a mild inconvenience.

CUDA/HIP are facing similar issues, i.e. consistency of name mangling of kernels between host/device compilation of the same TU. I hope this feature to be implemented in a generic way so that it may be reusable for other offloading languages.

CUDA/HIP are facing similar issues, i.e. consistency of name mangling of kernels between host/device compilation of the same TU. I hope this feature to be implemented in a generic way so that it may be reusable for other offloading languages.

This implementation SHOULD work for that I'd expect, but we're currently re-visiting it in order to get the names to be demanglable (at least on Linux, and will likely delay an RFC). We've used the Itanium Mangler plus some modification to the lambda mangling to make it depend on source-lines instead of arbitrary ordering.

You know on both sides that a lambda is used as a kernel, yes? Why not simply introduce that into the mangling of lambdas, so that the subset of lambdas used as kernels form a stable sequence?

You know on both sides that a lambda is used as a kernel, yes? Why not simply introduce that into the mangling of lambdas, so that the subset of lambdas used as kernels form a stable sequence?

Coming up with said stable sequence is somewhat difficult as well. A strict integer ordering didn't end up being stable, as some of the kernel handling can cause instantiations to happen in a slightly different ordering, which messed that up, as would use of the builtin. We wanted to find a way that was dependent on the source code alone. The "Quick and Dirty" solution was line/column numbers, though we considered a hash of that same information to at least make it shorter.

You know on both sides that a lambda is used as a kernel, yes? Why not simply introduce that into the mangling of lambdas, so that the subset of lambdas used as kernels form a stable sequence?

Coming up with said stable sequence is somewhat difficult as well. A strict integer ordering didn't end up being stable, as some of the kernel handling can cause instantiations to happen in a slightly different ordering, which messed that up, as would use of the builtin. We wanted to find a way that was dependent on the source code alone. The "Quick and Dirty" solution was line/column numbers, though we considered a hash of that same information to at least make it shorter.

Certainly you wouldn't want a *global* sequence ID. However, lambdas can't just occur globally, they're always part of some declaration that they can be scoped to, so that you have e.g. "the third kernel lambda within function X". I fail to see how that wouldn't address the concern about instantiation order, and it's still source-directed.

You know on both sides that a lambda is used as a kernel, yes? Why not simply introduce that into the mangling of lambdas, so that the subset of lambdas used as kernels form a stable sequence?

Coming up with said stable sequence is somewhat difficult as well. A strict integer ordering didn't end up being stable, as some of the kernel handling can cause instantiations to happen in a slightly different ordering, which messed that up, as would use of the builtin. We wanted to find a way that was dependent on the source code alone. The "Quick and Dirty" solution was line/column numbers, though we considered a hash of that same information to at least make it shorter.

Certainly you wouldn't want a *global* sequence ID. However, lambdas can't just occur globally, they're always part of some declaration that they can be scoped to, so that you have e.g. "the third kernel lambda within function X". I fail to see how that wouldn't address the concern about instantiation order, and it's still source-directed.

Hmm... I'll have to consider that. That is an interesting thought. Presumably we could just copy the 'getLambdaManglingNumber' stuff in that case, and place the value in the same location in the mangling, with some sort of discriminator (to avoid conflicting manglings). The only other concern I have is WHEN we know that a lambda is used in a kernel, which we don't until it is called (and can cause confusion when it is a template parameter and called later).

You know on both sides that a lambda is used as a kernel, yes? Why not simply introduce that into the mangling of lambdas, so that the subset of lambdas used as kernels form a stable sequence?

Coming up with said stable sequence is somewhat difficult as well. A strict integer ordering didn't end up being stable, as some of the kernel handling can cause instantiations to happen in a slightly different ordering, which messed that up, as would use of the builtin. We wanted to find a way that was dependent on the source code alone. The "Quick and Dirty" solution was line/column numbers, though we considered a hash of that same information to at least make it shorter.

Certainly you wouldn't want a *global* sequence ID. However, lambdas can't just occur globally, they're always part of some declaration that they can be scoped to, so that you have e.g. "the third kernel lambda within function X". I fail to see how that wouldn't address the concern about instantiation order, and it's still source-directed.

Hmm... I'll have to consider that. That is an interesting thought. Presumably we could just copy the 'getLambdaManglingNumber' stuff in that case, and place the value in the same location in the mangling, with some sort of discriminator (to avoid conflicting manglings).

The Itanium mangling already produces a different lambda mangling number for different lambda signatures. You just need the kernel-ness to be part of the signature.

The only other concern I have is WHEN we know that a lambda is used in a kernel, which we don't until it is called (and can cause confusion when it is a template parameter and called later).

Oh, yes, if you don't know that a lambda is used as a kernel locally then this falls apart a bit. You could of course pretend for ABI purposes that there's a new intermediate lambda at the kernel use point when the lambda is not local to the current function. I don't think there's anything which relies on mangling lambdas before the function they're contained within is fully type-checked.

You know on both sides that a lambda is used as a kernel, yes? Why not simply introduce that into the mangling of lambdas, so that the subset of lambdas used as kernels form a stable sequence?

Coming up with said stable sequence is somewhat difficult as well. A strict integer ordering didn't end up being stable, as some of the kernel handling can cause instantiations to happen in a slightly different ordering, which messed that up, as would use of the builtin. We wanted to find a way that was dependent on the source code alone. The "Quick and Dirty" solution was line/column numbers, though we considered a hash of that same information to at least make it shorter.

Certainly you wouldn't want a *global* sequence ID. However, lambdas can't just occur globally, they're always part of some declaration that they can be scoped to, so that you have e.g. "the third kernel lambda within function X". I fail to see how that wouldn't address the concern about instantiation order, and it's still source-directed.

Hmm... I'll have to consider that. That is an interesting thought. Presumably we could just copy the 'getLambdaManglingNumber' stuff in that case, and place the value in the same location in the mangling, with some sort of discriminator (to avoid conflicting manglings).

The Itanium mangling already produces a different lambda mangling number for different lambda signatures. You just need the kernel-ness to be part of the signature.

It does it via the getLambdaManglingNumber I believe, right? The idea would be to have a getKernelLambdaManglingNumber that does something similar, and needs something to avoid conflicting, which is the discriminator I was mentioning.

The only other concern I have is WHEN we know that a lambda is used in a kernel, which we don't until it is called (and can cause confusion when it is a template parameter and called later).

Oh, yes, if you don't know that a lambda is used as a kernel locally then this falls apart a bit. You could of course pretend for ABI purposes that there's a new intermediate lambda at the kernel use point when the lambda is not local to the current function. I don't think there's anything which relies on mangling lambdas before the function they're contained within is fully type-checked.

One thing that might save us from this, is we aren't actually modifying the LAMBDAs mangling, we're modifying the KERNEL's mangling. The lambda itself keeps its mangling, but the kernel that calls it (which is the device/host boundary) is what has its name changed. So at least at that point we know some about it. And, actually, since we are generating the 'integeration header' lookup table AND the kernel in identical (if not the same) invocation, perhaps a global counter WOULD work... My coworker is investigating this now, so hopefully she'll be able to prove out this idea and give further feedback here.

Thank you so much for the discussion!

You know on both sides that a lambda is used as a kernel, yes? Why not simply introduce that into the mangling of lambdas, so that the subset of lambdas used as kernels form a stable sequence?

Coming up with said stable sequence is somewhat difficult as well. A strict integer ordering didn't end up being stable, as some of the kernel handling can cause instantiations to happen in a slightly different ordering, which messed that up, as would use of the builtin. We wanted to find a way that was dependent on the source code alone. The "Quick and Dirty" solution was line/column numbers, though we considered a hash of that same information to at least make it shorter.

Certainly you wouldn't want a *global* sequence ID. However, lambdas can't just occur globally, they're always part of some declaration that they can be scoped to, so that you have e.g. "the third kernel lambda within function X". I fail to see how that wouldn't address the concern about instantiation order, and it's still source-directed.

Hmm... I'll have to consider that. That is an interesting thought. Presumably we could just copy the 'getLambdaManglingNumber' stuff in that case, and place the value in the same location in the mangling, with some sort of discriminator (to avoid conflicting manglings).

The Itanium mangling already produces a different lambda mangling number for different lambda signatures. You just need the kernel-ness to be part of the signature.

It does it via the getLambdaManglingNumber I believe, right? The idea would be to have a getKernelLambdaManglingNumber that does something similar, and needs something to avoid conflicting, which is the discriminator I was mentioning.

Right, so if you look how that's ultimately derived in ItaniumNumberingContext::getManglingNumber, you'll see that every context has different sequences for unique lambda-sigs. If you can just put "this is a kernel" into the lambda-sig, you'll automatically get a stable sequence for kernel lambdas. But that would rely on immediately knowing that a lambda will be used as a kernel.

The only other concern I have is WHEN we know that a lambda is used in a kernel, which we don't until it is called (and can cause confusion when it is a template parameter and called later).

Oh, yes, if you don't know that a lambda is used as a kernel locally then this falls apart a bit. You could of course pretend for ABI purposes that there's a new intermediate lambda at the kernel use point when the lambda is not local to the current function. I don't think there's anything which relies on mangling lambdas before the function they're contained within is fully type-checked.

One thing that might save us from this, is we aren't actually modifying the LAMBDAs mangling, we're modifying the KERNEL's mangling. The lambda itself keeps its mangling, but the kernel that calls it (which is the device/host boundary) is what has its name changed. So at least at that point we know some about it. And, actually, since we are generating the 'integeration header' lookup table AND the kernel in identical (if not the same) invocation, perhaps a global counter WOULD work... My coworker is investigating this now, so hopefully she'll be able to prove out this idea and give further feedback here.

I feel obliged to point out that, if you're worried about a single function potentially having a different set of lambdas in different translation modes, that's also going to affect things like template specialization identity. That is, suppose a kernel function is formed for a particular lambda, and in one translation mode it's the first lambda with that signature in a function and in another mode it's the second. That's going to change the immediate mangling of the lambda, but it's also going to change the mangling of templates used with that lambda. For example, if you have

template <class F> void run_wrapped_as_kernel(const F &f) {
  use_as_kernel([=] { f(); });  // this lambda is always the first in its context, but the identity of that context is contingent on something unstable
}

void foo() {
#if mode1
  auto blah = []{}; // a lambda that's only present in one language mode
#endif
  use_as_kernel([] { ... }); // the mangling number for this is disturbed
}

So if you're going to try to solve this (instead of just making it the user's responsibility to not do), it's almost like you need some way to retroactively number every lambda implicated in any way by the host/kernel interface, and then have a way to mangle things completely from scratch using those stable numberings for this kernel-numbering thing.

You know on both sides that a lambda is used as a kernel, yes? Why not simply introduce that into the mangling of lambdas, so that the subset of lambdas used as kernels form a stable sequence?

Coming up with said stable sequence is somewhat difficult as well. A strict integer ordering didn't end up being stable, as some of the kernel handling can cause instantiations to happen in a slightly different ordering, which messed that up, as would use of the builtin. We wanted to find a way that was dependent on the source code alone. The "Quick and Dirty" solution was line/column numbers, though we considered a hash of that same information to at least make it shorter.

Certainly you wouldn't want a *global* sequence ID. However, lambdas can't just occur globally, they're always part of some declaration that they can be scoped to, so that you have e.g. "the third kernel lambda within function X". I fail to see how that wouldn't address the concern about instantiation order, and it's still source-directed.

Hmm... I'll have to consider that. That is an interesting thought. Presumably we could just copy the 'getLambdaManglingNumber' stuff in that case, and place the value in the same location in the mangling, with some sort of discriminator (to avoid conflicting manglings).

The Itanium mangling already produces a different lambda mangling number for different lambda signatures. You just need the kernel-ness to be part of the signature.

It does it via the getLambdaManglingNumber I believe, right? The idea would be to have a getKernelLambdaManglingNumber that does something similar, and needs something to avoid conflicting, which is the discriminator I was mentioning.

Right, so if you look how that's ultimately derived in ItaniumNumberingContext::getManglingNumber, you'll see that every context has different sequences for unique lambda-sigs. If you can just put "this is a kernel" into the lambda-sig, you'll automatically get a stable sequence for kernel lambdas. But that would rely on immediately knowing that a lambda will be used as a kernel.

The only other concern I have is WHEN we know that a lambda is used in a kernel, which we don't until it is called (and can cause confusion when it is a template parameter and called later).

Oh, yes, if you don't know that a lambda is used as a kernel locally then this falls apart a bit. You could of course pretend for ABI purposes that there's a new intermediate lambda at the kernel use point when the lambda is not local to the current function. I don't think there's anything which relies on mangling lambdas before the function they're contained within is fully type-checked.

One thing that might save us from this, is we aren't actually modifying the LAMBDAs mangling, we're modifying the KERNEL's mangling. The lambda itself keeps its mangling, but the kernel that calls it (which is the device/host boundary) is what has its name changed. So at least at that point we know some about it. And, actually, since we are generating the 'integeration header' lookup table AND the kernel in identical (if not the same) invocation, perhaps a global counter WOULD work... My coworker is investigating this now, so hopefully she'll be able to prove out this idea and give further feedback here.

I feel obliged to point out that, if you're worried about a single function potentially having a different set of lambdas in different translation modes, that's also going to affect things like template specialization identity. That is, suppose a kernel function is formed for a particular lambda, and in one translation mode it's the first lambda with that signature in a function and in another mode it's the second. That's going to change the immediate mangling of the lambda, but it's also going to change the mangling of templates used with that lambda. For example, if you have

We typically disallow the kernel itself having a different lambda in each mode, as that doesn't make much sense. The SYCL language is a little coy with the ODR when it comes to macros, but typically the kernel calls themselves have to be consistent in a single application.

template <class F> void run_wrapped_as_kernel(const F &f) {
  use_as_kernel([=] { f(); });  // this lambda is always the first in its context, but the identity of that context is contingent on something unstable
}

void foo() {
#if mode1
  auto blah = []{}; // a lambda that's only present in one language mode
#endif
  use_as_kernel([] { ... }); // the mangling number for this is disturbed
}

So if you're going to try to solve this (instead of just making it the user's responsibility to not do), it's almost like you need some way to retroactively number every lambda implicated in any way by the host/kernel interface, and then have a way to mangle things completely from scratch using those stable numberings for this kernel-numbering thing.

THAT right there is exactly the problem we ran into, the 'blah' changing the kernel numbering. Our previous 'retroactive way' of numbering was line&column numbers + the same for macro invocations as it was stable with that. Mangling them retroactively is perhaps a possibility, though it would be great if we could force a number based on textual order (complicated further by template instantiations).