This is an archive of the discontinued LLVM Phabricator instance.

[hip][cuda] Fix the extended lambda name mangling issue.
ClosedPublic

Authored by hliao on Oct 10 2019, 10:21 AM.

Details

Summary
  • HIP/CUDA host side needs to use device kernel symbol name to match the device side binaries. Without a consistent naming between host- and device-side compilations, it's risky that wrong device binaries are executed. Consistent naming is usually not an issue until unnamed types are used, especially the lambda. In this patch, the consistent name mangling is addressed for the extended lambdas, i.e. the lambdas annotated with __device__.
  • In [Itanium C++ ABI][1], the mangling of the lambda is generally unspecified unless, in certain cases, ODR rule is required to ensure consisent naming cross TUs. The extended lambda is such a case as its name may be part of a device kernel function, e.g., the extended lambda is used as a template argument and etc. Thus, we need to force ODR for extended lambdas as they are referenced in both device- and host-side TUs. Furthermore, if a extended lambda is nested in other (extended or not) lambdas, those lambdas are required to follow ODR naming as well. This patch revises the current lambda mangle numbering to force ODR from an extended lambda to all its parent lambdas.
  • On the other side, the aforementioned ODR naming should not change those lambdas' original linkages, i.e., we cannot replace the original internal with linkonce_odr; otherwise, we may violate ODR in general. This patch introduces a new field HasKnownInternalLinkage in lambda data to decouple the current linkage calculation based on mangling number assigned.

[1]: https://itanium-cxx-abi.github.io/cxx-abi/abi.html

Diff Detail

Event Timeline

hliao created this revision.Oct 10 2019, 10:21 AM
Herald added a reviewer: shafik. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: cfe-commits. · View Herald Transcript

this's a patch address the same issue previously proposed to be worked around in https://reviews.llvm.org/D63164

hliao marked 2 inline comments as done.Oct 10 2019, 10:31 AM

minor comment to help review.

BTW, as the AST ser/deser is changed as well, not sure we have compatibility issue and, if there is, how to handle that. please advice me on that concern. Thanks.

clang/include/clang/AST/DeclCXX.h
390–394

Try to avoid inflating memory footprint by borrowing one bit from NumExplicitCaptures. It seems to me that we won't have that large number (8192) of *explicit* capture.

clang/lib/Sema/SemaLambda.cpp
432

extracted from startLambdaDefinition so that we could have a single place to handle mangling numbers based on CUDA attributes.

hliao added a comment.Oct 11 2019, 6:14 AM

PING for review, thanks

hliao added a comment.Oct 15 2019, 5:56 AM

PING for review

tra added a comment.Oct 15 2019, 8:58 AM

@rsmith Richard, could you take a look, please? Lambdas, mangling, ODR rules & ABI scare me. :-)

hliao added a comment.Oct 15 2019, 9:10 AM
In D68818#1709688, @tra wrote:

@rsmith Richard, could you take a look, please? Lambdas, mangling, ODR rules & ABI scare me. :-)

@tra thanks for promoting the review. This patch is quite critical to support extended lambda in clang. We have several workloads that have this mangle numbering issue.

hliao added a comment.Oct 16 2019, 6:01 AM

PING for review

Broadly, I think it's reasonable to number additional lambda expressions in CUDA compilations. However:

  • This is (in theory) an ABI break on the host side, as it changes the lambda numbering in inline functions and function templates and the like. That could be mitigated by using a different numbering sequence for the lambdas that are only numbered for this reason.
  • Depending on whether the call operator is a device function is unstable. If I understand the CUDA rules correctly, then in practice, because constexpr functions are implicitly host device, all lambdas will get numbered in CUDA on C++14 onwards but not in CUDA on C++11, and we generally want those modes to be ABI-compatible. I'd suggest you simplify and stabilize this by simply numbering all lambdas in CUDA mode.
clang/lib/Sema/SemaLambda.cpp
477

What happens if there are other enclosing constructs that would give the lambda internal linkage (eg, an anonymous namespace or static function that might collide with one in another translation unit, or a declaration involving a type with internal linkage)? Presumably you can still suffer from mangling collisions in those cases, at least if you link together multiple translation units containing device code. Do we need (something like) unique identifiers for device code TUs to use in manglings of ostensibly internal linkage entities?

hliao marked an inline comment as done.Oct 16 2019, 1:48 PM

Broadly, I think it's reasonable to number additional lambda expressions in CUDA compilations. However:

  • This is (in theory) an ABI break on the host side, as it changes the lambda numbering in inline functions and function templates and the like. That could be mitigated by using a different numbering sequence for the lambdas that are only numbered for this reason.
  • Depending on whether the call operator is a device function is unstable. If I understand the CUDA rules correctly, then in practice, because constexpr functions are implicitly host device, all lambdas will get numbered in CUDA on C++14 onwards but not in CUDA on C++11, and we generally want those modes to be ABI-compatible. I'd suggest you simplify and stabilize this by simply numbering all lambdas in CUDA mode.

I vote for the numbering of all lambdas in the CUDA mode once addressed in (D63164 as long as we decouple the numbering from the linkage. As long as all lambdas are numbered, the first one is also mitigated. I will prepare the changes forcing all lambdas to be numbered under CUDA/HIP mode.

clang/lib/Sema/SemaLambda.cpp
477

Those unnamed ones will be addressed in late patches as, currently, we only have lambda numbering issues from real workloads. I am preparing changes to address other unnamed types, namespaces, and etc.

I agree with Richard's suggestion of just numbering all the lambdas in both modes if that's viable.

clang/include/clang/AST/DeclCXX.h
390–394

If this field is necessary, please borrow its space from ManglingNumber below rather than the number of explicit captures.

hliao updated this revision to Diff 225452.Oct 17 2019, 9:43 AM

Force numbering on all lambdas in CUDA/HIP.

hliao marked an inline comment as done.Oct 17 2019, 9:44 AM
hliao added a comment.Oct 18 2019, 6:20 AM

PING for review

hliao updated this revision to Diff 225678.Oct 18 2019, 12:34 PM

minor test case revising.

rsmith accepted this revision.Oct 18 2019, 12:54 PM
rsmith added inline comments.
clang/include/clang/AST/DeclCXX.h
1713

This function name should mention that it's only applicable to lambdas.

This revision is now accepted and ready to land.Oct 18 2019, 12:54 PM
This revision was automatically updated to reflect the committed changes.