- HIP/CUDA host side needs to use device kernel symbol name to match the device side binaries. Without a consistent naming between host- and device-side compilations, it's risky that wrong device binaries are executed. Consistent naming is usually not an issue until unnamed types are used, especially the lambda. In this patch, the consistent name mangling is addressed for the extended lambdas, i.e. the lambdas annotated with __device__.
- In [Itanium C++ ABI][1], the mangling of the lambda is generally unspecified unless, in certain cases, ODR rule is required to ensure consisent naming cross TUs. The extended lambda is such a case as its name may be part of a device kernel function, e.g., the extended lambda is used as a template argument and etc. Thus, we need to force ODR for extended lambdas as they are referenced in both device- and host-side TUs. Furthermore, if a extended lambda is nested in other (extended or not) lambdas, those lambdas are required to follow ODR naming as well. This patch revises the current lambda mangle numbering to force ODR from an extended lambda to all its parent lambdas.
- On the other side, the aforementioned ODR naming should not change those lambdas' original linkages, i.e., we cannot replace the original internal with linkonce_odr; otherwise, we may violate ODR in general. This patch introduces a new field HasKnownInternalLinkage in lambda data to decouple the current linkage calculation based on mangling number assigned.
Details
- Reviewers
tra rsmith yaxunl martong shafik rjmccall - Commits
- rG5392566a34f6: CP 243ebfba17da72566ba29a891193e4814cbc4ef3 and revert unnecessary changes.
rG243ebfba17da: [hip][cuda] Fix the extended lambda name mangling issue.
rL375309: [hip][cuda] Fix the extended lambda name mangling issue.
rC375309: [hip][cuda] Fix the extended lambda name mangling issue.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
this's a patch address the same issue previously proposed to be worked around in https://reviews.llvm.org/D63164
minor comment to help review.
BTW, as the AST ser/deser is changed as well, not sure we have compatibility issue and, if there is, how to handle that. please advice me on that concern. Thanks.
clang/include/clang/AST/DeclCXX.h | ||
---|---|---|
390–394 | Try to avoid inflating memory footprint by borrowing one bit from NumExplicitCaptures. It seems to me that we won't have that large number (8192) of *explicit* capture. | |
clang/lib/Sema/SemaLambda.cpp | ||
432 | extracted from startLambdaDefinition so that we could have a single place to handle mangling numbers based on CUDA attributes. |
@rsmith Richard, could you take a look, please? Lambdas, mangling, ODR rules & ABI scare me. :-)
@tra thanks for promoting the review. This patch is quite critical to support extended lambda in clang. We have several workloads that have this mangle numbering issue.
Broadly, I think it's reasonable to number additional lambda expressions in CUDA compilations. However:
- This is (in theory) an ABI break on the host side, as it changes the lambda numbering in inline functions and function templates and the like. That could be mitigated by using a different numbering sequence for the lambdas that are only numbered for this reason.
- Depending on whether the call operator is a device function is unstable. If I understand the CUDA rules correctly, then in practice, because constexpr functions are implicitly host device, all lambdas will get numbered in CUDA on C++14 onwards but not in CUDA on C++11, and we generally want those modes to be ABI-compatible. I'd suggest you simplify and stabilize this by simply numbering all lambdas in CUDA mode.
clang/lib/Sema/SemaLambda.cpp | ||
---|---|---|
477 | What happens if there are other enclosing constructs that would give the lambda internal linkage (eg, an anonymous namespace or static function that might collide with one in another translation unit, or a declaration involving a type with internal linkage)? Presumably you can still suffer from mangling collisions in those cases, at least if you link together multiple translation units containing device code. Do we need (something like) unique identifiers for device code TUs to use in manglings of ostensibly internal linkage entities? |
I vote for the numbering of all lambdas in the CUDA mode once addressed in (D63164 as long as we decouple the numbering from the linkage. As long as all lambdas are numbered, the first one is also mitigated. I will prepare the changes forcing all lambdas to be numbered under CUDA/HIP mode.
clang/lib/Sema/SemaLambda.cpp | ||
---|---|---|
477 | Those unnamed ones will be addressed in late patches as, currently, we only have lambda numbering issues from real workloads. I am preparing changes to address other unnamed types, namespaces, and etc. |
I agree with Richard's suggestion of just numbering all the lambdas in both modes if that's viable.
clang/include/clang/AST/DeclCXX.h | ||
---|---|---|
390–394 | If this field is necessary, please borrow its space from ManglingNumber below rather than the number of explicit captures. |
clang/include/clang/AST/DeclCXX.h | ||
---|---|---|
1713 | This function name should mention that it's only applicable to lambdas. |
Try to avoid inflating memory footprint by borrowing one bit from NumExplicitCaptures. It seems to me that we won't have that large number (8192) of *explicit* capture.