- On Windows, extended lambda has extra issues due to the numbering schemes are different between the host compilation (Microsoft C++ ABI) and the device compilation (Itanium C++ ABI. Additional device side lambda number is required per lambda for the host compilation to correctly mangle the device-side lambda name.
- A hybrid numbering context MSHIPNumberingContext is introduced to number a lambda for both host- and device-compilations.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/include/clang/AST/MangleNumberingContext.h | ||
---|---|---|
56–57 | It would be nicer if there were a single entry point that does the right thing for all mangling contexts. | |
clang/lib/AST/MicrosoftCXXABI.cpp | ||
68–69 | I know it's inheritance instead of composition, but maybe the simple thing to do here would be to inherit from the MicrosoftMangleNumberingContext, and then implement an override for getDeviceManglingNumber that delegates to the Itanium one. | |
90 | This is actually the device's mangling context, and it's an Itanium mangling context. I think it needs comments and a more descriptive variable name. |
clang/include/clang/AST/MangleNumberingContext.h | ||
---|---|---|
56–57 | could you elaborate in more details? |
clang/include/clang/AST/DeclCXX.h | ||
---|---|---|
394–398 | It seems a shame to grow LambdaDefinitionData by a pointer for all users of C++ that do not use CUDA. Optimizing bitfields may be worth the time, but I'll leave it to @rjmccall or @rsmith to give guidance on whether that's worth it. An alternative would be to store the device numbers in the mangling context and look them up when needed, since they are so rarely needed. | |
clang/include/clang/AST/MangleNumberingContext.h | ||
56–57 | I'll add comments at the call site. | |
clang/lib/Sema/SemaLambda.cpp | ||
477–479 | Rather than having a virtual method that returns bool, call a virtual method that does the entire thing you are trying to do. For example, MSHIP* could be responsible for setting the mangling number on the class. Although, maybe it should be storing the number in some side table now that I think about it. |
clang/include/clang/AST/DeclCXX.h | ||
---|---|---|
394–398 | I like the alternative way by storing all numbering into the mangle/numbering context instead of AST itself. But, it needs additional numbering post-processing after AST importing. Sound to me a major refactoring work likely to be addressed later. |
Looks like holidays are approaching, :). Anyway, it's really appreciated that you could review this patch to enable CUDA/HIP applications on Windows.
Thanks.
@hliao -- Can you take a look at https://bugs.llvm.org/show_bug.cgi?id=48866. This patch may be relevant there.
@rnk Reid, looks like this patch does fix a lambda mangling issue in CUDA on Windows. Could you take another look at it?
clang/include/clang/AST/DeclCXX.h | ||
---|---|---|
394–398 | Generally, I don't think we can count on contributors to come back later and optimize memory usage, so it seems reasonable to ask to avoid the regression in the first place. Above I see the bitfield usage optimizing memory usage of the number of captures, and then here we spent lots of memory storing device mangling numbers that are only used for CUDA. I think the memory usage concern still stands. I don't think it's unreasonable to maintain these numbers on the side in a DenseMap. |
I agree that we should aim to avoid broad memory impact for uncommon features, and this seems to apply.
clang/include/clang/AST/DeclCXX.h | ||
---|---|---|
394–398 | I think that's the right idea in general. I don't know how much it really matters for LambdaData, because there aren't a huge number of lambdas in typical translation units. But if you want to optimize this, a hashtable in the ASTContext or ManglingContext seems reasonable. |
- Remove DeviceManglingNumber to reduce memory usage.
- Add a table in ASTContext for device lambda mangling numbers if that lamba requires.
Looks good with one suggestion.
clang/include/clang/AST/ASTContext.h | ||
---|---|---|
540 | This is a nit, but I'd group the device mangling number map here with the other mangling number side tables. |
Hi, a whole bunch of cuda-related tests failed on windows with this in (http://45.33.8.238/win/32620/summary.html). Did you run check-clang before committing? I reverted this for now in 4874ff02417916cc9ff994b34abcb5e563056546 . After the revert, all tests pass again.
This is a nit, but I'd group the device mangling number map here with the other mangling number side tables.