Page MenuHomePhabricator

[hip][cuda] Enable extended lambda support on Windows.

Authored by hliao on Oct 22 2019, 2:04 PM.


  • On Windows, extended lambda has extra issues due to the numbering schemes are different between the host compilation (Microsoft C++ ABI) and the device compilation (Itanium C++ ABI. Additional device side lambda number is required per lambda for the host compilation to correctly mangle the device-side lambda name.
  • A hybrid numbering context MSHIPNumberingContext is introduced to number a lambda for both host- and device-compilations.

Diff Detail

Event Timeline

hliao created this revision.Oct 22 2019, 2:04 PM
Herald added a reviewer: shafik. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
hliao added a comment.Oct 23 2019, 1:23 PM

PING for review

Please don't ping for review after a single day.

hliao added a comment.Oct 29 2019, 5:40 AM

PING for review

tra added a reviewer: rnk.Oct 29 2019, 9:10 AM
tra added a subscriber: rnk.

@rnk Reid, would you be the right person to look at the change on the Windows' side?

rnk added inline comments.Oct 29 2019, 4:31 PM

It would be nicer if there were a single entry point that does the right thing for all mangling contexts.


I know it's inheritance instead of composition, but maybe the simple thing to do here would be to inherit from the MicrosoftMangleNumberingContext, and then implement an override for getDeviceManglingNumber that delegates to the Itanium one.


This is actually the device's mangling context, and it's an Itanium mangling context. I think it needs comments and a more descriptive variable name.

martong resigned from this revision.Oct 30 2019, 2:23 AM
hliao updated this revision to Diff 227091.Oct 30 2019, 7:14 AM

revise MSHIPNumberingContext

hliao marked 3 inline comments as done.Oct 30 2019, 7:16 AM
hliao added inline comments.

could you elaborate in more details?

rnk added inline comments.Oct 30 2019, 11:46 AM

It seems a shame to grow LambdaDefinitionData by a pointer for all users of C++ that do not use CUDA. Optimizing bitfields may be worth the time, but I'll leave it to @rjmccall or @rsmith to give guidance on whether that's worth it.

An alternative would be to store the device numbers in the mangling context and look them up when needed, since they are so rarely needed.


I'll add comments at the call site.


Rather than having a virtual method that returns bool, call a virtual method that does the entire thing you are trying to do. For example, MSHIP* could be responsible for setting the mangling number on the class. Although, maybe it should be storing the number in some side table now that I think about it.

hliao updated this revision to Diff 227214.Oct 30 2019, 7:53 PM
hliao marked an inline comment as done.

simplify again following suggestion.

hliao added inline comments.Oct 30 2019, 7:54 PM

I like the alternative way by storing all numbering into the mangle/numbering context instead of AST itself. But, it needs additional numbering post-processing after AST importing. Sound to me a major refactoring work likely to be addressed later.

hliao added a comment.Nov 1 2019, 9:59 AM

kindly PING for review

hliao added a comment.Nov 4 2019, 2:00 PM

Looks like holidays are approaching, :). Anyway, it's really appreciated that you could review this patch to enable CUDA/HIP applications on Windows.


hliao added a comment.Nov 6 2019, 5:56 AM

PING for review

hliao added a comment.Nov 8 2019, 9:01 AM

PING for review

tra added a comment.Jan 26 2021, 9:41 AM

@hliao -- Can you take a look at This patch may be relevant there.

tra added a comment.Jan 28 2021, 10:39 AM
In D69322#2523058, @tra wrote:

@hliao -- Can you take a look at This patch may be relevant there.

@rnk Reid, looks like this patch does fix a lambda mangling issue in CUDA on Windows. Could you take another look at it?

rnk added inline comments.Jan 28 2021, 11:01 AM

Generally, I don't think we can count on contributors to come back later and optimize memory usage, so it seems reasonable to ask to avoid the regression in the first place. Above I see the bitfield usage optimizing memory usage of the number of captures, and then here we spent lots of memory storing device mangling numbers that are only used for CUDA. I think the memory usage concern still stands. I don't think it's unreasonable to maintain these numbers on the side in a DenseMap.

@rsmith or @rjmccall, do you agree with that reasoning?

I agree that we should aim to avoid broad memory impact for uncommon features, and this seems to apply.


I think that's the right idea in general. I don't know how much it really matters for LambdaData, because there aren't a huge number of lambdas in typical translation units. But if you want to optimize this, a hashtable in the ASTContext or ManglingContext seems reasonable.

hliao updated this revision to Diff 320409.Feb 1 2021, 12:59 AM

Rebase to the main branch.

hliao updated this revision to Diff 320568.Feb 1 2021, 12:58 PM
  • Remove DeviceManglingNumber to reduce memory usage.
  • Add a table in ASTContext for device lambda mangling numbers if that lamba requires.
hliao marked 2 inline comments as done.Feb 1 2021, 12:58 PM
hliao updated this revision to Diff 320679.Feb 1 2021, 10:41 PM

Revise comments.

hliao added a comment.Feb 2 2021, 2:26 PM

PING for review

hliao added a comment.Feb 3 2021, 12:57 PM

@rsmith, @rnk and @rjmccall, could you review the latest change addressing the memory usage issue?

rnk accepted this revision.Feb 3 2021, 1:58 PM

Looks good with one suggestion.


This is a nit, but I'd group the device mangling number map here with the other mangling number side tables.

This revision is now accepted and ready to land.Feb 3 2021, 1:58 PM
This revision was landed with ongoing or failed builds.Feb 3 2021, 10:39 PM
This revision was automatically updated to reflect the committed changes.

Hi, a whole bunch of cuda-related tests failed on windows with this in ( Did you run check-clang before committing? I reverted this for now in 4874ff02417916cc9ff994b34abcb5e563056546 . After the revert, all tests pass again.