This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnroll] adjust for new `convergent` semantics
Needs ReviewPublic

Authored by sameerds on Jun 26 2023, 12:52 AM.

Details

Summary

Adjust the unrolling check for the new semantics:

  • There is no restriction on loops with convergent operations that are controlled inside the loop -- their behavior with respect to cross-thread communication is (partially) implementation-defined, and the loop unrolling is part of the implementation, so...
  • Fix unrolling with loop heart intrinsics: in a typical loop with a loop heart in the header that uses a token from outside the loop, duplicating the intrinsic would introduce multiple static uses of a convergence control token in a cycle that does not contain its definition.

Spell out the setup of UnrollLoopOptions to reduce the potential for
confusion caused by very long struct initializers.

Original implementation [D85605] by Nicolai Haehnle <nicolai.haehnle@amd.com>.

Diff Detail

Event Timeline

sameerds created this revision.Jun 26 2023, 12:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 26 2023, 12:52 AM
sameerds requested review of this revision.Jun 26 2023, 12:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 26 2023, 12:52 AM
sameerds added a subscriber: Restricted Project.

This looks quite high risk. How is it tested beyond this lit test, specifically with code that runs on the GPU? It's really easy to get broken transforms to pass a lit test where the checks were generated from the implementation.

Also could you point reviewer's to the documentation on the heart intrinsic system and how it relates to the current modeling? I know there's a slide deque or two somewhere, hoping for something more formal.

This looks quite high risk. How is it tested beyond this lit test, specifically with code that runs on the GPU? It's really easy to get broken transforms to pass a lit test where the checks were generated from the implementation.

Also could you point reviewer's to the documentation on the heart intrinsic system and how it relates to the current modeling? I know there's a slide deque or two somewhere, hoping for something more formal.

Thanks for taking an interest, Jon. There's a reason why the intrinsics have the word "experimental" in their name. Risk is limited to any flow that actually decides to use them, and no one is being exhorted to actually run this stuff on their GPU at this point. As for documentation, there's definitely been a LOT more activity than just a "slide deck or two", both inside AMD as well as outside. Please do refer to the "stack" of reviews that this Phab review belongs to.

Also, there's this, for links to all historical stuff:
https://discourse.llvm.org/t/rfc-introduce-convergence-control-intrinsics/69613

This looks quite high risk. How is it tested beyond this lit test, specifically with code that runs on the GPU? It's really easy to get broken transforms to pass a lit test where the checks were generated from the implementation.

The fun part is things are

llvm/lib/Transforms/Utils/LoopUnroll.cpp
402–427

This is a lot for one LLVM_DEBUG, move this all to a function?

! In D153744#4448146, @sameerds wrote:
As for documentation, there's definitely been a LOT more activity than just a "slide deck or two", both inside AMD as well as outside. Please do refer to the "stack" of reviews that this Phab review belongs to.

Also, there's this, for links to all historical stuff:
https://discourse.llvm.org/t/rfc-introduce-convergence-control-intrinsics/69613

It's been a multi-year thing with multiple authors, I doubt trawling the history will provide anyone with an accurate idea of what we're currently trying to build.

This stack is revising how control flow is modelled in llvm for GPUs. The failure modes are substantial. Could part of the implementation effort be directed to writing down what the overall algorithm is intended to be, somewhere that is visible outside of AMD? Hopefully that's a direct adaption of whatever design document you are working from internally.

It's been a multi-year thing with multiple authors, I doubt trawling the history will provide anyone with an accurate idea of what we're currently trying to build.

This stack is revising how control flow is modelled in llvm for GPUs. The failure modes are substantial. Could part of the implementation effort be directed to writing down what the overall algorithm is intended to be, somewhere that is visible outside of AMD? Hopefully that's a direct adaption of whatever design document you are working from internally.

This review points to earlier "parent" reviews. In particular:

D147116: [RFC] Introduce convergence control intrinsics

That's the very first review in this stack.

arsenm added inline comments.Jun 26 2023, 11:15 AM
llvm/lib/Analysis/CodeMetrics.cpp
114–126

It might be worth introducing a ConvergenceControlInst subclass of IntrinsicInst

For what it's worth, the formal semantics of convergence tokens and intrinsics landed in LLVM. This patch relaxes the constraints on loop unrolling in the presence of convergence tokens as defined in that spec:

https://llvm.org/docs/ConvergentOperations.html

sameerds updated this revision to Diff 541598.Jul 18 2023, 9:24 AM
  • rebased
  • addressed the comment about having too much code inside LLVM_DEBUG
sameerds marked 2 inline comments as done.Jul 18 2023, 9:27 AM
sameerds added inline comments.
llvm/lib/Analysis/CodeMetrics.cpp
114–126

I suppose so, but it seems too early to attempt for an experimental feature!

llvm/lib/Transforms/Utils/LoopUnroll.cpp
402–427

Fixed by moving it out into a function.

arsenm added inline comments.Jul 18 2023, 10:55 AM
llvm/lib/Analysis/CodeMetrics.cpp
114–126

Part of the experiment is how inconvenient it is, and also there's no better time to have the convenient code than when implementing it. We have ConstrainedFPIntrinsic for "experimental" features too. It's not a lot of work to add (not that it should be part of this patch)

sameerds marked 3 inline comments as done.Jul 18 2023, 8:41 PM
sameerds added inline comments.
llvm/lib/Analysis/CodeMetrics.cpp
114–126

Does that mean this change is good to go?