This is an archive of the discontinued LLVM Phabricator instance.

RFC: Uniformity Analysis for Irreducible Control Flow
ClosedPublic

Authored by sameerds on Jul 28 2022, 11:34 PM.

Details

Summary

Uniformity analysis is a generalization of divergence analysis to
include irreducible control flow:

  1. The proposed spec presents a notion of "maximal convergence" that captures the existing convention of converging threads at the headers of natual loops.
  2. Maximal convergence is then extended to irreducible cycles. The identity of irreducible cycles is determined by the choices made in a depth-first traversal of the control flow graph. Uniformity analysis uses criteria that depend only on closed paths and not cycles, to determine maximal convergence. This makes it a conservative analysis that is independent of the effect of DFS on CycleInfo.
  3. The analysis is implemented as a template that can be instantiated for both LLVM IR and Machine IR.

Current status:

  • passes existing tests for divergence analysis
  • passes new tests with irreducible control flow
  • currently working on Machine IR tests

Based on concepts originally outlined by
Nicolai Haehnle <nicolai.haehnle@amd.com>

With contributions from Ruiling Song <ruiling.song@amd.com> and
Jay Foad <jay.foad@amd.com>.

Diff Detail

Event Timeline

sameerds created this revision.Jul 28 2022, 11:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 11:34 PM
sameerds requested review of this revision.Jul 28 2022, 11:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 11:34 PM
ychen added a subscriber: ychen.Jul 29 2022, 1:15 AM
kuhar added a subscriber: kuhar.Aug 7 2022, 1:15 PM
tra added a subscriber: tra.Aug 16 2022, 12:22 PM
sameerds updated this revision to Diff 476468.Nov 18 2022, 7:26 AM
  1. Rebased.
  2. Added support for GMIR.
  3. Added tests for GMIR and MIR.

Bump!

This review has been waiting for attention for close to four months now. The uniformity analysis is important for the following:

  1. Lay a strong groundwork for defining the convergence control intrinsics proposed in D85603. This is meant to completely replace the currently under-specified “convergent” attribute in LLVM IR.
  2. Improve optimizations in irreducible CFGs.
  3. Make uniformity analysis available in MIR, thus improving code generation. Especially the AMDGPU backend will greatly benefit from this information when allocating scalar/vector registers in a wave.

So far, there have been some comments that are “not opposed” to the change. It’s not clear if people who should be interested in the patch are able to provide the right attention. The work was presented at the LLVM Dev Meeting in San Jose too, in order to attract more attention.

I would really like some advice on how to move this forward. The concepts are sound, as far as they were reviewed by people working directly on that change. The analysis is new and does not affect existing pipelines. The AMDGPU pipeline is eager to use this analysis and is committed to maintaining it. Is this sufficient reason to let the patch through?

For submitting, at least a code review is necessary, which does not need domain expertise.

Ok, so, just to make sure I'm understanding my skim correctly

  • The uniformity analysis refines the existing divergence analysis, allowing us to more precisely determine if a particular value is uniform
  • The main contribution is better handling of irreducible control flow by way of a bunch of stuff about cycle headers that's explained in the talk
  • On top of that, this is a refactor of the old analysis to allow it to work on MIR as well

One thing I'm slightly hazy on is how the results of this analysis relate to the old analysis. I think that the new code will mark some things convergent that the old analysis marked divergent conservatively, but I'm not sure.

Also, if the uniformity analysis strictly improves the divergence analysis, is there a plan for removing the old analysis after this lands?

Ok, so, just to make sure I'm understanding my skim correctly

  • The uniformity analysis refines the existing divergence analysis, allowing us to more precisely determine if a particular value is uniform
  • The main contribution is better handling of irreducible control flow by way of a bunch of stuff about cycle headers that's explained in the talk
  • On top of that, this is a refactor of the old analysis to allow it to work on MIR as well

Correct.

One thing I'm slightly hazy on is how the results of this analysis relate to the old analysis. I think that the new code will mark some things convergent that the old analysis marked divergent conservatively, but I'm not sure.

Correct.

Also, if the uniformity analysis strictly improves the divergence analysis, is there a plan for removing the old analysis after this lands?

UniformityAnalysis depends on CycleInfo, while DivergenceAnalysis depends on LoopInfo. This is an outcome of the "improved" handling of irreducible CFGs. Wherever this distinction is okay, AMDGPU definitely intends to replace DA with the new UA. For example, the CodeGen for AMDGPU still uses LegacyDivergenceAnalysis as a wrapper around DivergenceAnalysis for historic reasons. This new analysis will eliminate that.

An example of where the dependence on CycleInfo will need a closer look is in the loop transforms in the optimizer. They update LoopInfo but not CycleInfo. It's too early to say if it is worthwhile to implement incremental updates to CycleInfo. It's possible that initially, if we want to use UniformityAnalysis, we will simply recompute CycleInfo. This is not too a problem because UniformityAnalysis needs to be recomputed anyway before use. Note that every loop in LoopInfo is a subset of some cycle in CycleInfo: https://llvm.org/docs/CycleTerminology.html

jplehr added a subscriber: jplehr.Dec 1 2022, 1:43 AM
jmmartinez added inline comments.Dec 1 2022, 2:37 AM
llvm/include/llvm/ADT/GenericUniformityImpl.h
945–946

If I understand correctly, a MachineBasicBlock may have multiple terminators. So why stopping at the first terminator, instead of continuing ?

974–979

C->contains(Candidate) already checks if C == Candidate

What do you think about rewritting this loop as an any_of?

Also, I would assume that it is an outermost cycle if its depth is 1. Wouldn't be better to check this instead of it the cycle is contained by another? or am I missing something?

llvm/lib/IR/SSAContext.cpp
42

Is it normal that values with void type are included among the definitions ? I understand that this is intentional but just wanted to confirm.

sameerds marked an inline comment as done.Dec 9 2022, 1:19 AM
sameerds added inline comments.
llvm/include/llvm/ADT/GenericUniformityImpl.h
945–946

All terminators are at the end of the MBB. So any defs will occur before the first terminator. We don't need to continue looking beyond that.

974–979

The any_of is definitely nicer, thanks!

The meaning of "outermost" here is actually "farthest parent that happens to be divergent". Such a parent itself need not be "an outermost" cycle.

llvm/lib/IR/SSAContext.cpp
41

This should be a break, for the sake of readability.

42

Right. We should skip over void values. No harm in adding them, but it's less confusing if we are more precise about what is the intention.

sameerds updated this revision to Diff 481998.Dec 11 2022, 10:47 PM
sameerds marked an inline comment as done.
  • Rebased
  • Renamed insertIfOutermost to insertIfNotContained
  • Replaced a predicate for-loop with llvm::anyo_of
sameerds marked 2 inline comments as done.Dec 11 2022, 10:49 PM
This revision was not accepted when it landed; it landed in state Needs Review.Dec 19 2022, 5:52 PM
This revision was automatically updated to reflect the committed changes.
chapuni added inline comments.
llvm/include/llvm/ADT/GenericUniformityImpl.h
749

Used only in LLVM_DEBUG.

llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
121

Used only in assert.

FYI, I'm seeing the following build failures with ROCm 5.4.0 Clang (so somewhere between Clang 15 and 16) on Ubuntu 20.04 that arise from this patch. I don't know if these are a system configuration error or what but I thought I should post them here

[build] FAILED: lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineUniformityAnalysis.cpp.o 
[build] /opt/rocm/llvm/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/CodeGen -I/home/kdrewnia/llvm-project/llvm/lib/CodeGen -Iinclude -I/home/kdrewnia/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -g    -fno-exceptions -fno-rtti -std=c++17 -MD -MT lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineUniformityAnalysis.cpp.o -MF lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineUniformityAnalysis.cpp.o.d -o lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineUniformityAnalysis.cpp.o -c /home/kdrewnia/llvm-project/llvm/lib/CodeGen/MachineUniformityAnalysis.cpp
[build] In file included from /home/kdrewnia/llvm-project/llvm/lib/CodeGen/MachineUniformityAnalysis.cpp:9:
[build] In file included from /home/kdrewnia/llvm-project/llvm/include/llvm/CodeGen/MachineUniformityAnalysis.h:17:
[build] In file included from /home/kdrewnia/llvm-project/llvm/include/llvm/ADT/GenericUniformityInfo.h:16:
[build] In file included from /home/kdrewnia/llvm-project/llvm/include/llvm/ADT/GenericCycleInfo.h:31:
[build] In file included from /home/kdrewnia/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:13:
[build] In file included from /home/kdrewnia/llvm-project/llvm/include/llvm/ADT/SmallVector.h:28:
[build] In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/memory:83:
[build] /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/unique_ptr.h:357:2: error: static assertion failed due to requirement '__is_invocable<llvm::GenericUniformityInfo<llvm::GenericSSAContext<llvm::MachineFunction>>::ImplDeleter &, llvm::GenericUniformityAnalysisImpl<llvm::GenericSSAContext<llvm::MachineFunction>> *>::value': unique_ptr's deleter must be invocable with a pointer
[build]         static_assert(__is_invocable<deleter_type&, pointer>::value,
[build]         ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] /home/kdrewnia/llvm-project/llvm/include/llvm/ADT/GenericUniformityInfo.h:41:3: note: in instantiation of member function 'std::unique_ptr<llvm::GenericUniformityAnalysisImpl<llvm::GenericSSAContext<llvm::MachineFunction>>, llvm::GenericUniformityInfo<llvm::GenericSSAContext<llvm::MachineFunction>>::ImplDeleter>::~unique_ptr' requested here
[build]   GenericUniformityInfo(FunctionT &F, const DominatorTreeT &DT,
[build]   ^
[build] In file included from /home/kdrewnia/llvm-project/llvm/lib/CodeGen/MachineUniformityAnalysis.cpp:9:
[build] In file included from /home/kdrewnia/llvm-project/llvm/include/llvm/CodeGen/MachineUniformityAnalysis.h:17:
[build] In file included from /home/kdrewnia/llvm-project/llvm/include/llvm/ADT/GenericUniformityInfo.h:16:
[build] In file included from /home/kdrewnia/llvm-project/llvm/include/llvm/ADT/GenericCycleInfo.h:31:
[build] In file included from /home/kdrewnia/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:13:
[build] In file included from /home/kdrewnia/llvm-project/llvm/include/llvm/ADT/SmallVector.h:28:
[build] In file included from /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/memory:83:
[build] /usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/unique_ptr.h:454:2: error: static assertion failed due to requirement '__is_invocable<llvm::GenericUniformityInfo<llvm::GenericSSAContext<llvm::MachineFunction>>::ImplDeleter &, llvm::GenericUniformityAnalysisImpl<llvm::GenericSSAContext<llvm::MachineFunction>> *>::value': unique_ptr's deleter must be invocable with a pointer
[build]         static_assert(__is_invocable<deleter_type&, pointer>::value,
[build]         ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[build] /home/kdrewnia/llvm-project/llvm/include/llvm/ADT/GenericUniformityImpl.h:1179:6: note: in instantiation of member function 'std::unique_ptr<llvm::GenericUniformityAnalysisImpl<llvm::GenericSSAContext<llvm::MachineFunction>>, llvm::GenericUniformityInfo<llvm::GenericSSAContext<llvm::MachineFunction>>::ImplDeleter>::reset' requested here
[build]   DA.reset(new ImplT{Func, DT, CI, TTI});
[build]      ^
[build] 2 errors generated.