GPU-oriented programming languages have some operations with constraints
that cannot currently be expressed properly in LLVM IR. For example:
  uvec4 result;
  if (cc) {
    result = ballot(true);
  } else {
    result = ballot(true);
  }
Even though both sides of the branch are identical, it is incorrect to
replace the if-statement with a single ballot call. This is because
ballot communicates with other threads, and the set of those threads
depends on where ballot is with respect to control flow.
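A small simulation can make this concrete. The Python sketch below is not LLVM or shader code. It assumes a hypothetical four-thread wave and models ballot as returning a bitmask of the threads that execute the same call together and pass true. The helper names ballot, branchy, and merged are made up for illustration.

```python
def ballot(active, value):
    """Bitmask of the threads in `active` whose `value` entry is true."""
    mask = 0
    for t in active:
        if value[t]:
            mask |= 1 << t
    return mask


def branchy(cc):
    """Models: if (cc) result = ballot(true); else result = ballot(true);"""
    n = len(cc)
    all_true = [True] * n
    then_side = [t for t in range(n) if cc[t]]
    else_side = [t for t in range(n) if not cc[t]]
    result = [0] * n
    # Each side of the branch runs its own ballot, so each ballot only
    # sees the threads that took that side.
    for side in (then_side, else_side):
        mask = ballot(side, all_true)
        for t in side:
            result[t] = mask
    return result


def merged(cc):
    """Models the incorrect rewrite: a single ballot(true) outside the if."""
    n = len(cc)
    mask = ballot(list(range(n)), [True] * n)
    return [mask] * n


cc = [True, False, True, False]
print(branchy(cc))  # [5, 10, 5, 10]
print(merged(cc))   # [15, 15, 15, 15]
```

Under these assumptions the two forms agree only when cc is uniform across the wave, which is why merging the two calls changes observable behavior.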
In the past, we have tried to address this by putting the
convergent attribute on functions. However, this approach has some
weaknesses. First, the restrictions imposed by convergent are not
actually strong enough for some cases such as the example above. Second,
the definition of convergent relies on the notion of
control-dependencies, which have action at a distance that makes it
difficult to satisfy. For example, the jump threading pass currently
does not honor the convergent attribute correctly in cases
such as:
  bool flag = false;
  if (cc1) {
    ...
    if (cc2)
      flag = true;
  }
  if (flag) {
    result = ballot(true);
  }
Since the convergent ballot operation is at a distance from the part
of the code inspected by the jump threading pass, the pass will decide
to transform the code in an incorrect way.
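The shape of this transformation can be sketched as a single-thread path trace. The Python below is only an illustration: it models the control-flow shape of the example before and after threading, not the behavior of the actual LLVM pass, and the block names (entry, outer_then, set_flag, merge, ballot, exit) are hypothetical labels.

```python
from itertools import product


def original(cc1, cc2):
    """Blocks visited by one thread in the untransformed code."""
    trace = ["entry"]
    flag = False
    if cc1:
        trace.append("outer_then")
        if cc2:
            trace.append("set_flag")
            flag = True
    # Every thread passes through this block before testing flag.
    trace.append("merge")
    if flag:
        trace.append("ballot")
    trace.append("exit")
    return trace


def threaded(cc1, cc2):
    """Blocks visited after threading the flag == true edge to the ballot."""
    trace = ["entry"]
    if cc1:
        trace.append("outer_then")
        if cc2:
            trace.append("set_flag")
            # flag is known to be true here, so the test is skipped.
            trace.append("ballot")
    trace.append("exit")
    return trace


for cc1, cc2 in product([False, True], repeat=2):
    # For a single thread, both versions reach the ballot on the same inputs.
    assert ("ballot" in original(cc1, cc2)) == ("ballot" in threaded(cc1, cc2))
    # The block where all threads met before the ballot is gone.
    assert "merge" not in threaded(cc1, cc2)
```

For any single thread the two versions are equivalent, which is why the pass treats the rewrite as legal. What changes is that the ballot is now reached from inside the nested branches rather than after the block where all threads met, and nothing local to the rewritten branches reveals that a convergent operation depends on that meeting point.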
This patch proposes to fix these and related problems by putting the
convergent attribute and the underlying notions of divergence and
reconvergence on a solid formal basis. At the same time, the impact
on generic transforms is small by design: a new set of intrinsics is
introduced that can be used to control reconvergence without being
prone to action at a distance. Frontends for GPU-oriented programming
languages are expected to insert these intrinsics, so that passes such
as jump threading will be "correct by default".
In the jump threading example above, a frontend would be expected to
insert intrinsics as follows:
  bool flag = false;
  token tok = @llvm.convergence.anchor();
  if (cc1) {
    ...
    if (cc2)
      flag = true;
  }
  @llvm.convergence.join(tok);
  if (flag) {
    result = ballot(true);
  }
The convergence intrinsics indicate that threads are expected to
reconverge before the second if-statement, which affects the behavior
of the ballot call. The join intrinsic call guards against incorrect
jump threading.
The intention of this RFC is to gauge the interest of the LLVM community
and whether this direction can be accepted going forward. Frontend and
backend parts are required for a complete solution, though the frontend
parts are language-specific and therefore not part of LLVM itself.
Additional Notes:
- Function inlining needs to add convergence intrinsics when the caller is convergent and the callee contains control flow.
While I like the idea of formalizing the whole "Dynamic Instances" concept, I wonder if it could also be explained in a more intuitive way, so that readers come to the formal definitions with an intuition for what they describe.