Convergent functions are functions for which the fact that they are called
must be uniform across multiple threads in an SIMT/SPMD-type execution
model. Convergent function arguments are the analogous notion for values:
arguments whose value must be uniform across multiple threads.
The problem this is intended to address is the following. For AMDGPU, and
also under general GLSL semantics,
    %v1 = texelFetch(%sampler, %coord0)
    %v2 = texelFetch(%sampler, %coord1)
    %v = select i1 %cond, vType %v1, %v2
is logically equivalent to and could benefit from being transformed to:
    %coord = select i1 %cond, cType %coord0, %coord1
    %v = texelFetch(%sampler, %coord)
On the other hand,
    %v1 = texelFetch(%sampler0, %coord)
    %v2 = texelFetch(%sampler1, %coord)
    %v = select i1 %cond, vType %v1, %v2
must not be transformed to:
    %s = select i1 %cond, sType %sampler0, %sampler1
    %v = texelFetch(%s, %coord)
because of uniformity restrictions on the first argument of texelFetch.
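To make the asymmetry concrete, here is a minimal Python sketch (not part of the patch) that models a single wave as lists with one entry per lane. All names in it (texel_fetch, select, TEXTURES) are hypothetical, and the model raises an error when the sampler differs between lanes. That is an assumption of the sketch: real hardware would more likely produce wrong results silently.

```python
# Toy model of one wave: every value is a list with one entry per lane.
# All names here (texel_fetch, select, TEXTURES) are hypothetical.

TEXTURES = {"s0": [10, 11, 12, 13], "s1": [20, 21, 22, 23]}


def select(cond, a, b):
    """Per-lane select, like `select i1 %cond, %a, %b`."""
    return [x if c else y for c, x, y in zip(cond, a, b)]


def texel_fetch(samplers, coords):
    """Wave-wide fetch; the sampler argument must be uniform across lanes."""
    if len(set(samplers)) != 1:
        # Assumption of this model: report the violation loudly.
        raise ValueError("sampler is not uniform across lanes")
    texture = TEXTURES[samplers[0]]
    return [texture[c] for c in coords]


cond = [True, False, True, False]
s0, s1 = ["s0"] * 4, ["s1"] * 4
coord0, coord1 = [0, 1, 2, 3], [3, 2, 1, 0]

# Selecting between coordinates: both forms agree.
before = select(cond, texel_fetch(s0, coord0), texel_fetch(s0, coord1))
after = texel_fetch(s0, select(cond, coord0, coord1))
assert before == after

# Selecting between samplers: the original form is fine ...
original = select(cond, texel_fetch(s0, coord0), texel_fetch(s1, coord0))

# ... but the sunk form passes a divergent sampler to the fetch.
try:
    texel_fetch(select(cond, s0, s1), coord0)
    sunk_is_valid = True
except ValueError:
    sunk_is_valid = False
```

The coordinate argument may vary per lane, so sinking the select into it is harmless. The sampler argument may not, so the same rewrite turns a valid program into an invalid one.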
While InstCombine does not actually perform these transforms today,
SimplifyCFG does tail sinking that amounts to the same thing, and there are
shaders in the wild that are miscompiled because of it.
In other words, this patch is really a bug fix, but it tries to fix the bug
without unnecessary performance regressions while keeping the door open for
future optimization improvements.
This is very much an RFC and feedback is appreciated, but I'd personally be
happy to push the patch as-is (minus the part of the select-call.ll test
which merely illustrates a potential future improvement, and minus the
incomplete formalization).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988
(Based on what I've heard about GPUs) Should you emphasize that they're executed in "lockstep"? Or is that not a correct statement?