"convergent" is documented as meaning that the call cannot be made
control-dependent on more values, but in practice we also require that
it cannot be made control-dependent on fewer values, e.g. it cannot be
hoisted out of the body of an "if" statement.

In code like this, if we allow CSE to combine the two calls:

  x = convergent_call();
  if (cond) {
    y = convergent_call();
    use y;
  }

then we get this:

  x = convergent_call();
  if (cond) {
    use x;
  }

This is conceptually equivalent to moving the second call out of the
body of the "if", up to the location of the first call, so it should be
disallowed.

The effect of the fix is that we repeat the DPP subgroup operation over
a reduced set of lanes, instead of reusing the result of the first DPP
subgroup operation over all lanes.
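
As a rough illustration (not part of the patch), the following Python
sketch models why reusing the first result is wrong. It assumes a toy
model in which a convergent operation sums a per-lane value over the
lanes that are active at the call site. The names subgroup_add,
run_without_cse and run_with_bad_cse are hypothetical and do not
correspond to any LLVM or AMDGPU API.

```python
# Toy model (an assumption, not LLVM code): a wave is a list of per-lane
# values, and a convergent operation reduces over the active lanes only.

def subgroup_add(values, active):
    """Sum the values of the lanes that are active at the call site."""
    return sum(v for v, a in zip(values, active) if a)


def run_without_cse(values, cond):
    """Both calls are kept: the second one only sees lanes where cond holds."""
    all_lanes = [True] * len(values)
    x = subgroup_add(values, all_lanes)  # executed by every lane
    y = subgroup_add(values, cond)       # executed inside "if (cond)"
    return x, y


def run_with_bad_cse(values, cond):
    """The second call is replaced by the first result, as CSE would do."""
    all_lanes = [True] * len(values)
    x = subgroup_add(values, all_lanes)
    y = x                                # reuses the all-lanes result
    return x, y


values = [1, 2, 3, 4]
cond = [True, False, True, False]
print(run_without_cse(values, cond))   # (10, 4)
print(run_with_bad_cse(values, cond))  # (10, 10)
```

In this model the value used inside the "if" changes from 4 to 10 once
the two calls are combined, which is the kind of change the fix
prevents.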