I had a case where multiple nested uniform ifs produced code that
performed v_cmp comparisons, combined the results with s_and_b64,
s_or_b64 and s_xor_b64, and used the resulting mask in s_cbranch_vccnz
without first ensuring that the bits for inactive lanes were clear.
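
To make the failure mode concrete, here is a small Python model of 64-bit lane masks. It is only a sketch: the helper names (`v_cmp`, `s_and_b64`, `vccnz`) are hypothetical stand-ins for the instructions, and the way the stale bits arise here (a compare executed under a wider exec mask than the one in effect at the branch) is an assumption made for illustration, not a description of the original test case.

```python
MASK64 = (1 << 64) - 1  # wave64: one mask bit per lane


def v_cmp(exec_mask, predicate):
    """Model of a VALU compare: a result bit is set only for lanes that are
    active in exec_mask and satisfy the predicate; inactive lanes get 0."""
    result = 0
    for lane in range(64):
        if (exec_mask >> lane) & 1 and predicate(lane):
            result |= 1 << lane
    return result


def s_and_b64(a, b):
    """Model of the scalar 64-bit AND."""
    return a & b & MASK64


def vccnz(vcc):
    """Model of s_cbranch_vccnz: the branch is taken if any vcc bit is set."""
    return vcc != 0


# Assumed scenario: the condition is computed while lanes 0..7 are active...
outer_exec = 0x00FF
cond = v_cmp(outer_exec, lambda lane: lane < 4)  # bits 0..3 set

# ...but the branch executes when only lanes 4..7 are active.
inner_exec = 0x00F0

# Without the AND, bits belonging to inactive lanes make the branch taken.
taken_without_and = vccnz(cond)

# "s_and_b64 vcc, exec, vcc" clears those bits before the branch.
taken_with_and = vccnz(s_and_b64(inner_exec, cond))
```

In this model `taken_without_and` is true while `taken_with_and` is false, which is the kind of wrong branch the extra s_and_b64 guards against.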
There was already code for inserting an "s_and_b64 vcc, exec, vcc" to
clear the bits for inactive lanes when the branch is instruction
selected as s_cbranch_scc1 and is later changed to s_cbranch_vccnz in
SIFixSGPRCopies. I have moved that code into SILowerControlFlow so that
it also handles the case where the branch is instruction selected
directly as s_cbranch_vccnz.
I have also added some code to analyze whether the s_and_b64 needs to be
inserted at all. If vcc was defined by a v_cmp, or by a combination via
s_and/s_or of several v_cmp instructions, then the s_and_b64 is not
needed and is no longer inserted.
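
The check described above can be sketched as a recursive walk over the instructions that define the condition. This is a toy Python model, not the code in the patch: the `defs` table, the register names and the `needs_exec_and` helper are all hypothetical, the depth limit is an arbitrary choice, and the sketch ignores whether exec changes between a definition and the branch, which a real pass would have to consider.

```python
from typing import Dict, Tuple

# Hypothetical representation: register name -> (opcode, operand registers).
Def = Tuple[str, Tuple[str, ...]]


def needs_exec_and(reg: str, defs: Dict[str, Def], depth: int = 8) -> bool:
    """Return True if "s_and_b64 vcc, exec, vcc" should still be inserted
    before a branch on `reg`, i.e. the mask cannot be shown to be built
    only from v_cmp results combined with s_and/s_or."""
    if depth == 0 or reg not in defs:
        return True  # unknown origin or chain too long: be conservative
    opcode, operands = defs[reg]
    if opcode.startswith("v_cmp"):
        return False  # a compare result needs no extra masking
    if opcode in ("s_and_b64", "s_or_b64"):
        # Conservative: require every operand to come from compares.
        return any(needs_exec_and(op, defs, depth - 1) for op in operands)
    return True  # any other producer: keep the s_and_b64


# Example: vcc = s_or(v_cmp, s_and(v_cmp, v_cmp)), so no AND is needed.
all_compares = {
    "vcc": ("s_or_b64", ("s0", "s2")),
    "s0": ("v_cmp_lt_i32", ("v0", "v1")),
    "s2": ("s_and_b64", ("s4", "s6")),
    "s4": ("v_cmp_eq_u32", ("v2", "v3")),
    "s6": ("v_cmp_eq_u32", ("v4", "v5")),
}

# Same shape, but one input is loaded rather than compared: keep the AND.
with_load = dict(all_compares, s6=("s_load_dwordx2", ("s8",)))

skip_and = not needs_exec_and("vcc", all_compares)
keep_and = needs_exec_and("vcc", with_load)
```

The sketch errs on the side of inserting the s_and_b64: anything it cannot trace back to compares is treated as possibly having bits set for inactive lanes.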
Several tests contained such an unnecessary s_and_b64, so they needed
updating now that it is no longer emitted. The indirect-addressing-si.ll
test also seemed to be affected by some kind of loop transformation, so
its lit script needed changing to cope with that.
Can you add further explanation? Should it still be OK as long as the incoming condition can be seen to be a VCC producer / not a bit op?