This replaces readsExecAsData check with the convergent attribute
to prevent hoisting or sinking which seems to be illegal for vector
compares.
Details
- Reviewers
foad ruiling sebastian-ne arsenm
Diff Detail
Event Timeline
I am not sure this is a good idea. Actually an ordinary V_CMP is still sinkable. It is only the V_CMP lowered from llvm.amdgcn.icmp that cannot be sinked.
I am not sure this is a good idea. Actually an ordinary V_CMP is still sinkable. It is only the V_CMP lowered from llvm.amdgcn.icmp that cannot be sinked.
Right, an ordinary V_CMP is still sinkable because of the way the result is used, there is an implicit guarantee that bits in the result corresponding to inactive lanes (at the point of use) will be ignored. So marking *all* V_CMPs convergent is overkill, although it does not seem to have any bad effects on the test suite. On the other hand, duplicating all the V_CMP instructions (so we have separate convergent and non-convergent versions of them) doesn't seem like a good idea, because there are lots of them.
I'm not sure what the best solution is.
If duplicating V_CMP for convergent and non-convergent version would make code over-complex, I would go for this simple solution. We can re-visit this issue when we really hit some performance issue here. Let's see if others have different opinion.
Maybe another solution is to try to deprecate and remove amdgcn.icmp in favour of amdcgn.ballot, and try to lower ballot to a single machine instruction that is marked "convergent".
I see it that way: technically HW instruction is convergent. If it would not mask the result with exec it would not be convergent. Then you are right we can ignore this fact given the promise of the at least somewhat structured control flow. Then the use of the intrinsic breaks the promise and essentially stepping into the crosslane territory. Although that is not a property of the instruction but rather of its use.
I.e. marking compares as convergent seems to be safe, the other question is if it is desired or not.
As an aside, I am working on a patch to add a "shouldHoist" to control the behaviour of MachineLICM.
This mostly mirrors the shouldSink that already exists.