This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Mark v_cmp* convergent
AbandonedPublic

Authored by rampitec on Jan 25 2022, 1:39 PM.

Details

Summary

This replaces readsExecAsData check with the convergent attribute
to prevent hoisting or sinking which seems to be illegal for vector
compares.

Diff Detail

Event Timeline

rampitec created this revision.Jan 25 2022, 1:39 PM
rampitec requested review of this revision.Jan 25 2022, 1:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2022, 1:39 PM
Herald added a subscriber: wdng. · View Herald Transcript

This is the followup after discussion in the D117909.

I am not sure this is a good idea. Actually an ordinary V_CMP is still sinkable. It is only the V_CMP lowered from llvm.amdgcn.icmp that cannot be sinked.

foad added a comment.Jan 26 2022, 2:22 AM

I am not sure this is a good idea. Actually an ordinary V_CMP is still sinkable. It is only the V_CMP lowered from llvm.amdgcn.icmp that cannot be sinked.

Right, an ordinary V_CMP is still sinkable because of the way the result is used, there is an implicit guarantee that bits in the result corresponding to inactive lanes (at the point of use) will be ignored. So marking *all* V_CMPs convergent is overkill, although it does not seem to have any bad effects on the test suite. On the other hand, duplicating all the V_CMP instructions (so we have separate convergent and non-convergent versions of them) doesn't seem like a good idea, because there are lots of them.

I'm not sure what the best solution is.

I am not sure this is a good idea. Actually an ordinary V_CMP is still sinkable. It is only the V_CMP lowered from llvm.amdgcn.icmp that cannot be sinked.

Right, an ordinary V_CMP is still sinkable because of the way the result is used, there is an implicit guarantee that bits in the result corresponding to inactive lanes (at the point of use) will be ignored. So marking *all* V_CMPs convergent is overkill, although it does not seem to have any bad effects on the test suite. On the other hand, duplicating all the V_CMP instructions (so we have separate convergent and non-convergent versions of them) doesn't seem like a good idea, because there are lots of them.

If duplicating V_CMP for convergent and non-convergent version would make code over-complex, I would go for this simple solution. We can re-visit this issue when we really hit some performance issue here. Let's see if others have different opinion.

foad added a comment.Jan 26 2022, 5:15 AM

Maybe another solution is to try to deprecate and remove amdgcn.icmp in favour of amdcgn.ballot, and try to lower ballot to a single machine instruction that is marked "convergent".

Right, an ordinary V_CMP is still sinkable because of the way the result is used, there is an implicit guarantee that bits in the result corresponding to inactive lanes (at the point of use) will be ignored.

I see it that way: technically HW instruction is convergent. If it would not mask the result with exec it would not be convergent. Then you are right we can ignore this fact given the promise of the at least somewhat structured control flow. Then the use of the intrinsic breaks the promise and essentially stepping into the crosslane territory. Although that is not a property of the instruction but rather of its use.

I.e. marking compares as convergent seems to be safe, the other question is if it is desired or not.

Maybe another solution is to try to deprecate and remove amdgcn.icmp in favour of amdcgn.ballot, and try to lower ballot to a single machine instruction that is marked "convergent".

A second look, I agree with @foad using amdgcn.ballot is a good idea. Sounds like we need to fix the existing implementation of amdgcn.ballot as @foad's suggested.

rampitec abandoned this revision.Jan 31 2022, 1:52 PM

Abandoning since this is not a way to go.

critson added a subscriber: critson.Feb 1 2022, 1:27 AM

As an aside, I am working on a patch to add a "shouldHoist" to control the behaviour of MachineLICM.
This mostly mirrors the shouldSink that already exists.