This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Stop replacing amdgcn.ballot(1) with amdgcn.s.getreg(exec)
ClosedPublic

Authored by foad on Jun 15 2023, 9:54 AM.

Details

Summary

Rationale:

  • It does not enable any further IR simplifications.
  • It does not improve the generated code since the isel lowering of ballot also has special cases for 0 and 1.
  • getreg is "too powerful" since it can read from many different registers, so its intrinsic properties have to be set very conservatively.

There is also a correctness problem that getreg can read from exec but
it is currently not marked as convergent.

Diff Detail

Event Timeline

foad created this revision.Jun 15 2023, 9:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 9:54 AM
foad requested review of this revision.Jun 15 2023, 9:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 9:54 AM
arsenm accepted this revision.Jun 15 2023, 9:56 AM

There is also a correctness problem that getreg can read from exec but it is currently not marked as convergent.

You can mark the call sites convergent, which this is doing. We still have __builtin_amdgcn_read_exec emitting read_register, but I was thinking we should move them to ballot(true)

This revision is now accepted and ready to land.Jun 15 2023, 9:56 AM
foad added a comment.Jun 15 2023, 9:59 AM

You can mark the call sites convergent, which this is doing.

Hmm, I had not noticed that. I will take another look at the correctness problem that I thought I was seeing.

foad added a comment.Jun 16 2023, 9:15 AM

You can mark the call sites convergent, which this is doing.

Hmm, I had not noticed that. I will take another look at the correctness problem that I thought I was seeing.

D153151 should fix that, but I still plan to push this patch.

This revision was landed with ongoing or failed builds.Jun 16 2023, 9:18 AM
This revision was automatically updated to reflect the committed changes.