This is an archive of the discontinued LLVM Phabricator instance.

[LLVM][Uniformity] Improve detection of uniform registers
ClosedPublic

Authored by sameerds on May 12 2023, 4:15 AM.

Details

Summary

The MachineUA now queries the target to determine if a given register holds a
uniform value. This is determined using the corresponding register bank if
available, or by a combination of the register class and value type. This
assumes that the target is optimizing for performance by choosing registers, and
the target is responsible for any mismatch with the inferred uniformity.

For example, on AMDGPU, an SGPR is now treated as uniform, except if the
register bank is VCC (i.e., the register holds a wave-wide vector of 1-bit
values) or equivalently if it has a value type of s1.

  • This does not always work with inline asm, where the register bank or the value type might not be present. We assume that the SGPR is uniform, because it is not expected to be s1 in the vast majority of cases.
  • The pseudo branch instruction SI_LOOP is now hard-coded to be always divergent, although its condition is an SGPR.

Diff Detail

Event Timeline

sameerds created this revision.May 12 2023, 4:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2023, 4:15 AM
sameerds requested review of this revision.May 12 2023, 4:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2023, 4:15 AM
sameerds added a subscriber: Restricted Project.
arsenm added inline comments.May 12 2023, 7:18 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2879

Can combine these two queries into getRegClassOrRegBank

2891

Just use LLT instead of auto, it's even shorter

2892–2893

You should just be able to do equality without pre-checking for a valid type. Just MRI.getType() != LLT::scalar(1) should be enough

2895

Don't see why this needs to be an assert, equality with s1 should be adequate

sameerds updated this revision to Diff 522094.May 15 2023, 2:22 AM
  • Simplified comparison of value type with s1.
  • Ported the test for temporal divergence from the LLVM version to MIR.
sameerds marked 3 inline comments as done.May 15 2023, 2:25 AM
sameerds added inline comments.
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2879

What's the advantage of that? The PointerUnion will simply conflate two different things for me, and the only improvement that I see is that I can check both for nullptr at the same time. After that, I still have to insert at least one ugly dyn_cast along with a full type name instead of auto just to get to the RegBank. Isn't the current version much more readable?

arsenm added inline comments.May 15 2023, 3:22 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2879

It's one map lookup instead of 2. It also is really one thing, not two

sameerds updated this revision to Diff 522124.May 15 2023, 4:28 AM

simplified inspection of registers; let RBI handle that

sameerds added inline comments.May 15 2023, 4:31 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
2879

Managed to just offload the decision to RegisterBankInfo::getRegBank(). It already handles all the corner cases, and now we treat it as the source of truth rather than duplicating that logic.

arsenm accepted this revision.May 15 2023, 12:37 PM
This revision is now accepted and ready to land.May 15 2023, 12:37 PM
ruiling added inline comments.May 15 2023, 7:27 PM
llvm/lib/Target/AMDGPU/SIInstructions.td
423

I think `SI_IF and SI_ELSE also need this flag being set, but it can be done in a separate change with proper tests.

llvm/lib/Target/AMDGPU/SIInstructions.td
423

I think the same. Marking SI_IF and SI_ELSE as never uniform fixed test failures I was seeing.