This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fix @llvm.amdgcn.wqm.vote implementation
AcceptedPublic

Authored by cwabbott on Feb 7 2019, 7:31 AM.

Details

Summary

My understanding, and the current behavior of the backend, is that
booleans lowered to bitmasks have undefined values in bits corresponding
to lanes that aren't active. If we just lower this to a WQM intrinsic,
then if the values for inactive lanes are 1, then this may change the
result if not every thread in a quad is active. Fix this by AND'ing with
EXEC to mask out the garbage lanes.

This fixes some VK conformance tests when making the clustered subgroup
reduce operations use @llvm.amdgcn.wqm.vote. I added an extra test which
currently miscompiles.

One other way to handle this would be to change the way we lower boolean
operations like NOT so that inactive lanes are always 0. So far as I can
see, in terms of code quality, this would only prevent a few transforms,
like S_ORN2_B64 for src0 | ~src1. The main thing would be lowering NOT
to an XOR with EXEC. Otherwise we'd have to do an analysis to prove
that the AND is redundant in order to get rid of these instructions.
Thoughts?

Event Timeline

cwabbott created this revision.Feb 7 2019, 7:31 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2019, 7:31 AM
mareko accepted this revision.Feb 7 2019, 12:05 PM
This revision is now accepted and ready to land.Feb 7 2019, 12:05 PM
nhaehnle accepted this revision.Feb 12 2019, 1:19 AM

FWIW, the problem is a bit more involved than that. Consider

bool value = ...;
if (divergent condition) {
  use(value);
}

So the undefinedness of inactive bits in this case is not due to suboptimal lowering of NOT, but inherently due to the fact that we're in a different region of control flow.

This needs a proper solution at some point, possibly with some knownbits-like analysis in the backend to check whether bits outside of EXEC are known to be 0, in order to be able to generate optimal or at least better code in all cases.

For now, I think this is a reasonable enough bug fix.

cwabbott updated this revision to Diff 187772.Feb 21 2019, 5:32 AM

Fixed a regression in the llvm.amdgcn.kill tests.

arsenm added inline comments.Feb 21 2019, 6:38 AM
llvm/lib/Target/AMDGPU/SOPInstructions.td
1039

This needs a comment for why the and is needed