This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix vccz after v_readlane/v_readfirstlane to vcc_lo/hi
ClosedPublic

Authored by foad on Oct 31 2019, 6:41 AM.

Details

Summary

Up to gfx9, writes to vcc_lo and vcc_hi by instructions like
v_readlane and v_readfirstlane do not update vccz to reflect the new
value of vcc. Fix it by reusing part of the existing vccz bug handling
code, which inserts an "s_mov_b64 vcc, vcc" instruction to restore vccz
just before an instruction that needs the correct value.

Diff Detail

Event Timeline

foad created this revision.Oct 31 2019, 6:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 31 2019, 6:41 AM
arsenm added inline comments.Oct 31 2019, 8:18 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1425

I thought we already had a vccz bug subtarget feature check? Either way I want to limit getGeneration checks and keep them in a subtarget check

1428

This won't catch the implicit def in inline asm for example

foad marked 2 inline comments as done.Oct 31 2019, 8:36 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1425

Yes we have the hasReadVCCZBug feature mentioned a few lines above, but as I understand it that is a different and very specific bug.

I'm happy to add a new feature for this if necessary but I'm not sure what to call it. I'm not even sure if it's a bug or a feature. The best explanation I've heard so far is that it's just an artifact of the way vccz is implemented in the hardware, and that it doesn't affect gfx10 because the hardware implementation changed to support wave32. So maybe it should be under a "hasWave32Support" feature?

1428

I specifically wanted to avoid treating this instruction (from the test case) as a write to vcc, despite its implicit-def:

$vcc_hi = V_READFIRSTLANE_B32 killed $vgpr0, implicit $exec, implicit-def $vcc

What kind of inline asm are you thinking of?

arsenm added inline comments.Oct 31 2019, 8:52 AM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1425

It doesn't need to be a full subtarget feature, but I do prefer to keep the getGeneration checks consolidated inside the subtarget. I'm not familiar with this problem and not sure what to call it either

1428

Any inline asm that touches vcc will appear as an implicit-def. You can't know without context that an implicit-def isn't really modifying the register. I'm guessing this one is to model the super-register def?

foad updated this revision to Diff 227296.Oct 31 2019, 9:40 AM

Use a new named subtarget feature method.

foad marked 2 inline comments as done.Oct 31 2019, 9:42 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1428

I don't know exactly why the readfirstlane has an implicit def of vcc. I still think the most conservative fix is to only look for explicit defs of vcc_lo/vcc_hi, which is what my patch does.

arsenm added inline comments.Nov 1 2019, 2:02 PM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1428

Looking only at explicit defs is less conservative, as it will miss real defs. It will miss both inlineasm/call like operations, as well as the standard VCC defs as the VOPC vcc def appears in the implicit def list

llvm/test/CodeGen/AMDGPU/reload-vcc-vccz.mir
48 ↗(On Diff #227296)

Should add a test with an implicit def, e..g. INLINEASM

foad marked an inline comment as done.Nov 1 2019, 3:30 PM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1428

It is crucial that we treat this instruction:

$vcc_hi = V_READFIRSTLANE_B32 killed $vgpr0, implicit $exec, implicit-def $vcc

as a def of vcc_hi (which corrupts vccz) and NOT as a def of vcc (which fixes vccz), despite the implicit-def, otherwise we will not be fixing the bug.

I suppose I could handle this with something like:

if (Inst.definesRegister(VCC_LO) || Inst.definesRegister(VCC_HI))
  VCCZCorrect = false;
else if (Inst.definesRegister(VCC))
  VCCZCorrect = true;

where the "else" ensures that V_READFIRSTLANE_B32 will fall under the first condition not the second. Would you prefer that?

(Or, stepping back a bit, is it wrong for V_READFIRSTLANE_B32 to have an implicit-def of vcc in the first place?)

arsenm added inline comments.Nov 1 2019, 3:34 PM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1428

That may or may not work; I would have to re-check the details of how definesRegister/modifiesRegister are implemented. You should add a few tests stressing the implicit defs to be sure

The implicit def of the super register is probably correct. What I assume happened here is that the implicit def of the super register is necessary to keep the verifier happy. If you look at SIInstrInfo::copyPhysReg for example, it adds a def of the super register on the first copy so that the later subregister defs don't think they are reading undef other lanes:

if (Idx == 0)
  Builder.addReg(DestReg, RegState::Define | RegState::Implicit);
foad updated this revision to Diff 231394.Nov 28 2019, 3:04 AM

Check implicit as well as explicit defs.

Add test cases for inlineasm.

Rebase on D70708.

foad marked 5 inline comments as done.Nov 28 2019, 3:06 AM
arsenm added inline comments.Dec 1 2019, 10:06 PM
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1438

Can you just check modifiesRegister(VCC) instead of trying all the subregs?

foad marked an inline comment as done.Dec 2 2019, 1:23 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1438

I don't think that would let me distinguish this instruction (which writes to vcc_lo and corrupts vccz) from a "normal" write to vcc, would it?

$vcc_hi = V_READFIRSTLANE_B32 killed $vgpr0, implicit $exec, implicit-def $vcc
arsenm added inline comments.Jan 27 2020, 9:14 AM
llvm/test/CodeGen/AMDGPU/vccz-corrupt-bug-workaround.mir
142

CHECK-NEXT would be less fragile

159

ditto

foad updated this revision to Diff 240684.Jan 27 2020, 2:22 PM

Use CHECK-NEXT instead of CHECK-NOT.

arsenm accepted this revision.Jan 27 2020, 2:32 PM

LGTM with comment since I didn't know there was a second vccz bug

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1439

Can you add a comment explaining the difference between the SI bug and the gfx9 bug?

This revision is now accepted and ready to land.Jan 27 2020, 2:32 PM

Unit tests: pass. 62248 tests passed, 0 failed and 816 were skipped.

clang-tidy: pass.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

foad marked 2 inline comments as done.Jan 28 2020, 1:57 AM
foad marked 2 inline comments as done.Jan 28 2020, 2:54 AM
foad added inline comments.
llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
1439

I wasn't involved in the SI bug fix so I simply reinstated the original comment from D16725, which had got lost when this pass was reimplemented in D31161.

This revision was automatically updated to reflect the committed changes.
foad marked an inline comment as done.