This is an archive of the discontinued LLVM Phabricator instance.

[GISel][RegBankSelect] Hide assertion failure from LLT::getScalarSizeInBits [2/10]
ClosedPublic

Authored by vsk on Apr 14 2020, 11:40 AM.

Details

Summary

It looks like RegBankSelect can try to assign a bank based on a
DBG_VALUE instead of ignoring it. This eventually leads to an assert
in AArch64RegisterBankInfo::getInstrMapping because there is some info
missing from the DBG_VALUE MachineOperand (I see: `Assertion failed:
(RawData != 0 && "Invalid Type"), function getScalarSizeInBits`).

I'm not 100% sure it's safe to insert DBG_VALUE instructions right
before RegBankSelect (that's what -debugify-and-strip-all-safe is
doing). Any advice appreciated.

Depends on D78135.

Diff Detail

Event Timeline

vsk created this revision.Apr 14 2020, 11:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2020, 11:40 AM
dsanders accepted this revision.Apr 14 2020, 3:42 PM

Thanks for fixing this. LGTM

This eventually leads to an assert
in AArch64RegisterBankInfo::getInstrMapping because there is some info
missing from the DBG_VALUE MachineOperand (I see: `Assertion failed:
(RawData != 0 && "Invalid Type"), function getScalarSizeInBits`).

Yes, most instructions get their constraints from the tablegen definitions. COPY is special as it gets it from whichever operand has already had a bank chosen. DBG_VALUE is also special as it genuinely doesn't have a constraint. It's happy to just go along with whatever the rest of the def/use chain decides.

I'm not sure why we haven't run into this before. Maybe the processing order just works out such that for real debug info we've already chosen a reg bank before we look at the DBG_VALUE's

This revision is now accepted and ready to land.Apr 14 2020, 3:42 PM
vsk added a comment.EditedApr 14 2020, 4:17 PM

I'm also not sure how to explain why this assert doesn't fire as often in practice. In this test case, we hit the failure while processing the function 'phiPropagation'. It looks like we take the getInstrMappingImpl path for the first instruction in the entry block (LDRWui), but that that does not set the LLT for %0:gpr32? That would explain what happens next: we subsequently take the generic path for the DBG_VALUE and hit the assert. FWIW this assertion fires on quite a few AArch64 .mir tests, not just this one. So there's some discrepancy here we don't fully understand.

Here is my debug session with this patch backed out:

frame #8: 0x00000001022bb012 llc`llvm::RegBankSelect::runOnMachineFunction(this=0x000000010572e670, MF=0x00000001057745a0) at RegBankSelect.cpp:700:12 [opt]
   697        // if (MI.isDebugInstr())
   698        //   continue;
   699
-> 700        if (!assignInstr(MI)) {
   701          reportGISelFailure(MF, *TPC, *MORE, "gisel-regbankselect",
   702                             "unable to map instruction", MI);
   703          return false;
(lldb) p MI.dump()
  DBG_VALUE %0:gpr32, $noreg, !"1", !DIExpression(), debug-location !23; /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1 line no:6
(lldb) p MF.dump()
# Machine code for function phiPropagation: IsSSA, TracksLiveness, Legalized

bb.0.entry:
  successors: %bb.2(0x40000000), %bb.1(0x40000000); %bb.2(50.00%), %bb.1(50.00%)
  liveins: $x0, $x1, $w2
  %0:gpr32 = LDRWui killed $x0, 0, debug-location !23 :: (load 4 from %ir.src); /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1
  DBG_VALUE %0:gpr32, $noreg, !"1", !DIExpression(), debug-location !23; /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1 line no:6

(lldb) p MF.dump()

Machine code for function phiPropagation: IsSSA, TracksLiveness, Legalized

bb.0.entry:

successors: %bb.2(0x40000000), %bb.1(0x40000000); %bb.2(50.00%), %bb.1(50.00%)
liveins: $x0, $x1, $w2
%0:gpr32 = LDRWui killed $x0, 0, debug-location !23 :: (load 4 from %ir.src); /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1
DBG_VALUE %0:gpr32, $noreg, !"1", !DIExpression(), debug-location !23; /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1 line no:6

If you moved the DBG_VALUE below the %5(s32) = COPY %0 does it still assert? It's interesting that the target instruction means the DBG_VALUE is the first to be processed in that def/use chain but I suspect the real problem is that it's not deriving the register bank from the register class and the special case for COPY deals with that.

vsk added a comment.Apr 14 2020, 5:01 PM

(lldb) p MF.dump()

Machine code for function phiPropagation: IsSSA, TracksLiveness, Legalized

bb.0.entry:

successors: %bb.2(0x40000000), %bb.1(0x40000000); %bb.2(50.00%), %bb.1(50.00%)
liveins: $x0, $x1, $w2
%0:gpr32 = LDRWui killed $x0, 0, debug-location !23 :: (load 4 from %ir.src); /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1
DBG_VALUE %0:gpr32, $noreg, !"1", !DIExpression(), debug-location !23; /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1 line no:6

If you moved the DBG_VALUE below the %5(s32) = COPY %0 does it still assert? It's interesting that the target instruction means the DBG_VALUE is the first to be processed in that def/use chain but I suspect the real problem is that it's not deriving the register bank from the register class and the special case for COPY deals with that.

No, after moving the DBG_VALUE, I don't hit the assertion:

%0:gpr32 = LDRWui
%5:_(s32) = COPY %0
DBG_VALUE %5

Sorry, I'm not entirely sure what to make of that. Do you suspect -mir-debugify is adding DBG_VALUE insts in the wrong place? In the generic regbankselect path, is it required for operands to go through a COPY before they can be used?

In D78137#1982421, @vsk wrote:

(lldb) p MF.dump()

Machine code for function phiPropagation: IsSSA, TracksLiveness, Legalized

bb.0.entry:

successors: %bb.2(0x40000000), %bb.1(0x40000000); %bb.2(50.00%), %bb.1(50.00%)
liveins: $x0, $x1, $w2
%0:gpr32 = LDRWui killed $x0, 0, debug-location !23 :: (load 4 from %ir.src); /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1
DBG_VALUE %0:gpr32, $noreg, !"1", !DIExpression(), debug-location !23; /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1 line no:6

If you moved the DBG_VALUE below the %5(s32) = COPY %0 does it still assert? It's interesting that the target instruction means the DBG_VALUE is the first to be processed in that def/use chain but I suspect the real problem is that it's not deriving the register bank from the register class and the special case for COPY deals with that.

No, after moving the DBG_VALUE, I don't hit the assertion:

%0:gpr32 = LDRWui
%5:_(s32) = COPY %0
DBG_VALUE %5

Sorry, I'm not entirely sure what to make of that. Do you suspect -mir-debugify is adding DBG_VALUE insts in the wrong place? In the generic regbankselect path, is it required for operands to go through a COPY before they can be used?

I think mir-debugify is doing the right thing (at the very least it's doing a valid thing) and RegBankSelect is wrong to fail on it. In GlobalISel, vregs can have three kinds of constraint: LLT, Register Bank, Register Class. There's a relationship between the type of constraint that should be used and the kind of instruction they're used on. For example, a generic opcode (G_*) should have either LLT or Register Bank constraints but not a Register Class constraint. The important one is that target instructions should have a Register Class constraint.

What I believe is happening without this patch is it's seeing DBG_VALUE isn't a target instruction and concluding that it will never have a Register Class constraint and must therefore either be assigned a new Register Bank constraint or an existing Register Bank constraint must be read to influence the decisions elsewhere in the algorithm. However, when the DBG_VALUE occurs before the COPY that assumption is wrong and the code asserts on the unexpected Register Class constraint.

The simplest fix to this is to just skip the DBG_VALUE's as you've done in this patch. We'll still make a decision based on the other instructions that def/use the vreg and the DBG_VALUE shouldn't be influencing the decisions anyway.

vsk added a comment.Apr 15 2020, 2:42 PM
In D78137#1982421, @vsk wrote:

(lldb) p MF.dump()

Machine code for function phiPropagation: IsSSA, TracksLiveness, Legalized

bb.0.entry:

successors: %bb.2(0x40000000), %bb.1(0x40000000); %bb.2(50.00%), %bb.1(50.00%)
liveins: $x0, $x1, $w2
%0:gpr32 = LDRWui killed $x0, 0, debug-location !23 :: (load 4 from %ir.src); /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1
DBG_VALUE %0:gpr32, $noreg, !"1", !DIExpression(), debug-location !23; /Users/vsk/src/llvm-project-master/llvm/test/CodeGen/AArch64/GlobalISel/arm64-regbankselect.mir:6:1 line no:6

If you moved the DBG_VALUE below the %5(s32) = COPY %0 does it still assert? It's interesting that the target instruction means the DBG_VALUE is the first to be processed in that def/use chain but I suspect the real problem is that it's not deriving the register bank from the register class and the special case for COPY deals with that.

No, after moving the DBG_VALUE, I don't hit the assertion:

%0:gpr32 = LDRWui
%5:_(s32) = COPY %0
DBG_VALUE %5

Sorry, I'm not entirely sure what to make of that. Do you suspect -mir-debugify is adding DBG_VALUE insts in the wrong place? In the generic regbankselect path, is it required for operands to go through a COPY before they can be used?

I think mir-debugify is doing the right thing (at the very least it's doing a valid thing) and RegBankSelect is wrong to fail on it. In GlobalISel, vregs can have three kinds of constraint: LLT, Register Bank, Register Class. There's a relationship between the type of constraint that should be used and the kind of instruction they're used on. For example, a generic opcode (G_*) should have either LLT or Register Bank constraints but not a Register Class constraint. The important one is that target instructions should have a Register Class constraint.

What I believe is happening without this patch is it's seeing DBG_VALUE isn't a target instruction and concluding that it will never have a Register Class constraint and must therefore either be assigned a new Register Bank constraint or an existing Register Bank constraint must be read to influence the decisions elsewhere in the algorithm. However, when the DBG_VALUE occurs before the COPY that assumption is wrong and the code asserts on the unexpected Register Class constraint.

The simplest fix to this is to just skip the DBG_VALUE's as you've done in this patch. We'll still make a decision based on the other instructions that def/use the vreg and the DBG_VALUE shouldn't be influencing the decisions anyway.

Thanks for your patient explanation! All right, I'll consider this good to land then.

vsk updated this revision to Diff 257923.Apr 15 2020, 6:12 PM
vsk retitled this revision from [RegBankSelect] Hide assertion failure from LLT::getScalarSizeInBits to [GISel][RegBankSelect] Hide assertion failure from LLT::getScalarSizeInBits [2/10].

retitle

This revision was automatically updated to reflect the committed changes.