This is an archive of the discontinued LLVM Phabricator instance.

[RegisterCoalescer] Do not call LiveIntervals::getInstructionIndex with a DBG_VALUE
ClosedPublic

Authored by bcahoon on Jan 23 2017, 2:05 PM.

Details

Summary

An assert occurs due to calling getInstructionIndex with a DBG_VALUE instruction because the DBG_VALUE instruction does not have a Slot Index. This occurs in updateRegDefUses.

I'm not sure if this is the correct fix, but there are other places where the call to getInstructionIndex is guarded by a check for DBG_VALUE.

I don't have a reproducible test case because the code that causes the crash is not in-tree. Also, I haven't had any luck reproducing the crash with in-tree code. I tried to generate a test case for ARM because it has composed SubRegs, but I was unsuccessful.

The error occurs, on Hexagon, when introducing a new register class that represents 4 vector registers. The MIR prior to joining the intervals looks like the code below. In the example, the wsub_lo SubReg contains a vsub_hi and vsub_lo register.

%vreg2<def> = load
DBG_VALUE %vreg2, %noreg, !"a",
%vreg4:vsub_lo<def,read-undef> = COPY %vreg2
%vreg6:wsub_lo<def,read-undef> = COPY %vreg4
%vreg6:wsub_hi<def> = COPY %vreg4

After %vreg2 and %vreg4:vsub_lo are coalesced:
...

%vreg4:vsub_lo<def,read-undef> = load
DBG_VALUE %vreg4:vsub_lo<undef>, %noreg, !"a",
%vreg6:wsub_lo<def,read-undef> = COPY %vreg4
%vreg6:wsub_hi<def> = COPY %vreg4

...

The DBG_VALUE that has a sub register results in the call to getInstructionIndex and the assert.

Diff Detail

Event Timeline

bcahoon created this revision.Jan 23 2017, 2:05 PM
MatzeB edited edge metadata.Jan 23 2017, 3:06 PM

At a first glance this looks like it fixes symptoms but not the root cause. It would probably be good if we could get a public reproducer.

At a first glance this looks like it fixes symptoms but not the root cause. It would probably be good if we could get a public reproducer.

I agree with the symptom vs. root cause comment. I also am not sure if I was doing something else that might be problematic. Especially after I was unable to generate a public test case that reproduced the problem. I tried a bunch of different things with ARM since that target has a composable SubRegs without any luck.

At a first glance this looks like it fixes symptoms but not the root cause. It would probably be good if we could get a public reproducer.

I agree with the symptom vs. root cause comment. I also am not sure if I was doing something else that might be problematic. Especially after I was unable to generate a public test case that reproduced the problem. I tried a bunch of different things with ARM since that target has a composable SubRegs without any luck.

Sometimes you can take an internal/private benchmark and reduce it with bugpoint to reduce it something simple that doesn't resemble the original code anymore but still fails.

As for other targets: hexagon and amdgpu are the only public targets with subregister liveness enabled which I suspect is necessary to trigger this.

I can take a look at this later whether I can spot something obvious, but a reproducer would obviously help...

It would also help if you could put up a backtrace with file+linenumbers (debug build).

bcahoon updated this revision to Diff 85981.Jan 26 2017, 3:57 PM

I've created a test case using the AMDGPU target that shows the error. This required a couple of steps. I took an existing AMDGPU lit test, and added debug metadata manually. Then, created MIR for the test by stopping after the TwoAddressInstruction pass. Then, I needed to insert a DBG_VALUE instruction in the correct location. I'm not really sure if it's possible for a DBG_VALUE instruction to appear in the specific place that I put it. But, with these changes, the test case causes an assert because there is a call to getInstructionIndex with a DBG_VALUE insruction.

I think the fix is reasonable. This is the dump of the problematic moment (with the fix), including the dump of LIS right before the crashing copy. The crash (assertion: "instruction not found in map") occurred when updating the register in the DBG_VALUE instruction, which indeed does not have a slot index.

********** MACHINEINSTRS **********
# Machine code for function test: NoPHIs, TracksLiveness
Function Live Ins: %SGPR0_SGPR1 in %vreg0, %VGPR0 in %vreg3

0B      BB#0: derived from LLVM BB %0
            Live Ins: %SGPR0_SGPR1 %VGPR0
16B             %vreg19:sub0<def,read-undef> = COPY %VGPR0; VReg_64:%vreg19
32B             %vreg0<def> = COPY %SGPR0_SGPR1; SGPR_64:%vreg0
48B             %vreg13:sub0_sub1<def,read-undef> = S_LOAD_DWORDX2_IMM %vreg0, 9, 0; mem:LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant) SGPR_128:%vreg13 SGPR_64:%vreg0
64B             %vreg5<def> = S_LOAD_DWORD_IMM %vreg0, 13, 0; mem:LD4[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant) SReg_32_XM0_XEXEC:%vreg5 SGPR_64:%vreg0
80B             %vreg19:sub1<def> = V_ASHRREV_I32_e32 31, %vreg19:sub0, %EXEC<imp-use>; VReg_64:%vreg19
128B            %vreg12:sub1<def,read-undef> = S_MOV_B32 61440; SGPR_64:%vreg12
144B            %vreg12:sub0<def> = S_MOV_B32 0; SGPR_64:%vreg12
                DBG_VALUE %vreg12:sub0<undef>, %noreg, !"a", <!11>; SGPR_64:%vreg12 line no:126
208B            %vreg13:sub2_sub3<def> = COPY %vreg12; SGPR_128:%vreg13 SGPR_64:%vreg12
224B            %vreg20<def> = V_LSHL_B64 %vreg19, 2, %EXEC<imp-use>; VReg_64:%vreg20,%vreg19
240B            %vreg16<def> = COPY %vreg5; VGPR_32:%vreg16 SReg_32_XM0_XEXEC:%vreg5
256B            BUFFER_STORE_DWORD_ADDR64 %vreg16, %vreg20, %vreg13, 0, 0, 0, 0, 0, %EXEC<imp-use>; mem:ST4[%gep.out(addrspace=1)] VGPR_32:%vreg16 VReg_64:%vreg20 SGPR_128:%vreg13
272B            S_ENDPGM

# End machine code for function test.

208B    %vreg13:sub2_sub3<def> = COPY %vreg12; SGPR_128:%vreg13 SGPR_64:%vreg12
        Considering merging to SGPR_128 with %vreg12 in %vreg13:sub2_sub3
                RHS = %vreg12 [128r,144r:0)[144r,208r:1)  0@128r 1@144r L00000001 [144r,208r:0)  0@144r L00000002 [128r,208r:0)  0@128r
                LHS = %vreg13 [48r,208r:0)[208r,256r:1)  0@48r 1@208r L00000003 [48r,256r:0)  0@48r L0000000C [208r,256r:0)  0@208r
                merge %vreg13:1@208r into %vreg12:1@144r --> @144r
                LHST = %vreg13 %vreg13 [48r,208r:0)[208r,256r:1)  0@48r 1@208r L00000003 [48r,256r:0)  0@48r L0000000C [208r,256r:0)  0@208r
                Copy+Merge 0000000C into 00000004
                Reduce Lane to 00000008
                merge %vreg13:0@208r into %vreg12:0@144r --> @144r
                joined lanes: [144r,256r:0)  0@144r
                Copy+Merge 00000008 into 00000008
                merge %vreg13:0@208r into %vreg12:0@128r --> @128r
                joined lanes: [128r,256r:0)  0@128r
        Joined SubRanges %vreg13 [48r,208r:0)[208r,256r:1)  0@48r 1@208r L00000004 [144r,256r:0)  0@144r L00000003 [48r,256r:0)  0@48r L00000008 [128r,256r:0)  0@128r
                pruned %vreg13 at 128r: [48r,128r:0)[208r,256r:1)  0@48r 1@208r
                pruned %vreg13 at 144r: [48r,128r:0)[208r,256r:1)  0@48r 1@208r
                erased: 208r    %vreg13:sub2_sub3<def> = COPY %vreg12; SGPR_128:%vreg13 SGPR_64:%vreg12
                restoring liveness to 3 points: 208r,128r,144r:  %vreg13 [48r,128r:0)[128r,144r:1)[144r,256r:2)  0@48r 1@128r 2@144r L00000004 [144r,256r:0)  0@144r L00000003 [48r,256r:0)  0@48r L00000008 [128r,256r:0)  0@12
                updated: 128B   %vreg13:sub3<def> = S_MOV_B32 61440; SGPR_128:%vreg13
                updated: 144B   %vreg13:sub2<def> = S_MOV_B32 0; SGPR_128:%vreg13
[*CRASH*]       updated: DBG_VALUE %vreg13:sub2<undef>, %noreg, !"a", <!11>; SGPR_128:%vreg13 line no:126
        Success: %vreg12:sub2_sub3 -> %vreg13
        Result = %vreg13 [48r,128r:0)[128r,144r:1)[144r,256r:2)  0@48r 1@128r 2@144r L00000004 [144r,256r:0)  0@144r L00000003 [48r,256r:0)  0@48r L00000008 [128r,256r:0)  0@128r

Stack trace from the crash:

llc: /w/src/llvm.org/include/llvm/CodeGen/SlotIndexes.h:410: llvm::SlotIndex llvm::SlotIndexes::getInstructionIndex(const llvm::MachineInstr &) const: Assertion `itr != mi2iMap.end() && "Instruction not found in maps."' failed.
#0 0x0000000001b80dc8 llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/w/bld/org/bin/llc+0x1b80dc8)
#1 0x0000000001b81426 SignalHandler(int) (/w/bld/org/bin/llc+0x1b81426)
#2 0x00007f63432aa340 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10340)
#3 0x00007f63421a1cc9 gsignal /build/eglibc-3GlaMS/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#4 0x00007f63421a50d8 abort /build/eglibc-3GlaMS/eglibc-2.19/stdlib/abort.c:91:0
#5 0x00007f634219ab86 __assert_fail_base /build/eglibc-3GlaMS/eglibc-2.19/assert/assert.c:92:0
#6 0x00007f634219ac32 (/lib/x86_64-linux-gnu/libc.so.6+0x2fc32)
#7 0x000000000074eb40 llvm::SlotIndexes::getInstructionIndex(llvm::MachineInstr const&) const (/w/bld/org/bin/llc+0x74eb40)
#8 0x000000000145db6d (anonymous namespace)::RegisterCoalescer::updateRegDefsUses(unsigned int, unsigned int, unsigned int) (/w/bld/org/bin/llc+0x145db6d)
#9 0x0000000001458ecc (anonymous namespace)::RegisterCoalescer::joinCopy(llvm::MachineInstr*, bool&) (/w/bld/org/bin/llc+0x1458ecc)
#10 0x00000000014557c6 (anonymous namespace)::RegisterCoalescer::copyCoalesceWorkList(llvm::MutableArrayRef<llvm::MachineInstr*>) (/w/bld/org/bin/llc+0x14557c6)
#11 0x0000000001455548 (anonymous namespace)::RegisterCoalescer::coalesceLocals() (/w/bld/org/bin/llc+0x1455548)
#12 0x0000000001454b3c (anonymous namespace)::RegisterCoalescer::runOnMachineFunction(llvm::MachineFunction&) (/w/bld/org/bin/llc+0x1454b3c)
#13 0x0000000001369c35 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/w/bld/org/bin/llc+0x1369c35)
#14 0x000000000162f184 llvm::FPPassManager::runOnFunction(llvm::Function&) (/w/bld/org/bin/llc+0x162f184)
#15 0x000000000162f3d3 llvm::FPPassManager::runOnModule(llvm::Module&) (/w/bld/org/bin/llc+0x162f3d3)
#16 0x000000000162f913 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/w/bld/org/bin/llc+0x162f913)
#17 0x0000000000679cca compileModule(char**, llvm::LLVMContext&) (/w/bld/org/bin/llc+0x679cca)
#18 0x0000000000676c5b main (/w/bld/org/bin/llc+0x676c5b)
#19 0x00007f634218cec5 __libc_start_main /build/eglibc-3GlaMS/eglibc-2.19/csu/libc-start.c:321:0
#20 0x0000000000675048 _start (/w/bld/org/bin/llc+0x675048)
Stack dump:
0.      Program arguments: /w/bld/org/bin/llc -march=amdgcn -run-pass simple-register-coalescing -o - x.mir -print-after-all -debug-only=regalloc
1.      Running pass 'Function Pass Manager' on module 'x.mir'.
2.      Running pass 'Simple Register Coalescing' on function '@test'
MatzeB accepted this revision.Feb 1 2017, 7:22 PM

Thanks for providing the testcase, that made investigating this easier. The fix here seems reasonable to me as well.

However I was surprised to see this problem popping up at all, so I dug a bit deeper:

  • This problem only occurs for DBG_VALUE undef %X which doesn't make much sense (but is legal)
  • The undef flag there is actually produced by code in the same function a bit further down (// A subreg use of a partially undef...) which right now looks like a dangerous bug to me (in that it adds an undef flag here even though the register seems live to me). I will investigate this further.

I'm fine with the patch going in, if the testcase is cleaned up (see below):

test/CodeGen/AMDGPU/regcoalesce-dbg.mir
1

Good tests should check some output.
Or put another way: The test should fail if someone replaces llc with /usr/bin/true.

7

You should be able to remove the complete IR part from the test and replace it with a dummy define void @test() { ret void }. The only thing you need to do is remove IR back-reference in the MIR part. Which for this case should just be removing the (%ir-block.0) from the bb.0 label.

75–80

you should be able to remove alignemnt, exposesReturnsTwice, legalize, regBankSelected, selected.

106–119

You can probably remove the frameInfo completely with the test still working.

This revision is now accepted and ready to land.Feb 1 2017, 7:22 PM

Thanks for the comments - much appreciated. I'll update the test case and then push.

Brendon

This revision was automatically updated to reflect the committed changes.