
Details

Reviewers

MatzeB
qcolombet
arsenm

Commits

rG7528d4bd4294: [RegisterCoalescer] Fix for SubRange join unreachable
rL307247: [RegisterCoalescer] Fix for SubRange join unreachable

Summary

During remat, some subranges might end up having invalid segments which caused problems for later
coalescing.

Added in a check to remove segments that are invalidated as part of the remat.

See http://llvm.org/PR33524

Diff Detail

Repository: rL LLVM

Event Timeline

dstuttard created this revision. Jun 20 2017, 2:25 AM

Herald added subscribers: qcolombet, MatzeB. Jun 20 2017, 2:25 AM

Added llvm-commits

Adding MatzeB as reviewer - you've made some recent changes in the same area.

I've added this change specifically to address a problem seen in the associated PR (http://llvm.org/PR33524), however I'm not sure that this is necessarily the right way to go about fixing this issue.

I think that the remat results in an incorrect segment in the SubRange - which looks something like [128r, 128d) and is also an undef - and removing it does clear up the problem. (Prior to the remat the segment in the SubRange is correct, albeit for an undef).
However, I wondered if the part of the code that went wrong later (results in an unreachable) should in fact be able to deal with this itself.
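
For readers unfamiliar with the notation, a segment such as [128r, 128d) can be modelled with a small sketch. This is a toy model under stated assumptions, not LLVM's SlotIndex or LiveRange::Segment: ToySlot, ToySlotIndex and ToySegment are hypothetical names, and the slot suffixes (B, e, r, d) follow what I understand to be LLVM's printing convention for block, early-clobber, register and dead slots.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the four slots of one instruction.
enum class ToySlot { Block = 0, EarlyClobber = 1, Register = 2, Dead = 3 };

// Hypothetical stand-in for a slot index: instruction number plus slot.
struct ToySlotIndex {
  unsigned Instr;
  ToySlot Slot;

  // Order by instruction number first, then by slot within the instruction.
  bool operator<(const ToySlotIndex &O) const {
    if (Instr != O.Instr)
      return Instr < O.Instr;
    return static_cast<int>(Slot) < static_cast<int>(O.Slot);
  }

  // Render as e.g. "128r" (assumed suffix letters: B, e, r, d).
  std::string str() const {
    static const char Suffix[] = {'B', 'e', 'r', 'd'};
    return std::to_string(Instr) + Suffix[static_cast<int>(Slot)];
  }
};

// Half-open segment [Start, End).
struct ToySegment {
  ToySlotIndex Start, End;

  bool contains(ToySlotIndex I) const { return !(I < Start) && I < End; }

  // A value that is defined and never read: it begins at the register slot
  // and ends at the dead slot of the same instruction.
  bool isDeadDef() const {
    return Start.Instr == End.Instr && Start.Slot == ToySlot::Register &&
           End.Slot == ToySlot::Dead;
  }

  std::string str() const { return "[" + Start.str() + ", " + End.str() + ")"; }
};
```

In this model the segment from the comment above covers only the register slot of instruction 128, which is the shape of a dead definition. Whether such a segment is valid depends on whether the lane is really defined at that point, which the toy model does not attempt to decide.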

Test case?

lib/CodeGen/RegisterCoalescer.cpp
1235 ↗	(On Diff #103178)	Debug code?

In D34391#785251, @meadori wrote:

Test case?

There's an example that provokes the problem in the bugzilla http://llvm.org/PR33524.

I guess at this stage I'm looking for some indication whether this is a reasonable fix for the problem - before submission I'll tidy up the reproducer in the bugzilla and add as a test case.

lib/CodeGen/RegisterCoalescer.cpp
1235 ↗	(On Diff #103178)	Yes, I'll remove.

Removing debug comment
Also adding the test case from the bugzilla as a new test case (I realised it is
a bit annoying to have to go elsewhere to get hold of the reproducer)
I'll update it as a test if this change looks promising

Herald added a subscriber: nhaehnle. Jun 20 2017, 6:35 AM

Removed the commented out debug statement

qcolombet requested changes to this revision. Jun 20 2017, 10:02 AM

qcolombet added inline comments.

test/CodeGen/AMDGPU/pr33524.ll
1 ↗	(On Diff #103200)	You need to add a RUN line and some FileCheck command to check we are generating correct code. FWIW, you'll have something more robust with a .mir test (llc -stop-before simple-register-coalescing -simplify-mir). Add a comment on what this test is checking. In particular, listing the PR number here is good practice. Give a meaningful name to the file, e.g., reg-coal-join-subrange.

This revision now requires changes to proceed. Jun 20 2017, 10:02 AM

Updating the test as per review comments

I've left it as a .ll test rather than .mir as the mir print routines can't cope
with pseudo source values (used in the llvm.amdgcn.buffer.load intrinsics)

"TargetCustom pseudo source values are not supported"

npjdesres added a subscriber: npjdesres. Jun 21 2017, 7:50 AM

FYI I tried this patch in my out-of-tree backend (hoping to resolve http://llvm.org/PR32773). I observed a segfault SR.removeValNo(RmValNo) because RmValNo may be null.

I don't know yet whether this is specific to my backend, but I thought I'd mention it in case it indicates a more general problem.

In D34391#786792, @npjdesres wrote:

FYI I tried this patch in my out-of-tree backend (hoping to resolve http://llvm.org/PR32773). I observed a segfault SR.removeValNo(RmValNo) because RmValNo may be null.

I don't know yet whether this is specific to my backend, but I thought I'd mention it in case it indicates a more general problem.

Doh - forgot the check for null. I'll upload a change for this.
The fact that you got a segfault here is promising - it could mean it might be a similar problem. Try the new patch.
If it fails then take a look at the SubRanges around the failing SubRange join - it could be a similar problem to the one this fix addresses (but isn't caught by it) - in particular any remats that happen before the failing SubRange join.
There are a couple of failures logged in bugzilla that look similar but aren't fixed by this change, so there are definitely some other issues in this area.

Added in missing check on RmValNo for null

In D34391#786456, @dstuttard wrote:

Updating the test as per review comments

I've left it as a .ll test rather than .mir as the mir print routines can't cope
with pseudo source values (used in the llvm.amdgcn.buffer.load intrinsics)

"TargetCustom pseudo source values are not supported"

You can strip out the MemOperands in the MIR test

test/CodeGen/AMDGPU/pr33524.ll
1 ↗	(On Diff #103200)	Remove these extra comments
2–4 ↗	(On Diff #103200)	These will be redundant with the run line
67–70 ↗	(On Diff #103200)	You can remove all the metadata

Thanks for working on this! A bunch of nitpicks are below but overall the fix looks fine.

lib/CodeGen/RegisterCoalescer.cpp
1228 ↗	(On Diff #103439)	I would not describe `vreg2:sub0` as undef here, the `COPY` is a normal definition like any other. It just happens that after coalescing we don't have a definition left because the copy was reading a partially undef value (but that effect will be visible after the `==>` arrow).
1231 ↗	(On Diff #103439)	This would be the place to mention that vreg2:sub0 is undef now and the subrange needs to be removed.
1232 ↗	(On Diff #103439)	Why do you need `DstIdx == 0`, it seems to me that we need the fixup regardless of DstIdx.
1238 ↗	(On Diff #103439)	"undef tagged as def" is a strange description. How about: "Removing undefined subrange ..." as debug message?
1241–1243 ↗	(On Diff #103439)	How about `if (VNInfo *RmValNo = getVNInfoAt(CurrIdx.getRegSlot()))` (It shouldn't matter here because NewMI should not write to that part of the register. But writing `getRegSlot()` feels more natural to check for liveranges going out of an instruction).
1246 ↗	(On Diff #103439)	You should call `DstInt.removeEmptySubRanges()` after cleaning subranges.
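
The null check discussed earlier in the thread and the `if (VNInfo *RmValNo = ...)` form suggested above combine lookup and test in one statement. A minimal sketch of that idiom follows, assuming a toy container rather than LLVM's LiveRange: ToyVNInfo, ToyRange and removeValueAt are hypothetical names, and slot indexes are simplified to plain integers.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a value number.
struct ToyVNInfo {
  unsigned Id;
};

// Hypothetical stand-in for a live range that maps an index to a value.
class ToyRange {
  std::map<int, ToyVNInfo> Values; // keyed by a simplified integer index

public:
  void define(int Idx, unsigned Id) { Values[Idx] = ToyVNInfo{Id}; }

  // Returns nullptr when no value is live at Idx, so callers must check.
  ToyVNInfo *getVNInfoAt(int Idx) {
    auto It = Values.find(Idx);
    return It == Values.end() ? nullptr : &It->second;
  }

  // Declaring the pointer inside the condition keeps it scoped to the branch
  // in which it is known to be non-null.
  bool removeValueAt(int Idx, unsigned &RemovedId) {
    if (ToyVNInfo *VNI = getVNInfoAt(Idx)) {
      RemovedId = VNI->Id;
      Values.erase(Idx);
      return true;
    }
    return false;
  }
};
```

With this shape there is no path on which the pointer is dereferenced without having been tested, which is the failure mode reported as a segfault earlier in the thread.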

Updating in line with comments from reviewers

I haven't updated the test to .mir as I don't fully understand what
@arsenm means - perhaps if we discuss offline I can do a subsequent patch to
update the test, but the one there will suffice for now?

dstuttard marked 6 inline comments as done. Jun 22 2017, 8:22 AM

In D34391#787997, @dstuttard wrote:

Updating in line with comments from reviewers

I haven't updated the test to .mir as I don't fully understand what
@arsenm means - perhaps if we discuss offline I can do a subsequent patch to
update the test, but the one there will suffice for now?

The TargetCustom error is from the pseudo value source used for the buffer intrinsics memory operands. If you remove those you should avoid the error. Also it may still reproduce if you replace the intrinsics in the IR with volatile loads

The code fix LGTM, but please wait for @qcolombet/@arsenm before committing.

Making a .mir test seems indeed hard at the moment, as the printer already fails; manually stripping memory operands only works after printing I presume?

In D34391#791480, @MatzeB wrote:

The code fix LGTM, but please wait for @qcolombet/@arsenm before committing.

Making a .mir test seems indeed hard at the moment, as the printer already fails; manually stripping memory operands only works after printing I presume?

Yes, stripping the operands only works after printing :(

@arsenm I'll try using volatile loads and stores to see if that works in this case. Would it be acceptable in the meantime to accept this change and then I'll update the .ll test with a .mir one when it works?

LGTM

Updating the test to a .mir test
Replacing the buffer.load intrinsics with load volatile worked

Herald added a subscriber: wdng. Jul 3 2017, 7:07 AM

@qcolombet - any further comments or are you happy for this to go in?

Harbormaster completed remote builds in B7913: Diff 105062. Jul 3 2017, 7:34 AM

LGTM.

Thanks

This revision is now accepted and ready to land. Jul 5 2017, 10:08 AM

Closed by commit rL307247: [RegisterCoalescer] Fix for SubRange join unreachable (authored by dstuttard). Jul 6 2017, 3:08 AM

This revision was automatically updated to reflect the committed changes.

Diff 105389

llvm/trunk/lib/CodeGen/RegisterCoalescer.cpp

     if (NewIdx == 0 && DstInt.hasSubRanges()) {
 [...]
         SR.createDeadDef(DefIndex, Alloc);
         MaxMask &= ~SR.LaneMask;
       }
       if (MaxMask.any()) {
         LiveInterval::SubRange *SR = DstInt.createSubRange(Alloc, MaxMask);
         SR->createDeadDef(DefIndex, Alloc);
       }
     }
+
+    // Make sure that the subrange for resultant undef is removed
+    // For example:
+    //   vreg1:sub1<def,read-undef> = LOAD CONSTANT 1
+    //   vreg2<def> = COPY vreg1
+    // ==>
+    //   vreg2:sub1<def, read-undef> = LOAD CONSTANT 1
+    //     ; Correct but need to remove the subrange for vreg2:sub0
+    //     ; as it is now undef
+    if (NewIdx != 0 && DstInt.hasSubRanges()) {
+      // The affected subregister segments can be removed.
+      SlotIndex CurrIdx = LIS->getInstructionIndex(NewMI);
+      LaneBitmask DstMask = TRI->getSubRegIndexLaneMask(NewIdx);
+      bool UpdatedSubRanges = false;
+      for (LiveInterval::SubRange &SR : DstInt.subranges()) {
+        if ((SR.LaneMask & DstMask).none()) {
+          DEBUG(dbgs() << "Removing undefined SubRange "
+                << PrintLaneMask(SR.LaneMask) << " : " << SR << "\n");
+          // VNI is in ValNo - remove any segments in this SubRange that have this ValNo
+          if (VNInfo *RmValNo = SR.getVNInfoAt(CurrIdx.getRegSlot())) {
+            SR.removeValNo(RmValNo);
+            UpdatedSubRanges = true;
+          }
+        }
+      }
+      if (UpdatedSubRanges)
+        DstInt.removeEmptySubRanges();
+    }
   } else if (NewMI.getOperand(0).getReg() != CopyDstReg) {
     // The New instruction may be defining a sub-register of what's actually
     // been asked for. If so it must implicitly define the whole thing.
     assert(TargetRegisterInfo::isPhysicalRegister(DstReg) &&
            "Only expect virtual or physical registers in remat");
     NewMI.getOperand(0).setIsDead(true);
     NewMI.addOperand(MachineOperand::CreateReg(
         CopyDstReg, true /*IsDef*/, true /*IsImp*/, false /*IsKill*/));
 [...]
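
The pruning step added in the hunk above can be sketched outside LLVM as follows. This is a toy model under stated assumptions, not the LLVM API: ToySegment, ToySubRange, ToyInterval and pruneUndefLanes are hypothetical names, lane masks are plain integers, slot indexes are plain ints, and a value number of -1 plays the role of a null VNInfo pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open segment [Start, End) carrying a value number.
struct ToySegment {
  int Start;
  int End;
  int ValNo;
};

struct ToySubRange {
  std::uint32_t LaneMask;
  std::vector<ToySegment> Segments;

  // Value number live at Idx, or -1 if nothing is live there.
  int valNoAt(int Idx) const {
    for (const ToySegment &S : Segments)
      if (S.Start <= Idx && Idx < S.End)
        return S.ValNo;
    return -1;
  }

  // Drop every segment that belongs to ValNo.
  void removeValNo(int ValNo) {
    Segments.erase(std::remove_if(Segments.begin(), Segments.end(),
                                  [ValNo](const ToySegment &S) {
                                    return S.ValNo == ValNo;
                                  }),
                   Segments.end());
  }
};

struct ToyInterval {
  std::vector<ToySubRange> SubRanges;

  void removeEmptySubRanges() {
    SubRanges.erase(std::remove_if(SubRanges.begin(), SubRanges.end(),
                                   [](const ToySubRange &SR) {
                                     return SR.Segments.empty();
                                   }),
                    SubRanges.end());
  }
};

// For each subrange whose lanes the rematerialized instruction does not
// write, remove the value live at DefIdx, then discard subranges that
// became empty. Returns true if anything changed.
bool pruneUndefLanes(ToyInterval &Int, std::uint32_t DstMask, int DefIdx) {
  assert(DstMask != 0 && "only meaningful for a subregister definition");
  bool Updated = false;
  for (ToySubRange &SR : Int.SubRanges) {
    if ((SR.LaneMask & DstMask) != 0)
      continue; // these lanes are written by the new instruction
    int ValNo = SR.valNoAt(DefIdx);
    if (ValNo < 0)
      continue; // nothing live here; mirrors the null check in the patch
    SR.removeValNo(ValNo);
    Updated = true;
  }
  if (Updated)
    Int.removeEmptySubRanges();
  return Updated;
}
```

For instance, an interval with one subrange for lane mask 0x1 and one for 0x2, pruned with DstMask 0x2 at the defining index, is left with only the 0x2 subrange. Calling the function again finds nothing to remove.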

llvm/trunk/test/CodeGen/AMDGPU/regcoal-subrange-join.mir

				# RUN: llc -march=amdgcn -run-pass simple-register-coalescing -o - %s | FileCheck --check-prefix=GCN %s
				#
				# See bug http://llvm.org/PR33524 for details of the problem being checked here
				# This test will provoke a subrange join (see annotations below) during simple register coalescing
				# Without a fix for PR33524 this causes an unreachable in SubRange Join
				#
				# GCN-DAG: undef %[[REG0:[0-9]+]].sub0 = COPY %sgpr5
				# GCN-DAG: undef %[[REG1:[0-9]+]].sub0 = COPY %sgpr2
				# GCN-DAG: %[[REG0]].sub1 = S_MOV_B32 1
				# GCN-DAG: %[[REG1]].sub1 = S_MOV_B32 1

				--- |
				define amdgpu_vs void @regcoal-subrange-join(i32 inreg %arg, i32 inreg %arg1, i32 inreg %arg2, i32 inreg %arg3, i32 inreg %arg4, i32 inreg %arg5, i32 %arg6) local_unnamed_addr #0 {
				ret void
				}

				...
				---
				name: regcoal-subrange-join
				tracksRegLiveness: true
				registers:
				- { id: 0, class: sreg_64 }
				- { id: 1, class: vreg_128 }
				- { id: 2, class: vreg_128 }
				- { id: 3, class: vreg_128 }
				- { id: 4, class: sreg_32_xm0 }
				- { id: 5, class: sreg_32_xm0 }
				- { id: 6, class: sreg_32_xm0, preferred-register: '%8' }
				- { id: 7, class: vreg_128 }
				- { id: 8, class: sreg_32_xm0, preferred-register: '%6' }
				- { id: 9, class: vreg_128 }
				- { id: 10, class: sgpr_32 }
				- { id: 11, class: sgpr_32 }
				- { id: 12, class: sgpr_32 }
				- { id: 13, class: sgpr_32 }
				- { id: 14, class: sgpr_32 }
				- { id: 15, class: sgpr_32 }
				- { id: 16, class: vgpr_32 }
				- { id: 17, class: sreg_32_xm0 }
				- { id: 18, class: sreg_64 }
				- { id: 19, class: sreg_32_xm0 }
				- { id: 20, class: sreg_32_xm0 }
				- { id: 21, class: sreg_64 }
				- { id: 22, class: sreg_32_xm0_xexec }
				- { id: 23, class: sreg_32_xm0 }
				- { id: 24, class: sreg_64_xexec }
				- { id: 25, class: sreg_128 }
				- { id: 26, class: sreg_64_xexec }
				- { id: 27, class: sreg_32_xm0_xexec }
				- { id: 28, class: sreg_32_xm0 }
				- { id: 29, class: vgpr_32 }
				- { id: 30, class: vgpr_32 }
				- { id: 31, class: vgpr_32 }
				- { id: 32, class: vgpr_32 }
				- { id: 33, class: vgpr_32 }
				- { id: 34, class: vgpr_32 }
				- { id: 35, class: vgpr_32 }
				- { id: 36, class: vgpr_32 }
				- { id: 37, class: vgpr_32 }
				- { id: 38, class: sreg_128 }
				- { id: 39, class: sreg_64_xexec }
				- { id: 40, class: sreg_32_xm0_xexec }
				- { id: 41, class: sreg_32_xm0 }
				- { id: 42, class: vgpr_32 }
				- { id: 43, class: vgpr_32 }
				- { id: 44, class: vgpr_32 }
				- { id: 45, class: vgpr_32 }
				- { id: 46, class: vgpr_32 }
				- { id: 47, class: vgpr_32 }
				- { id: 48, class: vgpr_32 }
				- { id: 49, class: vgpr_32 }
				- { id: 50, class: vgpr_32 }
				- { id: 51, class: sreg_128 }
				- { id: 52, class: vgpr_32 }
				- { id: 53, class: vgpr_32 }
				- { id: 54, class: vgpr_32 }
				- { id: 55, class: vgpr_32 }
				- { id: 56, class: vreg_128 }
				- { id: 57, class: vreg_128 }
				- { id: 58, class: vreg_128 }
				- { id: 59, class: sreg_32_xm0 }
				- { id: 60, class: sreg_32_xm0 }
				- { id: 61, class: vreg_128 }
				liveins:
				- { reg: '%sgpr2', virtual-reg: '%12' }
				- { reg: '%sgpr5', virtual-reg: '%15' }
				body: |
				bb.0:
				liveins: %sgpr2, %sgpr5

				%15 = COPY killed %sgpr5
				%12 = COPY killed %sgpr2
				%17 = S_MOV_B32 1
				undef %18.sub1 = COPY %17
				%0 = COPY %18
				%0.sub0 = COPY killed %12
				%21 = COPY killed %18
				%21.sub0 = COPY killed %15
				%22 = S_LOAD_DWORD_IMM killed %21, 2, 0
				%23 = S_MOV_B32 491436
				undef %24.sub0 = COPY killed %22
				%24.sub1 = COPY killed %23
				%25 = S_LOAD_DWORDX4_IMM killed %24, 0, 0
				%1 = COPY killed %25
				%26 = S_LOAD_DWORDX2_IMM %0, 2, 0
				dead %27 = S_LOAD_DWORD_IMM killed %26, 0, 0
				S_CBRANCH_SCC0 %bb.1, implicit undef %scc

				bb.5:
				%58 = COPY killed %1
				%59 = COPY killed %17
				S_BRANCH %bb.2

				bb.1:
				%30 = V_MOV_B32_e32 1036831949, implicit %exec
				%31 = V_ADD_F32_e32 %30, %1.sub3, implicit %exec
				%33 = V_ADD_F32_e32 %30, %1.sub2, implicit %exec
				%35 = V_ADD_F32_e32 %30, %1.sub1, implicit %exec
				%37 = V_ADD_F32_e32 killed %30, killed %1.sub0, implicit %exec
				undef %56.sub0 = COPY killed %37
				%56.sub1 = COPY killed %35
				%56.sub2 = COPY killed %33
				%56.sub3 = COPY killed %31
				%28 = S_MOV_B32 0
				%2 = COPY killed %56
				%58 = COPY killed %2
				%59 = COPY killed %28

				bb.2:
				%4 = COPY killed %59
				%3 = COPY killed %58
				%39 = S_LOAD_DWORDX2_IMM killed %0, 6, 0
				%40 = S_LOAD_DWORD_IMM killed %39, 0, 0
				%43 = V_MOV_B32_e32 -1102263091, implicit %exec
				%60 = COPY killed %4
				%61 = COPY killed %3

				bb.3:
				successors: %bb.3, %bb.4

				%7 = COPY killed %61
				%6 = COPY killed %60
				%8 = S_ADD_I32 killed %6, 1, implicit-def dead %scc
				%44 = V_ADD_F32_e32 %43, %7.sub3, implicit %exec
				%46 = V_ADD_F32_e32 %43, %7.sub2, implicit %exec
				%48 = V_ADD_F32_e32 %43, %7.sub1, implicit %exec
				%50 = V_ADD_F32_e32 %43, killed %7.sub0, implicit %exec
				undef %57.sub0 = COPY killed %50
				%57.sub1 = COPY killed %48
				%57.sub2 = COPY %46
				%57.sub3 = COPY killed %44
				S_CMP_LT_I32 %8, %40, implicit-def %scc
				%60 = COPY killed %8
				%61 = COPY killed %57
				S_CBRANCH_SCC1 %bb.3, implicit killed %scc
				S_BRANCH %bb.4

				bb.4:
				EXP 32, undef %53, undef %54, killed %46, undef %55, 0, 0, 15, implicit %exec
				S_ENDPGM

				...

This is an archive of the discontinued LLVM Phabricator instance.

[RegisterCoalescer] Fix for SubRange join unreachable
ClosedPublic
