Details

Reviewers

MatzeB
qcolombet
arsenm

Commits

rG7528d4bd4294: [RegisterCoalescer] Fix for SubRange join unreachable
rL307247: [RegisterCoalescer] Fix for SubRange join unreachable

Summary

During rematerialization (remat), some subranges can end up with invalid segments, which causes problems for later coalescing.

Added a check to remove segments that are invalidated as part of the remat.

See http://llvm.org/PR33524

Diff Detail

Build Status

Buildable 7413
Build 7413: arc lint + arc unit

Event Timeline

dstuttard created this revision. Jun 20 2017, 2:25 AM

Herald added subscribers: qcolombet, MatzeB. Jun 20 2017, 2:25 AM

Added llvm-commits

Adding MatzeB as reviewer - you've made some recent changes in the same area.

I've added this change specifically to address a problem seen in the associated PR (http://llvm.org/PR33524); however, I'm not sure that this is necessarily the right way to go about fixing this issue.

I think that the remat results in an incorrect segment in the SubRange, which looks something like [128r, 128d) and is also an undef, and removing it does clear up the problem. (Prior to the remat, the segment in the SubRange is correct, albeit for an undef.)
However, I wondered whether the part of the code that went wrong later (resulting in an unreachable) should in fact be able to deal with this itself.
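
For readers unfamiliar with the notation, a segment such as [128r, 128d) pairs an instruction number with a slot letter. The following is a minimal sketch, not LLVM code: `Slot`, `Index`, and `Segment` are hypothetical stand-ins that assume the slot ordering used by LLVM's SlotIndex (block, early-clobber, register, dead, printed as B, e, r, d). Under that assumption, [128r, 128d) has the shape of a dead definition: it starts at the register slot of instruction 128 and ends at the dead slot of the same instruction.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a slot within one instruction, in increasing order.
enum class Slot { Block = 0, EarlyClobber = 1, Register = 2, Dead = 3 };

// Hypothetical stand-in for a slot index: instruction number plus slot.
struct Index {
  unsigned Instr;
  Slot S;

  // Order first by instruction number, then by slot within the instruction.
  bool operator<(const Index &O) const {
    return Instr != O.Instr ? Instr < O.Instr : S < O.S;
  }

  // Render as e.g. "128r", using one letter per slot.
  std::string str() const {
    return std::to_string(Instr) + "Berd"[static_cast<int>(S)];
  }
};

// Half-open segment [Start, End).
struct Segment {
  Index Start, End;

  bool contains(Index I) const { return !(I < Start) && I < End; }

  // A (non-early-clobber) dead def covers only the register slot of one
  // instruction: it starts at 'r' and ends at 'd' of the same instruction.
  bool isDeadDef() const {
    return Start.Instr == End.Instr && Start.S == Slot::Register &&
           End.S == Slot::Dead;
  }

  std::string str() const {
    return "[" + Start.str() + ", " + End.str() + ")";
  }
};
```

With these definitions, `Segment{{128, Slot::Register}, {128, Slot::Dead}}` renders as `[128r, 128d)`, contains `128r`, and does not contain `128d`, since the interval is half-open.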

Test case?

lib/CodeGen/RegisterCoalescer.cpp
1235	Debug code?

In D34391#785251, @meadori wrote:

Test case?

There's an example that provokes the problem in the bugzilla http://llvm.org/PR33524.

I guess at this stage I'm looking for some indication whether this is a reasonable fix for the problem - before submission I'll tidy up the reproducer in the bugzilla and add as a test case.

lib/CodeGen/RegisterCoalescer.cpp
1235	Yes, I'll remove.

Removing debug comment.
Also adding the test case from the bugzilla as a new test case (I realised it is a bit annoying to have to go elsewhere to get hold of the reproducer).
I'll update it as a test if this change looks promising.

Herald added a subscriber: nhaehnle. Jun 20 2017, 6:35 AM

Removed the commented out debug statement

qcolombet requested changes to this revision. Jun 20 2017, 10:02 AM

qcolombet added inline comments.

test/CodeGen/AMDGPU/pr33524.ll
1	You need to add a RUN line and some FileCheck command to check we are generating correct code. FWIW, you'll have something more robust with a .mir test (llc -stop-before simple-register-coalescing -simplify-mir).
	Add a comment on what this test is checking. In particular, listing the PR number here is a good practice.
	Give a meaningful name to the file, e.g., reg-coal-join-subrange.

This revision now requires changes to proceed. Jun 20 2017, 10:02 AM

Updating the test as per review comments

I've left it as a .ll test rather than .mir as the mir print routines can't cope
with pseudo source values (used in the llvm.amdgcn.buffer.load intrinsics)

"TargetCustom pseudo source values are not supported"

npjdesres added a subscriber: npjdesres. Jun 21 2017, 7:50 AM

FYI I tried this patch in my out-of-tree backend (hoping to resolve http://llvm.org/PR32773). I observed a segfault in SR.removeValNo(RmValNo) because RmValNo may be null.

I don't know yet whether this is specific to my backend, but I thought I'd mention it in case it indicates a more general problem.

In D34391#786792, @npjdesres wrote:

FYI I tried this patch in my out-of-tree backend (hoping to resolve http://llvm.org/PR32773). I observed a segfault in SR.removeValNo(RmValNo) because RmValNo may be null.

I don't know yet whether this is specific to my backend, but I thought I'd mention it in case it indicates a more general problem.

Doh - forgot the check for null. I'll upload a change for this.
The fact that you got a segfault here is promising - it could mean it might be a similar problem. Try the new patch.
If it fails then take a look at the SubRanges around the failing SubRange join - it could be a similar problem to the one this fix addresses (but isn't caught by it) - in particular any remats that happen before the failing SubRange join.
There are a couple of failures logged in bugzilla that look similar but aren't fixed by this change, so there are definitely some other issues in this area.

Added in missing check on RmValNo for null

In D34391#786456, @dstuttard wrote:

Updating the test as per review comments

I've left it as a .ll test rather than .mir as the mir print routines can't cope
with pseudo source values (used in the llvm.amdgcn.buffer.load intrinsics)

"TargetCustom pseudo source values are not supported"

You can strip out the MemOperands in the MIR test

test/CodeGen/AMDGPU/pr33524.ll
1	Remove these extra comments
2–4	These will be redundant with the run line
67–70	You can remove all the metadata

Thanks for working on this! A bunch of nitpicks are below but overall the fix looks fine.

lib/CodeGen/RegisterCoalescer.cpp
1228	I would not describe `vreg2:sub0` as undef here, the `COPY` is a normal definition like any other. It just happens that after coalescing we don't have a definition left because the copy was reading a partially undef value (but that effect will be visible after the `==>` arrow).
1231	This would be the place to mention that vreg2:sub0 is undef now and the subrange needs to be removed.
1232	Why do you need `DstIdx == 0`, it seems to me that we need the fixup regardless of DstIdx.
1238	"undef tagged as def" is a strange description. How about: "Removing undefined subrange ..." as debug message?
1241–1243	How about `if (VNInfo *RmValNo = getVNInfoAt(CurrIdx.getRegSlot()))` (It shouldn't matter here because NewMI should not write to that part of the register. But writing `getRegSlot()` feels more natural to check for liveranges going out of an instruction).
1246	You should call `DstInt.removeEmptySubRanges()` after cleaning subranges.
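
The inline comments above point toward a fixup of roughly the following shape. This is a self-contained sketch with hypothetical stand-in types (`Seg`, `SubRange`, `Interval`) and a hypothetical function name (`dropUndefinedLanes`); it is not the committed LLVM code. Lane masks are modelled as plain integers and slot indexes as unsigned numbers. It illustrates three points raised in review: the check applies to any subrange whose lanes are not written by the rematerialized instruction, the value-number lookup is guarded because it can return null, and empty subranges are dropped afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins; these are not LLVM's real classes.
struct Seg {
  unsigned Start, End; // half-open [Start, End)
  int ValNo;           // value number that owns this segment
};

struct SubRange {
  std::uint32_t LaneMask;
  std::vector<Seg> Segs;

  // Value number live at Idx, or nullptr if no segment covers Idx.
  const int *valueAt(unsigned Idx) const {
    for (const Seg &S : Segs)
      if (S.Start <= Idx && Idx < S.End)
        return &S.ValNo;
    return nullptr;
  }

  // Drop every segment that belongs to value number V.
  void removeValNo(int V) {
    Segs.erase(std::remove_if(Segs.begin(), Segs.end(),
                              [V](const Seg &S) { return S.ValNo == V; }),
               Segs.end());
  }
};

struct Interval {
  std::vector<SubRange> SubRanges;

  // Remove subranges that no longer have any segments.
  void removeEmptySubRanges() {
    SubRanges.erase(
        std::remove_if(SubRanges.begin(), SubRanges.end(),
                       [](const SubRange &SR) { return SR.Segs.empty(); }),
        SubRanges.end());
  }
};

// DstMask: lanes written by the rematerialized instruction.
// DefIdx:  index at which that instruction defines its result.
bool dropUndefinedLanes(Interval &LI, std::uint32_t DstMask, unsigned DefIdx) {
  bool Updated = false;
  for (SubRange &SR : LI.SubRanges) {
    if ((SR.LaneMask & DstMask) != 0)
      continue; // these lanes are really defined; leave them alone
    // Guarded lookup: nothing may be live here, so the result can be null.
    if (const int *V = SR.valueAt(DefIdx)) {
      SR.removeValNo(*V); // passed by value, so erasing segments is safe
      Updated = true;
    }
  }
  if (Updated)
    LI.removeEmptySubRanges();
  return Updated;
}
```

In this model, an interval with one subrange for an unwritten lane and one for the written lane ends up with only the written-lane subrange after the call, and a second call is a no-op.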

Updating in line with comments from reviewers

I haven't updated the test to .mir as I don't fully understand what
@arsenm means - perhaps if we discuss offline I can do a subsequent patch to
update the test, but the one there will suffice for now?

dstuttard marked 6 inline comments as done. Jun 22 2017, 8:22 AM

In D34391#787997, @dstuttard wrote:

Updating in line with comments from reviewers

I haven't updated the test to .mir as I don't fully understand what
@arsenm means - perhaps if we discuss offline I can do a subsequent patch to
update the test, but the one there will suffice for now?

The TargetCustom error is from the pseudo source value used for the buffer intrinsics' memory operands. If you remove those, you should avoid the error. Also, it may still reproduce if you replace the intrinsics in the IR with volatile loads.

The code fix LGTM, but please wait for @qcolombet/@arsenm before committing.

Making a .mir test seems indeed hard at the moment, as the printer already fails; manually stripping memory operands only works after printing I presume?

In D34391#791480, @MatzeB wrote:

The code fix LGTM, but please wait for @qcolombet/@arsenm before committing.

Making a .mir test seems indeed hard at the moment, as the printer already fails; manually stripping memory operands only works after printing I presume?

Yes, stripping the operands only works after printing :(

@arsenm I'll try using volatile loads and stores to see if that works in this case. Would it be acceptable in the meantime to accept this change and then I'll update the .ll test with a .mir one when it works?

LGTM

Updating the test to a .mir test
Replacing the buffer.load intrinsics with load volatile worked

Herald added a subscriber: wdng. Jul 3 2017, 7:07 AM

@qcolombet - any further comments or are you happy for this to go in?

Harbormaster completed remote builds in B7913: Diff 105062. Jul 3 2017, 7:34 AM

LGTM.

Thanks

This revision is now accepted and ready to land. Jul 5 2017, 10:08 AM

Closed by commit rL307247: [RegisterCoalescer] Fix for SubRange join unreachable (authored by dstuttard). Jul 6 2017, 3:08 AM

This revision was automatically updated to reflect the committed changes.

Diff 103200

lib/CodeGen/RegisterCoalescer.cpp

[First 1,214 lines not shown; hunk context: if (NewIdx == 0 && DstInt.hasSubRanges()) {]
           SR.createDeadDef(DefIndex, Alloc);
         MaxMask &= ~SR.LaneMask;
       }
       if (MaxMask.any()) {
         LiveInterval::SubRange *SR = DstInt.createSubRange(Alloc, MaxMask);
         SR->createDeadDef(DefIndex, Alloc);
       }
     }
+
+    // Make sure that the subrange for resultant undef is removed
+    // For example:
+    // vreg1:sub1<def,read-undef> = LOAD CONSTANT 1
+    // vreg2<def> = COPY vreg1
+    // ; vreg2:sub0 is actually undef but subrange exists in LiveRange for lane
[inline] MatzeB (Done): I would not describe `vreg2:sub0` as undef here, the `COPY` is a normal definition like any other. It just happens that after coalescing we don't have a definition left because the copy was reading a partially undef value (but that effect will be visible after the `==>` arrow).
+    // ==>
+    // vreg2:sub1<def, read-undef> = LOAD CONSTANT 1
+    // ; Correct but need to remove the subrange for sub0
[inline] MatzeB (Done): This would be the place to mention that vreg2:sub0 is undef now and the subrange needs to be removed.
+    if (NewIdx != 0 && DstIdx == 0 && DstInt.hasSubRanges()) {
[inline] MatzeB (Done): Why do you need `DstIdx == 0`, it seems to me that we need the fixup regardless of DstIdx.
+      // The affected subregister segments can be removed.
+      SlotIndex CurrIdx = LIS->getInstructionIndex(NewMI);
+      LaneBitmask DstMask = TRI->getSubRegIndexLaneMask(NewIdx);
[inline] meadori (Done): Debug code?
[inline] dstuttard, author (Done): Yes, I'll remove.
+      for (LiveInterval::SubRange &SR : DstInt.subranges()) {
+        if ((SR.LaneMask & DstMask).none()) {
+          DEBUG(dbgs() << "SubRange containing an undef tagged as def "
[inline] MatzeB (Done): "undef tagged as def" is a strange description. How about: "Removing undefined subrange ..." as debug message?
+                << PrintLaneMask(SR.LaneMask) << " : " << SR << "\n");
+          // VNI is in ValNo - remove any segments in this SubRange that have this ValNo
+          VNInfo *RmValNo = SR.Query(CurrIdx).valueOutOrDead();
+          SR.removeValNo(RmValNo);
+        }
[inline] MatzeB (Done): How about `if (VNInfo *RmValNo = getVNInfoAt(CurrIdx.getRegSlot()))` (It shouldn't matter here because NewMI should not write to that part of the register. But writing `getRegSlot()` feels more natural to check for liveranges going out of an instruction).
+      }
+    }
   } else if (NewMI.getOperand(0).getReg() != CopyDstReg) {
[inline] MatzeB (Done): You should call `DstInt.removeEmptySubRanges()` after cleaning subranges.
     // The New instruction may be defining a sub-register of what's actually
     // been asked for. If so it must implicitly define the whole thing.
     assert(TargetRegisterInfo::isPhysicalRegister(DstReg) &&
            "Only expect virtual or physical registers in remat");
     NewMI.getOperand(0).setIsDead(true);
     NewMI.addOperand(MachineOperand::CreateReg(
         CopyDstReg, true /*IsDef*/, true /*IsImp*/, false /*IsKill*/));
     // Record small dead def live-ranges for all the subregisters
[Remaining 2,118 lines not shown.]

test/CodeGen/AMDGPU/pr33524.ll

This file was added.

				; ModuleID = 'pr33524.bc'
[inline] qcolombet (Not Done): You need to add a RUN line and some FileCheck command to check we are generating correct code. FWIW, you'll have something more robust with a .mir test (llc -stop-before simple-register-coalescing -simplify-mir). Add a comment on what this test is checking. In particular, listing the PR number here is a good practice. Give a meaningful name to the file, e.g., reg-coal-join-subrange.
[inline] arsenm (Not Done): Remove these extra comments
				source_filename = "bugpoint-output-3331cb1.bc"
				target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024"
				target triple = "spir64-unknown-unknown"
[inline] arsenm (Not Done): These will be redundant with the run line

				; Function Attrs: nounwind
				define amdgpu_vs void @main(i32 inreg %arg, i32 inreg %arg1, i32 inreg %arg2, i32 inreg %arg3, i32 inreg %arg4, i32 inreg %arg5, i32 %arg6) local_unnamed_addr #0 {
				.entry:
				%.4.vec.insert9 = insertelement <2 x i32> <i32 undef, i32 1>, i32 %arg2, i32 0
				%tmp = bitcast <2 x i32> %.4.vec.insert9 to i64
				%tmp7 = inttoptr i64 %tmp to [4294967295 x i8] addrspace(2)*
				%.4.vec.insert = insertelement <2 x i32> <i32 undef, i32 1>, i32 %arg5, i32 0
				%tmp8 = bitcast <2 x i32> %.4.vec.insert to i64
				%tmp9 = inttoptr i64 %tmp8 to [16 x <4 x i32>] addrspace(2)*
				%tmp10 = getelementptr [16 x <4 x i32>], [16 x <4 x i32>] addrspace(2)* %tmp9, i64 0, i64 0, !amdgpu.uniform !1
				%tmp11 = load <4 x i32>, <4 x i32> addrspace(2)* %tmp10, align 16, !invariant.load !1
				%tmp12 = insertelement <4 x i32> %tmp11, i32 491436, i32 3
				%tmp13 = tail call <4 x float> @llvm.amdgcn.buffer.load.format.v4f32(<4 x i32> %tmp12, i32 undef, i32 0, i1 false, i1 false) #1
				%tmp14 = inttoptr i64 %tmp to <4 x i32> addrspace(2)*
				%tmp15 = load <4 x i32>, <4 x i32> addrspace(2)* %tmp14, align 16
				%tmp16 = tail call float @llvm.amdgcn.buffer.load.f32(<4 x i32> %tmp15, i32 0, i32 0, i1 false, i1 false) #0
				%tmp17 = bitcast float %tmp16 to i32
				br i1 undef, label %.lr.ph6.preheader, label %.preheader

				.lr.ph6.preheader: ; preds = %.entry
				%tmp18 = fadd <4 x float> %tmp13, <float 0x3FB99999A0000000, float 0x3FB99999A0000000, float 0x3FB99999A0000000, float 0x3FB99999A0000000>
				%tmp19 = icmp slt i32 1, %tmp17
				br label %.preheader

				.preheader: ; preds = %.lr.ph6.preheader, %.entry
				%f.0.lcssa = phi <4 x float> [ %tmp13, %.entry ], [ %tmp18, %.lr.ph6.preheader ]
				%.lcssa = phi i32 [ 1, %.entry ], [ 0, %.lr.ph6.preheader ]
				%tmp20 = getelementptr [4294967295 x i8], [4294967295 x i8] addrspace(2)* %tmp7, i64 0, i64 16
				%tmp21 = bitcast i8 addrspace(2)* %tmp20 to <4 x i32> addrspace(2)*, !amdgpu.uniform !1
				%tmp22 = load <4 x i32>, <4 x i32> addrspace(2)* %tmp21, align 16
				%tmp23 = tail call float @llvm.amdgcn.buffer.load.f32(<4 x i32> %tmp22, i32 0, i32 0, i1 false, i1 false) #0
				%tmp24 = bitcast float %tmp23 to i32
				br label %.lr.ph

				.lr.ph: ; preds = %.lr.ph, %.preheader
				%k.14 = phi i32 [ %tmp25, %.lr.ph ], [ %.lcssa, %.preheader ]
				%f.13 = phi <4 x float> [ %tmp26, %.lr.ph ], [ %f.0.lcssa, %.preheader ]
				%tmp25 = add nsw i32 %k.14, 1
				%tmp26 = fadd <4 x float> %f.13, <float 0xBFC99999A0000000, float 0xBFC99999A0000000, float 0xBFC99999A0000000, float 0xBFC99999A0000000>
				%tmp27 = icmp slt i32 %tmp25, %tmp24
				br i1 %tmp27, label %.lr.ph, label %._crit_edge.loopexit

				._crit_edge.loopexit: ; preds = %.lr.ph
				%.lcssa30 = phi <4 x float> [ %tmp26, %.lr.ph ]
				%tmp28 = extractelement <4 x float> %.lcssa30, i32 2
				tail call void @llvm.amdgcn.exp.f32(i32 32, i32 15, float undef, float undef, float %tmp28, float undef, i1 false, i1 false) #0
				ret void
				}

				; Function Attrs: nounwind readonly
				declare float @llvm.amdgcn.buffer.load.f32(<4 x i32>, i32, i32, i1, i1) #1

				; Function Attrs: nounwind readonly
				declare <4 x float> @llvm.amdgcn.buffer.load.format.v4f32(<4 x i32>, i32, i32, i1, i1) #1

				; Function Attrs: nounwind
				declare void @llvm.amdgcn.exp.f32(i32, i32, float, float, float, float, i1, i1) #0

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readonly }

				!spirv.Generator = !{!0}

				!0 = !{i16 8, i16 1}
				!1 = !{}
[inline] arsenm (Not Done): You can remove all the metadata

This is an archive of the discontinued LLVM Phabricator instance.

[RegisterCoalescer] Fix for SubRange join unreachable
ClosedPublic
