Download Raw Diff

Details

Reviewers

andreadb
atrick
bogner
craig.topper
aprantl

Commits

rG96c641569717: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
rL296825: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)

Summary

Currently, when 't1: i1 = setcc t2, t3, cc' followed by 't4: i1 = xor t1, Constant:i1<-1>' is folded into 't5: i1 = setcc t2, t3 !cc', SDLoc of newly created SDValue 't5' follows SDLoc of 't4', not 't1'. However, as the opcode of newly created SDValue is 'setcc', it make more sense to take DebugLoc from 't1' than 't4'. For the code below

extern int bar();
extern int baz();

int foo(int x, int y) {
  if (x != y)
    return bar();
  else
    return baz();
}

, following is the bitcode representation of 'foo' at the end of llvm-ir level optimization:

define i32 @foo(i32 %x, i32 %y) !dbg !4 {
entry:
  tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
  tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
  %cmp = icmp ne i32 %x, %y, !dbg !14
  br i1 %cmp, label %if.then, label %if.else, !dbg !16

if.then:                                          ; preds = %entry
  %call = tail call i32 (...) @bar() #3, !dbg !17
  br label %return, !dbg !18

if.else:                                          ; preds = %entry
  %call1 = tail call i32 (...) @baz() #3, !dbg !19
  br label %return, !dbg !20

return:                                           ; preds = %if.else, %if.then
  %retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
  ret i32 %retval.0, !dbg !21
}

!14 = !DILocation(line: 5, column: 9, scope: !15)
!16 = !DILocation(line: 5, column: 7, scope: !4)

As you can see, in 'entry' block, 'icmp' instruction and 'br' instruction have different debug locations. However, with current implementation, there's no distinction between debug locations of these two when they are lowered to asm instructions. This is because 'icmp' and 'br' become 'setcc' 'xor' and 'brcond' in SelectionDAG, where SDLoc of 'setcc' follows the debug location of 'icmp' but SDLOC of 'xor' and 'brcond' follows the debug location of 'br' instruction, and SDLoc of 'xor' overwrites SDLoc of 'setcc' when they are folded. This patch addresses this issue.

Diff Detail

Build Status

Buildable 3872
Build 3872: arc lint + arc unit

Event Timeline

twoh created this revision.Feb 9 2017, 10:53 PM

Harbormaster completed remote builds in B3872: Diff 87966.Feb 9 2017, 10:53 PM

andreadb added a subscriber: andreadb.Feb 10 2017, 4:21 AM

Some tests need to be updated as new DebugLoc changes instruction ordering.

Herald added a subscriber: jholewinski. · View Herald TranscriptFeb 20 2017, 7:55 PM

twoh added reviewers: andreadb, craig.topper.Feb 20 2017, 7:59 PM

mkuper added a subscriber: mkuper.Feb 20 2017, 8:21 PM

mkuper added inline comments.

test/CodeGen/X86/avx512-fsel.ll
15 ↗	(On Diff #89153)	A drive-by comment: please don't manually change auto-generate tests - rather, rerun the script. You shouldn't need CHECK-DAG here anyway, since the scheduling is supposed to be deterministic. (Also, I'm a bit confused. I can understand SDLoc changes affecting scheduling if we do source-order scheduling, but why would this make a difference for IR input that doesn't have any line info to begin with?)

lgtm for nvptx test change

twoh added inline comments.Feb 20 2017, 9:05 PM

test/CodeGen/X86/avx512-fsel.ll
15 ↗	(On Diff #89153)	Sorry, I missed the note. Thanks for the comments, and I will update the test again. Even if there's no line info, SelectionDAGBuilder::SDNodeOrder is still maintained during the process (https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#L942), and used to set SDNode::IROrder. For example, here (https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#L1009) getCurSDLoc generates SDLoc that takes SelectionDAGBuilder::SDNodeOrder as its IROrder, and this values eventually passed to the SDNode constructor for IROrder.

Now avx512-fsel.ll is auto-generated from the script.

mkuper added inline comments.Feb 20 2017, 10:24 PM

test/CodeGen/X86/avx512-fsel.ll
15 ↗	(On Diff #89153)	Ah, ok, that makes sense.

andreadb added inline comments.Feb 21 2017, 5:08 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4689–4692	With this change, we always propagate the debug location of N0. However, this may be suboptimal if we end up in a situation where N0 has a null DebugLoc, and the DebugLoc of N is not null. In that case, shouldn't we propagate SDLoc(N) instead? When constructing a new SetCC/SelectCC node, what if we give higher priority to SDLoc(N) over SDLoc(N0) based on whether `N0->getDebugLoc()` returns a null location? In order to guarantee a deterministic schedule, we could always call `SelectionDAG::UpdadeSDLocOnMergedSDNode()` on the newly created dag node, with the goal to set the new IROrder equal to the minimum node order between N0 and N. Not sure if that makes sense. I hope this helps.

andreadb added a reviewer: aprantl.Feb 21 2017, 5:09 AM

aprantl added inline comments.Feb 21 2017, 9:07 AM

test/CodeGen/X86/xor-combine-debugloc.ll
3	Please add a comment that explains what is being checked here. It might be more robust to run llc -stop-after=.. and check the MIR right after ISEL.

twoh added inline comments.Feb 21 2017, 10:53 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4689–4692	@andreadb Thanks for the comments! In this particular case I think it is not correct to take SDLoc from N, and it is better to let the new SDLoc have null DebugLoc if N0 has a null DebugLoc. If we can't have a precise DebugLoc, I think it is better with no DebugLoc than imprecise DebugLoc. Regarding scheduling order, It seems more natural to me that the new SDValue takes IROrder from N0 if it takes SDLoc from N0. A deterministic schedule is a desirable property, but not sure if it needs to be forced here.

Addressing @aprantl's comments. Thanks!

Harbormaster completed remote builds in B4162: Diff 89270.Feb 21 2017, 1:15 PM

andreadb added inline comments.Feb 22 2017, 4:44 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4689–4692	I have to admit, it is not particularly obvious to me why a null DebugLoc is better than a potentially imprecise DebugLoc from N. That said, I am not worried about that particular case. I think that overall the patch is good - but please wait for Adrian.

twoh added inline comments.Feb 22 2017, 8:42 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4689–4692	For debugger there might be not much difference. However, for sample-profile based optimization, while null DebugLoc is simply ignored by the optimizer, imprecise DebugLoc may end up with "wrong" optimization. You can find more detailed discussion on it from comments under D25742.

aprantl added inline comments.Feb 22 2017, 9:14 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4689–4692	Ok, if I understand this correctly: ` t1: i1 = setcc t2, t3, cc, LOC0 t4: i1 = xor t1, Constant:i1<-1>, LOC1 -> t5: i1 = setcc t2, t3 !cc, LOC0 ` There are two reasonable ways to approach this: "We are folding the xor into the setcc", so the location from the setcc wins / remains untouched. This is what this patch implements. "We are merging two instructions", so we should merge the two locations. Then we should use the getMergedDebugLoc API. I'm fine with both of the above variants. I don't think we should get into the business of "prefer Loc1, but take Loc2 if Loc1 isn't available". If we are concerned about loosing fidelity, we should investigate why the setcc doesn't have a location instead and we will benefit in more places from that. (I'm aware that this is more a personal opinion than a sound technical argument :-)

andreadb added inline comments.Feb 22 2017, 9:19 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4689–4692	D25742 is a completely different case where common code from two basic blocks is merged into a common tail. In that case, we don't want that common code still refers to its original debugloc. In your test case, you are combining instructions which are from the same basic block. I don't see how this can affect the sample profile; in particular, it cannot negatively impact the computation of block frequency. Essentially, the problem is only when you are counting sample hits against the wrong basic block.

twoh added inline comments.Feb 22 2017, 11:03 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4689–4692	@andreadb Oh, I just wanted to bring up the example of null debugloc is better than imprecise debugloc for sample profile. Sorry for the confusion. @aprantl In general I think using getMergedDebugLoc API is the right way to go, but for this specific case it is pretty clear that xor is a part of what setcc supposed to do (i.e. setting predicate), so it makes more sense to take SDLoc from setcc. In particular, I believe that SelectionDAGBuilder::visitSwitchCase (https://reviews.llvm.org/diffusion/L/browse/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp;295856$1872) is one major source generating the pattern of setcc followed by xor with constant -1, and when xor is created at that point it should have take SDLoc from setcc not from following branch instruction. Unfortunately, xor instruction sets its SDLoc to the value returned from getCurSDLoc() call, but CurInst is already a branch instruction. Fixing SDLoc for xor there would involve non-trivial amount of work because it will require SelectionDAGBuilder to maintain not only current instruction but also previous instruction, so I prefer to address it here.

In general I think using getMergedDebugLoc API is the right way to go, but for this specific case it is pretty clear that xor is a part of what setcc supposed to do (i.e. setting predicate), so it makes more sense to take SDLoc from setcc.

Yes, and as I said I can follow that reasoning. As far as I am concerned this LGTM.

Ping. thanks!

LGTM too!

This revision is now accepted and ready to land.Mar 2 2017, 4:43 AM

Rebase. Fix test/CodeGen/X86/and-sink.ll that added 9days ago.

Closed by commit rL296825: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y) (authored by twoh). · Explain WhyMar 2 2017, 2:10 PM

This revision was automatically updated to reflect the committed changes.

Diff 87966

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 4,680 Lines • ▼ Show 20 Lines	ISD::CondCode NotCC = ISD::getSetCCInverse(cast<CondCodeSDNode>(CC)->get(),
isInt);		isInt);

if (!LegalOperations \|\|		if (!LegalOperations \|\|
TLI.isCondCodeLegal(NotCC, LHS.getSimpleValueType())) {		TLI.isCondCodeLegal(NotCC, LHS.getSimpleValueType())) {
switch (N0.getOpcode()) {		switch (N0.getOpcode()) {
default:		default:
llvm_unreachable("Unhandled SetCC Equivalent!");		llvm_unreachable("Unhandled SetCC Equivalent!");
case ISD::SETCC:		case ISD::SETCC:
return DAG.getSetCC(SDLoc(N), VT, LHS, RHS, NotCC);		return DAG.getSetCC(SDLoc(N0), VT, LHS, RHS, NotCC);
case ISD::SELECT_CC:		case ISD::SELECT_CC:
return DAG.getSelectCC(SDLoc(N), LHS, RHS, N0.getOperand(2),		return DAG.getSelectCC(SDLoc(N0), LHS, RHS, N0.getOperand(2),
N0.getOperand(3), NotCC);		N0.getOperand(3), NotCC);
		andreadbUnsubmitted Not Done Reply Inline Actions With this change, we always propagate the debug location of N0. However, this may be suboptimal if we end up in a situation where N0 has a null DebugLoc, and the DebugLoc of N is not null. In that case, shouldn't we propagate SDLoc(N) instead? When constructing a new SetCC/SelectCC node, what if we give higher priority to SDLoc(N) over SDLoc(N0) based on whether `N0->getDebugLoc()` returns a null location? In order to guarantee a deterministic schedule, we could always call `SelectionDAG::UpdadeSDLocOnMergedSDNode()` on the newly created dag node, with the goal to set the new IROrder equal to the minimum node order between N0 and N. Not sure if that makes sense. I hope this helps. andreadb: With this change, we always propagate the debug location of N0. However, this may be…
		twohAuthorUnsubmitted Not Done Reply Inline Actions @andreadb Thanks for the comments! In this particular case I think it is not correct to take SDLoc from N, and it is better to let the new SDLoc have null DebugLoc if N0 has a null DebugLoc. If we can't have a precise DebugLoc, I think it is better with no DebugLoc than imprecise DebugLoc. Regarding scheduling order, It seems more natural to me that the new SDValue takes IROrder from N0 if it takes SDLoc from N0. A deterministic schedule is a desirable property, but not sure if it needs to be forced here. twoh: @andreadb Thanks for the comments! In this particular case I think it is not correct to take…
		andreadbUnsubmitted Not Done Reply Inline Actions I have to admit, it is not particularly obvious to me why a null DebugLoc is better than a potentially imprecise DebugLoc from N. That said, I am not worried about that particular case. I think that overall the patch is good - but please wait for Adrian. andreadb: I have to admit, it is not particularly obvious to me why a null DebugLoc is better than a…
		twohAuthorUnsubmitted Not Done Reply Inline Actions For debugger there might be not much difference. However, for sample-profile based optimization, while null DebugLoc is simply ignored by the optimizer, imprecise DebugLoc may end up with "wrong" optimization. You can find more detailed discussion on it from comments under D25742. twoh: For debugger there might be not much difference. However, for sample-profile based optimization…
		andreadbUnsubmitted Not Done Reply Inline Actions D25742 is a completely different case where common code from two basic blocks is merged into a common tail. In that case, we don't want that common code still refers to its original debugloc. In your test case, you are combining instructions which are from the same basic block. I don't see how this can affect the sample profile; in particular, it cannot negatively impact the computation of block frequency. Essentially, the problem is only when you are counting sample hits against the wrong basic block. andreadb: D25742 is a completely different case where common code from two basic blocks is merged into a…
		twohAuthorUnsubmitted Not Done Reply Inline Actions @andreadb Oh, I just wanted to bring up the example of null debugloc is better than imprecise debugloc for sample profile. Sorry for the confusion. @aprantl In general I think using getMergedDebugLoc API is the right way to go, but for this specific case it is pretty clear that xor is a part of what setcc supposed to do (i.e. setting predicate), so it makes more sense to take SDLoc from setcc. In particular, I believe that SelectionDAGBuilder::visitSwitchCase (https://reviews.llvm.org/diffusion/L/browse/llvm/trunk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp;295856$1872) is one major source generating the pattern of setcc followed by xor with constant -1, and when xor is created at that point it should have take SDLoc from setcc not from following branch instruction. Unfortunately, xor instruction sets its SDLoc to the value returned from getCurSDLoc() call, but CurInst is already a branch instruction. Fixing SDLoc for xor there would involve non-trivial amount of work because it will require SelectionDAGBuilder to maintain not only current instruction but also previous instruction, so I prefer to address it here. twoh: @andreadb Oh, I just wanted to bring up the example of null debugloc is better than imprecise…
		aprantlUnsubmitted Not Done Reply Inline Actions Ok, if I understand this correctly: ` t1: i1 = setcc t2, t3, cc, LOC0 t4: i1 = xor t1, Constant:i1<-1>, LOC1 -> t5: i1 = setcc t2, t3 !cc, LOC0 ` There are two reasonable ways to approach this: "We are folding the xor into the setcc", so the location from the setcc wins / remains untouched. This is what this patch implements. "We are merging two instructions", so we should merge the two locations. Then we should use the getMergedDebugLoc API. I'm fine with both of the above variants. I don't think we should get into the business of "prefer Loc1, but take Loc2 if Loc1 isn't available". If we are concerned about loosing fidelity, we should investigate why the setcc doesn't have a location instead and we will benefit in more places from that. (I'm aware that this is more a personal opinion than a sound technical argument :-) aprantl: Ok, if I understand this correctly: ` t1: i1 = setcc t2, t3, cc, LOC0 t4: i1 = xor t1, Constant…
}		}
}		}
}		}

// fold (not (zext (setcc x, y))) -> (zext (not (setcc x, y)))		// fold (not (zext (setcc x, y))) -> (zext (not (setcc x, y)))
if (isOneConstant(N1) && N0.getOpcode() == ISD::ZERO_EXTEND &&		if (isOneConstant(N1) && N0.getOpcode() == ISD::ZERO_EXTEND &&
N0.getNode()->hasOneUse() &&		N0.getNode()->hasOneUse() &&
isSetCCEquivalent(N0.getOperand(0), LHS, RHS, CC)){		isSetCCEquivalent(N0.getOperand(0), LHS, RHS, CC)){
▲ Show 20 Lines • Show All 1,428 Lines • ▼ Show 20 Lines	if (Mask.getOpcode() == ISD::SETCC) {
getMachineMemOperand(MLD->getPointerInfo(),		getMachineMemOperand(MLD->getPointerInfo(),
MachineMemOperand::MOLoad, LoMemVT.getStoreSize(),		MachineMemOperand::MOLoad, LoMemVT.getStoreSize(),
Alignment, MLD->getAAInfo(), MLD->getRanges());		Alignment, MLD->getAAInfo(), MLD->getRanges());

Lo = DAG.getMaskedLoad(LoVT, DL, Chain, Ptr, MaskLo, Src0Lo, LoMemVT, MMO,		Lo = DAG.getMaskedLoad(LoVT, DL, Chain, Ptr, MaskLo, Src0Lo, LoMemVT, MMO,
ISD::NON_EXTLOAD, MLD->isExpandingLoad());		ISD::NON_EXTLOAD, MLD->isExpandingLoad());

Ptr = TLI.IncrementMemoryAddress(Ptr, MaskLo, DL, LoMemVT, DAG,		Ptr = TLI.IncrementMemoryAddress(Ptr, MaskLo, DL, LoMemVT, DAG,
MLD->isExpandingLoad());		MLD->isExpandingLoad());

MMO = DAG.getMachineFunction().		MMO = DAG.getMachineFunction().
getMachineMemOperand(MLD->getPointerInfo(),		getMachineMemOperand(MLD->getPointerInfo(),
MachineMemOperand::MOLoad, HiMemVT.getStoreSize(),		MachineMemOperand::MOLoad, HiMemVT.getStoreSize(),
SecondHalfAlignment, MLD->getAAInfo(), MLD->getRanges());		SecondHalfAlignment, MLD->getAAInfo(), MLD->getRanges());

Hi = DAG.getMaskedLoad(HiVT, DL, Chain, Ptr, MaskHi, Src0Hi, HiMemVT, MMO,		Hi = DAG.getMaskedLoad(HiVT, DL, Chain, Ptr, MaskHi, Src0Hi, HiMemVT, MMO,
ISD::NON_EXTLOAD, MLD->isExpandingLoad());		ISD::NON_EXTLOAD, MLD->isExpandingLoad());
▲ Show 20 Lines • Show All 9,749 Lines • Show Last 20 Lines

test/CodeGen/X86/xor-combine-debugloc.ll

This file was added.

				; RUN: llc < %s \| FileCheck %s
				;
				; CHECK: foo:x <- %EDI
				aprantlUnsubmitted Not Done Reply Inline Actions Please add a comment that explains what is being checked here. It might be more robust to run llc -stop-after=.. and check the MIR right after ISEL. aprantl: Please add a comment that explains what is being checked here. It might be more robust to run…
				; CHECK: foo:y <- %ESI
				; CHECK: .loc 1 5 9
				; CHECK-NEXT: cmpl %esi, %edi
				; CHECK: .loc 1 5 7
				; CHECK-NEXT: je

				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind uwtable
				define i32 @foo(i32 %x, i32 %y) !dbg !4 {
				entry:
				tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !9, metadata !11), !dbg !12
				tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !10, metadata !11), !dbg !13
				%cmp = icmp ne i32 %x, %y, !dbg !14
				br i1 %cmp, label %if.then, label %if.else, !dbg !16

				if.then: ; preds = %entry
				%call = tail call i32 (...) @bar() #3, !dbg !17
				br label %return, !dbg !18

				if.else: ; preds = %entry
				%call1 = tail call i32 (...) @baz() #3, !dbg !19
				br label %return, !dbg !20

				return: ; preds = %if.else, %if.then
				%retval.0 = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
				ret i32 %retval.0, !dbg !21
				}

				declare i32 @bar(...)
				declare i32 @baz(...)

				; Function Attrs: nounwind readnone
				declare void @llvm.dbg.value(metadata, i64, metadata, metadata)

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!2, !3}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, emissionKind: FullDebug)
				!1 = !DIFile(filename: "foo.c", directory: "b/")
				!2 = !{i32 2, !"Dwarf Version", i32 4}
				!3 = !{i32 2, !"Debug Info Version", i32 3}
				!4 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 4, type: !5, isLocal: false, isDefinition: true, scopeLine: 4, flags: DIFlagPrototyped, isOptimized: true, unit: !0, variables: !8)
				!5 = !DISubroutineType(types: !6)
				!6 = !{!7, !7, !7}
				!7 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!8 = !{!9, !10}
				!9 = !DILocalVariable(name: "x", arg: 1, scope: !4, file: !1, line: 4, type: !7)
				!10 = !DILocalVariable(name: "y", arg: 2, scope: !4, file: !1, line: 4, type: !7)
				!11 = !DIExpression()
				!12 = !DILocation(line: 4, column: 13, scope: !4)
				!13 = !DILocation(line: 4, column: 20, scope: !4)
				!14 = !DILocation(line: 5, column: 9, scope: !15)
				!15 = distinct !DILexicalBlock(scope: !4, file: !1, line: 5, column: 7)
				!16 = !DILocation(line: 5, column: 7, scope: !4)
				!17 = !DILocation(line: 6, column: 12, scope: !15)
				!18 = !DILocation(line: 6, column: 5, scope: !15)
				!19 = !DILocation(line: 8, column: 12, scope: !15)
				!20 = !DILocation(line: 8, column: 5, scope: !15)
				!21 = !DILocation(line: 9, column: 1, scope: !4)

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 87966

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

test/CodeGen/X86/xor-combine-debugloc.ll

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 87966

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

test/CodeGen/X86/xor-combine-debugloc.ll

[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
ClosedPublic