This is an archive of the discontinued LLVM Phabricator instance.

Remove debug location from common tail when tail-merging
ClosedPublic

Authored by rob.lougher on Oct 18 2016, 11:33 AM.

Details

Reviewers

dblaikie
danielcdh
probinson
aprantl
wolfgangp

Commits

rG660f2f9560c9: Reapply: "Remove debug location from common tail when tail-merging"
rGe32564774c09: Remove debug location from common tail when tail-merging
rL285212: Reapply: "Remove debug location from common tail when tail-merging"
rL285093: Remove debug location from common tail when tail-merging

Summary

Consider the following simple if-then-else:

define i32 @test(i32 %a, i32 %b) !dbg !6 {
entry:
  %tobool = icmp ne i32 %b, 0, !dbg !8
  br i1 %tobool, label %if.then, label %if.else, !dbg !8

if.then:                                          ; preds = %entry
  %call = call i32 @foo(i32 %a), !dbg !9
  %add = add nsw i32 %a, %call, !dbg !10
  br label %if.end, !dbg !11

if.else:                                          ; preds = %entry
  %call1 = call i32 @bar(i32 %a), !dbg !12
  %add2 = add nsw i32 %a, %call1, !dbg !13
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %a.addr.0 = phi i32 [ %add, %if.then ], [ %add2, %if.else ]
  ret i32 %a.addr.0, !dbg !14
}

With debug line information as follows:

!8 = !DILocation(line: 5, column: 6, scope: !6)
!9 = !DILocation(line: 6, column: 10, scope: !6)
!10 = !DILocation(line: 6, column: 7, scope: !6)
!11 = !DILocation(line: 6, column: 5, scope: !6)
!12 = !DILocation(line: 8, column: 10, scope: !6)
!13 = !DILocation(line: 8, column: 7, scope: !6)
!14 = !DILocation(line: 9, column: 3, scope: !6)

If this is passed to llc the branch folding pass will merge the two add instructions into a common tail (.LBB0_3):

test:                                   # @test
        .loc    1 4 0                   # test.c:4:0
        pushq   %rbx
        movl    %edi, %ebx
        .loc    1 5 6 prologue_end      # test.c:5:6
        testl   %esi, %esi
        je      .LBB0_2
        .loc    1 6 10                  # test.c:6:10
        callq   foo
        jmp     .LBB0_3
.LBB0_2:                                # %if.else
        .loc    1 8 10                  # test.c:8:10
        callq   bar
.LBB0_3:                                # %if.end
        addl    %ebx, %eax
        .loc    1 9 3                   # test.c:9:3
        popq    %rbx
        retq

However, the common tail keeps the debug location of whichever merge input it was taken from, and that choice is effectively arbitrary. In this case the else-part was chosen, so the add appears to occur at line 8. This is a problem for sample-based PGO, because the else-part will now appear to have been executed irrespective of which side of the if-then-else was actually taken. It also degrades the optimized debugging experience, leading to odd stepping in the debugger.
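To make the merge step concrete, here is a minimal standalone sketch of how a common tail can end up carrying one input's line number. It is not the BranchFolding implementation: ToyInstr, commonTailLength and keepTailOf are hypothetical names, and the only assumption taken from the summary is that instructions are matched without regard to their debug location.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical type): the operation
// text plus the source line from its debug location (0 means "no location").
struct ToyInstr {
  std::string Op;
  unsigned Line;
};

// Count how many trailing instructions two blocks share. Only the operation
// is compared; the line is ignored, which is why instructions that came from
// different source lines can still be merged.
std::size_t commonTailLength(const std::vector<ToyInstr> &A,
                             const std::vector<ToyInstr> &B) {
  std::size_t N = 0;
  while (N < A.size() && N < B.size() &&
         A[A.size() - 1 - N].Op == B[B.size() - 1 - N].Op)
    ++N;
  return N;
}

// Form the merged tail by keeping the last N instructions of one block
// unchanged, locations included. Whichever block is picked donates its lines.
std::vector<ToyInstr> keepTailOf(const std::vector<ToyInstr> &Block,
                                 std::size_t N) {
  assert(N <= Block.size() && "tail cannot be longer than the block");
  return std::vector<ToyInstr>(Block.end() - static_cast<std::ptrdiff_t>(N),
                               Block.end());
}
```

With the then-block {callq foo, addl} on line 6 and the else-block {callq bar, addl} on line 8, the shared tail is the single addl, and keeping the else-block's copy leaves it attributed to line 8 even when the then-path executes it.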

This issue was first seen with the 3.9 release compiler. It still exists on trunk, but it is masked by the recent changes to SimplifyCFG (SinkThenElseCodeToEnd): if the above IR is passed to clang, the add will already have been sunk by the time it reaches the branch folding pass. It should still be fixed, however, as tail-merging handles cases that SimplifyCFG does not.

This patch removes the debug location from the common tail. As this alone does not cause a new location to be emitted, the testcase uses the -use-unknown-locations flag, which emits line 0 for all instructions with no debug location. The pending line 0 patch (review D24180), which turns "no debug info" at the start of a basic block into line 0, will also emit line 0 for the common tail.
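The effect of the change can be modelled outside LLVM with a few lines. This is only a sketch under stated assumptions: TailInstr and clearTailLocations are hypothetical names, and a Line of 0 stands in for an empty debug location. As in the patch, debug-value pseudo-instructions are left untouched.

```cpp
#include <string>
#include <vector>

// Toy model of an instruction in the common tail (hypothetical type).
struct TailInstr {
  std::string Op;
  bool IsDebugValue; // true for debug-value pseudo-instructions
  unsigned Line;     // 0 models an empty debug location
};

// Drop the source attribution from every real instruction in the tail.
// Debug values are skipped, mirroring the isDebugValue() check in the patch.
void clearTailLocations(std::vector<TailInstr> &Tail) {
  for (TailInstr &MI : Tail)
    if (!MI.IsDebugValue)
      MI.Line = 0;
}
```

After this runs on the tail from the example, the add no longer claims line 8, so nothing downstream can attribute it to the else-part.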

One test required fixing after the change (DebugInfo/COFF/local-variables.ll). This test has an if-then-else with a common-tail. After the change the common-tail no longer has the original debug location which decreases the size of the else (the end of the else is now the same as the end of the second inline site, i.e. the original labels inline_site2_end and else_end are the same and have been merged in the test). This also changes the size of various ranges.

OK to commit?

Thanks,
Rob.

Diff Detail

Repository: rL LLVM

Event Timeline

rob.lougher updated this revision to Diff 75010. Oct 18 2016, 11:33 AM

rob.lougher retitled this revision from to Remove debug location from common tail when tail-merging.

rob.lougher updated this object.

rob.lougher added reviewers: dblaikie, danielcdh, probinson, wolfgangp, aprantl.

rob.lougher added subscribers: llvm-commits, andreadb, gbedwell.

This approach seems generally fine, but I have one question:

If the code were on a single line, and both locations share a common ancestor scope, it seems to make sense to create a new location using the common ancestor scope and line and only remove the column information.

How about adding an API to DebugLoc to merge two DebugLocs to handle situations like this? I could imagine that this happens in multiple places in the optimizer.
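A rough sketch of what such a merge rule might look like, written against a toy location type rather than DebugLoc. ToyLoc and mergeLocs are hypothetical names, and the Scope field is assumed to already hold the id of the nearest common ancestor scope, which a real implementation would have to compute.

```cpp
// Toy source location (hypothetical type, not llvm::DebugLoc).
struct ToyLoc {
  unsigned Line;   // 0 means unknown
  unsigned Column; // 0 means unknown
  int Scope;       // id of the common ancestor scope, -1 if there is none
};

// Merge two locations as suggested above: if both share a scope and a line,
// keep the scope and line and drop the column when the columns differ.
// Otherwise fall back to an unknown location.
ToyLoc mergeLocs(const ToyLoc &A, const ToyLoc &B) {
  if (A.Scope != B.Scope || A.Line != B.Line)
    return ToyLoc{0, 0, -1};
  unsigned Column = (A.Column == B.Column) ? A.Column : 0;
  return ToyLoc{A.Line, Column, A.Scope};
}
```

For the testcase in the summary the two adds sit on lines 6 and 8, so this rule would still yield an unknown location. It only changes the outcome when the whole if-then-else is written on one line.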

In D25742#573352, @aprantl wrote:

This approach seems generally fine, but I have one question:

If the code were on a single line, and both locations share a common ancestor scope, it seems to make sense to create a new location using the common ancestor scope and line and only remove the column information.

That would collapse the if-then-else into (effectively) a single statement. That probably works okay for a debugger but not profiling, which still wants to treat the then/else as distinct. And, after the tail merging, the tails are no longer distinct.

In D25742#573401, @probinson wrote:

In D25742#573352, @aprantl wrote:

This approach seems generally fine, but I have one question:

If the code were on a single line, and both locations share a common ancestor scope, it seems to make sense to create a new location using the common ancestor scope and line and only remove the column information.

That would collapse the if-then-else into (effectively) a single statement. That probably works okay for a debugger but not profiling, which still wants to treat the then/else as distinct. And, after the tail merging, the tails are no longer distinct.

I'm not sure I understand your point. How is having an orphaned add instruction preferable over having it associated with the collapsed if-then-else statement? Wouldn't I want that instruction to be counted towards the line?

In D25742#573687, @aprantl wrote:

In D25742#573401, @probinson wrote:

In D25742#573352, @aprantl wrote:

This approach seems generally fine, but I have one question:

If the code were on a single line, and both locations share a common ancestor scope, it seems to make sense to create a new location using the common ancestor scope and line and only remove the column information.

That would collapse the if-then-else into (effectively) a single statement. That probably works okay for a debugger but not profiling, which still wants to treat the then/else as distinct. And, after the tail merging, the tails are no longer distinct.

I'm not sure I understand your point. How is having an orphaned add instruction preferable over having it associated with the collapsed if-then-else statement? Wouldn't I want that instruction to be counted towards the line?

No, the profiler wants to assign sample counts to each block individually. By giving each block the same source attribution, you assert that they have the same profiles. That's unlikely to be true in practice. Really what happens is that the sample counts would be assigned to the parent block, which doesn't help the profiler sort out what to do with the nested blocks.

Admittedly, zapping the source attribution on the merged (parts of the) blocks doesn't let you attribute those counts to *any* block, but at least you aren't attributing counts incorrectly.

Robert may want to correct some of my assumptions here, but this is my understanding.

In D25742#573702, @probinson wrote:

In D25742#573687, @aprantl wrote:

In D25742#573401, @probinson wrote:

In D25742#573352, @aprantl wrote:

This approach seems generally fine, but I have one question:

If the code were on a single line, and both locations share a common ancestor scope, it seems to make sense to create a new location using the common ancestor scope and line and only remove the column information.

That would collapse the if-then-else into (effectively) a single statement. That probably works okay for a debugger but not profiling, which still wants to treat the then/else as distinct. And, after the tail merging, the tails are no longer distinct.

I'm not sure I understand your point. How is having an orphaned add instruction preferable over having it associated with the collapsed if-then-else statement? Wouldn't I want that instruction to be counted towards the line?

No, the profiler wants to assign sample counts to each block individually. By giving each block the same source attribution, you assert that they have the same profiles. That's unlikely to be true in practice. Really what happens is that the sample counts would be assigned to the parent block, which doesn't help the profiler sort out what to do with the nested blocks.

Admittedly, zapping the source attribution on the merged (parts of the) blocks doesn't let you attribute those counts to *any* block, but at least you aren't attributing counts incorrectly.

Yes, as far as PGO is concerned, zapping the source attribution is much, much preferable. The sample loader uses the maximum instruction weight within the basic block as the block weight, so not all of the instructions need to have been hit. This means zapping the source attribution will have no effect on the weight of the correct block, but will stop the other block from being incorrectly counted. In the testcase above, imagine the function was called 100 times and the two sides of the if-then-else were executed 50/50. The if-block would get a weight of "50" from the call to foo, but the else-block would get a weight of "100" from the add, so it would look like the else-part was executed twice as many times as the if-part. This may lead to incorrect decisions when re-ordering blocks. Zapping the debug info on the common tail will lead to the correct weights.

Robert may want to correct some of my assumptions here, but this is my understanding.
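The 50/50 example can be worked through with a small standalone model. This is a sketch, not the sample loader: Sampled, collectLineSamples and blockWeight are hypothetical names, and taking the maximum count per source line before computing the per-block maximum is an assumption made for illustration.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// One machine instruction as the profiler sees it (hypothetical type):
// the line it is attributed to (0 = no location) and its sample count.
struct Sampled {
  unsigned Line;
  unsigned Hits;
};

using LineSamples = std::map<unsigned, unsigned>;

// Assumed aggregation: each line gets the largest count of any instruction
// attributed to it; instructions without a location contribute nothing.
LineSamples collectLineSamples(const std::vector<Sampled> &Instrs) {
  LineSamples Samples;
  for (const Sampled &I : Instrs) {
    if (I.Line == 0)
      continue;
    unsigned &Slot = Samples[I.Line];
    Slot = std::max(Slot, I.Hits);
  }
  return Samples;
}

// Block weight = maximum weight of any instruction in the block, looked up
// through the source lines of the block's instructions.
unsigned blockWeight(const std::vector<unsigned> &BlockLines,
                     const LineSamples &Samples) {
  unsigned Weight = 0;
  for (unsigned Line : BlockLines) {
    auto It = Samples.find(Line);
    if (It != Samples.end())
      Weight = std::max(Weight, It->second);
  }
  return Weight;
}
```

Feeding in {line 6: 50, line 8: 50, line 8: 100} for the stale tail location gives the if-block a weight of 50 and the else-block a weight of 100. Moving the add to line 0, i.e. {line 6: 50, line 8: 50, line 0: 100}, gives both blocks a weight of 50.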

ping...

Thanks.

This revision is now accepted and ready to land. Oct 25 2016, 10:04 AM

Closed by commit rL285093: Remove debug location from common tail when tail-merging (authored by rlougher). Oct 25 2016, 11:53 AM

This revision was automatically updated to reflect the committed changes.

MatzeB added a subscriber: MatzeB. Oct 25 2016, 12:42 PM

MatzeB added inline comments.

llvm/trunk/lib/CodeGen/BranchFolding.cpp
899–902	You could avoid the extra loop by putting this code into mergeOperations() (line 770 already has a check for isDebugValue()).

I've had to revert this as it caused a ubsan test to unexpectedly fail (vptr.cpp). As I don't know much about these tests I need to do some investigation into the failure. I suspect the test was reliant on a debug location from a common-tail.

Diffusion mentioned this in rL285208: [ubsan] Fix vptr.cpp test to be more resilient. NFC. Oct 26 2016, 9:12 AM

twoh mentioned this in D30226: [BranchFolding] Merge debug locations from common tail instead of removing. Feb 21 2017, 2:39 PM

twoh mentioned this in D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y). Feb 22 2017, 8:42 AM

twoh mentioned this in rL297805: [BranchFolding] Merge debug locations from common tail instead of removing. Mar 14 2017, 10:57 PM

Revision Contents

Path                                                Size
llvm/trunk/lib/CodeGen/BranchFolding.cpp            7 lines
llvm/trunk/test/DebugInfo/COFF/local-variables.ll   10 lines
llvm/trunk/test/DebugInfo/X86/tail-merge.ll         76 lines

Diff 75753

llvm/trunk/lib/CodeGen/BranchFolding.cpp

[714 lines not shown; enclosing context: if (t <= TimeEstimate) {]
       commonTailIndex = i;
     }
   }

   MachineBasicBlock::iterator BBI =
     SameTails[commonTailIndex].getTailStartPos();
   MachineBasicBlock *MBB = SameTails[commonTailIndex].getBlock();

-  // If the common tail includes any debug info we will take it pretty
-  // randomly from one of the inputs. Might be better to remove it?
   DEBUG(dbgs() << "\nSplitting BB#" << MBB->getNumber() << ", size "
                << maxCommonTailLength);

   // If the split block unconditionally falls-thru to SuccBB, it will be
   // merged. In control flow terms it should then take SuccBB's name. e.g. If
   // SuccBB is an inner loop, the common tail is still part of the inner loop.
   const BasicBlock *BB = (SuccBB && MBB->succ_size() == 1) ?
     SuccBB->getBasicBlock() : MBB->getBasicBlock();
[160 lines not shown; enclosing context: if (commonTailIndex == SameTails.size() ||]
       }
     }

     MachineBasicBlock *MBB = SameTails[commonTailIndex].getBlock();

     // Recompute common tail MBB's edge weights and block frequency.
     setCommonTailEdgeWeights(*MBB);

+    // Remove the original debug location from the common tail.
+    for (auto &MI : *MBB)
+      if (!MI.isDebugValue())
+        MI.setDebugLoc(DebugLoc());
+
     // MBB is common tail. Adjust all other BB's to jump to this one.
     // Traversal must be forwards so erases work.
     DEBUG(dbgs() << "\nUsing common tail in BB#" << MBB->getNumber()
                  << " for ");
     for (unsigned int i=0, e = SameTails.size(); i != e; ++i) {
       if (commonTailIndex == i)
         continue;
       DEBUG(dbgs() << "BB#" << SameTails[i].getBlock()->getNumber()
[1,039 lines not shown]

Inline comment on the added loop (MatzeB): You could avoid the extra loop by putting this code into mergeOperations() (line 770 already has a check for isDebugValue()).

llvm/trunk/test/DebugInfo/COFF/local-variables.ll

[57 lines not shown]
 ; ASM: [[inline_site2:\.Ltmp.*]]:
 ; ASM: .cv_inline_site_id 2 within 0 inlined_at 1 14 5
 ; ASM: .cv_loc 2 1 4 7 # t.cpp:4:7
 ; ASM: movl $3, 48(%rsp)
 ; ASM: leaq 48(%rsp), %rcx
 ; ASM: .cv_loc 2 1 5 3 # t.cpp:5:3
 ; ASM: callq capture
 ; ASM: leaq 36(%rsp), %rcx
-; ASM: [[inline_site2_end:\.Ltmp.*]]:
+; ASM: [[else_end:\.Ltmp.*]]:
 ; ASM: .LBB0_3: # %if.end
-; ASM: .cv_loc 0 1 15 5 # t.cpp:15:5
 ; ASM: callq capture
-; ASM: [[else_end:\.Ltmp.*]]:
 ; ASM: .cv_loc 0 1 17 1 # t.cpp:17:1
 ; ASM: nop
 ; ASM: addq $56, %rsp
 ; ASM: retq
 ; ASM: [[param_end:\.Ltmp.*]]:

 ; ASM: .short 4414 # Record kind: S_LOCAL
 ; ASM: .long 116 # TypeIndex
[17 lines not shown]
 ; ASM: .asciz "v"
 ; ASM: .cv_def_range [[inline_site1]] [[else_start]], "E\021O\001\000\000,\000\000\000"
 ; ASM: .short 4430 # Record kind: S_INLINESITE_END
 ; ASM: .short 4429 # Record kind: S_INLINESITE
 ; ASM: .short 4414 # Record kind: S_LOCAL
 ; ASM: .long 116 # TypeIndex
 ; ASM: .short 0 # Flags
 ; ASM: .asciz "v"
-; ASM: .cv_def_range [[inline_site2]] [[inline_site2_end]], "E\021O\001\000\0000\000\000\000"
+; ASM: .cv_def_range [[inline_site2]] [[else_end]], "E\021O\001\000\0000\000\000\000"
 ; ASM: .short 4430 # Record kind: S_INLINESITE_END

 ; OBJ: Subsection [
 ; OBJ: SubSectionType: Symbols (0xF1)
 ; OBJ: ProcStart {
 ; OBJ: DisplayName: f
 ; OBJ: LinkageName: f
 ; OBJ: }
[41 lines not shown]
 ; OBJ: DefRangeRegisterRel {
 ; OBJ: BaseRegister: 335
 ; OBJ: HasSpilledUDTMember: No
 ; OBJ: OffsetInParent: 0
 ; OBJ: BasePointerOffset: 36
 ; OBJ: LocalVariableAddrRange {
 ; OBJ: OffsetStart: .text+0x2D
 ; OBJ: ISectStart: 0x0
-; OBJ: Range: 0x24
+; OBJ: Range: 0x1F
 ; OBJ: }
 ; OBJ: }
 ; OBJ: InlineSite {
 ; OBJ: PtrParent: 0x0
 ; OBJ: PtrEnd: 0x0
 ; OBJ: Inlinee: will_be_inlined (0x1002)
 ; OBJ: BinaryAnnotations [
 ; OBJ: ChangeLineOffset: 1
[24 lines not shown]
 ; OBJ: InlineSite {
 ; OBJ: PtrParent: 0x0
 ; OBJ: PtrEnd: 0x0
 ; OBJ: Inlinee: will_be_inlined (0x1002)
 ; OBJ: BinaryAnnotations [
 ; OBJ: ChangeLineOffset: 1
 ; OBJ: ChangeCodeOffset: 0x35
 ; OBJ: ChangeCodeOffsetAndLineOffset: {CodeOffset: 0xD, LineOffset: 1}
-; OBJ: ChangeCodeLength: 0xA
+; OBJ: ChangeCodeLength: 0xF
 ; OBJ: ]
 ; OBJ: }
 ; OBJ: Local {
 ; OBJ: Type: int (0x74)
 ; OBJ: Flags [ (0x0)
 ; OBJ: ]
 ; OBJ: VarName: v
 ; OBJ: }
[107 lines not shown]

llvm/trunk/test/DebugInfo/X86/tail-merge.ll

; RUN: llc %s -mtriple=x86_64-unknown-unknown -use-unknown-locations=true -o - | FileCheck %s

; Generated with "clang -gline-tables-only -c -emit-llvm -o - | opt -sroa -S"
; from source:
;
; extern int foo(int);
; extern int bar(int);
;
; int test(int a, int b) {
;   if(b)
;     a += foo(a);
;   else
;     a += bar(a);
;   return a;
; }

; When tail-merging the debug location of the common tail should be removed.

; CHECK-LABEL: test:
; CHECK: movl %edi, [[REG:%.*]]
; CHECK: testl %esi, %esi
; CHECK: je [[ELSE:.LBB[0-9]+_[0-9]+]]
; CHECK: .loc 1 6 10
; CHECK: callq foo
; CHECK: jmp [[TAIL:.LBB[0-9]+_[0-9]+]]
; CHECK: [[ELSE]]:
; CHECK: .loc 1 8 10
; CHECK: callq bar
; CHECK: [[TAIL]]:
; CHECK: .loc 1 0 0
; CHECK: addl [[REG]], %eax
; CHECK: .loc 1 9 3

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define i32 @test(i32 %a, i32 %b) !dbg !6 {
entry:
  %tobool = icmp ne i32 %b, 0, !dbg !8
  br i1 %tobool, label %if.then, label %if.else, !dbg !8

if.then:                                          ; preds = %entry
  %call = call i32 @foo(i32 %a), !dbg !9
  %add = add nsw i32 %a, %call, !dbg !10
  br label %if.end, !dbg !11

if.else:                                          ; preds = %entry
  %call1 = call i32 @bar(i32 %a), !dbg !12
  %add2 = add nsw i32 %a, %call1, !dbg !13
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %a.addr.0 = phi i32 [ %add, %if.then ], [ %add2, %if.else ]
  ret i32 %a.addr.0, !dbg !14
}

declare i32 @foo(i32)
declare i32 @bar(i32)

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "", isOptimized: false, runtimeVersion: 0, emissionKind: LineTablesOnly, enums: !2)
!1 = !DIFile(filename: "test.c", directory: "")
!2 = !{}
!3 = !{i32 2, !"Dwarf Version", i32 4}
!4 = !{i32 2, !"Debug Info Version", i32 3}
!6 = distinct !DISubprogram(name: "test", scope: !1, file: !1, line: 4, type: !7, isLocal: false, isDefinition: true, scopeLine: 4, flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
!7 = !DISubroutineType(types: !2)
!8 = !DILocation(line: 5, column: 6, scope: !6)
!9 = !DILocation(line: 6, column: 10, scope: !6)
!10 = !DILocation(line: 6, column: 7, scope: !6)
!11 = !DILocation(line: 6, column: 5, scope: !6)
!12 = !DILocation(line: 8, column: 10, scope: !6)
!13 = !DILocation(line: 8, column: 7, scope: !6)
!14 = !DILocation(line: 9, column: 3, scope: !6)