This is an archive of the discontinued LLVM Phabricator instance.

Remove debug location from common tail when tail-merging
ClosedPublic

Authored by rob.lougher on Oct 18 2016, 11:33 AM.

Details

Summary

Consider the following simple if-then-else:

define i32 @test(i32 %a, i32 %b) !dbg !6 {
entry:
  %tobool = icmp ne i32 %b, 0, !dbg !8
  br i1 %tobool, label %if.then, label %if.else, !dbg !8

if.then:                                          ; preds = %entry
  %call = call i32 @foo(i32 %a), !dbg !9
  %add = add nsw i32 %a, %call, !dbg !10
  br label %if.end, !dbg !11

if.else:                                          ; preds = %entry
  %call1 = call i32 @bar(i32 %a), !dbg !12
  %add2 = add nsw i32 %a, %call1, !dbg !13
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %a.addr.0 = phi i32 [ %add, %if.then ], [ %add2, %if.else ]
  ret i32 %a.addr.0, !dbg !14
}

With debug line information as follows:

!8 = !DILocation(line: 5, column: 6, scope: !6)
!9 = !DILocation(line: 6, column: 10, scope: !6)
!10 = !DILocation(line: 6, column: 7, scope: !6)
!11 = !DILocation(line: 6, column: 5, scope: !6)
!12 = !DILocation(line: 8, column: 10, scope: !6)
!13 = !DILocation(line: 8, column: 7, scope: !6)
!14 = !DILocation(line: 9, column: 3, scope: !6)

If this is passed to llc the branch folding pass will merge the two add instructions into a common tail (.LBB0_3):

test:                                   # @test
        .loc    1 4 0                   # test.c:4:0
        pushq   %rbx
        movl    %edi, %ebx
        .loc    1 5 6 prologue_end      # test.c:5:6
        testl   %esi, %esi
        je      .LBB0_2
        .loc    1 6 10                  # test.c:6:10
        callq   foo
        jmp     .LBB0_3
.LBB0_2:                                # %if.else
        .loc    1 8 10                  # test.c:8:10
        callq   bar
.LBB0_3:                                # %if.end
        addl    %ebx, %eax
        .loc    1 9 3                   # test.c:9:3
        popq    %rbx
        retq

However, the common tail retains the debug information from its original location (randomly taken from one of the merge inputs). In this case the else-part has been taken and the add appears to occur at line 8. This is a problem for sample-based PGO as the else-part will now appear to have been executed irrespective of which side of the if-then-else was actually taken. It will also affect the optimized debugging experience leading to odd stepping in the debugger.

This issue was first seen with the 3.9 release compiler. The issue still exists on trunk, however it is masked by the recent changes to simplify CFG (SinkThenElseCodeToEnd). If the above IR is passed to clang, the add will already have been sunk by the time it gets to the branch folding pass. However it should be fixed as tail-merging will handle cases not handled by simplify CFG.

This patch removes the debug location from the common tail. As this does not cause a new location to be emitted the testcase forces line 0 by using the -use-unknown-locations flag to use line 0 for all instructions with no debug locations. The pending line 0 patch (review D24180), which turns "no debug info" at the start of a basic-block into line 0 will also emit line 0 for the common-tail.

One test required fixing after the change (DebugInfo/COFF/local-variables.ll). This test has an if-then-else with a common-tail. After the change the common-tail no longer has the original debug location which decreases the size of the else (the end of the else is now the same as the end of the second inline site, i.e. the original labels inline_site2_end and else_end are the same and have been merged in the test). This also changes the size of various ranges.

OK to commit?

Thanks,
Rob.

Diff Detail

Event Timeline

rob.lougher retitled this revision from to Remove debug location from common tail when tail-merging.
rob.lougher updated this object.
aprantl edited edge metadata.Oct 18 2016, 11:41 AM

This approach seems generally fine, but I have one question:

If the code were on a single line, and both locations share a common ancestor scope, it seems make sense to create a new location using the common ancestor scope and line and only remove the column information.

How about adding an API to DebugLoc to merge two DebugLocs to handle situations like this? I could imagine that this happens in multiple places in the optimizer.

probinson edited edge metadata.Oct 18 2016, 12:25 PM

This approach seems generally fine, but I have one question:

If the code were on a single line, and both locations share a common ancestor scope, it seems make sense to create a new location using the common ancestor scope and line and only remove the column information.

That would collapse the if-then-else into (effectively) a single statement. That probably works okay for a debugger but not profiling, which still wants to treat the then/else as distinct. And, after the tail merging, the tails are no longer distinct.

This approach seems generally fine, but I have one question:

If the code were on a single line, and both locations share a common ancestor scope, it seems make sense to create a new location using the common ancestor scope and line and only remove the column information.

That would collapse the if-then-else into (effectively) a single statement. That probably works okay for a debugger but not profiling, which still wants to treat the then/else as distinct. And, after the tail merging, the tails are no longer distinct.

I'm not sure I understand your point. How is having an orphaned add instruction preferable over having it associated with the collapsed if-then-else statement? Wouldn't I want that instruction to be counted towards the line?

This approach seems generally fine, but I have one question:

If the code were on a single line, and both locations share a common ancestor scope, it seems make sense to create a new location using the common ancestor scope and line and only remove the column information.

That would collapse the if-then-else into (effectively) a single statement. That probably works okay for a debugger but not profiling, which still wants to treat the then/else as distinct. And, after the tail merging, the tails are no longer distinct.

I'm not sure I understand your point. How is having an orphaned add instruction preferable over having it associated with the collapsed if-then-else statement? Wouldn't I want that instruction to be counted towards the line?

No, the profiler wants to assign sample counts to each block individually. By giving each block the same source attribution, you assert that they have the same profiles. That's unlikely to be true in practice. Really what happens is that the sample counts would be assigned to the parent block, which doesn't help the profiler sort out what to do with the nested blocks.

Admittedly, zapping the source attribution on the merged (parts of the) blocks doesn't let you attribute those counts to *any* block, but at least you aren't attributing counts incorrectly.

Robert may want to correct some of my assumptions here, but this is my understanding.

This approach seems generally fine, but I have one question:

If the code were on a single line, and both locations share a common ancestor scope, it seems make sense to create a new location using the common ancestor scope and line and only remove the column information.

That would collapse the if-then-else into (effectively) a single statement. That probably works okay for a debugger but not profiling, which still wants to treat the then/else as distinct. And, after the tail merging, the tails are no longer distinct.

I'm not sure I understand your point. How is having an orphaned add instruction preferable over having it associated with the collapsed if-then-else statement? Wouldn't I want that instruction to be counted towards the line?

No, the profiler wants to assign sample counts to each block individually. By giving each block the same source attribution, you assert that they have the same profiles. That's unlikely to be true in practice. Really what happens is that the sample counts would be assigned to the parent block, which doesn't help the profiler sort out what to do with the nested blocks.

Admittedly, zapping the source attribution on the merged (parts of the) blocks doesn't let you attribute those counts to *any* block, but at least you aren't attributing counts incorrectly.

Yes, as far as PGO is concerned zapping the source attribution is much much preferable. The sample loader uses the maximum instruction weight within the basic-block as the block weight so not all of the instructions needs to have been hit. This means zapping the source attribution will have no affect on the weight of the correct block but will stop the other block being incorrectly counted. In the testcase above imagine the function was called 100 times and both sides of the if-then-else was executed 50/50. The if-block would get a weight of "50" from the call to foo but the else-block would get a weight of "100" from the add and so it would look like the else-part was executed twice as many times as the if-part - this may lead to incorrect decisions re-order blocks. Zapping the debug info on the common-tail will lead to the correct weights.

Robert may want to correct some of my assumptions here, but this is my understanding.

aprantl accepted this revision.Oct 25 2016, 10:04 AM
aprantl edited edge metadata.

Thanks.

This revision is now accepted and ready to land.Oct 25 2016, 10:04 AM
This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.Oct 25 2016, 12:42 PM
MatzeB added inline comments.
llvm/trunk/lib/CodeGen/BranchFolding.cpp
899–902 ↗(On Diff #75753)

You could avoid the extra loop by putting this code into mergeOperations() (line 770 already has a check for isDebugValue()).

I've had to revert this as it caused a ubsan test to unexpectedly fail (vptr.cpp). As I don't know much about these tests I need to do some investigation into the failure. I suspect the test was reliant on a debug location from a common-tail.