This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Don't set debug location when folding through a phi node
ClosedPublic

Authored by rob.lougher on Nov 2 2016, 12:29 PM.

Details

Summary

Consider the following simple if-then-else:

define i32 @test(i32 %a, i32 %b) !dbg !6 {
entry:
  %tobool = icmp ne i32 %a, 0, !dbg !8
  br i1 %tobool, label %if.then, label %if.else, !dbg !8

if.then:                                          ; preds = %entry
  %call = call i32 @foo(), !dbg !9
  %sub = sub nsw i32 %b, %call, !dbg !10
  br label %if.end, !dbg !11

if.else:                                          ; preds = %entry
  %call1 = call i32 @bar(), !dbg !12
  %sub2 = sub nsw i32 %b, %call1, !dbg !13
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %b.addr.0 = phi i32 [ %sub, %if.then ], [ %sub2, %if.else ]
  ret i32 %b.addr.0, !dbg !14
}

With the source location of the sub instructions described by:

!10 = !DILocation(line: 10, column: 7, scope: !6)
!13 = !DILocation(line: 12, column: 7, scope: !6)

If this is passed to InstCombine, the two sub instructions feeding into the phi node will be combined into a single sub with an incoming phi node:

if.end:                                           ; preds = %if.else, %if.then
  %call.pn = phi i32 [ %call, %if.then ], [ %call1, %if.else ]
  %b.addr.0 = sub nsw i32 %b, %call.pn, !dbg !11
  ret i32 %b.addr.0, !dbg !12

However, the combined sub has been given the debug location of the first instruction:

!11 = !DILocation(line: 10, column: 7, scope: !5)

This is a problem for sample-based PGO as hits on the sub instruction will be counted towards the if-part of the if-then-else even when the else-part was executed (which may lead to incorrect decisions to re-order blocks). It will also affect the optimized debugging experience leading to odd stepping in the debugger.

This patch fixes the issue by removing the line that sets the common instruction debug location (as the location is now ambiguous).

In addition to binary operations, instcombine will also fold various others such as casts, loads and getelementptr through a phi node (in total, InstCombinePHI.cpp contains 7 calls to set the debug location on the combined instruction). The patch removes all of these. A test is included which contains 7 tests that check each of these cases.

OK to commit? Although the situation is the same for each case, I can split the patch and commit each change separately if preferred.

Thanks,
Rob.

Diff Detail

Event Timeline

rob.lougher retitled this revision from to [InstCombine] Don't set debug location when folding through a phi node.
rob.lougher updated this object.
aprantl edited edge metadata.Nov 2 2016, 12:56 PM

Shouldn't it only drop the location if the two locations are distinct (and perhaps add a discriminator)?

I mentioned this previously: I think it would be great to add an API that goes something like:
DILocation mergeDebugLoc(DILocation A, DILocation B)
that is invoked in cases like this and then decides what to do depending on the two incoming locations.

  • adrian

Shouldn't it only drop the location if the two locations are distinct (and perhaps add a discriminator)?

Yes, I thought about that. As the two instructions feed into a phi node they are in different basic-blocks, so we're back to talking about the situation where we have an if-then-else all on a single line. In this case we could create a new debug location with a new discriminator (different scope). I don't know the debug info code well enough to do this at the moment...

I mentioned this previously: I think it would be great to add an API that goes something like:
DILocation mergeDebugLoc(DILocation A, DILocation B)
that is invoked in cases like this and then decides what to do depending on the two incoming locations.

That would be nice.

  • adrian

Shouldn't it only drop the location if the two locations are distinct (and perhaps add a discriminator)?

Sorry, didn't make myself clear in the last comment. As the two instructions feed into a phi node they are in different basic-blocks, so the two locations must be distinct (they are in different scopes and will have different discriminators). But in the case where we have an if-then-else all on the same line we could create a new debug location with a different scope/discriminator.

Shouldn't it only drop the location if the two locations are distinct (and perhaps add a discriminator)?

Sorry, didn't make myself clear in the last comment. As the two instructions feed into a phi node they are in different basic-blocks, so the two locations must be distinct (they are in different scopes and will have different discriminators). But in the case where we have an if-then-else all on the same line we could create a new debug location with a different scope/discriminator.

Yes, that sounds reasonable.

I think the single-line situation is not just an edge case and we should handle it correctly. In C++11 it is becoming quite common to have a lot of control flow on a single line (think anything from <algorithm> with lambdas or the ternary operator).

danielcdh edited edge metadata.Nov 2 2016, 1:40 PM

I'm not sure if I understand why you want to add a discriminator here. Basically we should only add a discriminator in the AddDiscriminators.cpp, or when there is code duplication that we want to record.

Dehao

probinson edited edge metadata.Nov 2 2016, 2:28 PM

I'm not sure if I understand why you want to add a discriminator here. Basically we should only add a discriminator in the AddDiscriminators.cpp, or when there is code duplication that we want to record.

Dehao

DWARF specifies the discriminator as a way to distinguish instructions in different blocks that are attributed to the same source location. We are hypothetically starting with two instructions in different blocks with the same source location, so those two instructions ought to have two different discriminators already. We are then creating a new instruction in a third block, and if that instruction preserves the original source location, it ought to have a third discriminator value.

Normally, the two original instructions will have different source locations, in which case erasing the source attribution on the new instruction is the only really viable solution (because we can't attribute the source correctly).

From the implementation point of view, the map from source location to the maximum discriminator that source location has, is only maintained in AddDiscriminators.cpp. So at later optimizations, you cannot find the maximum discriminator for that location, and assign max_discriminator+1 to the moved instruction.

Any updates on this patch?

We actually hit a point that we need this patch to unlock some performance improvements in sample pgo.

Thanks,
Dehao

No update. I looked into handling the single line if-then-else case and concluded it was not easy (as Dehao says at this stage you do not know the maximum discriminator value, without redoing the work of AddDiscriminators). As others want the patch could I add a TODO for now?

Thanks,
Rob.

dblaikie accepted this revision.Nov 9 2016, 10:14 AM
dblaikie edited edge metadata.

Looks fairly reasonable to me.

The changes seem independent though, so if you're inclined - you could split it up into one change and one test per commit for easier isolation/archaeology/etc.

(presumably we could do the simple thing of "if they're actually the same location, keep it" - so in a build with discriminators we wouldn't keep it (because they'd have distinct discriminators) but otherwise/in a normal debug build we could keep them)

This revision is now accepted and ready to land.Nov 9 2016, 10:14 AM

SGTM

As Adrian mentioned, for lambda, there is nothing that can be done (or very difficult) here to address the lexical scope issue.

Assuming this only applies to ternary case, it should not affect the debugging for the following case:

#1 v = cond ? stmt1: stmt2;

Because though the debug info for stmt1&2 are removed, as they are in the same line with cond, debugger will still stop at this line as it need to execute "cond".

The debugging experience for the following code will be affected:

#1 v = cond ?
#2 stmt1: stmt2;

Without the patch, the debugger will first stop at line #1, and then step to line #2. With the patch, #2 does not exist any more in debug info, so it will be skipped after line #1. But this does not seem important as stopping at line #2 does not help much during debugging process.

But for the following case:

#1 v = cond ?
#2 stmt1:
#3 stmt2;

This patch will definitely improve debugging experience by avoiding stopping at incorrect location.

Comments?

I would still prefer adding an "mergeDebugLoc" API, even if it only returns an empty DebugLoc, so we:

  • make it explicit that the debug location is being intentionally dropped and that this code has been audited for correctly updating debug locations (otherwise the code looks no different from a pass that is just not handling debug info at all)
  • can improve on it later by, e.g., keeping identical DebugLocs

Thanks for all the comments. I'm away for the next couple of days - I'll check back on Monday.

Just a quick note to say I haven't forgotten about this.

My understanding is if we had

#1 v = cond ?
#2 stmt1: stmt2;

Then if an instruction was commoned, we should still have at least one instruction left in the blocks for stmt1 and stmt2? So debugging experience wouldn't be affected as we'll still stop at line 2.

However, from Adrian's last comment, we're now moving away from a purely functional argument, to aesthetics of the code where we want something to say "yes, I've thought about debug info here", even if we simply do nothing.

I'm not against doing this. However, before I submit a revised patch I'd like to run a few things past people.

First, where should a merge API live? I can see two possible places, the DebugLoc or the DILocation class. Get/SetDebugLoc uses DebugLoc, but in DebugLoc.h there's FIXMEs to avoid using it.

Second, assuming we want to merge two locations, the function could be static or non-static:

DILocation mergeDebugLoc(DILocation A, DILocation B)
DILocation merge(DILocation Other)

If we need to merge more than two locations, this can be done iteratively, merging the next one onto the current merged result (hence a non-static function may be better).

As far as revising the patch to use the new API, there is an added complexity. The optimisations are generalised, and the phi node may have any number of input values (not just two as in the example). In this case we will need to iterate over all the incoming values, merging the debug locations together into a single location. In some cases there is already a loop iterating over the values, in other cases a loop will need to be added.

My initial implementation of mergeDebugLoc() would simply return DebugLoc().

Thanks,
Rob.

Just a quick note to say I haven't forgotten about this.

My understanding is if we had

#1 v = cond ?
#2 stmt1: stmt2;

Then if an instruction was commoned, we should still have at least one instruction left in the blocks for stmt1 and stmt2? So debugging experience wouldn't be affected as we'll still stop at line 2.

However, from Adrian's last comment, we're now moving away from a purely functional argument, to aesthetics of the code where we want something to say "yes, I've thought about debug info here", even if we simply do nothing.

I'm not against doing this. However, before I submit a revised patch I'd like to run a few things past people.

First, where should a merge API live? I can see two possible places, the DebugLoc or the DILocation class. Get/SetDebugLoc uses DebugLoc, but in DebugLoc.h there's FIXMEs to avoid using it.

For symmetry with DILocation *DILocation::cloneWithDiscriminator(unsigned Discriminator) const DILocation seems like a natural place.

Second, assuming we want to merge two locations, the function could be static or non-static:

DILocation mergeDebugLoc(DILocation A, DILocation B)
DILocation merge(DILocation Other)

If we need to merge more than two locations, this can be done iteratively, merging the next one onto the current merged result (hence a non-static function may be better).

I assume that both versions would be const? If so I think that the verb shouldn't just be merge as it implies that it modifies the object.

At the call site this would look like this:

I.setDebugLoc(DILocation::mergeDebugLoc(LocA, LocB));
I.setDebugLoc(LocA->merge(LocB));
// More variants:
I.setDebugLoc(DILocation::getMergedLocation(LocA, LocB));
I.setDebugLoc(LocA->mergeWith(LocB));
I.setDebugLoc(LocA->cloneMergedLocation(LocB));

Subjectively I like DILocation::getMergedLocation(LocA, LocB) best, but I'm open to other suggestions.

As far as revising the patch to use the new API, there is an added complexity. The optimisations are generalised, and the phi node may have any number of input values (not just two as in the example). In this case we will need to iterate over all the incoming values, merging the debug locations together into a single location. In some cases there is already a loop iterating over the values, in other cases a loop will need to be added.

Thanks for pointing this out I wasn't aware of that. Let me know if it grows impractical.

My initial implementation of mergeDebugLoc() would simply return DebugLoc().

That's fine. Thanks for looking into this!

  • adrian

Thanks,
Rob.

rob.lougher edited edge metadata.

I have updated the patch to add a new API for merging debug locations. As explained in the comments, the API is currently a stub which simply uses an empy location.

The obvious case to consider is when the two debug locations are the same (e.g. instructions on the same line without discriminators). This could be handled with something like:

static DILocation *getMergedLocation(const DILocation *LocA,
                                      const DILocation *LocB) {
  return LocA.canDiscriminate(LocB) ? nullptr : LocA;
 }

However, canDiscriminate is incomplete as it only checks filename and line, so I decided to leave it as a pure stub to keep the patch simple. A later fix can address this.

Also to keep the patch simple I have avoided refactoring the FoldPHIArgXIntoPHI() functions, opting instead to simply replacing the calls to FirstInst->getDebugLoc() with a call to a helper that returns the merged location for the PHI node.

Please let me know if this revision is OK to commit. As suggested in a previous comment, alhough this is shown as a single patch I'll probably submit it as a series of commits.

Thanks,
Rob.

Ping. Adrian, are you happy with this revision?

aprantl accepted this revision.Dec 13 2016, 11:43 AM
aprantl edited edge metadata.

Yes, thank you!

This revision was automatically updated to reflect the committed changes.
pmatos added a subscriber: pmatos.Oct 19 2017, 1:35 AM

I should point out that r289661, which is part of the 8 patches, does not show up in the review commit list.