This is an archive of the discontinued LLVM Phabricator instance.

[LICM] Try to merge debug locations when sinking
ClosedPublic

Authored by davide on Apr 14 2020, 1:24 PM.

Details

Summary

The current strategy LICM uses when sinking for debuginfo is that picking the debug location of one of the uses.
This causes stepping to be wrong sometimes, as in the example in https://bugs.llvm.org/show_bug.cgi?id=45523 / rdar://problem/61750950 where lldb steps on dead code.

This patch introduces a generalization of getMergedLocation(), getMergedLocations(), that operates on a vector of locations instead of two, and try to merge all them together.
The API is used in LICM to try making the locations more correct

Diff Detail

Event Timeline

davide created this revision.Apr 14 2020, 1:24 PM

Do you have any detailed data to refute the claim in the original comment "given that the inserted loads/stores have little relation to the original loads/stores".

Is this loop over the "original loads and stores"?

& maybe some comments in the test case describing what got merged and why the merged result is the desired one?

llvm/lib/IR/DebugInfoMetadata.cpp
81–82

If there's only 1, probably should return that rather than nullptr?

84

REtrieve the end iterator once, rather than N times - as per the LLVM style guide? https://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop

llvm/lib/Transforms/Scalar/LICM.cpp
2066–2071

Bit unweildy if you have to write a loop and stash all the DILocations in a buffer to start with, maybe a templated API would be in order:

auto DL = DebugLoc(DILocation::getMergedLocation(LoopUses, [&](Instruction *I) {
  return I->getDebugLoc().get();
});

(whether or not this will generalize to the next case of N-way merging, anyone's guess (eg: the next case might need to filter some things out of the container - then this API would be uinadequate because it doesn't provide a means of skipping elements), but seems tidy enough for this case)

Or, I guess - this API could take a generic range (general templated Range type) & use mapped range...

auto DL = DebugLoc(DILocation::getMergedLocation(map_range(LoopUses, [&](Instruction *I) {
  return I->getDebugLoc().get();
});

In that case it might be nice to modify the implementation to tolerate non-random-access ranges:

auto I = Range.begin();
auto E = Range.end();
if (I == E)
  return nullptr;
const DILocation *Result = *I;
++I;
for (; I != E; ++I) {
  Merged = getMergedLocation(Merged, *I);
  if (!Merged)
    break;
}
return Merged;

Do you have any detailed data to refute the claim in the original comment "given that the inserted loads/stores have little relation to the original loads/stores".

Is the wrong line info at -Og outlined in the bug report enough? Or you'd like to have more? I'm not sure how to get these, given the original code to preserve debug info has been committed in 2011, and I'm apparently the first one to formalize this bug :)

Is this loop over the "original loads and stores"?

Yes.

& maybe some comments in the test case describing what got merged and why the merged result is the desired one?

Fair.

davide marked 3 inline comments as done.Apr 14 2020, 2:10 PM
davide added inline comments.
llvm/lib/IR/DebugInfoMetadata.cpp
81–82

Yeah, that's a good point.

84

Indeed.

llvm/lib/Transforms/Scalar/LICM.cpp
2066–2071

Yeah, thanks for the suggestion.

aprantl added inline comments.Apr 14 2020, 2:41 PM
llvm/include/llvm/IR/DebugInfoMetadata.h
1554

Could this be an ArrayRef or MutableArrayRef?

vsk added inline comments.Apr 14 2020, 3:18 PM
llvm/include/llvm/IR/DebugInfoMetadata.h
1554

Nice catch, ArrayRef should work, I think.

davide updated this revision to Diff 257525.Apr 14 2020, 3:30 PM

Most of the comment addressed. I didn't change the API because I'm not necessarily convinced that looping over uses is better than collecting debug values in a loop.
I'm not entirely sure how general this API is going to be, but given we only have a use-case right now, we might re-evaluate later. I'm going to do some more work in this area so probably more scenarios will pop up.

vsk accepted this revision.Apr 14 2020, 4:25 PM

Looks good to me with a minor test update. @dblaikie?

llvm/test/Transforms/LICM/sink-debuginfo-preserve.ll
9

Probably less brittle to match !dbg [[storeLocation:![0-9]+]] and then check [[storeLocation]] = !DILocation(line: 0.

This revision is now accepted and ready to land.Apr 14 2020, 4:25 PM
In D78147#1982329, @vsk wrote:

Looks good to me with a minor test update. @dblaikie?

Fair enough - still a bit of a preference for not having to use extra storage for this operation, but see how it shakes out with practice I guess.

Test case could probably do with some simplification - and include the original source/repro steps to generate the IR in a comment in the test file. Here's as simple as I could get it at least:

volatile int a;
extern int g;
int g;
void f1() { 
  while (a) { 
    g = 0;
    if (a)
      g = 0;
  } 
}

At least it does hit the same line of code, so hopefully it's observable there.

Might be worth explaining validating that the specific "scope" for the chosen DILocation is correct - showing that it picked the nearest enclosing scope of the two stores. (this is existing functionality in the merge location - but showing the N-way merge preserves this too) & maybe even generalizing the test to make sure it exercises the N-way merge (ensuring the size of the vector's at least 3 for this test case? probably by adding another "if (a) g = 0;" maybe? or something like that)

dblaikie accepted this revision.Apr 14 2020, 6:03 PM

Oops, forgot to hit the actual accept thingy.

Looks good, I agree with Vedant about avoiding CHECKing explicit metadata numbers. Plus, the store CHECK is testing for a !tbaa node too, which should be avoided.

davide closed this revision.Apr 15 2020, 12:30 PM

Thank you all for the reviews.

commit 5f87415efc1e57587272944a5f9b6745e4474660 (HEAD -> master, origin/master, origin/HEAD)
Author: Davide Italiano <ditaliano@apple.com>
Date:   Wed Apr 15 12:28:34 2020 -0700

    [LICM] Try to merge debug locations when sinking.
    
    The current strategy LICM uses when sinking for debuginfo is
    that of picking the debug location of one of the uses.
    This causes stepping to be wrong sometimes, see, e.g. PR45523.
    
    This patch introduces a generalization of getMergedLocation(),
    that operates on a vector of locations instead of two, and try
    to merge all them together, and use the new API in LICM.
    
    <rdar://problem/61750950>

Thank you all for the reviews.

commit 5f87415efc1e57587272944a5f9b6745e4474660 (HEAD -> master, origin/master, origin/HEAD)
Author: Davide Italiano <ditaliano@apple.com>
Date:   Wed Apr 15 12:28:34 2020 -0700

    [LICM] Try to merge debug locations when sinking.
    
    The current strategy LICM uses when sinking for debuginfo is
    that of picking the debug location of one of the uses.
    This causes stepping to be wrong sometimes, see, e.g. PR45523.
    
    This patch introduces a generalization of getMergedLocation(),
    that operates on a vector of locations instead of two, and try
    to merge all them together, and use the new API in LICM.
    
    <rdar://problem/61750950>

If you can leave in the "Differential Revision: https://reviews.llvm.org/DXXXX" line in the commit message, it's handy for following from a commit to its review, and lets Phabricator auto-close reviews/point to the resulting commit.