This is an archive of the discontinued LLVM Phabricator instance.

[IRSim][IROutliner] Adding DebugInfo handling for IR outlined functions.
ClosedPublic

Authored by AndrewLitteken on Sep 8 2020, 10:33 AM.

Details

Summary

Including debug instructions from different places in the program could cause issues with debuggers. This makes sure that any debug information about original location in outlined functions is stripped accordingly, and adds new debug information to the outlined functions.

Tests Changed:

  • llvm/test/Transforms/IROutliner/legal-debug.ll

New Tests:

  • llvm/test/DebugInfo/AArch64/ir-outliner.ll

Diff Detail

Event Timeline

AndrewLitteken created this revision.Sep 8 2020, 10:33 AM

I know almost nothing about how DebugInfo is represented in llvm/dwarf, but is there not a way to represent what this transformation is doing in DI? It's always a little sad when you have to drop it, since doing so makes the debug experience worse.

llvm/lib/Transforms/IPO/IROutliner.cpp
408

is the StringRef() ctor call necessary?

447

where does moveFunctionData get called with MoveDebug set the other way?

aprantl added a subscriber: vsk.

Updating for string refs.

vsk added a comment.Sep 16 2020, 9:58 AM

CodeExtractor contains a utility to fix up debug info after extraction occurs (I believe it's called fixupDebugInfoPostExtraction). It's solving the same problem - is there any way to generalize/reuse that logic?

llvm/test/DebugInfo/AArch64/ir-outliner.ll
3

Please validate the debug info metadata in the IR after -iroutliner runs. Validating the final DWARF production is nice, but checking the IR takes precedent (why? well DWARF may not represent everything we care about at the IR level, and DWARF isn't the only debug info format llvm supports).

In D87302#2277131, @vsk wrote:

CodeExtractor contains a utility to fix up debug info after extraction occurs (I believe it's called fixupDebugInfoPostExtraction). It's solving the same problem - is there any way to generalize/reuse that logic?

I believe it already uses it since the CodeExtractor is used to create the new functions in each case, but this is mainly for the function that moves them all together. The problem is that fixupDebugInfoPostExtraction transfers the information from the original function, which would not be correct in this case since we are merging the information from several different code locations into one. This also addresses @jroelofs disappointment, but since I'm not aware of any mechanism to say that one particular line in IR is from two different source code locations, I think it has to be dropped altogether for correctness.

Right now the current mechanism in lldb is simply skipping over the section of code, which isn't ideal, but it is consistent.

vsk added a comment.Jan 3 2021, 5:08 PM
In D87302#2277131, @vsk wrote:

CodeExtractor contains a utility to fix up debug info after extraction occurs (I believe it's called fixupDebugInfoPostExtraction). It's solving the same problem - is there any way to generalize/reuse that logic?

I believe it already uses it since the CodeExtractor is used to create the new functions in each case, but this is mainly for the function that moves them all together. The problem is that fixupDebugInfoPostExtraction transfers the information from the original function, which would not be correct in this case since we are merging the information from several different code locations into one. This also addresses @jroelofs disappointment, but since I'm not aware of any mechanism to say that one particular line in IR is from two different source code locations, I think it has to be dropped altogether for correctness.

I see, thanks for explaining.

Right now the current mechanism in lldb is simply skipping over the section of code, which isn't ideal, but it is consistent.

Sgtm, please carry on!

This updates the diff with all the recent IROutliner updates, and adds testing for verifying the IR DebugInfo.

@vsk Is there other verification that needs to happen to the outlined IR code besides just making sure with Filecheck that it is correct?

vsk added a comment.Feb 1 2021, 9:43 AM

No, that looks reasonable to me. (Although, it is unfortunate that update_test_checks.py hardcodes metadata numbers: this creates more of a maintenance burden than is ideal. It'd be nice to revamp the way the script emits metadata check lines, but that's outside the scope of this patch.)

Is this good to land then?

Just bumping this again, is this good to land?

jroelofs accepted this revision.Jun 9 2021, 8:15 AM

LGTM

This revision is now accepted and ready to land.Jun 9 2021, 8:15 AM

I know almost nothing about how DebugInfo is represented in llvm/dwarf, but is there not a way to represent what this transformation is doing in DI? It's always a little sad when you have to drop it, since doing so makes the debug experience worse.

I know a bit more about llvm di/dwarf, but I am also wondering if there is an idea on how to improve this in terms of Debug Info. In order to support Debug Info for Inlined functions, DWARF was extended to meet the requirements. I guess this will need something like that -- on both MIR and IR versions of the pass. Any thoughts about this?

Other than the comment for the test, this looks reasonable to me.

llvm/test/DebugInfo/AArch64/ir-outliner.ll
170

It is bad practice to hardcode metadata numbers in tests since they change, so please make it more flexible by using regular expressions and the check variables.

I guess this will need something like that -- on both MIR and IR versions of the pass. Any thoughts about this?

I think that's certainly possible. If there was a way in the debug info to say that a specific section of IR is from Section A of the program if called from location A, Section B if called from location B etc that would solve the problem. I don't know enough about DebugInfo to really say how difficult that would be though.

Hi,

I think that's certainly possible. If there was a way in the debug info to say that a specific section of IR is from Section A of the program if called from location A, Section B if called from location B etc that would solve the problem. I don't know enough about DebugInfo to really say how difficult that would be though.

Not a super-DWARF-expert, but I don't believe this can be expressed in DWARF today. It's certainly a significant limitation -- there are numerous scenarios where we "common" instructions together, and drop the line number, because it now reflects operations from multiple parts of program. And none of those line numbers are appropriate to present in isolation.

(It certainly isn't supported by LLVM debuginfo; I've no knowledge of what's supported by the CodeView file format).

Does the outliner only run for commoning code, or can it run on a single function (for hot/cold reasons, or anything else)?

BUt yeah, in the commoning case, DWARF (& LLVM IR) have no way to represent this situation and dropping the location is the best tool we have at the moment. It's discussed here: https://llvm.org/docs/HowToUpdateDebugInfo.html#when-to-drop-an-instruction-location

llvm/lib/Transforms/IPO/IROutliner.cpp
383

Side note: This seems surprising to me - do any other passes create functions with plain names like this? I'd expect anything being created would use something like the C++ reserved identifier namespace (double leading underscore or leading underscore and uppercase).

Does the outliner only run for commoning code, or can it run on a single function (for hot/cold reasons, or anything else)?

BUt yeah, in the commoning case, DWARF (& LLVM IR) have no way to represent this situation and dropping the location is the best tool we have at the moment. It's discussed here: https://llvm.org/docs/HowToUpdateDebugInfo.html#when-to-drop-an-instruction-location

Right now, it's module-only. A single-function outlining pass would be pretty useful though. (E.g. for matching outlinable patterns stored off somewhere else.)

Anyway, as for debug info with outlining, yeah, I think that would need a DWARF extension. (@aprantl might be interested in that topic?) I think that discussion is probably better-suited to an email thread on a few mailing lists though. :)

For the sake of this patch, @AndrewLitteken, I think all you need to do is tidy up the testcase like @djtodoro suggested. (Does update_test_checks.py hardcode metadata stuff like this?)

paquette added inline comments.
llvm/lib/Transforms/IPO/IROutliner.cpp
383

The MachineOutliner does this as well. It generally works okay because of the linkage type on the functions. It's not ideal though. IIRC there was a EuroLLVM talk planned prior to covid which had a better naming scheme.

I think @plotfi was helping upstream some of those patches?

plotfi added inline comments.Jun 10 2021, 2:43 PM
llvm/lib/Transforms/IPO/IROutliner.cpp
383

I was working on a pass for redoing address offset calculations done by GEPs so that more of the offset calculation will have increased IR and MIR similarity, but I don't think the size reduction was significant and there were also some weird pathological cases too. I gave up on it though.

Remade tests with regexed metadata information.

the tests look good to me now, thanks! (some nits included)

llvm/lib/Transforms/IPO/IROutliner.cpp
447

can we use some early exits here?

468

Is this clang-formatted?

AndrewLitteken added inline comments.
llvm/lib/Transforms/IPO/IROutliner.cpp
447

I've reversed the logic so that we check for not a call instruction first, then act on that to decrease the indentation.

468

Most of it is, but this crept in under the radar

lgtm, thanks!

llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp
1989 ↗(On Diff #351881)

I guess this test wasn't intentional? :)

AndrewLitteken added inline comments.Jun 15 2021, 8:12 AM
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp
1989 ↗(On Diff #351881)

Ah yeah, I'll make sure to remove it before committing it

This revision was landed with ongoing or failed builds.Jun 15 2021, 8:58 AM
This revision was automatically updated to reflect the committed changes.
vzakhari added inline comments.
llvm/lib/Transforms/IPO/IROutliner.cpp
475

Since DebugInsts does not get reset for each basic block, we may try to erase instructions that have already been erased. This triggers on my change to preserve the topological order for the return blocks created by CodeExtractor. @AndrewLitteken, do you mind if I move the declaration of DebugInsts into the CurrBB loop body as a part of my change?