This is an archive of the discontinued LLVM Phabricator instance.

[Debugify] Add a pass to test debug info preservation
ClosedPublic

Authored by vsk on Nov 27 2017, 11:55 AM.

Details

Summary

The Debugify pass synthesizes debug info for IR. It's paired with a
CheckDebugify pass which determines how much of the original debug info
is preserved. These passes make it easier to create targeted tests for
debug info preservation.

Here is the Debugify algorithm:

NextLine = 1
for (Instruction &I : M)
  attach DebugLoc(NextLine++) to I

NextVar = 1
for (Instruction &I : M)
  if (canAttachDebugValue(I))
    attach dbg.value(NextVar++) to I

The CheckDebugify pass expects contiguous ranges of DILocations and
DILocalVariables. If it fails to find all of the expected debug info, it
prints a specific error to stderr which can be FileChecked.

Example usage:

opt -debugify -mem2reg -check-debugify | grep "CheckDebugify: PASS"

opt -debugify -loop-reduce -check-debugify | grep "CheckDebugify: FAIL"

Diff Detail

Event Timeline

vsk created this revision.Nov 27 2017, 11:55 AM
davide requested changes to this revision.Nov 27 2017, 1:52 PM
davide added a subscriber: davide.

Thank you Vedant, some comments inline.

lib/Transforms/Utils/Debugify.cpp
60–62

ehm, are you sure this is right?
This will break for code emitted by another frontend.

77–81

is there a way to avoid two scans of the whole module?

148

Why a bitvector?

149–151

You can probably use
F.instructions() here.

173–175

I think you can simplify this writing

for (Instruction &I: F.instructions())

or something [check include/llvm/IR/]

181–183

Can you fold valid inside the assertion and avoid the
(void)Valid hack?

212

I think the 3rd or 4th parameter should be true?
If this preserves everything, also preserves the CFG?

Also, isn't something that preserves everything (i.e. doesn't mutate the IR) technically an analysis?

212–215

INITIALIZE_PASS will suffice.

This revision now requires changes to proceed.Nov 27 2017, 1:52 PM
vsk added a comment.Nov 27 2017, 2:04 PM

Thanks for your comments Davide!

lib/Transforms/Utils/Debugify.cpp
60–62

I really want to say "pick any language" here. This pass ignores IR which already has debug info, so I'm not sure what breaks -- could you clarify?

77–81

These loops could be combined. IMO it's just slightly neater to write the early break in the second loop if it's a separate loop.

148

It seems like a cheap way to track whether or not we've seen each line. I originally wanted to use an IntervalMap, but it has no mechanism to remove a mapping, or to iterate over holes in the mapping.

149–151

Thanks! Will fix.

173–175

Will fix.

181–183

AFAIK we need the hack to avoid -Wunused in non-asserts builds.

212

Thanks for the catch! Do you think this belongs in lib/Analysis?

212–215

Will fix.

davide added inline comments.Nov 27 2017, 2:15 PM
lib/Transforms/Utils/Debugify.cpp
60–62

The scenario I have in mind is the one where you have rust emitting an IR module without debug info and then running this pass on it (where you get a C++11 compile unit).
I guess there's no real to understand which frontend is emitting the IR unless the frontend emits enough metadata for the mid-level optimizer to find out which language is this IR derived from.

Another example is, what if you have code compiled with -std=c++98/-std=c++03, without debuginfo on which you call this pass?
I think in that case is setting C_plus_plus_11 always correct? I'm not sure myself, so please correct me if I'm wrong.

77–81

Fair enough. I'm just slightly worried about LTO as scanning the whole module can become really expensive. I guess we'll measure and see :)

148

Sounds reasonable. I thought you benchmarked vs DenseMap and found it a big win.
FWIW, if you have lots of setrange operations, it definitely is, in fact some algorithms use it as a cheap worklist (e.g. NewGVN).

181–183

even if you write
assert(to_integer() && "patatino") ?

212

Could be (at least in my opinion). I'm not sure whether metadata added are considered something IR transformations, as passes can just ignore them willy-nilly.
I think Chandler is the best person to comment on this one after years on the pass manager design. Ideally, you might also bring this up on llvm-dev.

vsk updated this revision to Diff 124488.Nov 27 2017, 3:46 PM
vsk edited edge metadata.
vsk marked 9 inline comments as done.
vsk retitled this revision from WIP: [Debugify] Add a pass to test debug info preservation to [Debugify] Add a pass to test debug info preservation.
vsk edited the summary of this revision. (Show Details)
  • Hook into the new pass manager.
  • Address review feedback from Davide, except for the bit about moving to lib/Analysis/, which I'll discuss in an inline comment.
lib/Transforms/Utils/Debugify.cpp
60–62

I don't think there's any correct language value we could use here due to the issue you point out. In practice, it shouldn't matter unless you actually try to load a debugified module in a debugger, which is a non-goal. I can keep things simple here by just using DW_LANG_C.

148

Gotcha. I didn't benchmark against DenseMap -- I just visualized 'MissingLines' as a bitmap and went down that route.

181–183

Then we'd need to call to_integer() twice (for the non-asserts and asserts paths), which doesn't seem better.

212

Based on docs/Passes, I think CheckDebugify best fits under the category of Utility passes. It doesn't provide information for other passes and it doesn't transform the IR.

The Debugify pass does technically provide information for another pass, but IMO it doesn't make sense to try to preserve/cache/invalidate its results, so it doesn't quite seem like an analysis either.

probinson added inline comments.
lib/Transforms/Utils/Debugify.cpp
60–62

The pass is not adding true source information, e.g. types, so the language is irrelevant. The goal is to see whether info attached to the IR is preserved, not to emit anything useful. No debugger will ever see this output, and probably not even llvm-dwarfdump.
You could do something more obviously bogus, like DW_LANG_Ada83. We don't have a language code for IR.

test/Transforms/LoopStrengthReduce/debug-info.ll
2

This doesn't seem very useful.

mgrang added a subscriber: mgrang.Nov 27 2017, 4:09 PM
mgrang added inline comments.
lib/Transforms/Utils/Debugify.cpp
163

How about you wrap the "bool Valid" definition and the assert inside #ifndef NDEBUG ? You can then get rid of the (void) Valid.

vsk updated this revision to Diff 124491.Nov 27 2017, 4:21 PM
vsk marked an inline comment as done.
  • Remove an unnecessary test case pointed out by Paul Robinson.
lib/Transforms/Utils/Debugify.cpp
163

I'm not sure how Valid would be set in the asserts-enabled case, then.

test/Transforms/LoopStrengthReduce/debug-info.ll
2

Fair point. There's an easier way to add a negative test -- just run '-debugify -strip-debug -check-debugify' and check that it fails.

fhahn added a subscriber: fhahn.Nov 28 2017, 12:13 AM
MatzeB added a subscriber: MatzeB.Dec 4 2017, 4:33 PM

Thanks this is a great idea!

I assume this will only ever be used via opt, so you could move the pass to llvm/tools/opt to keep the size of the llvm libraries smaller.

Thanks this is a great idea!

I assume this will only ever be used via opt, so you could move the pass to llvm/tools/opt to keep the size of the llvm libraries smaller.

Oh nice! It was making me uncomfortable to have a debugging-only pass that would be just taking up space in a production compiler, but I didn't know opt had (effectively) private passes.

emaste added a subscriber: emaste.Dec 4 2017, 8:42 PM
vsk updated this revision to Diff 125595.Dec 5 2017, 1:09 PM
  • Move Debugify* to tools/opt, which simplifies the patch and avoids bloating llvm. Thanks @MatzeB for the suggestion!
davide accepted this revision.Dec 5 2017, 1:19 PM

This looks good to me modulo minor, happy to see this in if Matthias/Adrian are happy with it.

tools/opt/Debugify.cpp
54 ↗(On Diff #125595)

should this be return true?

143–144 ↗(On Diff #125595)

I wonder whether this should be under DEBUG(), so that we don't clutter the release builds?

This revision is now accepted and ready to land.Dec 5 2017, 1:19 PM
vsk updated this revision to Diff 125597.Dec 5 2017, 1:24 PM
  • Make the return type in a lambda explicit, & a minor tweak to an assert.
tools/opt/Debugify.cpp
54 ↗(On Diff #125595)

This needs to return a DIType, but that could be made clearer. I'll make the return type explicit.

143–144 ↗(On Diff #125595)

I'd rather not do that, since -check-debugify should indicate that the author is interested in messages related to debug info.

aprantl added inline comments.Dec 5 2017, 1:26 PM
tools/opt/Debugify.cpp
79 ↗(On Diff #125595)

Sorry for having this idea so late, but: Would it make sense to combine this with the DebugIR pass, so, instead of assigning arbitrary artificial DILocations, the DILocations actually point back to the original instructions in the IR and are perhaps more useful when debugging a pass?

174 ↗(On Diff #125595)

Doxygen comment explaining the purpose of the pass?

186 ↗(On Diff #125595)

comment?

MatzeB accepted this revision.Dec 5 2017, 1:33 PM

LGTM to go in tree. Comments/nitpicks below.

tools/opt/Debugify.cpp
10–11 ↗(On Diff #125595)

Could use /// \file This pass ... so doxygen picks it up.

38–40 ↗(On Diff #125595)

Maybe write a warning to the user to avoid surprises?

43 ↗(On Diff #125595)

I'm personally no fan of auto if you cannot deduce the type immediately. It's not a deal breaker but I consider it friendlier to readers if types are spelled out as much as possible. With the only exception being auto X = dyn_cast<xxx>() as the xxx type can be seen in the same line or lambdas where the type is also obvious from the RHS of the assignment.

51–52 ↗(On Diff #125595)

I don't know the DIB API, but seeing that createBasicType takes a StringRef makes me wonder if it actually takes the string and copies it or whether we will end up with a reference to a temporary std::string here?

143–146 ↗(On Diff #125595)

I think simply printing is fine for a pass like this. I would probably print to stdout as this feels more like the "normal" output of this pass rather than an error condition (from the perspective of this pass).

If you wanted to you could also take a look at this Pass::print() thing we have as sort of an official interface to have a pass print information as part of opt. Though I don't know for sure whether it would work as intended here (and I would expect it to be little better than directly printing as is done here).

vsk updated this revision to Diff 125608.Dec 5 2017, 1:57 PM
  • Address comments by Adrian and Matthias.
tools/opt/Debugify.cpp
10–11 ↗(On Diff #125595)

Will fix.

38–40 ↗(On Diff #125595)

Will fix.

43 ↗(On Diff #125595)

Will fix.

51–52 ↗(On Diff #125595)

The stringref is internalized by MDString::get, and some canonical string is pulled out (possibly after a copy).

79 ↗(On Diff #125595)

I don't think there's a way to unify the two tools in a way that's simpler than keeping them separate. We need separate code paths to mark-up *.ll files on disk vs. in-memory Modules. Secondly, to implement -check-debugify on DebugIR output, we'd need to do do extra work to store a list of all DILocations in a separate piece of metadata. IMO having a separate code path for DebugIR would be neater and less complicated.

143–146 ↗(On Diff #125595)

Printing to stdout sgtm, but the Pass::print() mechanism seems a bit heavyweight given how this is structured.

174 ↗(On Diff #125595)

Will fix.

186 ↗(On Diff #125595)

Will fix.

MatzeB added inline comments.Dec 5 2017, 2:44 PM
tools/opt/Debugify.cpp
10–11 ↗(On Diff #125595)

Needs three slashes for doxygen :)

vsk marked 21 inline comments as done.Dec 7 2017, 4:45 PM

Thanks everybody for your feedback. I think I've addressed all of it now. I'll commit this tomorrow barring any further comments.

tools/opt/Debugify.cpp
10–11 ↗(On Diff #125595)

Will fix.

79 ↗(On Diff #125595)

Adrian and I had an off-line conversation about this, in which we reached the conclusion that it would be neater to keep the DebugIR code path separate from this pass.

This revision was automatically updated to reflect the committed changes.