This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][InstrRef] Use a depth-first search to reduce the lifetime of tracking information
ClosedPublic

Authored by jmorse on Jan 28 2022, 5:26 AM.

Details

Summary

This patch is technically NFC, but I'm not putting NFC in the title because it shouldn't be considered a trivial change.

Over in D116821, there are some benchmarks where InstrRefBasedLDV significantly increases max-rss during LTO and the like. After thinking about it, it seems a bit stupid that we compute lots of information, store them all in huge tables, then emit them to DBG_VALUE instructions, then free the tables. We can do better: this patch solves the variable value problem in lexical scope depth-first order, allowing blocks where all contained scopes have been processed to be emitted early and the relevant information freed. This isn't going to solve the problem of instruction-referencing producing lots of variable information, but it should make the overhead on top of that stop growing linearly.

I've kept the unordered-explore code available, and it can be selected with a command line flag, to ease debugging in the future. I might figure out a way to make the unit tests use both of these modes, but haven't done that yet.

The approach is to:

  • Make use of the fact LexicalScopes gives a depth-number to each lexical scope,
  • Produce an "ejection map" [0] that identifies the last lexical scope to make use of a block,
  • Enumerate each scope in LexicalScopes' DFS order, solving the variable value problem,
  • After each scope is processed, look for any blocks that won't be used by any other scope, and "eject" them.

Where "ejecting" is translating the variable value information into DBG_VALUE instructions in the block, and freeing any machine-value or variable-value information for that block.

I haven't tested the reproducer posted in D116821 yet, but this (and stacked patches) reduces the instr-ref growth in max-rss in mafft / SPASS from 26%/35% to roughly 10% each. Digging into SPASS, the culprit is the "main" function, which LTO seems to inline a lot of stuff into. As ever, the amount of debug-info is way out of proportion to the actual code, there are some 31k non-debug instructions. After LiveDebugValues, there are ~90k DBG_VALUEs with VarLocBasedLDV, ~135k DBG_VALUEs with InstrRefBasedLDV. The ultimate fix for this is to stream variable information into the DWARF printer when needed, rather than coughing it all up at the same time and creating instructions for it.

Testing: no test added on account of this being a performance patch. I've built stage2 clang RelWithDebinfo with and without this change, and all object files are identical, except for InstrRefBasedImp.cpp.o of course.

[0] This could be called an emission instead of ejecting, but I've written emission in a lot of other places.

Diff Detail

Event Timeline

jmorse created this revision.Jan 28 2022, 5:26 AM
jmorse requested review of this revision.Jan 28 2022, 5:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2022, 5:26 AM

Minor point, but:

This patch is technically NFC, but I'm not putting NFC in the title because it shouldn't be considered a trivial change.

Generally I think it's still valuable to flag such a thing as NFC - it explains why there aren't any test changes and means someone can expect object files to be identical with/without the change (& if they aren't, that's a bug)

I've kept the unordered-explore code available, and it can be selected with a command line flag, to ease debugging in the future. I might figure out a way to make the unit tests use both of these modes, but haven't done that yet.

That worries me a bit - leaving untested code in. Like #if 0 but worse, because it is accessible, but untested. Would be good to either delete it (can be resurrected from source control) or test it...

Once @dblaikie's concerns are addressed, this LGTM.

Would it be worth caching getBlocksForScope results? If I'm reading this right it looks like it's called a few times with what should be the same inputs and outputs in a few places in the pass?

llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
2986

Repeating offline discussion so it's noted on the review -- it looks as though it might be possible to simplify this by removing the HighestDFSIn check and moving the body of this block into the if (ChildNum < (ssize_t)Children.size()) { ... } block below.

jmorse added inline comments.Feb 2 2022, 4:35 AM
llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
2986

I went to implement this and then realised: the block you mention only runs for scopes where there are child scopes, and the "else" block only runs after child scopes are visited. I think this counter/ratchet needs to continue to exist so that scopes are always visited, and in the correct order.

jmorse updated this revision to Diff 405219.Feb 2 2022, 4:41 AM

Strip out dead code as per recommendations.

Given that in the dead path, scopes were visited in a DenseMap order and thus it was nondeterministic, I suppose there's little value in keeping around code to visit scopes "the old way".

Orlando accepted this revision.Feb 2 2022, 5:36 AM

Would it be worth caching getBlocksForScope results? If I'm reading this right it looks like it's called a few times with what should be the same inputs and outputs in a few places in the pass?

wdyt about this? I'm happy for that to come as a separate patch if you agree, and it looks like @dblaikie's comments have been addressed, so LGTM if you make clang-format bot happy.

llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
2986

Aha, SGTM.

This revision is now accepted and ready to land.Feb 2 2022, 5:36 AM
jmorse added a comment.Feb 2 2022, 5:47 AM

Would it be worth caching getBlocksForScope results? If I'm reading this right it looks like it's called a few times with what should be the same inputs and outputs in a few places in the pass?

wdyt about this? I'm happy for that to come as a separate patch if you agree, and it looks like @dblaikie's comments have been addressed, so LGTM if you make clang-format bot happy.

IMO that should happen, but not in this patch, simply out of being conservative -- this patch should be able to make its way into llvm14-rc1, and a follow-up patch caching getBlocksForScope should be able to get into rc2 (IMHO, YMMV).

This revision was landed with ongoing or failed builds.Feb 2 2022, 6:10 AM
This revision was automatically updated to reflect the committed changes.