Page MenuHomePhabricator

[DWARF] Add cutoff guarding validThroughout to avoid near-quadratic behaviour
ClosedPublic

Authored by jmorse on Jul 6 2020, 9:17 AM.

Details

Summary

Occasionally we see absolutely massive basic blocks, typically in global constructors that are vulnerable to heavy inlining. When these blocks are dense with DBG_VALUE instructions, we can hit near quadratic complexity in DwarfDebug's validThroughout function. The problem is caused by:

  • validThroughout having to step through all instructions in the block to examine their lexical scope, and
  • a high proportion of instructions in that block being DBG_VALUEs for a unique variable fragment,

Leading to us stepping through every instruction in the block, for (nearly) each instruction in the block. In the particular sample I'm looking at, there's a block with 120K instructions and maybe two-thirds of them are DBG_VALUEs. Not running validThroughout for this block cuts time in DWARF emission in half (which is many tens of seconds).

By adding this guard, we force variables in this block to use a location list rather than a single-location expression, as shown in the added test . This shouldn't change the meaning of the output DWARF at all: instead we use a less efficient DWARF encoding to avoid a poor-performance code path. In the long term this could be fixed by Orlando's D82129 providing enough instruction ordering information to make validThroughouts checks less complex, but we're not there yet.

The testing technique is shamelessly ripped off from D80662. I've used a set of very-large block pointers rather than calling size() each time, because size() isn't constant-time with ilists.

The default setting of blocks that are over 30,000 instructions long being considered too large isn't determined scientifically; rather, it solves the problem in front of me, and doesn't trigger on a stage2 clang build. Suggestions on a good mechanism to pick this number most welcome.

Diff Detail

Event Timeline

jmorse created this revision.Jul 6 2020, 9:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2020, 9:17 AM
vsk accepted this revision.Jul 6 2020, 12:07 PM

Thanks, lgtm.

This revision is now accepted and ready to land.Jul 6 2020, 12:07 PM

Could the algorithm be changed to do validThroughout of all variable fragments in a single pass together?

aprantl added inline comments.Jul 6 2020, 4:43 PM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
598

Not very important, but: Assuming that VeryLargeBlocks will only be populated in the pathological case, micro-optimizing with a *Small*PtrSet seems unnecessary. Perhaps it's more memory-efficient on average to just use a DenseSet?

jmorse marked an inline comment as done.Jul 7 2020, 4:32 AM

Could the algorithm be changed to do validThroughout of all variable fragments in a single pass together?

This is probably do-able, although I'm generally unfamiliar with the DWARF emission code. Right now we iterate over variable entities and call validThroughout for those that might be single-locations; we would need to pivot to iterate over variable entities collecting those that _might_ be single-location, then calling validThroughout once for that set. My preference is to fold this problem into the work that @Orlando is doing though -- his patch is already solving this problem (intersection of scope and variable-location range) in one context, we should be able to re-purpose it to solve this one too.

(Most of my motivation for this patch is the upcoming branch for LLVM11, I'd like to get a limit on this, then work towards doing it more efficiently)

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
598

Sounds fair, I'll fold that change in,

Could the algorithm be changed to do validThroughout of all variable fragments in a single pass together?

This is probably do-able, although I'm generally unfamiliar with the DWARF emission code. Right now we iterate over variable entities and call validThroughout for those that might be single-locations; we would need to pivot to iterate over variable entities collecting those that _might_ be single-location, then calling validThroughout once for that set. My preference is to fold this problem into the work that @Orlando is doing though -- his patch is already solving this problem (intersection of scope and variable-location range) in one context, we should be able to re-purpose it to solve this one too.

(Most of my motivation for this patch is the upcoming branch for LLVM11, I'd like to get a limit on this, then work towards doing it more efficiently)

Fair enough. Perhaps a clear FIXME or the like (suggesting that this should be obsoleted by @Orlando 's work (& ensuring they're aware that this is something that should be considered in the replacement design)?

This revision was automatically updated to reflect the committed changes.