This patch is technically NFC, but I'm not putting NFC in the title because it shouldn't be considered a trivial change.
Over in D116821, there are some benchmarks where InstrRefBasedLDV significantly increases max-rss during LTO and the like. After thinking about it, it seems a bit stupid that we compute lots of information, store it all in huge tables, emit it as DBG_VALUE instructions, and only then free the tables. We can do better: this patch solves the variable value problem in lexical scope depth-first order, allowing blocks where all contained scopes have been processed to be emitted early and the relevant information freed. This isn't going to solve the problem of instruction-referencing producing lots of variable information, but it should make the overhead on top of that stop growing linearly.
I've kept the unordered-explore code available, and it can be selected with a command line flag, to ease debugging in the future. I might figure out a way to make the unit tests use both of these modes, but haven't done that yet.
The approach is to:
- Make use of the fact LexicalScopes gives a depth-number to each lexical scope,
- Produce an "ejection map" [0] that identifies the last lexical scope to make use of a block,
- Enumerate each scope in LexicalScopes' DFS order, solving the variable value problem,
- After each scope is processed, look for any blocks that won't be used by any other scope, and "eject" them.
Here, "ejecting" a block means translating its variable value information into DBG_VALUE instructions in the block, then freeing any machine-value or variable-value information held for that block.
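The steps above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in the patch: `ToyScope`, `buildEjectionMap`, `ejectionSchedule` and `NoScope` are hypothetical names, scopes are assumed to arrive already sorted in the order they will be processed, and blocks are plain integers rather than MachineBasicBlocks.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lexical scope: just the blocks it covers.
struct ToyScope {
  std::vector<unsigned> Blocks; // block numbers this scope makes use of
};

// Marker for blocks that no scope uses at all.
constexpr std::size_t NoScope = static_cast<std::size_t>(-1);

// For each block, record the index of the last scope (in processing order)
// that makes use of it.
std::vector<std::size_t> buildEjectionMap(const std::vector<ToyScope> &Scopes,
                                          unsigned NumBlocks) {
  std::vector<std::size_t> EjectionMap(NumBlocks, NoScope);
  for (std::size_t I = 0; I < Scopes.size(); ++I) {
    for (unsigned BB : Scopes[I].Blocks) {
      assert(BB < NumBlocks && "block number out of range");
      EjectionMap[BB] = I; // later scopes overwrite earlier ones
    }
  }
  return EjectionMap;
}

// For each scope, list the blocks that can be ejected as soon as that scope
// has been solved, because no later scope will look at them again.
std::vector<std::vector<unsigned>>
ejectionSchedule(const std::vector<ToyScope> &Scopes, unsigned NumBlocks) {
  std::vector<std::size_t> EjectionMap = buildEjectionMap(Scopes, NumBlocks);
  std::vector<std::vector<unsigned>> Schedule(Scopes.size());
  for (unsigned BB = 0; BB < NumBlocks; ++BB)
    if (EjectionMap[BB] != NoScope)
      Schedule[EjectionMap[BB]].push_back(BB);
  return Schedule;
}
```

In this model, a driver would solve scope `I` and then eject every block in `Schedule[I]`, so per-block tables only stay alive until the last scope that needs them has been handled.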
I haven't tested the reproducer posted in D116821 yet, but this (and stacked patches) reduces the instr-ref growth in max-rss in mafft / SPASS from 26%/35% to roughly 10% each. Digging into SPASS, the culprit is the "main" function, which LTO seems to inline a lot of stuff into. As ever, the amount of debug-info is way out of proportion to the actual code: there are some 31k non-debug instructions, and after LiveDebugValues there are ~90k DBG_VALUEs with VarLocBasedLDV and ~135k DBG_VALUEs with InstrRefBasedLDV. The ultimate fix for this is to stream variable information into the DWARF printer when needed, rather than coughing it all up at the same time and creating instructions for it.
Testing: no test added on account of this being a performance patch. I've built stage2 clang RelWithDebInfo with and without this change, and all object files are identical, except for InstrRefBasedImpl.cpp.o of course.
[0] This could be called "emission" rather than "ejection", but I've already written "emission" in a lot of other places.