Occasionally we see absolutely massive basic blocks, typically in global constructors that are subject to heavy inlining. When these blocks are dense with DBG_VALUE instructions, we can hit near-quadratic complexity in DwarfDebug's validThroughout function. The problem is caused by:
- validThroughout having to step through all instructions in the block to examine their lexical scope, and
- a high proportion of instructions in that block being DBG_VALUEs for a unique variable fragment,
leading to us stepping through every instruction in the block for (nearly) every instruction in the block. In the particular sample I'm looking at, there's a block with 120K instructions, roughly two-thirds of which are DBG_VALUEs. Not running validThroughout for this block cuts DWARF emission time in half (which is many tens of seconds).
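To make the cost concrete, here's a minimal sketch (not the actual LLVM code; the threshold name and the boolean "is this a DBG_VALUE" model are illustrative assumptions). The naive check walks the whole block once per DBG_VALUE, giving O(N * K) work, and the guard simply bails out on oversized blocks:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration: model a block as a vector of flags, where
// "true" means the instruction is a DBG_VALUE.
static const std::size_t VeryLargeBlockThreshold = 30000; // patch default

// Count the instructions a validThroughout-style scan would visit:
// one full-block walk per DBG_VALUE, so near quadratic when most
// instructions are DBG_VALUEs.
std::size_t scanCost(const std::vector<bool> &IsDbgValue) {
  std::size_t Cost = 0;
  for (bool Dbg : IsDbgValue)
    if (Dbg)
      Cost += IsDbgValue.size(); // full walk of the block each time
  return Cost;
}

// With the guard, oversized blocks skip the scan entirely; their
// variables fall back to a location list instead.
std::size_t guardedScanCost(const std::vector<bool> &IsDbgValue) {
  if (IsDbgValue.size() > VeryLargeBlockThreshold)
    return 0; // bail out: emit a location list rather than scanning
  return scanCost(IsDbgValue);
}
```

A block of 120K instructions with ~80K DBG_VALUEs would cost on the order of 10^10 visited instructions without the guard, which is why skipping it recovers so much time.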
By adding this guard, we force variables in this block to use a location list rather than a single-location expression, as shown in the added test. This shouldn't change the meaning of the output DWARF at all: we just use a less efficient DWARF encoding to avoid a poor-performance code path. In the long term this could be fixed by Orlando's D82129 providing enough instruction-ordering information to make validThroughout's checks less complex, but we're not there yet.
The testing technique is shamelessly ripped off from D80662. I've used a set of pointers to very large blocks rather than calling size() each time, because size() isn't constant-time with ilists.
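A minimal sketch of the caching idea, using std::list as a stand-in for LLVM's ilist (both have historically offered only linear-time size()); the Block type and function name here are illustrative, not the patch's actual code. Each block's size is computed once up front, and later queries become a cheap set lookup instead of a repeated O(n) walk:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <set>
#include <vector>

// Stand-in for a MachineBasicBlock holding an intrusive list of
// instructions; std::list::size() models ilist's linear-time size().
struct Block { std::list<int> Insts; };

// One linear pass per block at setup time; afterwards, "is this block
// too large?" is answered by a set lookup, never by calling size().
std::set<const Block *>
findVeryLargeBlocks(const std::vector<Block> &Blocks, std::size_t Limit) {
  std::set<const Block *> VeryLargeBlocks;
  for (const Block &B : Blocks)
    if (B.Insts.size() > Limit)
      VeryLargeBlocks.insert(&B);
  return VeryLargeBlocks;
}
```

In the expected (non-pathological) case the set stays empty, so the per-variable check reduces to a lookup in an empty set.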
The default threshold, under which blocks over 30,000 instructions long are considered too large, isn't determined scientifically; rather, it solves the problem in front of me and doesn't trigger on a stage2 clang build. Suggestions on a good mechanism for picking this number are most welcome.
Not very important, but: assuming that VeryLargeBlocks will only be populated in the pathological case, micro-optimizing with a *Small*PtrSet seems unnecessary. Perhaps it would be more memory-efficient on average to just use a DenseSet?