Hi; Hot on the heels of D56151 being committed in r364515, I wound up reverting it as there were reports of performance regressions on llvm-commits. Building AMDGPUDisassembler.cpp with ASAN enabled showed that register coalescing jumped from three seconds to fourty seconds. This patch cuts down on the performance cost (there's still a little) while preserving the original behaviour. It is, alas, ugly.
I believe the root cause of the performance problem is that DBG_VALUE insts don't get an entry in the SlotIndex map of Insts <=> Slots. The current code iterates through each DBG_VALUE for a register getting its slot, then runs a liveness query. However, getting the slot requires a forwards-walk through the block until a non-debug instruction is found. With ASAN, packs of up to 800 DBG_VALUEs in a row appear (for that file), each of which gets examined, which ends up having quadratic complexity.
The solution is to not lookup slots from DBG_VALUEs, but to know the slot and then find relevant DBG_VALUEs. To this end, I've replaced the per-dbg-value query method with some helper functions that will scan a basic block for unsound DBG_VALUEs, and maintains a copy of the current SlotIndex instead of lookup up for each DBG_VALUE.
This isn't pleasant, and just writing more C++ to fix problems isn't great, but:
- We now have to query liveness to ensure DBG_VALUE soundness,
- I imagine that indexing DBG_VALUE -> slots would be either expensive or risk causing codegen changes with -g,
- We're going to wind up iterating over instructions at some point to fix this.
(Note that r364515 was reverted in r365448, this diff shows the original-versus-new, as opposed to what's on trunk right now).