Page MenuHomePhabricator

[DebugInfo] Address performance regression with r364515
Needs ReviewPublic

Authored by jmorse on Jul 12 2019, 6:16 AM.

Details

Summary

Hi; Hot on the heels of D56151 being committed in r364515, I wound up reverting it as there were reports of performance regressions on llvm-commits. Building AMDGPUDisassembler.cpp with ASAN enabled showed that register coalescing jumped from three seconds to fourty seconds. This patch cuts down on the performance cost (there's still a little) while preserving the original behaviour. It is, alas, ugly.

I believe the root cause of the performance problem is that DBG_VALUE insts don't get an entry in the SlotIndex map of Insts <=> Slots. The current code iterates through each DBG_VALUE for a register getting its slot, then runs a liveness query. However, getting the slot requires a forwards-walk through the block until a non-debug instruction is found. With ASAN, packs of up to 800 DBG_VALUEs in a row appear (for that file), each of which gets examined, which ends up having quadratic complexity.

The solution is to not lookup slots from DBG_VALUEs, but to know the slot and then find relevant DBG_VALUEs. To this end, I've replaced the per-dbg-value query method with some helper functions that will scan a basic block for unsound DBG_VALUEs, and maintains a copy of the current SlotIndex instead of lookup up for each DBG_VALUE.

This isn't pleasant, and just writing more C++ to fix problems isn't great, but:

  • We now have to query liveness to ensure DBG_VALUE soundness,
  • I imagine that indexing DBG_VALUE -> slots would be either expensive or risk causing codegen changes with -g,
  • We're going to wind up iterating over instructions at some point to fix this.

(Note that r364515 was reverted in r365448, this diff shows the original-versus-new, as opposed to what's on trunk right now).

Diff Detail

Event Timeline

jmorse created this revision.Jul 12 2019, 6:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2019, 6:16 AM

With ASAN, packs of up to 800 DBG_VALUEs in a row appear (for that file)

That seems excessive; in the past when I've seen this explosion of DBG instructions, the vast majority were redundant. Have you looked at them to see if this is the case? We might rather eliminate duplicates than write code to make it cheaper to have lots of unnecessary instructions.

rnk added a subscriber: rnk.Jul 12 2019, 11:26 AM

With ASAN, packs of up to 800 DBG_VALUEs in a row appear (for that file)

That seems excessive; in the past when I've seen this explosion of DBG instructions, the vast majority were redundant. Have you looked at them to see if this is the case? We might rather eliminate duplicates than write code to make it cheaper to have lots of unnecessary instructions.

dbg.value packs are a major issue, ASan aside. I wrote this proposal a while back and never followed through on it:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127228.html

With the lack of follow-through on my part, I think we should take Chris's suggestion of changing our representation to multiplex dbg.value, so that one instruction can describe an arbitrary number of variable locations without increasing the number of instructions that other passes have to iterate over.

Happily D58453 killing off a large amount of placeDbgValues activity significantly reduces DBG_VALUE grouping -- I don't have the numbers to hand, but I would say the density was almost an order of magnitude lower. The largest back in the benchmark I referred to was about ~120, and other large packs occurred much less frequently.

Reid wrote:

dbg.value packs are a major issue, ASan aside. I wrote this proposal a while back and never followed through on it:
http://lists.llvm.org/pipermail/llvm-dev/2018-October/127228.html

With the lack of follow-through on my part, I think we should take Chris's suggestion of changing our representation to multiplex dbg.value, so that one instruction can describe an arbitrary number of variable locations without increasing the number of instructions that other passes have to iterate over.

While this review isn't the place, this is definitely an area I'd want to invest time into, debug-info in the instruction stream is a frequent pain.