This patch reads machine value numbers from DBG_PHI instructions (marking where SSA PHIs used to be), and matches them up with DBG_INSTR_REF instructions that refer to them. Essentially they are two separate parts of a DBG_VALUE: the place to read the value (register and program position), and where the variable is assigned that value.
Unfortunately, it's not quite that simple.
A PHI instruction is computed by control flow -- and after we've left SSA form and run the register allocator, we are back in a scenario where values are identified by their location, like before we were in SSA form. This allows optimisations that change control flow to create and destroy PHI nodes without having to account for them. Specifically, imagine the control flow below being fed to the tail duplicator pass with a PHI node in 'C':
    A     B
     \   /
      \ /
       C
      / \
     /   \
    P     Q
Tail duplication folds C into its two predecessors, producing:
    A     B
    |\   /|
    | \ / |
    |  X  |
    | / \ |
    |/   \|
    P     Q
(The X in the middle isn't a block, it's just edges). This has effectively destroyed the one PHI that happened at block C, and replaced it with two PHIs, one on entry to each of P and Q. LLVM doesn't represent this fact in any way.
This would be fatal to our attempt to track PHIs; however, because we placed DBG_PHIs at the positions where the PHIs occurred, we can rely on DBG_PHIs appearing in blocks A and B, copied from C. That gives us definition points -- and we can treat this like any other SSA problem, one made up of Defs and Uses. In this patch I've used the standard LLVM utility for updating SSA form to do the same with machine value numbers.
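To make that Def/Use framing concrete, here's a self-contained toy (the types, names, and hand-rolled walk below are invented for illustration only -- the patch itself delegates this to the SSA updater). DBG_PHIs copied into A and B become Defs; a use in P or Q is resolved by asking its predecessors, and a PHI value is invented wherever the incoming values disagree:

  // Toy sketch, not the patch's code: illustrative names only. Assumes an
  // acyclic CFG; the real SSA-updating utility also copes with loops.
  #include <cstdio>
  #include <map>
  #include <string>
  #include <vector>

  struct CFG {
    std::map<std::string, std::vector<std::string>> Preds; // block -> predecessors
  };

  struct Resolver {
    const CFG &G;
    std::map<std::string, std::string> DefOut; // blocks where a DBG_PHI defines a value

    // Value reaching the start of Block; a join of differing values becomes a PHI.
    std::string reachingValue(const std::string &Block) {
      const auto &Preds = G.Preds.at(Block);
      if (Preds.empty())
        return "undef";
      std::vector<std::string> In;
      for (const auto &P : Preds)
        In.push_back(valueOnExit(P));
      for (const auto &V : In)
        if (V != In.front())
          return "phi-at-" + Block; // predecessors disagree: a new PHI is needed
      return In.front();
    }

    // Value live out of Block: either its own Def, or whatever flows through it.
    std::string valueOnExit(const std::string &Block) {
      auto It = DefOut.find(Block);
      return It != DefOut.end() ? It->second : reachingValue(Block);
    }
  };

  int main() {
    // The tail-duplicated CFG from above: A and B each branch to both P and Q.
    CFG G;
    G.Preds = {{"A", {}}, {"B", {}}, {"P", {"A", "B"}}, {"Q", {"A", "B"}}};

    // The DBG_PHIs copied from C into A and B act as Defs of (different) values.
    Resolver R{G, {{"A", "value-in-A"}, {"B", "value-in-B"}}};

    // The DBG_INSTR_REF uses in P and Q each resolve to a distinct inferred PHI.
    std::printf("use in P reads: %s\n", R.reachingValue("P").c_str());
    std::printf("use in Q reads: %s\n", R.reachingValue("Q").c_str());
  }

The real patch hands this shape of problem to LLVM's SSA updater instead, which also handles cyclic control flow.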
Before the technical details, the obvious question is "Isn't that really crazy^W expensive?" I believe this is going to be necessary if we ever hope to get rid of debug instructions from being in the IR/MIR. However, I don't think the cost will be high: for a full stage2 build of clang, the number of blocks examined by the SSA updater was thus (frequency, then number of blocks):
  73109    2
  18636    3
   1611    4
   1059    5
    472    6
     89    7
     74    8
     38    9
     74   12
     56   14
     30   15
While the number of PHI points identified was:
  63744    0
  29876    1
    970    2
    632    3
     18    4
      8    5
My observation is that having to examine a large number of blocks and do a lot of accounting to solve this SSA problem is very infrequent, so I don't believe there will be a serious performance cost. This is likely because the kind of folding that causes this situation happens very late in LLVM, where there's much less scope for crazy control-flow changes.
I don't think this approach is _completely_ crazy. I'm not aware of another way to address the example above (where a PHI is transformed to be two PHIs) without a solution similar to this.
Technical details:
This patch adds another 350 lines of C++ to InstrRefBasedImpl.cpp, which is annoying. However, about 80% of it is boilerplate wrappers to feed block and value information into the SSAUpdater utility, there's very little actual logic. Clearly this file should be split up in the future, but I'd prefer to do that as part of a larger refactor (IMO, YMMV).
Everything interesting happens in the resolveDbgPHIs function. Outside of that, there's an additional instruction handler that records where DBG_PHIs are, and a small amount of code to update their values after the machine value problem is solved. DBG_INSTR_REFs that refer to values defined by DBG_PHIs call resolveDbgPHIs to get their machine value number. There's no other change to the rest of InstrRefBasedLDV -- the dataflow problems don't change, only how their inputs are identified.
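For illustration, the recording step looks roughly like the sketch below. This is a hedged sketch, not the patch's handler: SeenDebugPHI and recordDebugPHI are hypothetical names, only the opcode check and operand accessors are real LLVM API, and the DBG_PHI operand order is my reading of it.

  #include "llvm/CodeGen/MachineInstr.h"
  #include "llvm/CodeGen/MachineOperand.h"
  #include "llvm/CodeGen/Register.h"
  #include "llvm/CodeGen/TargetOpcodes.h"
  #include <cstdint>
  #include <optional>

  // Hypothetical record type, invented for this example.
  struct SeenDebugPHI {
    uint64_t InstrNum;                    // number that DBG_INSTR_REFs will name
    const llvm::MachineBasicBlock *Block; // block the (possibly duplicated) PHI landed in
    llvm::Register Loc;                   // register the value can be read from here
  };

  // If MI is a DBG_PHI reading a register, describe it so the SSA machinery can
  // later treat it as a Def; otherwise report nothing.
  std::optional<SeenDebugPHI> recordDebugPHI(const llvm::MachineInstr &MI) {
    if (MI.getOpcode() != llvm::TargetOpcode::DBG_PHI)
      return std::nullopt;
    // Operand order assumed here: DBG_PHI <location>, <instruction number>.
    const llvm::MachineOperand &LocOp = MI.getOperand(0);
    if (!LocOp.isReg())
      return std::nullopt;
    return SeenDebugPHI{static_cast<uint64_t>(MI.getOperand(1).getImm()),
                        MI.getParent(), LocOp.getReg()};
  }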
Within resolveDbgPHIs, we present the SSAUpdaterImpl utility with an "Updater" object, a collection of locations where DBG_PHIs read machine value numbers, and a collection in which to stuff any new PHIs it infers. SSAUpdaterImpl then makes queries through the SSAUpdaterTraits class, using the Updater object to store state. How it actually goes about building SSA form is beyond the scope of this review; for our purposes, it returns the value number to use in the block where a DBG_INSTR_REF lives.
After doing so, we have to perform some validation. SSAUpdater isn't aware that it's working in a non-SSA environment, and so there might be register clobbers along the way that erase the determined value. We examine each PHI to see whether its desired inputs are actually what its predecessor blocks output. If any are wrong, then a necessary PHI is invalid, and we return None.
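As a rough illustration of that check (toy, std-only types; InferredPHI, validatePHIs and LiveOuts are made up for this sketch -- the real code consults the pass's own machine-value tables):

  // Each inferred PHI's incoming values must match what the corresponding
  // predecessor actually leaves in the tracked location; a clobber in between
  // silently breaks the SSA answer, so any mismatch abandons the resolution.
  #include <cstdint>
  #include <map>
  #include <optional>
  #include <utility>
  #include <vector>

  using BlockID = unsigned;
  using ValueNum = uint64_t; // stand-in for a machine value number
  using LocIdx = unsigned;   // stand-in for a tracked machine location

  struct InferredPHI {
    BlockID Block;                                      // join block the PHI lives in
    std::vector<std::pair<BlockID, ValueNum>> Incoming; // expected value per predecessor
  };

  // LiveOuts maps {block, location} to the value number that block really
  // provides on exit, as computed by the machine-value dataflow problem.
  std::optional<ValueNum>
  validatePHIs(ValueNum Answer, LocIdx Loc,
               const std::vector<InferredPHI> &PHIs,
               const std::map<std::pair<BlockID, LocIdx>, ValueNum> &LiveOuts) {
    for (const InferredPHI &PHI : PHIs)
      for (const auto &[Pred, Expected] : PHI.Incoming)
        if (LiveOuts.at({Pred, Loc}) != Expected)
          return std::nullopt; // a necessary PHI can't be fed: give up (None)
    return Answer;             // everything checked out; Answer is safe to use
  }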
The uint64_t here seems to denote the number of a value in a preceding LDVSSABlock.
Is there any way we can typedef this away into something more contextual, so this pairing reads better? (So long as my assumption about the intent is correct.)
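Something like the following is what I have in mind (the name is only a suggestion, and it assumes my reading of the uint64_t is right):

  #include <cstdint>

  /// Identifier for a value defined in (or flowing out of) a preceding
  /// LDVSSABlock, used wherever the raw uint64_t currently appears.
  using BlockValueNum = uint64_t;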