(That's right, even more patches! This is the third and final patch series for the instruction referencing work, see D85741. Everything "Interesting" should be covered by this and the two prior series, there will be a couple more individual patches to mop up optimisations that drop instruction numbers, and after that we're Done (TM)).
Very late in compilation, backends like X86 will perform optimsations like this:
%cx = MOV16rm $rax, blah blah blah -> %rcx = MOV64rm $rax, blah blah blah
Widening the load from 16 bits to 64 bits. Seeing how the lower 16 bits remain the same this doesn't affect execution, but is a more compact instruction. However, any debug instruction reference to the defined operand now refers to a 64 bit value, not a 16 bit one, which might be unexpected. Similar things happen earlier in CodeGen:
CALL64pcrel32 @foo, implicit-def $rax %0:gr64 = COPY $rax %1:gr32 = COPY %0.sub_32bit
For reasons that will become clear later, we definitely don't want to label any of the COPYs with debug-instr-number. If we care about %1 for debug purposes, then we need to be able to refer to $eax's definition at the CALL64pcrel32, but it defines the wider $rax. This isn't just a matter of width either: some architectures let you refer to the high and low parts of a superregister (such as $al and $ah).
The solution: add a subregister field to the existing "substitutions" facility that lets us point one instruction / operand pair at another. Using the widening example from above, if we had:
%cx = MOV16rm $rax, blah blah blah, debug-instr-number 1 DBG_INSTR_REF 1, 0
And widened it to be a 64 bit load, record a substitution with a subregister qualifier:
debugValueSubstitutions: - { srcinst: 2, srcop: 0, dstinst: 1, dstop: 0, subreg: 4 } %rcx = MOV64rm $rax, blah blah blah, debug-instr-number 2 DBG_INSTR_REF 1, 0
The DBG_INSTR_REF still refers to the same number; but a substitution has to be looked up. And that substitution specifies that the sub_16bit field of instruction 1 / operand 0 should be read. These can even be chained.
This patch extends a few APIs to cope with this, and as a demonstration adds the relevant instrumentation to the X86FixupBWInsts pass, which performs the optimisation described above. I've also fiddled with a few tests for substitution stuff now that we've added a new field.
Having established that this feature is needed, the next patch adds support to InstrRefBasedLDV, the third makes use of it when handing COPYs in SelectionDAG, and the fourth does some cleanup.
Do we need the std::map sortness here?