(This patch supersedes D104519, and takes a better approach)
Track machine values within sub-fields of stack slots. Sometimes we generate code that writes to a subregister, then spills / restores a super-register to the stack, for example:
    $eax = MOV32ri 0
    MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax
    $rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg
This happens a lot on x86, because stack spills tend to be widened to optimise instruction encoding. Right now, we can't identify that $ecx contains the constant that was loaded into $eax.
Over in D104519 I took a shot at this, by tracking subregister indexes within stack slots. However, this wasn't a good solution:
- The largest registers don't have subregister indexes,
- There can be different register hierarchies (eax vs xmm0), the largest of each not having a subregister index,
- A cursory look at ARM targets suggests they have ~130 subregister indexes.
All of which is a recipe for confusion and inefficiency.
This patch takes a different approach: it adds another index to MLocTracker that identifies a size/offset within a stack slot. A location on the stack is then a pair of {StackNum, SlotNum}, where the StackNum is an identifier for the stack's FrameIndex, while the SlotNum tells us where within that FrameIndex we're referring to. I've added a diagram to the class docstring for MLocTracker to try and illustrate this. The benefit is that we don't have to consider relationships between registers when identifying something on the stack, only size and offset. It also coalesces locations that are the same size/offset, but have different subregister numbers.
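To make the {StackNum, SlotNum} idea concrete, here is a minimal standalone sketch of how such a pair could be flattened into a single location index. It is not the MLocTracker code from this patch: the class name ToyStackIndex, its methods, and the layout "registers first, then one block of slot positions per FrameIndex" are all assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy model, not LLVM's MLocTracker interface.
// {SizeInBits, OffsetInBits} of a field within a stack slot.
using SlotPos = std::pair<unsigned, unsigned>;

class ToyStackIndex {
  unsigned NumRegs;                     // register locations come first
  std::vector<SlotPos> Positions;       // SlotNum -> {size, offset}
  std::map<SlotPos, unsigned> SlotNums; // {size, offset} -> SlotNum
  std::map<int, unsigned> StackNums;    // FrameIndex -> StackNum

public:
  ToyStackIndex(unsigned NumRegs, const std::vector<SlotPos> &Ps)
      : NumRegs(NumRegs), Positions(Ps) {
    for (unsigned I = 0; I < Positions.size(); ++I)
      SlotNums[Positions[I]] = I;
  }

  // Hand out StackNums to FrameIndexes in order of first use.
  unsigned getStackNum(int FrameIndex) {
    auto It = StackNums.find(FrameIndex);
    if (It != StackNums.end())
      return It->second;
    unsigned N = static_cast<unsigned>(StackNums.size());
    StackNums[FrameIndex] = N;
    return N;
  }

  // Flatten {StackNum, SlotNum} into one index placed after the registers.
  // (Assumed layout: each FrameIndex owns Positions.size() consecutive
  // locations, one per tracked size/offset.)
  unsigned getLocIndex(int FrameIndex, SlotPos Pos) {
    auto It = SlotNums.find(Pos);
    assert(It != SlotNums.end() && "untracked size/offset");
    unsigned PerSlot = static_cast<unsigned>(Positions.size());
    return NumRegs + getStackNum(FrameIndex) * PerSlot + It->second;
  }
};
```

In this sketch the lookup key is only a size and an offset, so the low 32 bits of a spilled $rax and the low 32 bits of any other spilled register in the same FrameIndex land on the same location, with no register hierarchy involved.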
Spilling and restoring is now a matter of identifying the src/dest register number, and the dest/src stack position, then copying a ValueIDNum between the two.
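As a rough illustration of that copy, the sketch below models a spill and a restore as field-by-field copies of value numbers between a register and a stack slot. Everything here is hypothetical: ToyTracker, the plain unsigned standing in for ValueIDNum, the string register names, and the fixed list of general purpose register fields are simplifications, not the data structures used in the patch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model, not LLVM's data structures.
using ValueID = unsigned;                    // stand-in for ValueIDNum
using Field = std::pair<unsigned, unsigned>; // {SizeInBits, OffsetInBits}

// Fields tracked for a 64-bit general purpose register (an assumption).
const std::vector<Field> GPRFields = {
    {8, 0}, {8, 8}, {16, 0}, {32, 0}, {64, 0}};

struct ToyTracker {
  // Value number held by each field of each register / stack slot.
  // A value of 0 means "nothing known".
  std::map<std::string, std::map<Field, ValueID>> Regs;
  std::map<int, std::map<Field, ValueID>> Stack;
  ValueID NextValue = 1;

  // A write to any part of Reg gives every field a fresh value number.
  void defReg(const std::string &Reg) {
    for (const Field &F : GPRFields)
      Regs[Reg][F] = NextValue++;
  }

  // Spill: copy each field's value from the register to the same
  // size/offset position in the stack slot.
  void spill(const std::string &Src, int FrameIdx) {
    for (const Field &F : GPRFields)
      Stack[FrameIdx][F] = Regs[Src][F];
  }

  // Restore: copy each size/offset position back into the destination.
  void restore(int FrameIdx, const std::string &Dest) {
    assert(Stack.count(FrameIdx) && "restore from a slot never spilled");
    for (const Field &F : GPRFields)
      Regs[Dest][F] = Stack[FrameIdx][F];
  }

  ValueID valueOf(const std::string &Reg, Field F) { return Regs[Reg][F]; }
};
```

With this model, defining "rax", spilling it to frame index 0, and restoring into "rcx" leaves the {32, 0} field of "rcx" holding the same value number as the {32, 0} field of "rax" had at the spill, which is the $eax to $ecx relationship from the example at the top.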
One limitation this exposes is that if a PHI happens inside a stack slot, LLVM doesn't record how large the value is. If we have a DBG_PHI of %stack.0, is that referring to a 32-bit or 64-bit value within it? Possibly in the future we'll need to record a size in a DBG_PHI instruction, but until then, this affects a very small number of locations. Overall, this patch recovers an additional 1% of variable locations on a clang3.4 build, largely due to patterns like the one copied above.
I've also added unit tests to ensure that values are recorded correctly on the stack, and that things like overwriting a spill of $rax with $xmm0 don't lead to unexpected values turning up, etc.
nit: unnecessarily