This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][Instr] Track subregisters across stack spills/restores
ClosedPublic

Authored by jmorse on Oct 20 2021, 5:17 AM.

Details

Summary

(This patch superceeds D104519, but does it better)

Track machine values within sub-fields of stack slots. Sometimes we generate code that writes to a subregister, then spills / restores a super-register to the stack, for example:

$eax = MOV32ri 0 
MOV64mr $rsp, 1, $noreg, 16, $noreg, $rax
$rcx = MOV64rm $rsp, 1, $noreg, 8, $noreg

Which happens a lot on x86, because stack spills tend to be widened to optimise instruction encoding. Right now, we can't identify that $ecx contains the value from $eax having a constant loaded into it.

Over in D104519 I took a shot at this, by tracking subregister indexes within stack slots. However this wasn't a good solution:

  • The largest registers don't have subregister indexes,
  • There can be different register hierarchies (eax vs xmm0), the largest of each not having a subregister index,
  • A courtesy look at ARM targets suggest they have ~130 subregister indexes.

Which is a recipe for confusion and inefficiency.

This patch takes a different approach: it adds another index to MLocTracker that identifies a size/offset within a stack slot. A location on the stack is then a pair of {StackNum, SlotNum}, where the stack number is an identifier for the stacks FrameIndex, while the SlotNum tells us where within that FrameIndex we're referring to. I've added a diagram to the class docustring for MLocTracker to try and illustrate this. The benefit is that we don't have to consider relationships between registers when identifying something on the stack, only size and offset. It also coalesces locations that are the same size/offset, but have different subregister numbers.

Spilling and restoring is now a matter of identifying the src/dest register number, and the dest/src stack position, then copying a ValueIDNum between the two.

One limitation this exposes, is that if a PHI happens inside a stack slot, LLVM doesn't record how large the value is. If we have a DBG_PHI of %stack.0, is that referring to a 32 bit or 64 bit value within it? Possibly in the future we'll need to record a size in a DBG_PHI instruction, but until then, this affects a very small number of locations. Overall, this patch recovers an additional 1% of variable locations on a clang3.4 build, largely due to patterns like the one copied above.

I've also added unit tests to ensure that values are recorded correctly on the stack, and things like overwriting a spill of $rax with $xmm0 doesn't lead to unexpected values turning up, etc.

Diff Detail

Event Timeline

jmorse created this revision.Oct 20 2021, 5:17 AM
jmorse requested review of this revision.Oct 20 2021, 5:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 20 2021, 5:17 AM

I think this looks good (the patch sounds great), but I definitely need to go over it at least once more before approving to be sure I understand the changes. In amongst the inline nits there are a couple of questions.

llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
753

nit: location's

1203

std::array here could be preferable so the size can be used in the loop below?

1385

nit: Is FI used anywhere else after these isXXFromStackSlotPostFE calls? The comment above makes it sound like the answer is yes, but I can't see it being used anywhere.

1459

Is this an assumption that needs to be chceked in an assert, or is it that it's handled gracefully / won't cause a catastrophe if it is ever wrong?

llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h
299–323

nit: unnecessarily

361

nit: missing the 3rd / in ///

373

nit: missing the 3rd / in ///

384

Seems like it could be easy to mix up the fields in a pair of identical types. If this could be implemented as a struct without too much boilerplate I'd be slightly in favour of going down that route, but this isn't a strong opinion / request.

446

nit: apram -> param

469

I am a little confused with the large number of different indices flying around in here, so this might be a silly question. Should there be another number subtracted here, equivalent to the Idx added to the ID in the function above (getSpillIDWithIdx)? Or is this auto-magically handled due to integer division rounding or something?

This looks cool.

I was wondering, shell we be paying attention to subregisters in spills/restores in the case of VarLocBasedLDV?

I was wondering, shell we be paying attention to subregisters in spills/restores in the case of VarLocBasedLDV?

This should be possible -- it'd involve enumerating all subregisters in `transferSpillOrRestoreInst, finding VarLocs for them, then moving them to a stack location. It would probably mean tracking the size of different spills, so that when a stack restore happens, the right variables are restored to the right subregisters.

It might come with a performance cost, as the number of locations tracked for each variable on the stack goes up a lot. InstrRefBasedLDV can get away with this because it only solves the problem once, it doesn't do it for each individual variable. On the other hand, spills are a small proportion of the locations tracked by by LiveDebugValues.

I was wondering, shell we be paying attention to subregisters in spills/restores in the case of VarLocBasedLDV?

This should be possible -- it'd involve enumerating all subregisters in `transferSpillOrRestoreInst, finding VarLocs for them, then moving them to a stack location. It would probably mean tracking the size of different spills, so that when a stack restore happens, the right variables are restored to the right subregisters.

It might come with a performance cost, as the number of locations tracked for each variable on the stack goes up a lot. InstrRefBasedLDV can get away with this because it only solves the problem once, it doesn't do it for each individual variable. On the other hand, spills are a small proportion of the locations tracked by by LiveDebugValues.

I see... But I guess it can give us a decent location coverage improvement, especially for targets such as x86, right?

I see... But I guess it can give us a decent location coverage improvement, especially for targets such as x86, right?

Probably yes -- it's hard to say in advance because InstrRef is better at handling spill locations. It can follow values across both the stack and in registers, while VarLocBasedLDV has to commit to either following a spill or staying in a register. Definitely worth prototyping though.

Note that I'm cooking up a plan to turn instr-ref on by default for X86 for LLVM14, although it's not a certain thing.

I see... But I guess it can give us a decent location coverage improvement, especially for targets such as x86, right?

Probably yes -- it's hard to say in advance because InstrRef is better at handling spill locations. It can follow values across both the stack and in registers, while VarLocBasedLDV has to commit to either following a spill or staying in a register. Definitely worth prototyping though.

Thanks.

Note that I'm cooking up a plan to turn instr-ref on by default for X86 for LLVM14, although it's not a certain thing.

I am happy to hear that, good luck!

jmorse added inline comments.Oct 22 2021, 3:55 AM
llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h
469

The latter; I'm relying on the fact that integer division truncates any remainder, the "index" / "idx" / whatever will be eliminated from "ID" in the process. To illustrate, here's an example LocIDToLocIdx table, assuming there are 7 "Idxes"

0 => AL
1 => AH
[... some ~288 X86 registers]
288 => Slot 0 Idx 0
289 => Slot 0 Idx 1
290 => Slot 0 Idx 2
291 => Slot 0 Idx 3
292 => Slot 0 Idx 4
293 => Slot 0 Idx 5
294 => Slot 0 Idx 6
295 => Slot 1 Idx 0
296 => Slot 1 Idx 1
297 => Slot 1 Idx 2
298 => Slot 1 Idx 3
299 => Slot 1 Idx 4
300 => Slot 1 Idx 5
301 => Slot 1 Idx 6

This function subtracts to find the range where the "slots" are recorded, then divides by 7 to remove the "Idx" part, leaving only the slot number.

Orlando accepted this revision.Oct 22 2021, 7:03 AM

Ok, been over this again and LGTM (for real this time). I'm a big fan of all the testing.

llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.h
469

SGTM, ty for that. Might be worth just mentioning that in a comment briefly but I'll leave that up to you.

This revision is now accepted and ready to land.Oct 22 2021, 7:03 AM
jmorse updated this revision to Diff 381534.Oct 22 2021, 7:12 AM

Add a little more type safety to spill location numbers -- the number of the spill in the UniqueVector is now wrapped, so that it's a bit more obvious what's going on.

(All the number of numbers and indexes continues to be a weakness)

jmorse marked 10 inline comments as done.Oct 22 2021, 7:40 AM
jmorse added inline comments.
llvm/lib/CodeGen/LiveDebugValues/InstrRefBasedImpl.cpp
1385

Looks like it isn't; it was in the past, I think, now it's un-necessary. I'll rename "DummyFI" and fix the comment.

1459

I think we'd have to decode the machine-instruction and work out where it was addressing, to assert that. IMO this is a design assumption: we can't assert that LLVM always does the right thing, only document what was originally expected by this code.

(Another design assumption, for example, is that all modifications to stack slots have a MachineMemoryOperand attached to the instruction, which we can't really assert for).

jmorse updated this revision to Diff 381548.Oct 22 2021, 7:44 AM
jmorse marked 2 inline comments as done.

(Address Orlandos review comments)

This revision was landed with ongoing or failed builds.Oct 22 2021, 11:23 AM
This revision was automatically updated to reflect the committed changes.