This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Preserve full vector regs over EH edge.
ClosedPublic

Authored by sdesmalen on Jul 28 2020, 2:45 AM.

Details

Summary

Unwinders may only preserve the lower 64bits of Neon and SVE registers,
as only the registers in the base ABI are guaranteed to be preserved
over the exception edge. The caller will need to preserve additional
registers for when the call throws an exception and the unwinder has
tried to recover state.

For e.g.

svint32_t bar(svint32_t);
svint32_t foo(svint32_t x, bool *err) {
  try { bar(x); } catch (...) { *err = true; }
  return x;
}

z0 needs to be spilled before the call to bar(x) and reloaded before
returning from foo, as the exception handler may have clobbered z0.

Diff Detail

Event Timeline

sdesmalen created this revision.Jul 28 2020, 2:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2020, 2:45 AM
Harbormaster returned this revision to the author for changes because remote builds failed.Jul 28 2020, 2:46 AM
Harbormaster failed remote builds in B65982: Diff 281159!
sdesmalen updated this revision to Diff 281237.Jul 28 2020, 8:01 AM

Fixed patch to look at call instruction and callee.

sdesmalen requested review of this revision.Jul 28 2020, 8:01 AM
rsandifo-arm added inline comments.Aug 4 2020, 4:41 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4710 ↗(On Diff #281237)

Like we talked about off-list:

  • it might be better to drop the doesNotThrow check. We should only have an InvokedInst if the call can “return” to a known landing pad.
  • It might be better to do this by adding clobbers at the start of the landing pad instead, a bit like the existing getBeginClobberMask. That would also handle the corresponding problem for aarch64_vector_pcs functions and would mean that normal returns weren't penalised by the exceptional case.
sdesmalen updated this revision to Diff 285037.Aug 12 2020, 4:27 AM
sdesmalen retitled this revision from [AArch64][SVE] Preserve SVE regs when callee may throw an exception. to [AArch64][SVE] Preserve full vector regs over EH edge..
sdesmalen edited the summary of this revision. (Show Details)
  • Set clobber mask for EH pad block, instead of changing the CC of the call.
  • Rebased patch

Thanks the update, the idea looks good in general. Two questions though:

  • Is it worth having two functions? I thought it might be simpler to have just getCustomEHPadPreservedMask(), with nullptr meaning “no clobbers”.
  • It looks from the tests like the function isn't saving or restoring z8-z23 and p4-p15, is that right? That's fine for landing pads that can't return directly to the callee (e.g. because they rethrow). But in these tests the landing pad does return to the caller. Since the extra callee-saved registers are clobbered on the EH edge, we need to save and restore those registers ourselves on control paths that include an EH edge and a return.
  • Made getCustomEHPadPreservedMask() return nullptr by default (meaning 'no clobbers')
  • Fixed missing callee saves/restores for clobbered registers.
  • Updated tests (and also added a RUN line for regalloc=fast)

Thanks the update, the idea looks good in general.

Thanks for the feedback!

Two questions though:

  • Is it worth having two functions? I thought it might be simpler to have just getCustomEHPadPreservedMask(), with nullptr meaning “no clobbers”.

The reason I had two functions at first was because TargetRegisterInfo::getCallPreservedMask() returns nullptr as well, but in that case this means "clobbers everything", where for this function it means the opposite. I have changed the default now, and clarified the default in the comment of the method declaration.

  • It looks from the tests like the function isn't saving or restoring z8-z23 and p4-p15, is that right? That's fine for landing pads that can't return directly to the callee (e.g. because they rethrow). But in these tests the landing pad does return to the caller. Since the extra callee-saved registers are clobbered on the EH edge, we need to save and restore those registers ourselves on control paths that include an EH edge and a return.

Yes you're right, thanks for pointing that out! I have fixed that in the latest revision by calling addPhysRegsUsedFromRegMask for the mask returned by getCustomEHPadPreservedMask. I tried doing this for the mask returned by MachineBasicBlock::getBeginClobberMask (which seemed to me like the right thing to do considering it's name), but that broke mosts tests for Windows EH (on both AArch64 as well as X86).

LGTM, but I don't think I should be the one to accept.

efriedma added inline comments.Aug 18 2020, 3:18 PM
llvm/lib/CodeGen/RegAllocFast.cpp
1215 ↗(On Diff #286106)

This seems like a really weird location for this call; it's modifying function-global state in the middle of register allocation.

I'd prefer to do one of the following:

  1. Attach the register mask to the landingpad instruction itself.
  2. Update the function-global state when we construct the landing pad instruction/node.
sdesmalen updated this revision to Diff 288339.Aug 27 2020, 8:16 AM
  • Updated patch with option 2: Update the function-global state when we construct the landing pad instruction/node.

Does MIRParser also need to be modified?

sdesmalen updated this revision to Diff 288557.Aug 28 2020, 2:13 AM
  • MIRParser now correctly marks registers as clobbered and added corresponding .mir test file.
  • Updated RUN lines of unwind-preserved.ll to test with/without global isel.

Does MIRParser also need to be modified?

You're right, that was missing. Thanks for pointing out!

llvm/lib/CodeGen/RegAllocFast.cpp
1215 ↗(On Diff #286106)

This seems like a really weird location for this call; it's modifying function-global state in the middle of register allocation.

The same seems to happen in allocateInstruction(MI) (see line 989) for an instruction when one of the operands is a register mask.

sdesmalen added inline comments.Aug 28 2020, 2:16 AM
llvm/lib/CodeGen/RegAllocFast.cpp
1215 ↗(On Diff #286106)

Sorry, please ignore this comment (I wrote it before I updated this revision the other day, but forgot to remove it).

Hi @efriedma, what do you think of the latest changes to this patch?

(I'm really sorry for the quick prod, but I'm (perhaps falsely) hoping this fix may still have a small chance to make it into the release after it lands. Of course not before you're happy with it first)

efriedma accepted this revision.Sep 1 2020, 2:31 PM

LGTM

I'm not completely confident this is the best approach; it might come out cleaner if the clobber is attached to a MachineInstr. As it is, it's depending on sort of fragile invariants to ensure the value isn't live across the unwind edge: every register allocator has to be aware of getCustomEHPadPreservedMask, and any post-regalloc optimizations that the unwind edge is special. But I can't come up with any concrete example that would actually cause trouble.

This revision is now accepted and ready to land.Sep 1 2020, 2:31 PM
This revision was automatically updated to reflect the committed changes.

Thanks for reviewing the patch!

I'm not completely confident this is the best approach; it might come out cleaner if the clobber is attached to a MachineInstr.

What MachineInstr would it need to be attached to? When adding the clobbers to the invoke instruction that penalises the normal return. When adding the clobbers to e.g. the EH_LABEL instruction in the unwind-block, the register spills are added at the start of the unwinding block rather than before the call instruction.

As it is, it's depending on sort of fragile invariants to ensure the value isn't live across the unwind edge: every register allocator has to be aware of getCustomEHPadPreservedMask, and any post-regalloc optimizations that the unwind edge is special.

The clobber mask is applied when calculating the live intervals which is used as input to the register allocator, so I would think that register allocators themselves don't need to be aware of this as well. I.e. Once this information is reflected in the live ranges, there isn't anything that the register allocator should do to change that.

What MachineInstr would it need to be attached to?

We'd need some instruction that's guaranteed to be at the beginning of the block. Probably EH_LABEL should fall into that category. Granted, I guess the register allocators aren't aware of this restriction at the moment.

The clobber mask is applied when calculating the live intervals which is used as input to the register allocator, so I would think that register allocators themselves don't need to be aware of this as well. I.e. Once this information is reflected in the live ranges, there isn't anything that the register allocator should do to change that.

We have at least one register allocator that doesn't use LiveIntervals. Fast regalloc doesn't keep any values live across blocks, though