Page MenuHomePhabricator

[AArch64][SVE] Add missing unwind info for SVE registers.
ClosedPublic

Authored by sdesmalen on Jul 17 2020, 9:45 AM.

Details

Summary

This patch adds a CFI entry for each SVE callee saved register
that needs unwind info at an offset from the CFA. The offset is
a DWARF expression because the offset is partly scalable.

The CFI entries only cover a subset of the SVE callee-saves and
only encodes the lower 64-bits, thus implementing the lowest
common denominator ABI. Existing unwinders may support VG but
only restore the lower 64-bits.

Diff Detail

Event Timeline

sdesmalen created this revision.Jul 17 2020, 9:45 AM
Herald added a project: Restricted Project. · View Herald Transcript

I'm confused by the stack layout involved here. SVE registers should never be callee-save. The ABI says we're only supposed to save the low 64 bits.

llvm/lib/Target/AArch64/AArch64FrameLowering.cpp
478

Can you use a SmallString here, so you don't have to do weird pointer casts?

sdesmalen updated this revision to Diff 279486.Jul 21 2020, 4:48 AM
sdesmalen marked an inline comment as done.

Use SmallString instead of SmallVector<uint8_t>.

I'm confused by the stack layout involved here. SVE registers should never be callee-save. The ABI says we're only supposed to save the low 64 bits.

Just in case you were looking at the wrong version of the spec, https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#scalable-vector-registers contains the sections for SVE.

From 6.1.3:

If a subroutine takes at least one argument in scalable vector registers or scalable predicate registers, or if it is a function that returns results in such registers, it must ensure that the entire contents of z8-z23 are preserved across the call

The tests above have the aarch64_sve_vector_pcs calling convention annotation to enforce the SVE ABI for these functions, because they otherwise don't take or return any values. e.g.

define aarch64_sve_vector_pcs void @save_restore_pregs_sve() { entry: unreachable }

Oh, I missed that the IR declarations in the testcase were messing with the calling convention.

The CFI entries only cover a subset of the SVE callee-saves and only encodes the lower 64-bits, thus implementing the lowest common denominator ABI. Existing unwinders may support VG but only restore the lower 64-bits.

I'm not sure I understand this. If the calling convention requires saving the whole SVE register, unwinding needs to restore the entire SVE register, or else we corrupt the register in exception unwinding.

The CFI entries only cover a subset of the SVE callee-saves and only encodes the lower 64-bits, thus implementing the lowest common denominator ABI. Existing unwinders may support VG but only restore the lower 64-bits.

I'm not sure I understand this. If the calling convention requires saving the whole SVE register, unwinding needs to restore the entire SVE register, or else we corrupt the register in exception unwinding.

Yes that confused me as well, you are right that this would corrupt the register in exception unwinding.

I asked @rsandifo-arm about this, who suggested that the unwinder restores the callee-saved registers in the base ABI only, i.e. normal returns preserve extra registers, but exceptional returns don't. Additionally, existing unwinders may not support the new SVE registers to begin with. The SVE registers have different DWARF register numbers from their lower V0-V31, which may complicate the unwinder implementation as it needs to take into account the aliasing between the registers.

It seems that unwinders even treat 128-bit V0-V31 as 64-bit D0-D31 if no (.debug_info) context is available. From section 3.1. (DWARF register names) of the DWARF for the ARM 64bit architecture with SVE support document (https://developer.arm.com/documentation/100985/0000/):

  1. The size of a general register is to be taken from context. For instance in a .debug_info section if the DW_AT_location attribute of a variable is DW_OP_reg0 then the number of significant bits in the register is determined by the variable's DW_AT_type attribute. If no context is available (for example in .debug_frame or .eh_frame sections) then the register number refers to a 64-bit register.

:

  1. In a similar manner to the general register file the size of an FP/Advanced SIMD register is taken from some external context to the register number. If no context is available then the only the least significant 64 bits of the register are referenced. In particular this means that the most significant part of a SIMD register is unrecoverable by frame unwinding.

The behaviour implemented in this patch is synonymous to that of GCC.

I asked @rsandifo-arm about this, who suggested that the unwinder restores the callee-saved registers in the base ABI only, i.e. normal returns preserve extra registers, but exceptional returns don't.

I don't think I've ever seen a target do that, but I guess there isn't any reason it can't work like this. Does clang need additional changes to support this, though?

It's unfortunate for debuggers, though; they also use "unwind" data to print the values of variables in different stack frames, so I'm not sure how that's supposed to work.

sdesmalen updated this revision to Diff 280502.Jul 24 2020, 9:39 AM

rebased patch.

I asked @rsandifo-arm about this, who suggested that the unwinder restores the callee-saved registers in the base ABI only, i.e. normal returns preserve extra registers, but exceptional returns don't.

I don't think I've ever seen a target do that, but I guess there isn't any reason it can't work like this. Does clang need additional changes to support this, though?

No changes are needed in Clang to support this AFAIK.

It's unfortunate for debuggers, though; they also use "unwind" data to print the values of variables in different stack frames, so I'm not sure how that's supposed to work.

Perhaps if LLVM would have a way to distinguish the unwind data (e.g. with debug context or without), it would be possible to generate two different kinds of CFI instructions.

No changes are needed in Clang to support this AFAIK.

(I meant loosely a change to clang and/or LLVM.)

Consider the following testcase:

#include <arm_sve.h>
void g1(svuint8_t);
void g2(svuint8_t);
struct Z { svuint8_t *x; ~Z() { g2(*x); } };
svuint8_t f(svuint8_t a, svuint8_t b) {
  Z z{&a};
  g1(b);
  return b;
}

If unwind doesn't preserve SVE registers, we need a spill here. clang currently doesn't generate one at -O2.

No changes are needed in Clang to support this AFAIK.

(I meant loosely a change to clang and/or LLVM.)

Consider the following testcase:

#include <arm_sve.h>
void g1(svuint8_t);
void g2(svuint8_t);
struct Z { svuint8_t *x; ~Z() { g2(*x); } };
svuint8_t f(svuint8_t a, svuint8_t b) {
  Z z{&a};
  g1(b);
  return b;
}

If unwind doesn't preserve SVE registers, we need a spill here. clang currently doesn't generate one at -O2.

Yes, great spot! The caller should assume that all SVE registers are clobbered by the unwinder if the function _may_ throw an exception. I think I'll create a separate patch to fix this, as it's not necessarily something to do with how LLVM emits the unwind info, but rather a bug where the caller doesn't honour the right CC.

efriedma accepted this revision.Jul 27 2020, 11:44 AM

If we're in a world where unwinders can handle the offset expression, but not actually restoring SVE registers, I guess this is the best we can do. LGTM

Yes, great spot! The caller should assume that all SVE registers are clobbered by the unwinder if the function _may_ throw an exception. I think I'll create a separate patch to fix this, as it's not necessarily something to do with how LLVM emits the unwind info, but rather a bug where the caller doesn't honour the right CC.

Okay. I'd like to see a corresponding change to the ABI document.

This revision is now accepted and ready to land.Jul 27 2020, 11:44 AM

Yes, great spot! The caller should assume that all SVE registers are clobbered by the unwinder if the function _may_ throw an exception. I think I'll create a separate patch to fix this, as it's not necessarily something to do with how LLVM emits the unwind info, but rather a bug where the caller doesn't honour the right CC.

I suspect we're in violent agreement here, but just in case: the SVE PCS rules apply as normal in Eli's testcase: normal returns from g1 and g2 preserve Z8-Z23 and P4-P15 regardless of what exceptions (if any) get thrown during the calls to g1 and g2. But the only state preserved across exception edges is the state that is preserved by the base ABI. This applies:

  • to Advanced SIMD vector PCS functions as well as SVE functions (in particular V16-V23 are not preserved across an exception edge)
  • to asynchronous exceptions as well as synchronous exceptions

In other words, this isn't a property of the calling convention so much as a property of the unwinder itself. The same situation would apply for something like:

void g(svfloat32_t);
svfloat32_t f(svfloat32_t x, int *y) {
  try { *y = 0; } catch (...) { g(x); throw; }
  return x;
}

if non-call exceptions are enabled: the exception edge from the possibly-faulting store to *y would clobber everything except the state preserved by a normal call.

Of course, nothing ever clobbers the SVE register state for normal returns from f, and in particular, this exception handler never returns from f. So the ideal function body would be something like:

f:
        addvl   sp, sp, #-1
        str     z0, [sp]       // Save x in case the EH handler gets run
        str     wzr, [x0]
        addvl   sp, sp, #1
        ret                    // z0 still contains x here

FWIW, GCC handles this by modelling the clobbers on the EH edge itself, rather than as being a property of any call (since, as in the example above, there might not be a call). I haven't checked if LLVM does the same.

sdesmalen updated this revision to Diff 282452.Aug 2 2020, 8:09 AM

Rebased and added comment to .cfi_escape to describe CFI offset DWARF expression (like was added to D84043)