This patch addresses several alignment issues in the stack frame when
RVV objects are taken into account.
One bug is that the RVV stack was never guaranteed to keep the alignment
of the stack *as a whole*. We must maintain a 16-byte aligned stack at
all times, especially when calling other functions. With the standard V
extension this happens for free, since VLEN is at least 128 bits and
VLENB is therefore always a multiple of 16 bytes. However, we support
Zvl64b, which does not guarantee this. To fix this, the RVV stack size
is rounded up to be aligned to 16 bytes. In practice this generally
makes us allocate an RVV stack of at least 2*VLENB bytes, in multiples
of 2*VLENB.

|------------------------------| -- <-- FP
| 8-byte callee-save           | | |
|------------------------------| | |
| one VLENB-sized RVV object   | | |
|------------------------------| | |
| 8-byte local variable        | | |
|------------------------------| -- <-- SP (must be aligned to 16)

In the example above, with Zvl64b we are decrementing SP by 24 bytes,
which does not leave SP correctly aligned. We therefore introduce an
extra VLENB-sized amount used for alignment. This ensures the total
stack size is 32 bytes (48 for Zvl128b, 80 for Zvl256b, etc.):

|------------------------------| -- <-- FP
| 8-byte callee-save           | | |
|------------------------------| | |
| one VLENB-sized padding obj  | | |
| one VLENB-sized RVV object   | | |
|------------------------------| | |
| 8-byte local variable        | | |
|------------------------------| -- <-- SP

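As a rough illustration of the arithmetic in the example above (not the code in this patch), the sketch below rounds the RVV section up in VLENB-sized units, assuming the smallest supported VLENB is 8 bytes (Zvl64b). The names `alignTo` and `exampleFrameSize` are hypothetical helpers introduced only for this sketch.

```cpp
#include <cstdint>

// Round Value up to the next multiple of Align (Align must be non-zero).
constexpr uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical helper: total frame size in bytes for the example frame
// (8-byte callee-save, one VLENB-sized RVV object, 8-byte local).
// Assumption: VLEN is unknown at compile time, so the RVV section is
// rounded up in VLENB units against the minimum VLENB of 8 bytes.
constexpr uint64_t exampleFrameSize(uint64_t VLenB) {
  constexpr uint64_t StackAlign = 16; // required SP alignment in bytes
  constexpr uint64_t MinVLenB = 8;    // Zvl64b: VLEN = 64 bits
  constexpr uint64_t CalleeSave = 8;
  constexpr uint64_t Local = 8;
  constexpr uint64_t NumRVVUnits = 1; // one VLENB-sized RVV object

  // 16 / 8 = 2, so the RVV section becomes a multiple of 2 * VLENB.
  uint64_t PaddedUnits = alignTo(NumRVVUnits, StackAlign / MinVLenB);
  return CalleeSave + PaddedUnits * VLenB + Local;
}

static_assert(exampleFrameSize(8) == 32, "Zvl64b");
static_assert(exampleFrameSize(16) == 48, "Zvl128b");
static_assert(exampleFrameSize(32) == 80, "Zvl256b");
```

Because the padding is expressed in VLENB units rather than bytes, the extra VLENB is allocated even when the actual VLEN would not need it, which is why the sizes grow to 48 and 80 for larger VLEN.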
A new RVV invariant has been introduced in this patch, which is that the
base of the RVV stack itself is now always aligned to 16 bytes, not 8 as
before. This keeps us more in line with the scalar stack and should be
easier to reason about. The calculation of the RVV padding has thus
changed to be the amount required to align the scalar local variable
section to the RVV section's alignment. This amount is further rounded
up when setting up the initial stack to keep everything aligned:

|------------------------------| -- <-- FP
| 8-byte callee-save           |
|------------------------------|
|                              |
| RVV objects                  |
| (aligned to at least 16)     |
|                              |
|------------------------------|
| RVV padding of 8 bytes       |
|------------------------------|
| 8-byte local variable        |
|------------------------------| -- <-- SP

In the example above, it's clear that we need 8 bytes of padding to keep
the RVV section aligned to 16 when using SP. But to keep SP *itself*
aligned to 16 we can't decrement the initial stack pointer by 24 - we
have to round up to 32.
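
The two-step calculation described above can be sketched as follows. This is only an illustration of the arithmetic for the example frame, not the code in this patch; `rvvPadding` and `initialSPDecrement` are hypothetical names.

```cpp
#include <cstdint>

// Round Value up to the next multiple of Align (Align must be non-zero).
constexpr uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical helper: padding needed between the scalar locals and the
// RVV section so that the RVV section stays aligned to RVVAlign when
// addressed from SP.
constexpr uint64_t rvvPadding(uint64_t ScalarLocalSize, uint64_t RVVAlign) {
  return alignTo(ScalarLocalSize, RVVAlign) - ScalarLocalSize;
}

// Hypothetical helper: the fixed-size (non-scalable) part of the initial
// SP adjustment, rounded up so that SP itself stays 16-byte aligned.
constexpr uint64_t initialSPDecrement(uint64_t CalleeSaveSize,
                                      uint64_t ScalarLocalSize) {
  constexpr uint64_t StackAlign = 16;
  uint64_t Padding = rvvPadding(ScalarLocalSize, StackAlign);
  return alignTo(CalleeSaveSize + Padding + ScalarLocalSize, StackAlign);
}

static_assert(rvvPadding(8, 16) == 8, "8 bytes of RVV padding");
static_assert(initialSPDecrement(8, 8) == 32, "24 rounds up to 32");
```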
With the RVV section correctly aligned, the second bug fixed by this
patch is that RVV objects themselves were not correctly aligned. We
were previously only guaranteeing an alignment of 8 bytes, even if they
required a higher alignment. The fix is relatively simple, and in
practice we see more rounding up of VLENB amounts to account for
alignment in between objects:

|------------------------------|
| RVV object (aligned to 16)   |
|------------------------------|
| no padding necessary         |
|------------------------------|
| 2*VLENB RVV object (align 16)|
|------------------------------|
| VLENB alignment padding      |
|------------------------------|
| RVV object (align 32)        |
|------------------------------|
| 3*VLENB alignment padding    |
|------------------------------|
| VLENB RVV object (align 32)  |
|------------------------------| -- <-- base of RVV section

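The padding amounts in the diagram can be reproduced with a small sketch that lays objects out upwards from the base of the RVV section in VLENB-sized units. It assumes a minimum VLENB of 8 bytes, a base that is already sufficiently aligned, and VLENB-sized objects where the diagram gives no size. The types and function names are hypothetical and this is not the allocation code in this patch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct RVVObject {
  uint64_t SizeUnits;  // size in multiples of VLENB
  uint64_t AlignBytes; // required alignment in bytes
};

struct RVVLayout {
  std::vector<uint64_t> Offsets; // offset of each object from the base, in VLENB units
  uint64_t TotalUnits;           // size of the whole RVV section, in VLENB units
};

// Round Value up to the next multiple of Align (Align must be non-zero).
inline uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

// Hypothetical layout routine: objects are placed in order, starting at
// the base of the RVV section.
inline RVVLayout layoutRVVObjects(const std::vector<RVVObject> &Objects) {
  const uint64_t MinVLenB = 8;    // assumption: Zvl64b is the minimum
  const uint64_t StackAlign = 16; // the RVV section size stays 16-byte aligned
  RVVLayout Layout{{}, 0};
  uint64_t Offset = 0;
  for (const RVVObject &Obj : Objects) {
    // Convert the byte alignment into VLENB units for the worst-case VLENB.
    uint64_t AlignUnits = std::max<uint64_t>(1, Obj.AlignBytes / MinVLenB);
    Offset = alignTo(Offset, AlignUnits); // inserts alignment padding if needed
    Layout.Offsets.push_back(Offset);
    Offset += Obj.SizeUnits;
  }
  Layout.TotalUnits = alignTo(Offset, StackAlign / MinVLenB);
  return Layout;
}
```

For the four objects in the diagram, listed from the base upwards as `{1, 32}, {1, 32}, {2, 16}, {1, 16}`, this yields offsets of 0, 4, 6 and 8 VLENB units, which corresponds to the 3*VLENB and VLENB padding shown.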
Note that a lot of the regressions in codegen owing to the new alignment
rules are correct but actually only strictly necessary for Zvl64b (and
Zvl32b but that's not really supported). I plan a follow-up patch to
take the known VLEN into account when padding for alignment.

If I haven't misunderstood, SP here is assumed to be aligned to maxalign, which is the maximum alignment of all stack objects (including RVV stack objects). But I found that https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/MachineFrameInfo.cpp#L61 only records the maximum alignment of non-scalable stack objects.