Instead of aligning the last callee-saved-register slot to the stack
alignment (16 bytes), just align the SVE callee-saved block. This also
simplifies the code that allocates space for the callee-saves.
This change is needed to ensure that the offset at which a callee-saved
register is spilled matches the offset used for, e.g., unwind call
frame instructions.
Since ZPRs are architecturally defined in multiples of 16 bytes, I don't see why anything needs fixing up.
At the same time, I think PPRs are more problematic because they're defined in multiples of 2 bytes, so we'd need a group of 8 to fulfil the alignment.
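
To make the arithmetic concrete, here is a minimal sketch of aligning the whole SVE callee-save block once instead of aligning individual slots. The names `alignTo`, `ZPRBytes`, `PPRBytes`, `StackAlign` and `sveCalleeSaveBlockSize` are hypothetical and used only for illustration, not taken from the patch. All sizes are per-vscale (scalable) bytes.

```cpp
#include <cstdint>

// Round Value up to the next multiple of Align (Align must be a power of two).
constexpr uint64_t alignTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Assumed minimum (vscale == 1) register sizes, in bytes.
constexpr uint64_t ZPRBytes = 16;   // SVE data vector register
constexpr uint64_t PPRBytes = 2;    // SVE predicate register
constexpr uint64_t StackAlign = 16; // stack alignment from the description

// Size of the SVE callee-save block when only the block as a whole is
// padded to the stack alignment; individual slots keep their natural offsets.
constexpr uint64_t sveCalleeSaveBlockSize(unsigned NumZPR, unsigned NumPPR) {
  return alignTo(NumZPR * ZPRBytes + NumPPR * PPRBytes, StackAlign);
}

// Any number of ZPRs is already a multiple of 16 bytes, so no padding is added.
static_assert(sveCalleeSaveBlockSize(3, 0) == 48, "ZPR-only block needs no padding");
// A single PPR occupies 2 bytes, so the block is padded up to 16.
static_assert(sveCalleeSaveBlockSize(0, 1) == 16, "one PPR is padded to 16 bytes");
// Eight PPRs fill exactly one 16-byte unit.
static_assert(sveCalleeSaveBlockSize(0, 8) == 16, "eight PPRs need no padding");
```

With this scheme, padding only ever appears at the end of the block, so the offset of each spilled register is just the sum of the sizes of the slots before it.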