This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][CodeGen] Fixing stack alignment of HFA arguments on AArch64 PCS
AbandonedPublic

Authored by pratlucas on Mar 10 2020, 3:59 AM.

Details

Summary

Properly complying with AArch64 PCS on the handling of over-aligned HFA
arguments when those are placed on the stack. AAPCS64 specifies that the
stacked argument address should be rounded up to the Natural Alignment
of the argument before the argument is copied to memory.

Over alignment information extracted from language attributes on clang
was not properly propagated to the backend for those arguments, as it
does not map to the backend's base type alignments. As the standard also
specifies that, when placed in registers, a single FP register should be
allocated for each individual HFA element, type coercion is no suitable
for capturing the alignment constraints.

This patch fixes the alignment of these arguments by capturing their
stack alignment requirements in an IR argument attribute, making sure
this information is available for the calling convention lowering stage.

Diff Detail

Event Timeline

pratlucas created this revision.Mar 10 2020, 3:59 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 10 2020, 3:59 AM
pratlucas updated this revision to Diff 249360.Mar 10 2020, 6:39 AM

Clang-format.

I've not looked at the code in detail yet, but I think this still gets the ABI wrong for this example:

typedef struct {
  __attribute__ ((__aligned__(32))) double v[4];
} TYPE1;

double func(double d0, double d1, double d2, double d3,
            double d4, double d5, double d6, double d7,
            float stk0, TYPE1 stk1) {
  return stk1.v[0];
}

The ABI says (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst, rule B.5):

If the argument is an alignment adjusted type its value is passed as a copy of the actual value. The copy will have an alignment defined as follows.
* For a Fundamental Data Type, the alignment is the natural alignment of that type, after any promotions.
* For a Composite Type, the alignment of the copy will have 8-byte alignment if its natural alignment is <= 8 and 16-byte alignment if its natural alignment is >= 16.
The alignment of the copy is used for applying marshaling rules.

This means that stk1 should be passed as a copy with alignment 16 bytes, putting it at sp+16. GCC does this, clang without this patch passes it at sp+8, and clang with this patch passes it at sp+32.

I believe rule B.2 from the AAPCS64 should take precedence over rule B.5 for HFA arguments:

B.2	If the argument type is an HFA or an HVA, then the argument is used unmodified.
pratlucas edited reviewers, added: ostannard; removed: olista01.Mar 30 2020, 3:58 AM
rnk added a comment.Mar 30 2020, 12:51 PM

This means that stk1 should be passed as a copy with alignment 16 bytes, putting it at sp+16. GCC does this, clang without this patch passes it at sp+8, and clang with this patch passes it at sp+32.

MSVC puts it at sp+8: https://gcc.godbolt.org/z/aAku9j

They allegedly follow this same document. Either way, we need to remain compatible.

Given that neither GCC nor MSVC do things they way you are proposing, are you sure this is correct? What compiler does this change make us more compatible with?

I don't believe there is an issue from the user's standpoint, since Clang copies the entire argument into an appropriately aligned alloca. If it is address-taken and stored to, there will not be any faults.

From the AAPCS64's Parameter Passing Rules section (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#642parameter-passing-rules), I believe the proposed handling is correct. The HFA related rules described in this section are:

Stage B – Pre-padding and extension of arguments
[...]
B.2 	If the argument type is an HFA or an HVA, then the argument is used unmodified.
[...]
Stage C – Assignment of arguments to registers and stack
[...]
C.2 	If the argument is an HFA or an HVA and there are sufficient unallocated SIMD and Floating-point registers (NSRN + number of members <= 8), then the argument is allocated to SIMD and Floating-point Registers (with one register per member of the HFA or HVA). The NSRN is incremented by the number of registers used. The argument has now been allocated.
C.3 	If the argument is an HFA or an HVA then the NSRN is set to 8 and the size of the argument is rounded up to the nearest multiple of 8 bytes.
C.4 	If the argument is an HFA, an HVA, a Quad-precision Floating-point or Short Vector Type then the NSAA is rounded up to the larger of 8 or the Natural Alignment of the argument’s type.
[...]

As per rule C.4, the argument should be allocated on the stack address rounded to the larger of 8 and its Natural Alignment, which is 32 according to what is specified by the Composite Types rules in sectoin 5.6 of that same document (https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#composite-types):

5.6   Composite Types
[...]
- The natural alignment of a composite type is the maximum of each of the member alignments of the 'top-level' members of the composite type i.e. before any alignment adjustment of the entire composite is applied

In regards to the compatibility with other compilers, I'm not sure that following what seems to be an uncompliant behavior would be the best way to proceed. @rnk and @ostannard, what would be your take on this?

rnk added a comment.Apr 10 2020, 1:37 PM

In regards to the compatibility with other compilers, I'm not sure that following what seems to be an uncompliant behavior would be the best way to proceed. @rnk and @ostannard, what would be your take on this?

I don't have any familiarity with the prevailing practices for ARM ABI compatibility, so I couldn't say. It might be worth checking in with stakeholders from the other compilers, i.e. file bugs against both compilers and ask for an opinion.

llvm/docs/LangRef.rst
1220

This seems like you are introducing a new meaning to alignstack, which according to the comments, only affects function SP alignment, not parameter alignment.

I'm assuming the reason you can't use the regular align attribute is that it is overloaded to mean two things: the alignment of the pointer when applied to a pointer, and the alignment of the argument memory when that pointer argument is marked byval. If you want to resolve this ambiguity, it seems like something that should be discussed on llvm-dev with a wider audience.

pratlucas abandoned this revision.Feb 2 2021, 2:06 AM
chill added a subscriber: chill.Feb 5 2021, 1:49 AM
chill added inline comments.
llvm/docs/LangRef.rst
1220

Sorry, I couldn't quite get it, do you suggest we should be using the align attribute instead of alignstack, if there are no
(major) objections on the llvm-dev list?

It certainly makes sense to me to use align as it already pertains to individual argument alignment (even though it's for pointers only now).

rnk added inline comments.Feb 8 2021, 1:26 PM
llvm/docs/LangRef.rst
1220

Mostly I think I meant that this will be a big change in the meaning of either the align or the alignstack attributes, and that should be hashed out on llvm-dev.

Right now align is kind of a hybrid between an optimization annotation attribute, like dereferenceable or nonnull, and an ABI attribute, like byval or inreg. When align is used with byval, it affects argument memory layout. When byval is not present, it is just an optimizer hint. IMO, ideally, we should separate those two roles.

I should be able to control the alignment of the memory used to pass a pointer argument, at the same time that I annotate which low bits of the pointer are known to be zero.