This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Fixing ABI mismatch for packed structs passed as function arguments
ClosedPublic

Authored by JiruiWu on Mar 16 2023, 10:03 AM.

Details

Summary

Previously when a packed struct, containing vector data types such as
uint16x8_t, is passed as a function argument, the alignment of the
struct used by the function caller and the alignment used by the callee
to load the argument from stack does not match.

This patch implements section 6.8.2, stage C.4 of the Procedure Call
Standard for the Arm 64-bit Architecture (AAPCS64): "If the argument is
an HFA, an HVA, a Quad-precision Floating-point or short vector type
then the NSAA is rounded up to the next multiple of 8 if its natural
alignment is ≤ 8 or the next multiple of 16 if its natural alignment
is ≥ 16." This ensures the alignments of the packed structs used as
function arguments are the same as described in the AAPCS64 for both
the caller and callee.

Reference:
AAPCS64 (https://github.com/ARM-software/abi-aa/blob/latest-release/aapcs64/aapcs64.rst)

Diff Detail

Event Timeline

JiruiWu created this revision.Mar 16 2023, 10:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2023, 10:03 AM
JiruiWu requested review of this revision.Mar 16 2023, 10:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2023, 10:03 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Looks sensible but I don't fully understand the context of the change. Please could you explain more what is wrong with the current behaviour, and which parts of the AAPCS you are referring to.

clang/lib/CodeGen/TargetInfo.cpp
5819

Does the similar code added in D100853 need updated too?

5822

Does this code definitely only apply when the ABI is AAPCS64, or should there be a check for that somewhere here? I can't tell whether the if on line 5806 is sufficient.

clang/test/CodeGen/aarch64-ABI-align-packed.cpp
2

Does this need the arm vendor in the triple?

2

Please add a brief comment explaining what this is testing.

JiruiWu updated this revision to Diff 507313.Mar 22 2023, 4:55 AM
JiruiWu marked 2 inline comments as done.

Addressing review comments.

JiruiWu marked 2 inline comments as done.Mar 22 2023, 4:56 AM
JiruiWu added inline comments.
clang/lib/CodeGen/TargetInfo.cpp
5819

No, because this patch is on AArch64 and https://reviews.llvm.org/D100853 is on AArch32. The alignment is capped to 16 in AArch64 and capped to 8 in AArch32.

5822

The code on line 5814 only applies when the ABI is AAPCS64 because it is in the if block that starts on line 5805 and ends on line 5818. As a result, the if on line 5806 is sufficient.

JiruiWu marked 2 inline comments as done.Mar 22 2023, 4:56 AM

LGTM, but I'm not that familiar with the code that selects the alignment so it would be good to get a second opinion.

clang/lib/CodeGen/TargetInfo.cpp
5814

Should this change cover AAPCS_VFP too?

olista01 accepted this revision.Apr 4 2023, 2:57 AM

LGTM

This revision is now accepted and ready to land.Apr 4 2023, 2:57 AM
rjmccall added inline comments.Apr 4 2023, 5:22 PM
clang/lib/AST/RecordLayoutBuilder.cpp
2118

I've always felt the data flow in this function was excessively convoluted. Let's puzzle it out to figure out what's going on. Ignoring the AIX stuff which I assume can't coincide with AArch64, we've got:

UnpackedFieldAlign = min(max(TyAlign, MaxAlignmentInChars), MaxFieldAlignment)
PackedFieldAlign = min(max(1, MaxAlignmentInChars), MaxFieldAlignment)
FieldAlign = FieldPacked ? PackedFieldAlign : UnpackedFieldAlign

where MaxAlignmentInChars is the highest value of all the alignment attributes on the field and MaxFieldAlignment is the value of #pragma pack that was active at the time of the struct definition.

Note that this gives us PackedFieldAlign <= FieldAlign <= UnpackedFieldAlign.

So:

  1. I think it's wrong to be checking Packed instead of FieldPacked here. But:
  2. If FieldPacked, then because UnpackedFieldAlign >= FieldAlign, the net effect of these three lines is UnadjustedAlignment = std::max(UnadjustedAlignment, UnpackedFieldAlign).
  3. If !FieldPacked, then UnpackedFieldAlign == FieldAlign, so the net effect of these three lines is *also* UnadjustedAlignment = std::max(UnadjustedAlignment, UnpackedFieldAlign).
  4. So actually you don't need to check FieldPacked at all; you should remove the old line and just do your new one unconditionally.

Also, AAPCS64 seems to define UnadjustedAlignment as the "natural alignment", and there's a doc comment saying it's the max of the type alignments. That makes me wonder if we should really be considering either the aligned attribute or #pragma pack in this computation at all; maybe we should just be looking at the type alignment.

dblaikie added inline comments.Apr 18 2023, 4:50 PM
clang/lib/AST/RecordLayoutBuilder.cpp
2118

I think I had a go at this over here & failed, might have some relevant notes: https://reviews.llvm.org/D118511#inline-1140212

But, yeah, would love to see it simplified, if possible - just the data point that I tried and failed recently :/ (& contributed to some of the current complexity)

rjmccall requested changes to this revision.Apr 19 2023, 3:07 PM

Yeah. To be clear, though, I'm not asking for the overall data flow of the function to be fixed in this patch; I'm just pointing out problems in the new logic being added by this patch.

This revision now requires changes to proceed.Apr 19 2023, 3:07 PM
JiruiWu updated this revision to Diff 515261.Apr 20 2023, 2:31 AM
JiruiWu marked 2 inline comments as done.

Addressing review comments.

JiruiWu marked an inline comment as done.Apr 20 2023, 2:42 AM
JiruiWu added inline comments.
clang/lib/AST/RecordLayoutBuilder.cpp
2118

I think the logic here is correct.

clang/lib/CodeGen/TargetInfo.cpp
5814

This patch does not cover AAPCS_VFP because AAPCS_VFP is not listed in the ABIKind of the class AArch64ABIInfo.

Thank you. Per my comment here:

Also, AAPCS64 seems to define UnadjustedAlignment as the "natural alignment", and there's a doc comment saying it's the max of the type alignments. That makes me wonder if we should really be considering either the aligned attribute or #pragma pack in this computation at all; maybe we should just be looking at the type alignment.

Could you add tests for what happens when a struct is modified by the aligned attribute on a field and/or #pragma pack and make sure that we do... whatever the right thing here is?

JiruiWu updated this revision to Diff 531338.Jun 14 2023, 7:55 AM

Addressing review comments.

JiruiWu updated this revision to Diff 531339.Jun 14 2023, 7:59 AM
JiruiWu retitled this revision from [ARM] Fixing ABI mismatch for packed structs and fields to [ARM] Fixing ABI mismatch for packed structs passed as function arguments.
JiruiWu edited the summary of this revision. (Show Details)

Updating the commit message.

JiruiWu updated this revision to Diff 531375.Jun 14 2023, 9:10 AM

Adding precommit tests D152932.

tmatheson added inline comments.
clang/test/CodeGen/aarch64-ABI-align-packed.cpp
6

The filename and description do not reflect what this file is actually doing, which is specifically testing the alignment of the structs used for variable argument lists.

tmatheson added inline comments.Jun 15 2023, 6:03 AM
clang/test/CodeGen/aarch64-ABI-align-packed.cpp
6

Sorry out of date comment, ignore.

chill added a subscriber: chill.Jun 16 2023, 2:35 AM
chill added inline comments.
clang/lib/CodeGen/TargetInfo.cpp
5818–5819

No need to "alignment adjusted", just "HFA/HVA"

chill added inline comments.Jun 16 2023, 2:53 AM
clang/test/CodeGen/aarch64-ABI-align-packed.c
34 ↗(On Diff #531375)

Don't you mean "__attribute__((aligned(n))) cannot decrease the minimum required alignment" ?

JiruiWu marked an inline comment as done.Jun 16 2023, 3:19 AM
JiruiWu added inline comments.
clang/test/CodeGen/aarch64-ABI-align-packed.c
34 ↗(On Diff #531375)

I added this comment to explain that the natural alignment of the struct aligned_member_8 is 16-byte instead of 8-byte. In this test case the alignment of M0 is 16 bytes, which is above the minimum required alignment specified by __attribute__((aligned(8))).

tmatheson requested changes to this revision.Jun 16 2023, 3:20 AM

I think the current patch is wrong for a couple of reasons.

Firstly the data types being tested, e.g. struct S { int8x16_t m; } etc, are not just composite types, but HVAs:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#homogeneous-aggregates

A Homogeneous Aggregate is a composite type where all of the Fundamental Data Types of the members that compose the type are the same. The test for homogeneity is applied after data layout is completed and without regard to access control or other source language restrictions. Note that for short-vector types the fundamental types are 64-bit vector and 128-bit vector; the type of the elements in the short vector does not form part of the test for homogeneity.

So these are HVAs with Fundamental Data Type of 128-bit vector. This explains why the alignment changes get applied, because they are scoped to if(isHomogeneousAggregate) which we would not expect to apply for normal composite types.

Since these are HVAs the relevant AAPCS64 rules are different. Specifically
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#parameter-passing

B.3 If the argument type is an HFA or an HVA, then the argument is used unmodified.

would be the "first matching rule" and B6 (the "alignment of the copy is either 8 or 16" rule) would not be applied.

If anyone can confirm or correct my reading of the above that would be appreciated. The rules are so spread out it's hard to be confident that I've taken everything into account.

I haven't checked if the current alignment for these HVA types is correct yet.

This revision now requires changes to proceed.Jun 16 2023, 3:20 AM
chill added a comment.Jun 16 2023, 3:34 AM

I was just thinking to LGTM it :)

IMHO, the alignment adjustment happens because of C.4 (B.3 indeed leave the HFA/HVA unmodified).

C.4 If the argument is an HFA, an HVA, a Quad-precision Floating-point or short vector type then the NSAA is rounded up to the next multiple of 8 if its natural alignment is ≤ 8 or the next multiple of 16 if its natural alignment is ≥ 16.

Browsing the AAPCS HFA and HVA seem always treated the same, and looking at bool AArch64ABIInfo::isHomogeneousAggregateBaseType(QualType Ty) const it
recognized both FP types and 64- and 128- bit vectors, so we have uniform treatment there as well.

chill added inline comments.Jun 16 2023, 3:41 AM
clang/test/CodeGen/aarch64-ABI-align-packed.c
34 ↗(On Diff #531375)

Yes, so the __attribute__ does not actually set the minimum required alignment,
it sets the member alignment to the maximum of the natural and the specified alignment.

I was just thinking to LGTM it :)

IMHO, the alignment adjustment happens because of C.4 (B.3 indeed leave the HFA/HVA unmodified).

C.4 If the argument is an HFA, an HVA, a Quad-precision Floating-point or short vector type then the NSAA is rounded up to the next multiple of 8 if its natural alignment is ≤ 8 or the next multiple of 16 if its natural alignment is ≥ 16.

I think that C2 would be hit first, suggesting it should be allocated a SIMD register and alignment should be irrelevant, assuming sufficient registers:

C.2 If the argument is an HFA or an HVA and there are sufficient unallocated SIMD and Floating-point registers (NSRN + number of members ≤ 8), then the argument is allocated to SIMD and Floating-point registers (with one register per member of the HFA or HVA). The NSRN is incremented by the number of registers used. The argument has now been allocated.

If not enough registers, the size also needs rounded up:

C.3 If the argument is an HFA or an HVA then the NSRN is set to 8 and the size of the argument is rounded up to the nearest multiple of 8 bytes.

After that C4 would indeed be hit. However C4 differs from B6, in that C4 rounds up to the nearest multiple of 8 or 16 (which is not what the patch currently does) whereas B6 restricts it to either 6 or 16 (which this what this patch does, but shouldn't apply to HVAs).

The final rule that actually does the allocation is C6:

C.6 If the argument is an HFA, an HVA, a Half-, Single-, Double- or Quad- precision Floating-point or short vector type, then the argument is copied to memory at the adjusted NSAA. The NSAA is incremented by the size of the argument. The argument has now been allocated.

(This is all in reference to HVA types like struct { uint8x16_t m; };)

chill added a comment.Jun 16 2023, 6:32 AM

I was just thinking to LGTM it :)

IMHO, the alignment adjustment happens because of C.4 (B.3 indeed leave the HFA/HVA unmodified).

C.4 If the argument is an HFA, an HVA, a Quad-precision Floating-point or short vector type then the NSAA is rounded up to the next multiple of 8 if its natural alignment is ≤ 8 or the next multiple of 16 if its natural alignment is ≥ 16.

I think that C2 would be hit first, suggesting it should be allocated a SIMD register and alignment should be irrelevant, assuming sufficient registers:

Sure, but this is not relevant. We should output a correct alignstack attribute if in the end it turns out the argument needs to be allocated in memory. No harm done if we output the attribute, but the
argument ends up in registers.

C.2 If the argument is an HFA or an HVA and there are sufficient unallocated SIMD and Floating-point registers (NSRN + number of members ≤ 8), then the argument is allocated to SIMD and Floating-point registers (with one register per member of the HFA or HVA). The NSRN is incremented by the number of registers used. The argument has now been allocated.

If not enough registers, the size also needs rounded up:

C.3 If the argument is an HFA or an HVA then the NSRN is set to 8 and the size of the argument is rounded up to the nearest multiple of 8 bytes.

I believe that is handled in the backend, by allocating arguments to at least 8-byte aligned stack slots, e.g. here https://github.com/llvm/llvm-project/blob/459f495f49a197a042890e1daa0a98cbae892d2b/llvm/lib/Target/AArch64/AArch64CallingConvention.cpp#L200

After that C4 would indeed be hit. However C4 differs from B6, in that C4 rounds up to the nearest multiple of 8 or 16 (which is not what the patch currently does) whereas B6 restricts it to either 6 or 16 (which this what this patch does, but shouldn't apply to HVAs).

But there isn't any other power of two between 8 and 16.

tmatheson accepted this revision.Jun 16 2023, 6:53 AM

But there isn't any other power of two between 8 and 16.

Ok, I see where I was going wrong, misreading C4 (it's the stack address which is "next multiple of N", which applies an alignment of N).

In that case I don't have any more objections.

This revision now requires review to proceed.Jun 16 2023, 6:53 AM

The description/commit message should reflect the final reasoning behind the change.

chill added a comment.Jun 16 2023, 8:46 AM

Previously when a packed struct, containing vector data types such as
uint16x8_t, is passed as a function argument, the alignment of the
struct used by the function caller and the alignment used by the callee
to load the argument from stack does not match.

I would suggest adding tests with assembler output that show what is fixed (perhaps pre-committed).

chill added inline comments.Jun 26 2023, 4:06 AM
clang/lib/CodeGen/TargetInfo.cpp
5822

The backend ought to set the minimum alignment of a stack slot to 8 anyway (for AAPCS), hence setting the minimum here to 8 is redundant.

JiruiWu updated this revision to Diff 536158.Jun 30 2023, 2:57 AM
JiruiWu marked 3 inline comments as done.
JiruiWu edited the summary of this revision. (Show Details)

Addressing code review comments. Adding an assembly test to better demonstrate the effects of this patch.

JiruiWu marked an inline comment as done.Jun 30 2023, 3:06 AM
JiruiWu added inline comments.
clang/lib/CodeGen/TargetInfo.cpp
5822

This 8 is necessary. I tried to use 0 instead of 8 but clang set the alignment to 16 by default, which is wrong. Specifying the alignment to 8 here fixes the problem.

chill added a comment.Jul 10 2023, 2:38 AM

Following D148094 , the patch does not apply.

chill added inline comments.Jul 10 2023, 6:25 AM
clang/test/CodeGen/aarch64-ABI-align-packed-assembly.c
100 ↗(On Diff #536158)

Can we add some CHECK: lines here and to other variadic functions as well (I recognize it might not be straightforward)?

JiruiWu updated this revision to Diff 542809.Jul 21 2023, 2:07 AM
JiruiWu marked 2 inline comments as done.

Addressing code review comments.

clang/test/CodeGen/aarch64-ABI-align-packed-assembly.c
100 ↗(On Diff #536158)

I compared the assembly output before and after applying my patch, there is no change in this part and thus I think there is no need for additional CHECK: lines.

JiruiWu updated this revision to Diff 542880.Jul 21 2023, 6:00 AM

Rebasing the patch.

rjmccall accepted this revision.Jul 21 2023, 3:11 PM

Okay, LGTM.

This revision is now accepted and ready to land.Jul 21 2023, 3:11 PM
This revision was landed with ongoing or failed builds.Jul 26 2023, 9:32 AM
This revision was automatically updated to reflect the committed changes.