Page MenuHomePhabricator

[ARM] Fix over-alignment in arguments that are HA of 128-bit vectors

Authored by petpav01 on Jul 24 2018, 1:39 AM.



Code in CC_ARM_AAPCS_Custom_Aggregate() is responsible for handling homogeneous aggregates for CC_ARM_AAPCS_VFP. When an aggregate ends up fully on stack, the function tries to pack all resulting items of the aggregate as tightly as possible. Once the first item is laid out, the alignment used for consecutive items is the size of one item.

This logic goes wrong for 128-bit vectors because their alignment is normally only 64 bits, and so can result in inserting unexpected padding between the first and second element.


$ cat test.c
#include <arm_neon.h>

typedef struct {
  double A[4];
} S_d64_4;

typedef struct {
  uint32x4_t A[2];
} S_v128_2;

int foo(S_d64_4 P0, S_d64_4 P1, float P2, S_v128_2 P3) {
  // * P0 is passed in D0-D3.
  // * P1 is passed in D4-D7.
  // * P2 is passed in [SP, SP+4).
  // * P3.A[0] is passed in [SP+8, SP+24).
  // * P3.A[1] should be passed according to AAPCS in [SP+24, SP+40) but the
  //   code produced by Clang/LLVM expects it in [SP+32, SP+48).
  return vgetq_lane_u32(P3.A[0], 0) + vgetq_lane_u32(P3.A[1], 0);

$ clang -target arm-none-eabi -mcpu=cortex-a53 -S test.c -o -
        push    {r11, lr}
        mov     r11, sp
        sub     sp, sp, #8
        bfc     sp, #0, #4
        ldr     r0, [r11, #40]   /* load from entry-SP + #32 */
        ldr     r1, [r11, #16]   /* load from entry-SP + #8 */
        add     r0, r1, r0
        mov     sp, r11
        pop     {r11, pc}

The proposed patch fixes the problem by updating the alignment with the item size only if this results in reducing it.

Diff Detail


Event Timeline

petpav01 created this revision.Jul 24 2018, 1:39 AM
efriedma added inline comments.
279 ↗(On Diff #156977)

Could you hoist the std::min(Align, Size) out of the loop so it's clear it isn't changing every iteration?

petpav01 updated this revision to Diff 157430.Jul 26 2018, 1:13 AM

Thanks for having a look at this patch.

279 ↗(On Diff #156977)


This revision is now accepted and ready to land.Jul 26 2018, 2:01 PM
This revision was automatically updated to reflect the committed changes.