The Problem
The AAPCS defines a homogeneous aggregate (HA) as an aggregate type containing between one and four members, all of which are of the same machine type.
It also specifies that, for the AAPCS-VFP calling convention, there are situations in which a co-processor register candidate (CPRC) should be back-filled into an unallocated register with a lower number than an already-allocated register.
Finally, it specifies that, under the AAPCS-VFP calling convention, an HA with a base type of float, double, 64-bit vector or 128-bit vector must be allocated in a contiguous block of VFP registers; if that is not possible, the whole HA is allocated on the stack.
However, clang currently converts function arguments with struct types to multiple arguments. This means that this C code:
struct s { float a; float b; };
void callee(float a, double b, struct s c);
gets translated to this IR:
define void @callee(float %a1, double %b2, float %c.0, float %c.1) #0 { ... }
Currently, LLVM will allocate %a1 to register s0, %b2 to d1 (overlapping s2 and s3), %c.0 to s1 (back-filling the register), and %c.1 to s4. However, %c.0 and %c.1 are parts of the same HA, so they must be allocated in a contiguous block of registers, in this example s4 and s5.
There is currently some code in clang which solves some HA-related problems by inserting dummy arguments to use up registers, preventing an HA from being split between registers and the stack. While it may appear that the above problem could also be solved by inserting a padding argument to use up s1, consider the following C function signature:
struct s { float a; float b; };
void callee(float a, double b, struct s c, float d);
In this case, d must be back-filled into s1, so we cannot use a padding argument to fill up s1.
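To make the expected allocation concrete, here is a minimal standalone C model of the AAPCS-VFP register rule as described above: each CPRC takes the lowest-numbered free block of consecutive registers of the right width, and once one CPRC has gone to the stack no further back-filling happens. The names VfpState and alloc_cprc are hypothetical and exist only for this illustration; this is not code from clang or LLVM.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_S_REGS 16   /* s0-s15 carry VFP arguments under AAPCS-VFP */

/* Hypothetical model state: which single-precision registers are taken. */
typedef struct {
    bool used[NUM_S_REGS];
} VfpState;

/*
 * Allocate `count` consecutive elements, each `unit` s-registers wide
 * (1 = float, 2 = double; a scalar is simply count == 1).
 * Returns the index of the first s-register, or -1 if the argument
 * has to go on the stack.
 */
static int alloc_cprc(VfpState *st, int unit, int count)
{
    assert(unit == 1 || unit == 2);
    int width = unit * count;

    /* Lowest-numbered suitably aligned free block wins (this is what
     * makes back-filling of earlier holes possible). */
    for (int start = 0; start + width <= NUM_S_REGS; start += unit) {
        bool is_free = true;
        for (int i = 0; i < width; i++) {
            if (st->used[start + i]) { is_free = false; break; }
        }
        if (is_free) {
            for (int i = 0; i < width; i++)
                st->used[start + i] = true;
            return start;
        }
    }

    /* Nothing fits: the argument goes on the stack, and all remaining
     * VFP registers become unavailable for later arguments. */
    for (int i = 0; i < NUM_S_REGS; i++)
        st->used[i] = true;
    return -1;
}
```

Calling alloc_cprc on a zeroed VfpState with (1, 1), (2, 1), (1, 2) and (1, 1), in that order, models a, b, c and d from the signature above. In this model a lands in s0, b in s2/s3 (that is, d1), c in s4-s5, and d is back-filled into s1, which is why s1 cannot be consumed by a padding argument.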
The Solution
My solution is to move the handling of HAs from clang to the LLVM calling convention code. To do this, I have created a custom allocation function which is used for all members of an HA. It stores members in a list in CCState, and when it sees the last member of the HA it allocates the whole lot in one go, trying registers first and then falling back to the stack.
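The deferred scheme can be sketched roughly as follows. This is a simplified standalone C model, not the code in the patch: State stands in for the relevant parts of CCState, and State, Loc and alloc_ha_member are hypothetical names. It assumes each member is a float (unit 1) or a double (unit 2) and that the caller knows which member is the last one.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_S_REGS     16   /* s0-s15 */
#define MAX_HA_MEMBERS 4

/* Hypothetical stand-in for the state the patch keeps in CCState. */
typedef struct {
    bool     used[NUM_S_REGS];          /* allocated s-registers        */
    int      pending[MAX_HA_MEMBERS];   /* arg indices of queued members */
    int      num_pending;
    unsigned stack_offset;              /* next free stack byte          */
} State;

/* Where an argument ended up: s-register index or stack byte offset. */
typedef struct { bool in_reg; unsigned loc; } Loc;

/*
 * Called once per HA member. Members are only queued until the last one
 * arrives; then the whole HA is placed at once. Returns true when the
 * HA has been allocated (results written to out[arg_index]).
 */
static bool alloc_ha_member(State *st, int arg_index, int unit,
                            bool is_last, Loc *out)
{
    assert(unit == 1 || unit == 2);
    assert(st->num_pending < MAX_HA_MEMBERS);
    st->pending[st->num_pending++] = arg_index;
    if (!is_last)
        return false;                   /* defer: nothing allocated yet */

    int count = st->num_pending;
    int width = unit * count;
    st->num_pending = 0;

    /* First choice: one contiguous block of registers for all members. */
    for (int start = 0; start + width <= NUM_S_REGS; start += unit) {
        bool is_free = true;
        for (int i = 0; i < width; i++) {
            if (st->used[start + i]) { is_free = false; break; }
        }
        if (!is_free)
            continue;
        for (int i = 0; i < width; i++)
            st->used[start + i] = true;
        for (int i = 0; i < count; i++) {
            out[st->pending[i]].in_reg = true;
            out[st->pending[i]].loc = (unsigned)(start + i * unit);
        }
        return true;
    }

    /* Fallback: the whole HA goes on the stack, never a partial split.
     * Remaining registers are retired so nothing is back-filled later. */
    for (int i = 0; i < NUM_S_REGS; i++)
        st->used[i] = true;
    unsigned size = 4u * (unsigned)unit;
    st->stack_offset = (st->stack_offset + size - 1) / size * size;
    for (int i = 0; i < count; i++) {
        out[st->pending[i]].in_reg = false;
        out[st->pending[i]].loc = st->stack_offset;
        st->stack_offset += size;
    }
    return true;
}
```

The point of queuing is that no register is touched until the size of the whole HA is known, so the members can never be interleaved with a back-filled hole such as s1 in the earlier example.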
There is a related patch to clang which prevents the expansion of a struct-typed argument into its constituent members, which is needed for LLVM to be able to identify an HA. There are comments in clang that say that some optimisations work better with simple types than structs, but I have not done any benchmarks to find out how significant this is. Because of this, I only prevent expansion of struct arguments when the function uses the AAPCS-VFP calling convention.