This is an archive of the discontinued LLVM Phabricator instance.

ARM: Homogeneous aggregates must be allocated to contiguous registers
ClosedPublic

Authored by olista01 on Mar 14 2014, 9:36 AM.

Details

Reviewers
rengolin
jmolloy
Summary

The Problem

The AAPCS defines a homogeneous aggregate (HA) as an aggregate type containing between one and four members, all of which are of the same machine type.

It also specifies that, for the AAPCS-VFP calling convention, there are situations in which a co-processor register candidate (CPRC) should be back-filled into an unallocated register with a lower number than an already-allocated register.

It also specifies that, for the AAPCS-VFP calling convention, an HA with a base type of float, double, 64-bit vector or 128-bit vector must be allocated in a contiguous block of VFP registers, and if that is not possible it is allocated on the stack.

However, clang currently converts function arguments with struct types to multiple arguments. This means that this C code:

struct s { float a; float b; };
void callee(float a, double b, struct s c);

gets translated to this IR:

define void @callee(float %a1, double %b2, float %c.0, float %c.1) #0 {
...
}

Currently, llvm will allocate %a1 to register s0, %b1 to d1 (overlapping s2 and s3), %c.0 to s1 (backfilling the register), and %c.1 to s4. However, %c.0 and %c.1 are parts of the same HA, so must be allocated in a contiguous block of registers, in this example s4 and s5.

There is currently some code in clang which solves some HA-related problems by inserting dummy arguments to use up registers, preventing an HA being split between registers and the stack. While it may appear that the above problem could also be solved by inserting a padding argument to use up s1, consider the following C function signature:

struct s { float a; float b; };
void callee(float a, double b, struct s c, float d);

In this case, d must be back-filled into s1, so we cannot use a padding argument to fill up s1.

The Solution

My solution is to move the handling of HAs from clang to the llvm calling convention code. to do this, I have created a custom allocation function which is used for all members of an HA. It stores members in a list in CCState, and when it sees the last member of the HA it allocates the whole lot in one go, trying registers first and then falling back to the stack.

There is a related patch to clang which prevents the expansion of a struct-typed argument into it's constituent members, which is needed for LLVM to be able to identify a HA. There are comments in clang that say that some optimisations work better with simple types than structs, but I have not done any benchmarks to find out how significant this is. Because of this, I only prevent expansion of struct arguments when the function uses the AAPCS-VFP calling convention.

Diff Detail

Event Timeline

asl added a comment.Mar 14 2014, 1:10 PM

This looks like a pretty big hammer to solve the problem. It also doubled the attribute size just for one (rare) and target-specific case...

In your example:

In this case, d must be back-filled into s1, so we cannot use a padding argument to fill up s1.

The problem can be solved by moving the order of the arguments, so requires just careful backfilling either with dummy or real arguments. What can't it be implemented that way?

I agree with Anton, it seems possible to do that on the backfilling code in Clang.

I would prefer to think of this as a refactoring to move the existing logic for handling HAs into the backend. It looks like it would be possible to solve this by re-ordering the arguments, but any other frontend which wants to generate ABI-compliant code would have to implement the same process.

It also doubled the attribute size just for one (rare) and target-specific case...

I'm not sure what you mean by this. If you are referring to ArgFlagsTy, I re-used the top two bits from the ByValSize field, so the size stays the same.

Hi Oliver,

I'm in favour of having a PCS helper in the IRBuilder in any form or shape, but that's orthogonal to any front-end's inability to produce correct ABI code. I wouldn't want a new, generic PCS helper to be moulded based on a single ARM HA issue, but to be developed from scratch, with most PCSs in mind, including x86, Mips, PPC, AArch64, etc.

In that sense, I agree with Anton that this *very* specific problem sould be fixed in the front-end, as it has historically being done by ARM-compatible front-ends (llvm-gcc, clang, our own EDG bridge, and others), and a more generic approach should be taken to a wider audience, mixing Clang and LLVM developers in a "grand design".

In the end, other front-ends will have to adapt to your own implementation's specific details anyway, and it doesn't matter what kind of specific behaviour for the front-end engineer, as long as it works. The only better scenario is when there is *NO* PCS specific knowledge in a function declaration, and all of it is done by the PCS helper. Anything in between will be just another shoddy contract.

I do want to see that happening, but not starting from a corner case.

cheers,
--renato

I'm in favour of having a PCS helper in the IRBuilder in any form or shape, but that's orthogonal to any front-end's inability to produce correct ABI code. I wouldn't want a new, generic PCS helper to be moulded based on a single ARM HA issue, but to be developed from scratch, with most PCSs in mind, including x86, Mips, PPC, AArch64, etc.

I'm not trying to create a more generic system for handling PCSs, and have tried to minimise the amount of code in target-independent places. If you have any suggestions to further reduce the target-independent code I have added, that would be appreciated.

In that sense, I agree with Anton that this *very* specific problem sould be fixed in the front-end, as it has historically being done by ARM-compatible front-ends (llvm-gcc, clang, our own EDG bridge, and others), and a more generic approach should be taken to a wider audience, mixing Clang and LLVM developers in a "grand design".

My understanding of the reason for putting the PCS code in clang is that it is required because the LLVM backend cannot handle all types correctly. This patch expands the set of types that can be handled by the ARM backend, so some of the ARM-specific clang code is no longer necessary. Having two separate bits of code allocating arguments to registers has never struck me as a particularly robust design.

In the end, other front-ends will have to adapt to your own implementation's specific details anyway, and it doesn't matter what kind of specific behaviour for the front-end engineer, as long as it works. The only better scenario is when there is *NO* PCS specific knowledge in a function declaration, and all of it is done by the PCS helper. Anything in between will be just another shoddy contract.

This patch strictly increases the set of types that the ARM backends can handle without help, so other frontends will continue to work as they currently do without changes. To generate ABI-compliant code, they still have the option to re-order arguments to get the correct back-filling, but this patch allows them to simply emit an argument with struct type, and it will be handled correctly.

I agree that needing no PCS knowledge to create a function definition would be ideal, but that would require a major change given that some PCSs depend on source-language details that will not always be unambiguously represented in IR. However, this does not mean that there is no benefit to reducing the amount of PCS knowledge in the frontend, when it could instead be handled by the target-specific backend.

I do want to see that happening, but not starting from a corner case.

I would also like to see that happen, but that is not what I am trying to do here.

Hi James,

I'm not advocating for piling up more hacks, because, even though this request implements the feature in the back-end (thus freeing the front-end from architectural decisions regarding HA on ARM), it will not free the front-end from architectural decisions regarding HA on other platforms, or anything else in them for that matter.

It'll also introduce code in the back-end to deal with an already existing hack (splitting structures into arguments), and once PCS helpers are in place, this code will be dead, as much as all other hacks in the front-end. Even though other front-ends can continue hacking the argument list as they are to deal with this PCS problem, new front-ends will still have to split them into arguments and maybe re-order them, which is still a hack.

My only point is to keep hacks in the front-end until such a time when someone creates a PCS helper, rather than adding hacks all over the place. The former will make it easier to spot all the hacks until such day comes. This also seemed to be the general consensus for a number of years.

But that's just a weak opinion, and I won't oppose to this change if other people feel it's the right way.

cheers,
--renato

I'm not advocating for piling up more hacks, because, even though this request implements the feature in the back-end (thus freeing the front-end from architectural decisions regarding HA on ARM), it will not free the front-end from architectural decisions regarding HA on other platforms, or anything else in them for that matter.

Just because this doesn't fix all problems, I don't think that is a reason not do do it.

It'll also introduce code in the back-end to deal with an already existing hack (splitting structures into arguments), and once PCS helpers are in place, this code will be dead, as much as all other hacks in the front-end. Even though other front-ends can continue hacking the argument list as they are to deal with this PCS problem, new front-ends will still have to split them into arguments and maybe re-order them, which is still a hack.

The code to split structs up into multiple arguments already exists in both clang and the backend, though I think the LLVM version will also split other types larger than one register, at least on some architectures. The whole point of this patch is that front-ends will no longer have to split struct arguments into multiple arguments (at least for ARM, I'm not familiar enough with other calling conventions to claim anything more general), and will not have to do any re-ordering.

My only point is to keep hacks in the front-end until such a time when someone creates a PCS helper, rather than adding hacks all over the place. The former will make it easier to spot all the hacks until such day comes. This also seemed to be the general consensus for a number of years.

I understand your point about keeping the hacks together, assuming that there is a plan to remove them altogether, but surely it is better to fix the original problem (in this case, support for a limited set of types in the backend calling convention lowering) rather than pile on more hacks.

Has there been any discussion about changing/improving the way we handle calling conventions? I can't find anything in the mailing list archives, maybe it happened on IRC?

Has there been any discussion about changing/improving the way we handle calling conventions? I can't find anything in the mailing list archives, maybe it happened on IRC?

At least when Clang was introduced, many many years ago, and when we did our EDG front-end in 2010. But there might have been more...

You might be right. I'll defer the decision to others...

--renato

I really like the idea of the backend being able to deal with this too. It's just so tempting with how close LLVM IR is to the AAPCS.

The patch seems superficially good to me as well, I'll take a more thorough look soon.

Tim.

asl added inline comments.Apr 29 2014, 4:21 PM
test/CodeGen/ARM/2014-03-04-hfa-in-contiguous-registers.ll
56

Add CHECK-NOT variant here using the currently generated code as a pattern

olista01 updated this revision to Diff 8972.Apr 30 2014, 5:59 AM

Added CHECK-NOT lines to the test, to make it clear what the old, incorrect behaviour was.

jmolloy added a subscriber: jmolloy.May 9 2014, 2:43 AM

Hi Oliver,

Overall this looks fine to me. I have a bunch of coding style comments, and I think given that you're adding a new argument attribute we need IR-level tests to check that the attribute can be parsed and emitted both as IR and as bitcode.

Cheers,

James

include/llvm/CodeGen/CallingConvLower.h
340

Typo: register of

include/llvm/Target/TargetLowering.h
2126

I don't like the name of this function: "functionArgumentNeedsConsecutiveRegisters" might be more explanatory?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
7138

Braces not needed here.

7141

Local variables are UpperCamelCased.

lib/Target/ARM/ARMCallingConv.h
192

Please add a '&& "explanatory text!"' to the assert.

216

RegResult

233

Braces not needed

240

Could you simplify this by using MVT::getSizeInBits()?

unsigned Size = LocVT.SimpleTy.getSizeInBits() / 8;
unsigned Align = LocVT.SimpleTy == MVT::v2f64 ? 8 : Size;
256

Should be able to use a range loop here now that we're using C++11.

lib/Target/ARM/ARMISelLowering.cpp
1288

Why are you converting fastcc functions to AAPCS_VFP here?

test/CodeGen/ARM/2014-03-04-hfa-in-contiguous-registers.ll
2

The file name shouldn't contain the date any more, we stopped this a while ago.

The "attribute" that I added is in ISD::ArgFlagsTy (see include/llvm/Target/TargetCallingConv.h), not an IR attribute, so I can't add IR or bitcode tests for it.

lib/Target/ARM/ARMISelLowering.cpp
1288

This is the same behaviour as the old code. It has to be done here, instead of in CCAssignFnForNode, as getFuncArgNeedsRegBlock needs to know if a function will be treated as AAPCS_VFP, even if the function itself is marked as CallingConv::Fast or CallingConv::C.

jmolloy edited edge metadata.May 9 2014, 6:38 AM

The "attribute" that I added is in ISD::ArgFlagsTy (see include/llvm/Target/TargetCallingConv.h), not an IR attribute, so I can't add IR or bitcode tests for it.

D'oh! OK, thanks. All I've got left is the style issues then. Once those are fixed it's good to commit, as far as I'm concerned.

olista01 updated this revision to Diff 9255.May 9 2014, 6:39 AM
olista01 edited edge metadata.
jmolloy accepted this revision.May 9 2014, 6:41 AM
jmolloy edited edge metadata.

Hi Oliver,

This new revision looks good to me. Other reviewers have had plenty of time to object and they haven't, so please go ahead and commit.

Cheers,

James

This revision is now accepted and ready to land.May 9 2014, 6:41 AM
rengolin accepted this revision.May 9 2014, 6:47 AM
rengolin added a reviewer: rengolin.

I agree with James. Everything else can be done post-commit.

--renato

olista01 closed this revision.May 9 2014, 7:09 AM

Thanks, committed revision 208413.