Page MenuHomePhabricator

[ARM] Enforcing calling convention for half-precision FP arguments and returns for big-endian AArch32
Needs ReviewPublic

Authored by pratlucas on Feb 26 2020, 5:19 AM.

Details

Summary

Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen. As such promotions
are implemented as coercion through memory in clang, aruments and
returns of those types end up stored on the wrong bits on big-endian
AArch32 - MSBs instead of LSBs.

This patch enforces AAPCS' directions, making sure clang stores those
types on the proper bits during coercion through memory for bit-endian
targets.

Event Timeline

pratlucas created this revision.Feb 26 2020, 5:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 26 2020, 5:20 AM
lattner resigned from this revision.Mar 12 2020, 6:33 PM

I'm not a qualified reviewer for this at this point.

Ugh. I'd hate to introduce yet another weird little tweak to ABIArgInfo that's used on exactly one target. For 16-bit composite types, we seem to coerce to a type like [1 x i32]; would that be okay here?

You don't have a test that checks that you get the IR you want.

Hi @rjmccall,
I agree those kind of tweaks do not look good. The issue here, though, is that argument coercion currently ignores the target's endian information when performing coercion through memory.
This happens for any type that requires memory coercion, so unfortunately using [1 x i32] does not do the trick.
Let me know if you have any other sugestions for handling this, I'd be glad to avoid the ABIArgInfo approach.

Hi @rjmccall,
I agree those kind of tweaks do not look good. The issue here, though, is that argument coercion currently ignores the target's endian information when performing coercion through memory.
This happens for any type that requires memory coercion, so unfortunately using [1 x i32] does not do the trick.

Oh, wait, AAPCS wants half values to be passed in the *least* significant bits of a GPR, even on big-endian machines? That's certainly more convenient, but it's a weird inconsistency with the otherwise iron rule of the calling convention, which that it's exactly as if you laid all of the arguments out in memory and then popped the first four 32-bit values off. We're talking about a calling convention here that literally skips registers in order to "align" arguments.

Can we not just coerce to i16? Will LLVM not pass an i16 in the least-significant bits of a register?

Oh, wait, AAPCS wants half values to be passed in the *least* significant bits of a GPR, even on big-endian machines? That's certainly more convenient, but it's a weird inconsistency with the otherwise iron rule of the calling convention, which that it's exactly as if you laid all of the arguments out in memory and then popped the first four 32-bit values off. We're talking about a calling convention here that literally skips registers in order to "align" arguments.

Can we not just coerce to i16? Will LLVM not pass an i16 in the least-significant bits of a register?

Yes, AAPCS specifies that they should go into the LSBs:

B.2 [...] If the argument is a Half-precision Floating Point Type its size is set to 4 bytes as if it had been copied to the least significant bits of a 32-bit register and the remaining bits filled with unspecified values.

Coercing to i16 solves it for the general case, when the argumetns are going into GPRs, but is not suficient when those arguments are required to go into FP registers - e.g. -mfloat-abi=hard.

Oh, wait, AAPCS wants half values to be passed in the *least* significant bits of a GPR, even on big-endian machines? That's certainly more convenient, but it's a weird inconsistency with the otherwise iron rule of the calling convention, which that it's exactly as if you laid all of the arguments out in memory and then popped the first four 32-bit values off. We're talking about a calling convention here that literally skips registers in order to "align" arguments.

Can we not just coerce to i16? Will LLVM not pass an i16 in the least-significant bits of a register?

Yes, AAPCS specifies that they should go into the LSBs:

B.2 [...] If the argument is a Half-precision Floating Point Type its size is set to 4 bytes as if it had been copied to the least significant bits of a 32-bit register and the remaining bits filled with unspecified values.

Coercing to i16 solves it for the general case, when the argumetns are going into GPRs, but is not suficient when those arguments are required to go into FP registers - e.g. -mfloat-abi=hard.

Why not just make half as an argument do the right thing for that case?

rudkx resigned from this revision.Mar 18 2020, 10:25 AM

I did some refactoring here years ago but I'm not that familiar with the ABIs or the handling in clang.

Why not just make half as an argument do the right thing for that case?

That would be the ideal approach, but currently there's a limitation on the backend's calling convention lowering that gets in the way.
The lowering of calls in SelectionDAGBuilder includes a target-independent step that is responsible for spliting or promoting each argument into "legal registers" and takes place before the targets' calling convention lowering.
As f16 is not a legal type on many of the AAPCS_VFP targets, it gets promoted to f32 before the target's lowering code has a chance to define how to handle it.
Ideally, this stpe should only take place if lowering calling conventions after type legalization - there's a FIXME there already capturing that -, but that would involve a major rewriting that would impact multiple targets.
Inserting a hacky target-dependent fix in this step also didn't look very good.
Do you see other alternatives for handling it? If not, which approach would you suggest?

pratlucas edited reviewers, added: asl; removed: lattner, rudkx.Tue, May 5, 2:21 AM

Ping.

Why not just make half as an argument do the right thing for that case?

That would be the ideal approach, but currently there's a limitation on the backend's calling convention lowering that gets in the way.
The lowering of calls in SelectionDAGBuilder includes a target-independent step that is responsible for spliting or promoting each argument into "legal registers" and takes place before the targets' calling convention lowering.
As f16 is not a legal type on many of the AAPCS_VFP targets, it gets promoted to f32 before the target's lowering code has a chance to define how to handle it.
Ideally, this stpe should only take place if lowering calling conventions after type legalization - there's a FIXME there already capturing that -, but that would involve a major rewriting that would impact multiple targets.
Inserting a hacky target-dependent fix in this step also didn't look very good.
Do you see other alternatives for handling it? If not, which approach would you suggest?

Would it be possible to pass a half argument and fix-it-up at CodeGenPrepare?

chill added a subscriber: chill.Tue, Jun 2, 8:25 AM