This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Implement ABI lowering
ClosedPublic

Authored by asb on Nov 14 2017, 5:58 AM.

Details

Summary

RISCVABIInfo is implemented in terms of XLen, supporting both RV32 and RV64. Unfortunately we need to count argument registers in the frontend in order to determine when to emit signext and zeroext attributes. Integer scalars are extended according to their type up to 32-bits and then sign-extended to XLen when passed in registers, but are anyext when passed on the stack. This patch only implements the base integer (soft float) ABIs.

For more information on the RISC-V ABI, see the ABI doc, my golden model, and the LLVM RISC-V calling convention patch (specifically the comment documenting frontend expectations).

It was necessary to modify CodeGenModule::ConstructAttributeList to ensure that signext is emitted for int32_t/uint32_t return values. Mips is the only other implementer of shouldSignExtUnsignedType but is unaffected, as unlike classifyArgumentType, classifyReturnType will not extend i32 values (@sdardis: is this a bug?).

@chandlerc, if you could help nominate appropriate reviewers I would appreciate it. I've CCed in those who are doing work on/with RISC-V LLVM and are well placed to review whether the RISC-V ABI and calling convention is implemented faithfully.

Diff Detail

Repository
rL LLVM

Event Timeline

asb created this revision.Nov 14 2017, 5:58 AM
efriedma added a subscriber: efriedma.

You need more test coverage for the cases where arguments end up on the stack. And some test coverage for varargs calls.

lib/CodeGen/TargetInfo.cpp
8858 ↗(On Diff #122827)

It looks like the ABI says there's a special rule for varargs here?

8914 ↗(On Diff #122827)

The type of va_list and the behavior of va_arg should be documented in the ABI doc... but apparently it's missing at the moment? (I mean, this is the obvious implementation from what is mentioned, but it would be nice to see something more explicit.)

test/CodeGen/riscv32-abi.c
118 ↗(On Diff #122827)

Please clarify this comment; the ABI doc doesn't say anything about aligning arguments whose alignment is 2✕XLEN, except in the case of varargs.

asb updated this revision to Diff 123020.Nov 15 2017, 6:42 AM
asb marked an inline comment as done.

Updated to address review comments. I've added some extra test coverage that demonstrates that argument lowering happens the same once registers are exhausted, as well as more coverage around varargs. Also updated to properly handle the "aligned register pair" rule for variadic arguments, and added tests for this.

lib/CodeGen/TargetInfo.cpp
8858 ↗(On Diff #122827)

You're right, I neglected vararg calls. Now addressed and test cases added. The difference is only observable on RV64 I believe (as we only use the GPR tracking to decide whether an argument should be extended or not, and the argument promotion rules mean you'll never see signext/zeroext on varargs in RV32).

8914 ↗(On Diff #122827)

I've started an attempt to better document things https://github.com/riscv/riscv-elf-psabi-doc/pull/60

test/CodeGen/riscv32-abi.c
118 ↗(On Diff #122827)

I've corrected the comment to say "if passed on the stack" and submitted a PR to clarify the ABI doc https://github.com/riscv/riscv-elf-psabi-doc/pull/59

Mips is the only other implementer of shouldSignExtUnsignedType but is unaffected, as unlike classifyArgumentType, classifyReturnType will not extend i32 values (@sdardis: is this a bug?).

I believe this is an oversight that is addressed in the backend. The MIPS64 backend treats 'trunc i64 %x to i32' as '(SLL (EXTRACT_SUBREG GPR64:$src, sub_32), 0)', which is required to match MIPS32 arithmetic works on MIPS64.

I will take a longer look at this issue.

Thanks.

efriedma added inline comments.Nov 15 2017, 11:35 AM
lib/CodeGen/TargetInfo.cpp
8913 ↗(On Diff #123020)

The spec says "Aggregates larger than 2✕XLEN bits are passed by reference and are replaced in the argument list with the address". That's not byval.

mgrang added inline comments.Nov 15 2017, 4:35 PM
lib/CodeGen/TargetInfo.cpp
8845 ↗(On Diff #123020)

Comment sounds incomplete.

8943 ↗(On Diff #123020)

Parentheses around 2 * XLen can be skipped.

rjmccall added inline comments.Nov 16 2017, 4:38 PM
lib/CodeGen/CGCall.cpp
1937 ↗(On Diff #123020)

I feel like a better design would be to record whether to do a sext or a zext in the ABIArgInfo. Add getSignExtend and getZeroExtend static functions to ABIArgInfo and make getExtend a convenience function that takes a QualType and uses its signedness.

lib/CodeGen/TargetInfo.cpp
8790 ↗(On Diff #123020)

Is "XLen" a term that anyone working with RISCV would recognize? Because even if it is, it feels like something you should document here — just "standard pointer size in bits" would be fine.

8858 ↗(On Diff #123020)

I would encourage you to use CharUnits and getTypeSizeInChars more consistently in this code; it would simplify some of the conversions you're doing and eliminate some of the risk of forgetting a bits-to-bytes conversion. It looks like storing XLen as a CharUnits would also make this easier.

mgrang added inline comments.Nov 20 2017, 5:02 PM
lib/CodeGen/TargetInfo.cpp
8790 ↗(On Diff #123020)

+1

8821 ↗(On Diff #123020)

I think it's better to use parenthesis here to aid readability and avoid ambiguity?

8824 ↗(On Diff #123020)

I observed that you use RISCV and RISC-V interchangeably in comments. This is not a problem, per se but can make this uniform if we want to be *really* particular about this :)

apazos added inline comments.Nov 22 2017, 9:38 AM
lib/CodeGen/TargetInfo.cpp
8872 ↗(On Diff #123020)

Suggestion to make 1 default value when you declare the var

8938 ↗(On Diff #123020)

The size -> The type

So how does something like the following work:

union U { float f; int i; };
void f(union U u);

The flattening described in https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#hardware-floating-point-calling-convention makes sense for arrays and structs but I couldn't find where unions were described.

asb added a comment.Nov 23 2017, 12:33 AM

So how does something like the following work:

union U { float f; int i; };
void f(union U u);

The flattening described in https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#hardware-floating-point-calling-convention makes sense for arrays and structs but I couldn't find where unions were described.

You're right that's not currently described - I have an issue tracking this here: https://github.com/riscv/riscv-elf-psabi-doc/issues/24. Last time I tried to check gcc behaviour it seemed that such cases would be passed according to the integer calling convention, but I'd be happier if one of the GCC devs would confirm.

In D40023#933466, @asb wrote:

So how does something like the following work:

union U { float f; int i; };
void f(union U u);

The flattening described in https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md#hardware-floating-point-calling-convention makes sense for arrays and structs but I couldn't find where unions were described.

You're right that's not currently described - I have an issue tracking this here: https://github.com/riscv/riscv-elf-psabi-doc/issues/24. Last time I tried to check gcc behaviour it seemed that such cases would be passed according to the integer calling convention, but I'd be happier if one of the GCC devs would confirm.

Should we have a test which tries to make sure our behavior matches GCC's? ISTM that floats in unions are always in GPRs (at least according to abicop).

How are empty unions/arrays handled? Like a function which takes no args? abicop seemed upset when I asked it:

>>> m.call([Union()])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "~/majnemer/abicop/rvcc.py", line 186, in __init__
    self.alignment = max(m.alignment for m in members)
ValueError: max() arg is an empty sequence

It'd also be good to have a test for bool and tests for bitfields. Am I correct that struct S { int x : 2; int y : 2; }; would get passed in two GPRs which were sign extended from 2 to 32?

Hi Alex, just a reminder, it looks like Eli's and David's comments have not been addressed yet.

asb updated this revision to Diff 129607.Jan 12 2018, 5:53 AM
asb marked 9 inline comments as done.

I've addressed all outstanding comments (a number of response are inline).

Additionally:

Floats in unions are always passed in GPRs (as clarified here and here. However this patch is only concerned with the soft-float ABI, so the clarification doesn't directly affect us.

I've added tests for _Bool and empty structs and unions. GCC ignores empty structs and unions (it previously didn't mention unions but this PR will clarify).

I've pinged the issue on the ABI documentation repo regarding bit fields, hopefully one of the GCC developers can comment to clarify what they're doing. I'd hope we can return to bitfields in a follow-on patch.

asb added inline comments.Jan 12 2018, 5:53 AM
lib/CodeGen/CGCall.cpp
1937 ↗(On Diff #123020)

I could see how that might be cleaner, but that's a larger refactoring that's going to touch a lot more code. Are you happy for this patch to stick with this more incremental change (applying the same sign-extension logic to return values as is used for arguments), and to leave your suggested refactoring for a future patch?

lib/CodeGen/TargetInfo.cpp
8824 ↗(On Diff #123020)

Slightly different contexts. 'RISC-V' is the proper name of the target but 'RISCV' is the spelling of the LLVM target implementation. For this file at least, I think it makes sense.

8858 ↗(On Diff #123020)

XLen is defined throughout the RISC-V ISA and ABI documentation as the width of the integer ('x') registers in bits. I've modified EmitVAArg to make use of CharUnits to avoid a conversion - there don't seem to be any other bit/byte conversions I can see.

8913 ↗(On Diff #123020)

The LLVM backend lowers byval in that way. Given that at this point we have a trivial data type (plain struct or similar) which is copied-by-value by C semantics, the only difference is whether the generated IR indicates implicit copying with 'byval' or relies on the caller explicitly doing the copy. For some reason I thought 'byval' was preferred, but looking again it seems uncommon to do it this way. I've changed it to false - thanks for spotting this.

rjmccall added inline comments.Jan 12 2018, 8:42 AM
lib/CodeGen/CGCall.cpp
1937 ↗(On Diff #123020)

I won't insist that you do it, but I don't think this refactor would be as bad as you think. Doing these refactors incrementally when we realize that the existing infrastructure is failing us in some way is how we make sure they actually happen. Individual contributors rarely have any incentive to ever do that "future patch".

lib/CodeGen/TargetInfo.cpp
8858 ↗(On Diff #123020)

Okay, I can accept that, thanks.

asb added inline comments.
lib/CodeGen/CGCall.cpp
1937 ↗(On Diff #123020)

I've submitted this refactoring in D41999.

rjmccall added inline comments.Jan 12 2018, 10:25 AM
lib/CodeGen/CGCall.cpp
1937 ↗(On Diff #123020)

Much appreciated, thanks!

asb updated this revision to Diff 129683.Jan 12 2018, 12:24 PM
asb marked 8 inline comments as done.

Rebase after ABIArgInfo signext/zeroext refactoring D41999 / rL322396. We no longer need to modify CGCall.cpp for unsigned 32-bit return values to be sign extended as required by the RV64 ABI.

rjmccall accepted this revision.Jan 12 2018, 1:02 PM

Thanks. LGTM, but you should wait for Eli's sign-off, too.

This revision is now accepted and ready to land.Jan 12 2018, 1:02 PM
efriedma accepted this revision.Jan 12 2018, 1:13 PM

LGTM

Not sure if anyone's mentioned it yet, but there's a C ABI testing tool in clang/utils/ABITest/ which you'll probably want to try at some point.

lib/CodeGen/TargetInfo.cpp
8913 ↗(On Diff #123020)

"byval" generally means the value is memcpy'ed into the argument list (so there is no pointer in the argument list). This is useful for handling C calling conventions which allow excessively large structs to be passed in the argument list, so the backend can emit a memcpy rather than expanding the operation into straight-line code. The RISCV backend handling of byval is wrong, in the sense that it isn't consistent with what any other backend does.

This isn't relevant to the RISC-V C calling convention, but you should probably fix the backend at some point to avoid future confusion.

asb added inline comments.Jan 13 2018, 1:54 AM
lib/CodeGen/TargetInfo.cpp
8913 ↗(On Diff #123020)

If I understand your concern correctly, it's that the RISCV backend passes a pointer to the byval copied argument rather than passing it direct on the stack? Isn't that consistent with what the Sparc, Lanai and the PPC backends do? It also seems consistent with my reading of the description of the byval attribute in the langref - a hidden copy is made. Whether that hidden copy is passed direct or a pointer to it is passed (which is consistent with the RISC-V calling convention for large values) seems an orthogonal concern.

This revision was automatically updated to reflect the committed changes.
asb added a comment.Jan 15 2018, 11:21 AM

The riscv32-abi and risc64-abi tests are failing (specifically the vararg checks) on the clang-with-thin-lto and modules-slave-2 buildbots:
http://lab.llvm.org:8011/builders/clang-with-thin-lto-ubuntu/builds/7892
http://lab.llvm.org:8011/builders/clang-x86_64-linux-selfhost-modules-2/builds/15300

Other buildbots appear fine. Before I revert, does anyone have any ideas what the problem might be? Obviously the tests pass on my local machine, plus a range of other builders.

Looks like it fails with assertions disabled; no-asserts builds disable value labels, so the IR output looks a little different.

efriedma added inline comments.Jan 15 2018, 12:33 PM
lib/CodeGen/TargetInfo.cpp
8913 ↗(On Diff #123020)

I can see SPARC does in fact pass byval arguments like you said (and I'll assume Lanai behaves the same way). PowerPC doesn't, though, as far as I can tell (there are a bunch of PowerPC variants, but at least the 64-bit Linux ABI requires byval to pass arguments directly).

Yes, how exactly arguments are passed is not visible to the optimizer, and therefore is formally outside the scope of LangRef, but a byval which doesn't pass arguments directly is essentially useless.

asb added a comment.Jan 15 2018, 1:44 PM

Thanks Eli, that saved me some time - fixed in rL322514 (clang-x86_64-linux-selfhost-modules-2 is now green).

lib/CodeGen/TargetInfo.cpp
8913 ↗(On Diff #123020)

I was looking at PPCTargetLowering::LowerCall_32SVR4, seems it's different for PPC64 as you say.