This patch implements MC support for tail psuedo instruction.
A follow-up patch implements the codegen support as well as handling of the indirect tail pseudo instruction.
Details
Diff Detail
Event Timeline
Here's the patch to generate tail pseudo instruction: https://reviews.llvm.org/D45395
This patch would be better named 'Implement MC layer support for the tail pseudoinstruction' or similar. It should apply to the current head ,with D45395 rebased to apply on top of it. This is because MC layer support is a useful incremental change, and can be cleanly separated from the codegen changes (meaning we can revert one without affecting the other).
It looks like you'd just need to add the PseudoTAIL definition to RISCVInstrInfo.td and remove the PseudoTAILIndirect reference from this patch (it can be introduced in D45935).
lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp | ||
---|---|---|
153–155 | This line is probably going to be deleted in this patch anyway, but at least for the follow-up patch that includes PseudoTAILIndirect it might be better to have expandFunctionCall return the number of MIs inserted. |
Thanks for your comments @asb. There seems to be some problem with long double function arguments which I wanted to clarify.
The RISCV Calling Convention (https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf) says that for soft float abi, long doubles are passed in registers via reference:
In RV32, for example, the function double foo(int, double, long double) is passed its first argument in a0, its second argument in a2 and a3, and its third argument by reference via a4
However, consider this code:
__attribute__((noinline)) int bar(long double a) { printf("### Val L: %LF\n", a); return a; } __attribute__((noinline)) int foo() { long double a = 1; return bar(1); }
Here's the assembly for function foo:
Without tail call: _Z3foov: # @_Z3foov # %bb.0: # %entry addi sp, sp, -32 sw ra, 28(sp) lui a0, 262128 sw a0, 12(sp) sw zero, 8(sp) sw zero, 4(sp) sw zero, 0(sp) mv a0, sp call _Z3bare lw ra, 28(sp) addi sp, sp, 32 ret With tail call: _Z3foov: # @_Z3foov # %bb.0: # %entry addi sp, sp, -16 lui a0, 262128 sw a0, 12(sp) sw zero, 8(sp) sw zero, 4(sp) sw zero, 0(sp) mv a0, sp addi sp, sp, 16 tail _Z3bare
I do not understand why the long double argument is passed via stack. For tail call, this results in an incorrect value being printed inside function bar:
### Val: 0.000000
For double arguments, the assembly looks fine:
_Z3foov: # @_Z3foov # %bb.0: # %entry lui a1, 261888 mv a0, zero tail _Z3bard
These are the possibilities for parameter passing on RV32I/RV64I:
- Directly in a single register
- Directly in a pair of registers
- Directly in a register stack location
- Indirectly in a register
- Indirectly in a stack location
Because long double (fp128) and i128 are larger than 2*XLEN, they are passed indirectly and options 1-3 don't apply. Therefore the address of the value will be passed in a register, or if not available then the address is put on the stack. In order to pass indirectly, space on the stack often needs to be allocated in order to store the value. You're seeing this in the example code you pasted. You can see the logic for handling this in RISCVISelLowering.cpp if you grep for CCValAssign::Indirect.
Thanks for catching this issue. Rejecting tailcall optimisation when CCInfo.getNextStackOffset() != 0 isn't quit enough, we need to also reject it if any of the CCValAssign ArgsLocs are passed CCValAssign::Indirect (I can see the ARM's isEligibleForTailCall does that check too).
Thanks Alex for the explanation on indirect parameter passing. I have updated the condition in my patch and added a unit test.
test/CodeGen/RISCV/tail-calls.ll | ||
---|---|---|
1 | Note: I have limited this test only to riscv32. With riscv64 I get the following error for fp128 parameter: I guess I can restrict only the failing test to riscv32 but I felt it was cleaner this way. We could enable this for riscv64 once the support for fp128 in riscv64 is more mature. @asb What do you suggest? |
Thanks Mandeep, just a few minor cleanups
lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
1546–1547 | This change is needed for codegen rather than MC and should go in D45395 | |
lib/Target/RISCV/RISCVISelLowering.h | ||
32 | This change is needed for codegen rather than MC and should go in D45395 | |
lib/Target/RISCV/RISCVInstrInfo.td | ||
41–43 | This change is needed for codegen rather than MC and should go in D45395 | |
666 | Do we actually need Uses = [X2]? | |
672–675 | This change is needed for codegen rather than MC and should go in D45395 | |
test/MC/RISCV/tail-call-invalid.s | ||
1 ↗ | (On Diff #145065) | Please add a riscv64 RUN line |
test/MC/RISCV/tail-call.s | ||
7 ↗ | (On Diff #145065) | Please add riscv64 RUN lines |
lib/Target/RISCV/RISCVInstrInfo.td | ||
---|---|---|
666 | I decided to align this with ARM which has Uses = [SP] in its definition of Tail. |
lib/Target/RISCV/RISCVInstrInfo.td | ||
---|---|---|
667 | If I remove the definition for the Tail node then it cannot determine hasSideEfects, mayStore and mayLoad properties: def Tail : SDNode<"RISCVISD::TAIL", SDT_RISCVCall, [SDNPHasChain, SDNPOptInGlue, SDNPOutGlue, SDNPVariadic]>; So I had to explicitly specify them here. We can remove these in the next patch which includes the definition for Tail. |
This line is probably going to be deleted in this patch anyway, but at least for the follow-up patch that includes PseudoTAILIndirect it might be better to have expandFunctionCall return the number of MIs inserted.