This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Lower the tail pseudoinstruction
ClosedPublic

Authored by mgrang on Apr 6 2018, 4:07 PM.

Details

Summary

This patch lowers the tail pseudoinstruction. This has been modeled after ARM's tail call opt.

Diff Detail

Event Timeline

mgrang created this revision.Apr 6 2018, 4:07 PM
mgrang added inline comments.Apr 6 2018, 6:20 PM
lib/Target/RISCV/RISCVISelLowering.cpp
966

TODO: I couldn't write a test case to simulate this.

@asb What calling convention could result in a difference between caller and callee saved regs?

mgrang added inline comments.Apr 7 2018, 12:51 PM
lib/Target/RISCV/RISCVRegisterInfo.td
134

Can we also use x28-x31 (Caller temporaries) here?

asb added inline comments.Apr 12 2018, 6:32 AM
lib/Target/RISCV/RISCVISelLowering.cpp
966

Right now, the calling convention is always the same so this isn't testable. It's helpful to be future-proof though.

HsiangKai requested changes to this revision.Apr 16 2018, 12:01 AM
HsiangKai added a subscriber: HsiangKai.

I think it should be a typo.

lib/Target/RISCV/RISCVISelLowering.cpp
1338

Should it be MemOpChains.push_back(DAG.getStore(... ?

This revision now requires changes to proceed.Apr 16 2018, 12:01 AM

Two questions.

lib/Target/RISCV/RISCVISelLowering.cpp
941

Why? In ARM tailcall opt, it could not exceed the number of arguments due to it has no available registers for indirect call under thumb1 instructions. In RISCV, we still have temporary registers to use when the number of arguments exceeds 8.

lib/Target/RISCV/RISCVRegisterInfo.td
134

Could we also use x10-x17 (Argument registers)?

I think we could let RA use any registers except callee-saved registers and x0-x4.

We could not use callee-saved registers, because these registers will be restored before tailcall and the address register will be clobbered.
We could not use x0-x4, because these registers have special meaning in RISCV architecture.

asb added a comment.Apr 24 2018, 8:30 AM

It looks like this must be applied on top of D44885. Could you please rebase on to the latest version of that patch? It would also be useful to use 'Edit Related Revisions...' to indicate the in-flight patch(es) that this depends on.

In D45395#1076818, @asb wrote:

It looks like this must be applied on top of D44885. Could you please rebase on to the latest version of that patch? It would also be useful to use 'Edit Related Revisions...' to indicate the in-flight patch(es) that this depends on.

Rebasing on top of D44885 seems to have broken tail call opt. I am debugging right now. Will push the updated patch once I have the fix.

mgrang updated this revision to Diff 144047.Apr 25 2018, 6:15 PM

Rebased patch on top of D44885.

asb added a comment.Apr 27 2018, 9:16 AM

Could you please separate the MC layer change into a separate patch?

It looks like there are cases where the stack is used for parameter passing and isEligibleForTailCallOptimization returns true, but there is no test coverage for these cases. Do you intend to enable tailcall optimisation for these cases? Might it be worth starting off by just enabling it for calls that make no use of the stack for parameter passing, thus simplifying the patch and the testing story?

lib/Target/RISCV/RISCVISelLowering.cpp
1102–1106

Handling ABI lowering for RV32D is rather fiddly, and the code that handles this breaks the assumption that one entry in Outs == one register (we use an approach similar to Mips, introducing buildpairf64 and splitf64 nodes to help workaround the fact that legalising and splitting and f64 to two i32 at ABI lowering time is _really_ fiddly when i64 isn't a legal type).

I think the safest way of checking eligibility is to pass a CCState to this function. If CCInfo.getNextStackOffset() == 0 then we know the stack isn't used.

Even without the RV32D case, you might have large values (e.g. i128 on RV32) that are passed via the stack. I'm assuming the intention is to not to tailcall if the stack is used at all, in which case getNextStackoffset is the better test.

1111–1112

The "interrupt" attribute isn't currently defined by RISC-V. At a minimum, it would be worth adding a note that this should be expanded as new function attributes are introduced.

I assume it would be too restrictive to have a whitelist of allowable attributes rather than a blacklist of attributes that disable tailcalling?

1116

Very minor, but it seems inconsistent that IsCalleeStructRet is calculate in the LowerCall but other properties such as CallerF.hasStructRetAttr or the presence of byval attributes are calculated here. I'd get rid of the IsCalleeStructRet argument.

1204

Also see my query in isEligibleForTailOptimization. If the intent is that tail calls aren't used when the stack is used, NumBytes should always be zero.

lib/Target/RISCV/RISCVInstrInfo.td
667–676

The correct name for this pseudoinstruction is "tail" (as used by gas and described in the ISA manual).

669

PseudoTAIL would make sense, and matches then naming convention for PseudoCALL/PseudoCALLIndirect.

673–675

This is going to have problems as currently defined. It assumes that you can specify tail $reg and have it work. As we have to always interpret an argument to tail as a symbol, this is going to lead to either incorrectly linked code or an error about an undefined symbol. Defining PseudoTAILIndirect with no AsmString might be a viable option?

mgrang updated this revision to Diff 144422.Apr 27 2018, 5:04 PM

Addressed comments and split the MC part into a separate patch.

mgrang edited the summary of this revision. (Show Details)Apr 27 2018, 5:13 PM
asb added a comment.Apr 30 2018, 5:33 AM

Thanks Mandeep. I'll re-review properly once this is rebased against D46221.

lib/Target/RISCV/RISCVISelLowering.cpp
1333–1338

Isn't this code dead now that tail calls are never used when parameters are passed on the stack?

lib/Target/RISCV/RISCVRegisterInfo.td
134

I agree with @HsiangKai, it seems as though any register that isn't callee-saved (or otherwise has a special meaning) should be safe to use. Or at least, all such registers will avoid the problem described in the comment.

mgrang updated this revision to Diff 144922.May 2 2018, 1:14 PM
mgrang retitled this revision from [RISCV] Implement tail call optimization to [RISCV] Lower the tail pseudo instruction.
mgrang edited the summary of this revision. (Show Details)
mgrang retitled this revision from [RISCV] Lower the tail pseudo instruction to [RISCV] Lower the tail pseudoinstruction.May 2 2018, 1:17 PM
mgrang edited the summary of this revision. (Show Details)
mgrang updated this revision to Diff 145064.May 3 2018, 12:02 PM

Thanks Alex for the explanation on indirect parameter passing. I have updated the condition in my patch and added a unit test.

test/CodeGen/RISCV/tail-calls.ll
2

Note: I have limited this test only to riscv32. With riscv64 I get the following error for fp128 parameter:
LLVM ERROR: Cannot select: t17: i64 = Constant<4611404543450677248>

I guess I can restrict only the failing test to riscv32 but I felt it was cleaner this way. We could enable this for riscv64 once the support for fp128 in riscv64 is more mature.

@asb What do you suggest?

asb requested changes to this revision.May 9 2018, 9:02 PM

Hi Mandeep, I've added a number of inline comments and queries. I think we're getting pretty close now.

There's no test coverage for the patterns:

def : Pat<(Tail (iPTR texternalsym:$dst)),
          (PseudoTAIL texternalsym:$dst)>;
def : Pat<(Tail GPRTC:$dst),               
          (PseudoTAILIndirect GPRTC:$dst)>;
lib/Target/RISCV/RISCVISelLowering.cpp
1196–1197

Could you please add a test case for this, demonstrating that we get an error rather than silently miscompiling or ignoring musttail. A musttail function with structret or similar should do the trick.

1325

We know that IsTailCall should never be true here. I'd get rid of this conditional and add an assert at line 1308. It seems better than introducing a control flow path we can't test and we know should never execute.

1335–1343

Is this comment and logic actually relevant for the RISC-V implementation? In this patch, tail call optimisation is disabled for byval parameters and any case where the stack is used for parameter passing.

lib/Target/RISCV/RISCVInstrInfo.td
676

Is Uses = [X2] required here?

test/CodeGen/RISCV/disable-tail-calls.ll
2–3

The test is currently just testing whether -disable-tail-calls=false overrides a disable-tail-calls=true attribute. It should also test whether -disable-tail-calls=true overrides an explicit disable-tail-calls=false attribute.

The overriding behaviour seems a little surprising to me, but I see ARM is testing the same thing.

test/CodeGen/RISCV/tail-calls.ll
2

MC layer tests should check rv64 whenever relevant. Proper RV64 codegen isn't yet upstreamed (primarily as I need to sit down and flesh out the tests). Just testing RV32 for now is the right thing to do

This revision now requires changes to proceed.May 9 2018, 9:02 PM

Thanks for the comments Alex. I had a couple of observations:

  1. For PseudoTAILIndirect if I do not specify an AsmString then I don't see any instruction in the generated assembly. Is this because PseudoTAILIndirect is lowered in the RISCVMCCodeEmitter.cpp and may be too late for the assembly generation? What do you suggest to counter this?
  1. I am not sure if we need the pattern for texternalsym:
def : Pat<(Tail (iPTR texternalsym:$dst)),
          (PseudoTAIL texternalsym:$dst)>;

An extern function, for example, always matches the tglobaladdr pattern:

def : Pat<(Tail (iPTR tglobaladdr:$dst)),
          (PseudoTAIL texternalsym:$dst)>;

And nothing ever seems to match the texternalsym pattern. I expected the following to match texternalsym since callee is an extern, but it matches tglobaladdr:

declare void @callee()
define void @CALLER2() {
  tail call void @callee()
  ret void
}

I have addressed the rest of your comments and will push an updated version soon. Thanks.

mgrang updated this revision to Diff 146979.May 15 2018, 6:23 PM

Addressed comments.

I think those tail instructions that do not have a pseudo instruction in assembly code to represent them should not be lowered so late at MC layer as we are doing with the other call instructions.

asb added a comment.May 17 2018, 6:20 AM

I'm definitely seeing failures with this due to PseudoTAILIndirect.

Sorry for being cryptic regarding handling of PseudoTAILIndirect. I meant it should have no asmstring because it should never reach the asm printer. It perhaps would be nice if there was an assert if an isCodeGenOnly instruction makes it through to the asm printer.

PseudoTAILIndirect is purely a codegen construct, and won't produce the tail pseudoinstruction. My first instinct would be to handle it very similarly to PseudoCALLIndirect and use PseudoInstExpansion:

def PseudoTAILIndirect : Pseudo<(outs), (ins GPRTC:$rs1), [(Tail GPRTC:$rs1)]>,
                         PseudoInstExpansion<(JALR X0, GPR:$rs1, 0)>;

But I see there's some sort of issue there and haven't had a chance yet to dig deep into what's going on. Setting let usesCustomInserter and handling in EmitInstrWithCustomerInserter is an alternative approach that comes to mind, though it would be good to better understand the PseudoInstExpansion issue first.

Test-wise, we definitely need coverage of PseudoInstExpansion. Beyond that, this seems like a nice optimisation to have. You mentioned it made not much difference in the workloads you threw at it, but it seems to trigger quite frequently in the sort of small test programs you find in a compiler test suite. If nothing else, it helps reduce the amount of code to look through when debugging those sort of inputs :)

lib/Target/RISCV/MCTargetDesc/RISCVMCCodeEmitter.cpp
87–88

Should expand doc comment to explain it also expands PseudoTAIL and that it returns the number of instructions that are produced.

That said, I expect we're going to drop the PseudoTAILIndirect expansion from here and so there's less need to return the number of instructions inserted.

In D45395#1102983, @asb wrote:

I'm definitely seeing failures with this due to PseudoTAILIndirect.

Sorry for being cryptic regarding handling of PseudoTAILIndirect. I meant it should have no asmstring because it should never reach the asm printer. It perhaps would be nice if there was an assert if an isCodeGenOnly instruction makes it through to the asm printer.

PseudoTAILIndirect is purely a codegen construct, and won't produce the tail pseudoinstruction. My first instinct would be to handle it very similarly to PseudoCALLIndirect and use PseudoInstExpansion:

def PseudoTAILIndirect : Pseudo<(outs), (ins GPRTC:$rs1), [(Tail GPRTC:$rs1)]>,
                         PseudoInstExpansion<(JALR X0, GPR:$rs1, 0)>;

But I see there's some sort of issue there and haven't had a chance yet to dig deep into what's going on. Setting let usesCustomInserter and handling in EmitInstrWithCustomerInserter is an alternative approach that comes to mind, though it would be good to better understand the PseudoInstExpansion issue first.

Test-wise, we definitely need coverage of PseudoInstExpansion. Beyond that, this seems like a nice optimisation to have. You mentioned it made not much difference in the workloads you threw at it, but it seems to trigger quite frequently in the sort of small test programs you find in a compiler test suite. If nothing else, it helps reduce the amount of code to look through when debugging those sort of inputs :)

Thanks @asb for the suggestions.
I marked PseudoTAILIndirect as usesCustomInserter and custom lowered it in EmitInstrWithCustomInserter: https://reviews.llvm.org/differential/diff/147624

However, I see that the stack is not restored before the tail call. The call parameters are also not setup correctly without writing code to carry over the implicit regs info from the Pseudo to the JALR. As a result, I see failures on some of the spec benchmarks (gap, gcc, hmmer, mesa). I tried to play around to see how we can setup the call but could not find a clean way.

On the other hand, I looked at how AArch64 lowers the TCRETURNri. They custom lower it in the AArch64AsmPrinter after the auto-generated pseudos are lowered.
I tried the same for RISCV and it seemed to work very well: https://reviews.llvm.org/differential/diff/147623

I see the tail call being setup correctly and the spec benchmarks are all passing. This way we also don't need any explicit asmstring in the td for the Pseudo. I think we should follow this route for lowering the PseudoTAILIndirect.
What do you think Alex ?

mgrang updated this revision to Diff 148092.May 22 2018, 1:38 PM
  1. Lowered PseudoTAILIndirect in AsmPrinter (similar to AArch64 TCRETURNri pseudo).
  2. Added test cases for tail call on external symbols and indirect tail call.
  3. Addressed other comments.
asb added a comment.May 23 2018, 1:08 PM

I'm not sure what I did wrong when first trying PseudoInstExpansion with PseudoTailIndirect, but I'm not seeing any problems now. Replace the current PseudoTAILIndirect definition with the following and drop the RISCVAsmPrinter changes:

def PseudoTAILIndirect : Pseudo<(outs), (ins GPRTC:$rs1), [(Tail GPRTC:$rs1)]>,
                         PseudoInstExpansion<(JALR X0, GPR:$rs1, 0)>;

This generates the following in RISCVGenMCPseudoLowering.inc:

case RISCV::PseudoTAILIndirect: {
  MCInst TmpInst;
  MCOperand MCOp;
  TmpInst.setOpcode(RISCV::JALR);
  // Operand: rd
  TmpInst.addOperand(MCOperand::createReg(RISCV::X0));
  // Operand: rs1
  lowerOperand(MI->getOperand(0), MCOp);
  TmpInst.addOperand(MCOp);
  // Operand: imm12
  TmpInst.addOperand(MCOperand::createImm(0));
  EmitToStreamer(OutStreamer, TmpInst);
  break;
}

With that change, this looks good to me.

mgrang updated this revision to Diff 148275.May 23 2018, 1:22 PM

Added a PseudoInstExpansion rule for PseudoTAILIndirect as per Alex's suggestion.

mgrang added a comment.EditedMay 23 2018, 1:24 PM

Thanks for the pointer Alex. I guess what I was doing incorrect was specifying GPRTC as the reg class in the PseudoInstExpansion which made the tablegen trip.

def PseudoTAILIndirect : Pseudo<(outs), (ins GPRTC:$rs1), [(Tail GPRTC:$rs1)]>,
                         PseudoInstExpansion<(JALR X0, GPRTC:$rs1, 0)>;

With your suggestion, PseudoTAILIndirect is now an almost exact replica of PseudoCALLIndirect. Thanks again!

asb accepted this revision.May 23 2018, 1:56 PM

Fantastic, looks good to me (though see my comment for a tiny RISCVInstrInfo.td change to make prior to commit). Thanks!

lib/Target/RISCV/RISCVInstrInfo.td
684–685

This pattern isn't needed as it's already present in the PseudoTAILIndirect definition.

This is merged in rL333137.

HsiangKai resigned from this revision.Jul 26 2018, 6:59 PM
This revision is now accepted and ready to land.Jul 26 2018, 6:59 PM
mgrang closed this revision.Jul 31 2018, 11:19 AM