Page MenuHomePhabricator

[RISCV] Add support for save/restore of callee-saved registers via libcalls
ClosedPublic

Authored by lewis-revill on May 30 2019, 9:04 AM.

Details

Summary

This patch adds the support required for using the __riscv_save and __riscv_restore libcalls to implement a size-optimization for prologue and epilogue code, whereby the spill and restore code of callee-saved registers is implemented by common functions to reduce code duplication.

The initial prototype for this patch was implemented by @simoncook

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

We discussed this in the RISC-V meeting on 19 Sept 2019. @apazos says there are some SPEC failures in both 2006 and 2017, which would be good to triage.

Two nits, that I wanted to submit before the meeting, but didn't get around to.

llvm/lib/Target/RISCV/RISCV.td
80

Given the clang option defaults to false, I think it should here too, to avoid confusion in other frontends.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
451–452

Nit: You shouldn't need this if statement - the for loop just won't execute if CSI is empty, surely.

lewis-revill marked 2 inline comments as done.Sep 19 2019, 9:03 AM
lewis-revill added inline comments.
llvm/lib/Target/RISCV/RISCV.td
80

This is simply a case of a confusing parameter of the SubtargetFeature class.

SubtargetFeature interprets this "true" string as 'Value the attribute to be set to by feature'. IE: when the feature is enabled, what value should the corresponding RISCVSubtarget variable be set to? Rather than a default value for that variable.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
451–452

Good catch, thanks!

lenary added inline comments.Sep 19 2019, 9:06 AM
llvm/lib/Target/RISCV/RISCV.td
80

Ah, apologies for the confusion.

lewis-revill marked an inline comment as done.Sep 19 2019, 9:14 AM

We discussed this in the RISC-V meeting on 19 Sept 2019. @apazos says there are some SPEC failures in both 2006 and 2017, which would be good to triage.

I've uncovered some execution failures in our annotated GCC testsuite as well, so I will look into those as a starting point.

lewis-revill planned changes to this revision.Sep 23 2019, 3:12 AM

It seems like the regressions I'm seeing are due to the fact that calculating offsets for fixed objects at the top of the frame didn't account for extra stack size adjustment from the libcalls. I'm trying to find a neat way around this without breaking other calculations.

Rewrote logic to calculate stack sizes, frame indexes and frame pointer offsets. This was necessary to take into account the fact that the save/restore lib calls are essentially an opaque section of the stack that is inserted at the beginning of the frame, after positive offset objects such as fixed stack arguments, but before additional callee saved registers (IE FP registers) and negative offset (dynamic) objects. So calculations of the actual offsets of these objects must be adjusted accordingly for the stack to function correctly.

Thanks for the patch update. I will launch some new correctness runs.

apazos added a comment.Oct 2 2019, 2:54 PM

Lewis, with this patch we see less failures. But still some tests in SPEC and perennial test suites are failing.

Pengxuan and I are trying to triage the failures.

Here is what we see in one of the failed tests:

Code right before rologue/Epilogue Insertion & Frame Finalization:

%call39 = tail call i32 @somefunction()
ret i32 %call39

Code after Prologue/Epilogue Insertion & Frame Finalization:

PseudoCALL target-flags(<unknown>) @somefunction, implicit-def $x1, implicit $x2
$x10 = frame-destroy LUI 1
$x10 = frame-destroy ADDI killed $x10, -1712
$x2 = frame-destroy ADD $x2, killed $x10
CFI_INSTRUCTION def_cfa_offset 0
frame-destroy PseudoTAIL target-flags(<unknown>) &__riscv_restore_12, implicit $x2

x10 is the return value and it is being overwritten.

Is this patch self contained? Or does it depend on another patch under review?

For the tests that are passing, I see better code size reduction with C extension enabled (testing RV32IMAC) than with the machine outliner.

apazos added inline comments.Oct 2 2019, 8:07 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
684

Where are we making sure the PseudoCALL result in a0 is alive through the riscv_restore call?

apazos added inline comments.Oct 3 2019, 9:01 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
684

I did not find any other target that transforms a tail call back into a regular call.

The issue with doing this is that we don't have info about the return value of the original call.

If anyone knows how to do it, please give me some pointers.

I tried to fix this problem by adding implicit operands to the riscv_restore tail call that are all the possible return value registers:
BuildMI(MBB, MI, DL, TII.get(RISCV::PseudoTAIL))
.addExternalSymbol(RestoreLibCall, RISCVII::MO_CALL)
-.setMIFlag(MachineInstr::FrameDestroy);
+ .setMIFlag(MachineInstr::FrameDestroy)
+ .addReg(RISCV::X10, RegState::Implicit)
+ .addReg(RISCV::X11, RegState::Implicit)
+ .addReg(RISCV::F10_F, RegState::Implicit)
+ .addReg(RISCV::F11_F, RegState::Implicit)

With this change, some perennial tests now pass.

But there are still failures to be analyzed in SPEC.

Rebased to fix conflicts with recent split SP adjustments

The priority for this patch is to address the issues reported by @apazos but after that please check the clang-format output. There are some cases in this patch where it might make sense to use a different formatting than clang-format indicates, but the remaining should be addressed.

@apazos Have you considered tweaking the patch code to not do a tail call, just to check if that's what's causing the remaining failures? I'm not sure if that's too hard, but it could eventually be easier than drilling into the failing cases.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
28

The return value isn't used as just an opaque index, it also reflects the frame size and is used for that purpose. The function comment should probably reflect that.

35

Use Register and RISCV::NoRegister. (You'll have to use MaxReg.id() instead in the call to max).

37

Might be worth adding a small comment explaining how this serves as a filters for the registers we are interested in. Or point to a later relevant comment?

40

Ditto NoRegister.

67

Check LLVM naming convention capitalization. Ditto other vars here.

94

Check LLVM naming convention capitalization. Ditto other vars here.

202

This could probably use SmallVector.

709

Ditto Register.

llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
109

Use IndexedMap instead?

llvm/test/CodeGen/RISCV/saverestore.ll
349

Maybe for these tests just put a -NOT check that __riscv_save_ isn't called?

Here is the bugpoint-reduced test case for the SPEC failure when enabling -msave-restore and allowing tail calls:

Run the command llc test.ll -mattr=+save-restore -o out.s

You will see the code generated is wrong:

tail    __riscv_restore_2
 jr      a5

target datalayout = "e-m:e-p:32:32-i64:64-n32-S128"
target triple = "riscv32-unknown-linux-gnu"

%struct.1_s = type { i8*, i8*, i8*}
%struct.2_s = type { i8*, i8*, i8*}

declare dso_local void @test2() local_unnamed_addr
declare dso_local i32 @test3(%struct.2_s* nonnull %a, %struct.1_s** %b) local_unnamed_addr

define dso_local void @test1(%struct.2_s* %a, %struct.1_s** %b) local_unnamed_addr {
entry:

br i1 undef, label %if.end2, label %if.then

if.then: ; preds = %entry

tail call void @test2()
br label %if.end2

if.end2: ; preds = %if.then, %entry

%call3 = tail call i32 null(%struct.2_s* nonnull %a, %struct.1_s** %b)
ret void

}

To summarize the issues we have detected so far:

  1. Bug1: When transforming a tail call back into a regular call, the transformation is not correctly preserving the original call's return values.

I fixed this by conservatively adding RegState::Implicit definitions for each possible integer and FP return registers, when defining the riscv_restore tail call.

  1. Bug2: test case provided above. Please Lewis take a look at how this case can be fixed.
  1. I have also run the tests with "-fno-optimize-sibling-calls -msave-restore", i.e., disabling tail calls when m-save-restore is enabled.

With this config, issues (1) and (2) do not happen.
There is some loss in code size savings when using fno-optimize-sibling-calls.
But still we see code size savings benefit from -msave-restore.

Bug2: test case provided above. Please Lewis take a look at how this case can be fixed.

Isn't the issue just that the code is checking for PseudoTAIL, and not PseudoTAILIndirect?


What happens if a tail call has more than 8 arguments?

What happens for a musttail call? A tailcc call? (Not sure if either of those are actually supported on RISCV at the moment?)

Yes Eli thanks for pointing out there are more scenarios that can fail.
It looks like the best solution is to permit both flags on, but then bail out from doing this transformation if there is any type of tail call already in the function.
This way we avoid messing with reverting tail calls back to regular calls.

Yes Eli thanks for pointing out there are more scenarios that can fail.
It looks like the best solution is to permit both flags on, but then bail out from doing this transformation if there is any type of tail call already in the function.
This way we avoid messing with reverting tail calls back to regular calls.

Is it worth trying to disallow tail call optimization completely if this flag is enabled? I'm not sure what GCC does exactly. but this seems to be the behaviour.

lewis-revill added inline comments.Oct 15 2019, 4:24 AM
llvm/test/CodeGen/RISCV/saverestore.ll
349

Thanks, I realise it makes much more sense to do this for all the other 'non-save-restore' checks too, since they don't actually check any save/restore behaviour.

Is it worth trying to disallow tail call optimization completely if this flag is enabled? I'm not sure what GCC does exactly. but this seems to be the behaviour.

I had reported above that I have already run that test: with "-fno-optimize-sibling-calls -msave-restore", i.e., disabling tail calls when m-save-restore is enabled.

But it seems a better solution is to optimistically apply -msave-restore when there are no tail calls of any type in a function, instead of disabling tail calls completed. Let tail call optimization prevail over msave-restore. No LLVM target is disabling tail calls.

So you can update the patch according to this solution plan.

Disable the save/restore optimization when a function contains tail calls. Address various miscellaneous concerns with the patch. Update tests to include less redundant code when checking cases where save/restore libcalls are not used.

Thanks Lewis, the runs are looking good, no failures, and good code size savings (average 3%)

Rebase on top of shrink wrapping patch.

Rebased and merged D68644 into this patch - this patch already assumes shrink wrapping support anyway.

pzheng added inline comments.Nov 14 2019, 1:00 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
451–452

Any chance this comment from @lenary is missed?

lewis-revill added inline comments.Nov 15 2019, 2:31 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
451–452

It appears this CFI code was removed since I last rebased.
But yes, just for posterity I did miss this comment!

lkail added a subscriber: lkail.Nov 15 2019, 4:22 AM
apazos added a comment.Dec 3 2019, 4:45 PM

Lewis, try rebasing it, not applying cleanly nor https://reviews.llvm.org/D62190

shiva0217 added inline comments.Dec 9 2019, 8:50 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
289

Should the -StackSize be -RealStackSize?

674

There is a case may trigger an assertion when compile with -O3 -g -msave-restore if the libcall has FrameSetup flag.

int main(int a, char* argv[]) {
  exit(0);
  return 0;
}
675

GCC will generate stack adjustment and GPR callee saved CFIs for the save libcalls. Should we do the same?

llvm/lib/Target/RISCV/RISCVRegisterInfo.h
40

An alternative of defining hasReservedSpillSlot could be set SaveRegs for the registers will be pushed by the libcalls in determineCalleeSaves. So the StackSize calculation will take the callee saved registers will be pushed by the libcalls into account.

lewis-revill marked an inline comment as done.Dec 16 2019, 4:05 AM
lewis-revill added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
674

Think I've found the cause of this. When 'DIFlagAllCallsDescribed' is set for the module LLVM attempts to provide call entry info DIEs for all calls, including this one that has been added by us. One detail of this is attempting to calculate the return PC value or offset for the call using a label that was assumed to be inserted after the call instruction.

However, while the attempt to add the call entry info doesn't check for the FrameSetup flag, the label was never inserted, since that code does check for the flag. Adding the missing check to avoid adding call entry info for calls marked as FrameSetup fixed this issue.

lewis-revill marked an inline comment as done.Dec 16 2019, 8:57 AM
lewis-revill added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
289

Looks like it, good catch.

675

I'm not familiar with the use of the CFI offset stuff, though just to be sure, you're saying that in addition to the manually-added .cfi_offsets in the libcalls themselves we would want to add them in our frame too?

jrtc27 added inline comments.Dec 16 2019, 9:15 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
675

Yes. The CFI use is completely ignorant of any control flow. It goes to the start of the function, and walks forward, executing every .cfi_* directive until reaching the current point in the code. Since the libcalls (and thus their CFI directives) are not inline in the function, they are completely invisible to the CFI machinery. Thus, their side-effects must be duplicated straight after the calls to them.

Rebased and addressed StackSize vs RealStackSize error.

Fixed existing .cfi_offset offsets. Since these are frame-pointer based they also need to account for the libcall stack adjustment.

Currently working on adding .cfi_offset instructions for the registers saved by the libcall.

Added .cfi_offset directives for registers saved by libcalls.

apazos added a comment.Jan 7 2020, 8:33 AM

Lewis, is the patch final? It would be good to merge it before the 10.0 release branch creation on Jan 15th

I see the following .cfi_offset directives generated using @shiva0217's test case. Any idea why the offset for ra is 536870908?

call    t0, __riscv_save_0
.cfi_def_cfa_offset 16
.cfi_offset ra, 536870908

I see the following .cfi_offset directives generated using @shiva0217's test case. Any idea why the offset for ra is 536870908?

call    t0, __riscv_save_0
.cfi_def_cfa_offset 16
.cfi_offset ra, 536870908

Hmm I see this too I'll look into it.

Fix .cfi_offset signedness error.

Lewis, is the patch final? It would be good to merge it before the 10.0 release branch creation on Jan 15th

I would say so now.

Fix .cfi_offset signedness error.

Thanks, @lewis-revill. It looks correct now.

Should I wait for the comments to be resolved on D71593 before I commit this patch? Ideally if this patch makes it into a release then that bug fix should be there too.

Lewis, your latest patch looks good, we just had another run with no new failures. But we know it will have issues with -g. So I think we should not merge it yet. Do you have a version of the patch that creates the labels for the compiler-generated save/restore lib calls, so that this optimization does not depend on D71593? We could merge that version then, and when D71593 is accepted, you just have to rework/remove the label generation part of the patch.

Lewis, your latest patch looks good, we just had another run with no new failures. But we know it will have issues with -g. So I think we should not merge it yet. Do you have a version of the patch that creates the labels for the compiler-generated save/restore lib calls, so that this optimization does not depend on D71593? We could merge that version then, and when D71593 is accepted, you just have to rework/remove the label generation part of the patch.

I don't expect that would be possible without making changes to generic code anyway. Removing the framesetup flag from the libcalls when generating them would allow labels to be produced, but that would require making modifications to the logic of this patch which relies on the libcalls being annotated as such.

evandro added a subscriber: evandro.Feb 5 2020, 1:52 PM

Since the DebugInfo fix has been accepted, I'm looking to get this patch and that fix committed shortly if there are no problems caused by rebasing.

This revision was not accepted when it landed; it landed in state Needs Review.Feb 11 2020, 1:27 PM
This revision was automatically updated to reflect the committed changes.