Page MenuHomePhabricator

[RISCV] Add support for save/restore of callee-saved registers via libcalls
Needs ReviewPublic

Authored by lewis-revill on May 30 2019, 9:04 AM.

Details

Reviewers
asb
edward-jones
Summary

This patch adds the support required for using the __riscv_save and __riscv_restore libcalls to implement a size-optimization for prologue and epilogue code, whereby the spill and restore code of callee-saved registers is implemented by common functions to reduce code duplication.

The initial prototype for this patch was implemented by @simoncook

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
lenary added inline comments.Jul 4 2019, 1:12 PM
lib/Target/RISCV/RISCVFrameLowering.cpp
401 ↗(On Diff #208046)

Won't we hit this case if (after the hard float ABI lands) someone uses any of fs0-fs11?

Would it not be better to return nullptr, so the code can bail out, rather than asserting?

The same applies in getRestoreLibCallName

MaskRay added inline comments.Jul 5 2019, 12:32 AM
lib/Target/RISCV/RISCVMachineFunctionInfo.cpp
27 ↗(On Diff #208046)

return EnableSaveRestore && VarArgsSaveSize == 0; ?

MaskRay added inline comments.Jul 5 2019, 12:37 AM
lib/Target/RISCV/RISCVMachineFunctionInfo.cpp
27 ↗(On Diff #208046)

Oh, why can't this stub just be inlined into getUseSaveRestoreLibCalls? It can be trivially computed every time its value is used. You can expand it later if this function ever becomes more complex.

MaskRay added inline comments.Jul 5 2019, 1:04 AM
lib/Target/RISCV/RISCVFrameLowering.cpp
394 ↗(On Diff #208046)

Delete using std::max;

399 ↗(On Diff #208046)

switch (*std::max_element(CSI.begin(), CSI.end())) {

428 ↗(On Diff #208046)

Delete this

433 ↗(On Diff #208046)

switch (*std::max_element(CSI.begin(), CSI.end())) {

MaskRay added inline comments.Jul 5 2019, 1:07 AM
lib/Target/RISCV/RISCVFrameLowering.cpp
399 ↗(On Diff #208046)

Sorry, please ignore the max_element comment.

lewis-revill added inline comments.Jul 5 2019, 2:53 AM
lib/Target/RISCV/RISCVFrameLowering.cpp
401 ↗(On Diff #208046)

I think the assertion is valid behaviour, since something has definitely gone wrong if any register not present in the list of callee-saved registers below is used as a callee-saved register. We would currently hit this if float ABI callee-saved registers are used, which would indicate that more logic needs to be added to explicitly handle that case.

lib/Target/RISCV/RISCVMachineFunctionInfo.cpp
27 ↗(On Diff #208046)

So this getUseSaveRestoreLibCalls call gets called at multiple points in the pipeline. When I first implemented this function I wanted to be absolutely sure that at any stage in the pipeline the function returns the same result. Now that I think about it that would indicate incorrect behaviour anyway, so I'll make this change and see if I can find anything that breaks.

asb requested changes to this revision.Jul 5 2019, 3:37 AM

Hi Lewis, did you evaluate implementing the machine outliner as a more general approach for getting code size savings in this sort of way?

You should have tests for at least a subset of target-abi {ilp32,lp64}{f,d} and the compiler shouldn't assert in those cases.

This revision now requires changes to proceed.Jul 5 2019, 3:37 AM
In D62686#1571192, @asb wrote:

Hi Lewis, did you evaluate implementing the machine outliner as a more general approach for getting code size savings in this sort of way?

I haven't, is there an open patch for this?

You should have tests for at least a subset of target-abi {ilp32,lp64}{f,d} and the compiler shouldn't assert in those cases.

Will do, I can rebase this patch on top of your hard float patch.

Will do, I can rebase this patch on top of your hard float patch.

Ah, I wasn't aware it had landed!

lewis-revill added inline comments.Jul 12 2019, 4:15 AM
lib/Target/RISCV/RISCVFrameLowering.cpp
401 ↗(On Diff #208046)

GCC's behaviour is to add manual floating point save/restore code in addition to any libcall that has been inserted for non-fp registers. I realize that this would break a lot of the logic throughout this patch where I assume that either all callee-save registers would be save/restored by libcall or none of them would, so perhaps the approach for now should be to bail out of using libcalls if any of the fp registers require save/restoring.

Add support for a split save/restore approach when callee saved registers from the floating point ABIs are used.

In D62686#1571192, @asb wrote:

Hi Lewis, did you evaluate implementing the machine outliner as a more general approach for getting code size savings in this sort of way?

You should have tests for at least a subset of target-abi {ilp32,lp64}{f,d} and the compiler shouldn't assert in those cases.

Would the existence of save/restore in the codebase conflict with the machine outliner approach? Otherwise it should be a case of simply disallowing enabling both flags at once if this leads to invalid behaviour.

In D62686#1571192, @asb wrote:

Hi Lewis, did you evaluate implementing the machine outliner as a more general approach for getting code size savings in this sort of way?

I've added D66210 as a very basic implementation of the machine outliner, the results are currently slightly better than this save/restore approach, but I'm not certain the implementation isn't too relaxed about what can be outlined. It's worth noting that enabling both the optimizations at once doesn't diminish the gains you get from the machine outliner.

Fix git error whereby logic from this patch was mistakenly included in D62190

lewis-revill planned changes to this revision.Sep 16 2019, 4:43 AM

There is an option to Clang for '-msave-restore' which should be utilized to enable this.

There is an option to Clang for '-msave-restore' which should be utilized to enable this.

I don't think we'd have a problem if clang support for this came in a separate commit.

There is an option to Clang for '-msave-restore' which should be utilized to enable this.

I don't think we'd have a problem if clang support for this came in a separate commit.

There needs to be an LLVM component too, to add a target feature such that we can do Features.push_back("+save-restore") from Clang.

Replace internal -mllvm option with target feature enabled through the clang frontend using -msave-restore.

Herald added a project: Restricted Project. · View Herald TranscriptSep 18 2019, 8:17 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

We discussed this in the RISC-V meeting on 19 Sept 2019. @apazos says there are some SPEC failures in both 2006 and 2017, which would be good to triage.

Two nits, that I wanted to submit before the meeting, but didn't get around to.

llvm/lib/Target/RISCV/RISCV.td
72

Given the clang option defaults to false, I think it should here too, to avoid confusion in other frontends.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
444–445

Nit: You shouldn't need this if statement - the for loop just won't execute if CSI is empty, surely.

lewis-revill marked 2 inline comments as done.Sep 19 2019, 9:03 AM
lewis-revill added inline comments.
llvm/lib/Target/RISCV/RISCV.td
72

This is simply a case of a confusing parameter of the SubtargetFeature class.

SubtargetFeature interprets this "true" string as 'Value the attribute to be set to by feature'. IE: when the feature is enabled, what value should the corresponding RISCVSubtarget variable be set to? Rather than a default value for that variable.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
444–445

Good catch, thanks!

lenary added inline comments.Sep 19 2019, 9:06 AM
llvm/lib/Target/RISCV/RISCV.td
72

Ah, apologies for the confusion.

lewis-revill marked an inline comment as done.Sep 19 2019, 9:14 AM

We discussed this in the RISC-V meeting on 19 Sept 2019. @apazos says there are some SPEC failures in both 2006 and 2017, which would be good to triage.

I've uncovered some execution failures in our annotated GCC testsuite as well, so I will look into those as a starting point.

lewis-revill planned changes to this revision.Sep 23 2019, 3:12 AM

It seems like the regressions I'm seeing are due to the fact that calculating offsets for fixed objects at the top of the frame didn't account for extra stack size adjustment from the libcalls. I'm trying to find a neat way around this without breaking other calculations.

Rewrote logic to calculate stack sizes, frame indexes and frame pointer offsets. This was necessary to take into account the fact that the save/restore lib calls are essentially an opaque section of the stack that is inserted at the beginning of the frame, after positive offset objects such as fixed stack arguments, but before additional callee saved registers (IE FP registers) and negative offset (dynamic) objects. So calculations of the actual offsets of these objects must be adjusted accordingly for the stack to function correctly.

Thanks for the patch update. I will launch some new correctness runs.

apazos added a comment.Oct 2 2019, 2:54 PM

Lewis, with this patch we see less failures. But still some tests in SPEC and perennial test suites are failing.

Pengxuan and I are trying to triage the failures.

Here is what we see in one of the failed tests:

Code right before rologue/Epilogue Insertion & Frame Finalization:

%call39 = tail call i32 @somefunction()
ret i32 %call39

Code after Prologue/Epilogue Insertion & Frame Finalization:

PseudoCALL target-flags(<unknown>) @somefunction, implicit-def $x1, implicit $x2
$x10 = frame-destroy LUI 1
$x10 = frame-destroy ADDI killed $x10, -1712
$x2 = frame-destroy ADD $x2, killed $x10
CFI_INSTRUCTION def_cfa_offset 0
frame-destroy PseudoTAIL target-flags(<unknown>) &__riscv_restore_12, implicit $x2

x10 is the return value and it is being overwritten.

Is this patch self contained? Or does it depend on another patch under review?

For the tests that are passing, I see better code size reduction with C extension enabled (testing RV32IMAC) than with the machine outliner.

apazos added inline comments.Oct 2 2019, 8:07 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
677

Where are we making sure the PseudoCALL result in a0 is alive through the riscv_restore call?

apazos added inline comments.Oct 3 2019, 9:01 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
677

I did not find any other target that transforms a tail call back into a regular call.

The issue with doing this is that we don't have info about the return value of the original call.

If anyone knows how to do it, please give me some pointers.

I tried to fix this problem by adding implicit operands to the riscv_restore tail call that are all the possible return value registers:
BuildMI(MBB, MI, DL, TII.get(RISCV::PseudoTAIL))
.addExternalSymbol(RestoreLibCall, RISCVII::MO_CALL)
-.setMIFlag(MachineInstr::FrameDestroy);
+ .setMIFlag(MachineInstr::FrameDestroy)
+ .addReg(RISCV::X10, RegState::Implicit)
+ .addReg(RISCV::X11, RegState::Implicit)
+ .addReg(RISCV::F10_F, RegState::Implicit)
+ .addReg(RISCV::F11_F, RegState::Implicit)

With this change, some perennial tests now pass.

But there are still failures to be analyzed in SPEC.

Rebased to fix conflicts with recent split SP adjustments

The priority for this patch is to address the issues reported by @apazos but after that please check the clang-format output. There are some cases in this patch where it might make sense to use a different formatting than clang-format indicates, but the remaining should be addressed.

@apazos Have you considered tweaking the patch code to not do a tail call, just to check if that's what's causing the remaining failures? I'm not sure if that's too hard, but it could eventually be easier than drilling into the failing cases.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
28

The return value isn't used as just an opaque index, it also reflects the frame size and is used for that purpose. The function comment should probably reflect that.

35

Use Register and RISCV::NoRegister. (You'll have to use MaxReg.id() instead in the call to max).

37

Might be worth adding a small comment explaining how this serves as a filters for the registers we are interested in. Or point to a later relevant comment?

40

Ditto NoRegister.

67

Check LLVM naming convention capitalization. Ditto other vars here.

94

Check LLVM naming convention capitalization. Ditto other vars here.

202

This could probably use SmallVector.

702

Ditto Register.

llvm/lib/Target/RISCV/RISCVRegisterInfo.cpp
109

Use IndexedMap instead?

llvm/test/CodeGen/RISCV/saverestore.ll
349

Maybe for these tests just put a -NOT check that __riscv_save_ isn't called?

Here is the bugpoint-reduced test case for the SPEC failure when enabling -msave-restore and allowing tail calls:

Run the command llc test.ll -mattr=+save-restore -o out.s

You will see the code generated is wrong:

tail    __riscv_restore_2
 jr      a5

target datalayout = "e-m:e-p:32:32-i64:64-n32-S128"
target triple = "riscv32-unknown-linux-gnu"

%struct.1_s = type { i8*, i8*, i8*}
%struct.2_s = type { i8*, i8*, i8*}

declare dso_local void @test2() local_unnamed_addr
declare dso_local i32 @test3(%struct.2_s* nonnull %a, %struct.1_s** %b) local_unnamed_addr

define dso_local void @test1(%struct.2_s* %a, %struct.1_s** %b) local_unnamed_addr {
entry:

br i1 undef, label %if.end2, label %if.then

if.then: ; preds = %entry

tail call void @test2()
br label %if.end2

if.end2: ; preds = %if.then, %entry

%call3 = tail call i32 null(%struct.2_s* nonnull %a, %struct.1_s** %b)
ret void

}

To summarize the issues we have detected so far:

  1. Bug1: When transforming a tail call back into a regular call, the transformation is not correctly preserving the original call's return values.

I fixed this by conservatively adding RegState::Implicit definitions for each possible integer and FP return registers, when defining the riscv_restore tail call.

  1. Bug2: test case provided above. Please Lewis take a look at how this case can be fixed.
  1. I have also run the tests with "-fno-optimize-sibling-calls -msave-restore", i.e., disabling tail calls when m-save-restore is enabled.

With this config, issues (1) and (2) do not happen.
There is some loss in code size savings when using fno-optimize-sibling-calls.
But still we see code size savings benefit from -msave-restore.

Bug2: test case provided above. Please Lewis take a look at how this case can be fixed.

Isn't the issue just that the code is checking for PseudoTAIL, and not PseudoTAILIndirect?


What happens if a tail call has more than 8 arguments?

What happens for a musttail call? A tailcc call? (Not sure if either of those are actually supported on RISCV at the moment?)

Yes Eli thanks for pointing out there are more scenarios that can fail.
It looks like the best solution is to permit both flags on, but then bail out from doing this transformation if there is any type of tail call already in the function.
This way we avoid messing with reverting tail calls back to regular calls.

Yes Eli thanks for pointing out there are more scenarios that can fail.
It looks like the best solution is to permit both flags on, but then bail out from doing this transformation if there is any type of tail call already in the function.
This way we avoid messing with reverting tail calls back to regular calls.

Is it worth trying to disallow tail call optimization completely if this flag is enabled? I'm not sure what GCC does exactly. but this seems to be the behaviour.

lewis-revill added inline comments.Oct 15 2019, 4:24 AM
llvm/test/CodeGen/RISCV/saverestore.ll
349

Thanks, I realise it makes much more sense to do this for all the other 'non-save-restore' checks too, since they don't actually check any save/restore behaviour.

Is it worth trying to disallow tail call optimization completely if this flag is enabled? I'm not sure what GCC does exactly. but this seems to be the behaviour.

I had reported above that I have already run that test: with "-fno-optimize-sibling-calls -msave-restore", i.e., disabling tail calls when m-save-restore is enabled.

But it seems a better solution is to optimistically apply -msave-restore when there are no tail calls of any type in a function, instead of disabling tail calls completed. Let tail call optimization prevail over msave-restore. No LLVM target is disabling tail calls.

So you can update the patch according to this solution plan.

Disable the save/restore optimization when a function contains tail calls. Address various miscellaneous concerns with the patch. Update tests to include less redundant code when checking cases where save/restore libcalls are not used.

Thanks Lewis, the runs are looking good, no failures, and good code size savings (average 3%)

Rebase on top of shrink wrapping patch.

Rebased and merged D68644 into this patch - this patch already assumes shrink wrapping support anyway.

pzheng added inline comments.Thu, Nov 14, 1:00 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
444–445

Any chance this comment from @lenary is missed?

lewis-revill added inline comments.Fri, Nov 15, 2:31 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
444–445

It appears this CFI code was removed since I last rebased.
But yes, just for posterity I did miss this comment!

lkail added a subscriber: lkail.Fri, Nov 15, 4:22 AM
apazos added a comment.Tue, Dec 3, 4:45 PM

Lewis, try rebasing it, not applying cleanly nor https://reviews.llvm.org/D62190