Page MenuHomePhabricator

[RISCV] Add support for using -msave-restore with tailcalls
Needs ReviewPublic

Authored by edward-jones on Nov 18 2020, 8:34 AM.

Details

Summary

This introduces the use of a new compiler runtime library function which behaves the same as riscv_restore_N, however instead of returning to the callee these functions do a tailcall to the function whose pointer is stored in t1. This allows the -msave-restore optimization to be applied even in a tailcall context. The new runtime library functions are called riscv_restore_tailcall_X where X is the number of callee-saved registers which will be restored.

Original patch by Simon Cook.

Diff Detail

Event Timeline

edward-jones created this revision.Nov 18 2020, 8:34 AM
edward-jones requested review of this revision.Nov 18 2020, 8:34 AM

Where is this specified? I only see the non-tailcall versions in GCC. I don't think we should merge anything until it's at least a draft specification in an official place.

jrtc27 added inline comments.Nov 18 2020, 8:43 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1097

We should not need to duplicate all this code surely? Certainly these need to honour the code model and not assume medlow.

lenary resigned from this revision.Jan 14 2021, 9:43 AM

I have updated this so that the generation of the address passed to the restore functions is done in a separate method RISCVFrameLowering::buildAddr. This handles both the code models and mirrors RISCVISelLowering::getAddr - it would be nice if these functions could share code, or at least be in a common place.

I've also opened a pull request to document the existing -msave-restore behaviour in riscv-toolchain-conventions, and I have a follow up patch to document the tail call versions of the entry points too.

I've expanded the test to cover the different code models, and pic mode too.

lenary removed a subscriber: lenary.Apr 19 2021, 11:29 AM

Documentation for -msave-restore has been committed to riscv-toolchain-conventions, I have an additional pull request to document this proposed tail-call version here

On the RISC-V sync-up call I said I would try and provide some code size numbers for these changes. First I mentioned that I saw a code size regression when compiled Embench, here's per-benchmark numbers:

-Oz -msave-restore (no tailcall version)

 Benchmark            size
---------            ----
aha-mont64          1,964
crc32               1,140
cubic              55,942
edn                 2,364
huffbench           2,456
matmult-int         1,420
minver             11,160
nbody              10,748
nettle-aes          3,852
nettle-sha256       8,552
nsichneu           14,268
picojpeg            9,476
qrduino             7,256
sglib-combined      3,528
slre                3,652
st                 11,056
statemate           4,908
ud                  1,620
wikisort           15,060
---------           -----
Geometric mean      5,347
Geometric SD            2.65
Geometric range  12,143.345300622888

-Oz -msave-restore (with tailcall version)

Benchmark            size
---------            ----
aha-mont64          1,968
crc32               1,340
cubic              55,942
edn                 2,372
huffbench           2,656
matmult-int         1,476
minver             11,164
nbody              10,752
nettle-aes          3,896
nettle-sha256       8,588
nsichneu           14,272
picojpeg            9,476
qrduino             7,428
sglib-combined      6,512
slre                3,656
st                 11,116
statemate           4,912
ud                  1,620
wikisort           15,060
---------           -----
Geometric mean      5,620
Geometric SD            2.59
Geometric range  12,365.236826034434

The Embench benchmarks are very small, so the constant cost of the additional __riscv_restore_tailcall entry points exceeds the benefit of the optimization.

To try something larger I also grabbed sqlite 3.35.5 and tried building that (again -Oz -msave-restore). Below are results for that experiment.

  text    data     bss     dec     hex filename
652816    5604     903  659323   a0f7b sqlite3-save-restore.o
645328    5604     903  651835   9f23b sqlite3-save-restore-tailcall.o

So for this larger codebase where the constant cost of the additional functions is minor, we see approximately a 1% reduction in code size (note that this only compiling to an object file, so it doesn't include the the cost of the save-restore compiler-rt functions)