This adds the compiler-rt entry points required by the -msave-restore option. The added entry points are riscv_save_X and riscv_restore_X, where X is the number of callee-saved registers to be saved or restored respectively.
It seems a bit excessive to me to coalesce the entry points into bundles of 4. Do you have any particular benchmarking data or reasoning that supports choosing that threshold?
Also, shouldn't this implementation include CFI directives?
|1 ↗||(On Diff #306122)|
The header banner is missing here.
Also, compiler-rt seems to use a mix of .h and .inc files. I'm not sure which one is more canonical, but the existing one in the riscv directory is a .inc.
|9–19 ↗||(On Diff #306122)|
clang-format is complaining here and suggesting that you remove all indentation.
For what it's worth, libgcc seems to group by 2 for RV64 and by 4 for RV32, i.e. it always rounds up to a multiple of 16 bytes as that's an easy way to preserve stack alignment requirements.
I used bundles of 4 just to follow the behaviour I saw in libgcc, and the grouping of 2 for rv64 seemed a bit too fine-grained. I'm not sure what the original justification for the coalescing into groups of 2/4 was in libgcc.
I'll update to account for other suggested changes and see if I can find any benchmarks which show the tradeoff for the grouping threshold
You have to store the correct number of registers in order to access the stack above the libcall size, for instance arguments passed in via the stack. If you spill too many by in some cases spilling two, these offset calculations will be wrong and you'll read the incorrect argument.
Thanks for your earlier comment. I did think a bit about some possible alternative implementations, which could perhaps make some interesting trade-offs while properly preserving the stack alignment, but in the end the approach used in libgcc and this patch is probably the most sensible one.
Ah apologies, I missed your comment. So then libgcc groups them up into the smallest groups it can whist maintaining alignment.
I could switch rv64 to use groups of 2 then if matching libgcc would be advantageous. The only reason for sharing 4 for both rv32/rv64 is that it made the implementation more compact, but that's obviously not the metric to optimize for in an emulation library.
While keeping them both using a group of 4 would be reasonable (and has a neater implementation), I think overall it would be best to switch rv64 to groups of 2. If you made that change and addressed the review nits I think this would be an easy accept and merge :-)
@jrtc27 do you agree it would be worthwhile to make the change for rv64?
I would probably remove them now.
Also, even if we kept the macros we probably don't really need the .inc here, do we?
I suppose we also don't need the .inc for the RISC-V multiply builtins, although there it's a slightly different and more subtle situation.