This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Better Split Stack Pointer Adjustment for RVC
Needs ReviewPublic

Authored by luismarques on Apr 29 2020, 4:09 PM.

Details

Summary

We received a report that split SP adjustment was not performing optimally when
compressed instructions were available. This patch more closely matches the
FirstSPAdjustAmount to the available instructions, so that loading and storing
the callee-saved registers is done with compressed instructions on architectures
that support them.

This patch also contains a fix to the type of SecondSPAdjustAmount to
correctly track and check the sign of this adjustment.

Diff Detail

Event Timeline

lenary created this revision.Apr 29 2020, 4:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2020, 4:09 PM

It looks like some of the stack adjustment instructions still aren't getting compressed when they should?

It looks like some of the stack adjustment instructions still aren't getting compressed when they should?

Ah, I see. Yes, I think this can be tweaked to ensure that we use c.addisp in more places.

lenary updated this revision to Diff 261099.Apr 29 2020, 5:08 PM

Tweak thresholds to ensure that c.addi16sp is also used.

shiva0217 added inline comments.Apr 29 2020, 11:21 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
777

Could be early exit something like: if (CSI.size() == 0) return DoNotSplitSPAdjustment;

783

Could we extract a static function as getOffsetAddressableLimit(MF, StackAlign)?

789

Should be 8-bit limit for c.{l,s}wsp with isShiftedUInt<6, 2>(Imm)?

792

Should be 9-bit limit for c.{l,s}dsp with isShiftedUInt<6, 3>(Imm)?

llvm/test/CodeGen/RISCV/split-sp-adjust.ll
993

The stack didn't split for rv32c because the stack size is 256 which is equal to the limit. Should we choose a slightly greater stack size for stack_split_rv32_c test case?

lenary updated this revision to Diff 261139.Apr 30 2020, 12:43 AM

Update after shiva0217's review:

  • Update test to correctly show splitting on RV32*C
  • Use early return if CSI.size() == 0
  • Update comments to count bits correctly.

I decided against a getOffsetAddressableLimit function, as the logic in
getFirstSPAdjustAmount isn't just about addressable offsets, it's also about
what we can adjust in a single (compressed if possible) instruction, so I
thought that adding that function would make the logic harder to understand.

lenary marked 3 inline comments as done.Apr 30 2020, 12:45 AM
lenary marked 2 inline comments as done.
lenary added inline comments.
llvm/test/CodeGen/RISCV/split-sp-adjust.ll
993

Great catch - I forgot to update the tests when I tweaked the limits.

shiva0217 accepted this revision.EditedApr 30 2020, 2:01 AM

Update after shiva0217's review:

  • Update test to correctly show splitting on RV32*C
  • Use early return if CSI.size() == 0
  • Update comments to count bits correctly.

I decided against a getOffsetAddressableLimit function, as the logic in
getFirstSPAdjustAmount isn't just about addressable offsets, it's also about
what we can adjust in a single (compressed if possible) instruction, so I
thought that adding that function would make the logic harder to understand.

Got your point. The logic is not only for addressable but also for preference for code size.

LGTM. Thanks for the update.

This revision is now accepted and ready to land.Apr 30 2020, 2:01 AM
lenary marked an inline comment as done.Apr 30 2020, 7:01 AM

Got your point. The logic is not only for addressable but also for preference for code size.

Yeah. I think there could be an argument for using 512 for the threshold on RV64*C, but tbh the difference between these a threshold of 512 and of 496 will not have a huge effect on overall program size in most programs. This trades-off the size of the final addi sp, sp, <threshold> against how well you can compress the save/restore functions, and it's fairly unlikely you need the entire addresable range of the save/restore instructions anyway. I think we're talking a few bytes in functions with StackSize of 512 (the only stack size in that range that is well-aligned).

I think I would need to see benchmarks before I could say one was specifically better than the other.

lenary updated this revision to Diff 261466.May 1 2020, 8:58 AM

I have updated this patch based off internal discussions. Most of these updates
are about improving code clarity, and adding comments for future maintainers.

I also took the time to verify that this was producing the correct call frame
information, even in the presence of frame pointers.

asb requested changes to this revision.May 6 2020, 11:34 PM

Thanks Sam. I left a few minor comments inline. From looking at the diff in compiler output on the GCC torture suite, I do see a few cases where there is a potential regression. e.g. loop-ivopts-2.c, which represents a case where the stack adjustment is large, but doesn't overflow the 12-bit immediate fields of addi and s[d|w]. The prologue and epilogue are both two instructions and become three instructions with this patch (although it is a code size win).

I don't expect this is a real world problem and of course there are many microarchitectures where 3 compressed instructions could be preferred. But if we could gate examples in the same class as above on whether we're optimising for size I think that would be slightly preferable.

I haven't looked in more detail but I also see pr37573.c at O1 adds another two instructions to the prologue and epilogue. In the case the code size savings are more meaningful, so there's a strong argument that defaulting to slightly more instructions is a good choice in the common case.

I think this isn't totally clear cut so I'm not expressing a fixed and unchangeable position here, but did want to discuss further before merging.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
735–763

Should be a C++ style comment

736

I'm struggling to parse the "reduce prologue/epilogue" bit of this sentence

739

Did you mean to say "function" here?

llvm/lib/Target/RISCV/RISCVFrameLowering.h
60–72

Should use a C++ style comment

61

Nit: LLVM seems to mostly use prologue and epilogue spellings

65

Nit: optional => Opetional

This revision now requires changes to proceed.May 6 2020, 11:34 PM
asb added inline comments.May 19 2020, 12:31 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.h
65

Oh dear, I managed to typo my suggested correction!

optional => Optional

lenary updated this revision to Diff 300508.Oct 24 2020, 3:18 PM
lenary marked 6 inline comments as done.

Address some comment phrasing issues.

I have not had the time to look at @asb's suggestions around gating the
behaviour of these changes -- I'm planning to hand this patch off to
@luismarques so we can get it finished and landed soon, as I don't expect to
have the time to investigate those issues in the near future.

lenary added inline comments.Oct 24 2020, 3:18 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.h
60–72

Actually I want to keep this as a doc comment, given it's in the header.

jrtc27 added inline comments.Oct 24 2020, 3:28 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.h
60–72

So /// then?

lenary updated this revision to Diff 301103.Oct 27 2020, 2:31 PM

Fixing comment style.

llvm/lib/Target/RISCV/RISCVFrameLowering.h
60–72

Ah, I didn't realise LLVM had a preference, because there are both styles in the repository.

lenary updated this revision to Diff 301104.Oct 27 2020, 2:33 PM

Oh, I left some history behind. This brings it back.

luismarques commandeered this revision.Nov 3 2020, 6:40 PM
luismarques edited reviewers, added: lenary; removed: luismarques.

Arrr matey, I'm the captain now!

Add logic to balance increased instruction count vs decreased code size.

rkruppe removed a subscriber: rkruppe.Nov 4 2020, 12:48 AM
luismarques marked 2 inline comments as done.

NFC minor clean-up tweak.

Fix borked test update.

luismarques edited the summary of this revision. (Show Details)

Rebase.

  • Sprayed test with correctness mist.
lenary resigned from this revision.Jan 18 2022, 2:22 AM