Enabling shrink wrapping requires ensuring the insertion point of the epilogue is correct for MBBs without a terminator, in which case the instruction to adjust the stack pointer is the last instruction in the block.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I've reworked this patch to safely work around the implementation of save/restore via libcalls. These place constraints on the save/restore points that can be chosen by the shrink wrapping algorithm.
lib/Target/RISCV/RISCVFrameLowering.cpp | ||
---|---|---|
171 ↗ | (On Diff #208047) | We might need to consider that there're two branches at the end of the basic block. Last non-debug instruction is PseudoBr which is a terminator. So the epilogue will be inserted between the branches. BEQ killed renamable $x10, $x0, %bb.3 PseudoBR %bb.2 Replacing by MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator(); could handle the case. and if (MBBI == MBB.end()) MBBI = MBB.getLastNonDebugInstr(); to handle that there is no terminator in the basic block. |
I think this patch is orthogonal to save/restore via libcalls, and we can probably land it sooner. Could you please rebase so it's standalone?
Sure, that's sensible. I'll keep a separate patch for ensuring compatibility with save/restore for if/when it's required.
There are some additional perennial test suite failures when applying this patch and enabling -mllvm -enable-shrink-wrap.
When Lewis updates the patch to be standalone, we can verify it again.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp | ||
---|---|---|
230 | I have been verifying the pending patches at Oz, Os, and also O2. This is helping with uncovering issues. With the latest standalone patch, it seems we have a few crashes left. Below is a bugpoint reduced test: define dso_local void @test() local_unnamed_addr { br i1 undef, label %T.exit, label %for.body.i for.body.i: ; preds = %for.body.i, %entry store i32 0, i32* undef %incdec.ptr.i.i = getelementptr inbounds i32, i32* null, i32 1 %cmp.i.i = icmp eq i32* undef, undef br i1 %cmp.i.i, label %T.exit.loopexit, label %for.body.i T.exit.loopexit: ; preds = %for.body.i %0 = ptrtoint i32* %incdec.ptr.i.i to i32 br label %T.exit T.exit: ; preds = %T.exit.loopexit, %entry ret void } |
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp | ||
---|---|---|
230 | It seems that shrink wrapping may choose an empty basic block to insert epilogue. We might need to add empty block detection for DL initialization and MBBI advance. Something like: DebugLoc DL = !MBB.empty() ? MBBI->getDebugLoc() : DebugLoc(); if (!MBB.empty() && !MBBI->isTerminator()) MBBI = std::next(MBBI); |
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp | ||
---|---|---|
230 | Thanks Ana, it looks like this is a case of the shrink wrapping pass choosing a basic block which is empty as the place to insert the prologue. I didn't realise it was possible, but I'll push an updated patch that takes this into account. |
Use MBB.end() and a blank DebugLoc for epilogue insertion when a basic block is empty.
Lewis, this patch is not applying clean on top of https://reviews.llvm.org/D62686. Can you please rebase?
Hi Ana, I've made this patch independent of the save/restore patch as per Alex's request. Unfortunately that means that neither will rebase cleanly on top of the other until one is committed.
I could recreate the old patch with the save/restore compatibility included (currently split into D68644) implemented on top of save/restore, if that is worth doing?
Off topic, this makes me think Phabricator needs some way to add multiple alternative diffs for applying a given patch on top of other patches...
Thanks Lewis, I applied the patch and I am not detecting any new failure with this feature.
Lewis, I am not seeing any code change.
Btw to turn it on by default on a the target you need to add enableShrinkWrapping implementation in RISCVFrameLowering.h
+ /// Returns true if the target will correctly handle shrink wrapping.
+ bool enableShrinkWrapping(const MachineFunction &MF) const override {
+ return true;
+ }
+
So I've been looking into the benchmarks that I've been running to figure out why code doesn't change with this enabled.
I found that GCC's shrink wrapping does make a difference, and that comes from a loop which looks like the following:
int foo(int rpt) { int i; int r; for (i = 0; i < rpt; ++i) { srand(0); r = ...; } return r; }
GCC appears to convert this into the equivalent of:
int foo(int rpt) { if (rpt == 0) return 0; int i; int r; for (i = 0; i < rpt; ++i) { srand(0); r = ...; } return r; }
And is then able to shrink wrap around the if statement. Sure enough, if I feed the modified function above into LLVM, we are able to shrink wrap it as well.
(I'm not certain if this conversion is sane from GCC, but I haven't properly looked into it.)
So personally I'm not concerned about the fact that shrink wrapping doesn't make a difference in my benchmarks; I can show that it changes C code we expect it to change... I'm not so sure about the lack of differences with SPEC though.
I note that the TargetFrameLowering hooks canUseAsPrologue and canUseAsEpilogue are both called by the shrink wrapper. They default to true, but targets may need to override this for correctness. Looking at e.g. AArch64, I see it overrides canUseAsPrologue and returns false in the case that the function needs stack realignment and there are no scratch registers available. Are you certain that no such case is needed for RISC-V?
So I see that the stack realignment implementation for larger values does indeed require the use of a scratch register, so that implies we should attempt to prevent the situation where we don't have one available. However the scratch register is created as a virtual GPR register, rather than attempting to scavenge a real free register. I can imagine that if this was changed, we could directly perform the same check in canUseAsPrologue to guarantee a register being available in a much more transparent and obvious way. EG the way in which AArch64 uses the function findScratchNonCalleeSaveRegister.
I'm working on a patch for the realignment implementation based off issues that arose with implementing the ilp32e ABI. I will update this thread when that patch is ready.
Lewis, same question, is the patch final? It would be good to merge it before the 10.0 release branch creation on Jan 15th
This patch is yes, so long as it applies cleanly. So if it's agreed that it should be landed then I can do that as soon as I get a chance.
Thanks for the patch Lewis. Overall it LGTM, and it would be great to merge this in time for the LLVM 10.0 release.
The patch would benefit from having more extensive tests, although those could be provided by a follow-up patch (but please do follow up). For instance, the current tests don't exercise the if (!MBB.empty()) else condition. It would also be great to include tests that illustrate the impact on debug instructions (even if that has been an area where testing has been limited in the past). See for instance my inline comment for a possible issue regarding correctly updating the debug locations.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp | ||
---|---|---|
261 | This declaration shadows the DL above; I assume you meant to update the previously declared DL. |
I have been verifying the pending patches at Oz, Os, and also O2. This is helping with uncovering issues.
You can run it with 'llc test.ll -enable-shrink-wrap
With the latest standalone patch, it seems we have a few crashes left. Below is a bugpoint reduced test:
define dso_local void @test() local_unnamed_addr {
entry:
for.body.i: ; preds = %for.body.i, %entry
T.exit.loopexit: ; preds = %for.body.i
T.exit: ; preds = %T.exit.loopexit, %entry
}