Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[CodeGen][Shrink-wrap]split restore point
ClosedPublic

Authored by sushgokh on Jan 26 2018, 2:03 PM.

Details

Summary

This change split a restore point to allow it to only post-dominate blocks reachable by use or def of CSRs/FI.

This is a WIP and I'm posting it to continue the discussion about :

Bugzilla : https://bugs.llvm.org/show_bug.cgi?id=33868

I will be happy to hear any high level comment/suggestion.

This change itself increase 15% more shrink-wrapping in spec2000/2006/2017 benchmarks. I observed 160% more shrink-wrapping in spec2000/2006/2017 benchmarks if we apply the copy forwarding (D41835), PostRASink (D41463), and this change all together.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
qcolombet added inline comments.Mar 28 2023, 2:15 AM
llvm/lib/CodeGen/ShrinkWrap.cpp
940–941

If we changed the CFG (or anything) we need to return true. There may be other things that we are destroying without realizing (e.g., live intervals, but that's not applicable here. The point is we don't know which analysis rely on what.)

sushgokh updated this revision to Diff 509026.EditedMar 28 2023, 8:22 AM

@qcolombet @nickdesaulniers Have addressed all the suggestions

Please mark comments as "Done" in phabricator so that it's clear to reviewers which comment threads are still outstanding.

llvm/lib/CodeGen/ShrinkWrap.cpp
210–211

I think it's more appropriate to pass a SmallVectorImpl<MachineBasicBlock*>& rather than SmallVector<MachineBasicBlock *, 2>& (having the parameter specialization of 2 is...overly specific). It would be preferable to use ArrayRef, but you can't push_back an ArrayRef, so the SmallVectorImp<>& is the way to go.

472

Seems like it can be undone though, IIUC?

481

ArrayRef

524

ArrayRef

552–553

SmallVectorImpl<MachineBasicBlock *>&

559

remove auto

604

remove auto

940–941

tryToSplitRestore is modifying the CFG, right? If that happens, we MUST return true.

Should we perhaps skip calling postShrinkWrapping unless HasCandidate is true?

sushgokh updated this revision to Diff 509217.Mar 28 2023, 10:58 PM
sushgokh marked 42 inline comments as done.Mar 28 2023, 11:03 PM
sushgokh added inline comments.
llvm/lib/CodeGen/ShrinkWrap.cpp
472

yes, right

sushgokh marked an inline comment as done.Apr 6 2023, 12:07 AM

ping.

nickdesaulniers accepted this revision.Apr 7 2023, 9:53 AM
nickdesaulniers added inline comments.
llvm/lib/CodeGen/ShrinkWrap.cpp
471

ReachableByDirty was removed in the latest diff. Please clean up the comment block, too.

This revision is now accepted and ready to land.Apr 7 2023, 9:53 AM
This revision was landed with ongoing or failed builds.Apr 10 2023, 11:29 PM
This revision was automatically updated to reflect the committed changes.

@sushgokh , @nathanchance is reporting issues with this patch breaking a few Linux kernel builds https://github.com/ClangBuiltLinux/linux/issues/1833. Can you please revert, and we'll work on getting you a reproducer?

@nickdesaulniers Does this look good ?

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -601,6 +601,16 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
                                 CleanPreds, TII, RS))
     return false;

+  // FIXME: Currently, we bail out of optimisation if any of the DirtyPreds has
+  // INLINEASM_BR instruction. Also, if restore pt block address is taken, we
+  // need to make suitable changes post restore point split
+  for (MachineBasicBlock *BB : DirtyPreds)
+    for (const MachineInstr &MI : *BB)
+      if (MI.getOpcode() == TargetOpcode::INLINEASM_BR)
+        for (const MachineOperand &MO : MI.operands())
+          if (MO.isMBB() && MO.getMBB() == InitRestore)
+            return false;
+
   // Trying to reach out to the new save point which dominates all dirty blocks.
   MachineBasicBlock *NewSave =
       FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);

@sushgokh I will let Nick comment on the correctness of that diff but I can confirm that applying that diff on top of 6530bd3030d3 (the parent of the revert of this change, bb5befefc6e7) makes both issues that I reported go away.

@nickdesaulniers Does this look good ?

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -601,6 +601,16 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
                                 CleanPreds, TII, RS))
     return false;

+  // FIXME: Currently, we bail out of optimisation if any of the DirtyPreds has
+  // INLINEASM_BR instruction. Also, if restore pt block address is taken, we
+  // need to make suitable changes post restore point split
+  for (MachineBasicBlock *BB : DirtyPreds)

Rather than scan every instruction in the block, I think you can use
MachineBasicBlock::isInlineAsmBrIndirectTarget()
Perhaps we can do that before even adding such an MBB to DirtyPreds?

+ for (const MachineInstr &MI : *BB)
+ if (MI.getOpcode() == TargetOpcode::INLINEASM_BR)
+ for (const MachineOperand &MO : MI.operands())
+ if (MO.isMBB() && MO.getMBB() == InitRestore)
+ return false;
+

// Trying to reach out to the new save point which dominates all dirty blocks.
MachineBasicBlock *NewSave =
    FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);
alexfh added a subscriber: alexfh.Apr 14 2023, 1:07 PM

In our internal testing we've noticed that this commit causes a ~10-15% regression of the Shootout-ackermann benchmark (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Benchmarks/Shootout/ackermann.c) on x86. The benchmark is somewhat noisy, thus, I'm not 100% sure it's a real regression. But maybe you want to check if there's something obviously wrong started happening on that code.

@nickdesaulniers Thanks. Will make the necessary changes.

@alexfh Will surely check for the regression and update you if there's any.

@alexfh The regression exists. I tried on AArch64. However, its not due this patch itself but there is some significant block reordering taking place at "block-placement" pass.

@nickdesaulniers would it be good to go ahead and commit the patch and later track this issue with a bug raised for it? This is just to avoid major changes ticking in the patch if its delayed.

@nickdesaulniers would it be good to go ahead and commit the patch and later track this issue with a bug raised for it? This is just to avoid major changes ticking in the patch if its delayed.

No! Our CI will notice this immediately again. Do not break our builds!

fhahn added a subscriber: fhahn.Apr 17 2023, 11:32 AM

In our internal testing we've noticed that this commit causes a ~10-15% regression of the Shootout-ackermann benchmark (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Benchmarks/Shootout/ackermann.c) on x86. The benchmark is somewhat noisy, thus, I'm not 100% sure it's a real regression. But maybe you want to check if there's something obviously wrong started happening on that code.

We've also seen a ~8% performance regression on internal benchmarks on ARM64.

@alexfh @fhahn Thanks for the inputs. Could you please confirm if the regression goes away by applying this?

@@ -471,7 +471,9 @@ tryToSplitRestore(MachineBasicBlock *MBB,
       MBBFallthrough.insert(BB);   MachineBasicBlock *NMBB = MF->CreateMachineBasicBlock();
-  MF->insert(MachineFunction::iterator(MBB), NMBB);
+  // Insert this block at the end of the function. Inserting in between may
+  // interfere with control flow optimizer decisions.
+  MF->insert(MF->end(), NMBB);   
   for (const MachineBasicBlock::RegisterMaskPair &LI : MBB->liveins())
     NMBB->addLiveIn(LI.PhysReg);

@nickdesaulniers Need your approval to go ahead with relanding with 2 changes viz. 1. INLINEASM_BR changes 2.Change in block location

Have tested both against SPEC17 and llvm test-suite and figures look good. No major diversions were observed for test suite

@sushgokh Has a new revision with the changes that you have mentioned been pushed here for review? I can pull it down and verify that the two configurations that had boot issues are fixed, as well as making sure that no other configurations regress.

@nathanchance Thanks for the initiative. Could you please apply the below patch on top of existing patch and check?

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -471,7 +471,9 @@ tryToSplitRestore(MachineBasicBlock *MBB,
       MBBFallthrough.insert(BB);

   MachineBasicBlock *NMBB = MF->CreateMachineBasicBlock();
-  MF->insert(MachineFunction::iterator(MBB), NMBB);
+  // Insert this block at the end of the function. Inserting in between may
+  // interfere with control flow optimizer decisions.
+  MF->insert(MF->end(), NMBB);

   for (const MachineBasicBlock::RegisterMaskPair &LI : MBB->liveins())
     NMBB->addLiveIn(LI.PhysReg);
@@ -577,6 +579,12 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
       !MPDT->dominates(InitRestore, InitSave))
     return false;

+  // Bail out of the optimization if any of the basic block is target of
+  // INLINEASM_BR instruction
+  for (MachineBasicBlock &MBB : MF)
+    if (MBB.isInlineAsmBrIndirectTarget())
+      return false;
+
   DenseSet<const MachineBasicBlock *> DirtyBBs;
   for (MachineBasicBlock &MBB : MF) {
     if (MBB.isEHPad()) {

@sushgokh I tested that diff on top of the revert and I no longer see the boot issues I initially reported or any new regressions. I do not feel like I am knowledgeable enough to approve the reland, hopefully someone else can chime in on that but this should not be blocked on the Linux kernel regression any longer.

@nathanchance Thanks for the update :)

@nickdesaulniers Requesting approval for relanding

@nathanchance Thanks for the update :)

@nickdesaulniers Requesting approval for relanding

Can you please push the updated patch to this phab review (click reopen in the bottom left "Add Action..." menu) or a new one? It's hard for me to look at the diff without the surrounding context, which phab would allow me to view.

In particular, I'm concerned if the checks for isInlineAsmBrIndirectTarget in ShrinkWrap::performShrinkWrapping are still relevant after this change?

I would also like to see a callbr based test added.

sushgokh reopened this revision.May 2 2023, 7:06 PM
This revision is now accepted and ready to land.May 2 2023, 7:06 PM
sushgokh requested review of this revision.May 2 2023, 7:26 PM
sushgokh updated this revision to Diff 518942.

@nickdesaulniers test for inlineasm_br added as 'noshrink_bb_as_inlineasmbr_target()' in 'shrinkwrap-split-restore-point.mir'

sushgokh updated this revision to Diff 518949.May 2 2023, 7:39 PM
nickdesaulniers accepted this revision.May 5 2023, 10:29 AM

LGTM; though for the two tests you converted to use the update_*_test_checks.py, you should do those conversions as separate parent commits. I'm not going to block this commit on that, but it's nice to do in the future.

llvm/test/CodeGen/PowerPC/shrink-wrap.ll
1–4

Consider pre-committing the conversion to update_llc_test_checks.py so that it's more obvious if this change to LLVM changes anything of interest in this test.

llvm/test/CodeGen/PowerPC/shrink-wrap.mir
2

Consider pre-committing the conversion to update_mir_test_checks.py so that it's more obvious if this change to LLVM changes anything of interest in this test.

This revision is now accepted and ready to land.May 5 2023, 10:29 AM
This revision was automatically updated to reflect the committed changes.

It looks like for 32-bit ARM, shrink-wrapping is blocking the formation of pop {pc} instructions to return from a function (so we instead pop {lr}; bx lr). Is this new, or a side-effect of shrink-wrapping more cases? Any idea what's causing it? Is there an issue tracking it?

ayzhao added a subscriber: ayzhao.May 8 2023, 4:14 PM

This is causing lld to crash on Chrome builds when PGO and ThinLTO are enabled: https://crbug.com/1443635

It looks like for 32-bit ARM, shrink-wrapping is blocking the formation of pop {pc} instructions to return from a function (so we instead pop {lr}; bx lr). Is this new, or a side-effect of shrink-wrapping more cases? Any idea what's causing it? Is there an issue tracking it?

This should be side effect of doing more shrink wrapping and caused by later passes as this patch just changes the save/restore point and nothing beyond that. Currently, we dont have any issue tracking it but if this is leading to performance regression, then could you please raise a bug for it when I reland it ?

This is causing lld to crash on Chrome builds when PGO and ThinLTO are enabled: https://crbug.com/1443635

@ayzhao I request you to please provide me following information:

  1. Reproducer of the function causing the assert failure
  2. llc -print-before=shrink-wrap for the function
  3. llc -debug-only=shrink-wrap for function

Also, I request to check if the following patch works since I suspect that PGO is changing the block frequencies:

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -632,7 +632,8 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
       FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);

   while (NewSave && (hasDirtyPred(ReachableByDirty, *NewSave) ||
-                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency()))
+                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency() ||
+                     MLI->getLoopFor(NewSave)))
     NewSave = FindIDom<>(**NewSave->pred_begin(), NewSave->predecessors(), *MDT,
                          false);

This is causing lld to crash on Chrome builds when PGO and ThinLTO are enabled: https://crbug.com/1443635

@ayzhao I request you to please provide me following information:

  1. Reproducer of the function causing the assert failure

~/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll
llc: /usr/local/google/home/ayzhao/src/llvm-project/llvm/lib/CodeGen/ShrinkWrap.cpp:666: bool (anonymous namespace)::ShrinkWrap::postShrinkWrapping(bool, llvm::MachineFunction &, llvm::RegScavenger *): Assertion `(!MLI->getLoopFor(Save) && !MLI->getLoopFor(Restore)) && "Unexpected save or restore point in a loop"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll
1.      Running pass 'Function Pass Manager' on module 'patternprops.o.5.precodegen.ll'.
2.      Running pass 'Shrink Wrapping analysis' on function '@_ZN6icu_7212PatternProps14trimWhiteSpaceEPKDsRi'
 #0 0x0000563786318bd7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x1e44bd7)
 #1 0x0000563786316a8e llvm::sys::RunSignalHandlers() (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x1e42a8e)
 #2 0x00005637863193ba SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fafd965af90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
 #4 0x00007fafd96a9ccc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007fafd965aef2 raise ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007fafd9645472 abort ./stdlib/abort.c:81:7
 #7 0x00007fafd9645395 _nl_load_domain ./intl/loadmsgcat.c:1177:9
 #8 0x00007fafd9653df2 (/lib/x86_64-linux-gnu/libc.so.6+0x34df2)
 #9 0x000056378598fbbf (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x14bbbbf)
#10 0x00005637857517bc llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x127d7bc)
#11 0x0000563785c6bcb3 llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x1797cb3)
#12 0x0000563785c73e11 llvm::FPPassManager::runOnModule(llvm::Module&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x179fe11)
#13 0x0000563785c6c6cc llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x17986cc)
#14 0x0000563784a8ad89 main (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x5b6d89)
#15 0x00007fafd964618a __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#16 0x00007fafd9646245 call_init ./csu/../csu/libc-start.c:128:20
#17 0x00007fafd9646245 __libc_start_main ./csu/../csu/libc-start.c:368:5
#18 0x0000563784a857b1 _start (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x5b17b1)
[1]    2041024 IOT instruction  ~/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll
  1. llc -print-before=shrink-wrap for the function

  1. llc -debug-only=shrink-wrap for function

Also, I request to check if the following patch works since I suspect that PGO is changing the block frequencies:

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -632,7 +632,8 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
       FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);

   while (NewSave && (hasDirtyPred(ReachableByDirty, *NewSave) ||
-                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency()))
+                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency() ||
+                     MLI->getLoopFor(NewSave)))
     NewSave = FindIDom<>(**NewSave->pred_begin(), NewSave->predecessors(), *MDT,
                          false);

That seems to work:

$ ~/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll

$ echo $?
0

@nickdesaulniers I have landed the patch with optimisation off by default so that future changes to it and relanding will become easy.

@ayzhao Thanks for the inputs.

sushgokh reopened this revision.May 15 2023, 8:08 PM
This revision is now accepted and ready to land.May 15 2023, 8:08 PM
sushgokh requested review of this revision.May 15 2023, 8:19 PM
sushgokh updated this revision to Diff 522416.

@nickdesaulniers could you please check if its good to reland?

@nickdesaulniers could you please check if its good to reland?

Are @ayzhao 's and @efriedma 's comments addressed?

sushgokh added a comment.EditedMay 23 2023, 10:07 PM

@ayzhao issue addressed.
@efriedma's issue needs to addressed separately in other pass(if need be) but I don't think this issue will result in performance regression.

nickdesaulniers accepted this revision.May 24 2023, 11:37 AM
This revision is now accepted and ready to land.May 24 2023, 11:37 AM
nickdesaulniers requested changes to this revision.May 24 2023, 11:38 AM
nickdesaulniers added inline comments.
llvm/test/CodeGen/AArch64/dont-shrink-wrap-stack-mayloadorstore.s
1 ↗(On Diff #522416)

There's a presubmit failure that this newly added test is missing a RUN line.

This revision now requires changes to proceed.May 24 2023, 11:38 AM
This revision was not accepted when it landed; it landed in state Needs Revision.May 25 2023, 1:27 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
sushgokh marked an inline comment as done.EditedMay 25 2023, 1:30 AM

I submitted the patch without checking that the patch needs revision. I got confused by back to back accept followed by revision request. Apology.

But I have removed the file accidently added

llvm/test/CodeGen/AArch64/dont-shrink-wrap-stack-mayloadorstore.s
1 ↗(On Diff #522416)

I had added that by mistake(git add -A). Apologies.