This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][Shrink-wrap]split restore point
ClosedPublic

Authored by sushgokh on Jan 26 2018, 2:03 PM.

Details

Summary

This change split a restore point to allow it to only post-dominate blocks reachable by use or def of CSRs/FI.

This is a WIP and I'm posting it to continue the discussion about :

Bugzilla : https://bugs.llvm.org/show_bug.cgi?id=33868

I will be happy to hear any high level comment/suggestion.

This change itself increase 15% more shrink-wrapping in spec2000/2006/2017 benchmarks. I observed 160% more shrink-wrapping in spec2000/2006/2017 benchmarks if we apply the copy forwarding (D41835), PostRASink (D41463), and this change all together.

Diff Detail

Event Timeline

junbuml created this revision.Jan 26 2018, 2:03 PM

Thanks for working on this! I think the idea is worth pursuing. This kinda reminds me of Post Register Allocation Spill Code Optimization, Christopher Lupo, Kent D. Wilken. A similar issue is what they describe in §4.8. I think if you extract the blocks C, D, E, F from the example, that's your function.

I tried looking into implementing that a while ago, and I mostly gave up because our SESE region infrastructure is not used / tested much, especially during CodeGen. I think your implementation has a big advantage because it doesn't require to build such (expensive, IIRC) constructions and might be much faster since SESE regions are usually not available, and dominators are.

Few thoughts: (please correct me if I'm wrong):

  • We don't want to add extra branches. By that I mean that we want to guarantee that we always (or when we know it's not worth having extra branches) have a fall through from NMBB to MBB. In your example this is perfectly fine, and I see that NMBB is always inserted before MBB, which I think should be always fine.
  • We want to be sure that, in the end, if the points are not interesting, we don't insert NMBB.
  • I would run a verifier or add some statistics / remarks to see if any of the previous points are happening and if it causes any regressions.
  • I think doing this kind of simulations on the post-dominator tree itself as @qcolombet suggested sounds interesting. Correct me if I'm wrong, here we're looking to introduce a common post-dominator of all the "dirty" blocks, right?

I observed 160% more shrink-wrapping in spec2000/2006/2017 benchmarks if we apply the copy forwarding (D41835), PostRASink (D41463), and this change all together.

Just to be sure, on AArch64, right? Do you see any performance improvement with this change? Any regressions?

Since you marked this as [WIP] I'll skip any comments on the code itself for now.

Just two remarks:

  • So far shrink-wrapping was solely an analysis pass, i.e., it didn't modify the input code
  • Ideally I'd like we postpone creating new blocks until we decided where we are going to insert the code, otherwise we'll end up with blocks to clean-up later on

Regarding the SESE approach, I think it wouldn't be enough because what you want and not given by SESE is *new* common (post)dominator. E.g., in the motivating example we wanted to merge (A->C) (B->C) into (A->common) (B->common) (common->C).

Regarding the SESE approach, I think it wouldn't be enough because what you want and not given by SESE is *new* common (post)dominator. E.g., in the motivating example we wanted to merge (A->C) (B->C) into (A->common) (B->common) (common->C).

Yes, but if I understand the paper correctly, in Figure 4 (left) they basically create an SESE out of what they call "Set 1" in Figure 3 (depending on what they call "execution count cost model"). I read that part a while ago but I think at that point they are deciding whether to use "Region 1" as a save/restore boundary or to create a new SESE for that.

junbuml updated this revision to Diff 134496.Feb 15 2018, 1:50 PM

Based on the comments from Francis and Quentin, I tried to split the restore block only when we know that the block spilt is used as restore point. After performing current shrink wrapping, I ran post shrinking to see if we can shrink save point further by splitting restore point. Currently, this update is still WIP because I do this in shrink wrapping pass. If this approach is generally reasonable and we need to keep the shrink-wrap pass as an analysis pass, then I will move this in a new pass between shrink-wrap and PEI.

I observed 160% more shrink-wrapping in spec2000/2006/2017 benchmarks if we apply the copy forwarding (D41835), PostRASink (D41463), and this change all together.

Just to be sure, on AArch64, right? Do you see any performance improvement with this change? Any regressions?

Yes it was on AArch64. I applied this change on top of PostRASink (D41463), and I ran performance test for spec :

spec2000/mesa           -1.09 % 
spec2017/perlbench   +1.33 %
spec2017/xalancbmk  +3.04%
spec2017/povray        +3.59 %
spec2006/povray        +4.61%
junbuml added a comment.EditedFeb 21 2018, 2:40 PM

There are up/down in the scores of these benchmarks, but in povray many extra shrink wrapping happened in functions shown in profile and dynamic instruction count was decreased with this change: -4.80% (2006/povray) and -4.91% (2017/povray). Further shrink-wrapping happened in one of the hot function in xalanbmk, but the dynamic instruction count wasn't decreased in xalancbmk.

Hi Francis and Quentin,
Does any of you has any comment about it? I will be happy to hear any high level comment about it.
Thanks,
Jun

junbuml updated this revision to Diff 138433.Mar 14 2018, 12:25 PM

Rebased. In this change, I perform post shrinking where we try to further shrink the save point by splitting the restore point after finishing the regular shrink wrapping. This change is WIP for now because I do this in shrink wrapping pass. If overall approach is acceptable and we want to keep shrink-wrap pass as an analysis pas , I will move it as a new pass. Please take a look and let me know any comment.

junbuml updated this revision to Diff 140125.Mar 28 2018, 12:53 PM

Rebased and minor changes in comments. Please let me know if the approach I'm using here make sense.

@junbuml Is it possible for you to rebase this patch and upstream it? With current LLVM trunk, its giving around 7% uplift on SPEC2017 povray

Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2023, 12:01 AM

@junbuml I will take over this work next week.

Let me know if there are any issues with taking over.

sushgokh commandeered this revision.Mar 17 2023, 12:56 AM
sushgokh added a reviewer: junbuml.
sushgokh updated this revision to Diff 505997.Mar 17 2023, 1:05 AM
sushgokh retitled this revision from [WIP][Shrink-wrap]split restore point to [CodeGen][Shrink-wrap]split restore point.
sushgokh added reviewers: nickdesaulniers, MaskRay.
sushgokh added subscribers: SjoerdMeijer, madhur13490, rjj.

This is reland attempt.

The only change to the patch is introducing 'updateTerminator' function. Rest all is inherited from previous revision.

lkail added a subscriber: lkail.Mar 17 2023, 2:46 AM

Initial pass at basic code style review.

llvm/lib/CodeGen/ShrinkWrap.cpp
388–390
return !TII.analyzeBranch(...
416

MF is unused

429

Use ArrayRef from ADT for parameters.

444

Please add a comment above the definition describing what this function is doing.

454

Please add a comment above the definition describing what this function is doing.

467

remove auto

470

remove {} from for

471

remove auto

474

remove {} from for

479

replace auto with RegisterMaskPair&

486

remove auto

488

remove for {}

497

What's NMBB? Maybe a more descriptive identifier might be nice?

507

remove auto

510

remove for {}

511

remove auto

514

remove for {}

517

remove auto

519

remove for {}

524

remove auto

528

Please add a comment above the definition describing what this function is doing.

538

remove auto

584

remove auto

607

dominates

940–941

should we still return false here if if Changed == true?

Am I reading this right? Did the previous version of ShrinkWrap::runOnMachineFunction never return true?

thegameg added inline comments.Mar 20 2023, 5:10 PM
llvm/lib/CodeGen/ShrinkWrap.cpp
940–941

Correct, the previous version doesn't change the code, it just tells PrologEpilogInserter where to place the prologue/epilogue through MachineFrameInfo.

I guess here we should return true if we do any edge-splitting.

sushgokh added inline comments.Mar 21 2023, 2:19 AM
llvm/lib/CodeGen/ShrinkWrap.cpp
940–941

@nickdesaulniers I think no need of returning true if Changed == true because dominance/post-dominance relations are updated after post-shrinking. Is there any other thing that needs updation ?

sushgokh added inline comments.Mar 21 2023, 6:06 AM
llvm/lib/CodeGen/ShrinkWrap.cpp
497

Will add a comment to indicate that NMBB is short for new restore point.

sushgokh updated this revision to Diff 506979.Mar 21 2023, 7:50 AM

Hey,

I only had a cursory look but the overall direction seems good.

I'll look closer after you add comments that goes into more details of what each function is doing and lay off the nomenclature around dirty, clean, and splittable.

Cheers,
-Quentin

llvm/lib/CodeGen/ShrinkWrap.cpp
104

How about a more descriptive name like enable-shrink-wrap-region-split?

192

Typo: chance => change

199

Here and everywhere: use /// to enable doxygen comments.

200

Please document the method and all the parameters.
In particular what it means to be a clean/dirty predecessor and what we check here.
I see that you have some more comments in the cpp, but I believe it doesn't answer these questions.

Also we should document what we mean by splittable with respect to what because a block should always be splittable in the more general term, but that may not help us.
I.e., we should always be able to do something like

BB:
   <...>

>

BB:
   // fall through
NewBB:
   <...>

That's useless but there is nothing preventing us to do that so technically blocks are splittable :).

391

Both arguments can be const

391

Document what dirty means.

919

The comment starting at " If MF is irreducible, a block may be in a loop without" is duplicated here.

qcolombet added inline comments.Mar 28 2023, 2:15 AM
llvm/lib/CodeGen/ShrinkWrap.cpp
940–941

If we changed the CFG (or anything) we need to return true. There may be other things that we are destroying without realizing (e.g., live intervals, but that's not applicable here. The point is we don't know which analysis rely on what.)

sushgokh updated this revision to Diff 509026.EditedMar 28 2023, 8:22 AM

@qcolombet @nickdesaulniers Have addressed all the suggestions

Please mark comments as "Done" in phabricator so that it's clear to reviewers which comment threads are still outstanding.

llvm/lib/CodeGen/ShrinkWrap.cpp
210–211

I think it's more appropriate to pass a SmallVectorImpl<MachineBasicBlock*>& rather than SmallVector<MachineBasicBlock *, 2>& (having the parameter specialization of 2 is...overly specific). It would be preferable to use ArrayRef, but you can't push_back an ArrayRef, so the SmallVectorImp<>& is the way to go.

472

Seems like it can be undone though, IIUC?

481

ArrayRef

524

ArrayRef

552–553

SmallVectorImpl<MachineBasicBlock *>&

559

remove auto

604

remove auto

940–941

tryToSplitRestore is modifying the CFG, right? If that happens, we MUST return true.

Should we perhaps skip calling postShrinkWrapping unless HasCandidate is true?

sushgokh updated this revision to Diff 509217.Mar 28 2023, 10:58 PM
sushgokh marked 42 inline comments as done.Mar 28 2023, 11:03 PM
sushgokh added inline comments.
llvm/lib/CodeGen/ShrinkWrap.cpp
472

yes, right

sushgokh marked an inline comment as done.Apr 6 2023, 12:07 AM

ping.

nickdesaulniers accepted this revision.Apr 7 2023, 9:53 AM
nickdesaulniers added inline comments.
llvm/lib/CodeGen/ShrinkWrap.cpp
471

ReachableByDirty was removed in the latest diff. Please clean up the comment block, too.

This revision is now accepted and ready to land.Apr 7 2023, 9:53 AM
This revision was landed with ongoing or failed builds.Apr 10 2023, 11:29 PM
This revision was automatically updated to reflect the committed changes.

@sushgokh , @nathanchance is reporting issues with this patch breaking a few Linux kernel builds https://github.com/ClangBuiltLinux/linux/issues/1833. Can you please revert, and we'll work on getting you a reproducer?

@nickdesaulniers Does this look good ?

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -601,6 +601,16 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
                                 CleanPreds, TII, RS))
     return false;

+  // FIXME: Currently, we bail out of optimisation if any of the DirtyPreds has
+  // INLINEASM_BR instruction. Also, if restore pt block address is taken, we
+  // need to make suitable changes post restore point split
+  for (MachineBasicBlock *BB : DirtyPreds)
+    for (const MachineInstr &MI : *BB)
+      if (MI.getOpcode() == TargetOpcode::INLINEASM_BR)
+        for (const MachineOperand &MO : MI.operands())
+          if (MO.isMBB() && MO.getMBB() == InitRestore)
+            return false;
+
   // Trying to reach out to the new save point which dominates all dirty blocks.
   MachineBasicBlock *NewSave =
       FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);

@sushgokh I will let Nick comment on the correctness of that diff but I can confirm that applying that diff on top of 6530bd3030d3 (the parent of the revert of this change, bb5befefc6e7) makes both issues that I reported go away.

@nickdesaulniers Does this look good ?

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -601,6 +601,16 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
                                 CleanPreds, TII, RS))
     return false;

+  // FIXME: Currently, we bail out of optimisation if any of the DirtyPreds has
+  // INLINEASM_BR instruction. Also, if restore pt block address is taken, we
+  // need to make suitable changes post restore point split
+  for (MachineBasicBlock *BB : DirtyPreds)

Rather than scan every instruction in the block, I think you can use
MachineBasicBlock::isInlineAsmBrIndirectTarget()
Perhaps we can do that before even adding such an MBB to DirtyPreds?

+ for (const MachineInstr &MI : *BB)
+ if (MI.getOpcode() == TargetOpcode::INLINEASM_BR)
+ for (const MachineOperand &MO : MI.operands())
+ if (MO.isMBB() && MO.getMBB() == InitRestore)
+ return false;
+

// Trying to reach out to the new save point which dominates all dirty blocks.
MachineBasicBlock *NewSave =
    FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);
alexfh added a subscriber: alexfh.Apr 14 2023, 1:07 PM

In our internal testing we've noticed that this commit causes a ~10-15% regression of the Shootout-ackermann benchmark (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Benchmarks/Shootout/ackermann.c) on x86. The benchmark is somewhat noisy, thus, I'm not 100% sure it's a real regression. But maybe you want to check if there's something obviously wrong started happening on that code.

@nickdesaulniers Thanks. Will make the necessary changes.

@alexfh Will surely check for the regression and update you if there's any.

@alexfh The regression exists. I tried on AArch64. However, its not due this patch itself but there is some significant block reordering taking place at "block-placement" pass.

@nickdesaulniers would it be good to go ahead and commit the patch and later track this issue with a bug raised for it? This is just to avoid major changes ticking in the patch if its delayed.

@nickdesaulniers would it be good to go ahead and commit the patch and later track this issue with a bug raised for it? This is just to avoid major changes ticking in the patch if its delayed.

No! Our CI will notice this immediately again. Do not break our builds!

fhahn added a subscriber: fhahn.Apr 17 2023, 11:32 AM

In our internal testing we've noticed that this commit causes a ~10-15% regression of the Shootout-ackermann benchmark (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Benchmarks/Shootout/ackermann.c) on x86. The benchmark is somewhat noisy, thus, I'm not 100% sure it's a real regression. But maybe you want to check if there's something obviously wrong started happening on that code.

We've also seen a ~8% performance regression on internal benchmarks on ARM64.

@alexfh @fhahn Thanks for the inputs. Could you please confirm if the regression goes away by applying this?

@@ -471,7 +471,9 @@ tryToSplitRestore(MachineBasicBlock *MBB,
       MBBFallthrough.insert(BB);   MachineBasicBlock *NMBB = MF->CreateMachineBasicBlock();
-  MF->insert(MachineFunction::iterator(MBB), NMBB);
+  // Insert this block at the end of the function. Inserting in between may
+  // interfere with control flow optimizer decisions.
+  MF->insert(MF->end(), NMBB);   
   for (const MachineBasicBlock::RegisterMaskPair &LI : MBB->liveins())
     NMBB->addLiveIn(LI.PhysReg);

@nickdesaulniers Need your approval to go ahead with relanding with 2 changes viz. 1. INLINEASM_BR changes 2.Change in block location

Have tested both against SPEC17 and llvm test-suite and figures look good. No major diversions were observed for test suite

@sushgokh Has a new revision with the changes that you have mentioned been pushed here for review? I can pull it down and verify that the two configurations that had boot issues are fixed, as well as making sure that no other configurations regress.

@nathanchance Thanks for the initiative. Could you please apply the below patch on top of existing patch and check?

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -471,7 +471,9 @@ tryToSplitRestore(MachineBasicBlock *MBB,
       MBBFallthrough.insert(BB);

   MachineBasicBlock *NMBB = MF->CreateMachineBasicBlock();
-  MF->insert(MachineFunction::iterator(MBB), NMBB);
+  // Insert this block at the end of the function. Inserting in between may
+  // interfere with control flow optimizer decisions.
+  MF->insert(MF->end(), NMBB);

   for (const MachineBasicBlock::RegisterMaskPair &LI : MBB->liveins())
     NMBB->addLiveIn(LI.PhysReg);
@@ -577,6 +579,12 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
       !MPDT->dominates(InitRestore, InitSave))
     return false;

+  // Bail out of the optimization if any of the basic block is target of
+  // INLINEASM_BR instruction
+  for (MachineBasicBlock &MBB : MF)
+    if (MBB.isInlineAsmBrIndirectTarget())
+      return false;
+
   DenseSet<const MachineBasicBlock *> DirtyBBs;
   for (MachineBasicBlock &MBB : MF) {
     if (MBB.isEHPad()) {

@sushgokh I tested that diff on top of the revert and I no longer see the boot issues I initially reported or any new regressions. I do not feel like I am knowledgeable enough to approve the reland, hopefully someone else can chime in on that but this should not be blocked on the Linux kernel regression any longer.

@nathanchance Thanks for the update :)

@nickdesaulniers Requesting approval for relanding

@nathanchance Thanks for the update :)

@nickdesaulniers Requesting approval for relanding

Can you please push the updated patch to this phab review (click reopen in the bottom left "Add Action..." menu) or a new one? It's hard for me to look at the diff without the surrounding context, which phab would allow me to view.

In particular, I'm concerned if the checks for isInlineAsmBrIndirectTarget in ShrinkWrap::performShrinkWrapping are still relevant after this change?

I would also like to see a callbr based test added.

sushgokh reopened this revision.May 2 2023, 7:06 PM
This revision is now accepted and ready to land.May 2 2023, 7:06 PM
sushgokh requested review of this revision.May 2 2023, 7:26 PM
sushgokh updated this revision to Diff 518942.

@nickdesaulniers test for inlineasm_br added as 'noshrink_bb_as_inlineasmbr_target()' in 'shrinkwrap-split-restore-point.mir'

sushgokh updated this revision to Diff 518949.May 2 2023, 7:39 PM
nickdesaulniers accepted this revision.May 5 2023, 10:29 AM

LGTM; though for the two tests you converted to use the update_*_test_checks.py, you should do those conversions as separate parent commits. I'm not going to block this commit on that, but it's nice to do in the future.

llvm/test/CodeGen/PowerPC/shrink-wrap.ll
1–4

Consider pre-committing the conversion to update_llc_test_checks.py so that it's more obvious if this change to LLVM changes anything of interest in this test.

llvm/test/CodeGen/PowerPC/shrink-wrap.mir
2

Consider pre-committing the conversion to update_mir_test_checks.py so that it's more obvious if this change to LLVM changes anything of interest in this test.

This revision is now accepted and ready to land.May 5 2023, 10:29 AM
This revision was automatically updated to reflect the committed changes.

It looks like for 32-bit ARM, shrink-wrapping is blocking the formation of pop {pc} instructions to return from a function (so we instead pop {lr}; bx lr). Is this new, or a side-effect of shrink-wrapping more cases? Any idea what's causing it? Is there an issue tracking it?

ayzhao added a subscriber: ayzhao.May 8 2023, 4:14 PM

This is causing lld to crash on Chrome builds when PGO and ThinLTO are enabled: https://crbug.com/1443635

It looks like for 32-bit ARM, shrink-wrapping is blocking the formation of pop {pc} instructions to return from a function (so we instead pop {lr}; bx lr). Is this new, or a side-effect of shrink-wrapping more cases? Any idea what's causing it? Is there an issue tracking it?

This should be side effect of doing more shrink wrapping and caused by later passes as this patch just changes the save/restore point and nothing beyond that. Currently, we dont have any issue tracking it but if this is leading to performance regression, then could you please raise a bug for it when I reland it ?

This is causing lld to crash on Chrome builds when PGO and ThinLTO are enabled: https://crbug.com/1443635

@ayzhao I request you to please provide me following information:

  1. Reproducer of the function causing the assert failure
  2. llc -print-before=shrink-wrap for the function
  3. llc -debug-only=shrink-wrap for function

Also, I request to check if the following patch works since I suspect that PGO is changing the block frequencies:

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -632,7 +632,8 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
       FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);

   while (NewSave && (hasDirtyPred(ReachableByDirty, *NewSave) ||
-                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency()))
+                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency() ||
+                     MLI->getLoopFor(NewSave)))
     NewSave = FindIDom<>(**NewSave->pred_begin(), NewSave->predecessors(), *MDT,
                          false);

This is causing lld to crash on Chrome builds when PGO and ThinLTO are enabled: https://crbug.com/1443635

@ayzhao I request you to please provide me following information:

  1. Reproducer of the function causing the assert failure

~/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll
llc: /usr/local/google/home/ayzhao/src/llvm-project/llvm/lib/CodeGen/ShrinkWrap.cpp:666: bool (anonymous namespace)::ShrinkWrap::postShrinkWrapping(bool, llvm::MachineFunction &, llvm::RegScavenger *): Assertion `(!MLI->getLoopFor(Save) && !MLI->getLoopFor(Restore)) && "Unexpected save or restore point in a loop"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll
1.      Running pass 'Function Pass Manager' on module 'patternprops.o.5.precodegen.ll'.
2.      Running pass 'Shrink Wrapping analysis' on function '@_ZN6icu_7212PatternProps14trimWhiteSpaceEPKDsRi'
 #0 0x0000563786318bd7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x1e44bd7)
 #1 0x0000563786316a8e llvm::sys::RunSignalHandlers() (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x1e42a8e)
 #2 0x00005637863193ba SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fafd965af90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
 #4 0x00007fafd96a9ccc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007fafd965aef2 raise ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007fafd9645472 abort ./stdlib/abort.c:81:7
 #7 0x00007fafd9645395 _nl_load_domain ./intl/loadmsgcat.c:1177:9
 #8 0x00007fafd9653df2 (/lib/x86_64-linux-gnu/libc.so.6+0x34df2)
 #9 0x000056378598fbbf (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x14bbbbf)
#10 0x00005637857517bc llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x127d7bc)
#11 0x0000563785c6bcb3 llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x1797cb3)
#12 0x0000563785c73e11 llvm::FPPassManager::runOnModule(llvm::Module&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x179fe11)
#13 0x0000563785c6c6cc llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x17986cc)
#14 0x0000563784a8ad89 main (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x5b6d89)
#15 0x00007fafd964618a __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#16 0x00007fafd9646245 call_init ./csu/../csu/libc-start.c:128:20
#17 0x00007fafd9646245 __libc_start_main ./csu/../csu/libc-start.c:368:5
#18 0x0000563784a857b1 _start (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x5b17b1)
[1]    2041024 IOT instruction  ~/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll
  1. llc -print-before=shrink-wrap for the function

  1. llc -debug-only=shrink-wrap for function

Also, I request to check if the following patch works since I suspect that PGO is changing the block frequencies:

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -632,7 +632,8 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
       FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);

   while (NewSave && (hasDirtyPred(ReachableByDirty, *NewSave) ||
-                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency()))
+                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency() ||
+                     MLI->getLoopFor(NewSave)))
     NewSave = FindIDom<>(**NewSave->pred_begin(), NewSave->predecessors(), *MDT,
                          false);

That seems to work:

$ ~/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll

$ echo $?
0

@nickdesaulniers I have landed the patch with optimisation off by default so that future changes to it and relanding will become easy.

@ayzhao Thanks for the inputs.

sushgokh reopened this revision.May 15 2023, 8:08 PM
This revision is now accepted and ready to land.May 15 2023, 8:08 PM
sushgokh requested review of this revision.May 15 2023, 8:19 PM
sushgokh updated this revision to Diff 522416.

@nickdesaulniers could you please check if its good to reland?

@nickdesaulniers could you please check if its good to reland?

Are @ayzhao 's and @efriedma 's comments addressed?

sushgokh added a comment.EditedMay 23 2023, 10:07 PM

@ayzhao issue addressed.
@efriedma's issue needs to addressed separately in other pass(if need be) but I don't think this issue will result in performance regression.

nickdesaulniers accepted this revision.May 24 2023, 11:37 AM
This revision is now accepted and ready to land.May 24 2023, 11:37 AM
nickdesaulniers requested changes to this revision.May 24 2023, 11:38 AM
nickdesaulniers added inline comments.
llvm/test/CodeGen/AArch64/dont-shrink-wrap-stack-mayloadorstore.s
1 ↗(On Diff #522416)

There's a presubmit failure that this newly added test is missing a RUN line.

This revision now requires changes to proceed.May 24 2023, 11:38 AM
This revision was not accepted when it landed; it landed in state Needs Revision.May 25 2023, 1:27 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.
sushgokh marked an inline comment as done.EditedMay 25 2023, 1:30 AM

I submitted the patch without checking that the patch needs revision. I got confused by back to back accept followed by revision request. Apology.

But I have removed the file accidently added

llvm/test/CodeGen/AArch64/dont-shrink-wrap-stack-mayloadorstore.s
1 ↗(On Diff #522416)

I had added that by mistake(git add -A). Apologies.