Download Raw Diff

Details

Reviewers

qcolombet
thegameg
javed.absar
junbuml
nickdesaulniers
MaskRay

Commits

rGc4a60c9d3437: [CodeGen][ShrinkWrap] Enable PostShrinkWrap by default
rG1ddfd1c81867: [CodeGen][ShrinkWrap] Split restore point
rG5f0bccc3d1a7: [CodeGen][ShrinkWrap] Split restore point

Summary

This change split a restore point to allow it to only post-dominate blocks reachable by use or def of CSRs/FI.

This is a WIP and I'm posting it to continue the discussion about :

Bugzilla : https://bugs.llvm.org/show_bug.cgi?id=33868

I will be happy to hear any high level comment/suggestion.

This change itself increase 15% more shrink-wrapping in spec2000/2006/2017 benchmarks. I observed 160% more shrink-wrapping in spec2000/2006/2017 benchmarks if we apply the copy forwarding (D41835), PostRASink (D41463), and this change all together.

Diff Detail

Event Timeline

junbuml created this revision.Jan 26 2018, 2:03 PM

Herald added subscribers: javed.absar, mcrosier. · View Herald TranscriptJan 26 2018, 2:03 PM

sfertile added a subscriber: sfertile.Jan 29 2018, 9:41 AM

Thanks for working on this! I think the idea is worth pursuing. This kinda reminds me of Post Register Allocation Spill Code Optimization, Christopher Lupo, Kent D. Wilken. A similar issue is what they describe in §4.8. I think if you extract the blocks C, D, E, F from the example, that's your function.

I tried looking into implementing that a while ago, and I mostly gave up because our SESE region infrastructure is not used / tested much, especially during CodeGen. I think your implementation has a big advantage because it doesn't require to build such (expensive, IIRC) constructions and might be much faster since SESE regions are usually not available, and dominators are.

Few thoughts: (please correct me if I'm wrong):

We don't want to add extra branches. By that I mean that we want to guarantee that we always (or when we know it's not worth having extra branches) have a fall through from NMBB to MBB. In your example this is perfectly fine, and I see that NMBB is always inserted before MBB, which I think should be always fine.
We want to be sure that, in the end, if the points are not interesting, we don't insert NMBB.
I would run a verifier or add some statistics / remarks to see if any of the previous points are happening and if it causes any regressions.
I think doing this kind of simulations on the post-dominator tree itself as @qcolombet suggested sounds interesting. Correct me if I'm wrong, here we're looking to introduce a common post-dominator of all the "dirty" blocks, right?

I observed 160% more shrink-wrapping in spec2000/2006/2017 benchmarks if we apply the copy forwarding (D41835), PostRASink (D41463), and this change all together.

Just to be sure, on AArch64, right? Do you see any performance improvement with this change? Any regressions?

Since you marked this as [WIP] I'll skip any comments on the code itself for now.

Just two remarks:

So far shrink-wrapping was solely an analysis pass, i.e., it didn't modify the input code
Ideally I'd like we postpone creating new blocks until we decided where we are going to insert the code, otherwise we'll end up with blocks to clean-up later on

Regarding the SESE approach, I think it wouldn't be enough because what you want and not given by SESE is *new* common (post)dominator. E.g., in the motivating example we wanted to merge (A->C) (B->C) into (A->common) (B->common) (common->C).

In D42600#992723, @qcolombet wrote:

Regarding the SESE approach, I think it wouldn't be enough because what you want and not given by SESE is *new* common (post)dominator. E.g., in the motivating example we wanted to merge (A->C) (B->C) into (A->common) (B->common) (common->C).

Yes, but if I understand the paper correctly, in Figure 4 (left) they basically create an SESE out of what they call "Set 1" in Figure 3 (depending on what they call "execution count cost model"). I read that part a while ago but I think at that point they are deciding whether to use "Region 1" as a save/restore boundary or to create a new SESE for that.

Based on the comments from Francis and Quentin, I tried to split the restore block only when we know that the block spilt is used as restore point. After performing current shrink wrapping, I ran post shrinking to see if we can shrink save point further by splitting restore point. Currently, this update is still WIP because I do this in shrink wrapping pass. If this approach is generally reasonable and we need to keep the shrink-wrap pass as an analysis pass, then I will move this in a new pass between shrink-wrap and PEI.

I observed 160% more shrink-wrapping in spec2000/2006/2017 benchmarks if we apply the copy forwarding (D41835), PostRASink (D41463), and this change all together.

Just to be sure, on AArch64, right? Do you see any performance improvement with this change? Any regressions?

Yes it was on AArch64. I applied this change on top of PostRASink (D41463), and I ran performance test for spec :

spec2000/mesa           -1.09 % 
spec2017/perlbench   +1.33 %
spec2017/xalancbmk  +3.04%
spec2017/povray        +3.59 %
spec2006/povray        +4.61%

There are up/down in the scores of these benchmarks, but in povray many extra shrink wrapping happened in functions shown in profile and dynamic instruction count was decreased with this change: -4.80% (2006/povray) and -4.91% (2017/povray). Further shrink-wrapping happened in one of the hot function in xalanbmk, but the dynamic instruction count wasn't decreased in xalancbmk.

Hi Francis and Quentin,
Does any of you has any comment about it? I will be happy to hear any high level comment about it.
Thanks,
Jun

Rebased. In this change, I perform post shrinking where we try to further shrink the save point by splitting the restore point after finishing the regular shrink wrapping. This change is WIP for now because I do this in shrink wrapping pass. If overall approach is acceptable and we want to keep shrink-wrap pass as an analysis pas , I will move it as a new pass. Please take a look and let me know any comment.

Rebased and minor changes in comments. Please let me know if the approach I'm using here make sense.

jedilyn added a subscriber: jedilyn.Oct 17 2018, 6:54 PM

Herald added a reviewer: javed.absar. · View Herald TranscriptOct 17 2018, 6:54 PM

@junbuml Is it possible for you to rebase this patch and upstream it? With current LLVM trunk, its giving around 7% uplift on SPEC2017 povray

Herald added a project: Restricted Project. · View Herald TranscriptFeb 8 2023, 12:01 AM

@junbuml I will take over this work next week.

Let me know if there are any issues with taking over.

sushgokh mentioned this in D145657: [CodeGen] Retry landing D42600 - Precommit test .Mar 8 2023, 11:29 PM

nickdesaulniers added a subscriber: nickdesaulniers.Mar 14 2023, 12:12 PM

sushgokh commandeered this revision.Mar 17 2023, 12:56 AM

sushgokh added a reviewer: junbuml.

This is reland attempt.

The only change to the patch is introducing 'updateTerminator' function. Rest all is inherited from previous revision.

Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2023, 1:05 AM

Herald added subscribers: luke, • pcwang-thead, frasercrmck and 24 others. · View Herald Transcript

Harbormaster completed remote builds in B220006: Diff 505997.Mar 17 2023, 2:05 AM

lkail added a subscriber: lkail.Mar 17 2023, 2:46 AM

Initial pass at basic code style review.

llvm/lib/CodeGen/ShrinkWrap.cpp
360–362 ↗	(On Diff #505997)	return !TII.analyzeBranch(...
388 ↗	(On Diff #505997)	`MF` is unused
401 ↗	(On Diff #505997)	Use `ArrayRef` from ADT for parameters.
416 ↗	(On Diff #505997)	Please add a comment above the definition describing what this function is doing.
426 ↗	(On Diff #505997)	Please add a comment above the definition describing what this function is doing.
439 ↗	(On Diff #505997)	remove auto
442 ↗	(On Diff #505997)	remove {} from for
443 ↗	(On Diff #505997)	remove auto
446 ↗	(On Diff #505997)	remove {} from for
451 ↗	(On Diff #505997)	replace auto with RegisterMaskPair&
458 ↗	(On Diff #505997)	remove auto
460 ↗	(On Diff #505997)	remove for {}
469 ↗	(On Diff #505997)	What's `NMBB`? Maybe a more descriptive identifier might be nice?
479 ↗	(On Diff #505997)	remove auto
482 ↗	(On Diff #505997)	remove for {}
483 ↗	(On Diff #505997)	remove auto
486 ↗	(On Diff #505997)	remove for {}
489 ↗	(On Diff #505997)	remove auto
491 ↗	(On Diff #505997)	remove for {}
496 ↗	(On Diff #505997)	remove auto
500 ↗	(On Diff #505997)	Please add a comment above the definition describing what this function is doing.
510 ↗	(On Diff #505997)	remove auto
556 ↗	(On Diff #505997)	remove auto
579 ↗	(On Diff #505997)	dominates
884–885 ↗	(On Diff #505997)	should we still return `false` here if if `Changed == true`? Am I reading this right? Did the previous version of `ShrinkWrap::runOnMachineFunction` never return `true`?

thegameg added inline comments.Mar 20 2023, 5:10 PM

llvm/lib/CodeGen/ShrinkWrap.cpp
884–885 ↗	(On Diff #505997)	Correct, the previous version doesn't change the code, it just tells `PrologEpilogInserter` where to place the prologue/epilogue through `MachineFrameInfo`. I guess here we should return true if we do any edge-splitting.

sushgokh added inline comments.Mar 21 2023, 2:19 AM

llvm/lib/CodeGen/ShrinkWrap.cpp
884–885 ↗	(On Diff #505997)	@nickdesaulniers I think no need of returning true if Changed == true because dominance/post-dominance relations are updated after post-shrinking. Is there any other thing that needs updation ?

sushgokh added inline comments.Mar 21 2023, 6:06 AM

llvm/lib/CodeGen/ShrinkWrap.cpp
469 ↗	(On Diff #505997)	Will add a comment to indicate that NMBB is short for new restore point.

sushgokh updated this revision to Diff 506979.Mar 21 2023, 7:50 AM

Herald added a subscriber: asbirlea. · View Herald TranscriptMar 21 2023, 7:50 AM

Harbormaster completed remote builds in B220726: Diff 506979.Mar 21 2023, 9:42 AM

Hey,

I only had a cursory look but the overall direction seems good.

I'll look closer after you add comments that goes into more details of what each function is doing and lay off the nomenclature around dirty, clean, and splittable.

Cheers,
-Quentin

llvm/lib/CodeGen/ShrinkWrap.cpp
103 ↗	(On Diff #506979)	How about a more descriptive name like `enable-shrink-wrap-region-split`?
192 ↗	(On Diff #506979)	Typo: chance => change
199 ↗	(On Diff #506979)	Here and everywhere: use `///` to enable doxygen comments.
200 ↗	(On Diff #506979)	Please document the method and all the parameters. In particular what it means to be a clean/dirty predecessor and what we check here. I see that you have some more comments in the cpp, but I believe it doesn't answer these questions. Also we should document what we mean by splittable with respect to what because a block should always be splittable in the more general term, but that may not help us. I.e., we should always be able to do something like BB: <...> > BB: // fall through NewBB: <...> That's useless but there is nothing preventing us to do that so technically blocks are splittable :).
363 ↗	(On Diff #506979)	Both arguments can be const
363 ↗	(On Diff #506979)	Document what dirty means.
868 ↗	(On Diff #506979)	The comment starting at " If MF is irreducible, a block may be in a loop without" is duplicated here.

qcolombet added inline comments.Mar 28 2023, 2:15 AM

llvm/lib/CodeGen/ShrinkWrap.cpp
884–885 ↗	(On Diff #505997)	If we changed the CFG (or anything) we need to return true. There may be other things that we are destroying without realizing (e.g., live intervals, but that's not applicable here. The point is we don't know which analysis rely on what.)

@qcolombet @nickdesaulniers Have addressed all the suggestions

Harbormaster completed remote builds in B222260: Diff 509026.Mar 28 2023, 9:16 AM

Please mark comments as "Done" in phabricator so that it's clear to reviewers which comment threads are still outstanding.

llvm/lib/CodeGen/ShrinkWrap.cpp
210–211 ↗	(On Diff #509026)	I think it's more appropriate to pass a `SmallVectorImpl<MachineBasicBlock>&` rather than `SmallVector<MachineBasicBlock , 2>&` (having the parameter specialization of `2` is...overly specific). It would be preferable to use `ArrayRef`, but you can't `push_back` an `ArrayRef`, so the `SmallVectorImp<>&` is the way to go.
451 ↗	(On Diff #509026)	Seems like it can be undone though, IIUC?
460 ↗	(On Diff #509026)	ArrayRef
503 ↗	(On Diff #509026)	ArrayRef
531–532 ↗	(On Diff #509026)	SmallVectorImpl<MachineBasicBlock *>&
538 ↗	(On Diff #509026)	remove auto
583 ↗	(On Diff #509026)	remove auto
884–885 ↗	(On Diff #505997)	`tryToSplitRestore` is modifying the CFG, right? If that happens, we MUST return `true`. Should we perhaps skip calling `postShrinkWrapping` unless `HasCandidate` is `true`?

sushgokh updated this revision to Diff 509217.Mar 28 2023, 10:58 PM

sushgokh marked 42 inline comments as done.Mar 28 2023, 11:03 PM

sushgokh added inline comments.

llvm/lib/CodeGen/ShrinkWrap.cpp
451 ↗	(On Diff #509026)	yes, right

Harbormaster completed remote builds in B222411: Diff 509217.Mar 29 2023, 4:10 AM

ping.

nickdesaulniers accepted this revision.Apr 7 2023, 9:53 AM

nickdesaulniers added inline comments.

llvm/lib/CodeGen/ShrinkWrap.cpp
450 ↗	(On Diff #509217)	`ReachableByDirty` was removed in the latest diff. Please clean up the comment block, too.

This revision is now accepted and ready to land.Apr 7 2023, 9:53 AM

This revision was landed with ongoing or failed builds.Apr 10 2023, 11:29 PM

Closed by commit rG5f0bccc3d1a7: [CodeGen][ShrinkWrap] Split restore point (authored by sushgokh). · Explain Why

This revision was automatically updated to reflect the committed changes.

sushgokh added a commit: rG5f0bccc3d1a7: [CodeGen][ShrinkWrap] Split restore point.

@sushgokh , @nathanchance is reporting issues with this patch breaking a few Linux kernel builds https://github.com/ClangBuiltLinux/linux/issues/1833. Can you please revert, and we'll work on getting you a reproducer?

@nickdesaulniers yes, will revert it

sushgokh added a reverting change: rGbb5befefc6e7: Revert "[CodeGen][ShrinkWrap] Split restore point".Apr 12 2023, 10:23 PM

@nickdesaulniers Does this look good ?

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -601,6 +601,16 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
                                 CleanPreds, TII, RS))
     return false;

+  // FIXME: Currently, we bail out of optimisation if any of the DirtyPreds has
+  // INLINEASM_BR instruction. Also, if restore pt block address is taken, we
+  // need to make suitable changes post restore point split
+  for (MachineBasicBlock *BB : DirtyPreds)
+    for (const MachineInstr &MI : *BB)
+      if (MI.getOpcode() == TargetOpcode::INLINEASM_BR)
+        for (const MachineOperand &MO : MI.operands())
+          if (MO.isMBB() && MO.getMBB() == InitRestore)
+            return false;
+
   // Trying to reach out to the new save point which dominates all dirty blocks.
   MachineBasicBlock *NewSave =
       FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);

@sushgokh I will let Nick comment on the correctness of that diff but I can confirm that applying that diff on top of 6530bd3030d3 (the parent of the revert of this change, bb5befefc6e7) makes both issues that I reported go away.

In D42600#4267396, @sushgokh wrote:

@nickdesaulniers Does this look good ?

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -601,6 +601,16 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
                                 CleanPreds, TII, RS))
     return false;

+  // FIXME: Currently, we bail out of optimisation if any of the DirtyPreds has
+  // INLINEASM_BR instruction. Also, if restore pt block address is taken, we
+  // need to make suitable changes post restore point split
+  for (MachineBasicBlock *BB : DirtyPreds)

Rather than scan every instruction in the block, I think you can use
MachineBasicBlock::isInlineAsmBrIndirectTarget()
Perhaps we can do that before even adding such an MBB to DirtyPreds?

+ for (const MachineInstr &MI : *BB)
+ if (MI.getOpcode() == TargetOpcode::INLINEASM_BR)
+ for (const MachineOperand &MO : MI.operands())
+ if (MO.isMBB() && MO.getMBB() == InitRestore)
+ return false;
+
// Trying to reach out to the new save point which dominates all dirty blocks.
MachineBasicBlock *NewSave =
    FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);

In our internal testing we've noticed that this commit causes a ~10-15% regression of the Shootout-ackermann benchmark (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Benchmarks/Shootout/ackermann.c) on x86. The benchmark is somewhat noisy, thus, I'm not 100% sure it's a real regression. But maybe you want to check if there's something obviously wrong started happening on that code.

@nickdesaulniers Thanks. Will make the necessary changes.

@alexfh Will surely check for the regression and update you if there's any.

@alexfh The regression exists. I tried on AArch64. However, its not due this patch itself but there is some significant block reordering taking place at "block-placement" pass.

@nickdesaulniers would it be good to go ahead and commit the patch and later track this issue with a bug raised for it? This is just to avoid major changes ticking in the patch if its delayed.

In D42600#4274129, @sushgokh wrote:

@nickdesaulniers would it be good to go ahead and commit the patch and later track this issue with a bug raised for it? This is just to avoid major changes ticking in the patch if its delayed.

No! Our CI will notice this immediately again. Do not break our builds!

In D42600#4269504, @alexfh wrote:

In our internal testing we've noticed that this commit causes a ~10-15% regression of the Shootout-ackermann benchmark (https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Benchmarks/Shootout/ackermann.c) on x86. The benchmark is somewhat noisy, thus, I'm not 100% sure it's a real regression. But maybe you want to check if there's something obviously wrong started happening on that code.

We've also seen a ~8% performance regression on internal benchmarks on ARM64.

kmclaughlin added a subscriber: kmclaughlin.Apr 18 2023, 3:46 AM

@alexfh @fhahn Thanks for the inputs. Could you please confirm if the regression goes away by applying this?

@@ -471,7 +471,9 @@ tryToSplitRestore(MachineBasicBlock *MBB,
       MBBFallthrough.insert(BB);   MachineBasicBlock *NMBB = MF->CreateMachineBasicBlock();
-  MF->insert(MachineFunction::iterator(MBB), NMBB);
+  // Insert this block at the end of the function. Inserting in between may
+  // interfere with control flow optimizer decisions.
+  MF->insert(MF->end(), NMBB);   
   for (const MachineBasicBlock::RegisterMaskPair &LI : MBB->liveins())
     NMBB->addLiveIn(LI.PhysReg);

dtcxzyw added a subscriber: dtcxzyw.Apr 22 2023, 6:10 PM

@alexfh @fhahn Any update ?

@nickdesaulniers Need your approval to go ahead with relanding with 2 changes viz. 1. INLINEASM_BR changes 2.Change in block location

Have tested both against SPEC17 and llvm test-suite and figures look good. No major diversions were observed for test suite

@sushgokh Has a new revision with the changes that you have mentioned been pushed here for review? I can pull it down and verify that the two configurations that had boot issues are fixed, as well as making sure that no other configurations regress.

@nathanchance Thanks for the initiative. Could you please apply the below patch on top of existing patch and check?

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -471,7 +471,9 @@ tryToSplitRestore(MachineBasicBlock *MBB,
       MBBFallthrough.insert(BB);

   MachineBasicBlock *NMBB = MF->CreateMachineBasicBlock();
-  MF->insert(MachineFunction::iterator(MBB), NMBB);
+  // Insert this block at the end of the function. Inserting in between may
+  // interfere with control flow optimizer decisions.
+  MF->insert(MF->end(), NMBB);

   for (const MachineBasicBlock::RegisterMaskPair &LI : MBB->liveins())
     NMBB->addLiveIn(LI.PhysReg);
@@ -577,6 +579,12 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
       !MPDT->dominates(InitRestore, InitSave))
     return false;

+  // Bail out of the optimization if any of the basic block is target of
+  // INLINEASM_BR instruction
+  for (MachineBasicBlock &MBB : MF)
+    if (MBB.isInlineAsmBrIndirectTarget())
+      return false;
+
   DenseSet<const MachineBasicBlock *> DirtyBBs;
   for (MachineBasicBlock &MBB : MF) {
     if (MBB.isEHPad()) {

@sushgokh I tested that diff on top of the revert and I no longer see the boot issues I initially reported or any new regressions. I do not feel like I am knowledgeable enough to approve the reland, hopefully someone else can chime in on that but this should not be blocked on the Linux kernel regression any longer.

@nathanchance Thanks for the update :)

@nickdesaulniers Requesting approval for relanding

In D42600#4304778, @sushgokh wrote:

@nathanchance Thanks for the update :)

@nickdesaulniers Requesting approval for relanding

Can you please push the updated patch to this phab review (click reopen in the bottom left "Add Action..." menu) or a new one? It's hard for me to look at the diff without the surrounding context, which phab would allow me to view.

In particular, I'm concerned if the checks for isInlineAsmBrIndirectTarget in ShrinkWrap::performShrinkWrapping are still relevant after this change?

I would also like to see a callbr based test added.

sushgokh reopened this revision.May 2 2023, 7:06 PM

This revision is now accepted and ready to land.May 2 2023, 7:06 PM

sushgokh requested review of this revision.May 2 2023, 7:26 PM

sushgokh updated this revision to Diff 518942.

@nickdesaulniers test for inlineasm_br added as 'noshrink_bb_as_inlineasmbr_target()' in 'shrinkwrap-split-restore-point.mir'

sushgokh updated this revision to Diff 518949.May 2 2023, 7:39 PM

Harbormaster completed remote builds in B229598: Diff 518949.May 2 2023, 8:30 PM

LGTM; though for the two tests you converted to use the update_*_test_checks.py, you should do those conversions as separate parent commits. I'm not going to block this commit on that, but it's nice to do in the future.

llvm/test/CodeGen/PowerPC/shrink-wrap.ll
1 ↗	(On Diff #518949)	Consider pre-committing the conversion to update_llc_test_checks.py so that it's more obvious if this change to LLVM changes anything of interest in this test.
llvm/test/CodeGen/PowerPC/shrink-wrap.mir
1 ↗	(On Diff #518949)	Consider pre-committing the conversion to update_mir_test_checks.py so that it's more obvious if this change to LLVM changes anything of interest in this test.

This revision is now accepted and ready to land.May 5 2023, 10:29 AM

sushgokh mentioned this in rG7cba80010445: [CodeGen] Autogen tests as prerequisite for D42600.May 7 2023, 11:58 PM

Closed by commit rG1ddfd1c81867: [CodeGen][ShrinkWrap] Split restore point (authored by sushgokh). · Explain WhyMay 8 2023, 12:52 AM

This revision was automatically updated to reflect the committed changes.

sushgokh added a commit: rG1ddfd1c81867: [CodeGen][ShrinkWrap] Split restore point.

It looks like for 32-bit ARM, shrink-wrapping is blocking the formation of pop {pc} instructions to return from a function (so we instead pop {lr}; bx lr). Is this new, or a side-effect of shrink-wrapping more cases? Any idea what's causing it? Is there an issue tracking it?

This is causing lld to crash on Chrome builds when PGO and ThinLTO are enabled: https://crbug.com/1443635

ayzhao added a reverting change: rGf4999d3535af: Revert "[CodeGen][ShrinkWrap] Split restore point".May 8 2023, 4:28 PM

In D42600#4327559, @efriedma wrote:

It looks like for 32-bit ARM, shrink-wrapping is blocking the formation of pop {pc} instructions to return from a function (so we instead pop {lr}; bx lr). Is this new, or a side-effect of shrink-wrapping more cases? Any idea what's causing it? Is there an issue tracking it?

This should be side effect of doing more shrink wrapping and caused by later passes as this patch just changes the save/restore point and nothing beyond that. Currently, we dont have any issue tracking it but if this is leading to performance regression, then could you please raise a bug for it when I reland it ?

In D42600#4327970, @ayzhao wrote:

This is causing lld to crash on Chrome builds when PGO and ThinLTO are enabled: https://crbug.com/1443635

@ayzhao I request you to please provide me following information:

Reproducer of the function causing the assert failure
llc -print-before=shrink-wrap for the function
llc -debug-only=shrink-wrap for function

Also, I request to check if the following patch works since I suspect that PGO is changing the block frequencies:

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -632,7 +632,8 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
       FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);

   while (NewSave && (hasDirtyPred(ReachableByDirty, *NewSave) ||
-                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency()))
+                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency() ||
+                     MLI->getLoopFor(NewSave)))
     NewSave = FindIDom<>(**NewSave->pred_begin(), NewSave->predecessors(), *MDT,
                          false);

In D42600#4329528, @sushgokh wrote:

In D42600#4327970, @ayzhao wrote:

This is causing lld to crash on Chrome builds when PGO and ThinLTO are enabled: https://crbug.com/1443635

@ayzhao I request you to please provide me following information:

Reproducer of the function causing the assert failure

patternprops.o.5.precodegen.ll189 KBDownload

~/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll
llc: /usr/local/google/home/ayzhao/src/llvm-project/llvm/lib/CodeGen/ShrinkWrap.cpp:666: bool (anonymous namespace)::ShrinkWrap::postShrinkWrapping(bool, llvm::MachineFunction &, llvm::RegScavenger *): Assertion `(!MLI->getLoopFor(Save) && !MLI->getLoopFor(Restore)) && "Unexpected save or restore point in a loop"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll
1.      Running pass 'Function Pass Manager' on module 'patternprops.o.5.precodegen.ll'.
2.      Running pass 'Shrink Wrapping analysis' on function '@_ZN6icu_7212PatternProps14trimWhiteSpaceEPKDsRi'
 #0 0x0000563786318bd7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x1e44bd7)
 #1 0x0000563786316a8e llvm::sys::RunSignalHandlers() (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x1e42a8e)
 #2 0x00005637863193ba SignalHandler(int) Signals.cpp:0:0
 #3 0x00007fafd965af90 (/lib/x86_64-linux-gnu/libc.so.6+0x3bf90)
 #4 0x00007fafd96a9ccc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x00007fafd965aef2 raise ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x00007fafd9645472 abort ./stdlib/abort.c:81:7
 #7 0x00007fafd9645395 _nl_load_domain ./intl/loadmsgcat.c:1177:9
 #8 0x00007fafd9653df2 (/lib/x86_64-linux-gnu/libc.so.6+0x34df2)
 #9 0x000056378598fbbf (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x14bbbbf)
#10 0x00005637857517bc llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x127d7bc)
#11 0x0000563785c6bcb3 llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x1797cb3)
#12 0x0000563785c73e11 llvm::FPPassManager::runOnModule(llvm::Module&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x179fe11)
#13 0x0000563785c6c6cc llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x17986cc)
#14 0x0000563784a8ad89 main (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x5b6d89)
#15 0x00007fafd964618a __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:74:3
#16 0x00007fafd9646245 call_init ./csu/../csu/libc-start.c:128:20
#17 0x00007fafd9646245 __libc_start_main ./csu/../csu/libc-start.c:368:5
#18 0x0000563784a857b1 _start (/usr/local/google/home/ayzhao/src/llvm-project/build/bin/llc+0x5b17b1)
[1]    2041024 IOT instruction  ~/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll

llc -print-before=shrink-wrap for the function

print-before-shrink-wrap-stderr.txt59 KBDownload

print-before-shrink-wrap-stdout.txt24 KBDownload

llc -debug-only=shrink-wrap for function

debug-only-shrink-wrap-stderr.txt4 KBDownload

debug-only-shrink-wrap-stdout.txt24 KBDownload

Also, I request to check if the following patch works since I suspect that PGO is changing the block frequencies:

--- a/llvm/lib/CodeGen/ShrinkWrap.cpp
+++ b/llvm/lib/CodeGen/ShrinkWrap.cpp
@@ -632,7 +632,8 @@ bool ShrinkWrap::postShrinkWrapping(bool HasCandidate, MachineFunction &MF,
       FindIDom<>(**DirtyPreds.begin(), DirtyPreds, *MDT, false);

   while (NewSave && (hasDirtyPred(ReachableByDirty, *NewSave) ||
-                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency()))
+                     EntryFreq < MBFI->getBlockFreq(NewSave).getFrequency() ||
+                     MLI->getLoopFor(NewSave)))
     NewSave = FindIDom<>(**NewSave->pred_begin(), NewSave->predecessors(), *MDT,
                          false);

That seems to work:

$ ~/src/llvm-project/build/bin/llc patternprops.o.5.precodegen.ll

$ echo $?
0

sushgokh mentioned this in rG1569b36ee9dd: [CodeGen][ShrinkWrap] Split restore point.May 11 2023, 5:23 AM

@nickdesaulniers I have landed the patch with optimisation off by default so that future changes to it and relanding will become easy.

@ayzhao Thanks for the inputs.

sushgokh reopened this revision.May 15 2023, 8:08 PM

This revision is now accepted and ready to land.May 15 2023, 8:08 PM

sushgokh requested review of this revision.May 15 2023, 8:19 PM

sushgokh updated this revision to Diff 522416.

Harbormaster completed remote builds in B232175: Diff 522416.May 15 2023, 9:57 PM

@nickdesaulniers could you please check if its good to reland?

In D42600#4363694, @sushgokh wrote:

@nickdesaulniers could you please check if its good to reland?

Are @ayzhao 's and @efriedma 's comments addressed?

@ayzhao issue addressed.
@efriedma's issue needs to addressed separately in other pass(if need be) but I don't think this issue will result in performance regression.

nickdesaulniers accepted this revision.May 24 2023, 11:37 AM

This revision is now accepted and ready to land.May 24 2023, 11:37 AM

nickdesaulniers requested changes to this revision.May 24 2023, 11:38 AM

nickdesaulniers added inline comments.

llvm/test/CodeGen/AArch64/dont-shrink-wrap-stack-mayloadorstore.s
1 ↗	(On Diff #522416)	There's a presubmit failure that this newly added test is missing a RUN line.

This revision now requires changes to proceed.May 24 2023, 11:38 AM

This revision was not accepted when it landed; it landed in state Needs Revision.May 25 2023, 1:27 AM

This revision was landed with ongoing or failed builds.

Closed by commit rGc4a60c9d3437: [CodeGen][ShrinkWrap] Enable PostShrinkWrap by default (authored by sushgokh). · Explain Why

This revision was automatically updated to reflect the committed changes.

sushgokh added a commit: rGc4a60c9d3437: [CodeGen][ShrinkWrap] Enable PostShrinkWrap by default.

I submitted the patch without checking that the patch needs revision. I got confused by back to back accept followed by revision request. Apology.

But I have removed the file accidently added

llvm/test/CodeGen/AArch64/dont-shrink-wrap-stack-mayloadorstore.s
1 ↗	(On Diff #522416)	I had added that by mistake(git add -A). Apologies.

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][Shrink-wrap]split restore point
ClosedPublic

Details

Diff Detail

Event Timeline

>

Revision Contents

Diff 131648

lib/CodeGen/ShrinkWrap.cpp

test/CodeGen/AArch64/srink-wrap-split-restore.ll

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen][Shrink-wrap]split restore pointClosedPublic

Details

Diff Detail

Event Timeline

>

Revision Contents

Diff 131648

lib/CodeGen/ShrinkWrap.cpp

test/CodeGen/AArch64/srink-wrap-split-restore.ll

[CodeGen][Shrink-wrap]split restore point
ClosedPublic