This is an archive of the discontinued LLVM Phabricator instance.

Differential D93734

[LoopDeletion] Insert an early exit from dead path in loop
Needs Review, Public

Authored by jonpa on Dec 22 2020, 3:32 PM.


Details

Reviewers

atmnpatel
jdoerfert
Florian
fhahn

Summary

This is applied on top of https://reviews.llvm.org/D86844 "[LoopDeletion] Allows deletion of possibly infinite side-effect free loops".

This patch handles the case where the whole loop is not dead, but only a certain constant path through it. If this path is entered, the loop is in fact dead and an early exit could be made.
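As a rough source-level illustration of this idea (not the patch itself, which works on LLVM IR), the sketch below models a loop whose only side effect sits behind a loop-invariant condition. The names `runOriginal`, `runWithEarlyExit`, and `Log` are hypothetical, and the model assumes a loop that is known to terminate.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the observable side effects of the loop.
using Log = std::vector<int>;

// Original shape: when `skip` is true, every iteration takes the
// side-effect-free path, so the loop does nothing observable.
Log runOriginal(bool skip, int n) {
  assert(n >= 0 && "this model expects a non-negative trip count");
  Log log;
  for (int i = 0; i < n; ++i) {
    if (skip)
      continue;       // dead path: invariant condition, no side effects
    log.push_back(i); // side effect on the other path
  }
  return log;
}

// With an early exit: once the dead path is entered it will be entered on
// every later iteration too, so the loop can be left immediately.
Log runWithEarlyExit(bool skip, int n) {
  assert(n >= 0 && "this model expects a non-negative trip count");
  Log log;
  for (int i = 0; i < n; ++i) {
    if (skip)
      break;          // early exit replaces the branch to the latch
    log.push_back(i);
  }
  return log;
}
```

Both functions produce the same log for any `skip` and `n`; only the number of empty iterations differs.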

Before finalizing the patch by adding tests and also handling the new PM, I would like to get some feedback on whether this looks like the right approach. I have seen that this handles the omnetpp function printAddressTable() the same way as GCC does, which gives a nice improvement on the benchmark (SystemZ, output disabled).

Diff Detail

Unit Tests: Failed

Time	Test
90 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-cxa-atexit.S
110 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-static-initializer.S
70 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-tls.S

Event Timeline

jonpa created this revision. Dec 22 2020, 3:32 PM

Herald added a subscriber: hiraditya. Dec 22 2020, 3:32 PM

jonpa requested review of this revision. Dec 22 2020, 3:32 PM

Herald added a project: Restricted Project. Dec 22 2020, 3:32 PM

jonpa added a reviewer: Florian. Dec 22 2020, 3:52 PM

The idea is that we have

H:
  %c = ... ; invariant wrt H and L
  br %c, L, B
B: 
  side_effects
  br L
L:
  br %x, H, Exit

Exit:
  ...

right?

One of my problems is that this is too tied to the syntax.
Another is that we are not deleting a loop. The former can be addressed later I guess.

Isn't this more related to LoopUnswitch?
We want to "unswitch the paths that are side-effect free", or something like that.

We could also make it generic by collecting the blocks that are allowed on such a path
in order to allow different CFGs. But that can be done later as well.

llvm/lib/Transforms/Scalar/LoopDeletion.cpp
187	Nit: Unrelated and misses documentation. A useful helper though, could be added to D86844 directly (if you don't mind).
556	that reminds me, we really need to make `llvm.assume` side-effect free... D89054

In D93734#2469137, @jdoerfert wrote:
The idea is that we have
H:
  %c = ... ; invariant wrt H and L
  br %c, L, B
B: 
  side_effects
  br L
L:
  br %x, H, Exit

Exit:
  ...
right?

Yes, exactly (could be more blocks than 3, though, of course).

One of my problems is that this is too tied to the syntax.

How do you mean, exactly?

Another is that we are not deleting a loop. The former can be addressed later I guess.

Isn't this more related to LoopUnswitch?
We want to "unswitch the paths that are side-effect free", or something like that.

I thought that if we cannot delete the whole loop we can effectively delete a part of it with an early exit.

LoopUnswitching only works if the condition is loop invariant currently as far as I can see, so I am not sure how simple that would be...

We could also make it generic by collecting the blocks that are allowed on such a path
in order to allow different CFGs. But that can be done later as well.

Might be worth a try...

jdoerfert added a reviewer: fhahn. Dec 22 2020, 4:23 PM

In D93734#2469162, @jonpa wrote:
In D93734#2469137, @jdoerfert wrote:
The idea is that we have
H:
  %c = ... ; invariant wrt H and L
  br %c, L, B
B: 
  side_effects
  br L
L:
  br %x, H, Exit

Exit:
  ...
right?
Yes, exactly (could be more blocks than 3, though, of course).

One of my problems is that this is too tied to the syntax.

How do you mean, exactly?

H:
  %c1 = ... ; invariant wrt H, LazyEval, and L
  br %c1, LazyEval, B
LazyEval:  
  %c2 = ... ; invariant wrt H, LazyEval, and L
  br %c2, L, B
B: 
  side_effects
  br L
L:
  br %x, H, Exit
Exit:
   ...

I mean the above breaks the matching even though the same logic applies. I would argue that is not what we want.
(And given that %c1 and %c2 can have loads, we might not be able to undo short circuit evaluation here to reduce
it to your initial pattern.)
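One way to avoid matching a fixed block shape is to search the loop's CFG for any side-effect-free path from header to latch that only passes through invariant branches. The sketch below is a toy model of that idea, not LLVM code: `Block`, `hasNoOpPath`, and the two boolean flags are hypothetical, and a real implementation would also have to reason about loads feeding the conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: one entry per basic block of the loop.
struct Block {
  bool hasSideEffects;            // block contains a store, call, etc.
  bool condIsInvariant;           // its branch condition is loop invariant
  std::vector<std::size_t> succs; // successor indices inside the loop
};

// Returns true if some path from `header` to `latch` visits only blocks
// without side effects, and every conditional branch before the latch is
// controlled by a loop-invariant condition.
bool hasNoOpPath(const std::vector<Block> &cfg, std::size_t header,
                 std::size_t latch) {
  assert(header < cfg.size() && latch < cfg.size());
  std::vector<bool> visited(cfg.size(), false);
  std::vector<std::size_t> worklist{header};
  while (!worklist.empty()) {
    std::size_t b = worklist.back();
    worklist.pop_back();
    if (visited[b] || cfg[b].hasSideEffects)
      continue; // already handled, or not allowed on a no-op path
    visited[b] = true;
    if (b == latch)
      return true;
    if (cfg[b].succs.size() > 1 && !cfg[b].condIsInvariant)
      continue; // the same edge is not guaranteed on every iteration
    for (std::size_t s : cfg[b].succs)
      worklist.push_back(s);
  }
  return false;
}
```

In this model both the three-block H/B/L pattern and the four-block LazyEval variant are accepted, because the search does not care how many invariant branches lie between header and latch.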

Isn't this more related to LoopUnswitch?
We want to "unswitch the paths that are side-effect free", or something like that.

I thought that if we cannot delete the whole loop we can effectively delete a part of it with an early exit.

We delete parts, agreed, but not a loop. Unsure if this is the right place conceptually.

LoopUnswitching only works if the condition is loop invariant currently as far as I can see, so I am not sure how simple that would be...

With that argumentation you could say LoopDeletion doesn't do this either (right now) ;)
I mean, you can simply copy this code there and it should work, right?

We could also make it generic by collecting the blocks that are allowed on such a path
in order to allow different CFGs. But that can be done later as well.

Might be worth a try...

FWIW, my goal is to make it something driven by a generic analysis, not another pattern we match.

As implemented, this is a bit too specific to one particular pattern to be committable, but I think you've got the seed of a good idea here.

I think you can generalize this as follows:

If all conditions contributing to control flow along the path from header to exit block are loop invariant (use SCEV's definition), then the exit is either taken or not taken equally on all iterations.
If all exit blocks meet the previous criteria, then the loop must either execute once or be infinite. If we can prove it's not infinite, it must execute once.
For a loop which executes exactly once, the backedge is dead. We can break the backedge, leave the rest of the loop unchanged, and get the result you're looking for.

As a special case of the above (which is probably all we should implement), if all loop exits dominate the backedge, SCEV will be able to compute the backedge taken count as zero. You can simply check the backedge taken count and break the backedge.

Note that my phrasing doesn't involve reasoning about side effects.
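The claim that a loop with only loop-invariant exit conditions either runs its body once or never terminates can be illustrated with a toy model. This is only a sketch: `bodyExecutions` is a hypothetical helper, and `cap` is an artificial limit standing in for "infinite" so that the model itself terminates.

```cpp
#include <cassert>
#include <cstdint>

// Counts how often the body of a loop runs when its only exit test is loop
// invariant. `cap` is a hypothetical safety limit that stands in for
// "runs forever"; it is not part of the argument in the thread.
std::uint64_t bodyExecutions(bool exitTaken, std::uint64_t cap) {
  assert(cap > 0);
  std::uint64_t runs = 0;
  while (runs < cap) {
    ++runs;        // loop body
    if (exitTaken) // invariant: same answer on every iteration
      break;       // exit taken on the first iteration ...
    // ... otherwise the backedge is taken every time.
  }
  return runs;
}
```

A result of 1 corresponds to a backedge-taken count of zero, which is the case where the backedge can be broken; any other outcome hits the cap.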

dongAxis1944 added a subscriber: dongAxis1944. Dec 23 2020, 12:15 AM

Thanks for sharing the patch Jonas!

In D93734#2469173, @jdoerfert wrote:

In D93734#2469162, @jonpa wrote:

In D93734#2469137, @jdoerfert wrote:

Isn't this more related to LoopUnswitch?
We want to "unswitch the paths that are side-effect free", or something like that.

I thought that if we cannot delete the whole loop we can effectively delete a part of it with an early exit.

We delete parts, agreed, but not a loop. Unsure if this is the right place conceptually.

LoopUnswitching only works if the condition is loop invariant currently as far as I can see, so I am not sure how simple that would be...

As mentioned already, I think framing this as extension to unswitching could be interesting. In a way, it is some kind of 'partial' unswitching, where only the condition of the no-sideeffect version becomes known after unswitching.

I put together an early version of that idea in D93764. It focuses primarily on conditions as in omnetpp's MACRelayUnitBase::printAddressTable, where the condition is a load of a loaded address and the path from the false successor to header/exits has no side effects. It is still an early version (not ready for review yet) which needs more tests and comments, but from looking at the impact on the test-suite (increase in number of branches unswitched), it seems that this could be viable in general.
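A source-level sketch of what such a "partial" unswitching could look like is given below. It is an illustration under assumptions, not the code from D93764: `State`, `body`, `runLoop`, and `runPartiallyUnswitched` are hypothetical names, `flag` models the loaded condition, and the body is allowed to clobber it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: `flag` plays the role of the loaded condition and
// `out` records side effects.
struct State {
  bool flag;
  std::vector<int> out;
};

// The "main body" has side effects and may clobber the condition, so the
// condition is not loop invariant in general.
void body(State &s, int i) {
  s.out.push_back(i);
  if (i == 2)
    s.flag = false;
}

void runLoop(State &s, int n) {
  for (int i = 0; i < n; ++i)
    if (s.flag)
      body(s, i);
}

// Partially unswitched: if the condition is false on entry, nothing on the
// no-op path can change it, so that version of the loop does nothing and
// can be skipped. Otherwise the unmodified loop runs.
void runPartiallyUnswitched(State &s, int n) {
  if (!s.flag)
    return; // the unswitched path is a no-op
  for (int i = 0; i < n; ++i)
    if (s.flag)
      body(s, i);
}
```

The condition only becomes known for the side-effect-free version; the other version still has to re-evaluate it on every iteration.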

In D93734#2469399, @reames wrote:

Note that my phrasing doesn't involve reasoning about side effects.

I think the side effect part is still needed (at some point) as the use case they want to tackle has loads in the condition that might be modified by the "main body" (which is potentially never executed).

fhahn mentioned this in D93764: [LoopUnswitch] Implement first version of partial unswitching. Dec 27 2020, 11:43 AM

reames mentioned this in D93906: [LoopDeletion] Break backedge of loops when known not taken. Dec 29 2020, 10:41 AM

With the motto of pushing things forward even if only by aiding the other related patches, I have continued to improve my patch to use as some kind of baseline for "early exit" insertions. Perhaps it can be used during development of the partial loop-unswitching to find cases to handle, or perhaps it could be used for some cases if it would reduce the burden on the other algorithm. It would be very nice if partial unswitching could handle all this instead, of course :-)

Instead of just handling the header->latch edge, now any edge from the "Header region" to the "Latch region" can be handled. The requirements are that in the header region, all conditions must be loop invariant, and there can only be one reachable exit block in the latch region from the branch target block back to header. If there is no exit block on the dead path but the loop has a unique exit block, that exit block is used for the early exit, given the mustprogress attribute.

Interestingly enough, there are now cases where this patch manages to "delete" (or "eliminate") a loop which has multiple exits. The loop structure is removed while the BBs remain, which should hopefully be removed later by CFG-opt. I think this is rare on benchmarks though... (see below)

I found out quickly that perhaps the hardest part of this was to update datastructures after changing the CFG in a loop... I have barely been able to build the benchmarks as it is, so I am in need of some good advice on how to update things after a loop change, for this patch to be usable. Currently I have changed things temporarily so that LI, DT, etc are recomputed after loop deletion (BTW, I found that only recomputing those analyses on master after LoopDeletion was not NFC, which surprised me... Is that a bug or expected with the aim to save compile time?)

Statistics on SPEC-17 on SystemZ:

master (patched to not preserve analyses after LoopDeletion for a fair comparison):

      7276                    loop-delete - Number of loops deleted

Only Header/Latch (like first simple version aimed to do):
      1414                    loop-delete - Number of early exits inserted
      7498                    loop-delete - Number of loops deleted
         8                    loop-delete - Number of loops eliminated (no remaining blocks)
       462                    loop-delete - Number of skipped loops (SCEV 0)

Top/Bot regions (current patch):
      2353                    loop-delete - Number of early exits inserted
      7397                    loop-delete - Number of loops deleted
         6                    loop-delete - Number of loops eliminated (no remaining blocks)
       462                    loop-delete - Number of skipped loops (SCEV 0)

A rise from ~1400 to ~2350 is a fairly nice improvement in the number of early exits inserted compared to the initial patch. There also seems to be some difference in the number of loops deleted as reported by deleteLoopIfDead(), which I have not investigated. It seems that this must somehow relate to a parent loop now not having a subloop..?

@reames: I am not quite sure how the SE->isZero() case directly relates to the detection of dead paths per my patch... I have here tried a statistic for that case, which showed that some loops could be handled by your patch, while there are still many more that can not... (see above).

@jdoerfert:

FWIW, my goal is to make it something driven by a generic analysis, not another pattern we match.

I agree with that, I have now made the patch more general.

In D93734#2478272, @jonpa wrote:

I found out quickly that perhaps the hardest part of this was to update datastructures after changing the CFG in a loop... I have barely been able to build the benchmarks as it is, so I am in need of some good advice on how to update things after a loop change, for this patch to be usable. Currently I have changed things temporarily so that LI, DT, etc are recomputed after loop deletion (BTW, I found that only recomputing those analyses on master after LoopDeletion was not NFC, which surprised me... Is that a bug or expected with the aim to save compile time?)

Statistics on SPEC-17 on SystemZ:

Thanks for the update Jonas! It looks like the patch includes some required changes that still landed (D86844), which might impact the number of removed loops. It might be good to re-collect the statistics. I tried to collect stats with this patch on SPEC2000/SPEC2006/MultiSource for X86 with LTO, but unfortunately there have been a few crashes.

@jdoerfert:

FWIW, my goal is to make it something driven by a generic analysis, not another pattern we match.

I agree with that, I have now made the patch more general.

I think both this patch and D93764 require very similar analysis to find 'no-op' paths through loops (with the difference that partial unswitching can allow stores to memory that does not clobber the condition). Do you think it would be worth unifying the analysis code?

In D93734#2486760, @fhahn wrote:

In D93734#2478272, @jonpa wrote:

I found out quickly that perhaps the hardest part of this was to update datastructures after changing the CFG in a loop... I have barely been able to build the benchmarks as it is, so I am in need of some good advice on how to update things after a loop change, for this patch to be usable. Currently I have changed things temporarily so that LI, DT, etc are recomputed after loop deletion (BTW, I found that only recomputing those analyses on master after LoopDeletion was not NFC, which surprised me... Is that a bug or expected with the aim to save compile time?)

Statistics on SPEC-17 on SystemZ:

Thanks for the update Jonas! It looks like the patch includes some required changes that still landed (D86844), which might impact the number of removed loops. It might be good to re-collect the statistics. I tried to collect stats with this patch on SPEC2000/SPEC2006/MultiSource for X86 with LTO, but unfortunately there have been a few crashes.

Sorry to hear about the crashes, hopefully it can all be fixed... I rebased the patch and did a new run:

master (patched to not preserve analyses after LoopDeletion for a fair comparison):

      7557                    loop-delete - Number of loops deleted

current patch:
      2352                    loop-delete - Number of early exits inserted
      7420                    loop-delete - Number of loops deleted
         6                    loop-delete - Number of loops eliminated (no remaining blocks)
       439                    loop-delete - Number of skipped loops (SCEV 0)

@jdoerfert:

FWIW, my goal is to make it something driven by a generic analysis, not another pattern we match.

I agree with that, I have now made the patch more general.

I think both this patch and D93764 require very similar analysis to find 'no-op' paths through loops (with the difference that partial unswitching can allow stores to memory that does not clobber the condition). Do you think it would be worth unifying the analysis code?

To me it depends on what your goal is with the partial unswitching - do you mean to extend your patch to include multiple partially invariant conditions in the header region? (Or is that already a side-effect of revisiting the new loop? If so, there is no need to compute the "Top-region" like this patch does.). I guess we could perhaps have a common function that finds a unique exit block from SuccBB back to header and collect the memory references as well on that path...

Do you think partial unswitching will be able eventually to handle all the early-exit cases? I haven't tried this patch yet on top of your patch, but that will be interesting :-)

patch rebased

jdoerfert added inline comments. Jan 8 2021, 2:33 PM

llvm/lib/Transforms/Scalar/LoopDeletion.cpp
118	DriveBy: This hurts. I know it was there before but `auto &I` is `BasicBlock *`... argh. Could we please change that, if we commit this, to an `auto *` at least, with a sensible variable name.

jonpa added inline comments. Jan 9 2021, 1:42 PM

llvm/lib/Transforms/Scalar/LoopDeletion.cpp
118	I think it's better in that case perhaps to just commit such an NFC change separately beforehand?

fhahn mentioned this in rGbee486851c1a: [LoopUnswitch] Implement first version of partial unswitching. Jan 21 2021, 1:47 AM

What is left after we merged the loop unswitch solution?

llvm/lib/Transforms/Scalar/LoopDeletion.cpp
118	Yeah, stuff like that can be committed directly as NFC.

fhahn mentioned this in D95468: [LoopUnswitch] Add shortcut if unswitched path is a no-op. Jan 26 2021, 12:12 PM

In D93734#2512432, @jdoerfert wrote:

What is left after we merged the loop unswitch solution?

I realized that for the SPEC2017 version, loop-unswitching does not happen by default, due to cost-modeling. (It does happen for the SPEC2006 version). So I tried to extend the logic to check if the candidate path is a no-op: D95468. That should also handle the SPEC2017 case.

Patch rebased.

What is left after we merged the loop unswitch solution?

I did a rerun today on top of 302432f, which should include both Florian's and Philip's patches:

trunk:
      8201                    loop-delete - Number of loops deleted

patch:

      2624                    loop-delete - Number of early exits inserted
      8006                    loop-delete - Number of loops deleted
         6                    loop-delete - Number of loops eliminated (no remaining blocks)
       279                    loop-delete - Number of skipped loops (SCEV 0)

patch + D95468

      2617                    loop-delete - Number of early exits inserted
      7798                    loop-delete - Number of loops deleted
         6                    loop-delete - Number of loops eliminated (no remaining blocks)
       279                    loop-delete - Number of skipped loops (SCEV 0)

I realized that for the SPEC2017 version, loop-unswitching does not happen by default, due to cost-modeling. (It does happen for the SPEC2006 version). So I tried to extend the logic to check if the candidate path is a no-op: D95468. That should also handle the SPEC2017 case.

I may be doing something wrong, but D95468 did not help very much looking at these numbers it seems...

I may be doing something wrong, but D95468 did not help very much looking at these numbers it seems...

Maybe outer loops have been skipped and therefore you avoided duplication of outer and inner loops (with D95468). The statistics we have are too coarse grained to exactly pinpoint what happened.

In D93734#2524218, @jdoerfert wrote:

I may be doing something wrong, but D95468 did not help very much looking at these numbers it seems...

Maybe outer loops have been skipped and therefore you avoided duplication of outer and inner loops (with D95468). The statistics we have are too coarse grained to exactly pinpoint what happened.

I wonder which loops were "partially unrolled", and which were not. Do the partially unrolled ones maybe get some recognizable name in the header blocks?

In D93734#2529365, @jonpa wrote:

In D93734#2524218, @jdoerfert wrote:

I may be doing something wrong, but D95468 did not help very much looking at these numbers it seems...

Maybe outer loops have been skipped and therefore you avoided duplication of outer and inner loops (with D95468). The statistics we have are too coarse grained to exactly pinpoint what happened.

I wonder which loops were "partially unrolled", and which were not. Do the partially unrolled ones maybe get some recognizable name in the header blocks?

Do you mean partially unswitched or partially unrolled? I don't think so, unfortunately. But I think most cases were already caught with D95468 and only in a few cases do we now skip duplication. Also note that D95468 adds the shortcut outside the loop, so loop-deletion will still insert an early exit as before (long term I think we could also just turn the branch in the loop into an early exit).

fhahn mentioned this in rGb8c81fa5c7f7: [LoopUnswitch] Add shortcut if unswitched path is a no-op. Feb 1 2021, 1:04 AM

Patch updated to also run with the new pass manager.

This now gives a ~9% improvement on Omnetpp - just like it was with the legacy pass manager (before but not after Florian's patch, I believe).

@fhahn : are you going to port your improvement to loop unswitching to newpm?

Harbormaster completed remote builds in B93181: Diff 329788. Mar 11 2021, 2:24 AM

In D93734#2618081, @jonpa wrote:

Patch updated to also run with the new pass manager.

This now gives a ~9% improvement on Omnetpp - just like it was with the legacy pass manager (before but not after Florian's patch, I believe).

@fhahn : are you going to port your improvement to loop unswitching to newpm?

Yes I am planning to, but we are seeing several other regressions with the new pass manager that I'll probably need to investigate first. So if anyone wants to port the loop-unswitching changes before I get a chance, I'd be more than happy.

I again saw a regression on omnetpp against gcc, so I decided to revisit this patch as it previously handled that benchmark. In doing so I have updated it and added some more comments as well. It now only runs on the new pass manager.

It seems, however, that this time it was not the loop in PrintAddressTable that was the issue, so I still can't say what the omnetpp regression is about this time. However, I reran SPEC-17 and found that a few benchmarks seemed to improve just slightly (~1%). Statistics report that still ~1400 loops get optimized with this patch (~2400 edges redirected). So at least in theory, this patch might still be of interest, even if just to give hints on missed optimizations by other loop passes.

I checked if the extra work of finding Top/Bot regions and handling multiple exits was still worthwhile (compared to a very simple approach). I found that the Top/Bot search about doubled the effectiveness (relative to just checking edges from Header to Latch), where the multiple exit loops were about 10% of the cases (I have now made multiple-exit handling fall under an experimental option "-early-exit-extra").

f510.parest_r seemed to improve ~1.5% on both z14 and z15, and I made a reduced test case for one of the files changed (sparsity_pattern.ii, picked randomly). I see three less BBs in the output and an outer loop removed.

opt -mtriple=systemz-unknown -march=z14 -O3 ./tc_earlex.ll -debug-only=loop-delete

Analyzing Loop for deletion: Loop at depth 1 containing: %bb5.preheader<header>,%bb8,%bb1.loopexit<latch><exiting>,%bb8.preheader,%bb1.loopexit.loopexit
    Loop at depth 2 containing: %bb8<header><latch><exiting>
Loop is not invariant, cannot delete.
Trying to insert early exits:
Top region: bb5.preheader, bb8.preheader, 
Bot region: bb1.loopexit, bb1.loopexit.loopexit, 
Inserting early exit in bb5.preheader:
  br i1 %i7.not3, label %bb1.loopexit, label %bb8.preheader
  =>
  br i1 %i7.not3, label %bb9.loopexit, label %bb8.preheader

Attachment: tc_earlex.ll (3 KB)

Thoughts anyone? Maybe somebody would like to try the patch on some other platform?

Harbormaster completed remote builds in B125834: Diff 375210. Sep 27 2021, 5:42 AM

@fhahn : I reran this today and found that there are no real benchmark performance improvements anymore. However, there are still 2000+ early exit insertions reported. It seems you have managed to handle the important cases, so I guess this still isn't really interesting. It would however be interesting to hear your thoughts on this: I remember you did a handling for this and I guess you must have covered the important cases already..?

Herald added a project: Restricted Project. May 27 2022, 2:43 AM

I have derived two reduced test cases from SPEC where this patch improves the loop branching. These were the first two files I looked at out of many, and it seems that in both cases LoopDeletion fails to delete the loop as it is found to be not invariant. In both of these cases it was a matter of breaking the outer loop if the inner loop was never visited.

On SPEC, I still see 2185 of these early exits inserted. This is kind of interesting, although it may be that this is not going to improve any benchmarks. Maybe it relates to outer loops with inner loops that always are executed. Maybe the overhead of the outer loop isn't that great in other cases...

// Derived from gcc / tree-vect-slp.c

int VEC_gimple_base_iterate_vec_;

void build_vector();

unsigned VEC_gimple_base_length();

int VEC_gimple_base_iterate() {
  if (VEC_gimple_base_iterate_vec_)
    return 1;
  return 0;
}

void vect_get_constant_vectors() {
  int j = VEC_gimple_base_length();
  for (; j; j++)                            // No loop deletion, but early exit if inner loop never visited.
    for (; VEC_gimple_base_iterate();)
      build_vector();
}

clang -c -o tree-vect-slp.s -S -O3 -march=arch13  tree-vect-slp.i -w -mllvm -debug-only=loop-delete

Analyzing Loop for deletion: Loop at depth 1 containing: %for.cond1.preheader<header>,%for.body4,%for.inc<latch><exiting>,%for.body4.preheader,%for.inc.loopexit
    Loop at depth 2 containing: %for.body4<header><latch><exiting>
Loop is not invariant, cannot delete.
Trying to insert early exits:
Top region: for.cond1.preheader, for.body4.preheader, 
Bot region: for.inc, for.inc.loopexit, 
Inserting early exit in for.cond1.preheader:
  br i1 %tobool.not.i.not7, label %for.inc, label %for.body4.preheader
  =>
  br i1 %tobool.not.i.not7, label %for.end5.loopexit10, label %for.body4.preheader
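At the source level, the redirected branch above corresponds roughly to leaving the outer loop as soon as the inner loop is found to be skipped. The sketch below models that under stated assumptions: `Env`, `buildVector`, `outerLoopOriginal`, and `outerLoopEarlyExit` are hypothetical stand-ins for the globals and callee of the reduced test case, and a 16-bit unsigned counter replaces `int j` so that the original loop terminates quickly by wrapping to zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the global flag and the external callee.
struct Env {
  int vec;    // models VEC_gimple_base_iterate_vec_
  int calls;  // number of buildVector() calls so far
  int budget; // buildVector() clears `vec` once this many calls were made
};

void buildVector(Env &e) {
  ++e.calls;
  if (e.calls >= e.budget)
    e.vec = 0; // the callee may change the condition of the inner loop
}

// Shape of the original loop nest.
void outerLoopOriginal(Env &e, std::uint16_t j) {
  for (; j; ++j)
    while (e.vec)
      buildVector(e);
}

// With the early exit: if the inner loop is skipped once, nothing can set
// `vec` again, so all remaining outer iterations would be empty.
void outerLoopEarlyExit(Env &e, std::uint16_t j) {
  for (; j; ++j) {
    if (!e.vec)
      break;
    while (e.vec)
      buildVector(e);
  }
}
```

Both versions make the same number of `buildVector()` calls; the early-exit version just avoids spinning through empty outer iterations.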

// Derived from cactus / FlatBoundary.c

void memcpy();
int CCTK_GroupDimI();

typedef struct {
  int *cctk_lsh;
} cGH;

int Glob_A, Glob_B, Glob_C;

void BndFlatDirVIApplyBndFlat(cGH *GH) {
  int i = 0, j = 0, k = 0, ash[3] = {0, 0, 0}, lsh[3] = {0, 0, 0};
  int vtypesize = CCTK_GroupDimI();

  for (; Glob_A;) {
    for (i = 0; i < Glob_B; i++)
      ash[i] = lsh[i] = GH->cctk_lsh[i];

    for (k = 0; k < Glob_C; k++)
      for (; j < 1;)
        for (; lsh[0];)
          ;

    for (; k < 100000; k++) {               // No loop deletion, but early exit if inner loop(s) never visited.
      for (j = 0; j < lsh[1]; j++) {
        for (i = 0; i < lsh[0]; i++) {
          int _index_to = ash[0] * ash[1] * (k - 1) * vtypesize;
          memcpy(GH + _index_to);
        }
      }
    }
  }

}



Analyzing Loop for deletion: Loop at depth 2 containing: %for.cond25.preheader.us<header>,%for.cond29.preheader.us.us,%for.body32.us.us,%for.cond29.for.inc40_crit_edge.us.us,%for.cond25.for.inc43_crit_edge.us<latch><exiting>,%for.cond29.preheader.us.us.preheader,%for.cond25.for.inc43_crit_edge.us.loopexit
    Loop at depth 3 containing: %for.cond29.preheader.us.us<header>,%for.body32.us.us,%for.cond29.for.inc40_crit_edge.us.us<latch><exiting>
        Loop at depth 4 containing: %for.body32.us.us<header><latch><exiting>
Loop is not invariant, cannot delete.
Trying to insert early exits:
Top region: for.cond25.preheader.us, for.cond29.preheader.us.us.preheader, 
Bot region: for.cond25.for.inc43_crit_edge.us, for.cond25.for.inc43_crit_edge.us.loopexit, 
Inserting early exit in for.cond25.preheader.us:
  br i1 %cmp3165, label %for.cond29.preheader.us.us.preheader, label %for.cond25.for.inc43_crit_edge.us
  =>
  br i1 %cmp3165, label %for.cond29.preheader.us.us.preheader, label %for.cond.loopexit.loopexit

@fhahn Any comments on the C test cases I derived..?

ping

@fhahn : I am still curious why this is not an improvement. I realize that you have your reasoning that makes this patch less likely beneficial, and as I spent some time on it, I would be very happy to also understand why :-)

Herald added a subscriber: StephenFan. Jun 7 2023, 5:54 AM

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/LoopDeletion.cpp	254 lines
llvm/test/Transforms/LoopDeletion/early-exits.ll	482 lines
llvm/test/Transforms/LoopDeletion/noop-loops-with-subloops.ll	2 lines

Diff 375210

llvm/lib/Transforms/Scalar/LoopDeletion.cpp

Show All 30 Lines
#include "llvm/Transforms/Scalar/LoopPassManager.h"		#include "llvm/Transforms/Scalar/LoopPassManager.h"
#include "llvm/Transforms/Utils/LoopUtils.h"		#include "llvm/Transforms/Utils/LoopUtils.h"

using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "loop-delete"		#define DEBUG_TYPE "loop-delete"

STATISTIC(NumDeleted, "Number of loops deleted");		STATISTIC(NumDeleted, "Number of loops deleted");
		STATISTIC(NumEarlyExits, "Number of early exits inserted");
		STATISTIC(NumEarlyExitedLoops, "Number of loops with early exit inserted");
		STATISTIC(NumEarlyExitedLoopsMultiExit,
		"Number of loops with early exit inserted: multi exit");

static cl::opt<bool> EnableSymbolicExecution(		static cl::opt<bool> EnableSymbolicExecution(
"loop-deletion-enable-symbolic-execution", cl::Hidden, cl::init(true),		"loop-deletion-enable-symbolic-execution", cl::Hidden, cl::init(true),
cl::desc("Break backedge through symbolic execution of 1st iteration "		cl::desc("Break backedge through symbolic execution of 1st iteration "
"attempting to prove that the backedge is never taken"));		"attempting to prove that the backedge is never taken"));

enum class LoopDeletionResult {		enum class LoopDeletionResult {
Unmodified,		Unmodified,
Modified,		Modified,
Deleted,		Deleted,
};		};

static LoopDeletionResult merge(LoopDeletionResult A, LoopDeletionResult B) {		static LoopDeletionResult merge(LoopDeletionResult A, LoopDeletionResult B) {
if (A == LoopDeletionResult::Deleted \|\| B == LoopDeletionResult::Deleted)		if (A == LoopDeletionResult::Deleted \|\| B == LoopDeletionResult::Deleted)
return LoopDeletionResult::Deleted;		return LoopDeletionResult::Deleted;
if (A == LoopDeletionResult::Modified \|\| B == LoopDeletionResult::Modified)		if (A == LoopDeletionResult::Modified \|\| B == LoopDeletionResult::Modified)
return LoopDeletionResult::Modified;		return LoopDeletionResult::Modified;
return LoopDeletionResult::Unmodified;		return LoopDeletionResult::Unmodified;
}		}

		static bool BBHasSideEffects(const BasicBlock *BB) {
		Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for function 'BBHasSideEffects' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for function 'BBHasSideEffects' [readability-identifier…
		return (any_of(*BB, [](const Instruction &I) {
		return I.mayHaveSideEffects() && !I.isDroppable();
		Lint: Pre-merge checks Inline Actions clang-format: please reformat the code - return I.mayHaveSideEffects() && !I.isDroppable(); - })); + return I.mayHaveSideEffects() && !I.isDroppable(); + })); Lint: Pre-merge checks: clang-format: please reformat the code ``` - return I.mayHaveSideEffects() && !I.
		}));
		}

/// Determines if a loop is dead.		/// Determines if a loop is dead.
///		///
/// This assumes that we've already checked for unique exit and exiting blocks,		/// This assumes that we've already checked for unique exit and exiting blocks,
/// and that the code is in LCSSA form.		/// and that the code is in LCSSA form.
static bool isLoopDead(Loop *L, ScalarEvolution &SE,		static bool isLoopDead(Loop *L, ScalarEvolution &SE,
SmallVectorImpl<BasicBlock *> &ExitingBlocks,		SmallVectorImpl<BasicBlock *> &ExitingBlocks,
BasicBlock *ExitBlock, bool &Changed,		BasicBlock *ExitBlock, bool &Changed,
BasicBlock *Preheader, LoopInfo &LI) {		BasicBlock *Preheader, LoopInfo &LI) {
[... 33 lines not shown (context: isLoopDead) ...]

if (!AllEntriesInvariant || !AllOutgoingValuesSame)		if (!AllEntriesInvariant || !AllOutgoingValuesSame)
return false;		return false;

// Make sure that no instructions in the block have potential side-effects.		// Make sure that no instructions in the block have potential side-effects.
// This includes instructions that could write to memory, and loads that are		// This includes instructions that could write to memory, and loads that are
// marked volatile.		// marked volatile.
for (auto &I : L->blocks())		for (auto &I : L->blocks())
if (any_of(*I, [](Instruction &I) {		if (BBHasSideEffects(I))
		jdoerfert: DriveBy: This hurts. I know it was there before but `auto &I` is `BasicBlock *`... argh. Could we please change that if we commit this to a `auto *` at least with a sensible variable name.
		jonpa: I think it's better in that case perhaps to just commit such an NFC change separately beforehand?
		jdoerfert: Yeah, stuff like that can be committed directly as NFC.
return I.mayHaveSideEffects() && !I.isDroppable();
}))
return false;		return false;

// The loop or any of its sub-loops looping infinitely is legal. The loop can		// The loop or any of its sub-loops looping infinitely is legal. The loop can
// only be considered dead if either		// only be considered dead if either
// a. the function is mustprogress.		// a. the function is mustprogress.
// b. all (sub-)loops are mustprogress or have a known trip-count.		// b. all (sub-)loops are mustprogress or have a known trip-count.
if (L->getHeader()->getParent()->mustProgress())		if (L->getHeader()->getParent()->mustProgress())
return true;		return true;
[... 52 lines not shown (context: isLoopNeverExecuted) ...]
assert(!pred_empty(Preheader) &&		assert(!pred_empty(Preheader) &&
"Preheader should have predecessors at this point!");		"Preheader should have predecessors at this point!");
// All the predecessors have the loop preheader as not-taken target.		// All the predecessors have the loop preheader as not-taken target.
return true;		return true;
}		}

static Value *		static Value *
getValueOnFirstIteration(Value *V, DenseMap<Value *, Value *> &FirstIterValue,		getValueOnFirstIteration(Value *V, DenseMap<Value *, Value *> &FirstIterValue,
const SimplifyQuery &SQ) {		const SimplifyQuery &SQ) {
		jdoerfert: Nit: Unrelated and misses documentation. A useful helper though, could be added to D86844 directly (if you don't mind).
// Quick hack: do not flood cache with non-instruction values.		// Quick hack: do not flood cache with non-instruction values.
if (!isa<Instruction>(V))		if (!isa<Instruction>(V))
return V;		return V;
// Do we already know cached result?		// Do we already know cached result?
auto Existing = FirstIterValue.find(V);		auto Existing = FirstIterValue.find(V);
if (Existing != FirstIterValue.end())		if (Existing != FirstIterValue.end())
return Existing->second;		return Existing->second;
Value *FirstIterV = nullptr;		Value *FirstIterV = nullptr;
[... 310 lines not shown ...]
<< "Loop deleted because it is invariant";		<< "Loop deleted because it is invariant";
});		});
deleteDeadLoop(L, &DT, &SE, &LI, MSSA);		deleteDeadLoop(L, &DT, &SE, &LI, MSSA);
++NumDeleted;		++NumDeleted;

return LoopDeletionResult::Deleted;		return LoopDeletionResult::Deleted;
}		}

		// EXPERIMENTAL
		// Also handle loops with multiple exits.
		static cl::opt<bool> EarlyExitExtra("early-exit-extra", cl::init(false), cl::Hidden);

		// The following two functions check if the terminator condition of a BB is
		// loop-invariant. If it is not, the edge cannot be redirected since another
		// edge may be taken in a following iteration.
		static bool usesLoopPHI(const Instruction *I, const Loop *L) {
		if (!L->contains(I->getParent()))
		return false;
		if (isa<PHINode>(I))
		return true;
		for (auto &Op : I->operands())
		if (const Instruction *OpI = dyn_cast<Instruction>(&Op))
		if (usesLoopPHI(OpI, L))
		return true;
		return false;
		}

		static bool BBHasLIVCondOnDeadPath(const BasicBlock *BB, Loop *L) {
		// Check if the condition might change after an iteration across a dead
		// path. (TODO: worth checking if incoming value from preheader/latch are
		// the same and allow that case?)
		const Instruction *TI = BB->getTerminator();
		const Instruction *Cond = nullptr;
		if (const BranchInst *BI = dyn_cast<BranchInst>(TI)) {
		if (BI->isConditional())
		Cond = dyn_cast<Instruction>(BI->getCondition());
		} else if (const SwitchInst *SI = dyn_cast<SwitchInst>(TI)) {
		Cond = dyn_cast<Instruction>(SI->getCondition());
		} else
		return false;
		return (Cond == nullptr || !usesLoopPHI(Cond, L));
		}

		// These two functions are used after an early exit was inserted to remove
		// any BB that has become disconnected from the loop.
		static bool hasPredecessorInLoop(const BasicBlock *BB, Loop *L) {
		for (auto *Pred: predecessors(BB))
		if (L->contains(Pred))
		return true;
		return false;
		}
		jdoerfert: that reminds me, we really need to make `llvm.assume` side-effect free... D89054

		static bool hasSuccessorInLoop(const BasicBlock *BB, Loop *L) {
		for (auto *Succ: successors(BB))
		if (L->contains(Succ))
		return true;
		return false;
		}

		// The most simple version of this algorithm would only consider edges
		// between the header and latch. By finding more blocks connected to the
		// header or latch, more (~ x2) edges can be redirected. For instance, an
		// edge from the header to the latch via a third block is dead and can be
		// exited if all blocks are free from side effects. The "Top" region
		// (connected to the header) is found when Forward is true, otherwise the
		// "Bot" region is computed (connected to the latch).
		static void findNoSEBlocks(SmallPtrSet<BasicBlock *, 4> &NoSEBBs,
		Loop *L, bool Forward) {
		// - Demand loop invariant branch conditions in Top to make sure an edge to
		// Bot can be broken safely.
		// - Allow varying branch conditions in Bot region as a unique exit block
		// is checked for later.
		auto takeBlock = [&L, &Forward](BasicBlock *BB) -> bool {
		return !BBHasSideEffects(BB) && (!Forward || BBHasLIVCondOnDeadPath(BB, L));
		};

		BasicBlock *Header = L->getHeader();
		BasicBlock *Latch = L->getLoopLatch();
		BasicBlock *Start = Forward ? Header : Latch;
		if (!takeBlock(Start))
		return;
		NoSEBBs.insert(Start);

		bool Change = true;
		while (Change) {
		Change = false;
		for (auto BB : L->getBlocks()) {
		if (NoSEBBs.count(BB) || !takeBlock(BB))
		continue;

		bool All = true;
		if (Forward) {
		for (auto *Pred: predecessors(BB))
		if (!NoSEBBs.count(Pred)) {
		All = false;
		break;
		}
		} else {
		for (auto *Succ: successors(BB))
		if (L->contains(Succ) && !NoSEBBs.count(Succ)) {
		All = false;
		break;
		}
		}
		if (All) {
		NoSEBBs.insert(BB);
		Change = true;
		}
		}
		}
		}

		// Return the unique exit block for BB (in "Bot"). If there is no such block
		// for the loop as a whole, it is still possible there is one on the path
		// from BB to the backedge (this results in ~10% more loops/edges being
		// optimized). If there is more than one, return null. Also make sure there
		// are no live-out values.
		static BasicBlock *findExitBlock(const BasicBlock *BB, Loop *L) {
		BasicBlock *ExitBB = L->getUniqueExitBlock();

		if (!ExitBB && EarlyExitExtra) {
		SmallVector<const BasicBlock *, 4> WorkList;
		WorkList.push_back(BB);
		SmallPtrSet<const BasicBlock *, 4> Visited;
		while (!WorkList.empty()) {
		const BasicBlock *CurrBB = WorkList.pop_back_val();
		if (!Visited.insert(CurrBB).second)
		continue;

		const Instruction *TI = CurrBB->getTerminator();
		for (unsigned i = 0, e = TI->getNumOperands(); i != e; ++i)
		if (BasicBlock *SuccBB = dyn_cast<BasicBlock>(TI->getOperand(i))) {
		if (!L->contains(SuccBB)) {
		if (ExitBB != nullptr && ExitBB != SuccBB)
		return nullptr;
		ExitBB = SuccBB;
		} else if (SuccBB != L->getHeader()){
		WorkList.push_back(SuccBB);
		}
		}
		}
		}

		return (ExitBB && !isa<PHINode>(ExitBB->begin())) ? ExitBB : nullptr;
		}

		#ifndef NDEBUG
		static void dumpBBSet(std::string Msg, SmallPtrSet<BasicBlock *, 4> &S) {
		dbgs() << Msg << ": ";
		for (const BasicBlock *BB : S)
		dbgs() << BB->getName() << ", ";
		dbgs() << "\n";
		}
		#endif

		// In a loop that is not entirely dead, try to find dead paths that if
		// entered are taken in all following iterations and can therefore be exited
		// immediately.
		static LoopDeletionResult tryInsertEarlyExit(Loop *L, DominatorTree &DT,
		ScalarEvolution &SE, LoopInfo &LI,
		MemorySSA *MSSA,
		OptimizationRemarkEmitter &ORE) {
		assert(L->isLCSSAForm(DT) && "Expected LCSSA!");
		BasicBlock *LatchBB = L->getLoopLatch();
		if (!LatchBB || L->getNumBlocks() == 1)
		return LoopDeletionResult::Unmodified;

		// Check for a known trip count or a forward progress guarantee.
		const SCEV *S = SE.getConstantMaxBackedgeTakenCount(L);
		if (isa<SCEVCouldNotCompute>(S) && !LatchBB->getParent()->mustProgress() &&
		!hasMustProgress(L))
		return LoopDeletionResult::Unmodified;

		// Compute two sets of blocks which are free of side-effects. Top starts
		// from header, and Bot starts from latch.
		SmallPtrSet<BasicBlock *, 4> Top, Bot;
		findNoSEBlocks(Top, L, true /*Forward*/);
		findNoSEBlocks(Bot, L, false /*Forward*/);
		LLVM_DEBUG(dbgs() << "Trying to insert early exits:\n";
		dumpBBSet("Top region", Top);
		dumpBBSet("Bot region", Bot););

		bool Change = false;
		for (auto BB : Top) {
		Instruction *TI = BB->getTerminator();
		// We can replace an edge if the conditions will always evaluate the same
		// on the dead path from header to BB and with only one possible exit
		// block from the target BB in the bottom region.
		for (unsigned i = 0, e = TI->getNumOperands(); i != e; ++i)
		if (BasicBlock *SuccBB = dyn_cast<BasicBlock>(TI->getOperand(i)))
		if (Bot.count(SuccBB))
		if (BasicBlock *ExitBB = findExitBlock(SuccBB, L)) {
		LLVM_DEBUG(dbgs() << "Inserting early exit in " << BB->getName() << ":\n";);
		LLVM_DEBUG(TI->dump(); dbgs() << " =>\n";);
		TI->setOperand(i, ExitBB);
		for (PHINode &Phi : SuccBB->phis())
		Phi.removeIncomingValue(BB);
		LLVM_DEBUG(TI->dump(););
		Change = true;
		NumEarlyExits++;
		}
		}

		if (!Change)
		return LoopDeletionResult::Unmodified;

		NumEarlyExitedLoops++;
		if (!L->getUniqueExitBlock())
		NumEarlyExitedLoopsMultiExit++;

		// Update data structures. (Correct? Hopefully there is a better way..?)
		SE.forgetLoop(L);
		DT.recalculate(*const_cast<Function *>(LatchBB->getParent()));
		Loop *CurrLoop = L;
		while (CurrLoop != nullptr) {
		// Iteratively remove disconnected blocks from CurrLoop.
		Change = true;
		while (Change) {
		Change = false;
		std::vector<BasicBlock*> LoopBlocks(CurrLoop->block_begin(),
		CurrLoop->block_end());
		for (auto BB : LoopBlocks)
		if (!hasPredecessorInLoop(BB, CurrLoop) ||
		!hasSuccessorInLoop(BB, CurrLoop)) {
		LLVM_DEBUG(dbgs() << "Removing block from loop: "
		<< BB->getName() << "\n";);
		CurrLoop->removeBlockFromLoop(BB);
		Change = true;
		}
		}
		CurrLoop = CurrLoop->getParentLoop();
		}

		// If all blocks have been removed, return 'Deleted', or LoopPass will try
		// to dump it which doesn't work with an empty loop.
		LoopDeletionResult Result = L->getNumBlocks() ? LoopDeletionResult::Modified
		: LoopDeletionResult::Deleted;
		if (Result == LoopDeletionResult::Deleted) {
		LI.erase(L);
		LLVM_DEBUG(dbgs() << "Loop eliminated: all blocks removed.\n"; );
		}
		return Result;
		}

PreservedAnalyses LoopDeletionPass::run(Loop &L, LoopAnalysisManager &AM,		PreservedAnalyses LoopDeletionPass::run(Loop &L, LoopAnalysisManager &AM,
LoopStandardAnalysisResults &AR,		LoopStandardAnalysisResults &AR,
LPMUpdater &Updater) {		LPMUpdater &Updater) {

LLVM_DEBUG(dbgs() << "Analyzing Loop for deletion: ");		LLVM_DEBUG(dbgs() << "Analyzing Loop for deletion: ");
LLVM_DEBUG(L.dump());		LLVM_DEBUG(L.dump());
std::string LoopName = std::string(L.getName());		std::string LoopName = std::string(L.getName());
// For the new PM, we can't use OptimizationRemarkEmitter as an analysis		// For the new PM, we can't use OptimizationRemarkEmitter as an analysis
// pass. Function analyses need to be preserved across loop transformations		// pass. Function analyses need to be preserved across loop transformations
// but ORE cannot be preserved (see comment before the pass definition).		// but ORE cannot be preserved (see comment before the pass definition).
OptimizationRemarkEmitter ORE(L.getHeader()->getParent());		OptimizationRemarkEmitter ORE(L.getHeader()->getParent());
auto Result = deleteLoopIfDead(&L, AR.DT, AR.SE, AR.LI, AR.MSSA, ORE);		auto Result = deleteLoopIfDead(&L, AR.DT, AR.SE, AR.LI, AR.MSSA, ORE);

// If we can prove the backedge isn't taken, just break it and be done. This		// If we can prove the backedge isn't taken, just break it and be done. This
// leaves the loop structure in place which means it can handle dispatching		// leaves the loop structure in place which means it can handle dispatching
// to the right exit based on whatever loop invariant structure remains.		// to the right exit based on whatever loop invariant structure remains.
if (Result != LoopDeletionResult::Deleted)		if (Result != LoopDeletionResult::Deleted)
Result = merge(Result, breakBackedgeIfNotTaken(&L, AR.DT, AR.SE, AR.LI,		Result = merge(Result, breakBackedgeIfNotTaken(&L, AR.DT, AR.SE, AR.LI,
AR.MSSA, ORE));		AR.MSSA, ORE));

		if (Result != LoopDeletionResult::Deleted)
		Result = merge(Result, tryInsertEarlyExit(&L, AR.DT, AR.SE, AR.LI, AR.MSSA,
		ORE));

if (Result == LoopDeletionResult::Unmodified)		if (Result == LoopDeletionResult::Unmodified)
return PreservedAnalyses::all();		return PreservedAnalyses::all();

if (Result == LoopDeletionResult::Deleted)		if (Result == LoopDeletionResult::Deleted)
Updater.markLoopAsDeleted(L, LoopName);		Updater.markLoopAsDeleted(L, LoopName);

auto PA = getLoopPassPreservedAnalyses();		auto PA = getLoopPassPreservedAnalyses();
if (AR.MSSA)		if (AR.MSSA)
[... 62 lines not shown ...]
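
The "Top"/"Bot" region computation in findNoSEBlocks() is a fixed-point iteration, and it may be easier to follow on a toy graph. The following is only a sketch under stated assumptions: `ToyBlock`, `growRegion` and `makeF9` are hypothetical names that do not exist in LLVM or in this patch, and the two boolean flags merely stand in for the results of BBHasSideEffects() and BBHasLIVCondOnDeadPath(). The sample graph is modelled on @f9 from early-exits.ll.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a basic block; not an LLVM type.
struct ToyBlock {
  std::vector<int> Succs; // successor block indices
  bool InLoop;            // block belongs to the loop
  bool HasSideEffects;    // models BBHasSideEffects(BB)
  bool InvariantCond;     // models BBHasLIVCondOnDeadPath(BB, L)
};

// Grow a side-effect-free region from the header (Forward) or from the
// latch (!Forward) until no more blocks can be added.
std::set<int> growRegion(const std::vector<ToyBlock> &CFG, int Header,
                         int Latch, bool Forward) {
  const int N = static_cast<int>(CFG.size());
  assert(Header >= 0 && Header < N && Latch >= 0 && Latch < N);

  // Same shape as the takeBlock lambda in the patch.
  auto TakeBlock = [&](int BB) {
    return !CFG[BB].HasSideEffects && (!Forward || CFG[BB].InvariantCond);
  };

  std::set<int> Region;
  const int Start = Forward ? Header : Latch;
  if (!TakeBlock(Start))
    return Region;
  Region.insert(Start);

  bool Change = true;
  while (Change) {
    Change = false;
    for (int BB = 0; BB < N; ++BB) {
      if (!CFG[BB].InLoop || Region.count(BB) || !TakeBlock(BB))
        continue;
      bool All = true;
      if (Forward) {
        // Every predecessor must already be in the region.
        for (int Pred = 0; Pred < N && All; ++Pred)
          for (int Succ : CFG[Pred].Succs)
            if (Succ == BB && !Region.count(Pred)) {
              All = false;
              break;
            }
      } else {
        // Every in-loop successor must already be in the region.
        for (int Succ : CFG[BB].Succs)
          if (CFG[Succ].InLoop && !Region.count(Succ)) {
            All = false;
            break;
          }
      }
      if (All) {
        Region.insert(BB);
        Change = true;
      }
    }
  }
  return Region;
}

// Toy version of @f9: 0=loop, 1=body, 2=body2, 3=bot, 4=latch, 5=exit.
std::vector<ToyBlock> makeF9() {
  return {
      {{1, 2}, true, false, true},  // loop: condition comes from a load
      {{4}, true, true, true},      // body: calls @foo
      {{1, 3}, true, false, true},  // body2: constant condition
      {{4}, true, false, true},     // bot: unconditional branch
      {{0, 5}, true, false, false}, // latch: condition depends on %IV
      {{}, false, false, true},     // exit: outside the loop
  };
}
```

With this graph, `growRegion(makeF9(), 0, 4, true)` yields the blocks loop, body2 and bot, and the backward run yields bot and latch. The edge body2 -> bot therefore goes from Top into Bot, which is the edge the test expects to be redirected to %exit.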

llvm/test/Transforms/LoopDeletion/early-exits.ll

This file was added.

				; RUN: opt < %s -loop-deletion -early-exit-extra -S | FileCheck %s
				;
				; Test insertion of early exits from dead paths.

				@g = external global i8

				; Known trip count. If %loop branches to %latch, the loop is dead.
				define void @f0() {
				; CHECK-LABEL: @f0(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %cmp, label %body, label %exit
				%IV = phi i64 [ 0, %entry ], [ %next, %latch ]
				%b = load i8, i8* @g
				%cmp = icmp ne i8 %b, 0
				br i1 %cmp, label %body, label %latch

				body:
				call void @foo()
				br label %latch

				latch:
				%next = add i64 %IV, 1
				%cmp2 = icmp ne i64 %IV, 128
				br i1 %cmp2, label %loop, label %exit

				exit:
				ret void
				}

				; Loop has mustprogress attribute and @Rb_tree_increment is readonly.
				; If %loop branches to %latch, the loop is dead.
				%0 = type { i32, %0, %0, %0* }
				define void @f1() {
				; CHECK-LABEL: @f1(
				entry:
				br label %loop

				loop: ; preds = %entry, %latch
				; CHECK-LABEL: loop:
				; CHECK: br i1 %i3.i3.not, label %body, label %exit
				%i2.i2 = load i8, i8* inttoptr (i64 8 to i8*)
				%0 = and i8 %i2.i2, 1
				%i3.i3.not = icmp eq i8 %0, 0
				br i1 %i3.i3.not, label %body, label %latch

				body: ; preds = %loop
				tail call void @foo()
				br label %latch

				latch: ; preds = %body, %loop
				%i2.i = tail call %0* @Rb_tree_increment() #0
				%i5.i.not = icmp eq %0* %i2.i, null
				br i1 %i5.i.not, label %exit, label %loop, !llvm.loop !1

				exit: ; preds = %latch
				ret void
				}

				; Header has side-effects
				define void @f2() {
				; CHECK-LABEL: @f2(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %cmp, label %body, label %latch
				%IV = phi i64 [ 0, %entry ], [ %next, %latch ]
				%b = load volatile i8, i8* @g
				%cmp = icmp ne i8 %b, 0
				br i1 %cmp, label %body, label %latch

				body:
				call void @foo()
				br label %latch

				latch:
				%next = add i64 %IV, 1
				%cmp2 = icmp ne i64 %IV, 128
				br i1 %cmp2, label %loop, label %exit

				exit:
				ret void
				}

				; Latch has side-effects
				define void @f3(i64* %dst) {
				; CHECK-LABEL: @f3(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %cmp, label %body, label %latch
				%IV = phi i64 [ 0, %entry ], [ %next, %latch ]
				%b = load i8, i8* @g
				%cmp = icmp ne i8 %b, 0
				br i1 %cmp, label %body, label %latch

				body:
				call void @foo()
				br label %latch

				latch:
				%next = add i64 %IV, 1
				store volatile i64 0, i64* %dst
				%cmp2 = icmp ne i64 %IV, 128
				br i1 %cmp2, label %loop, label %exit

				exit:
				ret void
				}

				; Condition is not loop-invariant
				define void @f4(i64 %src) {
				; CHECK-LABEL: @f4(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %cmp, label %body, label %latch
				%IV = phi i64 [ %src, %entry ], [ %next, %latch ]
				%cmp = icmp eq i64 %IV, 0
				br i1 %cmp, label %body, label %latch

				body:
				call void @foo()
				br label %latch

				latch:
				%next = add i64 %IV, 1
				%cmp2 = icmp ne i64 %IV, 128
				br i1 %cmp2, label %loop, label %exit

				exit:
				ret void
				}

				; Header successors: one has side-effects.
				define void @f5(i64 %src) {
				; CHECK-LABEL: @f5(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %cmp, label %exit, label %body1
				%IV = phi i64 [ %src, %entry ], [ %next, %latch ]
				%b = load i8, i8* @g
				%cmp = icmp eq i8 %b, 0
				br i1 %cmp, label %body0, label %body1

				body0:
				; CHECK-LABEL: body0:
				; CHECK: br label %exit
				br label %latch

				body1:
				; CHECK-LABEL: body1:
				; CHECK: br i1 %cmp1, label %body2, label %latch
				call void @foo()
				%cmp1 = icmp ne i8 %b, 2
				br i1 %cmp1, label %body2, label %latch

				body2:
				; CHECK-LABEL: body2:
				; CHECK: br label %latch
				call void @foo()
				br label %latch

				latch:
				%next = add i64 %IV, 1
				%cmp2 = icmp ne i64 %IV, 128
				br i1 %cmp2, label %loop, label %exit

				exit:
				ret void
				}

				; Switch instruction and multiple latches: only early-exit from those with no
				; side-effects.
				define void @f6(i64 %src) mustprogress {
				; CHECK-LABEL: @f6(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: switch i64 %src, label %exit [
				; CHECK-NEXT: i64 2, label %latch2
				; CHECK-NEXT: i64 4, label %exit
				%IV = phi i64 [ %src, %entry ], [ %next1, %latch1 ],
				[ %next2, %latch2 ], [ %next4, %latch4 ]
				switch i64 %src, label %latch1 [ i64 2, label %latch2
				i64 4, label %latch4 ]

				latch1:
				%next1 = add i64 %IV, 1
				%cmp1 = icmp ne i64 %IV, 128
				br i1 %cmp1, label %loop, label %exit

				latch2:
				call void @foo()
				%next2 = add i64 %IV, 2
				%cmp2 = icmp ne i64 %IV, 128
				br i1 %cmp2, label %loop, label %exit

				latch4:
				%next4 = add i64 %IV, 4
				%cmp4 = icmp ne i64 %IV, 128
				br i1 %cmp4, label %loop, label %exit

				exit:
				ret void
				}

				; This loop has two exits which means deleteLoopIfDead() will bail. The loop
				; can be eliminated after inserting an early exit.
				define void @f7(i1 %arg, i1 %arg2) mustprogress {
				; CHECK-LABEL: @f7(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %arg, label %exit1, label %exit0
				br i1 %arg, label %exit1, label %body

				body: ; preds = %bb1
				br i1 %arg2, label %exit0, label %latch

				exit0: ; preds = %bb2
				unreachable

				latch: ; preds = %bb2
				br label %loop

				exit1: ; preds = %bb1
				ret void
				}

				; Exiting block with an edge to a no-side-effects latch
				define void @f8(i64* %dst) {
				; CHECK-LABEL: @f8(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %cmp, label %body, label %body2
				%IV = phi i64 [ 0, %entry ], [ %next, %latch ]
				%b = load i8, i8* @g
				%cmp = icmp ne i8 %b, 0
				br i1 %cmp, label %body, label %body2

				body:
				call void @foo()
				br label %latch

				body2:
				; CHECK-LABEL: body2:
				; CHECK: br i1 false, label %latch, label %exit
				br i1 false, label %latch, label %exit

				latch:
				%next = add i64 %IV, 1
				store volatile i64 0, i64* %dst
				%cmp2 = icmp ne i64 %IV, 128
				br i1 %cmp2, label %loop, label %exit

				exit:
				ret void
				}

				; Branch from Top (not header) to a Bot (not latch)
				define void @f9() {
				; CHECK-LABEL: @f9(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %cmp, label %body, label %body2
				%IV = phi i64 [ 0, %entry ], [ %next, %latch ]
				%b = load i8, i8* @g
				%cmp = icmp ne i8 %b, 0
				br i1 %cmp, label %body, label %body2

				body:
				call void @foo()
				br label %latch

				body2:
				; CHECK-LABEL: body2:
				; CHECK: br i1 false, label %body, label %exit
				br i1 false, label %body, label %bot

				bot:
				; CHECK-LABEL: bot:
				; CHECK: br label %exit
				br label %latch

				latch:
				%next = add i64 %IV, 1
				%cmp2 = icmp ne i64 %IV, 128
				br i1 %cmp2, label %loop, label %exit

				exit:
				ret void
				}

				; Multiple exits, but only one on dead path.
				define void @f10() {
				; CHECK-LABEL: @f10(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %cmp, label %body, label %exit
				%IV = phi i64 [ 0, %entry ], [ %next, %latch ]
				%b = load i8, i8* @g
				%cmp = icmp ne i8 %b, 0
				br i1 %cmp, label %body, label %bot1

				body:
				call void @foo()
				br label %latch

				bot0:
				br i1 false, label %bot1, label %bot2

				bot1:
				; CHECK-LABEL: bot1:
				; CHECK: br i1 false, label %exit, label %exit
				br i1 false, label %latch, label %exit

				bot2:
				br i1 false, label %latch, label %exit2

				latch:
				%next = add i64 %IV, 1
				%cmp2 = icmp ne i64 %IV, 128
				br i1 %cmp2, label %loop, label %exit

				exit:
				ret void

				exit2:
				ret void
				}

				; Branches to two different blocks in Bot, two different and usable exit blocks.
				define void @f11() mustprogress {
				; CHECK-LABEL: @f11(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %cmp, label %body, label %top1
				%IV = phi i64 [ 0, %entry ], [ %next, %latch ]
				%b = load i8, i8* @g
				%cmp = icmp ne i8 %b, 0
				br i1 %cmp, label %body, label %top1

				body:
				call void @foo()
				br label %latch

				top1:
				; CHECK-LABEL: top1:
				; CHECK: br i1 false, label %exit, label %exit2
				br i1 false, label %bot1, label %bot2

				bot1:
				br i1 false, label %latch, label %exit

				bot2:
				br i1 false, label %latch, label %exit2

				latch:
				%next = add i64 %IV, 1
				br label %loop

				exit:
				ret void

				exit2:
				ret void
				}

				; %loop branches to %bot1, but not a single exit block. %bot1 branches to %bot2,
				; from which there is only one exit block.
				define void @f12() mustprogress {
				; CHECK-LABEL: @f12(
				entry:
				br label %loop

				loop:
				; CHECK-LABEL: loop:
				; CHECK: br i1 %cmp, label %body, label %bot1
				%IV = phi i64 [ 0, %entry ], [ %next, %latch ]
				%b = load i8, i8* @g
				%cmp = icmp ne i8 %b, 0
				br i1 %cmp, label %body, label %bot1

				body:
				call void @foo()
				br label %latch

				bot1:
				; CHECK-LABEL: bot1:
				; CHECK: br i1 false, label %exit2, label %exit
				br i1 false, label %bot2, label %exit

				bot2:
				br i1 false, label %latch, label %exit2

				latch:
				%next = add i64 %IV, 1
				br label %loop

				exit:
				ret void

				exit2:
				ret void
				}

				; Loop nest
				define void @f13() mustprogress {
				; CHECK-LABEL: @f13(
				entry:
				br label %loop1

				loop1:
				; CHECK-LABEL: loop1:
				; CHECK: br i1 %cmp, label %loop2.preheader, label %exit1
				%IV = phi i64 [ 0, %entry ], [ %next, %latch1 ]
				%b = load i8, i8* @g
				%cmp = icmp ne i8 %b, 0
				br i1 %cmp, label %loop2.preheader, label %latch1

				loop2.preheader:
				br label %loop2

				loop2:
				; CHECK-LABEL: loop2:
				; CHECK: br i1 false, label %body2, label %exit2
				br i1 false, label %body2, label %latch2

				body2:
				call void @foo()
				br label %latch2

				latch2:
				br i1 true, label %loop2, label %exit2

				exit2:
				br label %latch1

				latch1:
				%next = add i64 %IV, 1
				%cmp2 = icmp ne i64 %IV, 128
				br i1 %cmp2, label %loop1, label %exit1

				exit1:
				ret void
				}

				declare void @foo()
				declare %0* @Rb_tree_increment()
				attributes #0 = { nounwind readonly willreturn }
				!1 = distinct !{!1, !2, !3}
				!2 = !{!"llvm.loop.mustprogress"}
				!3 = !{!"llvm.loop.unroll.disable"}

llvm/test/Transforms/LoopDeletion/noop-loops-with-subloops.ll

	}			}

	define void @loop2_finite_but_child_is_not(i1 %c1, i1 %c2, i1 %c3) {			define void @loop2_finite_but_child_is_not(i1 %c1, i1 %c2, i1 %c3) {
	; CHECK-LABEL: @loop2_finite_but_child_is_not(			; CHECK-LABEL: @loop2_finite_but_child_is_not(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br label [[LOOP1:%.*]]			; CHECK-NEXT: br label [[LOOP1:%.*]]
	; CHECK: loop1:			; CHECK: loop1:
	; CHECK-NEXT: [[IV1:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV1_NEXT:%.*]], [[LOOP1_LATCH:%.*]] ]			; CHECK-NEXT: [[IV1:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[IV1_NEXT:%.*]], [[LOOP1_LATCH:%.*]] ]
	; CHECK-NEXT: br i1 [[C1:%.*]], label [[LOOP1_LATCH]], label [[LOOP2_PREHEADER:%.*]]			; CHECK-NEXT: br i1 [[C1:%.*]], label %exit, label [[LOOP2_PREHEADER:%.*]]
	; CHECK: loop2.preheader:			; CHECK: loop2.preheader:
	; CHECK-NEXT: br label [[LOOP2:%.*]]			; CHECK-NEXT: br label [[LOOP2:%.*]]
	; CHECK: loop2:			; CHECK: loop2:
	; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[LOOP2_LATCH:%.*]] ], [ 0, [[LOOP2_PREHEADER]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[LOOP2_LATCH:%.*]] ], [ 0, [[LOOP2_PREHEADER]] ]
	; CHECK-NEXT: br label [[LOOP3:%.*]]			; CHECK-NEXT: br label [[LOOP3:%.*]]
	; CHECK: loop3:			; CHECK: loop3:
	; CHECK-NEXT: br i1 [[C2:%.*]], label [[LOOP2_LATCH]], label [[LOOP3]]			; CHECK-NEXT: br i1 [[C2:%.*]], label [[LOOP2_LATCH]], label [[LOOP3]]
	; CHECK: loop2.latch:			; CHECK: loop2.latch: