Page MenuHomePhabricator

[LoopDeletion] Insert an early exit from dead path in loop
Needs ReviewPublic

Authored by jonpa on Tue, Dec 22, 3:32 PM.

Details

Summary

This is applied on top of https://reviews.llvm.org/D86844 "[LoopDeletion] Allows deletion of possibly infinite side-effect free loops".

This patch handles the case where the whole loop is not dead, but only a certain constant path through it. If this path is entered, the loop is in fact dead and an early exit could be made.

Before finalizing the patch by adding tests and also handling the new PM, I would like to get some feedback as to if this is looking to be the right approach. I have seen that this handles the omnetpp function printAddressTable() the same way as GCC does, which gives a nice improvement on the benchmark (SystemZ, output disabled).

Diff Detail

Event Timeline

jonpa created this revision.Tue, Dec 22, 3:32 PM
jonpa requested review of this revision.Tue, Dec 22, 3:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptTue, Dec 22, 3:32 PM

The idea is that we have

H:
  %c = ... ; invariant wrt H and L
  br %c, L, B
B: 
  side_effects
  br L
L:
  br %x, H, Exit

Exit:
  ...

right?

One of my problems is that this is too tied to the syntax.
Another is that we are not deleting a loop. The former can be addressed later I guess.

Isn't this more related to LoopUnswitch?
We want to "unswitch the paths that are side-effect free", or something like that.

We could also make it generic by collecting the blocks that are allowed on such a path
in order to allow different CFGs. But that can be done later as well.

llvm/lib/Transforms/Scalar/LoopDeletion.cpp
149

Nit: Unrelated and misses documentation. A useful helper though, could be added to D86844 directly (if you don't mind).

290

that reminds me, we really need to make llvm.assume side-effect free... D89054

jonpa added a comment.Tue, Dec 22, 4:18 PM

The idea is that we have

H:
  %c = ... ; invariant wrt H and L
  br %c, L, B
B: 
  side_effects
  br L
L:
  br %x, H, Exit

Exit:
  ...

right?

Yes, exactly (could be more blocks than 3, though, of course).

One of my problems is that this is too tied to the syntax.

How do you mean, exactly?

Another is that we are not deleting a loop. The former can be addressed later I guess.

Isn't this more related to LoopUnswitch?
We want to "unswitch the paths that are side-effect free", or something like that.

I thought that if we cannot delete the whole loop we can effectively delete a part of it with an early exit.

LoopUnswitching only works if the condition is loop invariant currently as far as I can see, so I am not sure how simple that would be...

We could also make it generic by collecting the blocks that are allowed on such a path
in order to allow different CFGs. But that can be done later as well.

Might be worth a try...

The idea is that we have

H:
  %c = ... ; invariant wrt H and L
  br %c, L, B
B: 
  side_effects
  br L
L:
  br %x, H, Exit

Exit:
  ...

right?

Yes, exactly (could be more blocks than 3, though, of course).

One of my problems is that this is too tied to the syntax.

How do you mean, exactly?

H:
  %c1 = ... ; invariant wrt H, LazyEval, and L
  br %c1, LazyEval, B
LazyEval:  
  %c2 = ... ; invariant wrt H, LazyEval, and L
  br %c2, L, B
B: 
  side_effects
  br L
L:
  br %x, H, Exit
Exit:
   ...

I mean the above breaks the matching even though the same logic applies. I would argue that is not what we want.
(And given that %c1 and %c2 can have loads, we might not be able to undo short circuit evaluation here to reduce
it to your initial pattern.)

Isn't this more related to LoopUnswitch?
We want to "unswitch the paths that are side-effect free", or something like that.

I thought that if we cannot delete the whole loop we can effectively delete a part of it with an early exit.

We delete parts, agreed, but not a loop. Unsure if this is the right place conceptually.

LoopUnswitching only works if the condition is loop invariant currently as far as I can see, so I am not sure how simple that would be...

With that argumentation you could say LoopDeletion doesn't do this either (right now) ;)
I mean, you can simply copy this code there and it should work, right?

We could also make it generic by collecting the blocks that are allowed on such a path
in order to allow different CFGs. But that can be done later as well.

Might be worth a try...

FWIW, I my goal is to make it something driven by a generic analysis, not another pattern we match.

reames added a subscriber: reames.Tue, Dec 22, 8:32 PM

As implemented, this is a bit too specific to one particular pattern to be commitable, but I think you've got the seed of a good idea here.

I think you can generalize this as follows:

  • If all conditions contributing to control flow along the path from header to exit block are loop invariant (use SCEVs definition), then the exit is either taken or not taken equally on all iterations.
  • If all exit blocks meet the previous criteria, then the loop must either execute once or be infinite. If we can prove it's not infinite, it must execute once.
  • For a loop which executes exactly once, the backedge is dead. We can break the backedge, leave the rest of the loop unchanged, and get the result you're looking for.

As a special case of the above (which is probably all we should implement), if all loop exits dominate the backedge, SCEV will be able to compute the backedge taken count as zero. You can simply check the backedge taken count and break the backedge.

Note that my phrasing doesn't involve reasoning about side effects.

fhahn added a comment.Wed, Dec 23, 7:42 AM

Thanks for sharing the patch Jonas!

Isn't this more related to LoopUnswitch?
We want to "unswitch the paths that are side-effect free", or something like that.

I thought that if we cannot delete the whole loop we can effectively delete a part of it with an early exit.

We delete parts, agreed, but not a loop. Unsure if this is the right place conceptually.

LoopUnswitching only works if the condition is loop invariant currently as far as I can see, so I am not sure how simple that would be...

As mentioned already, I think framing this as extension to unswitching could be interesting. In a way, it is some kind of 'partial' unswitching, where only the condition of the no-sideeffect version becomes known after unswitching.

I put together an early version of that idea in D93764. It focuses primarily on conditions as in omnetpp's MACRelayUnitBase::printAddressTable, where the condition is a load of a loaded address and one the path from the false successor to header/exits has no side effects. It still an early version(not ready for review yet) which needs more tests and comments, but from looking at the impact on the test-suite (increase in number of branches unswitched), it seems that this could be viable in general.

Note that my phrasing doesn't involve reasoning about side effects.

I think the side effect part is still needed (at some point) as the use case they want to tackle has loads in the condition that might be modified by the "main body" (which is potentially never executed).

jonpa updated this revision to Diff 314473.Mon, Jan 4, 4:08 PM

With the motto of pushing things forward even if only by aiding the other related patches, I have continued to improve my patch to use as some kind of baseline for "early exit" insertions. Perhaps it can be used during development of the partial loop-unswitching to find cases to handle, or perhaps it could be used for some cases if it would reduce the burden on the other algorithm. It would be very nice if partial unswitching could handle all this instead, of course :-)

Instead of just handling the header->latch edge, now any edge from the "Header region" to the "Latch region" can be handled. The requirements are that in the header region, all conditions must be loop invariant, and there can only be one reachable exit block in the latch region from the branch target block back to header. If there is no exit block on the dead path but the loop has a unique exit block, that exit block is used for the early exit, given the mustprogress attribute.

Interestingly enough, there are now cases where this patch manages to "delete" (or "eliminate") a loop which has multiple exits. The loop structure is removed while the BBs remain, which should hopefullyl be removed later by CFG-opt. I think this is rare on benchmarks though... (see below)

I found out quickly that perhaps the hardest part of this was to update datastructures after changing the CFG in a loop... I have barely been able to build the benchmarks as it is, so I am in need of some good advice on how to update things after a loop change, for this patch to be usable. Currently I have changed things temporarily so that LI, DT, etc are recomputed after loop deletion (BTW, I found that only recomputing those analyses on master after LoopDeletion was not NFC, which surprised me... Is that a bug or expected with the aim to save compile time?)

Statistics on SPEC-17 on SystemZ:

master (patched to not preserve analyses after LoopDeletion for a fair comparison):

      7276                    loop-delete - Number of loops deleted
Only Header/Latch (like first simple version aimed to do):
      1414                    loop-delete - Number of early exits inserted
      7498                    loop-delete - Number of loops deleted
         8                    loop-delete - Number of loops eliminated (no remaining blocks)
       462                    loop-delete - Number of skipped loops (SCEV 0)
Top/Bot regions (current patch):
      2353                    loop-delete - Number of early exits inserted
      7397                    loop-delete - Number of loops deleted
         6                    loop-delete - Number of loops eliminated (no remaining blocks)
       462                    loop-delete - Number of skipped loops (SCEV 0)

A rise from ~1400 to ~2350 is a fairly nice improvement in number of early exits inserted compared to initial patch. There seem to be some difference also in number of loops deleted as reported by deleteLoopIfDead(), which I have not investigated. It seems that somehow must relate to a parent loop now not having a subloop..?

@reames: I am not quite sure how the SE->isZero() case directly relates to the detection of dead paths per my patch... I have here tried a statistic for that case, which showed that some loops could be handled by your patch, while there are still many more that can not... (see above).

@jdoerfert:

FWIW, I my goal is to make it something driven by a generic analysis, not another pattern we match.

I agree with that, I have now made the patch more general.

fhahn added a comment.Fri, Jan 8, 6:54 AM

I found out quickly that perhaps the hardest part of this was to update datastructures after changing the CFG in a loop... I have barely been able to build the benchmarks as it is, so I am in need of some good advice on how to update things after a loop change, for this patch to be usable. Currently I have changed things temporarily so that LI, DT, etc are recomputed after loop deletion (BTW, I found that only recomputing those analyses on master after LoopDeletion was not NFC, which surprised me... Is that a bug or expected with the aim to save compile time?)

Statistics on SPEC-17 on SystemZ:

Thanks for the update Jonas! It looks like the patch includes some required changes that still landed (D86844), which might impact the number of removed loops. It might be good to re-collect the statistics. I tried to collect stats with this patch on SPEC2000/SPEC2006/MultiSource for X86 with LTO, but unfortunately there have been a few crashes.

@jdoerfert:

FWIW, I my goal is to make it something driven by a generic analysis, not another pattern we match.

I agree with that, I have now made the patch more general.

I think both this patch and D93764 require very similar analysis to find 'no-op' paths through loops (with the difference that partial unswitching can allow stores to memory that does not clobber the condition). Do you think it would be worth unifying the analysis code?

jonpa added a comment.Fri, Jan 8, 12:29 PM

I found out quickly that perhaps the hardest part of this was to update datastructures after changing the CFG in a loop... I have barely been able to build the benchmarks as it is, so I am in need of some good advice on how to update things after a loop change, for this patch to be usable. Currently I have changed things temporarily so that LI, DT, etc are recomputed after loop deletion (BTW, I found that only recomputing those analyses on master after LoopDeletion was not NFC, which surprised me... Is that a bug or expected with the aim to save compile time?)

Statistics on SPEC-17 on SystemZ:

Thanks for the update Jonas! It looks like the patch includes some required changes that still landed (D86844), which might impact the number of removed loops. It might be good to re-collect the statistics. I tried to collect stats with this patch on SPEC2000/SPEC2006/MultiSource for X86 with LTO, but unfortunately there have been a few crashes.

Sorry to hear about the crashes, hopefully it can all be fixed... I rebased the patch and did a new run:

master (patched to not preserve analyses after LoopDeletion for a fair comparison):

      7557                    loop-delete - Number of loops deleted
current patch:
      2352                    loop-delete - Number of early exits inserted
      7420                    loop-delete - Number of loops deleted
         6                    loop-delete - Number of loops eliminated (no remaining blocks)
       439                    loop-delete - Number of skipped loops (SCEV 0)

@jdoerfert:

FWIW, I my goal is to make it something driven by a generic analysis, not another pattern we match.

I agree with that, I have now made the patch more general.

I think both this patch and D93764 require very similar analysis to find 'no-op' paths through loops (with the difference that partial unswitching can allow stores to memory that does not clobber the condition). Do you think it would be worth unifying the analysis code?

To me it depends on what your goal is with the partial unswitching - do you mean to extend your patch to include multiple partially invariant conditions in the header region? (Or is that already a side-effect of revisiting the new loop? If so, there is no need to compute the "Top-region" like this patch does.). I guess we could perhaps have a common function that finds a unique exit block from SuccBB back to header and collect the memory references as well on that path...

Do you think partial unswitching will be able eventually to handle all the early-exit cases? I haven't tried this patch yet on top of your patch, but that will be interesting :-)

jonpa updated this revision to Diff 315487.Fri, Jan 8, 12:32 PM

patch rebased

jdoerfert added inline comments.Fri, Jan 8, 2:33 PM
llvm/lib/Transforms/Scalar/LoopDeletion.cpp
110

DriveBy:

This hurts. I know it was there before but auto &I is BasicBlock *... argh.
Could we please change that if we commit this to a auto * at least with a sensible variable name.

jonpa added inline comments.Sat, Jan 9, 1:42 PM
llvm/lib/Transforms/Scalar/LoopDeletion.cpp
110

I think it's better in that case perhaps to just commit such an NFC change separately beforehand?