This is an archive of the discontinued LLVM Phabricator instance.

Enable unrolling of multi-exit loops
ClosedPublic

Authored by meheff on Sep 30 2014, 2:35 PM.

Details

Reviewers
atrick
jingyue
Summary

This patch de-pessimizes the calculation of loop trip counts in ScalarEvolution in the presence of multiple exits. Previously all loops exits had to have identical counts for a loop trip count to be considered computable. This pessimization was implemented by calling getBackedgeTakenCount(L) rather than getExitCount(L, ExitingBlock) inside of ScalarEvolution::
getSmallConstantTripCount() (see the FIXME in the comments of that function). The pessimization was added to fix a corner case involving undefined behavior (pr/16130). This patch more precisely handles the undefined behavior case allowing the pessimization to be removed.

ControlsExit replaces IsSubExpr to more precisely track the case where undefined behavior is expected to occur. Because undefined behavior is tracked more precisely we can remove MustExit from ExitLimit. MustExit was used to track the case where the limit was computed potentially assuming undefined behavior even if undefined behavior didn't necessarily occur.

Diff Detail

Event Timeline

meheff updated this revision to Diff 14248.Sep 30 2014, 2:35 PM
meheff retitled this revision from to Enable unrolling of multi-exit loops.
meheff updated this object.
meheff edited the test plan for this revision. (Show Details)
meheff added reviewers: atrick, jingyue.
meheff added a subscriber: Unknown Object (MLST).

The review diff tool doesn't do a very good job of presenting it, but a big chunk of the patch is moving the class SCEVDivision and a few static functions earlier in the file so they can be used in ScalarEvolution::HowFarToZero.

jingyue edited edge metadata.Sep 30 2014, 9:17 PM

LGTM. You may want to mention your change to MustExit and ControlsExit in the description. Thanks for working on this!

lib/Analysis/ScalarEvolution.cpp
4721–4722

Since MustExit is gone, comments need to be updated.

6033–6034

Can we use getUDivExactExpr when R is zero?

7103–7104

Looks like the diff tool is not doing a good job :) Are the remaining differences in this file simply copy and paste?

meheff updated this object.Oct 2 2014, 10:15 AM
meheff edited edge metadata.
meheff added inline comments.
lib/Analysis/ScalarEvolution.cpp
4721–4722

Done.

6033–6034

Done.

7103–7104

Yeah, there is a big cut and paste. Evidently there is some very similar code elsewhere in the file which is confusing the diff.

meheff updated this revision to Diff 14335.Oct 2 2014, 10:25 AM
meheff added a comment.Oct 3 2014, 9:59 AM

Hi Andrew,

Do you mind having a look?

Thanks!
Mark

atrick requested changes to this revision.Oct 7 2014, 11:03 PM
atrick edited edge metadata.

Sorry for the delay reviewing--I had to be careful with this one, and the diff is unclear. It does look like great improvement and it's about time to move on to full-fledge support for multi-exit loops.

With this change, ComputeExitLimit design loses some information. We cannot as precisely determine the trip count of a particular exit given knowledge of NSW. But I see now that information was probably not very useful. The idea of conditionally considering NW/NSW based on the loop structure makes the code simpler and the API safer. As a result, the loop unroller can actually be more aggressive without adding any complexity. So I think this is a great change.

One thing I'm concerned about though is that I think you're using SCEVDivision for the first time in the standard LLVM pipeline (outside of the delinearizer). SCEVDivision looks like it recurses over the expression operands. Every time someone does this, we eventually have to track down some case that results in pathological compile time. Can you prove that this recursion will never visit the same expression twice? If not, I think you need to add a visited set as we do elsewhere, or find another way to check for exact division.

This revision now requires changes to proceed.Oct 7 2014, 11:03 PM

Thanks for the review.

In D5550#10, @atrick wrote:

Sorry for the delay reviewing--I had to be careful with this one, and the diff is unclear. It does look like great improvement and it's about time to move on to full-fledge support for multi-exit loops.

No problem on the latency. It took me a long time to reason through all the code and the change ;-)

With this change, ComputeExitLimit design loses some information. We cannot as precisely determine the trip count of a particular exit given knowledge of NSW. But I see now that information was probably not very useful. The idea of conditionally considering NW/NSW based on the loop structure makes the code simpler and the API safer. As a result, the loop unroller can actually be more aggressive without adding any complexity. So I think this is a great change.

One interesting thing about the NSW case where the condition is "missed": InstCombine catches some of these and transforms them to infinite loops. For example, the following gets transformed to an infinite loop:

;; run with: opt -instcombine
declare void @bar(...) #1
define void @test() {
entry:

br label %loop

loop:

%i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
call void (...)* @bar()
%i.next = add nsw i32 %i, 2
%t = icmp ne i32 %i.next, 5
br i1 %t, label %loop, label %exit

exit:

ret void

}

InstCombine does this by looking at known bit values of the two sides of the icmp ne. However, if loop unrolling runs before instcombine the loop will be unrolled completely with a trip count of 2. If you change the increment value from 2 to 3, then instcombine can't reason about the bits and won't transform it to an infinite loop. Given this inconsistency (not surprising given that we're talking about undefined behavior), I'd also be happy just returning couldNotCompute in cases where the equality condition misses with nsw. This would simplify the code further by eliminating the ControlsExit parameter in the various functions. I wrote the code as is to match existing behavior. Let me know if you think this further simplification and change in behavior is a good idea.

One thing I'm concerned about though is that I think you're using SCEVDivision for the first time in the standard LLVM pipeline (outside of the delinearizer). SCEVDivision looks like it recurses over the expression operands. Every time someone does this, we eventually have to track down some case that results in pathological compile time. Can you prove that this recursion will never visit the same expression twice? If not, I think you need to add a visited set as we do elsewhere, or find another way to check for exact division.

I'll dive in a get back shortly on this...

meheff added a comment.Oct 8 2014, 2:16 PM
In D5550#10, @atrick wrote:

One thing I'm concerned about though is that I think you're using SCEVDivision for the first time in the standard LLVM pipeline (outside of the delinearizer). SCEVDivision looks like it recurses over the expression operands. Every time someone does this, we eventually have to track down some case that results in pathological compile time. Can you prove that this recursion will never visit the same expression twice? If not, I think you need to add a visited set as we do elsewhere, or find another way to check for exact division.

Is using SCEVDivision fundamentally different than getUDivExpr which is used now? getUDivExpr recurses over expression operands just like SCEVDivision.

Mark

atrick accepted this revision.Oct 8 2014, 4:31 PM
atrick edited edge metadata.

I though getUDivExpr was less general. Looking at it now it seems to recurse in roughly the same situations as SCEVDivision, with no memoization. So I think it's ok to bring in SCEVDivision. I can't argue that will create a new problem.

This revision is now accepted and ready to land.Oct 8 2014, 4:31 PM

Just fyi, I ran SPEC to see if there was any performance change. Performance difference was in the noise.

Commited r219517.

Andrew, btw, you can probably close "rdar:14038809 [SCEV]: Optimize trip count computation for multi-exit loops.".

Mark

meheff closed this revision.Oct 10 2014, 10:54 AM