This is an archive of the discontinued LLVM Phabricator instance.

[IndVarSimplify] Extend trip count instead of truncating IV in LFTR, when legal
ClosedPublic

Authored by amehsan on Aug 2 2016, 12:07 PM.

Details

Summary

When legal, extending trip count in the loop control logic generates better code compared to truncating IV. This is because

(1) extending trip count is a loop invariant operation (see genLoopLimit where we prove trip count is loop invariant).
(2) Scalar Evolution seems to have problems understanding trunc when computing loop trip count. So removing them allows better analysis performed in Scalar Evolution. (In particular this fixes PR 28363 which is the motivation for this change).

I am not going to perform any performance test. Any degradation caused by this should be an indication of a bug elsewhere.

To prove legality, we rely on SCEV to prove zext(trunc(IV)) == IV (or similarly for sext). If this holds, we can prove equivalence of trunc(IV)==ExitCnt (1) and IV == zext(ExitCnt). Simply take zext of boths sides of (1) and apply the proven equivalence.

Diff Detail

Event Timeline

amehsan updated this revision to Diff 66516.Aug 2 2016, 12:07 PM
amehsan retitled this revision from to [IndVarSimplify] Extend trip count instead of truncating IV in LFTR, when original IV does not overflow.
amehsan updated this object.
amehsan added reviewers: sanjoy, hfinkel.
amehsan added a subscriber: llvm-commits.
mehdi_amini added inline comments.
test/Transforms/IndVarSimplify/elim-extend.ll
1

I think we try to avoid using "grep" in test, existing usage are legacy AFAIK.

amehsan updated this revision to Diff 66595.Aug 2 2016, 4:43 PM

Some changes in usage of ValueMap. Also fixed an incorrect "cleanup". Will address Mehdi's comment separately in another revision.

amehsan added inline comments.Aug 2 2016, 4:44 PM
test/Transforms/IndVarSimplify/elim-extend.ll
1

Sure, will fix the first two test cases. For the other two, given the entire test is based on grep, I think I have to keep it.

sanjoy requested changes to this revision.Aug 3 2016, 12:47 PM
sanjoy edited edge metadata.
sanjoy added inline comments.
lib/Transforms/Scalar/IndVarSimplify.cpp
100

Why do you need this constructor?

104

Why not have the LHS be an AssertingVH?

Actually, will keeping a raw pointer (or an AssertingVH) work if an IV is widened twice?

1412

Use make_unique here.

1987–2016

s/extended/Extended/ to follow the coding conventions

1990

Not sure if you need the braces around Iter->second.

1996

I think there is a way to make this stateless (though I don't know if SCEV is smart enough to make that work viably for all the cases you care about): you were going to emit a backedge condition of trunc(IV) == ExitCnt. (Please double check this logic) since sext and zext are one-to-one, this is equivalent to both zext(trunc(IV)) == zext(ExitCnt) and sext(trunc(IV)) == sext(ExitCnt).

zext(trunc(IV)) == zext(ExitCnt) is profitable if zext(trunc(IV)) == IV (in which case the RHS is zext(ExitCnt)) and sext(trunc(IV)) == sext(ExitCnt) is profitable if sext(trunc(IV)) == IV (in which case the RHS is sext(ExitCnt)). You can check these equivalences (i.e. exit(trunc(T)) == T) using SCEV.

I think this will be a better solution overall, if it works. :)

This revision now requires changes to proceed.Aug 3 2016, 12:47 PM
amehsan added inline comments.Aug 3 2016, 1:13 PM
lib/Transforms/Scalar/IndVarSimplify.cpp
1996

I agree this is better. Will try it to see if it works.

amehsan added inline comments.Aug 3 2016, 3:33 PM
lib/Transforms/Scalar/IndVarSimplify.cpp
100

It is leftover from an earlier version of the code. Will remove.

104

My understanding is that ValueMap handles Value pointers properly if the value is deleted or RAUW'ed. I am not entirely sure that the default behavior when an IV is widened twice is what we actually want, but playing with a number of simple loops, I cannot generate an example where IV is widened twice.

I think if we come across such a loop and we care about it, what we need to do is to change the default behavior of ValueMap for RAUW (and/or delete). The main thing to worry about will be what is the OriginalType that we store for it. So if we actually have a case that this has to be handled, then we do not need to change this data structure. We may need to customize the ValueMapConfig.

1412

OK

1987–2016

OK

1990

Yes. Will remove.

1996

IIUC we cannot make it entirely stateless. If I have an unsigned i32 IV, sext(trunc(IV)) != IV. (think of when the IV goes above max signed i32). So at least I need to record the value of IsSigned and then we need the ValueMap. It would have been nice if we could just ask SCEV to do the analysis for us here to decide whether we should trunc iv or extend trip count.

Another question that I have about the suggested approach is compile time. Is it possible that we have cases for which SCEV analsysis is expensive and so asking for another SCEV analysis is something that we prefer to avoid?

sanjoy added inline comments.Aug 3 2016, 4:01 PM
lib/Transforms/Scalar/IndVarSimplify.cpp
104

Sorry, this is my fault. I mistook the ValueMap for a DenseMap. I think what you have here is fine for now, modulo my comments on not keeping any state at all.

1996

IIUC we cannot make it entirely stateless. If I have an unsigned i32 IV, sext(trunc(IV)) != IV. (think of when the IV goes above max signed i32). So at least I need to record the value of IsSigned and then we need the ValueMap. It would have been nice if we could just ask SCEV to do the analysis for us here to decide whether we should trunc iv or extend trip count.

I didn't mean to say that we can unconditionally assume sext(trunc(IV)) == IV -- we'll have to ask SCEV to re-prove that, like:

A = getSCEV(IV);
B = getSext(getTrunc(A))
if (A == B) then signed;
C = getZext(getTrunc(A))
if (A == C) then unsigned;

The nice thing here is that the optimization becomes orthogonal to the "history" of what indvars did. Otherwise we end up in odd situations where indvars can simplify a certain pattern if *it* generated it, but can't simplify it if the exact same IR pattern came directly from the user. Does this make sense?

Another question that I have about the suggested approach is compile time. Is it possible that we have cases for which SCEV analsysis is expensive and so asking for another SCEV analysis is something that we prefer to avoid?

Yes, there is a potential compile-time issue here. I don't think it should be material though (and if it is, my first action would be try to make SCEV faster, not avoid doing the more right thing in indvars).

amehsan added inline comments.Aug 4 2016, 10:57 AM
lib/Transforms/Scalar/IndVarSimplify.cpp
1996

I noticed this before, but I misinterpreted something. The right solution might be something else. After simplifyAndExtend the loop looks exactly as we want.

for.body.preheader:                               ; preds = %entry
  %0 = zext i32 %m to i64

for.body:                                         ; preds = %for.body.preheader, %for.body
  %indvars.iv = phi i64 [ 500, %for.body.preheader ], [ %indvars.iv.next, %for.body ]
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %inc = add nuw i32 %i.011, 1
  %cmp = icmp ult i64 %indvars.iv.next, %0

One simple problem is that in IndVarSimplify::run the following call

const SCEV *BackedgeTakenCount = SE->getBackedgeTakenCount(L);

is before simplifyAndExtend. This order can change, but even when I do that the BackEdgeCount returned by SCEV is 'i32 %m'. What is the reason for that? Maybe we need to make sure the BackEdgeCount is an i64 expression (here %0) and then the trunc will not be generated.

amehsan added inline comments.Aug 4 2016, 11:18 AM
lib/Transforms/Scalar/IndVarSimplify.cpp
1996

Never mind. That is due to caching.

amehsan updated this revision to Diff 67013.Aug 5 2016, 1:59 PM
amehsan edited edge metadata.

As Sanjoy proposed, I changed the implementation to rely on SCEV to prove legality of extending trip count.

sanjoy requested changes to this revision.Aug 5 2016, 2:15 PM
sanjoy edited edge metadata.

Code-wise only minor comments, otherwise the code lgtm.

Test-wise, you really need a few specific test cases for the code you've added (and one or two negative tests). Even if the code is being exercised as part of other tests, it is nicer to have commits come with test cases that document what new functionality was added.

You also need to not use grep in the test cases. Note: I've fixed all of the IndVarSimplify tests to not use grep.

lib/Transforms/Scalar/IndVarSimplify.cpp
1987–2016

I'd s/Extended/ExtendedIV/

Also add a one-line comment on what you're trying to do: instead of truncating the IV, try to extend the exit count, if legal.

1989

I'd s/ZExTrunc/ZExtTrunc/

2010

No braces needed here.

This revision now requires changes to proceed.Aug 5 2016, 2:15 PM

Code-wise only minor comments, otherwise the code lgtm.

Test-wise, you really need a few specific test cases for the code you've added (and one or two negative tests). Even if the code is being exercised as part of other tests, it is nicer to have commits come with test cases that document what new functionality was added.

Yes, I meant to add testcases, but I forgot. Will add.

You also need to not use grep in the test cases. Note: I've fixed all of the IndVarSimplify tests to not use grep.

OK. I did not realize there is a recent change in the testcases. Will modify that.

amehsan updated this revision to Diff 67163.EditedAug 8 2016, 7:32 AM
amehsan edited edge metadata.

added new testcases.

With the new implementation, (that we first try to prove legality of zext and if it fails we use sext) two of the testcases no longer fail. So I have excluded those changes. Most minor comments on the code was also applied, except s/Extended/ExtendedIV/ as we are extending trip count not IV, and I thought ExtendedTripCount is too long a variable name without carrying much information given the comments in the code and overall code size for this logic.

sanjoy requested changes to this revision.Aug 8 2016, 10:42 AM
sanjoy edited edge metadata.

One minor comment on the code, a couple of nits on the test cases.

lib/Transforms/Scalar/IndVarSimplify.cpp
1987–2016

Please wrap this comment to 80 columns. Actually, I'd be just a little bit more descriptive here about the rationale -- since A == B == sext(A) == sext(B) == zext(A) == zext(B), we choose one of these encodings of the equality check if we already have a widened version of the induction variable with us.

2012

No braces needed here.

test/Transforms/IndVarSimplify/lftr-wide-trip-count.ll
7

Are the noalias, nocapture, readnone, readonly argument attributes necessary? If not, I'd just drop them to get a more targeted test case.

22

Please use -instnamer on these to remove the %0 variable names. Otherwise these tests will become difficult to edit later.

80

Missing CHECK: s?

84

Looks like %a is unused here and below?

157

Missing CHECK: s?

This revision now requires changes to proceed.Aug 8 2016, 10:42 AM
sanjoy added inline comments.Aug 8 2016, 10:45 AM
test/Transforms/IndVarSimplify/lftr-wide-trip-count.ll
82

Wherever pertinent, can you also please check the full icmp ne expression (that is, not just the prefix)? Since that's the whole change basically (changing what icmp expression we generate) I think we should be rigorous there.

amehsan updated this revision to Diff 67626.Aug 10 2016, 4:18 PM
amehsan edited edge metadata.

Comments applied. Bootstrap test passed.

sanjoy accepted this revision.Aug 10 2016, 4:45 PM
sanjoy edited edge metadata.

lgtm, thanks!

This revision is now accepted and ready to land.Aug 10 2016, 4:45 PM
amehsan retitled this revision from [IndVarSimplify] Extend trip count instead of truncating IV in LFTR, when original IV does not overflow to [IndVarSimplify] Extend trip count instead of truncating IV in LFTR, when legal.Aug 11 2016, 6:05 AM
amehsan updated this object.
amehsan edited edge metadata.
amehsan closed this revision.Aug 11 2016, 7:01 AM

Committed revision 278334.