This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Handling for ICmp occuring in the evolution chain.
ClosedPublic

Authored by jbhateja on Oct 3 2017, 12:36 AM.

Details

Summary

If a compare instruction is same or inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.

Will fix PR 34538

Diff Detail

Repository
rL LLVM

Event Timeline

jbhateja created this revision.Oct 3 2017, 12:36 AM
jbhateja edited subscribers, added: llvm-commits; removed: sanjoy.
jbhateja retitled this revision from [ScalarEvolution] Handling for ICmp occuring in then evolution chain. to [ScalarEvolution] Handling for ICmp occuring in the evolution chain..Oct 3 2017, 1:53 AM
junryoungju accepted this revision.Oct 3 2017, 3:39 AM

Looks fine.

This revision is now accepted and ready to land.Oct 3 2017, 3:39 AM
sanjoy requested changes to this revision.Oct 3 2017, 10:11 AM
sanjoy added inline comments.
lib/Analysis/ScalarEvolution.cpp
4995 โ†—(On Diff #117480)

I don't think this is correct in the last iteration of the loop when the latch will not be taken.

This revision now requires changes to proceed.Oct 3 2017, 10:11 AM
This comment was removed by junryoungju.
sanjoy added a comment.Oct 3 2017, 7:28 PM

can you explain me why? (I may I didn't understood why do we need to process without loop latch)
I thought without latch cannot be evoluted from ICmp evolution.

without latch loop have to me evoluted from previous SCEV operation. isn't it?

We may be talking past each other, but I was thinking of a situation like this or equivalent:

bool cond;
do {
  cond = icmp eq 0, 1;
  print(cond);
} while (cond);

If I understand correctly, this patch will infer that getSCEV(cond) is the SCEVConstant for true, but it is actually false.

hfinkel edited edge metadata.Oct 3 2017, 7:32 PM

can you explain me why? (I may I didn't understood why do we need to process without loop latch)
I thought without latch cannot be evoluted from ICmp evolution.

without latch loop have to me evoluted from previous SCEV operation. isn't it?

I don't understand your statement.

The point is that, in the last loop iteration, latch condition is not true, so replacing it with true is incorrect. If you have:

int values[11];
int i = 0;
do {

values[i] = i != 10;

} while (i++ != 10);

After the loop, values should be { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0 }. In the 11th iteration, the comparison will be false. With this patch, we'd store 1 into the values array, even for the last iteration.

no, If there has no latch will not execute evoluateForICmp in LoopDeletion(if I think correct)
getBackedgeTakenInfo will return SCEVUnkown correctly if there is no latch.

ah I misunderstood.

(before I explaining it, please understand that I am not really good at english)
I meant It will not be executed before it has something affects on any other variables. (check out https://bugs.llvm.org/show_bug.cgi?id=34538)
because that issue isn't target on that you meant. (issue of LoopDeletion)
in LoopDeletion, getBackedgeTakenInfo will return SCEVUnkown correctly if it affects on any other outside variable (because it can be infinite loop getBackedgeTakenInfo cannot analysis maximum backedge count)

so I mean I only thought that SCEV on LoopDeletion. I misunderstood just don't care that I commented.

int values[11];
int i = 0;
do {

values[i] = i != 10;
} while (i++ != 10);

will be removed. from LoopDeletion.

int values[11];
int i = 0;
do {

values[i] = i != 10;
} while (i++ != 10);


print(values[10]);

will not be even optimized.
(this is what I thought)

no, If there has no latch will not execute evoluateForICmp in LoopDeletion(if I think correct)
getBackedgeTakenInfo will return SCEVUnkown correctly if there is no latch.

Whether or not there's a latch is a static property of the

int values[11];
int i = 0;
do {

values[i] = i != 10;
} while (i++ != 10);

will be removed. from LoopDeletion.

int values[11];
int i = 0;
do {

values[i] = i != 10;
} while (i++ != 10);


print(values[10]);

will not be even optimized.
(this is what I thought)

Whether or not any current transformation performs the simplification is not relevant. It would be trivial to write a transformation that replace instructions whose SCEVs are constants with those constants. SCEV needs to be semantically sound on its own.

For following C and IR references.

int foo() {

int start = 0;
int end = 10000;
int tag = 0;
do {
    tag = 0;
    if (start < end) start++, tag = 1;
} while (tag);
return 0;

}

do.body: ; preds = %do.body, %entry

%start.0 = phi i32 [ 0, %entry ], [ %inc.start.0, %do.body ]
%cmp = icmp slt i32 %start.0, 10000
%inc = zext i1 %cmp to i32
%inc.start.0 = add nsw i32 %start.0, %inc
br i1 %cmp, label %do.body, label %do.end

Following is the result of SCEV Analysis with the patch

PROMPT>opt -analyze -scalar-evolution test1_preana_sgt.ll
Printing analysis 'Scalar Evolution Analysis' for function 'foo':
Classifying expressions for: @foo

%start.0 = phi i32 [ 0, %entry ], [ %inc.start.0, %do.body ]
-->  {0,+,1}<nuw><nsw><%do.body> U: [0,10001) S: [0,10001)            Exits: 10000            LoopDispositions: { %do.body: Computable }
%inc = zext i1 %cmp to i32
-->  1 U: [1,2) S: [1,2)              Exits: 1                LoopDispositions: { %do.body: Invariant }
%inc.start.0 = add nsw i32 %start.0, %inc
-->  {1,+,1}<nuw><nsw><%do.body> U: [1,10002) S: [1,10002)            Exits: 10001            LoopDispositions: { %do.body: Computable }

Determining loop execution counts for: @foo
Loop %do.body: backedge-taken count is 10000
Loop %do.body: max backedge-taken count is 10000
Loop %do.body: Predicated backedge-taken count is 10000
Predicates:

Loop %do.body: Trip multiple is 10001.

sanjoy added a comment.Oct 3 2017, 9:23 PM
int values[11];
int i = 0;
do {

values[i] = i != 10;
} while (i++ != 10);

will be removed. from LoopDeletion.

int values[11];
int i = 0;
do {

values[i] = i != 10;
} while (i++ != 10);


print(values[10]);

will not be even optimized.
(this is what I thought)

But LoopDeletion is not the only user of SCEV. E.g. see https://reviews.llvm.org/rL314266

can you explain me why? (I may I didn't understood why do we need to process without loop latch)
I thought without latch cannot be evoluted from ICmp evolution.

without latch loop have to me evoluted from previous SCEV operation. isn't it?

I don't understand your statement.

The point is that, in the last loop iteration, latch condition is not true, so replacing it with true is incorrect. If you have:

int values[11];
int i = 0;
do {

values[i] = i != 10;

} while (i++ != 10);

After the loop, values should be { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0 }. In the 11th iteration, the comparison will be false. With this patch, we'd store 1 into the values array, even for the last iteration.

Hi Hal,

I don't see any changes in output of scalar-evolution analysis over above example with and without this patch. Following are the results

Printing analysis 'Scalar Evolution Analysis' for function 'main':
Classifying expressions for: @main

%i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
-->  {0,+,1}<nuw><nsw><%do.body> U: [0,11) S: [0,11)          Exits: 10               LoopDispositions: { %do.body: Computable }
%conv = zext i1 %cmp to i32
-->  (zext i1 %cmp to i32) U: [0,2) S: [0,2)          Exits: 0                LoopDispositions: { %do.body: Variant }
%idxprom = sext i32 %i.0 to i64
-->  {0,+,1}<nuw><nsw><%do.body> U: [0,11) S: [0,11)          Exits: 10               LoopDispositions: { %do.body: Computable }
%arrayidx = getelementptr inbounds [11 x i32], [11 x i32]* @values, i64 0, i64 %idxprom
-->  {@values,+,4}<nsw><%do.body> U: [0,-3) S: [-9223372036854775808,9223372036854775805)             Exits: (40 + @values)     LoopDispositions: { %do.body: Computable }
%inc = add nsw i32 %i.0, 1
-->  {1,+,1}<nuw><nsw><%do.body> U: [1,12) S: [1,12)          Exits: 11               LoopDispositions: { %do.body: Computable }
%0 = load i32, i32* getelementptr inbounds ([11 x i32], [11 x i32]* @values, i64 0, i64 10), align 8
-->  %0 U: full-set S: full-set
%call = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str, i32 0, i32 0), i32 %0)
-->  %call U: full-set S: full-set

Determining loop execution counts for: @main
Loop %do.body: backedge-taken count is 10
Loop %do.body: max backedge-taken count is 10
Loop %do.body: Predicated backedge-taken count is 10
Predicates:

Loop %do.body: Trip multiple is 11

jbhateja added inline comments.Oct 4 2017, 2:26 AM
lib/Analysis/ScalarEvolution.cpp
4995 โ†—(On Diff #117480)

This routine is not directly evaluating any condition, in fact its returing a true or false value for comparision, client of SCEV is going to perform further analysis based on it. Can you help in to identify a test scenario where client using this will generate incorrect result, I see you have explored this area well in past.

Yes, that is what I meant,
this will not evoluate all of loops at all.

With this IR:

define void @f() {
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
  %iv.inc = add i32 %iv, 1
  %cmp = icmp ne i32 %iv.inc, 10
  %cmp.zext = zext i1 %cmp to i32
  br i1 %cmp, label %loop, label %leave

leave:
  ret void
}

and opt -analyze -scalar-evolution , on master:

Printing analysis 'Scalar Evolution Analysis' for function 'f':
Classifying expressions for: @f
  %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
  -->  {0,+,1}<%loop> U: [0,10) S: [0,10)               Exits: 9                LoopDispositions: { %loop: Computable }
  %iv.inc = add i32 %iv, 1
  -->  {1,+,1}<%loop> U: [1,11) S: [1,11)               Exits: 10               LoopDispositions: { %loop: Computable }
  %cmp.zext = zext i1 %cmp to i32
  -->  (zext i1 %cmp to i32) U: [0,2) S: [0,2)          Exits: 0                LoopDispositions: { %loop: Variant }
Determining loop execution counts for: @f
Loop %loop: backedge-taken count is 9
Loop %loop: max backedge-taken count is 9
Loop %loop: Predicated backedge-taken count is 9
 Predicates:

Loop %loop: Trip multiple is 10

and with your patch:

Printing analysis 'Scalar Evolution Analysis' for function 'f':
Classifying expressions for: @f
  %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
  -->  {0,+,1}<%loop> U: [0,10) S: [0,10)               Exits: 9                LoopDispositions: { %loop: Computable }
  %iv.inc = add i32 %iv, 1
  -->  {1,+,1}<%loop> U: [1,11) S: [1,11)               Exits: 10               LoopDispositions: { %loop: Computable }
  %cmp.zext = zext i1 %cmp to i32
  -->  1 U: [1,2) S: [1,2)              Exits: 1                LoopDispositions: { %loop: Invariant }
Determining loop execution counts for: @f
Loop %loop: backedge-taken count is 9
Loop %loop: max backedge-taken count is 9
Loop %loop: Predicated backedge-taken count is 9
 Predicates:

Loop %loop: Trip multiple is 10

Note that the SCEV for %cmp.zext is different. I'm not sure how you got a
different answer on your end, since this should basically be the same as the IR
from Hal's example. Perhaps in the example you tried the icmp feeding in to the
latch was a different llvm::Instruction than the icmp feeding into the zext? If
so, that's a missing-CSE bug.

jbhateja updated this revision to Diff 119241.Oct 16 2017, 7:28 PM
jbhateja edited edge metadata.
  • Limiting the evaluation of ICmp to computation of SCEV of PHI nodes.

With this IR:

define void @f() {
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
  %iv.inc = add i32 %iv, 1
  %cmp = icmp ne i32 %iv.inc, 10
  %cmp.zext = zext i1 %cmp to i32
  br i1 %cmp, label %loop, label %leave

leave:
  ret void
}

and opt -analyze -scalar-evolution , on master:

Printing analysis 'Scalar Evolution Analysis' for function 'f':
Classifying expressions for: @f
  %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
  -->  {0,+,1}<%loop> U: [0,10) S: [0,10)               Exits: 9                LoopDispositions: { %loop: Computable }
  %iv.inc = add i32 %iv, 1
  -->  {1,+,1}<%loop> U: [1,11) S: [1,11)               Exits: 10               LoopDispositions: { %loop: Computable }
  %cmp.zext = zext i1 %cmp to i32
  -->  (zext i1 %cmp to i32) U: [0,2) S: [0,2)          Exits: 0                LoopDispositions: { %loop: Variant }
Determining loop execution counts for: @f
Loop %loop: backedge-taken count is 9
Loop %loop: max backedge-taken count is 9
Loop %loop: Predicated backedge-taken count is 9
 Predicates:

Loop %loop: Trip multiple is 10

and with your patch:

Printing analysis 'Scalar Evolution Analysis' for function 'f':
Classifying expressions for: @f
  %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
  -->  {0,+,1}<%loop> U: [0,10) S: [0,10)               Exits: 9                LoopDispositions: { %loop: Computable }
  %iv.inc = add i32 %iv, 1
  -->  {1,+,1}<%loop> U: [1,11) S: [1,11)               Exits: 10               LoopDispositions: { %loop: Computable }
  %cmp.zext = zext i1 %cmp to i32
  -->  1 U: [1,2) S: [1,2)              Exits: 1                LoopDispositions: { %loop: Invariant }
Determining loop execution counts for: @f
Loop %loop: backedge-taken count is 9
Loop %loop: max backedge-taken count is 9
Loop %loop: Predicated backedge-taken count is 9
 Predicates:

Loop %loop: Trip multiple is 10

Note that the SCEV for %cmp.zext is different. I'm not sure how you got a
different answer on your end, since this should basically be the same as the IR
from Hal's example. Perhaps in the example you tried the icmp feeding in to the
latch was a different llvm::Instruction than the icmp feeding into the zext? If
so, that's a missing-CSE bug.

@sanjoy , with the latest revision changes following is the result of scalar evolution analysis for above example.

PROMPT>opt -analyze -scalar-evolution sdas.ll
Printing analysis 'Scalar Evolution Analysis' for function 'f':
Classifying expressions for: @f

%iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
-->  {0,+,1}<%loop> U: [0,10) S: [0,10)               Exits: 9                LoopDispositions: { %loop: Computable }
%iv.inc = add i32 %iv, 1
-->  {1,+,1}<%loop> U: [1,11) S: [1,11)               Exits: 10               LoopDispositions: { %loop: Computable }
%cmp.zext = zext i1 %cmp to i32
-->  (zext i1 %cmp to i32) U: [0,2) S: [0,2)          Exits: 0                LoopDispositions: { %loop: Variant }

Determining loop execution counts for: @f
Loop %loop: backedge-taken count is 9
Loop %loop: max backedge-taken count is 9
Loop %loop: Predicated backedge-taken count is 9
Predicates:

Loop %loop: Trip multiple is 10

junryoungju added inline comments.Oct 16 2017, 9:10 PM
lib/Analysis/ScalarEvolution.cpp
6482 โ†—(On Diff #119241)

How about using getOne, or getZero instead of getConstant?

jbhateja added inline comments.Oct 17 2017, 3:10 AM
lib/Analysis/ScalarEvolution.cpp
6482 โ†—(On Diff #119241)

Interally it calls getConstant, I shall make this change before commit.

This comment was removed by junryoungju.
junryoungju accepted this revision.Oct 17 2017, 4:12 AM

NVM All looks fine.

jbhateja updated this revision to Diff 119297.Oct 17 2017, 5:21 AM
  • Updating for a review comment.

I don't understand why is there are CG commits here....

I don't understand why is there are CG commits here....

It got mixed by mistake, any easy way to revert last diff from differential ?

Go to diff history, and download diff. and re upload it.

jbhateja updated this revision to Diff 119306.Oct 17 2017, 6:15 AM

Removing unrelated changes uploaded by mistake in previous diff.

Go to diff history, and download diff. and re upload it.

Thanks Jun!! , its very useful.

junryoungju accepted this revision.Oct 17 2017, 6:19 AM
jbhateja updated this revision to Diff 119340.Oct 17 2017, 9:01 AM

Formatting changes.
Test case extension from .txt to .ll

junryoungju accepted this revision.Oct 17 2017, 9:29 AM

I think you should apply patch and create new diff with full-context :<

jbhateja updated this revision to Diff 119418.Oct 17 2017, 6:29 PM

If a compare instruction is same or inverse of the compare in the
branch of the loop latch, then return a constant evolution node.
Currently scope of evaluation is limited to SCEV computation for
PHI nodes.

This shall facilitate computations of loop exit counts in cases
where compare appears in the evolution chain of induction variables.

Will fix PR 34538

This revision was automatically updated to reflect the committed changes.
jbhateja reopened this revision.Oct 18 2017, 7:49 PM

[llvm] r316129 - Revert "[ScalarEvolution] Handling for ICmp occuring in the evolution chain."

Reverted by r316129 by @sanjoy, shall be landed post LGTM.

sanjoy requested changes to this revision.Oct 20 2017, 10:34 AM
sanjoy added inline comments.
llvm/trunk/lib/Analysis/ScalarEvolution.cpp
4842

This is in the right direction, but I would do it a bit more generally. I'd use SCEVRewriteVisitor to rewrite any SCEVUnknown in Accum with the backedge taken condition as 0 or 1 (depending on the direction of the latch). That should handle not just zext but any general function of the latch condition.

This revision now requires changes to proceed.Oct 20 2017, 10:34 AM
jbhateja updated this revision to Diff 119765.Oct 21 2017, 10:37 AM
jbhateja edited edge metadata.

Did you checked that this fixes emits right assemblies on clang?
sometimes it doesn't optimizes even it emits correct loop-count on opt

lib/Analysis/ScalarEvolution.cpp
4092 โ†—(On Diff #119765)

It still can computable if after simplify loop forms.
using SE.getCouldNotCompute() isn't a good idea.

see what happens on end of createAddRecFromPHI method.

4789 โ†—(On Diff #119765)

Is that really evolutable if BEValue is not a AddExpr?
we cannot assume that is always AddExpr.

this cannot evolute if isn't a AddExpr.

6499 โ†—(On Diff #119765)

It isn't really evoluting ICmp, just returning its true cond or false cond.
I think we should change this method name more intuitively

jbhateja added a comment.EditedOct 23 2017, 7:55 AM

@reviewers, kindly let me know if any other comments.

lib/Analysis/ScalarEvolution.cpp
4092 โ†—(On Diff #119765)

Could not compute SCEV is returned when visitor could not compute ICmp. This is checked at client side for validating results returned by visitor. Hope I am able to put across my point.

6499 โ†—(On Diff #119765)

Well in a way we are actually evaluating ICmp by matching it against the latch condition. Shall rename to make it more intuitive to resolve your comment.

junryoungju added a comment.EditedOct 24 2017, 12:34 AM

@reviewers, kindly let me know if any other comments.

I hope just understand that I can be rude a bit, please understand I'm not really good at english. ( I'm just a korean highschool student. )

@reviewers, kindly let me know if any other comments.

I hope just understand that I can be rude a bit, please understand I'm not really good at english. ( I'm just a korean highschool student. )

No issues with your english @junryoungju.
@sanjoy let me know if your concerns have been addressed.

jbhateja retitled this revision from [ScalarEvolution] Handling for ICmp occuring in the evolution chain. to [SCEV] Handling for ICmp occuring in the evolution chain..Oct 24 2017, 1:58 AM
jbhateja updated this revision to Diff 120193.Oct 24 2017, 11:54 PM
  • Review comments resolutions.
This comment was removed by junryoungju.
sanjoy requested changes to this revision.Oct 26 2017, 2:20 PM
sanjoy added inline comments.
lib/Analysis/ScalarEvolution.cpp
4083 โ†—(On Diff #120193)

I think a better name would be BackedgeConditionFolder. Please also add a one-liner describing what it does.

4100 โ†—(On Diff #120193)

You can just use a cast<> since you've already checked isa<Instruction>.

4103 โ†—(On Diff #120193)

Not all select condition operands are icmp instructions.

I also don't think you need the condition to be an icmp instruction -- if the condition is the latch condition, then on increment expression it is going to be true.

4817 โ†—(On Diff #120193)

I think it would be simpler if you changed SCEVCmpEvaluator::rewrite to not return CouldNotCompute and always just return the rewritten value; and here did:

Ops.push_back(SCEVCmpEvaluator::rewrite(Add->getOperand(i)));
6512 โ†—(On Diff #120193)

I'd suggest putting this method in SCEVCmpEvaluator; having it on ScalarEvolution will be prone to misuse. Once you do that, you can probably make the convention a bit simpler and have it return a SCEVConstant * which may be null to indicate failure to fold.

6514 โ†—(On Diff #120193)

This comment needs to be more specific on why the transform is correct -- it isn't that the latch condition is always true, but we know it is true if the backedge is taken.

lib/Transforms/Scalar/LoopStrengthReduce.cpp
2974 โ†—(On Diff #120193)

Why do you need this change?

This revision now requires changes to proceed.Oct 26 2017, 2:20 PM
junryoungju added a comment.EditedOct 26 2017, 7:04 PM

Okay, just we need to make sure we are evoluting conditional instructions or just checking that instructions are true or false for now.

first one.
If we are "actaully" evoluting a conditional instructions. better creating SCEVConditonal, checking that is true or false when is computing backedge taken count on each AddRec step.
I think this is a correct way for handling conditional instructions.
but, this will cause SCEV jobs MUUUUCHHH SLOWER.

second one.
If we are just checking that instructions are true or false. we do not need to rewrite it, rewriting value means that we handle that value as that "actual" SCEV. (register SCEV on SCEVCallbackVH)
this mean, this is same to write this chain on to createSCEV that we did at first.
so we have to handle it as a "except" of SCEV chain. better inlining code into createAddRecFromPHI.

jbhateja updated this revision to Diff 120731.Oct 28 2017, 4:19 AM
jbhateja edited edge metadata.
  • Review comment resolutions.
sanjoy requested changes to this revision.Oct 30 2017, 12:03 AM

This is looking close to ready, just a few minor things inline.

lib/Analysis/ScalarEvolution.cpp
4092 โ†—(On Diff #120731)

Doesn't look like L can be nullptr - ScalarEvolution::createAddRecFromPHI bails out early if L is nullptr.

I think you can rewrite this to be:

if (BasicBlock *Latch = ...) {
  ...
}
4115 โ†—(On Diff #120731)

Calling getSCEV + visit seems a bit overkill -- can't you just compare I->getOperand(0) with LoopBackOnPosCond here?

How about adding an Optional<bool> isValueKnown(Value *Cond) function that returns true or false if Cond is equal to LoopBackOnPosCond else returns None, and using that function in both the cases?

4122 โ†—(On Diff #120731)

Typically LLVM style is would be putting the break; inside the curly brace block.

4137 โ†—(On Diff #120731)

A more idiomatic name for this is BackedgeTakenOnTrue.

6518 โ†—(On Diff #120731)

Whitespace damange?

test/Analysis/ScalarEvolution/pr34538.ll
13 โ†—(On Diff #120731)

I'd put these two test cases in the same file with a somewhat more descriptive file name. Perhaps you can put the PR number as the function name instead?

This revision now requires changes to proceed.Oct 30 2017, 12:03 AM

:( that I commented didn't reviewed. this is why I wanted to make other reversion.

:( that I commented didn't reviewed. this is why I wanted to make other reversion.

Are you saying that you had made the same comments as I did above? If so, that wasn't clear from what you wrote. You can help authors by:

  • Adding inline comments instead of a top level comment, knowing which part of the patch you're referring to helps comprehension
  • Being specific; instead of saying "this is very general" say "use cast instead of switch"
jbhateja updated this revision to Diff 120877.Oct 30 2017, 1:09 PM
jbhateja edited edge metadata.
  • Review comments resolution.
sanjoy requested changes to this revision.Oct 30 2017, 2:18 PM
sanjoy added inline comments.
lib/Analysis/ScalarEvolution.cpp
4085 โ†—(On Diff #120877)

I'd emphasize that this replacement is valid only in some specific cases (the backedge increment of an add recurrence).

4089 โ†—(On Diff #120877)

This can be private. Please also make it explicit.

4095 โ†—(On Diff #120877)

I think you also need to check that the false edge actually leaves the loop. If both the edges out of the conditional branch branch to the header then this rewrite is not valid.

(Please also add a test case for the above situation ^).

4102 โ†—(On Diff #120877)

You can bail out early if BackedgeCond is nullptr.

4118 โ†—(On Diff #120877)

Use getTrueValue() and getFalseValue() instead of hard-coding the operand indices.

4115 โ†—(On Diff #120731)

This comment needs to be addressed -- I don't think we should be creating SCEVUnknown unnecessarily as you're doing here.

6518 โ†—(On Diff #120731)

Wasn't fixed.

This revision now requires changes to proceed.Oct 30 2017, 2:18 PM
junryoungju added a comment.EditedOct 30 2017, 4:18 PM

:( that I commented didn't reviewed. this is why I wanted to make other reversion.

Are you saying that you had made the same comments as I did above? If so, that wasn't clear from what you wrote. You can help authors by:

  • Adding inline comments instead of a top level comment, knowing which part of the patch you're referring to helps comprehension
  • Being specific; instead of saying "this is very general" say "use cast instead of switch"

No, I wanted to say all that we are missing a very important point.

Okay, just we need to make sure we are evoluting conditional instructions or just checking that instructions are true or false for now.

first one.
If we are "actaully" evoluting a conditional instructions. better creating SCEVConditonal, checking that is true or false when is computing backedge taken count on each AddRec step.
I think this is a correct way for handling conditional instructions.
but, this will cause SCEV jobs MUUUUCHHH SLOWER.

second one.
If we are just checking that instructions are true or false. we do not need to rewrite it, rewriting value means that we handle that value as that "actual" SCEV. (register SCEV on SCEVCallbackVH)
this mean, this is same to write this chain on to createSCEV that we did at first.
so we have to handle it as a "except" of SCEV chain. better inlining code into createAddRecFromPHI.

This is the actual point. not only fixing a ICmp, this will not fix a "point" that bug reported.
and this will cause same bug, but from littlely different code.
It just fixing only that "specific" IR assembles in a very narrow range.

:( that I commented didn't reviewed. this is why I wanted to make other reversion.

Are you saying that you had made the same comments as I did above? If so, that wasn't clear from what you wrote. You can help authors by:

  • Adding inline comments instead of a top level comment, knowing which part of the patch you're referring to helps comprehension
  • Being specific; instead of saying "this is very general" say "use cast instead of switch"

No, I wanted to say all that we are missing a very important point.

Okay, just we need to make sure we are evoluting conditional instructions or just checking that instructions are true or false for now.

first one.
If we are "actaully" evoluting a conditional instructions. better creating SCEVConditonal, checking that is true or false when is computing backedge taken count on each AddRec step.
I think this is a correct way for handling conditional instructions.
but, this will cause SCEV jobs MUUUUCHHH SLOWER.

second one.
If we are just checking that instructions are true or false. we do not need to rewrite it, rewriting value means that we handle that value as that "actual" SCEV. (register SCEV on SCEVCallbackVH)
this mean, this is same to write this chain on to createSCEV that we did at first.
so we have to handle it as a "except" of SCEV chain. better inlining code into createAddRecFromPHI.

This is the actual point. not only fixing a ICmp, this will not fix a "point" that bug reported.
and this will cause same bug, but from littlely different code.
It just fixing only that "specific" IR assembles in a very narrow range.

@junryoungju, it will be good if you append to the bug report with other Patterns, I had made changes in my last revision to cover other case[s]too (I guess you added one another pattern in the mirror revision which you created from this patch) , we can generalise this or incremental addition can also be made.

The problem is, if under ICmp's operand node has a ICmp can evolutable? (or any other conditional expressions).
This only can be solved by registering this chain into getSCEV.

; RUN: opt -analyze -scalar-evolution < %s | FileCheck %s --check-prefix=CHECK-ANALYSIS

define i32 @foo() {
  br label %do_start

do_start:
  %inc = phi i32 [ 0, %0 ], [ %add, %do_start ]
  %cmp = icmp slt i32 %inc, 10000
  %add_cond = add nsw i32 %inc, 2
  %sel = select i1 %cmp, i32 %add_cond, i32 %inc
  %add = add nsw i32 %sel, 1
  br i1 %cmp, label %do_start, label %do_end

; CHECK-ANALYSIS: Loop %do_start: backedge-taken count is 3334
; CHECK-ANALYSIS: Loop %do_start: max backedge-taken count is 3334
; CHECK-ANALYSIS: Loop %do_start: Predicated backedge-taken count is 3334

do_end:
  ret i32 0
}

This case is "actaully" generated from Clang executable using -O3 -S -emit-llvm option.
that mean this case can be "always" appear.

To fix this case, we need to create SCEVConditional. that I commented here.

Okay, just we need to make sure we are evoluting conditional instructions or just checking that instructions are true or false for now.

first one.
If we are "actaully" evoluting a conditional instructions. better creating SCEVConditonal, checking that is true or false when is computing backedge taken count on each AddRec step.
I think this is a correct way for handling conditional instructions.
but, this will cause SCEV jobs MUUUUCHHH SLOWER.

second one.
If we are just checking that instructions are true or false. we do not need to rewrite it, rewriting value means that we handle that value as that "actual" SCEV. (register SCEV on SCEVCallbackVH)
this mean, this is same to write this chain on to createSCEV that we did at first.
so we have to handle it as a "except" of SCEV chain. better inlining code into createAddRecFromPHI.

and just note this issue CANNOT be solved just by checking ICmp has conditional instruction.

jbhateja updated this revision to Diff 120936.Oct 30 2017, 10:32 PM
jbhateja edited edge metadata.
  • Review comments resolution.
lib/Analysis/ScalarEvolution.cpp
4085 โ†—(On Diff #120877)

Returning a SCEV for a true or false value by checking backedge condition of the loop latch is generic theme, which is what comment states.

4089 โ†—(On Diff #120877)

Yes, infact other visitor SCEVShiftRewriter is having same problem. Will be fixed as NFCI.

4095 โ†—(On Diff #120877)

Can you give a scenario where this can happen ? A latch where both the branches have same desitination header ?

4102 โ†—(On Diff #120877)

Done

4118 โ†—(On Diff #120877)

Done.

junryoungju added inline comments.Oct 30 2017, 10:37 PM
lib/Analysis/ScalarEvolution.cpp
4789 โ†—(On Diff #119765)

This still didn't fixed.
You should try clang output to this patch emits correct optimized assemblies that commented on original bug report.

The problem is, if under ICmp's operand node has a ICmp can evolutable? (or any other conditional expressions).
This only can be solved by registering this chain into getSCEV.

; RUN: opt -analyze -scalar-evolution < %s | FileCheck %s --check-prefix=CHECK-ANALYSIS

define i32 @foo() {
  br label %do_start

do_start:
  %inc = phi i32 [ 0, %0 ], [ %add, %do_start ]
  %cmp = icmp slt i32 %inc, 10000
  %add_cond = add nsw i32 %inc, 2
  %sel = select i1 %cmp, i32 %add_cond, i32 %inc
  %add = add nsw i32 %sel, 1
  br i1 %cmp, label %do_start, label %do_end

; CHECK-ANALYSIS: Loop %do_start: backedge-taken count is 3334
; CHECK-ANALYSIS: Loop %do_start: max backedge-taken count is 3334
; CHECK-ANALYSIS: Loop %do_start: Predicated backedge-taken count is 3334

do_end:
  ret i32 0
}

This case is "actaully" generated from Clang executable using -O3 -S -emit-llvm option.
that mean this case can be "always" appear.

To fix this case, we need to create SCEVConditional. that I commented here.

Okay, just we need to make sure we are evoluting conditional instructions or just checking that instructions are true or false for now.

first one.
If we are "actaully" evoluting a conditional instructions. better creating SCEVConditonal, checking that is true or false when is computing backedge taken count on each AddRec step.
I think this is a correct way for handling conditional instructions.
but, this will cause SCEV jobs MUUUUCHHH SLOWER.

second one.
If we are just checking that instructions are true or false. we do not need to rewrite it, rewriting value means that we handle that value as that "actual" SCEV. (register SCEV on SCEVCallbackVH)
this mean, this is same to write this chain on to createSCEV that we did at first.
so we have to handle it as a "except" of SCEV chain. better inlining code into createAddRecFromPHI.

and just note this issue CANNOT be solved just by checking ICmp has conditional instruction.

Just try pulling out earlier diff : https://reviews.llvm.org/D38494?id=120193 , this was fixing your new case.
You shuld attempt an incremental fix once this patch goes through.

junryoungju added a comment.EditedOct 31 2017, 1:43 AM

The problem is, if under ICmp's operand node has a ICmp can evolutable? (or any other conditional expressions).
This only can be solved by registering this chain into getSCEV.

; RUN: opt -analyze -scalar-evolution < %s | FileCheck %s --check-prefix=CHECK-ANALYSIS

define i32 @foo() {
  br label %do_start

do_start:
  %inc = phi i32 [ 0, %0 ], [ %add, %do_start ]
  %cmp = icmp slt i32 %inc, 10000
  %add_cond = add nsw i32 %inc, 2
  %sel = select i1 %cmp, i32 %add_cond, i32 %inc
  %add = add nsw i32 %sel, 1
  br i1 %cmp, label %do_start, label %do_end

; CHECK-ANALYSIS: Loop %do_start: backedge-taken count is 3334
; CHECK-ANALYSIS: Loop %do_start: max backedge-taken count is 3334
; CHECK-ANALYSIS: Loop %do_start: Predicated backedge-taken count is 3334

do_end:
  ret i32 0
}

This case is "actaully" generated from Clang executable using -O3 -S -emit-llvm option.
that mean this case can be "always" appear.

To fix this case, we need to create SCEVConditional. that I commented here.

Okay, just we need to make sure we are evoluting conditional instructions or just checking that instructions are true or false for now.

first one.
If we are "actaully" evoluting a conditional instructions. better creating SCEVConditonal, checking that is true or false when is computing backedge taken count on each AddRec step.
I think this is a correct way for handling conditional instructions.
but, this will cause SCEV jobs MUUUUCHHH SLOWER.

second one.
If we are just checking that instructions are true or false. we do not need to rewrite it, rewriting value means that we handle that value as that "actual" SCEV. (register SCEV on SCEVCallbackVH)
this mean, this is same to write this chain on to createSCEV that we did at first.
so we have to handle it as a "except" of SCEV chain. better inlining code into createAddRecFromPHI.

and just note this issue CANNOT be solved just by checking ICmp has conditional instruction.

Just try pulling out earlier diff : https://reviews.llvm.org/D38494?id=120193 , this was fixing your new case.
You shuld attempt an incremental fix once this patch goes through.

That isn't a POINT, Please read the point that I commented.
Only that handling each test case dosn't means fixing a "point" of problem.

Is the point of this problem is handling the problem just by CHECKING that ICmp is true, or false?
OR are we REALLY wants to handle condtional instructions.
The point of problem is, current SCEV chain cannot handle conditonal expressions.
NOT only a clang, clang emits PHI node when its conditional, LLVM is a toolchain that we can create a compiler.

so I wanted to listen the opinion that "how about handling condtional variables by just creating SCEVCondtional, and comparing at computing backedge-count by visiting each step."

The problem is, if under ICmp's operand node has a ICmp can evolutable? (or any other conditional expressions).
This only can be solved by registering this chain into getSCEV.

; RUN: opt -analyze -scalar-evolution < %s | FileCheck %s --check-prefix=CHECK-ANALYSIS

define i32 @foo() {
  br label %do_start

do_start:
  %inc = phi i32 [ 0, %0 ], [ %add, %do_start ]
  %cmp = icmp slt i32 %inc, 10000
  %add_cond = add nsw i32 %inc, 2
  %sel = select i1 %cmp, i32 %add_cond, i32 %inc
  %add = add nsw i32 %sel, 1
  br i1 %cmp, label %do_start, label %do_end

; CHECK-ANALYSIS: Loop %do_start: backedge-taken count is 3334
; CHECK-ANALYSIS: Loop %do_start: max backedge-taken count is 3334
; CHECK-ANALYSIS: Loop %do_start: Predicated backedge-taken count is 3334

do_end:
  ret i32 0
}

This case is "actaully" generated from Clang executable using -O3 -S -emit-llvm option.
that mean this case can be "always" appear.

To fix this case, we need to create SCEVConditional. that I commented here.

Okay, just we need to make sure we are evoluting conditional instructions or just checking that instructions are true or false for now.

first one.
If we are "actaully" evoluting a conditional instructions. better creating SCEVConditonal, checking that is true or false when is computing backedge taken count on each AddRec step.
I think this is a correct way for handling conditional instructions.
but, this will cause SCEV jobs MUUUUCHHH SLOWER.

second one.
If we are just checking that instructions are true or false. we do not need to rewrite it, rewriting value means that we handle that value as that "actual" SCEV. (register SCEV on SCEVCallbackVH)
this mean, this is same to write this chain on to createSCEV that we did at first.
so we have to handle it as a "except" of SCEV chain. better inlining code into createAddRecFromPHI.

and just note this issue CANNOT be solved just by checking ICmp has conditional instruction.

Just try pulling out earlier diff : https://reviews.llvm.org/D38494?id=120193 , this was fixing your new case.
You shuld attempt an incremental fix once this patch goes through.

That isn't a POINT, Please read the point that I commented.
Only that handling each test case dosn't means fixing a "point" of problem.

Is the point of this problem is handling the problem just by CHECKING that ICmp is true, or false?
OR are we REALLY wants to handle condtional instructions.
The point of problem is, current SCEV chain cannot handle conditonal expressions.
NOT only a clang, clang emits PHI node when its conditional, LLVM is a toolchain that we can create a compiler.

so I wanted to listen the opinion that "how about handling condtional variables by just creating SCEVCondtional, and comparing at computing backedge-count by visiting each step."

@junryoungju, I would suggest we can do this discussion over llvm-dev mailing list. This patch is not claiming to fix all the patterns and we can incrementally add over it.

@sanjoy, all your comments have been resolved.

sanjoy requested changes to this revision.Nov 5 2017, 1:23 PM

Mostly minor stuff remains + the one correctness issue with a conditional latch branching to the header in both the true and false cases.

lib/Analysis/ScalarEvolution.cpp
4124 โ†—(On Diff #120936)

The ICmp prefix to ICmpSE is misleading, since the condition does not need to be an icmp.

4095 โ†—(On Diff #120877)

Yes

4115 โ†—(On Diff #120731)

The cast isa seems redundant here -- can't you return a value of type SCEVConstant (that may or may not be null)?

Though I'd still prefer Optional<bool> since that's more specific -- reading SCEVConstant you can't immediately tell that CondSE is either 1 or 0; but with Optional<bool> that fact will be obvious from the type system.

This revision now requires changes to proceed.Nov 5 2017, 1:23 PM
jbhateja added inline comments.Nov 5 2017, 8:28 PM
lib/Analysis/ScalarEvolution.cpp
4095 โ†—(On Diff #120877)

Yes ? @sanjoy, Can you provide a scenario/test in HLL where such situation is possible.

sanjoy added inline comments.Nov 5 2017, 8:57 PM
lib/Analysis/ScalarEvolution.cpp
4095 โ†—(On Diff #120877)

I answered "yes" to your second question -- since that's precisely the situation I'm talking about:

define void @foo(i1 %c) {
entry:
  br label %loop

loop:
  br i1 %c, label %loop, label %loop
}
jbhateja added inline comments.Nov 8 2017, 10:53 AM
lib/Analysis/ScalarEvolution.cpp
4095 โ†—(On Diff #120877)

Well, what I was checking was how can such an IR be generated ? Are we making any such check for both outgoing branches from a latch any where else in the code, because conditional branch instruction should become an unconditional branch in this scenario.

Shall revise this patch with your other comments, do add any thing else to save iteration ;-)

sanjoy added inline comments.Nov 8 2017, 11:08 AM
lib/Analysis/ScalarEvolution.cpp
4095 โ†—(On Diff #120877)

instruction should become an unconditional branch in this scenario.

Correctness of transformations cannot rely on a specific optimization having happened -- you're right that in most cases the trivial conditional branch will be folded to an unconditional one, but we can't miscompile code if that optimization didn't happen for some reason or the other.

jbhateja updated this revision to Diff 122314.Nov 9 2017, 12:56 PM
jbhateja edited edge metadata.

ping @reviewers

sanjoy accepted this revision.Nov 12 2017, 12:46 PM

Mostly minor stuff; please check in directly after addressing these.

lib/Analysis/ScalarEvolution.cpp
4099 โ†—(On Diff #122314)

It is usually customary to to put the else block in braces if the if is in braces.

4108 โ†—(On Diff #122314)

No need to change it in this patch, but this constraint isn't necessary, is it?

4149 โ†—(On Diff #122314)

I think it is better to return Optional<bool> here, and instead call getOne and getZero in the default: case in the switch on I->getOpcode() which is the only place that needs it. It will also make the case Instruction::Select: code a bit simpler.

test/Analysis/ScalarEvolution/pr34538.ll
32 โ†—(On Diff #122314)

Please also add a test case where both the latch branches branch to the loop header.

This revision is now accepted and ready to land.Nov 12 2017, 12:46 PM
jbhateja added inline comments.Nov 13 2017, 8:13 AM
lib/Analysis/ScalarEvolution.cpp
4099 โ†—(On Diff #122314)

Actually I purpusfully wrote it this way based on comments received on other patches reviewed. Shall change here following the file convention.

4108 โ†—(On Diff #122314)

Not sure, why this constraint not necssary, why do we want to cross loop boundary to explore evolution expression. If Expression is loop invariant either its a constant or its def lies outside the loop.

test/Analysis/ScalarEvolution/pr34538.ll
32 โ†—(On Diff #122314)

This will break the latch detection at header as getLoopLatch() looks at all the incoming branches and if sees more than one incoming branch from a block within loop then it returns null. Though two disjoint natural-loops can have same headers and two loop latches are possible. Adding a test will be futile here.

jbhateja updated this revision to Diff 122660.Nov 13 2017, 8:31 AM
  • Formatting change.
  • Rebasing for commit.
This revision was automatically updated to reflect the committed changes.
sanjoy added inline comments.Nov 13 2017, 9:30 AM
lib/Analysis/ScalarEvolution.cpp
4108 โ†—(On Diff #122314)

Yes, it does not help ver much if the increment amount is already a constant. But it can help e.g. change an addrec from {0,+,%arg} (%arg being a value whose def is outside the loop) to {0,+,1} which is useful.

4149 โ†—(On Diff #122314)

This wasn't addressed?

test/Analysis/ScalarEvolution/pr34538.ll
32 โ†—(On Diff #122314)

I see, that's a very good point. That means the BI->getSuccessor(0) != BI->getSuccessor(1) can be an assert instead?

jbhateja added inline comments.Nov 13 2017, 9:53 AM
lib/Analysis/ScalarEvolution.cpp
4149 โ†—(On Diff #122314)

My bad I missed it,
BTW I'm not clear how you want this and if current code is not clean.

test/Analysis/ScalarEvolution/pr34538.ll
32 โ†—(On Diff #122314)

Exaclty, Is post commit NFC for this and your proposed code ok here.

sanjoy added inline comments.Nov 23 2017, 12:32 PM
lib/Analysis/ScalarEvolution.cpp
4149 โ†—(On Diff #122314)

I was suggesting:

Optional<bool> compareWithBackedgeCondition(Value *IC) {
  return BackedgeCond == IC ? IsPositiveBECond : None;
}

and then putting the getZero() vs. getOne() in the default: case. In the Select case you would not have to call getZero() and getOne()

test/Analysis/ScalarEvolution/pr34538.ll
32 โ†—(On Diff #122314)

Yes, a post-commit NFC fixup is fine (feel free to push without review).