Page MenuHomePhabricator

[ScalarEvolution] Handling Conditional Instruction in SCEV chain.
AbandonedPublic

Authored by junryoungju on Sep 9 2017, 2:54 AM.

Details

Summary

this code, only evolaute node that cannot be resolved as SCEVAddRec.
so now emits correct Loop Exits and Disposition with following IR.

define void @f() {
entry:
  br label %loop

loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
  %iv.inc = add i32 %iv, 1
  %cmp = icmp ne i32 %iv.inc, 10
  %cmp.zext = zext i1 %cmp to i32
  br i1 %cmp, label %loop, label %leave

leave:
  ret void
}
Printing analysis 'Scalar Evolution Analysis' for function 'f':
Classifying expressions for: @f
  %iv = phi i32 [ 0, %entry ], [ %iv.inc, %loop ]
  -->  {0,+,1}<%loop> U: [0,10) S: [0,10)               Exits: 9                LoopDispositions: { %loop: Computable }
  %iv.inc = add i32 %iv, 1
  -->  {1,+,1}<%loop> U: [1,11) S: [1,11)               Exits: 10               LoopDispositions: { %loop: Computable }
  %cmp.zext = zext i1 %cmp to i32
  -->  (zext i1 %cmp to i32) U: [0,2) S: [0,2)          Exits: 0                LoopDispositions: { %loop: Variant }
Determining loop execution counts for: @f
Loop %loop: backedge-taken count is 9
Loop %loop: max backedge-taken count is 9
Loop %loop: Predicated backedge-taken count is 9
 Predicates:

Loop %loop: Trip multiple is 10

Diff Detail

Event Timeline

junryoungju created this revision.Sep 9 2017, 2:54 AM
junryoungju planned changes to this revision.Sep 11 2017, 12:43 AM
This comment was removed by junryoungju.
junryoungju retitled this revision from [DAG][X86] Fixing X86 DAG Selecting that if BLENDPD instruction has 1 mask and target architecture doesn't suffer from partial register stalls replace BLENDPD to MOVSD to [ScalarEvolution] Handling for ICmp occuring in the evolution chain..
junryoungju edited the summary of this revision. (Show Details)
junryoungju added reviewers: sanjoy, hfinkel, jbhateja.
junryoungju added subscribers: sanjoy, hfinkel, jbhateja.

Lol I just did reversion old patch.
just ignore old comments or patch.

And here is emitted assemblies with following code. ( clang -O3 -S )

int foo() {
    int start = 0;
    int end = 10000;
    int tag = 0;
    do {
        tag = 0;
        if (start < end) 
        {
            start++;
            tag = 1;
        }
    } while (tag);
    return 0;
}

patched version.

	.text
	.def	 "?foo@@YAHXZ";
	.scl	2;
	.type	32;
	.endef
	.globl	"?foo@@YAHXZ"           # -- Begin function ?foo@@YAHXZ
	.p2align	4, 0x90
"?foo@@YAHXZ":                          # @"\01?foo@@YAHXZ"
# BB#0:                                 # %entry
	xorl	%eax, %eax
	retq
                                        # -- End function

with godbolt.org compiler.
https://godbolt.org/g/QM4Sbx

foo(): # @foo()
  xor eax, eax
.LBB0_1: # =>This Inner Loop Header: Depth=1
  xor ecx, ecx
  cmp eax, 10000
  setl cl
  add ecx, eax
  cmp eax, 10000
  mov eax, ecx
  jl .LBB0_1
  xor eax, eax
  ret
hfinkel edited edge metadata.Oct 11 2017, 6:50 AM

Please upload a patch with full context.

junryoungju added a comment.EditedOct 11 2017, 4:52 PM

I don't understand what you mean D:
does it meaning do I need to upload raw source code?

sanjoy edited edge metadata.Oct 11 2017, 5:02 PM

I don't understand what you mean D:
does it meaning do I need to upload raw source code?

See https://llvm.org/docs/Phabricator.html#id3

It isn't also obvious to me how this new version fixes the issue I pointed out in the previous revision -- please mention that in the next version.

I went through your example, and I think I know what you need to do -- while you cannot assume that a condition feeding into the loop latch is true generally, you can assume it is true *in the context* of a add recurrence increment. That is, in createAddRecFromPHI, under // Create an add with everything but the specified operand., if an operand to the addition is a loop variant value that is the loop latch condition then you can assume said loop variant value to be the constant 1.

junryoungju added a comment.EditedOct 11 2017, 6:23 PM

I went through your example, and I think I know what you need to do -- while you cannot assume that a condition feeding into the loop latch is true generally, you can assume it is true *in the context* of a add recurrence increment. That is, in createAddRecFromPHI, under // Create an add with everything but the specified operand., if an operand to the addition is a loop variant value that is the loop latch condition then you can assume said loop variant value to be the constant 1.

Yes, but. if I understood correctly, isn't it assuming from context seems does not match with SCEV's concept?
It will just analyze all operand, not evoluting child operand nodes.

so I was tried to evolute "Latch is taken, but cannot be evoluted as SCEVAddRecExpr" generally that emitted as "SCEVUnknown" operand with ICmp.
this mean, this will not evolute general ICmp instruction that has correct SCEV node that resolved as SCEVAddRecExpr operand.

jbhateja added inline comments.Oct 12 2017, 8:11 AM
lib/Analysis/ScalarEvolution.cpp
6127

Hi,

As per comments over reivew https://reviews.llvm.org/D38494.

Even if we fine tune the condition for calling evaluateForICmp we shall still be polluting the scalar evolution of zext (please refer to D38494 to test example). Also the LoopDisposition for zext will now show in-variant where as its variant as in last iteration value to zext goes low i.e. false.

One full proof method [ not sure if optimal, do suggest better options ] could be to reduce the trip count of loop by one ie. make its range from start to end-1 and copy to contents of loop outside the loop for last iteation of loop. This shall maintain the sanity of scalar evolution analysis since now evaluateForICmp can return TRUE when compare condition matches with the latch condition and any cline using this info shall compute correct LoopDispositions.

jbhateja edited edge metadata.Oct 12 2017, 8:14 AM
junryoungju added a comment.EditedOct 12 2017, 6:05 PM

I went through your example, and I think I know what you need to do -- while you cannot assume that a condition feeding into the loop latch is true generally, you can assume it is true *in the context* of a add recurrence increment. That is, in createAddRecFromPHI, under // Create an add with everything but the specified operand., if an operand to the addition is a loop variant value that is the loop latch condition then you can assume said loop variant value to be the constant 1.

Yes, but. if I understood correctly, isn't it assuming from context seems does not match with SCEV's concept?
It will just analyze all operand, not evoluting child operand nodes.

so I was tried to evolute "Latch is taken, but cannot be evoluted as SCEVAddRecExpr" generally that emitted as "SCEVUnknown" operand with ICmp.
this mean, this will not evolute general ICmp instruction that has correct SCEV node that resolved as SCEVAddRecExpr operand.

But, how do I get SCEVAddExpr before we end evoluting PHI Node.
It will be like.
PHI node -> SCEVAddExpr -> SCEVZeroExtendExpr -> ICmp -> PHI node -> ..... -> infinite loop.
to compare AddExpr with condition of terminator.

ICmpInst ICI = dyn_cast<U>;

Value* LHS = ICI->getOperand(0); // << This will constant.
Value* RHS = ICI->getOperand(1); // << This will PHI node.

So using getSCEV to get SCEVAddExpr on RHS will make infinite loop. because its absolutly called by getSCEV -> createSCEV -> getZeroExtendExpr
See what it happens on call stack.

 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6114	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 5842	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6178	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6114	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 5842	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6178	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6114	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 5842	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6178	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6114	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 5842	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6178	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6114	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 5842	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6178	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 6114	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::getSCEV(llvm::Value * V) 줄 3773	C++	기호가 로드되었습니다.
 	clang.exe!llvm::ScalarEvolution::createSCEV(llvm::Value * V) 줄 5842	C++	기호가 로드되었습니다.
......

This is being wrong, we have to analyze it not by evoluting any other operands or expressions.

Now check that AddExpr on ICmp's PHI Node modifies ICmp's comparer operand.

sanjoy requested changes to this revision.Oct 12 2017, 10:41 PM
sanjoy added inline comments.
lib/Analysis/ScalarEvolution.cpp
6127

Yes, I don't think being increasingly clever on getSCEV(icmp ...) is the right path forward. The fix needs to happen in createAddRecFromPHI -- at https://github.com/llvm-mirror/llvm/blob/c88052663b093c458a1645279983e0d50e3fdb0b/lib/Analysis/ScalarEvolution.cpp#L4745 you need to check if the operand to the add expression is based on (e.g. is a zext or sext) the backedge taken cmp*. If it is, you can replace said operand with 1 when constructing the add recurrence.

  • In this specific case, you can look through the SCEVZextExpr and the SCEVUnknownExpr and pull out the icmp.
This revision now requires changes to proceed.Oct 12 2017, 10:41 PM

so that I have to change this code to emit SCEVAddRecExpr on createAddRecFromPHI right?
not evoluting SCEVUnknown.

current trunk keep build fails D: I ill upload when its available.

junryoungju added a comment.EditedOct 14 2017, 6:08 PM

I don't understand what is happening right now D:

opt --scalar-evolution --loop-deletion --analyze -S with follow IR.

; RUN: opt -S -scalar-evolution -loop-deletion -simplifycfg -analyze < %s | FileCheck %s --check-prefix=CHECK-ANALYSIS

define i32 @foo() local_unnamed_addr #0 {
; CHECK-ANALYSIS: Loop %do.body: backedge-taken count is 10000
; CHECK-ANALYSIS: Loop %do.body: max backedge-taken count is 10000
; CHECK-ANALYSIS: Loop %do.body: Predicated backedge-taken count is 10000
entry:
  br label %do.body

do.body:                                          ; preds = %do.body, %entry
  %start.0 = phi i32 [ 0, %entry ], [ %inc.start.0, %do.body ]
  %cmp = icmp slt i32 %start.0, 10000
  %inc = zext i1 %cmp to i32
  %inc.start.0 = add nsw i32 %start.0, %inc
  br i1 %cmp, label %do.body, label %do.end

do.end:                                           ; preds = %do.body
  ret i32 0
}
Printing analysis 'Scalar Evolution Analysis' for function 'foo':
Classifying expressions for: @foo
  %start.0 = phi i32 [ 0, %entry ], [ %inc.start.0, %do.body ]
  -->  {0,+,1}<%do.body> U: full-set S: full-set                Exits: <<Unknown>>              LoopDispositions: { %do.body: Computable }
  %inc = zext i1 %cmp to i32
  -->  (zext i1 %cmp to i32) U: [0,2) S: [0,2)          Exits: <<Unknown>>              LoopDispositions: { %do.body: Variant }
  %inc.start.0 = add nsw i32 %start.0, %inc
  -->  ((zext i1 %cmp to i32) + %start.0) U: full-set S: full-set               Exits: <<Unknown>>      LoopDispositions: { %do.body: Variant }
Determining loop execution counts for: @foo
Loop %do.body: Unpredictable backedge-taken count.
Loop %do.body: Unpredictable max backedge-taken count.
Loop %do.body: Unpredictable predicated backedge-taken count.

BUT IT DELETES LOOP. (this is the actual problem while testing)

	.text
	.def	 "?foo@@YAHXZ";
	.scl	2;
	.type	32;
	.endef
	.globl	"?foo@@YAHXZ"           # -- Begin function ?foo@@YAHXZ
	.p2align	4, 0x90
"?foo@@YAHXZ":                          # @"\01?foo@@YAHXZ"
# BB#0:                                 # %entry
	xorl	%eax, %eax
	retq
                                        # -- End function

returning AddRecExpr with getOne, getZero and StartV, ZEXT(Constant 1) has same results.
is it emitting correctly? D:

if I understood correctly, it shouldn't be. because its "actually" predicatable. also, this will not optimize something like this.

int foo() {
    int start = 0;
    int end = 10000;
    int tag = 0;
    do {
        tag = 0;
        if (start < end) 
        {
            start++;
            tag = 1;
        }
    } while (tag);
    printf("%d", start);
    return 0;
}

[ ADDED ]
Is there are any changes on LLVM's SCEV / Loop Exit Limit?
I gets a same result (not clang with -S result, I mean opt.exe's result) on current trunk.

junryoungju updated this revision to Diff 119050.EditedOct 14 2017, 7:40 PM
junryoungju edited edge metadata.

this will handle checking AddExpr increasing ICmp's target operand, also now handles if AddExpr increasing an Constant, we are going to use it.

Now do not handle that we cannot find exit condition. also, refactored.

NOTE : DO NOT MERGE OR ACCEPT THIS. FIXING DIFF.

Cleanup, Refactored code.

@sanjoy its done :> review please.

Now emits correct backedge-taken count.

Fixed that checking SelectInst will not return nullptr if loop isn't same.

junryoungju added a comment.EditedOct 16 2017, 9:53 PM

But still it doesn't handle if its not a ICmpInst or SelectInst.
We should also handle SwitchInst.

Is there a any way to handle this case?

void test() {
    auto step = 0;
    while(true) {
        switch (step) {
            case 0:
                step = 1;
                break;
            case 1:
                step = 2;
                break;
            case 2:
                step = 3;
                break;
            case 3:
                return;
        }
    }
}

@sanjoy @jbhateja @hfinkel

It should be like this

while.cond:                                       ; preds = %sw.epilog, %entry
  %step.0 = phi i32 [ 0, %entry ], [ %step.1, %sw.epilog ]
  switch i32 %step.0, label %sw.epilog [
    i32 0, label %sw.bb
    i32 1, label %sw.bb1
    i32 2, label %sw.bb2
    i32 3, label %sw.bb3
  ]

%step.1 = phi i32 [ %step.0, %while.cond ], [ 3, %sw.bb2 ], [ 2, %sw.bb1 ], [ 1, %sw.bb ]

can we assume it that we can exit the loop without iterate out each cases instructions?

jbhateja added a comment.EditedOct 17 2017, 3:03 AM

Hi Jun Ryoung Ju,

There is a open revision under review https://reviews.llvm.org/D38494 on this.

It seems fix here are inspired from that revision which is an overlap, revision is already open and its comments have been addressed.

Thanks

junryoungju added a comment.EditedOct 17 2017, 3:49 AM

Hi Jun Ryoung Ju,

There is a open revision under review https://reviews.llvm.org/D38494 on this.

It seems fix here are inspired from that revision which is an overlap, revision is already open and its comments have been addressed.

Thanks

Yes, but I created new reversion because of this isn't a same solution.
and that cannot solve on this code, is can solve on https://reviews.llvm.org/D38494
so this is actually different for now

Rollback patch.

junryoungju added a comment.EditedOct 17 2017, 6:49 PM

This will handle that https://reviews.llvm.org/D38494 cannot handle.

such as this code.

int foo() {
    int start = 0;
    int end = 10000;
    int tag = 0;
    do {
        tag = 0;
        if (start < end) start+=2, tag = 1;
    } while (tag);
    return 0;
}

Please review the code after I handle SwitchInst on code.
such as following code.

void test() {
    auto step = 0;
    while(true) {
        switch (step) {
            case 0:
                step = 1;
                break;
            case 1:
                step = 2;
                break;
            case 2:
                step = 3;
                break;
            case 3:
                return;
        }
    }
}
  • Please rebase this patch with trunk.
  • Add a test case which asserts on scalar evoluition dumps.

It shall expedite the review process.

  • Please rebase this patch with trunk.
  • Add a test case which asserts on scalar evoluition dumps.

    It shall expedite the review process.

Thanks :>

junryoungju retitled this revision from [ScalarEvolution] Handling for ICmp occuring in the evolution chain. to [ScalarEvolution] Handling Conditional Instruction in SCEV chain..Oct 17 2017, 8:33 PM
This comment was removed by junryoungju.

Now this checks that ICmp is false or true cond.

Few general concerns.

1/  Always submit patch with few test cases (valid for any patch).
2/  Patch https://reviews.llvm.org/D38494 is taking care of this problem. I don't see what is your point in duplicating the efforts.

You are on the review list also.

There is no derth of open problems to tackle. I hope you understand my point here.

Thanks.

Few general concerns.

1/  Always submit patch with few test cases (valid for any patch).
2/  Patch https://reviews.llvm.org/D38494 is taking care of this problem. I don't see what is your point in duplicating the efforts.

You are on the review list also.

There is no derth of open problems to tackle. I hope you understand my point here.

Thanks.

Because we are solving issue as another way.

junryoungju updated this revision to Diff 120170.EditedOct 24 2017, 6:27 PM

Now we evolute SelectInst directly.
Also added some testcases.

Fixed testcase checks.

Fixed wrong testcase uploaded.

junryoungju added a comment.EditedOct 25 2017, 6:09 PM

For this time, I am actually working on SwitchInst.
before I work on it, I want to check that this patch has no issue

review please @sanjoy

sanjoy requested changes to this revision.Oct 26 2017, 2:32 PM
sanjoy added inline comments.
test/Analysis/ScalarEvolution/cond_1.ll
15

I thought @jbhateja 's patch should already get cases like these? If you care about switch instructions, you probably need to add support for switches to the rewriter he's written (it already supports select).

This revision now requires changes to proceed.Oct 26 2017, 2:32 PM
junryoungju added inline comments.Oct 26 2017, 6:10 PM
test/Analysis/ScalarEvolution/cond_1.ll
15

That isn't really support the select instruction.
I think you should try cond_3.ll for it.

also, that patch doesn't generates correct clang assemblies. (not emitting optimized assemblies)
please check that @jbhateja 's patch is "really" supporting a select or icmp instruction.

and I don't think that rewriting for these works ins't really good idea.
because almost all visitors will register SCEVs on value map, (SCEVCallbackVM)
remember the first goal, its not evolution a ICmp or Select instruction on createAddRecFromPHI
we do not need to rewrite it if we are going to evolute it, we are just check that "condition" is actually true or false on now.

if we are going to evolute it or visiting it. we just need to create a SCEVConditional, just analyze backend taken count dynamically.
just comparing each step.

now I am going to talking about switch instruction.
if we are not handling it as a "evolution" I am just going to check that switch has like step.
like this

switch(step)
{
    case 0: step++;
    case 1: step++;
    case 2: step++;
    case 3: break;
}

or not, I am just going to create SCEVConditional for dynamic analyize.

This is why I noticed that this is reversion of @jbhateja 's patch

chandlerc requested changes to this revision.Oct 26 2017, 6:37 PM
chandlerc added a subscriber: chandlerc.

Few general concerns.

1/  Always submit patch with few test cases (valid for any patch).
2/  Patch https://reviews.llvm.org/D38494 is taking care of this problem. I don't see what is your point in duplicating the efforts.

You are on the review list also.

There is no derth of open problems to tackle. I hope you understand my point here.

Thanks.

Because we are solving issue as another way.

I see you have already joined the review of the other patch. If there is a problem with that approach, please clearly explain why and what the alternative approach would be on that patch as part of code review. Perhaps the author will wish you to provide a patch for this alternative, but it doesn't make sense to continue reviewing *both* patches in parallel. We should figure out which approach is best first, and then review the patch that uses that approach. As the other patch (D38494) seems to be quite active, I would suggest consolidating the discussion around approach on that patch.

Until that is resolved, I don't think it makes sense to continue iterating in review here. It just fragments the discussion.

junryoungju abandoned this revision.Oct 26 2017, 6:56 PM

Few general concerns.

1/  Always submit patch with few test cases (valid for any patch).
2/  Patch https://reviews.llvm.org/D38494 is taking care of this problem. I don't see what is your point in duplicating the efforts.

You are on the review list also.

There is no derth of open problems to tackle. I hope you understand my point here.

Thanks.

Because we are solving issue as another way.

I see you have already joined the review of the other patch. If there is a problem with that approach, please clearly explain why and what the alternative approach would be on that patch as part of code review. Perhaps the author will wish you to provide a patch for this alternative, but it doesn't make sense to continue reviewing *both* patches in parallel. We should figure out which approach is best first, and then review the patch that uses that approach. As the other patch (D38494) seems to be quite active, I would suggest consolidating the discussion around approach on that patch.

Until that is resolved, I don't think it makes sense to continue iterating in review here. It just fragments the discussion.

I am removing this from issue. I will focus on https://reviews.llvm.org/D38494 .