This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Support/
-
Support/
-
ScopHelper.cpp

Differential D41692

[Polly][WIP] Remove immediate dominator heuristic for error block detection.
AbandonedPublic

Authored by Meinersbur on Jan 3 2018, 2:20 AM.

Download Raw Diff

Details

Reviewers

grosser
bollu
jdoerfert

Summary

The heuristic forces blocks that directly follows a loop header to not to be considered error blocks. It was introduced in r249611 with the following commit message:

This replaces the support for user defined error functions by a
heuristic that tries to determine if a call to a non-pure function
should be considered "an error". If so the block is assumed not to be
executed at runtime. While treating all non-pure function calls as
errors will allow a lot more regions to be analyzed, it will also
cause us to dismiss a lot again due to an infeasible runtime context.
This patch tries to limit that effect. A non-pure function call is
considered an error if it is executed only in conditionally with
regards to a cheap but simple heuristic.

In the code below CCK_Abort2() would be considered as an error block, but not CCK_Abort1() due to this heuristic.

for (int i = 0; i < n; i+=1) {
  if (ErrorCondition1)
    CCK_Abort1(); // No __attribute__((noreturn))
  if (ErrorCondition2)
    CCK_Abort2(); // No __attribute__((noreturn))
}

This does not seem useful to me. Checking error conditions in the beginning of some work is quite common. It causes a switch default-case to be not considered an error block in SPEC's cactuBSSN. The comment justifying the heuristic mentions a "load", which does not seem to be applicable here. I propose to remove this heuristic.

This patch currently fails the following unrelated test cases:

Polly :: ScopDetect/mod_ref_read_pointer.ll
Polly :: ScopInfo/max-loop-depth.ll
Polly :: ScopInfo/mod_ref_access_pointee_arguments.ll
Polly :: ScopInfo/mod_ref_read_pointee_arguments.ll
Polly :: ScopInfo/mod_ref_read_pointer.ll
Polly :: ScopInfo/mod_ref_read_pointers.ll

Diff Detail

Repository: rPLO Polly

Event Timeline

Meinersbur created this revision.Jan 3 2018, 2:20 AM

Herald added a reviewer: bollu. · View Herald TranscriptJan 3 2018, 2:20 AM

It would be interesting to see if this is relevant for libquantum to work. I am fine dropping this for now,

grosser accepted this revision.Jan 10 2018, 7:55 AM

This revision is now accepted and ready to land.Jan 10 2018, 7:55 AM

I am confused. It seems you want to be smarter about the error blocks (which is necessary!) but this will just make them more greedy (too greedy in my experience). Did anybody check how this affects the test-suite or why the test cases fail [, after all, this says "ready to land"] ?

If you want a generally smarter way to determine and work with error blocks I could also send you some patches.

I ran a comparison on the test-suite and here is the result:

Program                                                                        leone_N20_polly  leone_N30_errheuristic diff
 SingleSource/Benchmarks/Misc/flops                                            4.332            6.311                    45.7% 
 S...urce/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt  1.443            1.637                    13.4% 
 Polybench/Polybench-421/stencils/adi/P421_adi                                21.096           19.344                    -8.3%  
 External/SPEC/CFP2017rate/507.cactuBSSN_r/507.cactuBSSN_r                    31.899           33.395                     4.7%

It shows that e.g. Misc/flops runs 45% slower with this patch (comparing medians of three runs). It is, however, the same performance as -O3 run, i.e. Polly makes it slower.
507.cactuBSSN_r shows up I think because Polly is able to tile it, but the default size is ... not a good one for this benchmark. That benchmark should also need improvement in the delinearizer, so I an wondering why it even shows up here.

I proposed this patch because I do not see the idea behind this heuristic. It basically only adds an exception for the first condition in a loop body. What makes the first condition so different than the second? IMHO if a conditional contains a (non-pure/modelable) function call, we should assume that it is not executed because the call would serialize all dependencies in the loops it is in (this only hope would be that the isl rescheduler unswitches the loop). Or maybe I misunderstood the idea behind the heuristic? If so, could you explain the intention?

I personally would love to see more intelligent heuristics.

In D41692#1052643, @Meinersbur wrote:

I ran a comparison on the test-suite and here is the result:

Program                                                                        leone_N20_polly  leone_N30_errheuristic diff
 SingleSource/Benchmarks/Misc/flops                                            4.332            6.311                    45.7% 
 S...urce/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt  1.443            1.637                    13.4% 
 Polybench/Polybench-421/stencils/adi/P421_adi                                21.096           19.344                    -8.3%  
 External/SPEC/CFP2017rate/507.cactuBSSN_r/507.cactuBSSN_r                    31.899           33.395                     4.7%

Is there a change the number of statically discarded SCoPs due to error blocks (or alternatively a compile time impact)?

It shows that e.g. Misc/flops runs 45% slower with this patch (comparing medians of three runs). It is, however, the same performance as -O3 run, i.e. Polly makes it slower.
507.cactuBSSN_r shows up I think because Polly is able to tile it, but the default size is ... not a good one for this benchmark. That benchmark should also need improvement in the delinearizer, so I an wondering why it even shows up here.

I proposed this patch because I do not see the idea behind this heuristic. It basically only adds an exception for the first condition in a loop body. What makes the first condition so different than the second? IMHO if a conditional contains a (non-pure/modelable) function call, we should assume that it is not executed because the call would serialize all dependencies in the loops it is in (this only hope would be that the isl rescheduler unswitches the loop). Or maybe I misunderstood the idea behind the heuristic? If so, could you explain the intention?

That is why I wrote the comment (below). One possible improvement is (as explained) the post dominator tree. In addition I have patches that looks at the CFG more closely and also consider the already taken error blocks during detection. I'll have to look for them though and send them out via email.

// FIXME: This is a simple heuristic to determine if the load is executed
//        in a conditional. However, we actually would need the control
//        condition, i.e., the post dominance frontier. Alternatively we
//        could walk up the dominance tree until we find a block that is
//        not post dominated by the load and check if it is a conditional
//        or a loop header.

I personally would love to see more intelligent heuristics.

Me too.

In D41692#1051817, @jdoerfert wrote:

I am confused. It seems you want to be smarter about the error blocks (which is necessary!) but this will just make them more greedy (too greedy in my experience). Did anybody check how this affects the test-suite or why the test cases fail [, after all, this says "ready to land"] ?

If you want a generally smarter way to determine and work with error blocks I could also send you some patches.

Hi All,
I checked the test cases. The reason why they fail is exactly the same.
Consider for example, the test case scopInfo/mod_ref_read_pointers.ll. The BB labeled for.body is considered by the function isErrorBlock as an error block. The reason why is that it contains a call function with the “readonly” attribute but we are looking for the “readnone”. I am referring to line 430 in lib/Support/ScopHelper.cpp.
As a consequence ScopBuilder will not build the access functions for the bb and you will end up with an empty scop. It is still not clear to me why we are looking for the “readnone” attribute to check if a function is pure. Is “readonly” too loose?
Looking forward to hearing from you.
Best regards,
Lorenzo Chelini

If a function is readnone, CI->doesNotAccessMemory() already returns true, i.e. isErrorBlock does not care about it.
If a function is readonly, it can read from any address. Therefore there might be a dependency from and to any memory-writing instruction. But by itself, it does not hinder us from transforming a loop if there is no write to memory in it (which probably is rare). Also, the -polly-allow-modref-calls is required to allow it.

In any case, if such a call is only executed conditionally, we may assume that the condition is never true so the function is never called and we can optimize the loop as if the call wasn't there.

With the test case, I think I understand what the heuristic was trying to do. It tries to identify the loop exit condition because the assumption that the loop is never executed is not an optimistic one.

However, after -loop-rotate (Polly always runs after loop-rotate), the loop exit condition is moved away from the loop header:

define void @jd(i32* %A) {
entry:
  br label %for.body

for.body:                                         ; preds = %entry, %for.inc
  %indvars.iv1 = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc ]
  %call = call i32 @func(i32* %A)
  %tmp = add nsw i64 %indvars.iv1, 2
  %arrayidx = getelementptr inbounds i32, i32* %A, i64 %tmp
  store i32 %call, i32* %arrayidx, align 4
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %indvars.iv.next = add nuw nsw i64 %indvars.iv1, 1
  %exitcond = icmp ne i64 %indvars.iv.next, 1024
  br i1 %exitcond, label %for.body, label %for.end

for.end:                                          ; preds = %for.inc
  ret void
}

This is the form of code we should expect in heuristics.

I can think of three possibilities on how to fix the test cases:

Detect whether loop rotation has taken place (e.g.: all outgoing edges of the header point to BBs inside the loop)

Modify the heuristic such that it applies only if it is the only way through the loop without leaving it (The postdominance frontier solution suggested by Johannes).

Run -loop-rotate over the failing test-cases.

Michael

Hi All,
Attached the patch to fix the test cases.

0001-Review-number-D41692.-Test-cases-fixed-adding-loop-r.patch3 KBDownload

Best regards,
Lorenzo Chelini.

In D41692#1055374, @chelini wrote:

Hi All,
Attached the patch to fix the test cases.
0001-Review-number-D41692.-Test-cases-fixed-adding-loop-r.patch3 KBDownload

Best regards,
Lorenzo Chelini.

Could you create a new review for your version of this patch (I'd abandon this one)?
Instead of adding -loop-rotate to the test-cases, please update the IR instead. When we have -loop-rotate in the test case itself, we would be implicitly testing loop rotation as well, but the only thing we want to check is the error block detection.

Deprecated in favor if D45274.

Revision Contents

Path

Size

lib/

Support/

ScopHelper.cpp

19 lines

Diff 128498

lib/Support/ScopHelper.cpp

Show First 20 Lines • Show All 393 Lines • ▼ Show 20 Lines	if (R.isTopLevelRegion()) {
for (auto Pred : predecessors(R.getExit()))		for (auto Pred : predecessors(R.getExit()))
if (R.contains(Pred) && !DT.dominates(&BB, Pred))		if (R.contains(Pred) && !DT.dominates(&BB, Pred))
DominatesAllPredecessors = false;		DominatesAllPredecessors = false;
}		}

if (DominatesAllPredecessors)		if (DominatesAllPredecessors)
return false;		return false;

// FIXME: This is a simple heuristic to determine if the load is executed
// in a conditional. However, we actually would need the control
// condition, i.e., the post dominance frontier. Alternatively we
// could walk up the dominance tree until we find a block that is
// not post dominated by the load and check if it is a conditional
// or a loop header.
auto *DTNode = DT.getNode(&BB);
if (!DTNode)
return false;

DTNode = DTNode->getIDom();

if (!DTNode)
return false;

auto *IDomBB = DTNode->getBlock();
if (LI.isLoopHeader(IDomBB))
return false;

for (Instruction &Inst : BB)		for (Instruction &Inst : BB)
if (CallInst *CI = dyn_cast<CallInst>(&Inst)) {		if (CallInst *CI = dyn_cast<CallInst>(&Inst)) {
if (isIgnoredIntrinsic(CI))		if (isIgnoredIntrinsic(CI))
continue;		continue;

// memset, memcpy and memmove are modeled intrinsics.		// memset, memcpy and memmove are modeled intrinsics.
if (isa<MemSetInst>(CI) \|\| isa<MemTransferInst>(CI))		if (isa<MemSetInst>(CI) \|\| isa<MemTransferInst>(CI))
continue;		continue;
▲ Show 20 Lines • Show All 179 Lines • Show Last 20 Lines