This is an archive of the discontinued LLVM Phabricator instance.

[LoopInterchange] Disallow interchange when memory accesses are guarded by control flow (PR48057)
Needs ReviewPublic

Authored by TaWeiTu on Mar 2 2021, 8:29 PM.

Details

Reviewers
fhahn
Whitney
Summary

If instructions that have side effects are guarded by control flow,
the order of execution might differ after interchanging the loops
and results in incorrect behaviour (the branch condition may depends
on the execution order).

This fixes https://bugs.llvm.org/show_bug.cgi?id=48057

Diff Detail

Event Timeline

TaWeiTu created this revision.Mar 2 2021, 8:29 PM
TaWeiTu requested review of this revision.Mar 2 2021, 8:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2021, 8:29 PM
TaWeiTu updated this revision to Diff 327658.Mar 2 2021, 8:32 PM
This comment was removed by TaWeiTu.
TaWeiTu updated this revision to Diff 327662.Mar 2 2021, 8:42 PM
This comment was removed by TaWeiTu.
TaWeiTu retitled this revision from [LoopInterchange] Disallow interchange when memory accesses are guarded by control flow to [LoopInterchange] Disallow interchange when memory accesses are guarded by control flow (PR48057).Mar 2 2021, 9:06 PM
Harbormaster completed remote builds in B91723: Diff 327658.
TaWeiTu updated this revision to Diff 327836.Mar 3 2021, 9:52 AM
  • Remove unnecessary lines in the test case
This comment was removed by Whitney.
fhahn added a comment.Mar 5 2021, 9:23 AM

Is this fixing a similar issue to D91228 /D87879?

Is this fixing a similar issue to D91228 /D87879?

Yes, the intention is the same.
I didn't know about these patches before, and I just test the reproducer PR47523 (the one D87879 is trying to address) against my patch, and the issue is indeed resolved (D91228 is addressing the exact same PR as this one does).

The following test in lcssa.ll is failing:

define void @lcssa_05(i32* %ptr) {
entry:
  br label %outer.header

outer.header:                                     ; preds = %outer.inc, %entry
  %iv.outer = phi i64 [ 1, %entry ], [ %iv.outer.next, %outer.inc ]
  br label %for.body3

for.body3:                                        ; preds = %bb3, %outer.header
  %iv.inner = phi i64 [ %iv.inner.next, %bb3 ], [ 1, %outer.header ]
  br i1 undef, label %bb2, label %bb3

bb2:                                              ; preds = %for.body3
  %arrayidx5 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @A, i64 0, i64 %iv.inner, i64 %iv.outer
  %vA = load i32, i32* %arrayidx5
  %arrayidx9 = getelementptr inbounds [100 x [100 x i32]], [100 x [100 x i32]]* @C, i64 0, i64 %iv.inner, i64 %iv.outer
  %vC = load i32, i32* %arrayidx9
  %add = add nsw i32 %vA, %vC
  br label %bb3

bb3:                                              ; preds = %bb2, %for.body3
  %addp = phi i32 [ %add, %bb2 ], [ 0, %for.body3 ]
  store i32 %addp, i32* %ptr
  %iv.inner.next = add nuw nsw i64 %iv.inner, 1
  %exitcond = icmp eq i64 %iv.inner.next, 100
  br i1 %exitcond, label %outer.inc, label %for.body3

outer.inc:                                        ; preds = %bb3
  %iv.inner.lcssa = phi i64 [ %iv.inner, %bb3 ]
  %iv.outer.next = add nsw i64 %iv.outer, 1
  %cmp = icmp eq i64 %iv.outer.next, 100
  br i1 %cmp, label %outer.header, label %for.exit

for.exit:                                         ; preds = %outer.inc
  %iv.inner.lcssa.lcssa = phi i64 [ %iv.inner.lcssa, %outer.inc ]
  store i64 %iv.inner.lcssa.lcssa, i64* @Y
  br label %for.end16

for.end16:                                        ; preds = %for.exit
  ret void
}

The test is expecting the loops to be interchanged, but judging from br i1 undef, label %bb2, label %bb3 I think the interchange should probabily be considered invalid in this case?

TaWeiTu updated this revision to Diff 328603.Mar 5 2021, 11:44 AM
  • Add testcase of PR47523
Whitney added inline comments.Mar 7 2021, 7:19 AM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1153
TaWeiTu updated this revision to Diff 328863.Mar 7 2021, 7:32 AM
  • Use LoopInfo::getLoopFor
Whitney added inline comments.Mar 7 2021, 7:44 AM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1150

In what scenario, outer loop contains instructions guarded by control flow?
(I assume loop interchange operate on tight loops.)

TaWeiTu updated this revision to Diff 328873.Mar 7 2021, 8:41 AM
  • Only check basic blocks in the inner loop
TaWeiTu marked 2 inline comments as done.Mar 7 2021, 8:41 AM
TaWeiTu added inline comments.
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
1150

Thanks for pointing that out! I think you are right.

Have you considered the case where the blocks guarded by control flow doesn't have memory operation, but affect memory operation through phi?

bb1:
  br %cond, bb2, bb3
bb2:
  br bb4
bb3:
  br bb4
bb4:
  %t = phi [ %t2, %bb2 ], [ %t3, %bb3 ]
  store %t, %p
TaWeiTu updated this revision to Diff 332204.Mar 21 2021, 11:39 PM

Consider instructions that are indirectly affected by the control flow

Have you considered the case where the blocks guarded by control flow doesn't have memory operation, but affect memory operation through phi?

bb1:
  br %cond, bb2, bb3
bb2:
  br bb4
bb3:
  br bb4
bb4:
  %t = phi [ %t2, %bb2 ], [ %t3, %bb3 ]
  store %t, %p

I've updated the patch to perform a more comprehensive check that consider instructions that are indirectly guarded by the control flow.

Gentle ping

@TaWeiTu How do you want to handle the lcssa.ll failure? Maybe we can only disallow memory accesses that are guarded by loop variant condition?

TaWeiTu updated this revision to Diff 335988.Apr 7 2021, 8:46 PM
  • Skip loop-invariant-guarded blocks

@TaWeiTu How do you want to handle the lcssa.ll failure? Maybe we can only disallow memory accesses that are guarded by loop variant condition?

Now only loop-variant-guarded blocks are considered "bad". lcssa.ll should be fixed.
We should probably also add logic to check whether the orders of memory accesses actually change after interchanging the loop, but I think it's beyond the scope of this patch and can be improved in the future.

Whitney added inline comments.Apr 8 2021, 9:08 AM
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
989
  br cond, BB1, BB2
BB1:
  br BB3
BB2
  br BB3

BB3 should not guarded by a loop variant block, but this function would return BB1, BB2, and BB3.

congzhe added inline comments.
llvm/lib/Transforms/Scalar/LoopInterchange.cpp
989

I agree with this concern. I'm wondering if we can do this: instead of the entire worklist algorithm in this function, can we just loop over all BBs in the inner loop that are successors of loop-variant cmps, and check if they dominate the loop latch?

Currently in loop interchange the inner loop latch is also the (only) inner loop exiting block, therefore a BB that does not dominate the inner latch can be considered as involved in control-flow divergence. This way the treatment of the above case would be seemingly easier.

1035

is OpBlock is not from LoopVariantGuardedBlocks, would interchange be legal?

1048

would it simplify the logic if we use only GuardedInst instead of both GuardedInst and Worklist, and directly check for every instruction I in GuardedInst, if any of I or users of I accesses memory then we bail out (detailed checks in the suggestion below)?

This way we can also avoid looping over all instructions in the loop as in line 1061.

1066

IMHO maybe we make the check more fine-grained.

  1. If the value being stored is a loop-invariant, then it would not break legality.
  1. Otherwise if we store (loop-variants) to the same location, then bail out.
  1. Otherwise if we store (loop-variants) not to the same location and the memory locations do not alias, then it would not break legality. I'm thinking of cases like cond && B[i][j] = A[i][j] inside the inner loop.

Suggestion No.3 might be less straightforward to implement, but No.1/No.2 is hopefully more straightforward to implement.

llvm/test/Transforms/LoopInterchange/control-flow.ll
153

is %tobool a loop-invariant? I guess interchange in this case is actually legal should %0 is a loop-invariant?

If in this IR we hoist %0 outside of the loop nest (can we add the corresponding test case), I guess loop interchange should not bail out but with this patch it would bail out. Maybe it makes more sense to use SCEV::isLoopInvariant() rather than Loop::isLoopInvariant() in function getLoopVariantGuardedBlocks().