This is an archive of the discontinued LLVM Phabricator instance.

[LCSSA] Don't invalidate entire loop in SCEV
ClosedPublic

Authored by nikic on Apr 27 2023, 4:02 AM.

Details

Summary

We already invalidate each individual instruction for which LCSSA is formed in formLCSSAForInstructions(), so I don't see a reason why we would need to invalidate the entire loop on top of that.

I believe we also no longer need the instruction-level invalidation now that SCEV looks through LCSSA phis, but I'll leave that for a separate patch, as it's less obvious.

Diff Detail

Event Timeline

nikic created this revision.Apr 27 2023, 4:02 AM
nikic requested review of this revision.Apr 27 2023, 4:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2023, 4:02 AM
fhahn accepted this revision.Apr 28 2023, 2:29 AM

LGTM! AFAICT the current code should properly invalidate individual values now.

This revision is now accepted and ready to land.Apr 28 2023, 2:29 AM
This revision was landed with ongoing or failed builds.Apr 28 2023, 3:17 AM
This revision was automatically updated to reflect the committed changes.
vporpo added a subscriber: vporpo.Jun 1 2023, 10:39 AM

Hi, these changes that remove SCEV invalidation from LCSSA are causing compiler crashes:

5362a0d859d8 [LCSSA] Remove unused ScalarEvolution argument (NFC)
5cbb9f7a58d9 [LCSSA] Don't invalidate SCEV
0aed0dbec243 [LCSSA] Don't invalidate entire loop in SCEV

The incorrect code is generated by the SCEV expander. Here is a reproducer:

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-grtev4-linux-gnu"

define i32 @f(i1 %cond1) #0 !prof !0 {
entry:
  br label %loop1

loop1:
  %ld = load i64, ptr null, align 8
  br i1 %cond1, label %loop1, label %exit1, !prof !1

exit1:
  br label %entry2

entry2:
  br label %loop2

loop2:
  %phi = phi i64 [ 0, %entry2 ], [ %inc, %loop2 ]
  %inc = add i64 %phi, 1
  %gep = getelementptr [268435454 x ptr], ptr null,  i64 %phi
  store ptr null, ptr %gep, align 8
  store ptr null, ptr null, align 8
  %cond2 = icmp eq i64 %ld, %phi
  br i1 %cond2, label %exit2, label %loop2

exit2:
  ret i32 0
}

attributes #0 = { "target-cpu"="haswell" }

!0 = !{!"function_entry_count", i64 1}
!1 = !{!"branch_weights", i32 123, i32 472}

Run: opt -p 'slp-vectorizer,loop-unroll' .

This is the output of the crash:

Instruction does not dominate all uses!
  %ld.lcssa2 = phi i64 [ %ld, %loop1 ]
  %0 = add i64 %ld.lcssa2, 1
Instruction does not dominate all uses!
  %ld.lcssa2 = phi i64 [ %ld, %loop1 ]
  %1 = icmp ult i64 %ld.lcssa2, 7
LLVM ERROR: Broken module found, compilation aborted!

SLP is just creating some SCEV objects and is not modifying the code at all. SCEV seems to go stale after LCSSA, so the SCEV expander in LoopUnrollRuntime.cpp:747 is generating incorrect code.

Could you please take a look and revert the patches if this is not an easy fix?

Thanks!

nikic added a comment.Jun 6 2023, 1:21 AM

@vporpo Thanks for the report! I've reverted the problematic changes in https://github.com/llvm/llvm-project/commit/143ed21b26b2c68695e9f74f0ce4632b8a2e000b and https://github.com/llvm/llvm-project/commit/3c9cf023db32ba2cfa1e052ddc58f57dd080995c and have added a test in https://github.com/llvm/llvm-project/commit/79115aebb759eb1ef8cdcc35f228d88f70c7bfc2.

In this test case, the BECount of the second loop depends on %ld defined in the first loop. LCSSA construction does not change this, as SCEV looks through LCSSA phi nodes.

However, when we peel the loop, the exit value is now a phi node between the value %ld from the loop, and the %ld.peel clone from the peeled iteration. Directly using the SCEV node for %ld in the BECount of the second loop is no longer valid.

Loop peeling tries to handle this by invalidating the (former LCSSA, now real) phi nodes in the exits. This worked previously, because the BECount calculation for the second loop would go through that LCSSA phi node and cache it's value in the ValueToExpr map. This means that if we invalidate the phi node, we would also forget memoized values for the corresponding expression, which includes the BECount for the second loop. However, without the invalidation during LCSSA construction, there will be no value cached for the phi node, so we also won't invalidate the corresponding expression.

It's a bit hard to determine who is at fault here, because both LCSSA and peeling do pretty reasonable things. I think maybe the best fix for this issue would be to make LCSSA not invalidate the value, but rather explicitly create the SCEV for the new LCSSA phi node, if one exists for the instruction in the loop. @fhahn Do you have any thoughts on this?

vporpo added a comment.Jun 6 2023, 4:13 PM

Thanks @nikic for looking into this!