This is an archive of the discontinued LLVM Phabricator instance.

[DA] Handle mismatching loop levels by updating the numbering scheme
AbandonedPublic

Authored by bmahjour on Oct 1 2021, 1:54 PM.

Details

Reviewers
Meinersbur
fhahn
artemrad
Group Reviewers
Restricted Project
Summary

To represent various loop levels within a nest, DA implements a special numbering scheme (see comment atop establishNestingLevels). The goal of this numbering scheme appears to be representing each unique loop distinctively by using as little memory as possible. This numbering scheme is simple when the source and destination of the dependence are in the same loop. In such cases the level is simply the depth of the loop in which src and dst reside. When the src and dst are not in the same loop, we could run into the following situation exposed by https://reviews.llvm.org/D71539.

Suppose we have the following loop structure and memory access:

1: L1
2:    L2
3:    A[s1] 
4:    L3
5:      A[s2]

Further assume that the subscript of the first access (ie s1) is an LCSSA phi with a single incoming value from L2. With the change in D71539, the SCEV expression for s1 (queried at the scope of L1 where the access on line 3 occurs) will be an AddRec whose loop is L2.

Since the access A[s1] occurs at a loop depth of 1, but its subscript references a loop at depth 2, the current numbering scheme gets confused and assigns the same level value to both s1 and s2, which further causes classifyPair to incorrectly classify the subscript pair. In an assert enabled build, this would trigger the following assertion:

bool llvm::DependenceInfo::testSIV(const llvm::SCEV *, const llvm::SCEV *, unsigned int &, llvm::FullDependence &, llvm::DependenceInfo::Constraint &, const llvm::SCEV *&) const: Assertion `CurLoop == DstAddRec->getLoop() && "both loops in SIV should be same"' failed.

This patch proposes a slight change to the numbering scheme using the subscript SCEVs to treat the memory access instructions in such cases, as if they appeared inside the AddRec loop of the inner-most subscript.

An alternative fix can be reviewed in D110973.

Diff Detail

Event Timeline

bmahjour created this revision.Oct 1 2021, 1:54 PM
bmahjour requested review of this revision.Oct 1 2021, 1:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 1 2021, 1:54 PM
bmahjour edited the summary of this revision. (Show Details)Oct 1 2021, 1:54 PM
bmahjour edited the summary of this revision. (Show Details)Oct 1 2021, 2:02 PM
bmahjour added reviewers: Meinersbur, Florian, Restricted Project.Apr 20 2022, 8:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2022, 8:49 AM
bmahjour edited reviewers, added: fhahn; removed: Florian.Apr 20 2022, 8:54 AM
Meinersbur added a comment.EditedApr 20 2022, 12:10 PM

The test fails for me when applying the patch. Is D71539 supposed to be a parent patch?

Is the loop nest in the example supposed to look like this?

for (L1) {
  for (L2) {
    s1 = ...;
  }
  A[s1]; // getScev(s1) = SCEVAddRecExp<L1,...>
  for (L3) {
    A[s2];
  }
}

The problem seems to be that DA assumes that s1 depends on the induction variable of L1, even though the access is located outside of it. getSCEVAtScope may resolve it to a computed value, but may not if it does not understand the exit condition. In checkSubscript this is handled by calling isLoopInvariant first. However, the loop-invariance is only checked if there is no SCEVAddRecExpr. Might the problem be that an SCEVAddRecExpr may actually be loop-invariant, if evaluated outside the loop's scope?

llvm/lib/Analysis/DependenceAnalysis.cpp
3590

Consider removing unrelated changes from this patch.

3615

Is there a reason this is moved?

llvm/test/Analysis/DependenceAnalysis/MismatchingNestLevels.ll
11

Could you add some pseudocode helping to understand the case?

24

It would be better to not rely on undef in test cases. Handling of undef can change, depending on what an analysis thinks runs faster.

I was able to reproduce the problem without D71539:

$ cat da-lcssa.c
void foobar(long n, double *A) {
    long  i;
    for (i = 0; i*n <= n*n; ++i) {
        A[i] = i;
    }
    A[i] = i;
}

$ cat da-lcssa.ll
; ModuleID = 'da-lcssa.c'
source_filename = "da-lcssa.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: argmemonly nofree norecurse nosync nounwind writeonly uwtable
define dso_local void @foobar(i64 noundef %n, ptr nocapture noundef writeonly %A) local_unnamed_addr #0 {
entry:
  %mul1 = mul nsw i64 %n, %n
  br label %for.body

for.body:                                         ; preds = %entry, %for.body
  %i.012 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
  %conv = sitofp i64 %i.012 to double
  %arrayidx = getelementptr inbounds double, ptr %A, i64 %i.012
  store double %conv, ptr %arrayidx, align 8, !tbaa !3
  %inc = add nuw nsw i64 %i.012, 1
  %mul = mul nsw i64 %inc, %n
  %cmp.not = icmp sgt i64 %mul, %mul1
  br i1 %cmp.not, label %for.end, label %for.body, !llvm.loop !7

for.end:                                          ; preds = %for.body
  %conv2 = sitofp i64 %inc to double
  %arrayidx3 = getelementptr inbounds double, ptr %A, i64 %inc
  store double %conv2, ptr %arrayidx3, align 8, !tbaa !3
  ret void
}

attributes #0 = { argmemonly nofree norecurse nosync nounwind writeonly uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 7, !"uwtable", i32 2}
!2 = !{!"clang version 15.0.0 (/home/meinersbur/src/llvm-project/clang 2d92ee97f1afb3657579a46a7dd4611b61e9cc16)"}
!3 = !{!4, !4, i64 0}
!4 = !{!"double", !5, i64 0}
!5 = !{!"omnipotent char", !6, i64 0}
!6 = !{!"Simple C/C++ TBAA"}
!7 = distinct !{!7, !8, !9}
!8 = !{!"llvm.loop.mustprogress"}
!9 = !{!"llvm.loop.unroll.disable"}


$ opt da-lcssa.ll -disable-output "-passes=print<da>" -aa-pipeline=basic-aa
'Dependence Analysis' for function 'foobar':
Src:  store double %conv, ptr %arrayidx, align 8, !tbaa !3 --> Dst:  store double %conv, ptr %arrayidx, align 8, !tbaa !3
  da analyze - none!
Src:  store double %conv, ptr %arrayidx, align 8, !tbaa !3 --> Dst:  store double %conv2, ptr %arrayidx3, align 8, !tbaa !3
  da analyze - opt: /home/meinersbur/src/llvm-project/llvm/lib/Analysis/DependenceAnalysis.cpp:1160: bool llvm::DependenceInfo::strongSIVtest(const llvm::SCEV*, const llvm::SCEV*, const llvm::SCEV*, const llvm::Loop*, unsigned int, llvm::FullDependence&, llvm::DependenceInfo::Constraint&) const: Assertion `0 < Level && Level <= CommonLevels && "level out of range"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/meinersbur/build/llvm-project/release/bin/opt da-lcssa.ll -disable-output -passes=print<da> -aa-pipeline=basic-aa
 #0 0x00007fa14c381f14 PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x00007fa14c37f6b4 SignalHandler(int) Signals.cpp:0:0
 #2 0x00007fa1492303c0 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x143c0)
 #3 0x00007fa148c9303b raise /build/glibc-sMfBJT/glibc-2.31/signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #4 0x00007fa148c72859 abort /build/glibc-sMfBJT/glibc-2.31/stdlib/abort.c:81:7
 #5 0x00007fa148c72729 get_sysdep_segment_value /build/glibc-sMfBJT/glibc-2.31/intl/loadmsgcat.c:509:8
 #6 0x00007fa148c72729 _nl_load_domain /build/glibc-sMfBJT/glibc-2.31/intl/loadmsgcat.c:970:34
 #7 0x00007fa148c84006 (/lib/x86_64-linux-gnu/libc.so.6+0x34006)
 #8 0x00007fa14af1201c llvm::DependenceInfo::strongSIVtest(llvm::SCEV const*, llvm::SCEV const*, llvm::SCEV const*, llvm::Loop const*, unsigned int, llvm::FullDependence&, llvm::DependenceInfo::Constraint&) const (/home/meinersbur/build/llvm-project/release/bin/opt+0x1c8201c)
 #9 0x00007fa14af1eb6a llvm::DependenceInfo::testSIV(llvm::SCEV const*, llvm::SCEV const*, unsigned int&, llvm::FullDependence&, llvm::DependenceInfo::Constraint&, llvm::SCEV const*&) const (.constprop.0) DependenceAnalysis.cpp:0:0
#10 0x00007fa14af23eb8 llvm::DependenceInfo::depends(llvm::Instruction*, llvm::Instruction*, bool) (/home/meinersbur/build/llvm-project/release/bin/opt+0x1c93eb8)
#11 0x00007fa14af255bd dumpExampleDependence(llvm::raw_ostream&, llvm::DependenceInfo*) DependenceAnalysis.cpp:0:0
#12 0x00007fa14af25a76 llvm::DependenceAnalysisPrinterPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/meinersbur/build/llvm-project/release/bin/opt+0x1c95a76)
#13 0x00007fa14c6adad6 llvm::detail::PassModel<llvm::Function, llvm::DependenceAnalysisPrinterPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#14 0x00007fa14b99f766 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/meinersbur/build/llvm-project/release/bin/opt+0x270f766)
#15 0x00007fa14ad5f936 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) NVPTXTargetMachine.cpp:0:0
#16 0x00007fa14b99e179 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/meinersbur/build/llvm-project/release/bin/opt+0x270e179)
#17 0x00007fa14ad5fdc6 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) NVPTXTargetMachine.cpp:0:0
#18 0x00007fa14b99bc6f llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/meinersbur/build/llvm-project/release/bin/opt+0x270bc6f)
#19 0x00007fa14a9a7787 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::ArrayRef<llvm::PassPlugin>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) (/home/meinersbur/build/llvm-project/release/bin/opt+0x1717787)
#20 0x00007fa14a9baa2c main (/home/meinersbur/build/llvm-project/release/bin/opt+0x172aa2c)
#21 0x00007fa148c740b3 __libc_start_main /build/glibc-sMfBJT/glibc-2.31/csu/../csu/libc-start.c:342:3
#22 0x00007fa14a99948e _start (/home/meinersbur/build/llvm-project/release/bin/opt+0x170948e)
Aborted (core dumped)
congzhe added inline comments.
llvm/lib/Analysis/DependenceAnalysis.cpp
752

Would it be better to pass Pairs by reference?

And rename Pairs to Pair (since Pairs is used as an integer elsewhere)?

Meinersbur added inline comments.Apr 20 2022, 4:09 PM
llvm/lib/Analysis/DependenceAnalysis.cpp
752

Even better: Pass ArrayRef<Subscript>

I think the correct fix is to handle AddRecExpr from a loop outside LoopNest like other non-AddRec expressions. At the moment this means "non-linear" (like D110973) unless it is completely loop-invariant. This is a limitation of the current checkSubscript that does not look into other kinds of SCEV expressions. In @test1/@test2 this is all we can do because %c.1.lcssa (which has the value of %add27 when the loop exists) has an undefined value because with the backedge branch on undef, %c.1.lcssa is undefined which usually is not considered "linear".

DependenceInfo::depends should actually call getSCEVAtScope instead of just getScev (like tryDelinearize already does), because if ScalarEvolution can understand the backedge taken count, the SCEVAddRecExpr will be simplified to that result.

I think the correct fix is to handle AddRecExpr from a loop outside LoopNest like other non-AddRec expressions. At the moment this means "non-linear" (like D110973) unless it is completely loop-invariant. This is a limitation of the current checkSubscript that does not look into other kinds of SCEV expressions. In @test1/@test2 this is all we can do because %c.1.lcssa (which has the value of %add27 when the loop exists) has an undefined value because with the backedge branch on undef, %c.1.lcssa is undefined which usually is not considered "linear".

DependenceInfo::depends should actually call getSCEVAtScope instead of just getScev (like tryDelinearize already does), because if ScalarEvolution can understand the backedge taken count, the SCEVAddRecExpr will be simplified to that result.

Are you essentially suggesting that we follow the approach in D110973, and improve it by using getSCEVAtScope() such that we may be able to infer some of those AddRec (like in the lcssa case) to be loop invariant with respect to LoopNest? If that is the case then we may want to add another test that is a mirror of @test1 but not using undef as the loop exit condition?

Are you essentially suggesting that we follow the approach in D110973, and improve it by using getSCEVAtScope() such that we may be able to infer some of those AddRec (like in the lcssa case) to be loop invariant with respect to LoopNest?

yes. ScalarEvolution::computeLoopDisposition() == LoopInvariant might be the designated API for this.

If that is the case then we may want to add another test that is a mirror of @test1 but not using undef as the loop exit condition?

We preferably do not want undef at all in test cases, it makes them fragile.

@Meinersbur Thanks for the test case that fails without D71539. The test case I reported was bugpoint reduced and that's why the undefs are there. I'll update D110973 with a test that doesn't use undef and that passes today but would fail with D71539. I'll also add your test to it.

As for treating the case as non-linear, I'd be fine with that. I'll update D110973 to use SCEV LoopDisposition.

bmahjour abandoned this revision.May 3 2022, 3:16 PM