This is an archive of the discontinued LLVM Phabricator instance.

Thanks for the patch! I don't think this is the ideal solution; it looks like the source of the issue is a missing check that the AddRecs of the source and sink pointer expressions both belong to the inner loop. I'll include a diff below with the fix, which I am planning to submit.

Since you already shared a test case, it would be great if you could clean it up a bit and land just the test in this patch to start with. I left some comments inline. What do you think?

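diff --git a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
index 5a01a8e4b055..220e48643245 100644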
--- a/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
+++ b/llvm/include/llvm/Analysis/LoopAccessAnalysis.h
@@ -253,6 +253,10 @@ public:
     return {};
   }

+  const Loop *getInnermostLoop() const {
+    return InnermostLoop;
+  }
+
 private:
   /// A wrapper around ScalarEvolution, used to add runtime SCEV checks, and
   /// applies dynamic knowledge to simplify SCEV expressions and convert them
diff --git a/llvm/lib/Analysis/LoopAccessAnalysis.cpp b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
index a9a4a820db50..4febf0bfd62c 100644
--- a/llvm/lib/Analysis/LoopAccessAnalysis.cpp
+++ b/llvm/lib/Analysis/LoopAccessAnalysis.cpp
@@ -281,7 +281,7 @@ void RuntimePointerChecking::tryToCreateDiffCheck(

   auto *SrcAR = dyn_cast<SCEVAddRecExpr>(Src->Expr);
   auto *SinkAR = dyn_cast<SCEVAddRecExpr>(Sink->Expr);
-  if (!SrcAR || !SinkAR) {
+  if (!SrcAR || !SinkAR || SrcAR->getLoop() != DC.getInnermostLoop() ||
+      SinkAR->getLoop() != DC.getInnermostLoop()) {
     CanUseDiffCheck = false;
     return;
   }
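To make the effect of the extra condition concrete, here is a minimal, self-contained C++ model contrasting the guard in the diff above with the guard from the original patch (`L->getParentLoop() == nullptr` in LoopVectorize.cpp). `ToyLoop`, `ToyAddRec`, `fixedGuardAllowsDiffCheck` and `originalPatchAllowsDiffCheck` are hypothetical names invented for this sketch; they only mimic `SCEVAddRecExpr::getLoop()`, `Loop::getParentLoop()` and the proposed `getInnermostLoop()` and are not LLVM APIs.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::Loop (not the real class).
struct ToyLoop {
  const ToyLoop *Parent; // models Loop::getParentLoop(); nullptr if top-level
};

// Hypothetical stand-in for llvm::SCEVAddRecExpr (not the real class).
struct ToyAddRec {
  const ToyLoop *L; // loop the recurrence belongs to (models getLoop())
  int64_t Start;    // start address in bytes
  int64_t Step;     // step per iteration of L, in bytes
};

// Models the suggested fix in tryToCreateDiffCheck: a diff check is only
// usable if both pointers are AddRecs of the innermost (vectorized) loop.
bool fixedGuardAllowsDiffCheck(const ToyAddRec *Src, const ToyAddRec *Sink,
                               const ToyLoop *Innermost) {
  return Src && Sink && Src->L == Innermost && Sink->L == Innermost;
}

// Models the guard from the original patch: diff checks are disabled for
// every loop that has a parent loop, regardless of the pointers involved.
bool originalPatchAllowsDiffCheck(const ToyLoop *VectorizedLoop) {
  return VectorizedLoop->Parent == nullptr;
}
```

In the test case, `&a[i]` is an AddRec of the outer loop while `&b[j]` is an AddRec of the inner loop, so the fixed guard rejects the diff check. When both pointers step with the inner loop, the fixed guard still allows it, whereas the original patch would fall back to overlap checks for any nested loop.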

llvm/test/Transforms/LoopVectorize/nested-loop.ll
Line 9: I don't think this test needs the X86 cost model. `target triple` could probably be dropped and replaced by passing `-force-vector-width=4 -force-vector-interleave=1` to `opt`. If it requires the target triple, it needs to be moved to the `X86` sub-directory.
Line 17: This check shouldn't really be needed.
Line 20: Could pass `%len` as `i64`.
Line 23: Block names could be simplified, e.g. `%for.cond1.preheader.us` -> `%outer.header`, `%for.cond1.for.end_crit_edge.us` -> `%outer.latch`, `%for.body3.us` -> `%inner`, `%for.end12` -> `%exit`.
Line 31: Value names could likewise be simplified by dropping the `indvars.` prefix.
Line 35: `!tbaa` shouldn't be needed.
Line 42: This could be simplified: if a use of `%sub.us.lcssa` is needed, a function that only takes an `i32` should be sufficient.
Line 57: Attributes shouldn't be needed.
Line 60: None of the metadata should be needed.

Allen added a subscriber: Allen. Aug 24 2022, 7:48 AM

kpdev42 updated this revision to Diff 455303. Aug 24 2022, 11:29 AM

Thanks for the update. As I mentioned in the previous message, I think it would be good to just land the test in this patch and I'll submit the code change separately. The test looks good to me.

Harbormaster completed remote builds in B183175: Diff 455303. Aug 24 2022, 1:50 PM

In D132490#3746679, @fhahn wrote:

Thanks for the update. As I mentioned in the previous message, I think it would be good to just land the test in this patch and I'll submit the code change separately. The test looks good to me.

Thank you for the review.
Let's keep the buildbot green. Please land both the fix and the test case yourself.

kpdev42 edited the summary of this revision. Aug 25 2022, 12:37 AM

In D132490#3748231, @kpdev42 wrote:

Thank you for the review.
Let's keep the buildbot green. Please land both the fix and the test case yourself.

Thanks. The idea would be to commit the test first, with check lines that pass without the fix. The fix then only shows the improvements on the test case, so the bots stay green at all times.

In D132490#3748233, @fhahn wrote:

Thanks. The idea would be to commit the test first, with check lines that pass without the fix. The fix then only shows the improvements on the test case, so the bots stay green at all times.

I'm afraid this test does not pass without the fix. Would you prefer that I (a) modify the test so that it passes (expecting a diff check instead of an overlap check), (b) XFAIL the test, or (c) do something else?

Herald added a subscriber: pcwang-thead. Sep 5 2022, 5:51 AM

In D132490#3770109, @kpdev42 wrote:

I'm afraid this test does not pass without the fix. Would you prefer that I (a) modify the test so that it passes (expecting a diff check instead of an overlap check), (b) XFAIL the test, or (c) do something else?

Oh right, common practice is to generate the (incorrect) check lines for the test so they pass on current main, together with a FIXME comment explaining what the current issue is. I did something like that in 6e56779e6bc168a3acd14f9bf2c4fd3fd9d86bd1.

Anyway, it should be fixed on current main by now.

This revision now requires changes to proceed. Sep 26 2022, 6:46 AM

Revision Contents

Path                                                  Size
compiler-rt/lib/fuzzer/CMakeLists.txt                 2 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp       2 lines
llvm/test/Transforms/LoopVectorize/nested-loop.ll     74 lines

Diff 454895

compiler-rt/lib/fuzzer/CMakeLists.txt

In macro(partially_link_libcxx name dir arch):

   get_target_flags_for_arch(${arch} target_cflags)
   if(CMAKE_CXX_COMPILER_ID MATCHES Clang)
     get_compiler_rt_target(${arch} target)
     set(target_cflags --target=${target} ${target_cflags})
   endif()
   set(cxx_${arch}_merge_dir "${CMAKE_CURRENT_BINARY_DIR}/cxx_${arch}_merge.dir")
   file(MAKE_DIRECTORY ${cxx_${arch}_merge_dir})
   add_custom_command(TARGET clang_rt.${name}-${arch} POST_BUILD
-    COMMAND ${CMAKE_CXX_COMPILER} ${target_cflags} -Wl,--whole-archive "$<TARGET_LINKER_FILE:clang_rt.${name}-${arch}>" -Wl,--no-whole-archive ${dir}/lib/libc++.a -r -o ${name}.o
+    COMMAND ${CMAKE_CXX_COMPILER} ${target_cflags} -Wl,--whole-archive "$<TARGET_LINKER_FILE:clang_rt.${name}-${arch}>" -Wl,--no-whole-archive ${dir}/lib/libc++.a -nodefaultlibs -r -o ${name}.o
     COMMAND ${CMAKE_OBJCOPY} --localize-hidden ${name}.o
     COMMAND ${CMAKE_COMMAND} -E remove "$<TARGET_LINKER_FILE:clang_rt.${name}-${arch}>"
     COMMAND ${CMAKE_AR} qcs "$<TARGET_LINKER_FILE:clang_rt.${name}-${arch}>" ${name}.o
     WORKING_DIRECTORY ${cxx_${arch}_merge_dir}
   )
 endmacro()

 foreach(arch ${FUZZER_SUPPORTED_ARCH})

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


In void Create(Loop *L, const LoopAccessInfo &LAI, ...):

 const auto &RtPtrChecking = *LAI.getRuntimePointerChecking();
 if (RtPtrChecking.Need) {
   auto *Pred = SCEVCheckBlock ? SCEVCheckBlock : Preheader;
   MemCheckBlock = SplitBlock(Pred, Pred->getTerminator(), DT, LI, nullptr,
                              "vector.memcheck");

   auto DiffChecks = RtPtrChecking.getDiffChecks();
-  if (DiffChecks) {
+  if (L->getParentLoop() == nullptr && DiffChecks) {
     Value *RuntimeVF = nullptr;
     MemRuntimeCheckCond = addDiffRuntimeChecks(
         MemCheckBlock->getTerminator(), L, *DiffChecks, MemCheckExp,
         [VF, &RuntimeVF](IRBuilderBase &B, unsigned Bits) {
           if (!RuntimeVF)
             RuntimeVF = getRuntimeVF(B, B.getIntNTy(Bits), VF);
           return RuntimeVF;
         },

llvm/test/Transforms/LoopVectorize/nested-loop.ll

This file was added.

; RUN: opt -loop-vectorize -S %s -o - | FileCheck %s
; CHECK: vector.memcheck:
; CHECK-NEXT: %bound0 = icmp ult ptr
; CHECK-NEXT: %bound1 = icmp ult ptr
; CHECK-NEXT: %found.conflict = and i1 %bound0, %bound1

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

; Function Attrs: nofree nounwind uwtable
define dso_local void @array_magick(ptr nocapture noundef %a, ptr nocapture noundef readonly %b, i32 noundef %len) local_unnamed_addr #0 {
entry:
  %cmp24 = icmp sgt i32 %len, 0
  br i1 %cmp24, label %for.cond1.preheader.us.preheader, label %for.end12

for.cond1.preheader.us.preheader: ; preds = %entry
  %wide.trip.count30 = zext i32 %len to i64
  br label %for.cond1.preheader.us

for.cond1.preheader.us: ; preds = %for.cond1.preheader.us.preheader, %for.cond1.for.end_crit_edge.us
  %indvars.iv27 = phi i64 [ 0, %for.cond1.preheader.us.preheader ], [ %indvars.iv.next28, %for.cond1.for.end_crit_edge.us ]
  %arrayidx.us = getelementptr inbounds i32, ptr %a, i64 %indvars.iv27
  %.pre = load i32, ptr %arrayidx.us, align 4, !tbaa !5
  br label %for.body3.us

for.body3.us: ; preds = %for.cond1.preheader.us, %for.body3.us
  %0 = phi i32 [ %.pre, %for.cond1.preheader.us ], [ %sub.us, %for.body3.us ]
  %indvars.iv = phi i64 [ 0, %for.cond1.preheader.us ], [ %indvars.iv.next, %for.body3.us ]
  %arrayidx5.us = getelementptr inbounds i32, ptr %b, i64 %indvars.iv
  %1 = load i32, ptr %arrayidx5.us, align 4, !tbaa !5
  %sub.us = sub i32 %0, %1
  store i32 %sub.us, ptr %arrayidx.us, align 4, !tbaa !5
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond.not = icmp eq i64 %indvars.iv.next, %wide.trip.count30
  br i1 %exitcond.not, label %for.cond1.for.end_crit_edge.us, label %for.body3.us, !llvm.loop !9

for.cond1.for.end_crit_edge.us: ; preds = %for.body3.us
  %sub.us.lcssa = phi i32 [ %sub.us, %for.body3.us ]
  %call.us = tail call i32 (ptr, ...) @printf(ptr noundef nonnull @.str, i32 noundef %sub.us.lcssa)
  %indvars.iv.next28 = add nuw nsw i64 %indvars.iv27, 1
  %exitcond31.not = icmp eq i64 %indvars.iv.next28, %wide.trip.count30
  br i1 %exitcond31.not, label %for.end12.loopexit, label %for.cond1.preheader.us, !llvm.loop !12

for.end12.loopexit: ; preds = %for.cond1.for.end_crit_edge.us
  br label %for.end12

for.end12: ; preds = %for.end12.loopexit, %entry
  ret void
}

; Function Attrs: nofree nounwind
declare noundef i32 @printf(ptr nocapture noundef readonly, ...) local_unnamed_addr #1

attributes #0 = { nofree nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { nofree nounwind "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }

!llvm.module.flags = !{!0, !1, !2, !3}
!llvm.ident = !{!4}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{i32 8, !"PIC Level", i32 2}
!2 = !{i32 7, !"PIE Level", i32 2}
!3 = !{i32 7, !"uwtable", i32 2}
!4 = !{!"clang version 16.0.0 (https://github.com/llvm/llvm-project.git 09f608fda51ca9dd2d88c2985bad1cfc1e36251e)"}
!5 = !{!6, !6, i64 0}
!6 = !{!"int", !7, i64 0}
!7 = !{!"omnipotent char", !8, i64 0}
!8 = !{!"Simple C/C++ TBAA"}
!9 = distinct !{!9, !10, !11}
!10 = !{!"llvm.loop.mustprogress"}
!11 = !{!"llvm.loop.unroll.disable"}
!12 = distinct !{!12, !10, !11}
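The CHECK lines in this test expect the overlap form of the runtime check (`%bound0`, `%bound1`, `%found.conflict`) rather than a diff check. The C++ sketch below is a simplified model of both forms, assuming the diff check reports a conflict when the unsigned distance from the source start to the sink start is less than VF * IC * access size, roughly as `addDiffRuntimeChecks` does. Addresses are plain byte offsets, the store to `&a[i]` is assumed to be the source and the load from `b` the sink, and all function and struct names are hypothetical. It illustrates how using the start of the outer-loop AddRec for `&a[i]` (that is, `%a` itself) can miss a real overlap.

```cpp
#include <cstdint>

// Overlap check on half-open byte ranges [StartA, EndA) and [StartB, EndB):
// a conflict requires each range to start before the other one ends. The two
// comparisons play the role of %bound0 and %bound1; their conjunction is
// %found.conflict.
bool overlapConflict(uint64_t StartA, uint64_t EndA, uint64_t StartB,
                     uint64_t EndB) {
  bool Bound0 = StartA < EndB;
  bool Bound1 = StartB < EndA;
  return Bound0 && Bound1;
}

// Simplified diff check: assumes both pointers advance by the same step in
// the vectorized loop, so one unsigned compare on the distance between the
// start addresses suffices. A negative distance wraps to a large value.
bool diffConflict(uint64_t SrcStart, uint64_t SinkStart, uint64_t VF,
                  uint64_t IC, uint64_t AccessSize) {
  return SinkStart - SrcStart < VF * IC * AccessSize;
}

struct CheckResults {
  bool Overlap;         // overlap check on the inner loop's actual ranges
  bool DiffOuterStart;  // diff check using the outer AddRec's start (%a)
  bool DiffActualStart; // diff check using &a[I] for the current I
};

// Hypothetical layout for one outer iteration I of the test kernel: the inner
// loop stores to &a[I] (invariant in the inner loop) and loads b[0..Len).
CheckResults checkOuterIteration(uint64_t A, uint64_t B, uint64_t I,
                                 uint64_t Len, uint64_t VF, uint64_t IC) {
  const uint64_t Size = 4; // size of an i32 element in bytes
  uint64_t StoreAddr = A + I * Size;
  CheckResults R;
  R.Overlap = overlapConflict(StoreAddr, StoreAddr + Size, B, B + Len * Size);
  R.DiffOuterStart = diffConflict(A, B, VF, IC, Size);
  R.DiffActualStart = diffConflict(StoreAddr, B, VF, IC, Size);
  return R;
}
```

With `A = 0`, `B = 16`, `I = 4`, `Len = 8`, `VF = 4` and `IC = 1` (the values that `-force-vector-width=4 -force-vector-interleave=1` would force), `&a[4]` coincides with `&b[0]`: the overlap check and the diff check on the actual address both report a conflict, while the diff check based on `%a` does not.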


[LoopVectorize] Emit runtime checks correctly for nested loops (Needs Revision, Public)
