This is an archive of the discontinued LLVM Phabricator instance.

Differential D66803

[LV] Tail-folding with runtime memory checks
ClosedPublic

Authored by SjoerdMeijer on Aug 27 2019, 7:41 AM.

Details

Reviewers

Ayal
dorit
hsaito
fhahn
dcaballe

Commits

rG0469b0e4ef71: [LV] Tail-folding with runtime memory checks
rL370707: [LV] Tail-folding with runtime memory checks

Summary

The loop vectorizer runs into an assert when it tries to fold the tail and
has to emit runtime memory disambiguation checks. This happens as soon as
you try to do something like:

#pragma clang loop predicate(enable)
for (int i=0; i < N; i++)
  A[i] = B[i] + C[i];

where A, B, and C are integer pointers, not annotated with e.g. the restrict
type qualifier.

However, it looks like the logic to deal with this is all in place, so this
simply removes the assert.
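To make the combination concrete, here is a rough scalar emulation of what a tail-folded loop guarded by a runtime memory check does for the loop above. It is only a sketch under stated assumptions, not the vectorizer's actual output: the names `VF`, `mayOverlap`, and `addArrays` are hypothetical, the width of 8 is chosen only because the test in this review checks for 8 x i32 masked operations, and the per-lane `if` stands in for a masked load/store.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vector width (assumption for illustration only).
constexpr std::size_t VF = 8;

// Runtime memory check: do the N-element ranges starting at P and Q overlap?
static bool mayOverlap(const int *P, const int *Q, std::size_t N) {
  auto PB = reinterpret_cast<std::uintptr_t>(P);
  auto QB = reinterpret_cast<std::uintptr_t>(Q);
  std::uintptr_t Bytes = N * sizeof(int);
  return PB < QB + Bytes && QB < PB + Bytes;
}

// Computes A[i] = B[i] + C[i] for i in [0, N).
// Returns true if the masked "vector" path was taken, false if the
// runtime check failed and the scalar fallback loop ran instead.
bool addArrays(int *A, const int *B, const int *C, std::size_t N) {
  assert(A && B && C && "expects non-null pointers");

  // Corresponds to a "vector.memcheck" block: bail out on possible aliasing.
  if (mayOverlap(A, B, N) || mayOverlap(A, C, N)) {
    for (std::size_t I = 0; I < N; ++I)
      A[I] = B[I] + C[I];
    return false;
  }

  // Tail folding: every iteration handles VF lanes, and lanes with
  // Base + Lane >= N are masked off, so no scalar remainder loop is needed.
  for (std::size_t Base = 0; Base < N; Base += VF) {
    std::array<int, VF> VB{}, VC{};
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      if (Base + Lane < N) { // masked load
        VB[Lane] = B[Base + Lane];
        VC[Lane] = C[Base + Lane];
      }
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      if (Base + Lane < N) // masked store
        A[Base + Lane] = VB[Lane] + VC[Lane];
  }
  return true;
}
```

Note that the scalar loop in this sketch exists only as the bailout for a failed memory check; the leftover iterations are handled by the mask, which is the point of folding the tail.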

Diff Detail

Event Timeline

SjoerdMeijer created this revision. Aug 27 2019, 7:41 AM

Herald added a project: Restricted Project. Aug 27 2019, 7:41 AM

Herald added subscribers: rkruppe, hiraditya.

I wrote some manual tests, and looked at some IR, and the produced code looked okay to me.
I will continue testing a bit more until you point out to me that I missed something here and did something stupid :-)

The original intent was to make sure that under OptForSize only a single (vector) loop will be produced. I.e., w/o a scalar loop serving either as scalar leftover iterations or as runtime guard bailout. There's another such assert in emitSCEVChecks(). Now that FoldTailByMasking no longer implies OptForSize this should indeed be updated (to use CM_ScalarEpilogueNotAllowedOptSize instead perhaps?).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2699	also here

Hi,

Thanks for taking a look at this, and for clarifying this:

The original intent was to make sure that under OptForSize only a single (vector) loop will be produced. I.e., w/o a scalar loop serving either as scalar leftover iterations or as runtime guard bailout.

and for pointing this out:

There's another such assert in emitSCEVChecks(). Now that FoldTailByMasking no longer implies OptForSize this should indeed be updated (to use CM_ScalarEpilogueNotAllowedOptSize instead perhaps?).

Shall we address this separately in a different patch? I am looking at this now, and feel that I have embarked on a little SCEV adventure, and that this is a separate issue.

In D66803#1648523, @SjoerdMeijer wrote:

Shall we address this separately in a different patch? I am looking at this now, and feel that I have embarked on a little SCEV adventure, and that this is a separate issue.

Sure. Can also add a TODO/FIXME for now.

Thanks, have added a FIXME, which I will address asap.

I have addressed the other assert in D66932.

In D66803#1650509, @SjoerdMeijer wrote:

I have addressed the other assert in D66932.

OK, great, then the FIXME is probably redundant.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2732	Replace the assert so that it checks hasOptSize, instead of removing it?
llvm/test/Transforms/LoopVectorize/X86/tail-folding-memcheck.ll
14	Better avoid hard coded values such as `%12`. E.g., see what utils/update_test_checks.py makes of it.

Comments addressed.

SjoerdMeijer marked an inline comment as done. Aug 29 2019, 5:30 AM

SjoerdMeijer added inline comments.

llvm/test/Transforms/LoopVectorize/X86/tail-folding-memcheck.ll
14	Thanks for spotting, copy-paste mistake. I think checking the masked loads/stores and the labels is good enough (and more concise), but if you think it is better I can run the script.

friendly ping

Ayal accepted this revision. Sep 3 2019, 12:37 AM

This revision is now accepted and ready to land. Sep 3 2019, 12:37 AM

Many thanks for reviewing!

Closed by commit rL370707: [LV] Tail-folding with runtime memory checks (authored by SjoerdMeijer). Sep 3 2019, 1:37 AM

This revision was automatically updated to reflect the committed changes.

Hi,

I've hit the new assertion when compiling for my out of tree target when using -Osize in combination with

#pragma clang loop vectorize(enable)

on a loop.

What is supposed to prevent us from triggering the assertion in a case like that?

In D66803#1656923, @uabelho wrote:

Hi,

I've hit the new assertion when compiling for my out of tree target when using -Osize in combination with

#pragma clang loop vectorize(enable)

on a loop.

What is supposed to prevent us from triggering the assertion in a case like that?

Can be reproduced on trunk with

clang -mllvm -disable-llvm-optzns -S -Xclang -emit-llvm-bc -Os lala.c -o ./lala.bc
opt -disable-basicaa -O1 -S -o - lala.bc

Attachment: lala.c (264 B)

Hi, sorry about this.

Just looking at the .c file, I was surprised that it tries to emit runtime checks, because restrict is used. But when running opt with -disable-basicaa I can imagine that it will try to do that, yes; that doesn't really surprise me. And when I ran clang -Os -fvectorize, I did not run into this assert.

Triggering this assert with opt shows the usefulness of this assert, I think. But if we can trigger this from a user-facing tool such as clang, then I think we do have a problem. Do you have a clang command to trigger this?

Why would it be ok to give the assert when we run

clang
opt

?
You mean that clang's output in that case is broken? In that case I'd expect a verifier in opt to catch the fault, not that opt would crash with an assertion.

Anyway, I get the crash also with

clang -mllvm -disable-basicaa -S -Os lala.c -o -

I was just trying to say that triggering an assert by some option combination in a non-user-facing tool is probably not that difficult. And I was also trying to say that the assert was kind of serving its purpose, because I was expecting basicaa to run. But that assumption is probably wrong?

But let's forget this, because I agree that asserting is not really a graceful way of dealing with this, and there's probably room for improvement. Just checking what you are expecting for this case: do you expect this loop to be vectorised? I would guess that because of disabling basicaa, runtime checks are required and there's code growth, which is undesired when optimising for size, so no vectorization.
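The expectation described in the previous paragraph can be written down as a small decision function. This is a hypothetical sketch of that guess only, not LLVM's actual logic and not what the follow-up patch implements; the names `Action` and `chooseAction` are made up for illustration.

```cpp
// Possible outcomes for a loop (hypothetical names).
enum class Action { Vectorize, VectorizeWithChecks, KeepScalar };

// Assumed policy: runtime checks mean extra code, so when optimising for
// size a loop that needs them is left scalar instead of hitting an assert.
constexpr Action chooseAction(bool LegalToVectorize, bool NeedsRuntimeChecks,
                              bool OptForSize) {
  if (!LegalToVectorize)
    return Action::KeepScalar;
  if (!NeedsRuntimeChecks)
    return Action::Vectorize;
  return OptForSize ? Action::KeepScalar : Action::VectorizeWithChecks;
}
```

Under this assumed policy, the -Os reproducer with basicaa disabled would fall into the last case and stay scalar.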

Normally we do use basicaa, the crash was found in "fuzz" tests where we compile the code with random flags.

So I'm not really sure what the vectorizer should really do in that case, apart from not crashing :)

Okay, thanks for reporting, I will look into this.

SjoerdMeijer mentioned this in D67764: [LV] Forced vectorization with runtime checks and OptForSize. Sep 21 2019, 1:24 PM

SjoerdMeijer mentioned this in rL372694: [LV] Forced vectorization with runtime checks and OptForSize. Sep 24 2019, 1:03 AM

SjoerdMeijer mentioned this in rG0fcb3afb401c: [LV] Forced vectorization with runtime checks and OptForSize.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	5 lines
llvm/test/Transforms/LoopVectorize/X86/tail-folding-memcheck.ll	38 lines

Diff 217847

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ 2,690 lines not shown @@ void InnerLoopVectorizer::emitSCEVChecks(Loop *L, BasicBlock *Bypass) {
    Value *SCEVCheck =
        Exp.expandCodeForPredicate(&PSE.getUnionPredicate(), BB->getTerminator());

    if (auto *C = dyn_cast<ConstantInt>(SCEVCheck))
      if (C->isZero())
        return;

    assert(!Cost->foldTailByMasking() &&
           "Cannot SCEV check stride or overflow when folding tail");
    [Inline comment, Ayal] also here

    // Create a new block containing the stride check.
    BB->setName("vector.scevcheck");
    auto *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph");
    // Update dominator tree immediately if the generated block is a
    // LoopBypassBlock because SCEV expansions to generate loop bypass
    // checks may query it before the current function is finished.
    DT->addNewBlock(NewBB, BB);
    if (L->getParentLoop())
@@ 16 lines not shown @@ void InnerLoopVectorizer::emitMemRuntimeChecks(Loop *L, BasicBlock *Bypass) {
    // faster.
    Instruction *FirstCheckInst;
    Instruction *MemRuntimeCheck;
    std::tie(FirstCheckInst, MemRuntimeCheck) =
        Legal->getLAI()->addRuntimeChecks(BB->getTerminator());
    if (!MemRuntimeCheck)
      return;

-   assert(!Cost->foldTailByMasking() && "Cannot check memory when folding tail");
+   assert(!BB->getParent()->hasOptSize() &&
+          "Cannot emit memory checks when optimizing for size");
    [Inline comment, Ayal] Replace the assert so that it checks hasOptSize, instead of removing it?

    // Create a new block containing the memory check.
    BB->setName("vector.memcheck");
    auto *NewBB = BB->splitBasicBlock(BB->getTerminator(), "vector.ph");
    // Update dominator tree immediately if the generated block is a
    // LoopBypassBlock because SCEV expansions to generate loop bypass
    // checks may query it before the current function is finished.
    DT->addNewBlock(NewBB, BB);
    if (L->getParentLoop())
@@ 5,054 lines not shown @@

llvm/test/Transforms/LoopVectorize/X86/tail-folding-memcheck.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -mcpu=core-avx2 -S | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				define dso_local void @_Z1GPiS_S_(i32* nocapture %A, i32* nocapture readonly %B, i32* nocapture readonly %C) local_unnamed_addr #0 {
				; CHECK-LABEL: @_Z1GPiS_S_
				; CHECK: vector.memcheck:
				; CHECK: vector.body:
				; CHECK: @llvm.masked.load.v8i32.p0v8i32
				; CHECK: @llvm.masked.load.v8i32.p0v8i32
				; CHECK: @llvm.masked.store.v8i32.p0v8i32
				; CHECK: br i1 %{{.*}}, label %{{.*}}, label %vector.body
				entry:
				[Inline comment, Ayal] Better avoid hard coded values such as `%12`. E.g., see what utils/update_test_checks.py makes of it.
				[Inline comment, SjoerdMeijer] Thanks for spotting, copy-paste mistake. I think checking the masked loads/stores and the labels is good enough (and more concise), but if you think it is better I can run the script.
				br label %for.body

				for.cond.cleanup:
				ret void

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %B, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx2 = getelementptr inbounds i32, i32* %C, i64 %indvars.iv
				%1 = load i32, i32* %arrayidx2, align 4
				%add = add nsw i32 %1, %0
				%arrayidx4 = getelementptr inbounds i32, i32* %A, i64 %indvars.iv
				store i32 %add, i32* %arrayidx4, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 430
				br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !6
				}

				attributes #0 = { nofree norecurse nounwind uwtable }

				!6 = distinct !{!6, !7, !8}
				!7 = !{!"llvm.loop.vectorize.predicate.enable", i1 true}
				!8 = !{!"llvm.loop.vectorize.enable", i1 true}