This is an archive of the discontinued LLVM Phabricator instance.

Differential D19512

[LoopVectorize] Don't consider conditional-load dereferenceability when vectorization is forced
ClosedPublic

Authored by hfinkel on Apr 25 2016, 4:28 PM.

Details

Reviewers

silviu.baranga
anemet
nadav
mzolotukhin
delena
aschwaighofer

Commits

rG411d31ad7245: [LoopVectorize] Don't consider conditional-load dereferenceability for marked…
rL267514: [LoopVectorize] Don't consider conditional-load dereferenceability for marked…

Summary

I really thought we were doing this already, but we were not. Given this input:

void Test(int *res, int *c, int *d, int *p) {
#pragma clang loop vectorize(assume_safety)
  for (int i = 0; i < 16; i++)
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}

we still don't vectorize this loop. Even with "assume_safety", the check that prevents if-converting conditionally-executed loads (to protect against data-dependent dereferenceability) was not elided. We should vectorize this.
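To make the transformation concrete, here is a minimal scalar sketch of what if-converting the conditional load means for the loop above. The names testOriginal, testIfConverted, and checkEquivalent are illustrative only and do not come from the patch. The second form reads d[i] on every iteration, which is only valid when every d[i] is dereferenceable; that is the property assume_safety is being taken to promise.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 16;

// Original form: d[i] is loaded only on iterations where p[i] != 0.
void testOriginal(int *res, const int *d, const int *p) {
  for (std::size_t i = 0; i < N; ++i)
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}

// If-converted form: the load of d[i] is hoisted out of the condition
// (speculated), and a select picks between the two candidate results.
void testIfConverted(int *res, const int *d, const int *p) {
  for (std::size_t i = 0; i < N; ++i) {
    int loaded = d[i]; // executed unconditionally
    res[i] = (p[i] == 0) ? res[i] : res[i] + loaded;
  }
}

// Runs both forms on identical inputs and checks that they agree.
void checkEquivalent() {
  int p[N], d[N], a[N], b[N];
  for (std::size_t i = 0; i < N; ++i) {
    p[i] = static_cast<int>(i % 3);
    d[i] = static_cast<int>(i);
    a[i] = b[i] = 100 + static_cast<int>(i);
  }
  testOriginal(a, d, p);
  testIfConverted(b, d, p);
  for (std::size_t i = 0; i < N; ++i)
    assert(a[i] == b[i]);
}
```

The two forms agree whenever d is fully readable; they differ only in whether d[i] is touched on iterations where p[i] == 0.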

The change here seems straightforward. One subtlety: as implemented, it will still prefer a masked-load intrinsic (given target support) over the speculated load. The choice here seems architecture-specific; the best option depends on how expensive the masked load is compared to a regular load. Ideally, using the masked load still reduces unnecessary memory traffic, and so should be preferred. If we'd rather do it the other way, flipping the order of the checks is easy.
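The ordering described here can be sketched as a small decision function. This is a toy model, not the code in LoopVectorize.cpp: the enum LoadLowering, the function classifyConditionalLoad, and its boolean parameters are hypothetical stand-ins for the SafePtrs lookup, the isLegalMaskedLoad/isLegalMaskedGather queries, and the Hints->getForce() check.

```cpp
#include <cassert>

// Hypothetical model of the possible outcomes for a conditionally executed load.
enum class LoadLowering { Masked, Speculated, Reject };

// Each boolean stands in for one of the real queries (assumption, see lead-in).
LoadLowering classifyConditionalLoad(bool knownSafePtr,
                                     bool targetHasMaskedLoad,
                                     bool vectorizationForced) {
  if (knownSafePtr)
    return LoadLowering::Speculated; // pointer already known to be safe
  if (targetHasMaskedLoad)
    return LoadLowering::Masked;     // checked first, so it wins when legal
  if (vectorizationForced)
    return LoadLowering::Speculated; // the new case added by this patch
  return LoadLowering::Reject;       // the block cannot be if-converted
}
```

Flipping the preference, as the summary suggests is possible, would amount to swapping the second and third checks in this model.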

There is still an issue with the generated code: it contains runtime overlap checks. Fixing that will be follow-up work.

Event Timeline

hfinkel updated this revision to Diff 54939. Apr 25 2016, 4:28 PM

hfinkel retitled this revision from to [LoopVectorize] Don't consider conditional-load dereferenceability when vectorization is forced.

hfinkel updated this object.

hfinkel added reviewers: mzolotukhin, anemet, silviu.baranga, nadav, aschwaighofer, delena.

hfinkel added a subscriber: llvm-commits.

Herald added subscribers: mzolotukhin, mcrosier. Apr 25 2016, 4:28 PM

Actually, checking:

Hints->getForce() == LoopVectorizeHints::FK_Enabled

will catch both vectorize(assume_safety) and vectorize(enable). I'd need to also check:

TheLoop->isAnnotatedParallel()

to only get assume_safety. It occurs to me that I don't entirely like this setup: The parallel loop annotation is documented to refer to loop dependencies, not if-conversion safety. While I don't think users will be bothered by this (or see it), do you think that taking llvm.mem.parallel_loop_access to imply if-conversion safety is okay, or do we need to enhance it to differentiate between these concepts (loop-carried dependencies vs. if-conversion safety)? I don't have a use case for differentiating them. Opinions?

There is still an issue with the generated code: it contains runtime overlap checks. Fixing that will be follow-up work.

On this, the loop is not fully annotated by the time it hits the vectorizer. SimplifyCFG is dropping the metadata on one of the loads. I'll need to fix up the test case to be fully annotated if/when a check for TheLoop->isAnnotatedParallel() is added.

In D19512#411520, @hfinkel wrote:
Actually, checking:
Hints->getForce() == LoopVectorizeHints::FK_Enabled
will catch both vectorize(assume_safety) and vectorize(enable). I'd need to also check:

I was about to point this out and then ask you this :-) :

TheLoop->isAnnotatedParallel()
to only get assume_safety. It occurs to me that I don't entirely like this setup: The parallel loop annotation is documented to refer to loop dependencies, not if-conversion safety. While I don't think users will be bothered by this (or see it), do you think that taking llvm.mem.parallel_loop_access to imply if-conversion safety is okay, or do we need to enhance it to differentiate between these concepts (loop-carried dependencies vs. if-conversion safety)? I don't have a use case for differentiating them. Opinions?

Looks like we only use llvm.mem.parallel_loop_access to convey assume_safety, so I guess we can just extend it to imply if-convertible loads as well, no?

In D19512#411557, @anemet wrote:
Looks like we only use llvm.mem.parallel_loop_access to convey assume_safety, so I guess we can just extend it to imply if-convertible loads as well, no?

I'm happy to do this (we can always differentiate later if there's a use case for that). I'll update the patch.

Would be good to also document this in LangRef.rst.

It is the llvm.mem.parallel_loop_access that really implies the if-conversion safety. Check that. Note the semantic addition in the LangRef.
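As a rough illustration of why the force hint alone was not the right test, the following toy model encodes the two pragma variants as described in this thread. Everything here is hypothetical: LoopModel and the helper functions are not LLVM APIs, and the booleans merely stand in for Hints->getForce() == LoopVectorizeHints::FK_Enabled and TheLoop->isAnnotatedParallel(). It is not the committed check.

```cpp
#include <cassert>

// Hypothetical summary of what the vectorizer sees for a loop.
struct LoopModel {
  bool forceEnabled;      // stands in for FK_Enabled
  bool annotatedParallel; // stands in for isAnnotatedParallel()
};

// Per the discussion: both pragmas force vectorization, but only
// assume_safety is conveyed via llvm.mem.parallel_loop_access.
inline LoopModel fromVectorizeEnable() { return LoopModel{true, false}; }
inline LoopModel fromAssumeSafety() { return LoopModel{true, true}; }

// First version of the patch: keyed on the force hint only.
inline bool forceAloneAllowsSpeculation(const LoopModel &m) {
  return m.forceEnabled;
}

// Revised idea: keyed on the parallel annotation.
inline bool parallelAnnotationAllowsSpeculation(const LoopModel &m) {
  return m.annotatedParallel;
}
```

In this model the force-only test also accepts vectorize(enable), while the annotation-based test accepts only assume_safety.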

LGTM. You may be able to drop the llvm.loop.vectorize.enable MD now.

This revision is now accepted and ready to land. Apr 25 2016, 6:15 PM

Closed by commit rL267514: [LoopVectorize] Don't consider conditional-load dereferenceability for marked… (authored by hfinkel). Apr 25 2016, 7:06 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                Size
lib/Transforms/Vectorize/LoopVectorize.cpp          2 lines
test/Transforms/LoopVectorize/X86/force-ifcvt.ll    42 lines

Diff 54939

lib/Transforms/Vectorize/LoopVectorize.cpp

  ...
        if (it->mayReadFromMemory()) {
          if (!LI)
            return false;
          if (!SafePtrs.count(LI->getPointerOperand())) {
            if (isLegalMaskedLoad(LI->getType(), LI->getPointerOperand()) ||
                isLegalMaskedGather(LI->getType())) {
              MaskedOp.insert(LI);
              continue;
            }
+           if (Hints->getForce() == LoopVectorizeHints::FK_Enabled)
+             continue;
            return false;
          }
        }

        // We don't predicate stores at the moment.
        if (it->mayWriteToMemory()) {
          StoreInst *SI = dyn_cast<StoreInst>(it);
          // We only support predication of stores in basic blocks with one
  ...

test/Transforms/LoopVectorize/X86/force-ifcvt.ll

This file was added.

; RUN: opt -loop-vectorize -S < %s | FileCheck %s
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: norecurse nounwind uwtable
define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {
entry:
  br label %for.body

; CHECK-LABEL: @Test
; CHECK: <4 x i32>

for.body:                                         ; preds = %cond.end, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
  %arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
  %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
  %cmp1 = icmp eq i32 %0, 0
  %arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
  %1 = load i32, i32* %arrayidx3, align 4
  br i1 %cmp1, label %cond.end, label %cond.false

cond.false:                                       ; preds = %for.body
  %arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
  %2 = load i32, i32* %arrayidx7, align 4, !llvm.mem.parallel_loop_access !0
  %add = add nsw i32 %2, %1
  br label %cond.end

cond.end:                                         ; preds = %for.body, %cond.false
  %cond = phi i32 [ %add, %cond.false ], [ %1, %for.body ]
  store i32 %cond, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv.next, 16
  br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

for.end:                                          ; preds = %cond.end
  ret void
}

attributes #0 = { norecurse nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }

!0 = distinct !{!0, !1}
!1 = !{!"llvm.loop.vectorize.enable", i1 true}