This is an archive of the discontinued LLVM Phabricator instance.

Differential D19512

[LoopVectorize] Don't consider conditional-load dereferenceability when vectorization is forced
ClosedPublic

Authored by hfinkel on Apr 25 2016, 4:28 PM.

Details

Reviewers

silviu.baranga
anemet
nadav
mzolotukhin
delena
aschwaighofer

Commits

rG411d31ad7245: [LoopVectorize] Don't consider conditional-load dereferenceability for marked…
rL267514: [LoopVectorize] Don't consider conditional-load dereferenceability for marked…

Summary

I really thought we were doing this already, but we were not. Given this input:

void Test(int *res, int *c, int *d, int *p) {
#pragma clang loop vectorize(assume_safety)
  for (int i = 0; i < 16; i++)
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}

we still don't vectorize this loop. Even with "assume_safety", the check that prevents if-conversion of conditionally-executed loads (to protect against data-dependent dereferenceability) was not elided. We should vectorize this.
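
To make the transformation concrete, here is a minimal C++ sketch of the same loop before and after if-conversion. The names `testBranchy` and `testIfConverted` are hypothetical, and the fixed-size `std::array` is an assumption chosen so that every `d[i]` is trivially dereferenceable. It illustrates the idea only and is not what the vectorizer emits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 16;
using Vec = std::array<int, N>;

// Original form: d[i] is loaded only on iterations where p[i] != 0.
void testBranchy(Vec &res, const Vec &d, const Vec &p) {
  for (std::size_t i = 0; i < N; ++i)
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
}

// If-converted form: d[i] is loaded on every iteration (speculated),
// and a select decides whether the loaded value contributes.
void testIfConverted(Vec &res, const Vec &d, const Vec &p) {
  for (std::size_t i = 0; i < N; ++i) {
    int loaded = d[i];                       // unconditional, speculative load
    int addend = (p[i] == 0) ? 0 : loaded;   // select instead of a branch
    res[i] += addend;
  }
}
```

The second form has no control flow in the loop body, which is what makes it easy to widen. It is only valid if every `d[i]` may be read, and that is the guarantee `assume_safety` is being taken to provide.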

The change here seems straightforward. One subtlety: as implemented, it will still prefer to use a masked-load intrinsic (given target support) over the speculated load. The choice here seems architecture-specific; the best option depends on how expensive the masked load is compared to a regular load. Ideally, using the masked load still reduces unnecessary memory traffic, and so should be preferred. If we'd rather do it the other way, flipping the order of the checks is easy.
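
The trade-off between the two load strategies can be illustrated with a toy model. Everything here is hypothetical (`CountingMemory`, `maskedLoad`, `speculatedLoad`, and the width `VF = 4`); it counts element reads in software and says nothing about the real cost on any target.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t VF = 4;            // illustrative vector width
using Lanes = std::array<int, VF>;
using Mask = std::array<bool, VF>;

// Toy memory that counts how many elements were read.
struct CountingMemory {
  std::vector<int> data;
  std::size_t reads = 0;
  int load(std::size_t idx) {
    assert(idx < data.size());
    ++reads;
    return data[idx];
  }
};

// Masked load: only enabled lanes touch memory; the rest keep passthru.
Lanes maskedLoad(CountingMemory &mem, std::size_t base, const Mask &mask,
                 const Lanes &passthru) {
  Lanes out = passthru;
  for (std::size_t l = 0; l < VF; ++l)
    if (mask[l])
      out[l] = mem.load(base + l);
  return out;
}

// Speculated load: every lane is read, then blended with passthru.
Lanes speculatedLoad(CountingMemory &mem, std::size_t base, const Mask &mask,
                     const Lanes &passthru) {
  Lanes out{};
  for (std::size_t l = 0; l < VF; ++l) {
    int v = mem.load(base + l);
    out[l] = mask[l] ? v : passthru[l];
  }
  return out;
}
```

Both functions produce the same lanes; they differ only in how many elements they touch, which is the "unnecessary memory traffic" argument for preferring the masked form when the target supports it cheaply.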

There is still an issue with the generated code: it contains runtime overlap checks. Fixing that will be follow-up work.

Diff Detail

Repository: rL LLVM

Event Timeline

hfinkel updated this revision to Diff 54939. Apr 25 2016, 4:28 PM

hfinkel retitled this revision to [LoopVectorize] Don't consider conditional-load dereferenceability when vectorization is forced.

hfinkel updated this object.

hfinkel added reviewers: mzolotukhin, anemet, silviu.baranga, nadav, aschwaighofer, delena.

hfinkel added a subscriber: llvm-commits.

Herald added subscribers: mzolotukhin, mcrosier. Apr 25 2016, 4:28 PM

Actually, checking:

Hints->getForce() == LoopVectorizeHints::FK_Enabled

will catch both vectorize(assume_safety) and vectorize(enable). I'd need to also check:

TheLoop->isAnnotatedParallel()

to only get assume_safety. It occurs to me that I don't entirely like this setup: The parallel loop annotation is documented to refer to loop dependencies, not if-conversion safety. While I don't think users will be bothered by this (or see it), do you think that taking llvm.mem.parallel_loop_access to imply if-conversion safety is okay, or do we need to enhance it to differentiate between these concepts (loop-carried dependencies vs. if-conversion safety)? I don't have a use case for differentiating them. Opinions?
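
The distinction between the two checks can be sketched with a toy model. The types `Pragma`, `ForceKind`, and `LoopState` and the function `lower` are hypothetical stand-ins, not LLVM or Clang APIs; the mapping encodes only what the comment above states, namely that both pragmas force vectorization and only `assume_safety` yields a parallel-annotated loop.

```cpp
#include <cassert>

enum class Pragma { None, VectorizeEnable, VectorizeAssumeSafety };
enum class ForceKind { Undefined, Enabled };

// Hypothetical summary of what the vectorizer sees for a loop.
struct LoopState {
  ForceKind force;
  bool annotatedParallel;
};

// Assumed lowering: both pragmas force vectorization, but only
// assume_safety marks the loop's memory accesses as parallel.
LoopState lower(Pragma p) {
  switch (p) {
  case Pragma::VectorizeEnable:
    return {ForceKind::Enabled, false};
  case Pragma::VectorizeAssumeSafety:
    return {ForceKind::Enabled, true};
  case Pragma::None:
    break;
  }
  return {ForceKind::Undefined, false};
}

// Check as first written: true for both pragmas.
bool forcedCheck(const LoopState &s) { return s.force == ForceKind::Enabled; }

// Stricter check: true only for assume_safety.
bool assumeSafetyCheck(const LoopState &s) {
  return s.force == ForceKind::Enabled && s.annotatedParallel;
}
```

In this model `forcedCheck` cannot tell the two pragmas apart, while `assumeSafetyCheck` accepts only the loop marked with `assume_safety`.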

There is still an issue with the generated code: it contains runtime overlap checks. Fixing that will be follow-up work.

On this, the loop is not fully annotated by the time it hits the vectorizer. SimplifyCFG is dropping the metadata on one of the loads. I'll need to fix up the test case to be fully annotated if/when a check for TheLoop->isAnnotatedParallel() is added.

In D19512#411520, @hfinkel wrote:
Actually, checking:
Hints->getForce() == LoopVectorizeHints::FK_Enabled
will catch both vectorize(assume_safety) and vectorize(enable). I'd need to also check:

I was about to point this out and then ask you this :-) :

TheLoop->isAnnotatedParallel()
to only get assume_safety. It occurs to me that I don't entirely like this setup: The parallel loop annotation is documented to refer to loop dependencies, not if-conversion safety. While I don't think users will be bothered by this (or see it), do you think that taking llvm.mem.parallel_loop_access to imply if-conversion safety is okay, or do we need to enhance it to differentiate between these concepts (loop-carried dependencies vs. if-conversion safety)? I don't have a use case for differentiating them. Opinions?

Looks like we only use llvm.mem.parallel_loop_access to convey assume_safety so I guess we can just extend it to imply if-convertible loads as well, no?

In D19512#411557, @anemet wrote:
Looks like we only use llvm.mem.parallel_loop_access to convey assume_safety so I guess we can just extend it to imply if-convertible loads as well, no?

I'm happy to do this (we can always differentiate later if there's a use case for that). I'll update the patch.

Would be good to also document this in LangRef.rst.

It is llvm.mem.parallel_loop_access that really implies if-conversion safety, so check that instead. Note the semantic addition in the LangRef.
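
The order of checks that the updated patch applies to a conditionally-executed load can be summarized in a small decision model. This is a sketch only: `LoadDecision`, `ConditionalLoad`, and `classify` are hypothetical names, and the boolean fields stand in for the `SafePtrs` lookup and the `isLegalMaskedLoad`/`isLegalMaskedGather` queries in the diff.

```cpp
#include <cassert>

enum class LoadDecision {
  AlreadySafe,            // pointer is known safe to load unconditionally
  UseMaskedLoad,          // target supports a masked load or gather
  SpeculateByAnnotation,  // parallel-loop metadata implies if-conversion safety
  CannotPredicate         // block cannot be if-converted
};

// Hypothetical summary of the facts queried for one load.
struct ConditionalLoad {
  bool pointerInSafeSet;
  bool maskedLoadOrGatherLegal;
};

// Mirrors the order of the checks: the masked form is preferred, and the
// annotation is consulted only when no masked form is available.
LoadDecision classify(const ConditionalLoad &load, bool loopAnnotatedParallel) {
  if (load.pointerInSafeSet)
    return LoadDecision::AlreadySafe;
  if (load.maskedLoadOrGatherLegal)
    return LoadDecision::UseMaskedLoad;
  if (loopAnnotatedParallel)
    return LoadDecision::SpeculateByAnnotation;
  return LoadDecision::CannotPredicate;
}
```

Swapping the second and third tests would give the alternative policy mentioned in the summary, where an annotated loop speculates the load even if a masked load is legal.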

LGTM. You may be able to drop the llvm.loop.vectorize.enable MD now.

This revision is now accepted and ready to land. Apr 25 2016, 6:15 PM

Closed by commit rL267514: [LoopVectorize] Don't consider conditional-load dereferenceability for marked… (authored by hfinkel). Apr 25 2016, 7:06 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                          Size
llvm/trunk/docs/LangRef.rst                                   3 lines
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp         4 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/force-ifcvt.ll   41 lines

Diff 54962

llvm/trunk/docs/LangRef.rst

	'``llvm.mem.parallel_loop_access``' Metadata			'``llvm.mem.parallel_loop_access``' Metadata
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^			^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	The ``llvm.mem.parallel_loop_access`` metadata refers to a loop identifier,			The ``llvm.mem.parallel_loop_access`` metadata refers to a loop identifier,
	or metadata containing a list of loop identifiers for nested loops.			or metadata containing a list of loop identifiers for nested loops.
	The metadata is attached to memory accessing instructions and denotes that			The metadata is attached to memory accessing instructions and denotes that
	no loop carried memory dependence exist between it and other instructions denoted			no loop carried memory dependence exist between it and other instructions denoted
	with the same loop identifier.			with the same loop identifier. The metadata on memory reads also implies that
				if conversion (i.e. speculative execution within a loop iteration) is safe.

	Precisely, given two instructions ``m1`` and ``m2`` that both have the			Precisely, given two instructions ``m1`` and ``m2`` that both have the
	``llvm.mem.parallel_loop_access`` metadata, with ``L1`` and ``L2`` being the			``llvm.mem.parallel_loop_access`` metadata, with ``L1`` and ``L2`` being the
	set of loops associated with that metadata, respectively, then there is no loop			set of loops associated with that metadata, respectively, then there is no loop
	carried dependence between ``m1`` and ``m2`` for loops in both ``L1`` and			carried dependence between ``m1`` and ``m2`` for loops in both ``L1`` and
	``L2``.			``L2``.

	As a special case, if all memory accessing instructions in a loop have			As a special case, if all memory accessing instructions in a loop have

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

	}			}

	bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) {			bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) {
	return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);			return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);
	}			}

	bool LoopVectorizationLegality::blockCanBePredicated(BasicBlock *BB,			bool LoopVectorizationLegality::blockCanBePredicated(BasicBlock *BB,
	SmallPtrSetImpl<Value *> &SafePtrs) {			SmallPtrSetImpl<Value *> &SafePtrs) {
				const bool IsAnnotatedParallel = TheLoop->isAnnotatedParallel();

	for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {			for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {
	// Check that we don't have a constant expression that can trap as operand.			// Check that we don't have a constant expression that can trap as operand.
	for (Instruction::op_iterator OI = it->op_begin(), OE = it->op_end();			for (Instruction::op_iterator OI = it->op_begin(), OE = it->op_end();
	OI != OE; ++OI) {			OI != OE; ++OI) {
	if (Constant *C = dyn_cast<Constant>(*OI))			if (Constant *C = dyn_cast<Constant>(*OI))
	if (C->canTrap())			if (C->canTrap())
	return false;			return false;
	}			}
	// We might be able to hoist the load.			// We might be able to hoist the load.
	if (it->mayReadFromMemory()) {			if (it->mayReadFromMemory()) {
	LoadInst *LI = dyn_cast<LoadInst>(it);			LoadInst *LI = dyn_cast<LoadInst>(it);
	if (!LI)			if (!LI)
	return false;			return false;
	if (!SafePtrs.count(LI->getPointerOperand())) {			if (!SafePtrs.count(LI->getPointerOperand())) {
	if (isLegalMaskedLoad(LI->getType(), LI->getPointerOperand()) ||			if (isLegalMaskedLoad(LI->getType(), LI->getPointerOperand()) ||
	isLegalMaskedGather(LI->getType())) {			isLegalMaskedGather(LI->getType())) {
	MaskedOp.insert(LI);			MaskedOp.insert(LI);
	continue;			continue;
	}			}
				// !llvm.mem.parallel_loop_access implies if-conversion safety.
				if (IsAnnotatedParallel)
				continue;
	return false;			return false;
	}			}
	}			}

	// We don't predicate stores at the moment.			// We don't predicate stores at the moment.
	if (it->mayWriteToMemory()) {			if (it->mayWriteToMemory()) {
	StoreInst *SI = dyn_cast<StoreInst>(it);			StoreInst *SI = dyn_cast<StoreInst>(it);
	// We only support predication of stores in basic blocks with one			// We only support predication of stores in basic blocks with one

llvm/trunk/test/Transforms/LoopVectorize/X86/force-ifcvt.ll

				; RUN: opt -loop-vectorize -S < %s | FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: norecurse nounwind uwtable
				define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {
				entry:
				br label %for.body

				; CHECK-LABEL: @Test
				; CHECK: <4 x i32>

				for.body: ; preds = %cond.end, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
				%arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
				%cmp1 = icmp eq i32 %0, 0
				%arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
				%1 = load i32, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0
				br i1 %cmp1, label %cond.end, label %cond.false

				cond.false: ; preds = %for.body
				%arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
				%2 = load i32, i32* %arrayidx7, align 4, !llvm.mem.parallel_loop_access !0
				%add = add nsw i32 %2, %1
				br label %cond.end

				cond.end: ; preds = %for.body, %cond.false
				%cond = phi i32 [ %add, %cond.false ], [ %1, %for.body ]
				store i32 %cond, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 16
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %cond.end
				ret void
				}

				attributes #0 = { norecurse nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }

				!0 = distinct !{!0}