This is an archive of the discontinued LLVM Phabricator instance.

Differential D103907

[LV] Parallel annotated loop does not imply all loads can be hoisted.
ClosedPublic

Authored by fodinabor on Jun 8 2021, 9:19 AM.

Details

Reviewers

SjoerdMeijer
Meinersbur
Ayal
Pierre-vh
dmgreen
rengolin
fhahn
samparker
gilr
pjaaskel
jdoerfert

Commits

rG4f01122c3f6c: [LV] Parallel annotated loop does not imply all loads can be hoisted.

Summary

As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety when a loop is annotated parallel (!llvm.loop.parallel_accesses) is not what users would expect. The documentation for this behavior has since been removed from the LangRef again, and the behavior can lead to invalid reads.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.

The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in loops marked with #pragma clang loop vectorize(assume_safety) can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and the corresponding metadata) would expect, and it is thus dangerous.

Only two additional tests failed, which are adapted in this patch. Depending on the further direction, force-ifcvt.ll should be removed or further adapted.
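The hazard described in the summary can be illustrated with a small model. This is only a sketch and not LLVM code: `CheckedBuffer`, `guardedLoop` and `ifConvertedLoop` are hypothetical names, and the buffer counts out-of-bounds reads instead of faulting, so the effect of an unmasked, speculatively executed load becomes observable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of memory: records invalid reads instead of crashing.
struct CheckedBuffer {
  std::vector<int> data;
  mutable std::size_t invalidReads = 0;

  int read(std::size_t i) const {
    if (i >= data.size()) {
      ++invalidReads; // a real program could fault here
      return 0;
    }
    return data[i];
  }
};

// Source semantics: q is only read on iterations where the guard holds.
inline void guardedLoop(std::vector<int> &p, const CheckedBuffer &q,
                        std::size_t guard) {
  assert(guard <= p.size());
  for (std::size_t ix = 0; ix < p.size(); ++ix)
    if (ix < guard)
      p[ix] = q.read(ix);
}

// If-converted form without a mask: the load runs on every iteration and
// the guard only selects whether its result is stored.
inline void ifConvertedLoop(std::vector<int> &p, const CheckedBuffer &q,
                            std::size_t guard) {
  assert(guard <= p.size());
  for (std::size_t ix = 0; ix < p.size(); ++ix) {
    int loaded = q.read(ix); // speculative, unmasked load
    p[ix] = (ix < guard) ? loaded : p[ix];
  }
}
```

Both loops produce the same values in `p`, but when `q` is only valid for the guarded iterations, the if-converted form performs reads that the source program never requested. A masked load avoids exactly those extra reads.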

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fodinabor created this revision. Jun 8 2021, 9:19 AM

Herald added subscribers: Naghasan, Anastasia, hiraditya. Jun 8 2021, 9:19 AM

fodinabor requested review of this revision. Jun 8 2021, 9:19 AM

Herald added a project: Restricted Project. Jun 8 2021, 9:19 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B108228: Diff 350624. Jun 8 2021, 9:55 AM

Could you also update the language reference? It currently reads:

The metadata on memory reads also implies that if conversion (i.e. speculative execution within a loop iteration) is safe.

OK, disregard that, I was looking at an old language reference (the first found by Google).

The part was added by @hfinkel in rG411d31ad72456ba88c0b0bee0faba2b774add65f and removed by me in rG978ba61536c2cdafa8454b7330c5d8e58d0d5048 (this might have been unintentional; at least I don't remember removing it intentionally).

As @pjaaskel noted in the bug report, the speculative semantics violate semantics such as those of OpenMP. IMHO, if really needed, speculation safety should have its own metadata.

inducer added a subscriber: inducer. Jun 8 2021, 2:59 PM

Thanks for tackling this! As a minor point, the correct link to the original pocl bug report is https://github.com/pocl/pocl/issues/757.

Meinersbur added inline comments. Jun 8 2021, 5:29 PM

llvm/test/Transforms/LoopVectorize/X86/force-ifcvt.ll
-41–-36	I suggest removing broken-by-design tests entirely.

pekka.jaaskelainen added a subscriber: pekka.jaaskelainen. Jun 8 2021, 10:09 PM

Kazhuu added a subscriber: Kazhuu. Jun 8 2021, 10:25 PM

Remove broken-by-design force-ifcvt.ll test.

fodinabor edited the summary of this revision. Jun 9 2021, 2:11 AM

Harbormaster completed remote builds in B108366: Diff 350826. Jun 9 2021, 2:44 AM

I am in favor of this. It would be great if another person had a look. @pjaaskel? @fhahn?

To make another case: I think this kind of speculation causes undefined behaviour/implementation-dependent semantics. For example:

#pragma omp simd // adds llvm.loop.parallel_accesses
for (int i = 0; i < N; i+=1) {
  double *C = i ? A : B;
  V = C[i];
}

which an optimization might transform to

#pragma omp simd
for (int i = 0; i < N; i+=1) {
  if (i)
    V = A[i];
  if (!i)
    V = *B;
}

This moved from an unconditional memory access (no if-conversion) to two speculable accesses. In cases such as N == 1, A was never accessed (and might not have been allocated), but it would be accessed after an if-conversion. IMHO it is not predictable what memory is going to be accessed.

This fixes a bug and people are in favor. If we need to add a new mechanism for this, we can still do so afterwards. LGTM

This revision is now accepted and ready to land. Jun 10 2021, 1:45 PM

FWIW, if someone adds documentation on what #pragma clang loop vectorize(assume_safety) actually means (somewhere), we can go back and make it imply if-conversion. However, we should not conflate it with the access groups/parallel metadata. Instead, annotate loads as "speculatable" (or something similar) and we go with that.

jdoerfert mentioned this in D99784: [LICM] Hoist loads with invariant.group metadata. Jun 10 2021, 2:07 PM

Closed by commit rG4f01122c3f6c: [LV] Parallel annotated loop does not imply all loads can be hoisted. (authored by fodinabor). Jun 10 2021, 2:35 PM

This revision was automatically updated to reflect the committed changes.

fodinabor added a commit: rG4f01122c3f6c: [LV] Parallel annotated loop does not imply all loads can be hoisted.
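The behavioural change to the load handling in blockCanBePredicated can be summarised as a pair of predicates. This is a simplified sketch derived from the diff, not the actual LLVM code: the function names `loadNeedsMaskBefore` and `loadNeedsMaskAfter` are hypothetical, and the real function records loads in MaskedOp rather than returning a flag.

```cpp
#include <cassert>

// Before the patch: a load from a pointer not known to be safe was recorded
// for masking only if the loop was not annotated parallel, or if guards had
// to be preserved (as requested by the tail-folding path).
constexpr bool loadNeedsMaskBefore(bool inSafePtrs, bool annotatedParallel,
                                   bool preserveGuards) {
  if (inSafePtrs)
    return false;
  return !annotatedParallel || preserveGuards;
}

// After the patch: every load from a pointer not known to be safe is
// recorded for masking, regardless of the parallel annotation.
constexpr bool loadNeedsMaskAfter(bool inSafePtrs) { return !inSafePtrs; }

// The only case where the two differ: an unsafe pointer in a
// parallel-annotated loop without preserved guards.
static_assert(!loadNeedsMaskBefore(false, true, false), "old: unmasked");
static_assert(loadNeedsMaskAfter(false), "new: masked");
```

In every other combination of inputs the two predicates agree, which is consistent with only two existing tests changing behaviour.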

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h	15 lines
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp	14 lines
llvm/test/Transforms/LoopVectorize/X86/force-ifcvt.ll	
llvm/test/Transforms/LoopVectorize/X86/tail_folding_and_assume_safety.ll	4 lines

Diff 351268

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

[... 429 lines elided ...]	private:
/// transformation.		/// transformation.
bool canVectorizeWithIfConvert();		bool canVectorizeWithIfConvert();

/// Return true if we can vectorize this outer loop. The method performs		/// Return true if we can vectorize this outer loop. The method performs
/// specific checks for outer loop vectorization.		/// specific checks for outer loop vectorization.
bool canVectorizeOuterLoop();		bool canVectorizeOuterLoop();

/// Return true if all of the instructions in the block can be speculatively		/// Return true if all of the instructions in the block can be speculatively
/// executed, and record the loads/stores that require masking. If's that		/// executed, and record the loads/stores that require masking.
/// guard loads can be ignored under "assume safety" unless \p PreserveGuards
/// is true. This can happen when we introduces guards for which the original
/// "unguarded-loads are safe" assumption does not hold. For example, the
/// vectorizer's fold-tail transformation changes the loop to execute beyond
/// its original trip-count, under a proper guard, which should be preserved.
/// \p SafePtrs is a list of addresses that are known to be legal and we know		/// \p SafePtrs is a list of addresses that are known to be legal and we know
/// that we can read from them without segfault.		/// that we can read from them without segfault.
/// \p MaskedOp is a list of instructions that have to be transformed into		/// \p MaskedOp is a list of instructions that have to be transformed into
/// calls to the appropriate masked intrinsic when the loop is vectorized.		/// calls to the appropriate masked intrinsic when the loop is vectorized.
/// \p ConditionalAssumes is a list of assume instructions in predicated		/// \p ConditionalAssumes is a list of assume instructions in predicated
/// blocks that must be dropped if the CFG gets flattened.		/// blocks that must be dropped if the CFG gets flattened.
bool blockCanBePredicated(BasicBlock *BB, SmallPtrSetImpl<Value *> &SafePtrs,		bool blockCanBePredicated(
		BasicBlock *BB, SmallPtrSetImpl<Value *> &SafePtrs,
SmallPtrSetImpl<const Instruction *> &MaskedOp,		SmallPtrSetImpl<const Instruction *> &MaskedOp,
SmallPtrSetImpl<Instruction *> &ConditionalAssumes,		SmallPtrSetImpl<Instruction *> &ConditionalAssumes) const;
bool PreserveGuards = false) const;

/// Updates the vectorization state by adding \p Phi to the inductions list.		/// Updates the vectorization state by adding \p Phi to the inductions list.
/// This can set \p Phi as the main induction of the loop if \p Phi is a		/// This can set \p Phi as the main induction of the loop if \p Phi is a
/// better choice for the main induction than the existing one.		/// better choice for the main induction than the existing one.
void addInductionPhi(PHINode *Phi, const InductionDescriptor &ID,		void addInductionPhi(PHINode *Phi, const InductionDescriptor &ID,
SmallPtrSetImpl<Value *> &AllowedExit);		SmallPtrSetImpl<Value *> &AllowedExit);

/// If an access has a symbolic strides, this maps the pointer value to		/// If an access has a symbolic strides, this maps the pointer value to
[... 105 lines elided ...]

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

[... 949 lines elided ...]

bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) const {		bool LoopVectorizationLegality::blockNeedsPredication(BasicBlock *BB) const {
return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);		return LoopAccessInfo::blockNeedsPredication(BB, TheLoop, DT);
}		}

bool LoopVectorizationLegality::blockCanBePredicated(		bool LoopVectorizationLegality::blockCanBePredicated(
BasicBlock *BB, SmallPtrSetImpl<Value *> &SafePtrs,		BasicBlock *BB, SmallPtrSetImpl<Value *> &SafePtrs,
SmallPtrSetImpl<const Instruction *> &MaskedOp,		SmallPtrSetImpl<const Instruction *> &MaskedOp,
SmallPtrSetImpl<Instruction *> &ConditionalAssumes,		SmallPtrSetImpl<Instruction *> &ConditionalAssumes) const {
bool PreserveGuards) const {
const bool IsAnnotatedParallel = TheLoop->isAnnotatedParallel();

for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
// Check that we don't have a constant expression that can trap as operand.		// Check that we don't have a constant expression that can trap as operand.
for (Value *Operand : I.operands()) {		for (Value *Operand : I.operands()) {
if (auto *C = dyn_cast<Constant>(Operand))		if (auto *C = dyn_cast<Constant>(Operand))
if (C->canTrap())		if (C->canTrap())
return false;		return false;
}		}

[... 11 lines elided ...]	if (isa<NoAliasScopeDeclInst>(&I))
continue;		continue;

// We might be able to hoist the load.		// We might be able to hoist the load.
if (I.mayReadFromMemory()) {		if (I.mayReadFromMemory()) {
auto *LI = dyn_cast<LoadInst>(&I);		auto *LI = dyn_cast<LoadInst>(&I);
if (!LI)		if (!LI)
return false;		return false;
if (!SafePtrs.count(LI->getPointerOperand())) {		if (!SafePtrs.count(LI->getPointerOperand())) {
// !llvm.mem.parallel_loop_access implies if-conversion safety.
// Otherwise, record that the load needs (real or emulated) masking
// and let the cost model decide.
if (!IsAnnotatedParallel || PreserveGuards)
MaskedOp.insert(LI);		MaskedOp.insert(LI);
continue;		continue;
}		}
}		}

if (I.mayWriteToMemory()) {		if (I.mayWriteToMemory()) {
auto *SI = dyn_cast<StoreInst>(&I);		auto *SI = dyn_cast<StoreInst>(&I);
if (!SI)		if (!SI)
return false;		return false;
[... 299 lines elided ...]	bool LoopVectorizationLegality::prepareToFoldTailByMasking() {

SmallPtrSet<const Instruction *, 8> TmpMaskedOp;		SmallPtrSet<const Instruction *, 8> TmpMaskedOp;
SmallPtrSet<Instruction *, 8> TmpConditionalAssumes;		SmallPtrSet<Instruction *, 8> TmpConditionalAssumes;

// Check and mark all blocks for predication, including those that ordinarily		// Check and mark all blocks for predication, including those that ordinarily
// do not need predication such as the header block.		// do not need predication such as the header block.
for (BasicBlock *BB : TheLoop->blocks()) {		for (BasicBlock *BB : TheLoop->blocks()) {
if (!blockCanBePredicated(BB, SafePointers, TmpMaskedOp,		if (!blockCanBePredicated(BB, SafePointers, TmpMaskedOp,
TmpConditionalAssumes,		TmpConditionalAssumes)) {
/* MaskAllLoads= */ true)) {
LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking as requested.\n");		LLVM_DEBUG(dbgs() << "LV: Cannot fold tail by masking as requested.\n");
return false;		return false;
}		}
}		}

LLVM_DEBUG(dbgs() << "LV: can fold tail by masking.\n");		LLVM_DEBUG(dbgs() << "LV: can fold tail by masking.\n");

MaskedOp.insert(TmpMaskedOp.begin(), TmpMaskedOp.end());		MaskedOp.insert(TmpMaskedOp.begin(), TmpMaskedOp.end());
ConditionalAssumes.insert(TmpConditionalAssumes.begin(),		ConditionalAssumes.insert(TmpConditionalAssumes.begin(),
TmpConditionalAssumes.end());		TmpConditionalAssumes.end());

return true;		return true;
}		}

} // namespace llvm		} // namespace llvm

llvm/test/Transforms/LoopVectorize/X86/force-ifcvt.ll

This file was deleted.

	; RUN: opt -loop-vectorize -S < %s | FileCheck %s
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"

	; Function Attrs: norecurse nounwind uwtable
	define void @Test(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p) #0 {
	entry:
	br label %for.body

	; CHECK-LABEL: @Test
	; CHECK: <4 x i32>

	for.body: ; preds = %cond.end, %entry
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
	%arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
	%0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !1
	%cmp1 = icmp eq i32 %0, 0
	%arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
	%1 = load i32, i32* %arrayidx3, align 4, !llvm.access.group !1
	br i1 %cmp1, label %cond.end, label %cond.false

	cond.false: ; preds = %for.body
	%arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
	%2 = load i32, i32* %arrayidx7, align 4, !llvm.access.group !1
	%add = add nsw i32 %2, %1
	br label %cond.end

	cond.end: ; preds = %for.body, %cond.false
	%cond = phi i32 [ %add, %cond.false ], [ %1, %for.body ]
	store i32 %cond, i32* %arrayidx3, align 4, !llvm.access.group !1
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp eq i64 %indvars.iv.next, 16
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0

	for.end: ; preds = %cond.end
	ret void
	}

	attributes #0 = { norecurse nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }

	!0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
	!1 = distinct !{}

llvm/test/Transforms/LoopVectorize/X86/tail_folding_and_assume_safety.ll

[... 45 lines elided ...]	if.then:
br label %for.inc		br label %for.inc

for.inc:		for.inc:
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1021		%exitcond = icmp eq i64 %indvars.iv.next, 1021
br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !8		br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !8
}		}

; Case2: With pragma assume_safety only the store is masked.		; Case2: With pragma assume_safety both, load and store are masked.
; void assume_safety(int * p, int * q1, int * q2, int guard) {		; void assume_safety(int * p, int * q1, int * q2, int guard) {
; #pragma clang loop vectorize(assume_safety)		; #pragma clang loop vectorize(assume_safety)
; for(int ix=0; ix < 1021; ++ix) {		; for(int ix=0; ix < 1021; ++ix) {
; if (ix > guard) {		; if (ix > guard) {
; p[ix] = q1[ix] + q2[ix];		; p[ix] = q1[ix] + q2[ix];
; }		; }
; }		; }
;}		;}

;CHECK-LABEL: @assume_safety		;CHECK-LABEL: @assume_safety
;CHECK: vector.body:		;CHECK: vector.body:
;CHECK-NOT: @llvm.masked.load		;CHECK: call <8 x i32> @llvm.masked.load
;CHECK: call void @llvm.masked.store		;CHECK: call void @llvm.masked.store

; Function Attrs: norecurse nounwind uwtable		; Function Attrs: norecurse nounwind uwtable
define void @assume_safety(i32* nocapture, i32* nocapture readonly, i32* nocapture readonly, i32) local_unnamed_addr #0 {		define void @assume_safety(i32* nocapture, i32* nocapture readonly, i32* nocapture readonly, i32) local_unnamed_addr #0 {
%5 = sext i32 %3 to i64		%5 = sext i32 %3 to i64
br label %7		br label %7

; <label>:6:		; <label>:6:
[... 92 lines elided ...]
