This is an archive of the discontinued LLVM Phabricator instance.

[LV] Consider non-consecutive vectorizable accesses in max VF selection
Closed, Public

Authored by mssimpso on Feb 23 2017, 10:33 AM.

Details

Reviewers

delena
mkuper

Commits

rG455c2ee39463: [LV] Considier non-consecutive but vectorizable accesses for VF selection
rL296747: [LV] Considier non-consecutive but vectorizable accesses for VF selection

Summary

When computing the smallest and largest types for selecting the maximum vectorization factor, we currently ignore loads and stores of pointer types if the memory access is non-consecutive. We do this because such accesses must be scalarized regardless of the vectorization factor, and thus shouldn't be considered when determining the factor. This patch makes this check less aggressive by also considering non-consecutive accesses that may be vectorized, such as interleaved accesses. Because we don't know at the time of the check if an access will certainly be vectorized (this is a cost model decision given a particular VF), we consider all accesses that can potentially be vectorized.
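The relaxed check described in the summary can be sketched in isolation. The following is a minimal model, not the code from LoopVectorize.cpp: the `MemAccess` struct, its flags, and both function names are hypothetical, and it covers only memory accesses, whereas the real pass also looks at other instructions such as reductions. A pointer access is skipped only when it is neither consecutive, nor part of an interleaved group, nor a legal gather/scatter.

```cpp
#include <algorithm>
#include <climits>
#include <utility>
#include <vector>

// Hypothetical stand-in for one load or store in the loop body. The real pass
// inspects LLVM IR and asks the legality analysis; here the answers are flags.
struct MemAccess {
  unsigned ScalarBits;        // size in bits of the accessed scalar type
  bool IsPointer;             // the loaded or stored value is a pointer
  bool IsConsecutive;         // unit-stride access
  bool IsInterleaved;         // member of an interleaved access group
  bool IsLegalGatherScatter;  // target could emit a gather/scatter for it
};

// A pointer access is skipped only when none of the three ways of
// vectorizing it applies, i.e. it would be scalarized at every VF.
inline bool isIgnoredForWidths(const MemAccess &A) {
  return A.IsPointer && !A.IsConsecutive && !A.IsInterleaved &&
         !A.IsLegalGatherScatter;
}

// Returns {smallest, widest} scalar width in bits over the accesses that are
// not skipped. Returns {UINT_MAX, 0} if every access is skipped; these
// initial values are a choice made for this sketch.
inline std::pair<unsigned, unsigned>
smallestAndWidestBits(const std::vector<MemAccess> &Accesses) {
  unsigned MinWidth = UINT_MAX;
  unsigned MaxWidth = 0;
  for (const MemAccess &A : Accesses) {
    if (isIgnoredForWidths(A))
      continue;
    MinWidth = std::min(MinWidth, A.ScalarBits);
    MaxWidth = std::max(MaxWidth, A.ScalarBits);
  }
  return {MinWidth, MaxWidth};
}
```

For a loop whose only memory accesses are interleaved stores of 64-bit pointers, this model reports 64 / 64 bits. The previous condition (pointer type and not consecutive) would have skipped those stores entirely.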

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso created this revision. Feb 23 2017, 10:33 AM

Herald added a subscriber: mzolotukhin. Feb 23 2017, 10:33 AM

mkuper added inline comments. Feb 28 2017, 2:46 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
6341 (On Diff #89536): Why would this only apply to pointer types, though? What's special about them? (It looks like it was a heuristic of some sort, but I'm not sure it makes sense anymore.)

mssimpso added inline comments. Feb 28 2017, 3:06 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
6341 (On Diff #89536): I'm not sure it makes sense anymore either - I'm happy to remove it. It was added when we could only choose the VF based on the size of the largest type. Maybe the pointer size was just used in place of "a large scalar type size that will cause the max VF to be too small"? I'm actually hoping we can enable -vectorizer-maximize-bandwidth at some point, though. For some context, once I commit D29466 and D29675, ARM/AArch64 should be prepared for the change. I discovered the bug here while testing those two patches (with -vectorizer-maximize-bandwidth=false). I was expecting these patches to be NFC, but for the loop in the test case, we were choosing a very large VF by mistake.

LGTM

lib/Transforms/Vectorize/LoopVectorize.cpp
6341 (On Diff #89536): Hm, right, if we have a loop that mostly works on i8, but gathers the pointers, we'll have a bad time with the MaxVF. Let's just keep ignoring it for now... :-) And I'd really like to enable maximize-bandwidth as well. I need to run our tests again and see whether we have any regressions on x86.

mssimpso added inline comments. Feb 28 2017, 3:25 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
6341 (On Diff #89536): Sounds good. Thanks, Michael!
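The concern in this thread about a loop that mostly works on i8 but gathers pointers can be put in numbers. The sketch below assumes the VF cap is the vector register width divided by the widest scalar type, which matches the thread's description of choosing the VF "based on the size of the largest type". The function name and the 128-bit register width are assumptions made for illustration.

```cpp
// Assumed rule: the VF cap is the vector register width divided by the
// widest scalar type considered. The helper name is hypothetical.
constexpr unsigned maxVFFromWidest(unsigned RegisterBits, unsigned WidestBits) {
  return RegisterBits / WidestBits;
}

// i8 loop with the gathered pointers ignored: the widest type is 8 bits.
static_assert(maxVFFromWidest(128, 8) == 16, "16 lanes of i8 fit in 128 bits");

// Same loop with the 64-bit pointers counted: the widest type is 64 bits.
static_assert(maxVFFromWidest(128, 64) == 2, "only 2 lanes of 64 bits fit");
```

Under these assumptions, counting a scalarized pointer access would cut the cap from 16 to 2, which is why accesses that cannot be vectorized stay excluded.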

Closed by commit rL296747: [LV] Considier non-consecutive but vectorizable accesses for VF selection (authored by mssimpso). Mar 2 2017, 6:07 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp	13 lines
llvm/trunk/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll	33 lines

Diff 90325

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

   for (Instruction &I : *BB) {
   [...]
         T = RdxDesc.getRecurrenceType();
       }

       // Examine the stored values.
       if (auto *ST = dyn_cast<StoreInst>(&I))
         T = ST->getValueOperand()->getType();

       // Ignore loaded pointer types and stored pointer types that are not
-      // consecutive. However, we do want to take consecutive stores/loads of
-      // pointer vectors into account.
-      if (T->isPointerTy() && !isConsecutiveLoadOrStore(&I))
+      // vectorizable.
+      //
+      // FIXME: The check here attempts to predict whether a load or store will
+      //        be vectorized. We only know this for certain after a VF has
+      //        been selected. Here, we assume that if an access can be
+      //        vectorized, it will be. We should also look at extending this
+      //        optimization to non-pointer types.
+      //
+      if (T->isPointerTy() && !isConsecutiveLoadOrStore(&I) &&
+          !Legal->isAccessInterleaved(&I) && !Legal->isLegalGatherOrScatter(&I))
         continue;

       MinWidth = std::min(MinWidth,
                           (unsigned)DL.getTypeSizeInBits(T->getScalarType()));
       MaxWidth = std::max(MaxWidth,
                           (unsigned)DL.getTypeSizeInBits(T->getScalarType()));
     }
   }

llvm/trunk/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll

; REQUIRES: asserts
; RUN: opt < %s -loop-vectorize -debug-only=loop-vectorize -disable-output 2>&1 | FileCheck %s

target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-gnu"

; CHECK-LABEL: Checking a loop in "interleaved_access"
; CHECK: The Smallest and Widest types: 64 / 64 bits
;
define void @interleaved_access(i8** %A, i64 %N) {
for.ph:
  br label %for.body

for.body:
  %i = phi i64 [ %i.next.3, %for.body ], [ 0, %for.ph ]
  %tmp0 = getelementptr inbounds i8*, i8** %A, i64 %i
  store i8* null, i8** %tmp0, align 8
  %i.next.0 = add nuw nsw i64 %i, 1
  %tmp1 = getelementptr inbounds i8*, i8** %A, i64 %i.next.0
  store i8* null, i8** %tmp1, align 8
  %i.next.1 = add nsw i64 %i, 2
  %tmp2 = getelementptr inbounds i8*, i8** %A, i64 %i.next.1
  store i8* null, i8** %tmp2, align 8
  %i.next.2 = add nsw i64 %i, 3
  %tmp3 = getelementptr inbounds i8*, i8** %A, i64 %i.next.2
  store i8* null, i8** %tmp3, align 8
  %i.next.3 = add nsw i64 %i, 4
  %cond = icmp slt i64 %i.next.3, %N
  br i1 %cond, label %for.body, label %for.end

for.end:
  ret void
}