This is an archive of the discontinued LLVM Phabricator instance.


Differential D97465

[LoopVectorize] Refine hasIrregularType predicate
Closed · Public

Authored by LemonBoy on Feb 25 2021, 5:56 AM.


Details

Reviewers

mkuper
fhahn
craig.topper
david-arm
lebedev.ri

Commits

rG4f024938e4c9: [LoopVectorize] Refine hasIrregularType predicate

Summary

The hasIrregularType predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector.
The previous check returned incorrect results in some cases where there is padding between the array elements. For example, a 4-element array of i7 values was considered compatible with <4 x i7>, even though the vector only loads/stores 28 bits instead of 32: each array element occupies a full byte, whereas the vector packs the four 7-bit lanes back to back.
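To make the size arithmetic concrete, the following self-contained C++ sketch models the old and new predicates without LLVM. `ScalarLayout`, `storeSizeInBytes`, `oldHasIrregularType` and `newHasIrregularType` are hypothetical names; the struct fields stand in for the results of `DataLayout::getTypeSizeInBits` and `DataLayout::getTypeAllocSize`, and only fixed-width (non-scalable) vectorization factors are modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for the DataLayout queries used by the predicate:
// the scalar's size in bits and its allocation size in bytes (store size
// rounded up to the ABI alignment). Real code asks llvm::DataLayout.
struct ScalarLayout {
  unsigned SizeInBits;
  unsigned AllocSizeInBytes;
};

// Store size of a value with the given number of bits, rounded up to bytes.
static unsigned storeSizeInBytes(unsigned Bits) { return (Bits + 7) / 8; }

// Models the pre-patch check: for VF > 1, compare the bytes spanned by VF
// array elements with the store size of a packed <VF x Ty> vector.
static bool oldHasIrregularType(const ScalarLayout &Ty, unsigned VF) {
  assert(VF >= 1 && "VF must be at least 1");
  if (VF > 1)
    return VF * Ty.AllocSizeInBytes != storeSizeInBytes(VF * Ty.SizeInBits);
  // VF == 1: only check whether array elements carry padding bits.
  return Ty.AllocSizeInBytes * 8 != Ty.SizeInBits;
}

// Models the patched check: irregular whenever an array element carries
// padding bits, independent of the vectorization factor.
static bool newHasIrregularType(const ScalarLayout &Ty) {
  return Ty.AllocSizeInBytes * 8 != Ty.SizeInBits;
}
```

Under this model an i7 (7 bits, 1-byte allocation) passes the old check at VF = 4 (4 bytes against a 4-byte store size for 28 bits) but fails it at VF = 8 (8 bytes against 7), so the old answer depended on VF. The new check reports i7 as irregular for every VF, while i8 and i32 stay regular.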

The problem causes LLVM to generate incorrect code for some targets: on AArch64, for example, the vector loads/stores are lowered in terms of ubfx/bfi (bitfield extract/insert), effectively losing the top N * P bits, where P is the number of padding bits per element.
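The following C++ sketch illustrates why the packed interpretation reads the wrong values. It is a simplified model, not the actual AArch64 instruction sequence: it assumes little-endian byte order with lane 0 in the lowest bits, and `bytesToWordLE`, `packedLane` and `arrayElement` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Assemble four array bytes into a 32-bit word (little-endian assumed).
static uint32_t bytesToWordLE(const uint8_t Bytes[4]) {
  return uint32_t(Bytes[0]) | uint32_t(Bytes[1]) << 8 |
         uint32_t(Bytes[2]) << 16 | uint32_t(Bytes[3]) << 24;
}

// Packed <4 x i7> view: lane I occupies bits [7*I, 7*I + 7) of the word.
static uint8_t packedLane(uint32_t Word, unsigned Lane) {
  assert(Lane < 4 && "<4 x i7> has four lanes");
  return static_cast<uint8_t>((Word >> (7 * Lane)) & 0x7F);
}

// Array view: element I is the low 7 bits of byte I (one byte per element).
static uint8_t arrayElement(const uint8_t Bytes[4], unsigned Lane) {
  assert(Lane < 4 && "the array has four elements");
  return static_cast<uint8_t>(Bytes[Lane] & 0x7F);
}
```

For the array {1, 2, 3, 4}, `arrayElement` returns the stored values, while `packedLane` applied to the same 32 bits yields 1, 4, 12 and 32. Bits 28 to 31 are not covered by any packed lane, which matches the N * P = 4 * 1 bits described above.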

Diff Detail

Unit Tests: Failed

	Time	Test
	30 ms	x64 windows > LLVM.ExecutionEngine/JITLink/AArch64::MachO_arm64_ehframe.test
	40 ms	x64 windows > LLVM.ExecutionEngine/JITLink/AArch64::MachO_arm64_relocations.s
	50 ms	x64 windows > LLVM.ExecutionEngine/JITLink/X86::ELF_skip_debug_sections.s
	70 ms	x64 windows > LLVM.ExecutionEngine/JITLink/X86::ELF_weak_definitions.s
	40 ms	x64 windows > LLVM.ExecutionEngine/JITLink/X86::ELF_x86-64_common.s
19 tests failed in total; 5 of them are listed above.

Event Timeline

LemonBoy created this revision. Feb 25 2021, 5:56 AM

Herald added subscribers: hiraditya, kristof.beyls. Feb 25 2021, 5:56 AM

LemonBoy requested review of this revision. Feb 25 2021, 5:56 AM

Herald added a project: Restricted Project. Feb 25 2021, 5:56 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B90805: Diff 326369. Feb 25 2021, 6:38 AM

craig.topper added inline comments. Feb 25 2021, 11:50 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
374–375	Should we remove the unused VF argument?

Remove unused parameter.

LemonBoy marked an inline comment as done. Feb 25 2021, 1:50 PM

LemonBoy added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
374–375	Good catch, fixed.

Harbormaster completed remote builds in B90890: Diff 326488. Feb 25 2021, 4:21 PM

fhahn added inline comments. Mar 2 2021, 3:56 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
374–375	comment needs updating, there's no given vectorization factor any longer, right?
374–375	The comment also needs to be updated to not refer to VF I think, as it is gone now. Something like `Determine if an array of type Ty is "bit cast compatible" with a vector with the same number of elements`.
llvm/test/Transforms/LoopVectorize/irregular_type.ll
6	can you add a comment explaining what the test checks?

LemonBoy added a subscriber: david-arm. Mar 2 2021, 4:14 AM

LemonBoy added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
381	CC @david-arm This slightly different formulation of the check was introduced in c5ba0d33cc060cc06a28a5d9101060afd1c0ee9a.

Update some documentation comments.

Harbormaster completed remote builds in B91608: Diff 327503. Mar 2 2021, 12:09 PM

LemonBoy added a reviewer: david-arm. Mar 6 2021, 12:11 AM

Ping?

Seems good to me.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
380–381	I wonder why we can't still vectorize such cases by instead loading a `<N x DL.getTypeAllocSizeInBits(Ty)>` vector and then truncating it? (Beware of endianness.)
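The widen+truncate strategy can be sketched in plain C++ as follows. This is a hypothetical model, not LLVM code: `loadWideAndTruncate` stands in for loading each lane with its full allocation size (the `<N x DL.getTypeAllocSizeInBits(Ty)>` load) and then truncating it to the scalar's bit width. The `LittleEndian` parameter shows where endianness comes in once an element's allocation is wider than one byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of "widen, then truncate": read lane Lane using the full
// allocation size of the element (AllocBytes bytes), assemble those bytes in
// the target's byte order, then keep only the low SizeInBits bits.
uint64_t loadWideAndTruncate(const std::vector<uint8_t> &Mem, unsigned Lane,
                             unsigned AllocBytes, unsigned SizeInBits,
                             bool LittleEndian) {
  assert(AllocBytes >= 1 && AllocBytes <= 8 && "lane must fit in 64 bits");
  assert(SizeInBits >= 1 && SizeInBits <= 8 * AllocBytes && "bad bit width");
  assert((Lane + 1) * AllocBytes <= Mem.size() && "lane out of bounds");

  // Widen: build the AllocBytes-sized integer for this lane.
  uint64_t Wide = 0;
  for (unsigned B = 0; B < AllocBytes; ++B) {
    uint64_t Byte = Mem[Lane * AllocBytes + B];
    unsigned Shift = LittleEndian ? 8 * B : 8 * (AllocBytes - 1 - B);
    Wide |= Byte << Shift;
  }

  // Truncate: discard the padding bits above SizeInBits.
  uint64_t Mask =
      SizeInBits == 64 ? ~uint64_t(0) : (uint64_t(1) << SizeInBits) - 1;
  return Wide & Mask;
}
```

For i7 elements in 1-byte slots the byte order does not matter. For an i12 value 0xABC stored in a 2-byte slot on a little-endian target (bytes {0xBC, 0x0A}), assembling the bytes in big-endian order instead yields 0xC0A after truncation, which is the kind of mistake the endianness caveat warns about.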

This revision is now accepted and ready to land. Mar 16 2021, 1:08 PM

LemonBoy added inline comments. Mar 17 2021, 4:31 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
380–381	This was meant to be a hotfix targeting LLVM12. I've experimented with the widen+truncate strategy and the results are promising (at least on x86), I'll submit a patch once I clean up the code.

lebedev.ri added inline comments. Mar 17 2021, 4:32 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
380–381	> This was meant to be a hotfix targeting LLVM12.
Sure, I wasn't even suggesting doing that here.
> I've experimented with the widen+truncate strategy and the results are promising (at least on x86), I'll submit a patch once I clean up the code.
Nice!

This revision was landed with ongoing or failed builds. Mar 17 2021, 9:05 AM

Closed by commit rG4f024938e4c9: [LoopVectorize] Refine hasIrregularType predicate (authored by LemonBoy).

This revision was automatically updated to reflect the committed changes.

LemonBoy added a commit: rG4f024938e4c9: [LoopVectorize] Refine hasIrregularType predicate.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	22 lines
llvm/test/Transforms/LoopVectorize/irregular_type.ll	27 lines

Diff 327503

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


(365 lines not shown)
 static Type *getMemInstValueType(Value *I) {
   assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
          "Expected Load or Store instruction");
   if (auto *LI = dyn_cast<LoadInst>(I))
     return LI->getType();
   return cast<StoreInst>(I)->getValueOperand()->getType();
 }
 
 /// A helper function that returns true if the given type is irregular. The
 /// type is irregular if its allocated size doesn't equal the store size of an
-/// element of the corresponding vector type at the given vectorization factor.
-static bool hasIrregularType(Type *Ty, const DataLayout &DL, ElementCount VF) {
-  // Determine if an array of VF elements of type Ty is "bitcast compatible"
-  // with a <VF x Ty> vector.
-  if (VF.isVector()) {
-    auto *VectorTy = VectorType::get(Ty, VF);
-    return TypeSize::get(VF.getKnownMinValue() *
-                             DL.getTypeAllocSize(Ty).getFixedValue(),
-                         VF.isScalable()) != DL.getTypeStoreSize(VectorTy);
-  }
-
-  // If the vectorization factor is one, we just check if an array of type Ty
-  // requires padding between elements.
+/// element of the corresponding vector type.
+static bool hasIrregularType(Type *Ty, const DataLayout &DL) {
+  // Determine if an array of N elements of type Ty is "bitcast compatible"
+  // with a <N x Ty> vector.
+  // This is only true if there is no padding between the array elements.
   return DL.getTypeAllocSizeInBits(Ty) != DL.getTypeSizeInBits(Ty);
 }
 
 /// A helper function that returns the reciprocal of the block probability of
 /// predicated blocks. If we return X, we are assuming the predicated block
 /// will execute once for every X iterations of the loop header.
 ///
 /// TODO: We should use actual block probability here, if available. Currently,
 /// we always assume predicated blocks have a 50% chance of executing.
 static unsigned getReciprocalPredBlockProb() { return 2; }
(4,684 lines not shown)
   assert(getWideningDecision(I, VF) == CM_Unknown &&
          "Decision should not be set yet.");
   auto *Group = getInterleavedAccessGroup(I);
   assert(Group && "Must have a group.");
 
   // If the instruction's allocated size doesn't equal it's type size, it
   // requires padding and will be scalarized.
   auto &DL = I->getModule()->getDataLayout();
   auto *ScalarTy = getMemInstValueType(I);
-  if (hasIrregularType(ScalarTy, DL, VF))
+  if (hasIrregularType(ScalarTy, DL))
     return false;
 
   // Check if masking is required.
   // A Group may need masking for one of two reasons: it resides in a block that
   // needs predication, or it was decided to use masking to deal with gaps.
   bool PredicatedAccessRequiresMasking =
       Legal->blockNeedsPredication(I->getParent()) && Legal->isMaskRequired(I);
   bool AccessWithGapsRequiresMasking =
(30 lines not shown; enclosing function: LoopVectorizationCostModel::memoryInstructionCanBeWidened)
   // scalarized.
   if (isScalarWithPredication(I))
     return false;
 
   // If the instruction's allocated size doesn't equal it's type size, it
   // requires padding and will be scalarized.
   auto &DL = I->getModule()->getDataLayout();
   auto *ScalarTy = LI ? LI->getType() : SI->getValueOperand()->getType();
-  if (hasIrregularType(ScalarTy, DL, VF))
+  if (hasIrregularType(ScalarTy, DL))
     return false;
 
   return true;
 }
 
 void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
   // We should not collect Uniforms more than once per VF. Right now,
   // this function is called from collectUniformsAndScalars(), which
(4,608 lines not shown)

llvm/test/Transforms/LoopVectorize/irregular_type.ll

This file was added.

; RUN: opt %s -loop-vectorize -force-vector-width=4 -S | FileCheck %s

; Ensure the array loads/stores are not optimized into vector operations when
; the element type has padding bits.

; CHECK: foo
; CHECK: vector.body
; CHECK-NOT: load <4 x i7>
; CHECK-NOT: store <4 x i7>
; CHECK: for.body
define void @foo(i7* %a, i64 %n) {
entry:
  br label %for.body

for.body:
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
  %arrayidx = getelementptr inbounds i7, i7* %a, i64 %indvars.iv
  %0 = load i7, i7* %arrayidx, align 1
  %sub = add nuw nsw i7 %0, 0
  store i7 %sub, i7* %arrayidx, align 1
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %cmp = icmp eq i64 %indvars.iv.next, %n
  br i1 %cmp, label %for.exit, label %for.body

for.exit:
  ret void
}