This is an archive of the discontinued LLVM Phabricator instance.

Differential D146199

[LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail
ClosedPublic

Authored by david-arm on Mar 16 2023, 1:50 AM.

Details

Reviewers

sdesmalen
reames
dtemirbulatov
dmgreen
MattDevereau

Commits

rG1c4fedfa35ae: [LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail

Summary

Currently in LoopVectorize we avoid tail-folding if we can
prove the trip count is always a multiple of the maximum
fixed-width VF. This works because we know the vectoriser
only ever chooses a VF that is a power of 2. However, if
we are also considering scalable VFs then we conservatively
bail out of the optimisation because we don't know the value
of vscale, which could be an odd or prime number, etc.
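As an illustration of the argument above (a minimal sketch, not code from the patch): if the trip count is a multiple of the largest power-of-2 VF, it is also a multiple of every smaller power-of-2 VF, whereas an odd vscale can break that property. The helper names `isPowerOf2`, `hasNoTailForAnyPow2VF` and `runtimeLanes` are hypothetical and exist only for this example.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): true if X is a non-zero power of two.
constexpr bool isPowerOf2(std::uint64_t X) {
  return X != 0 && (X & (X - 1)) == 0;
}

// If TripCount is a multiple of a power-of-2 MaxVF, it is a multiple of every
// smaller power-of-2 VF too, so no scalar tail remains whichever VF is chosen.
constexpr bool hasNoTailForAnyPow2VF(std::uint64_t TripCount,
                                     std::uint64_t MaxVF) {
  return isPowerOf2(MaxVF) && TripCount % MaxVF == 0;
}

// Number of lanes processed per iteration for a VF of <vscale x MinLanes>.
constexpr std::uint64_t runtimeLanes(std::uint64_t VScale,
                                     std::uint64_t MinLanes) {
  return VScale * MinLanes;
}
```

With a trip count of 1024, `hasNoTailForAnyPow2VF(1024, 16)` holds. A hypothetical vscale of 3 with a `<vscale x 4 x i32>` VF would process 12 lanes per iteration and leave a remainder of 4, which is why the value of vscale matters.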

This patch tries to enable the same optimisation for scalable
VFs by asking if vscale is known to be a power of 2. If so,
we can then query the maximum value of vscale and use the same
logic as we do for fixed-width VFs. I've also added a new TTI
hook called isVScaleKnownToBeAPowerOfTwo that does the same
thing as the existing TargetLowering hook.
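
A minimal sketch of the approach described above, assuming simplified stand-ins for the TTI implementations. `DefaultTTI`, `SveLikeTTI` and `maxScalableLanes` are hypothetical names, and the maximum vscale of 16 is an illustrative value. Only the hook names `isVScaleKnownToBeAPowerOfTwo` and `getMaxVScale` come from the patch.

```cpp
#include <optional>

// Stand-in for a conservative default implementation: nothing is known.
struct DefaultTTI {
  constexpr std::optional<unsigned> getMaxVScale() const { return std::nullopt; }
  constexpr bool isVScaleKnownToBeAPowerOfTwo() const { return false; }
};

// Stand-in for a target that opts in (the value 16 is an assumption).
struct SveLikeTTI {
  constexpr std::optional<unsigned> getMaxVScale() const { return 16u; }
  constexpr bool isVScaleKnownToBeAPowerOfTwo() const { return true; }
};

// Upper bound on the runtime lane count of <vscale x MinLanes>, or nullopt if
// the target cannot guarantee a bounded, power-of-2 vscale.
template <typename TTI>
constexpr std::optional<unsigned> maxScalableLanes(const TTI &T,
                                                   unsigned MinLanes) {
  std::optional<unsigned> MaxVScale = T.getMaxVScale();
  if (!MaxVScale || !T.isVScaleKnownToBeAPowerOfTwo())
    return std::nullopt;
  return *MaxVScale * MinLanes;
}
```

The default answer is `false`, so a target has to opt in before the vectoriser may treat a scalable VF like a fixed-width, power-of-2 one.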

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Mar 16 2023, 1:50 AM

Herald added a project: Restricted Project. Mar 16 2023, 1:50 AM

Herald added subscribers: luke, shiva0217, frasercrmck and 22 others.

david-arm requested review of this revision. Mar 16 2023, 1:50 AM

Herald added subscribers: llvm-commits, pcwang-thead, alextsao1999, MaskRay.

The changes to make the tests more robust should probably be submitted separately so the diff only shows the functional test changes. Also, it would be good to pre-commit new tests like vector_add_trip1024.

I had written a very similar patch recently, but it would only use the fixed length if the scalable was unknown. The performance of it was pretty bad though, so I ended up dropping it. I had noticed that there is an xfail in llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll at the moment. Can it now be replaced with a check for store <vscale x 4 x i32>?

TargetTransformInfo::isVScaleKnownToBeAPowerOfTwo isn't going to be useable from all the places that need it like instcombine. It might be best to add it to somewhere like vscale_range in the long run?

Harbormaster completed remote builds in B219805: Diff 505723. Mar 16 2023, 3:15 AM

Rebased on top of NFC test patch.

david-arm added a parent revision: D146219: [NFC][LoopVectorize] Change trip counts for some tests to guarantee a scalar tail. Mar 16 2023, 4:59 AM

Harbormaster completed remote builds in B219841: Diff 505773. Mar 16 2023, 5:58 AM

LGTM

I do wonder if we should consider redefining vscale in LangRef to be a power of two. I don't think we have a motivating example for a non-power-of-two vscale. I believe this was originally to support SVE, whose original version didn't mandate power-of-two vector sizes, but I believe a later version of that specification mandated them, didn't it?

llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll
2	Please use update-test-checks.py

This revision is now accepted and ready to land. Mar 16 2023, 8:30 AM

sdesmalen added inline comments. Mar 16 2023, 10:16 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5161	Is `MaxRuntimeVF` a better name? Also, is it worth making this a `std::optional<unsigned>`? I struggled to wrap my head around what `MaxPossibleVF = 1` meant, because I assumed this meant "don't vectorize, use scalar instructions", given other uses of `VF=1` in the LV.
5164	I initially wanted to ask "should this be `else if (MaxFactors.ScalableVF) {`?" but then I realised we need to be conservative here because the decision (not) to do tail folding is made for _both_ fixed and scalable VFs. So should this code be considering the `max(MaxFactors.FixedVF, MaxVScale * MaxFactors.ScalableVF)` instead?
5171–5172	nit: move assignment into the condition? if (std::optional<unsigned> MaxVScale = getMaxVScale(*TheFunction, TTI)) { .. }
5179	nit: This default is already set on line 5155, so there's no need to set it again here?

Hi @paulwalker-arm,

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5179	I don't think we can because this could have been set to something greater than 1 for the fixed-width case, since for scalable vectorisation we usually have both a max fixed-width VF and a max scalable VF. As you say above, we're kind of taking the max of the fixed-width and scalable VFs, but only if safe to do so for scalable VFs. I agree the logic is kind of awkward, but it's not immediately obvious to me how to make it better?

Matt added a subscriber: Matt. Mar 17 2023, 11:11 AM

Addressed review comments about using std::optional

david-arm marked 5 inline comments as done. Mar 21 2023, 9:49 AM

In D146199#4198834, @dmgreen wrote:

I had written a very similar patch recently, but it would only use the fixed length if the scalable was unknown. The performance of it was pretty bad though, so I ended up dropping it. I had noticed that there is an xfail in llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll at the moment. Can it now be replaced with a check for store <vscale x 4 x i32>?

TargetTransformInfo::isVScaleKnownToBeAPowerOfTwo isn't going to be useable from all the places that need it like instcombine. It might be best to add it to somewhere like vscale_range in the long run?

Hi @dmgreen, so I don't have a strong objection to doing this as a new vscale_power_of_2 attribute, but I am trying to avoid changing the LangRef again if we don't have a compelling case to do so yet. This is what we did originally with the vscale max, i.e. we first added a TTI interface, then as time went on we saw more and more convincing arguments for moving this to be a vscale_range attribute instead. There is nothing to stop us doing something similar in future I think, right? Of course, there is already a TLI hook of the same name that would need removing too.

Harbormaster completed remote builds in B220754: Diff 507024. Mar 21 2023, 11:11 AM

Hi @dmgreen, so I don't have a strong objection to doing this as a new vscale_power_of_2 attribute, but I am trying to avoid changing the LangRef again if we don't have a compelling case to do so yet. This is what we did originally with the vscale max, i.e. we first added a TTI interface, then as time went on we saw more and more convincing arguments for moving this to be a vscale_range attribute instead. There is nothing to stop us doing something similar in future I think, right? Of course, there is already a TLI hook of the same name that would need removing too.

Yeah sounds good. I was thinking as a followup, something to keep in mind. It could even go like @reames suggested where we only support power-of-2 vscales - I heard that GCC always took that route for SVE and has only ever supported power-of-2 vscales.

sdesmalen added inline comments. Mar 23 2023, 9:27 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5161	Sorry for changing my mind on this, but I wonder if `MaxPowerOf2RuntimeVF` is more appropriate name, since the value is `std::nullopt` if the VF is not a power of 2.
5161–5178	You can write this a bit more compactly using std::max:

    std::optional<unsigned> MaxPowerOf2RuntimeVF =
        MaxFactors.FixedVF.getFixedValue();
    if (MaxFactors.ScalableVF) {
      std::optional<unsigned> MaxVScale = getMaxVScale(*TheFunction, TTI);
      if (MaxVScale && TTI.isVScaleKnownToBeAPowerOfTwo()) {
        MaxPowerOf2RuntimeVF = std::max<unsigned>(
            *MaxPowerOf2RuntimeVF,
            *MaxVScale * MaxFactors.ScalableVF.getKnownMinValue());
      } else
        MaxPowerOf2RuntimeVF = std::nullopt; // Stick with tail-folding for now.
    }

5165–5169	I find this comment more confusing than enlightening. I also think that the issue is more that we decide early in the process whether or not to use tail folding, before we get to decide on a VF. We could make a more fine-grained choice if we would make those choices together. But when I read this comment, I'm not really thinking about that. In any case, I'm happy for the comment to be removed since the code itself is clear enough.
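
The computation discussed in these comments could be exercised in isolation roughly as follows. This is a sketch under assumptions: plain `unsigned` values stand in for `ElementCount`, a `ScalableMinVF` of 0 stands in for "no scalable VF", and the function name `maxPowerOf2RuntimeVF` is hypothetical.

```cpp
#include <algorithm>
#include <optional>

// Returns the largest runtime VF that is known to be a power of 2, or
// std::nullopt when a scalable VF is a candidate but vscale is either
// unbounded or not known to be a power of 2 (i.e. keep tail-folding).
constexpr std::optional<unsigned>
maxPowerOf2RuntimeVF(unsigned FixedVF, unsigned ScalableMinVF,
                     std::optional<unsigned> MaxVScale, bool VScaleIsPow2) {
  if (ScalableMinVF == 0)
    return FixedVF; // Only fixed-width VFs are candidates.
  if (MaxVScale && VScaleIsPow2)
    return std::max(FixedVF, *MaxVScale * ScalableMinVF);
  return std::nullopt; // Be conservative for scalable VFs.
}
```

Because the decision whether to tail-fold is made once for both fixed-width and scalable candidates, the sketch takes the maximum of the two bounds and gives up entirely when the scalable bound is unknown.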

david-arm mentioned this in rGbd0c281fcdcb: [NFC][LoopVectorize] Change trip counts for some tests to guarantee a scalar…. Mar 24 2023, 2:44 AM

Addressed review comments.

david-arm marked 3 inline comments as done. Mar 24 2023, 5:57 AM

Thanks @david-arm, LGTM.

Harbormaster completed remote builds in B221566: Diff 508065. Mar 24 2023, 6:41 AM

Closed by commit rG1c4fedfa35ae: [LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail (authored by david-arm). Mar 27 2023, 1:34 AM

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rG1c4fedfa35ae: [LoopVectorize] Don't tail-fold for scalable VFs when there is no scalar tail.

ABataev added a subscriber: ABataev. Mar 28 2023, 12:57 PM

ABataev added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5161	Looks like this std::optional does not work here, because MaxFactors.FixedVF.getFixedValue() returns non_optional value. So it may crash in the assert in line 5174, if MaxFactors.FixedVF.getFixedValue() returns 0 and MaxFactors.ScalableVF is not set.

sdesmalen added inline comments. Mar 28 2023, 2:19 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5161	My understanding is that normally (when not forcing it through an option) MaxFactors.FixedVF will be >= 1, are you saying that is not the case? I agree it's worth adding an extra check.

ABataev added inline comments. Mar 28 2023, 2:52 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5161	It might be 0 too

david-arm added inline comments. Mar 29 2023, 1:18 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5173	@ABataev @sdesmalen I am happy to add an extra check here, such as `if (MaxPowerOf2RuntimeVF && *MaxPowerOf2RuntimeVF > 0) {`, but there may be no way to add a test to defend the non-zero behaviour. You'd need to have one of these scenarios:
1. computeFeasibleMaxVF returns 0 and 0 for both fixed-width and scalable VFs. Not sure how this could ever happen?
2. computeFeasibleMaxVF returns 0 for fixed-width and non-zero for scalable VFs, but there is no vscale max or vscale is not a power of 2. This may be possible in theory, but it is difficult to find an actual test that exhibits this behaviour.
Regardless, even if I can't write a test for this I am happy to add a check!

ABataev added inline comments. Mar 29 2023, 5:56 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5173	I think it can be reproduced if:
1. Legal->getMaxSafeVectorWidthInBits() / WidestType == 0 (i.e. there is a dependency in the loop, but WidestType is larger than the dependency).
2. UserVF is specified with the fixed vector length, which is greater than 0.
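
To make the concern in this exchange concrete, here is a small sketch, assuming a local stand-in `isPow2U32` that mirrors the behaviour of a power-of-2 check that is false for zero. `isPow2U32` and `shouldCheckTripCount` are hypothetical names. The point is that a `std::optional<unsigned>` holding 0 is still engaged, so testing the optional alone does not rule out a zero VF.

```cpp
#include <cstdint>
#include <optional>

// Local stand-in for a "power of two > 0" predicate; zero is not a power of 2.
constexpr bool isPow2U32(std::uint32_t V) { return V && !(V & (V - 1)); }

// Guard of the kind proposed in the comment above: require both an engaged
// optional and a non-zero value before asserting on power-of-2-ness.
constexpr bool shouldCheckTripCount(std::optional<unsigned> MaxPowerOf2RuntimeVF) {
  return MaxPowerOf2RuntimeVF && *MaxPowerOf2RuntimeVF > 0;
}
```

An optional initialised from a fixed VF of 0 would pass a plain `if (MaxPowerOf2RuntimeVF)` test and then fail a power-of-2 assertion, whereas the extra `> 0` comparison skips that path.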

Allen mentioned this in D154314: [LV] Remove the reminder loop if we know the mask is always true. Jul 2 2023, 7:40 PM

Allen mentioned this in D154953: [InstSimplify] Remove the remainder loop if we know the mask is always true. Jul 11 2023, 5:13 AM

Allen added a child revision: D154953: [InstSimplify] Remove the remainder loop if we know the mask is always true. Jul 15 2023, 4:51 AM

Allen mentioned this in rG3e386b227886: [InstSimplify] Remove the remainder loop if we know the mask is always true. Jul 31 2023, 8:23 PM

Allen mentioned this in rG497966f7f2bb: Reland [InstSimplify] Remove the remainder loop if we know the mask is always…. Aug 1 2023, 7:23 AM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	7 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	1 line
llvm/include/llvm/CodeGen/BasicTTIImpl.h	1 line
llvm/lib/Analysis/TargetTransformInfo.cpp	4 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.h	2 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	4 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	2 lines
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h	4 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	26 lines
llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll	53 lines
llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll	43 lines

Diff 508525

llvm/include/llvm/Analysis/TargetTransformInfo.h

[... 1,049 lines not shown ...]	public:

/// \return The maximum value of vscale if the target specifies an		/// \return The maximum value of vscale if the target specifies an
/// architectural maximum vector length, and std::nullopt otherwise.		/// architectural maximum vector length, and std::nullopt otherwise.
std::optional<unsigned> getMaxVScale() const;		std::optional<unsigned> getMaxVScale() const;

/// \return the value of vscale to tune the cost model for.		/// \return the value of vscale to tune the cost model for.
std::optional<unsigned> getVScaleForTuning() const;		std::optional<unsigned> getVScaleForTuning() const;

		/// \return true if vscale is known to be a power of 2
		bool isVScaleKnownToBeAPowerOfTwo() const;

/// \return True if the vectorization factor should be chosen to		/// \return True if the vectorization factor should be chosen to
/// make the vector of the smallest element type match the size of a		/// make the vector of the smallest element type match the size of a
/// vector register. For wider element types, this could result in		/// vector register. For wider element types, this could result in
/// creating vectors that span multiple vector registers.		/// creating vectors that span multiple vector registers.
/// If false, the vectorization factor will be chosen based on the		/// If false, the vectorization factor will be chosen based on the
/// size of the widest element type.		/// size of the widest element type.
/// \p K Register Kind for vectorization.		/// \p K Register Kind for vectorization.
bool shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const;		bool shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const;
[... 744 lines not shown ...]	public:
virtual unsigned getNumberOfRegisters(unsigned ClassID) const = 0;		virtual unsigned getNumberOfRegisters(unsigned ClassID) const = 0;
virtual unsigned getRegisterClassForType(bool Vector,		virtual unsigned getRegisterClassForType(bool Vector,
Type *Ty = nullptr) const = 0;		Type *Ty = nullptr) const = 0;
virtual const char *getRegisterClassName(unsigned ClassID) const = 0;		virtual const char *getRegisterClassName(unsigned ClassID) const = 0;
virtual TypeSize getRegisterBitWidth(RegisterKind K) const = 0;		virtual TypeSize getRegisterBitWidth(RegisterKind K) const = 0;
virtual unsigned getMinVectorRegisterBitWidth() const = 0;		virtual unsigned getMinVectorRegisterBitWidth() const = 0;
virtual std::optional<unsigned> getMaxVScale() const = 0;		virtual std::optional<unsigned> getMaxVScale() const = 0;
virtual std::optional<unsigned> getVScaleForTuning() const = 0;		virtual std::optional<unsigned> getVScaleForTuning() const = 0;
		virtual bool isVScaleKnownToBeAPowerOfTwo() const = 0;
virtual bool		virtual bool
shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const = 0;		shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const = 0;
virtual ElementCount getMinimumVF(unsigned ElemWidth,		virtual ElementCount getMinimumVF(unsigned ElemWidth,
bool IsScalable) const = 0;		bool IsScalable) const = 0;
virtual unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const = 0;		virtual unsigned getMaximumVF(unsigned ElemWidth, unsigned Opcode) const = 0;
virtual unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,		virtual unsigned getStoreMinimumVF(unsigned VF, Type *ScalarMemTy,
Type *ScalarValTy) const = 0;		Type *ScalarValTy) const = 0;
virtual bool shouldConsiderAddressTypePromotion(		virtual bool shouldConsiderAddressTypePromotion(
[... 529 lines not shown ...]	unsigned getMinVectorRegisterBitWidth() const override {
return Impl.getMinVectorRegisterBitWidth();		return Impl.getMinVectorRegisterBitWidth();
}		}
std::optional<unsigned> getMaxVScale() const override {		std::optional<unsigned> getMaxVScale() const override {
return Impl.getMaxVScale();		return Impl.getMaxVScale();
}		}
std::optional<unsigned> getVScaleForTuning() const override {		std::optional<unsigned> getVScaleForTuning() const override {
return Impl.getVScaleForTuning();		return Impl.getVScaleForTuning();
}		}
		bool isVScaleKnownToBeAPowerOfTwo() const override {
		return Impl.isVScaleKnownToBeAPowerOfTwo();
		}
bool shouldMaximizeVectorBandwidth(		bool shouldMaximizeVectorBandwidth(
TargetTransformInfo::RegisterKind K) const override {		TargetTransformInfo::RegisterKind K) const override {
return Impl.shouldMaximizeVectorBandwidth(K);		return Impl.shouldMaximizeVectorBandwidth(K);
}		}
ElementCount getMinimumVF(unsigned ElemWidth,		ElementCount getMinimumVF(unsigned ElemWidth,
bool IsScalable) const override {		bool IsScalable) const override {
return Impl.getMinimumVF(ElemWidth, IsScalable);		return Impl.getMinimumVF(ElemWidth, IsScalable);
}		}
[... 423 lines not shown ...]

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

[... 433 lines not shown ...]	public:
TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const {		TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const {
return TypeSize::getFixed(32);		return TypeSize::getFixed(32);
}		}

unsigned getMinVectorRegisterBitWidth() const { return 128; }		unsigned getMinVectorRegisterBitWidth() const { return 128; }

std::optional<unsigned> getMaxVScale() const { return std::nullopt; }		std::optional<unsigned> getMaxVScale() const { return std::nullopt; }
std::optional<unsigned> getVScaleForTuning() const { return std::nullopt; }		std::optional<unsigned> getVScaleForTuning() const { return std::nullopt; }
		bool isVScaleKnownToBeAPowerOfTwo() const { return false; }

bool		bool
shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const {		shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const {
return false;		return false;
}		}

ElementCount getMinimumVF(unsigned ElemWidth, bool IsScalable) const {		ElementCount getMinimumVF(unsigned ElemWidth, bool IsScalable) const {
return ElementCount::get(0, IsScalable);		return ElementCount::get(0, IsScalable);
[... 905 lines not shown ...]

llvm/include/llvm/CodeGen/BasicTTIImpl.h

[... 708 lines not shown ...]	public:
/// @{		/// @{

TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const {		TypeSize getRegisterBitWidth(TargetTransformInfo::RegisterKind K) const {
return TypeSize::getFixed(32);		return TypeSize::getFixed(32);
}		}

std::optional<unsigned> getMaxVScale() const { return std::nullopt; }		std::optional<unsigned> getMaxVScale() const { return std::nullopt; }
std::optional<unsigned> getVScaleForTuning() const { return std::nullopt; }		std::optional<unsigned> getVScaleForTuning() const { return std::nullopt; }
		bool isVScaleKnownToBeAPowerOfTwo() const { return false; }

/// Estimate the overhead of scalarizing an instruction. Insert and Extract		/// Estimate the overhead of scalarizing an instruction. Insert and Extract
/// are set if the demanded result elements need to be inserted and/or		/// are set if the demanded result elements need to be inserted and/or
/// extracted from vectors.		/// extracted from vectors.
InstructionCost getScalarizationOverhead(VectorType *InTy,		InstructionCost getScalarizationOverhead(VectorType *InTy,
const APInt &DemandedElts,		const APInt &DemandedElts,
bool Insert, bool Extract,		bool Insert, bool Extract,
TTI::TargetCostKind CostKind) {		TTI::TargetCostKind CostKind) {
[... 1,745 lines not shown ...]

llvm/lib/Analysis/TargetTransformInfo.cpp

[... 674 lines not shown ...]
	std::optional<unsigned> TargetTransformInfo::getMaxVScale() const {			std::optional<unsigned> TargetTransformInfo::getMaxVScale() const {
	return TTIImpl->getMaxVScale();			return TTIImpl->getMaxVScale();
	}			}

	std::optional<unsigned> TargetTransformInfo::getVScaleForTuning() const {			std::optional<unsigned> TargetTransformInfo::getVScaleForTuning() const {
	return TTIImpl->getVScaleForTuning();			return TTIImpl->getVScaleForTuning();
	}			}

				bool TargetTransformInfo::isVScaleKnownToBeAPowerOfTwo() const {
				return TTIImpl->isVScaleKnownToBeAPowerOfTwo();
				}

	bool TargetTransformInfo::shouldMaximizeVectorBandwidth(			bool TargetTransformInfo::shouldMaximizeVectorBandwidth(
	TargetTransformInfo::RegisterKind K) const {			TargetTransformInfo::RegisterKind K) const {
	return TTIImpl->shouldMaximizeVectorBandwidth(K);			return TTIImpl->shouldMaximizeVectorBandwidth(K);
	}			}

	ElementCount TargetTransformInfo::getMinimumVF(unsigned ElemWidth,			ElementCount TargetTransformInfo::getMinimumVF(unsigned ElemWidth,
	bool IsScalable) const {			bool IsScalable) const {
	return TTIImpl->getMinimumVF(ElemWidth, IsScalable);			return TTIImpl->getMinimumVF(ElemWidth, IsScalable);
[... 569 lines not shown ...]

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[... 911 lines not shown ...]	public:
/// If a change in streaming mode is required on entry to/return from a		/// If a change in streaming mode is required on entry to/return from a
/// function call it emits and returns the corresponding SMSTART or SMSTOP node.		/// function call it emits and returns the corresponding SMSTART or SMSTOP node.
/// \p Entry tells whether this is before/after the Call, which is necessary		/// \p Entry tells whether this is before/after the Call, which is necessary
/// because PSTATE.SM is only queried once.		/// because PSTATE.SM is only queried once.
SDValue changeStreamingMode(SelectionDAG &DAG, SDLoc DL, bool Enable,		SDValue changeStreamingMode(SelectionDAG &DAG, SDLoc DL, bool Enable,
SDValue Chain, SDValue InFlag,		SDValue Chain, SDValue InFlag,
SDValue PStateSM, bool Entry) const;		SDValue PStateSM, bool Entry) const;

bool isVScaleKnownToBeAPowerOfTwo() const override;		bool isVScaleKnownToBeAPowerOfTwo() const override { return true; }

// Normally SVE is only used for byte size vectors that do not fit within a		// Normally SVE is only used for byte size vectors that do not fit within a
// NEON vector. This changes when OverrideNEON is true, allowing SVE to be		// NEON vector. This changes when OverrideNEON is true, allowing SVE to be
// used for 64bit and 128bit vectors as well.		// used for 64bit and 128bit vectors as well.
bool useSVEForFixedLengthVectorVT(EVT VT, bool OverrideNEON = false) const;		bool useSVEForFixedLengthVectorVT(EVT VT, bool OverrideNEON = false) const;

private:		private:
/// Keep a pointer to the AArch64Subtarget around so that we can		/// Keep a pointer to the AArch64Subtarget around so that we can
[... 306 lines not shown ...]

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[... 6,024 lines not shown ...]	SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
}		}
}		}
}		}

bool AArch64TargetLowering::mergeStoresAfterLegalization(EVT VT) const {		bool AArch64TargetLowering::mergeStoresAfterLegalization(EVT VT) const {
return !Subtarget->useSVEForFixedLengthVectors();		return !Subtarget->useSVEForFixedLengthVectors();
}		}

bool AArch64TargetLowering::isVScaleKnownToBeAPowerOfTwo() const {
return true;
}

bool AArch64TargetLowering::useSVEForFixedLengthVectorVT(		bool AArch64TargetLowering::useSVEForFixedLengthVectorVT(
EVT VT, bool OverrideNEON) const {		EVT VT, bool OverrideNEON) const {
if (!VT.isFixedLengthVector() || !VT.isSimple())		if (!VT.isFixedLengthVector() || !VT.isSimple())
return false;		return false;

// Don't use SVE for vectors we cannot scalarize if required.		// Don't use SVE for vectors we cannot scalarize if required.
switch (VT.getVectorElementType().getSimpleVT().SimpleTy) {		switch (VT.getVectorElementType().getSimpleVT().SimpleTy) {
// Fixed length predicates should be promoted to i8.		// Fixed length predicates should be promoted to i8.
[... 18,373 lines not shown ...]

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

[... 125 lines not shown ...]	public:
unsigned getMinVectorRegisterBitWidth() const {		unsigned getMinVectorRegisterBitWidth() const {
return ST->getMinVectorRegisterBitWidth();		return ST->getMinVectorRegisterBitWidth();
}		}

std::optional<unsigned> getVScaleForTuning() const {		std::optional<unsigned> getVScaleForTuning() const {
return ST->getVScaleForTuning();		return ST->getVScaleForTuning();
}		}

		bool isVScaleKnownToBeAPowerOfTwo() const { return true; }

bool shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const;		bool shouldMaximizeVectorBandwidth(TargetTransformInfo::RegisterKind K) const;

/// Try to return an estimate cost factor that can be used as a multiplier		/// Try to return an estimate cost factor that can be used as a multiplier
/// when scalarizing an operation for a vector with ElementCount \p VF.		/// when scalarizing an operation for a vector with ElementCount \p VF.
/// For scalable vectors this currently takes the most pessimistic view based		/// For scalable vectors this currently takes the most pessimistic view based
/// upon the maximum possible value for vscale.		/// upon the maximum possible value for vscale.
unsigned getMaxNumElements(ElementCount VF) const {		unsigned getMaxNumElements(ElementCount VF) const {
if (!VF.isScalable())		if (!VF.isScalable())
[... 261 lines not shown ...]

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

[... 226 lines not shown ...]	bool forceScalarizeMaskedGather(VectorType *VTy, Align Alignment) {
return ST->is64Bit() && !ST->hasVInstructionsI64();		return ST->is64Bit() && !ST->hasVInstructionsI64();
}		}

bool forceScalarizeMaskedScatter(VectorType *VTy, Align Alignment) {		bool forceScalarizeMaskedScatter(VectorType *VTy, Align Alignment) {
// Scalarize masked scatter for RV64 if EEW=64 indices aren't supported.		// Scalarize masked scatter for RV64 if EEW=64 indices aren't supported.
return ST->is64Bit() && !ST->hasVInstructionsI64();		return ST->is64Bit() && !ST->hasVInstructionsI64();
}		}

		bool isVScaleKnownToBeAPowerOfTwo() const {
		return TLI->isVScaleKnownToBeAPowerOfTwo();
		}

/// \returns How the target needs this vector-predicated operation to be		/// \returns How the target needs this vector-predicated operation to be
/// transformed.		/// transformed.
TargetTransformInfo::VPLegalization		TargetTransformInfo::VPLegalization
getVPLegalizationStrategy(const VPIntrinsic &PI) const {		getVPLegalizationStrategy(const VPIntrinsic &PI) const {
using VPLegalization = TargetTransformInfo::VPLegalization;		using VPLegalization = TargetTransformInfo::VPLegalization;
if (!ST->hasVInstructions() ||		if (!ST->hasVInstructions() ||
(PI.getIntrinsicID() == Intrinsic::vp_reduce_mul &&		(PI.getIntrinsicID() == Intrinsic::vp_reduce_mul &&
cast<VectorType>(PI.getArgOperand(1)->getType())		cast<VectorType>(PI.getArgOperand(1)->getType())
[... 103 lines not shown ...]

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

[... 5,149 lines not shown ...]	if (!useMaskedInterleavedAccesses(TTI)) {
assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&		assert(WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
"No decisions should have been taken at this point");		"No decisions should have been taken at this point");
// Note: There is no need to invalidate any cost modeling decisions here, as		// Note: There is no need to invalidate any cost modeling decisions here, as
// non where taken so far.		// non where taken so far.
InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();		InterleaveInfo.invalidateGroupsRequiringScalarEpilogue();
}		}

FixedScalableVFPair MaxFactors = computeFeasibleMaxVF(TC, UserVF, true);		FixedScalableVFPair MaxFactors = computeFeasibleMaxVF(TC, UserVF, true);

// Avoid tail folding if the trip count is known to be a multiple of any VF		// Avoid tail folding if the trip count is known to be a multiple of any VF
// we chose.		// we choose.
// FIXME: The condition below pessimises the case for fixed-width vectors,		std::optional<unsigned> MaxPowerOf2RuntimeVF =
// when scalable VFs are also candidates for vectorization.		MaxFactors.FixedVF.getFixedValue();
if (MaxFactors.FixedVF.isVector() && !MaxFactors.ScalableVF) {		if (MaxFactors.ScalableVF) {
ElementCount MaxFixedVF = MaxFactors.FixedVF;		std::optional<unsigned> MaxVScale = getMaxVScale(*TheFunction, TTI);
assert((UserVF.isNonZero() || isPowerOf2_32(MaxFixedVF.getFixedValue())) &&		if (MaxVScale && TTI.isVScaleKnownToBeAPowerOfTwo()) {
		MaxPowerOf2RuntimeVF = std::max<unsigned>(
		*MaxPowerOf2RuntimeVF,
		*MaxVScale * MaxFactors.ScalableVF.getKnownMinValue());
		} else
		MaxPowerOf2RuntimeVF = std::nullopt; // Stick with tail-folding for now.
		}

		sdesmalen (inline comment, Done): nit: move assignment into the condition? `if (std::optional<unsigned> MaxVScale = getMaxVScale(*TheFunction, TTI)) { .. }`
		if (MaxPowerOf2RuntimeVF) {
		david-arm (author, inline comment, Done): @ABataev @sdesmalen I am happy to add an extra check here such as `if (MaxPowerOf2RuntimeVF && *MaxPowerOf2RuntimeVF > 0) {` but there may be no way to add a test to defend the non-zero behaviour. You'd need to have one of these scenarios: (1) computeFeasibleMaxVF returns 0 and 0 for both fixed-width and scalable VFs. Not sure how this could ever happen? (2) computeFeasibleMaxVF returns 0 for fixed-width and non-zero for scalable VFs, but there is no vscale max or vscale is not a power of 2. This may be possible in theory, but it is difficult to find an actual test that exhibits this behaviour. Regardless, even if I can't write a test for this I am happy to add a check!
		ABataev (inline comment, Not Done): I think it can be reproduced if: (1) Legal->getMaxSafeVectorWidthInBits() / WidestType == 0 (i.e. there is a dependency in the loop, but WidestType is larger than the dependency). (2) UserVF is specified with the fixed vector length, which is greater than 0.
		assert((UserVF.isNonZero() || isPowerOf2_32(*MaxPowerOf2RuntimeVF)) &&
"MaxFixedVF must be a power of 2");		"MaxFixedVF must be a power of 2");
unsigned MaxVFtimesIC = UserIC ? MaxFixedVF.getFixedValue() * UserIC		unsigned MaxVFtimesIC =
: MaxFixedVF.getFixedValue();		UserIC ? *MaxPowerOf2RuntimeVF * UserIC : *MaxPowerOf2RuntimeVF;
ScalarEvolution *SE = PSE.getSE();		ScalarEvolution *SE = PSE.getSE();
		sdesmalen (inline comment, Done): You can write this a bit more compactly using std::max: `std::optional<unsigned> MaxPowerOf2RuntimeVF = MaxFactors.FixedVF.getFixedValue(); if (MaxFactors.ScalableVF) { std::optional<unsigned> MaxVScale = getMaxVScale(*TheFunction, TTI); if (MaxVScale && TTI.isVScaleKnownToBeAPowerOfTwo()) { MaxPowerOf2RuntimeVF = std::max<unsigned>( *MaxPowerOf2RuntimeVF, *MaxVScale * MaxFactors.ScalableVF.getKnownMinValue()); } else MaxPowerOf2RuntimeVF = std::nullopt; // Stick with tail-folding for now. }`
const SCEV *BackedgeTakenCount = PSE.getBackedgeTakenCount();		const SCEV *BackedgeTakenCount = PSE.getBackedgeTakenCount();
		sdesmalen (inline comment, Done): nit: This default is already set on line 5155, so there's no need to set it again here?
		david-arm (author, inline comment, Done): I don't think we can because this could have been set to something greater than 1 for the fixed-width case, since for scalable vectorisation we usually have both a max fixed-width VF and a max scalable VF. As you say above, we're kind of taking the max of the fixed-width and scalable VFs, but only if safe to do so for scalable VFs. I agree the logic is kind of awkward, but it's not immediately obvious to me how to make it better?
const SCEV *ExitCount = SE->getAddExpr(		const SCEV *ExitCount = SE->getAddExpr(
BackedgeTakenCount, SE->getOne(BackedgeTakenCount->getType()));		BackedgeTakenCount, SE->getOne(BackedgeTakenCount->getType()));
const SCEV *Rem = SE->getURemExpr(		const SCEV *Rem = SE->getURemExpr(
SE->applyLoopGuards(ExitCount, TheLoop),		SE->applyLoopGuards(ExitCount, TheLoop),
SE->getConstant(BackedgeTakenCount->getType(), MaxVFtimesIC));		SE->getConstant(BackedgeTakenCount->getType(), MaxVFtimesIC));
if (Rem->isZero()) {		if (Rem->isZero()) {
// Accept MaxFixedVF if we do not have a tail.		// Accept MaxFixedVF if we do not have a tail.
LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");		LLVM_DEBUG(dbgs() << "LV: No tail will remain for any chosen VF.\n");
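The decision made in the hunk above can be sketched outside of LLVM as follows. This is only an illustration under stated assumptions: `VFInputs`, `maxPowerOf2RuntimeVF` and `provablyNoTail` are hypothetical names that do not exist in LoopVectorize.cpp, the trip count is taken as a plain integer rather than a SCEV expression, and the `*Max == 0` guard reflects the extra check discussed in the inline comments rather than code shown in this diff.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the values the patch reads from MaxFactors and TTI.
struct VFInputs {
  unsigned FixedVF;                  // max fixed-width VF (assumed power of 2)
  unsigned ScalableMinVF;            // known-min lanes of max scalable VF; 0 if none
  std::optional<unsigned> MaxVScale; // upper bound on vscale, if one is known
  bool VScaleIsPow2;                 // mirrors TTI.isVScaleKnownToBeAPowerOfTwo()
};

// Largest VF that could be chosen at runtime, or nullopt when scalable VFs
// are candidates but vscale is unbounded or not known to be a power of 2.
std::optional<unsigned> maxPowerOf2RuntimeVF(const VFInputs &In) {
  std::optional<unsigned> Max = In.FixedVF;
  if (In.ScalableMinVF != 0) {
    if (In.MaxVScale && In.VScaleIsPow2)
      Max = std::max(*Max, *In.MaxVScale * In.ScalableMinVF);
    else
      Max = std::nullopt; // stay conservative and keep tail-folding
  }
  return Max;
}

// True when the trip count is a multiple of (max runtime VF * interleave
// count), so no scalar tail can remain for any smaller power-of-2 VF either.
bool provablyNoTail(std::uint64_t TripCount, const VFInputs &In,
                    unsigned UserIC) {
  std::optional<unsigned> Max = maxPowerOf2RuntimeVF(In);
  if (!Max || *Max == 0) // zero guard: the extra check raised in review
    return false;
  std::uint64_t Step =
      UserIC ? static_cast<std::uint64_t>(*Max) * UserIC : *Max;
  return TripCount % Step == 0;
}
```

With a fixed VF of 4, a scalable VF of `vscale x 4` and a maximum vscale of 16, the largest runtime VF is 64, which divides a trip count of 1024, so the sketch reports that no tail remains.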

llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
	; RUN: opt -passes=loop-vectorize -force-target-instruction-cost=1 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S < %s 2>&1 | FileCheck %s			; RUN: opt -passes=loop-vectorize -force-target-instruction-cost=1 -prefer-predicate-over-epilogue=predicate-dont-vectorize -S < %s 2>&1 | FileCheck %s
				reames (inline comment, Done): Please use update-test-checks.py

	; This test currently fails when the LV calculates a maximums safe
	; distance for scalable vectors, because the code to eliminate the tail is
	; pessimistic when scalable vectors are considered. This will be addressed
	; in a future patch, at which point we should be able to un-XFAIL the
	; test. The expected output is to vectorize this loop without predication
	; (and thus have unpredicated vector store).
	; XFAIL: *

	; CHECK: store <4 x i32>

	target triple = "aarch64"			target triple = "aarch64"
	target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"			target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"


	define void @f1(ptr %A) #0 {			define void @f1(ptr %A) #0 {
				; CHECK-LABEL: define void @f1
				; CHECK-SAME: (ptr [[A:%.*]]) #[[ATTR0:[0-9]+]] {
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
				; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
				; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
				; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 0
				; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[TMP4]]
				; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i32, ptr [[TMP5]], i32 0
				; CHECK-NEXT: store <vscale x 4 x i32> shufflevector (<vscale x 4 x i32> insertelement (<vscale x 4 x i32> poison, i32 1, i64 0), <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer), ptr [[TMP6]], align 4
				; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
				; CHECK-NEXT: [[TMP8:%.*]] = mul i64 [[TMP7]], 4
				; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP8]]
				; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
				; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
				; CHECK: middle.block:
				; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
				; CHECK: scalar.ph:
				; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
				; CHECK-NEXT: br label [[FOR_BODY:%.*]]
				; CHECK: for.body:
				; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[FOR_BODY]] ]
				; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, ptr [[A]], i64 [[IV]]
				; CHECK-NEXT: store i32 1, ptr [[ARRAYIDX]], align 4
				; CHECK-NEXT: [[IV_NEXT]] = add nuw nsw i64 [[IV]], 1
				; CHECK-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[IV_NEXT]], 1024
				; CHECK-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[EXIT]], !llvm.loop [[LOOP3:![0-9]+]]
				; CHECK: exit:
				; CHECK-NEXT: ret void
				;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%arrayidx = getelementptr inbounds i32, ptr %A, i64 %iv			%arrayidx = getelementptr inbounds i32, ptr %A, i64 %iv
	store i32 1, ptr %arrayidx, align 4			store i32 1, ptr %arrayidx, align 4
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond = icmp ne i64 %iv.next, 1024			%exitcond = icmp ne i64 %iv.next, 1024
	br i1 %exitcond, label %for.body, label %exit			br i1 %exitcond, label %for.body, label %exit

	exit:			exit:
	ret void			ret void
	}			}

	attributes #0 = { "target-features"="+sve" }			attributes #0 = { "target-features"="+sve" vscale_range(1,16) }
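The arithmetic behind this test can be checked with a small sketch. It assumes, as the patch does, that vscale is a power of 2 within the `vscale_range(1,16)` bound, and uses the `vscale x 4` vector width seen in the CHECK lines. The helper names `tailFor` and `noTailForAllPow2VScale` are hypothetical and are not part of LLVM.

```cpp
#include <cstdint>

// Number of scalar iterations left over when a loop of TripCount iterations
// is vectorized with a runtime VF of VScale * MinVF.
constexpr std::uint64_t tailFor(std::uint64_t TripCount, unsigned MinVF,
                                unsigned VScale) {
  return TripCount % (static_cast<std::uint64_t>(VScale) * MinVF);
}

// True if no tail remains for every power-of-2 vscale in [1, MaxVScale].
constexpr bool noTailForAllPow2VScale(std::uint64_t TripCount, unsigned MinVF,
                                      unsigned MaxVScale) {
  for (unsigned VScale = 1; VScale <= MaxVScale; VScale *= 2)
    if (tailFor(TripCount, MinVF, VScale) != 0)
      return false;
  return true;
}
```

For the trip count of 1024 used in `@f1`, every power-of-2 vscale up to 16 leaves no tail, whereas a hypothetical vscale of 3 would leave `1024 % 12 = 4` scalar iterations. That is why the optimisation is only applied when vscale is known to be a power of 2.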

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll


	while.end.loopexit: ; preds = %while.body			while.end.loopexit: ; preds = %while.body
	ret void			ret void
	}			}

	define void @simple_memset_trip1024(i32 %val, ptr %ptr, i64 %n) #0 {			define void @simple_memset_trip1024(i32 %val, ptr %ptr, i64 %n) #0 {
	; CHECK-LABEL: @simple_memset_trip1024(			; CHECK-LABEL: @simple_memset_trip1024(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
	; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4			; CHECK-NEXT: [[TMP1:%.*]] = mul i64 [[TMP0]], 4
				; CHECK-NEXT: [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 1024, [[TMP1]]
				; CHECK-NEXT: br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
				; CHECK: vector.ph:
	; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4			; CHECK-NEXT: [[TMP3:%.*]] = mul i64 [[TMP2]], 4
	; CHECK-NEXT: [[TMP4:%.*]] = sub i64 [[TMP3]], 1			; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 1024, [[TMP3]]
	; CHECK-NEXT: [[N_RND_UP:%.*]] = add i64 1024, [[TMP4]]			; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 1024, [[N_MOD_VF]]
	; CHECK-NEXT: [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
	; CHECK-NEXT: [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
	; CHECK-NEXT: [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[TMP6:%.*]] = mul i64 [[TMP5]], 4
	; CHECK-NEXT: [[TMP7:%.*]] = sub i64 1024, [[TMP6]]
	; CHECK-NEXT: [[TMP8:%.*]] = icmp ugt i64 1024, [[TMP6]]
	; CHECK-NEXT: [[TMP9:%.*]] = select i1 [[TMP8]], i64 [[TMP7]], i64 0
	; CHECK-NEXT: [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 0, i64 1024)
	; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[VAL:%.*]], i64 0			; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[VAL:%.*]], i64 0
	; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer			; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
	; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]			; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
	; CHECK: vector.body:			; CHECK: vector.body:
	; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT2:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[INDEX1:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT2:%.*]], [[VECTOR_BODY]] ]
	; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = phi <vscale x 4 x i1> [ [[ACTIVE_LANE_MASK_ENTRY]], [[VECTOR_PH]] ], [ [[ACTIVE_LANE_MASK_NEXT:%.*]], [[VECTOR_BODY]] ]			; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX1]], 0
	; CHECK-NEXT: [[TMP10:%.*]] = add i64 [[INDEX1]], 0			; CHECK-NEXT: [[TMP5:%.*]] = getelementptr i32, ptr [[PTR:%.*]], i64 [[TMP4]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr i32, ptr [[PTR:%.*]], i64 [[TMP10]]			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr i32, ptr [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP12:%.*]] = getelementptr i32, ptr [[TMP11]], i32 0			; CHECK-NEXT: store <vscale x 4 x i32> [[BROADCAST_SPLAT]], ptr [[TMP6]], align 4
	; CHECK-NEXT: call void @llvm.masked.store.nxv4i32.p0(<vscale x 4 x i32> [[BROADCAST_SPLAT]], ptr [[TMP12]], i32 4, <vscale x 4 x i1> [[ACTIVE_LANE_MASK]])			; CHECK-NEXT: [[TMP7:%.*]] = call i64 @llvm.vscale.i64()
	; CHECK-NEXT: [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 4 x i1> @llvm.get.active.lane.mask.nxv4i1.i64(i64 [[INDEX1]], i64 [[TMP9]])			; CHECK-NEXT: [[TMP8:%.*]] = mul i64 [[TMP7]], 4
	; CHECK-NEXT: [[TMP13:%.*]] = call i64 @llvm.vscale.i64()			; CHECK-NEXT: [[INDEX_NEXT2]] = add nuw i64 [[INDEX1]], [[TMP8]]
	; CHECK-NEXT: [[TMP14:%.*]] = mul i64 [[TMP13]], 4			; CHECK-NEXT: [[TMP9:%.*]] = icmp eq i64 [[INDEX_NEXT2]], [[N_VEC]]
	; CHECK-NEXT: [[INDEX_NEXT2]] = add i64 [[INDEX1]], [[TMP14]]			; CHECK-NEXT: br i1 [[TMP9]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
	; CHECK-NEXT: [[TMP15:%.*]] = xor <vscale x 4 x i1> [[ACTIVE_LANE_MASK_NEXT]], shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> poison, i1 true, i64 0), <vscale x 4 x i1> poison, <vscale x 4 x i32> zeroinitializer)
	; CHECK-NEXT: [[TMP16:%.*]] = extractelement <vscale x 4 x i1> [[TMP15]], i32 0
	; CHECK-NEXT: br i1 [[TMP16]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
	; CHECK: middle.block:			; CHECK: middle.block:
	; CHECK-NEXT: br i1 true, label [[WHILE_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]			; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1024, [[N_VEC]]
				; CHECK-NEXT: br i1 [[CMP_N]], label [[WHILE_END_LOOPEXIT:%.*]], label [[SCALAR_PH]]
	; CHECK: scalar.ph:			; CHECK: scalar.ph:
	; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]			; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
	; CHECK-NEXT: br label [[WHILE_BODY:%.*]]			; CHECK-NEXT: br label [[WHILE_BODY:%.*]]
	; CHECK: while.body:			; CHECK: while.body:
	; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ [[INDEX_NEXT:%.*]], [[WHILE_BODY]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, ptr [[PTR]], i64 [[INDEX]]			; CHECK-NEXT: [[GEP:%.*]] = getelementptr i32, ptr [[PTR]], i64 [[INDEX]]
	; CHECK-NEXT: store i32 [[VAL]], ptr [[GEP]], align 4			; CHECK-NEXT: store i32 [[VAL]], ptr [[GEP]], align 4
	; CHECK-NEXT: [[INDEX_NEXT]] = add nsw i64 [[INDEX]], 1			; CHECK-NEXT: [[INDEX_NEXT]] = add nsw i64 [[INDEX]], 1
	Show All 18 Lines
	}			}

	!0 = distinct !{!0, !1, !2}			!0 = distinct !{!0, !1, !2}
	!1 = !{!"llvm.loop.vectorize.width", i32 4}			!1 = !{!"llvm.loop.vectorize.width", i32 4}
	!2 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}			!2 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
	!3 = distinct !{!3, !4}			!3 = distinct !{!3, !4}
	!4 = !{!"llvm.loop.vectorize.width", i32 4}			!4 = !{!"llvm.loop.vectorize.width", i32 4}

	attributes #0 = { "target-features"="+sve" }			attributes #0 = { "target-features"="+sve" vscale_range(1,16) }

Revision Contents

Diff 508525

llvm/include/llvm/Analysis/TargetTransformInfo.h

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

llvm/include/llvm/CodeGen/BasicTTIImpl.h

llvm/lib/Analysis/TargetTransformInfo.cpp

llvm/lib/Target/AArch64/AArch64ISelLowering.h

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll
