This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/include/llvm/Analysis/TargetTransformInfo.h
- llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
- llvm/lib/Analysis/TargetTransformInfo.cpp
- llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
- llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
- llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
- llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll

Differential D106653

[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64
ClosedPublic

Authored by david-arm on Jul 23 2021, 5:16 AM.


Details

Reviewers

sdesmalen
kmclaughlin
c-rhodes
spatel
RKSimon
dmgreen
craig.topper

Commits

rGf4122398e7c1: [LoopVectorize][AArch64] Enable ordered reductions by default for AArch64

Summary

I have added a new TTI interface called enableOrderedReductions() that
controls whether or not ordered reductions should be enabled for a
given target. By default this returns false, whereas for AArch64 it
returns true and we rely upon the cost model to make sensible
vectorisation choices. It is still possible to override the new TTI
interface by setting the command line flag:

-force-ordered-reductions=true|false

I have added a new RUN line to show that we use ordered reductions by
default for SVE and Neon:

Transforms/LoopVectorize/AArch64/strict-fadd.ll
Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
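The selection described in the summary (a TTI hook that defaults to false, which a target may override, with `-force-ordered-reductions` taking precedence whenever it is given explicitly) can be modelled in standalone C++. This is a minimal sketch, not LLVM code: `TargetHooks`, `AArch64Hooks` and `allowOrderedReductions` are hypothetical names, and `std::optional<bool>` stands in for asking whether the command-line flag was passed at all.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for the TTI hook: by default targets opt out.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  virtual bool enableOrderedReductions() const { return false; }
};

// Hypothetical AArch64 model: opts in and leaves the rest to the cost model.
struct AArch64Hooks : TargetHooks {
  bool enableOrderedReductions() const override { return true; }
};

// ForceFlag models -force-ordered-reductions. std::nullopt means the flag
// was not given on the command line, so the target hook decides.
bool allowOrderedReductions(const TargetHooks &TTI,
                            std::optional<bool> ForceFlag) {
  if (ForceFlag.has_value())
    return *ForceFlag; // An explicit =true or =false always wins.
  return TTI.enableOrderedReductions();
}
```

Using an optional rather than a plain bool is what allows an explicit `=false` to switch the feature off on AArch64; a plain bool defaulting to false could not distinguish "not given" from "given as false".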

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Jul 23 2021, 5:16 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Jul 23 2021, 5:16 AM

david-arm requested review of this revision. Jul 23 2021, 5:16 AM

Herald added a project: Restricted Project. Jul 23 2021, 5:16 AM

Herald added a subscriber: llvm-commits.

david-arm added a parent revision: D106646: [LoopVectorize] Don't interleave scalar ordered reductions for inner loops. Jul 23 2021, 5:16 AM

david-arm added reviewers: RKSimon, dmgreen. Jul 23 2021, 5:29 AM

I wonder if this might be something of interest for other targets, not just AArch64?

david-arm added a reviewer: craig.topper. Jul 23 2021, 5:39 AM

Harbormaster completed remote builds in B115826: Diff 361161. Jul 23 2021, 6:00 AM

Do you have any performance results?

Changed the patch to only enable strict (ordered) reductions for AArch64 when SVE is enabled.

Herald added subscribers: CarolineConcatto, tschuett. Jul 27 2021, 10:06 AM

Hi @dmgreen, we've decided for now to only enable this by default for AArch64 when SVE is enabled as this is lower risk. We are currently still collecting performance data for the fixed-width vectorisation case when SVE is enabled.

In D106653#2907824, @david-arm wrote:

Hi @dmgreen, we've decided for now to only enable this by default for AArch64 when SVE is enabled as this is lower risk. We are currently still collecting performance data for the fixed-width vectorisation case when SVE is enabled.

Sorry, that should really be "collecting performance data for the fixed-width vectorisation case when strict reductions are forced".

Harbormaster completed remote builds in B116470: Diff 362082. Jul 27 2021, 12:02 PM

What do you mean by "lower risk"? Do you have performance numbers for SVE? Or is the cost so high that in practice they are never generated?

In D106653#2909467, @dmgreen wrote:

What do you mean by "lower risk"? Do you have performance numbers for SVE? Or is the cost so high that in practice they are never generated?

The end-goal is to enable strict reductions by default for all targets so that buildbots guard the functionality and hopefully give some performance benefit as well. The cost-model must be conservative enough to avoid any regressions, but still let through loops where there is an obvious benefit. At that point, we can start tuning the cost-model to let through more cases when that helps performance.

Our other motivation is around making LLVM 13 an experimental compiler for VLA auto-vectorization. We specifically want to enable strict reductions by default in LLVM 13 for vector-length-agnostic SVE, because this is a new vectorization capability which SVE can handle. The cost-model doesn't really matter too much at this point, because VLA auto-vec is experimental and little effort has yet been made to improve code-quality, so it's unlikely that strict-reductions will make a dent.

To achieve our end-goal of enabling strict reductions by default for all targets, we can do that in stages:

1. Enable it by default for VLA SVE (this patch).

   We can enable it because performance doesn't really matter. It would mean this patch needs to be updated to (temporarily) give a high/invalid cost for ordered reductions when the type is a FixedVectorType, so that we don't accidentally introduce any regressions for e.g. -mcpu=a64fx when -scalable-vectorization=on|preferred is not specified.

2. Enable it by default for AArch64.

   We have performed SPEC2K6 measurements where we can see that the cost-model holds up and where performance across the board is similar or better, with only very minor regressions (<1%). We want to do a bit more benchmarking (such as measuring on different AArch64 machines) to present numbers we're confident about.

3. Enable it by default for other targets.

   This will require measurements on targets other than AArch64.

Does that sound like a sensible approach?
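Step 1 of the plan (temporarily giving fixed-width ordered reductions a prohibitive cost so that only scalable vectorization selects them) can be modelled as a small cost query. This is a hypothetical sketch: `VecKind`, `orderedReductionCost`, `isViable` and the cost numbers are illustrative assumptions, and an empty `std::optional` plays the role of an invalid cost. It is not LLVM's `InstructionCost` API.

```cpp
#include <cassert>
#include <optional>

enum class VecKind { Fixed, Scalable };

// Hypothetical cost of an in-order FP add reduction over VF lanes.
// std::nullopt stands in for an "invalid" cost that can never be chosen.
std::optional<unsigned> orderedReductionCost(VecKind Kind, unsigned VF) {
  if (Kind == VecKind::Fixed)
    return std::nullopt; // Temporarily rule out fixed-width ordered reductions.
  return VF;             // Illustrative only: one sequential add per lane.
}

// A vector plan is viable only with a valid cost lower than the scalar cost
// of processing the same VF elements.
bool isViable(VecKind Kind, unsigned VF, unsigned ScalarCostPerElt) {
  std::optional<unsigned> Cost = orderedReductionCost(Kind, VF);
  return Cost.has_value() && *Cost < ScalarCostPerElt * VF;
}
```

Modelling "invalid" as a missing value rather than a very large number means no scalar cost, however high, can make the blocked plan win.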

Matt added a subscriber: Matt. Jul 28 2021, 2:43 PM

Herald added a subscriber: ctetreau. Jul 28 2021, 2:43 PM

Turning this on sounds good, so long as we do our due diligence and check it's OK performance-wise. I just did some experiments and it seemed fine-ish, but they were only very small examples run on an A510 and an A710. So long as there was more than a single real vector operation, the benefits started to outweigh the overheads.

A couple of high-level comments, though:

- I'm not a fan of -march=armv8-a+sve working differently to -march=armv8-a for non-SVE code, i.e. "hasSVE" shouldn't be the trigger for allowing NEON in-order reductions if they have nothing to do with SVE. We should consider jumping straight to step 2 and enabling it for all of AArch64, so long as the performance results look OK and we are careful about regressions, of course.
- It seems that the cost is so high under SVE that they will never be generated in practice. This suggests that enabling them will have little effect on either testing or performance.

Enabled ordered reductions by default for AArch64.
Rebase.

david-arm added a parent revision: D107264: [NFC] Rename enable-strict-reductions to force-ordered-reductions. Aug 2 2021, 4:59 AM

Harbormaster completed remote builds in B117427: Diff 363445. Aug 2 2021, 5:26 AM

spatel added inline comments. Aug 6 2021, 5:37 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll, line 5:
Drive-by comment (checking my understanding) - we could add this RUN line with `--check-prefix=CHECK-NOT-VECTORIZED` as a preliminary commit and then update only the prefix to show the change in behavior with this patch?

david-arm added inline comments. Aug 10 2021, 1:46 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll, line 5:
That sounds like a good idea! I'll commit a one-line patch to do this and then rebase.

Rebase.

Harbormaster completed remote builds in B119067: Diff 365744. Aug 11 2021, 7:37 AM

david-arm added a parent revision: D108292: [Analysis][AArch64] Make fixed-width ordered reductions slightly more expensive. Aug 18 2021, 5:01 AM

Hi @dmgreen, so I have run SPEC2006 on a neoverse-n1 9 times without my patch and 9 times with it, building with -O3, then compared the averages. Here is a summary of the results showing a few outliers (slowest at the top, fastest at the bottom):

Benchmark | Percentage runtime change (< 0 = faster with ordered reductions)
453.povray | 0.3
464.h264ref | 0.1
462.libquantum | 0
... | ...
450.soplex | -0.95
429.mcf | -1.22
482.sphinx3 | -1.27
471.omnetpp | -1.51
Geometric mean | -0.45

Overall, it looks like this is slightly faster with ordered reductions enabled by default for AArch64.
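For readers reproducing this kind of summary: a geometric mean of per-benchmark runtime changes is commonly taken over runtime ratios (new/old) rather than over the raw percentages. The sketch below assumes that convention; the thread does not say exactly how the -0.45 figure was computed, and only part of the data is listed, so the sketch cannot reproduce that number. `geomeanPercentChange` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Convert percentage runtime changes (e.g. -1.22 means 1.22% faster) into
// ratios, take their geometric mean via the mean of logarithms, and convert
// the result back into a percentage change.
// Precondition: PercentChanges is non-empty and every entry is > -100.
double geomeanPercentChange(const std::vector<double> &PercentChanges) {
  double SumLogs = 0.0;
  for (double P : PercentChanges)
    SumLogs += std::log(1.0 + P / 100.0);
  double MeanRatio = std::exp(SumLogs / PercentChanges.size());
  return (MeanRatio - 1.0) * 100.0;
}
```

For example, +1% and -1% do not cancel exactly: the ratios 1.01 and 0.99 have a geometric mean of about 0.99995, i.e. roughly -0.005%.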

Just spotted a bug in enableOrderedReductions and fixed it. This was already fixed in the patch I was using for performance testing. :)

Harbormaster completed remote builds in B120106: Diff 367181. Aug 18 2021, 5:28 AM

The numbers look good to me. With D108292 this LGTM. Thanks.

This revision is now accepted and ready to land. Aug 18 2021, 9:27 AM

This revision was landed with ongoing or failed builds. Aug 19 2021, 1:29 AM

Closed by commit rGf4122398e7c1: [LoopVectorize][AArch64] Enable ordered reductions by default for AArch64 (authored by david-arm).

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rGf4122398e7c1: [LoopVectorize][AArch64] Enable ordered reductions by default for AArch64.

Hi David, it looks like this change caused an assertion failure. Can you please take a look?

> cat test.i
int a;
double b, c;
void d() {
  for (; a; a++) {
    b += c;
    c = a;
  }
}
> clang -O2 test.i --target=aarch64-linux
clang: ../llvm/lib/Transforms/Vectorize/VPlanValue.h:186: llvm::Value *llvm::VPValue::getLiveInIRValue(): Assertion `!getDef() && "VPValue is not a live-in; it is defined by a VPDef inside a VPlan"' failed.

In D106653#2957702, @pcc wrote:

Hi David, it looks like this change caused an assertion failure. Can you please take a look?

Thanks for the report! I have a suspicion of what may be going wrong here. I'll revert the patch while I take a look.

fhahn added a reverting change: rGab9296f13be4: Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for…. Aug 20 2021, 1:32 PM

fhahn mentioned this in rG9baed023b4b5: [LV] Adjust reduction recipes before recurrence handling. Aug 22 2021, 3:02 AM

I recommitted the patch in d024a01511c1 after fixing the issue causing the revert in 9baed023b4b5

In D106653#2959661, @fhahn wrote:

I recommitted the patch in d024a01511c1 after fixing the issue causing the revert in 9baed023b4b5

Hi @fhahn, thanks a lot for fixing this! I was on holiday Monday and Tuesday, only just saw the fallout now. :)

Revision Contents

Path | Size
llvm/include/llvm/Analysis/TargetTransformInfo.h | 7 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h | 2 lines
llvm/lib/Analysis/TargetTransformInfo.cpp | 4 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h | 2 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp | 13 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll | 2 lines
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll | 2 lines

Diff 367423

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ ... @@ public:
   /// Return true if the target supports masked gather.
   bool isLegalMaskedGather(Type *DataType, Align Alignment) const;

   /// Return true if the target supports masked compress store.
   bool isLegalMaskedCompressStore(Type *DataType) const;
   /// Return true if the target supports masked expand load.
   bool isLegalMaskedExpandLoad(Type *DataType) const;

+  /// Return true if we should be enabling ordered reductions for the target.
+  bool enableOrderedReductions() const;
+
   /// Return true if the target has a unified operation to calculate division
   /// and remainder. If so, the additional implicit multiplication and
   /// subtraction required to calculate a remainder from division are free. This
   /// can enable more aggressive transformations for division and remainder than
   /// would typically be allowed using throughput or size cost models.
   bool hasDivRemOp(Type *DataType, bool IsSigned) const;

   /// Return true if the given instruction (assumed to be a memory access
@@ ... @@ public:
   virtual bool isLegalMaskedStore(Type *DataType, Align Alignment) = 0;
   virtual bool isLegalMaskedLoad(Type *DataType, Align Alignment) = 0;
   virtual bool isLegalNTStore(Type *DataType, Align Alignment) = 0;
   virtual bool isLegalNTLoad(Type *DataType, Align Alignment) = 0;
   virtual bool isLegalMaskedScatter(Type *DataType, Align Alignment) = 0;
   virtual bool isLegalMaskedGather(Type *DataType, Align Alignment) = 0;
   virtual bool isLegalMaskedCompressStore(Type *DataType) = 0;
   virtual bool isLegalMaskedExpandLoad(Type *DataType) = 0;
+  virtual bool enableOrderedReductions() = 0;
   virtual bool hasDivRemOp(Type *DataType, bool IsSigned) = 0;
   virtual bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) = 0;
   virtual bool prefersVectorizedAddressing() = 0;
   virtual InstructionCost getScalingFactorCost(Type *Ty, GlobalValue *BaseGV,
                                                int64_t BaseOffset,
                                                bool HasBaseReg, int64_t Scale,
                                                unsigned AddrSpace) = 0;
   virtual bool LSRWithInstrQueries() = 0;
@@ ... @@ bool isLegalMaskedGather(Type *DataType, Align Alignment) override {
     return Impl.isLegalMaskedGather(DataType, Alignment);
   }
   bool isLegalMaskedCompressStore(Type *DataType) override {
     return Impl.isLegalMaskedCompressStore(DataType);
   }
   bool isLegalMaskedExpandLoad(Type *DataType) override {
     return Impl.isLegalMaskedExpandLoad(DataType);
   }
+  bool enableOrderedReductions() override {
+    return Impl.enableOrderedReductions();
+  }
   bool hasDivRemOp(Type *DataType, bool IsSigned) override {
     return Impl.hasDivRemOp(DataType, IsSigned);
   }
   bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) override {
     return Impl.hasVolatileVariant(I, AddrSpace);
   }
   bool prefersVectorizedAddressing() override {
     return Impl.prefersVectorizedAddressing();
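The three places touched in TargetTransformInfo.h (together with the one-line forwarder in TargetTransformInfo.cpp) follow the type-erasure pattern that TTI uses: a public wrapper forwards to an abstract `Concept`, and a templated `Model` forwards to whichever target implementation it wraps. Below is a simplified, self-contained sketch of that pattern for the new hook. The class names echo TTI's, but `TTIWrapper`, `GenericImpl` and `AArch64Impl` are hypothetical and this is not the real header.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class TTIWrapper {
  // Abstract interface: one pure virtual per hook (cf. Concept).
  struct Concept {
    virtual ~Concept() = default;
    virtual bool enableOrderedReductions() = 0;
  };
  // Adapts any implementation type T to Concept (cf. Model<T>).
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    bool enableOrderedReductions() override {
      return Impl.enableOrderedReductions();
    }
  };
  std::unique_ptr<Concept> TTIImpl;

public:
  template <typename T>
  explicit TTIWrapper(T Impl)
      : TTIImpl(std::make_unique<Model<T>>(std::move(Impl))) {}
  // Public entry point used by passes (cf. TargetTransformInfo.cpp).
  bool enableOrderedReductions() const {
    return TTIImpl->enableOrderedReductions();
  }
};

// Hypothetical target implementations; they need no common base class.
struct GenericImpl {
  bool enableOrderedReductions() const { return false; }
};
struct AArch64Impl {
  bool enableOrderedReductions() const { return true; }
};
```

The design choice this illustrates is why adding one hook touches three declarations: the pure virtual, the `Model` forwarder and the public wrapper method must all exist for the call to reach the target.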

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

@@ ... @@ public:
   bool isLegalMaskedGather(Type *DataType, Align Alignment) const {
     return false;
   }

   bool isLegalMaskedCompressStore(Type *DataType) const { return false; }

   bool isLegalMaskedExpandLoad(Type *DataType) const { return false; }

+  bool enableOrderedReductions() const { return false; }
+
   bool hasDivRemOp(Type *DataType, bool IsSigned) const { return false; }

   bool hasVolatileVariant(Instruction *I, unsigned AddrSpace) const {
     return false;
   }

   bool prefersVectorizedAddressing() const { return true; }

llvm/lib/Analysis/TargetTransformInfo.cpp

@@ ... @@
 bool TargetTransformInfo::isLegalMaskedCompressStore(Type *DataType) const {
   return TTIImpl->isLegalMaskedCompressStore(DataType);
 }

 bool TargetTransformInfo::isLegalMaskedExpandLoad(Type *DataType) const {
   return TTIImpl->isLegalMaskedExpandLoad(DataType);
 }

+bool TargetTransformInfo::enableOrderedReductions() const {
+  return TTIImpl->enableOrderedReductions();
+}
+
 bool TargetTransformInfo::hasDivRemOp(Type *DataType, bool IsSigned) const {
   return TTIImpl->hasDivRemOp(DataType, IsSigned);
 }

 bool TargetTransformInfo::hasVolatileVariant(Instruction *I,
                                              unsigned AddrSpace) const {
   return TTIImpl->hasVolatileVariant(I, AddrSpace);
 }

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

@@ ... @@ if (auto *DataTypeVTy = dyn_cast<VectorType>(DataType)) {
           cast<FixedVectorType>(DataTypeVTy)->getNumElements();
       unsigned EltSize = DataTypeVTy->getElementType()->getScalarSizeInBits();
       return NumElements > 1 && isPowerOf2_64(NumElements) && EltSize >= 8 &&
              EltSize <= 128 && isPowerOf2_64(EltSize);
     }
     return BaseT::isLegalNTStore(DataType, Alignment);
   }

+  bool enableOrderedReductions() const { return true; }
+
   InstructionCost getInterleavedMemoryOpCost(
       unsigned Opcode, Type *VecTy, unsigned Factor, ArrayRef<unsigned> Indices,
       Align Alignment, unsigned AddressSpace,
       TTI::TargetCostKind CostKind = TTI::TCK_SizeAndLatency,
       bool UseMaskForCond = false, bool UseMaskForGaps = false);

   bool
   shouldConsiderAddressTypePromotion(const Instruction &I,
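On the implementation side, the default (`false`) lives in the base implementation class and AArch64 opts in by declaring a same-named member in its own implementation class. Because the wrapper's model calls through the concrete target type, this is static name hiding rather than a virtual override. A minimal sketch of that idea, using hypothetical class names (`ImplBase`, `CRTPBase`, `MyAArch64TTIImpl`, `MyGenericTTIImpl`) rather than LLVM's real hierarchy:

```cpp
#include <cassert>

// Default answers for every hook (cf. the base implementation class).
struct ImplBase {
  bool enableOrderedReductions() const { return false; }
};

// CRTP layer: lets shared code call into the derived target statically.
template <typename Derived> struct CRTPBase : ImplBase {
  const Derived &self() const { return static_cast<const Derived &>(*this); }
  // Shared logic that picks up the most-derived answer at compile time.
  bool orderedReductionsEnabled() const {
    return self().enableOrderedReductions();
  }
};

// A target that opts in by hiding the base member (cf. AArch64).
struct MyAArch64TTIImpl : CRTPBase<MyAArch64TTIImpl> {
  bool enableOrderedReductions() const { return true; }
};

// A target that keeps the default.
struct MyGenericTTIImpl : CRTPBase<MyGenericTTIImpl> {};
```

One consequence worth keeping in mind: since nothing here is virtual, calling the hook through an `ImplBase` reference to a `MyAArch64TTIImpl` object still returns `false`. The pattern only works because callers always know the concrete type.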

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ ... @@ cl::desc("The maximum interleave count to use when interleaving a scalar "
                  "reduction in a nested loop."));

 static cl::opt<bool>
     PreferInLoopReductions("prefer-inloop-reductions", cl::init(false),
                            cl::Hidden,
                            cl::desc("Prefer in-loop vector reductions, "
                                     "overriding the targets preference."));

-cl::opt<bool> ForceOrderedReductions(
+static cl::opt<bool> ForceOrderedReductions(
     "force-ordered-reductions", cl::init(false), cl::Hidden,
     cl::desc("Enable the vectorisation of loops with in-order (strict) "
              "FP reductions"));

 static cl::opt<bool> PreferPredicatedReductionSelect(
     "prefer-predicated-reduction-select", cl::init(false), cl::Hidden,
     cl::desc(
         "Prefer predicating a reduction operation over an after loop select."));
@@ ... @@ public:
   /// outside. In loop reductions are collected into InLoopReductionChains.
   void collectInLoopReductions();

   /// Returns true if we should use strict in-order reductions for the given
   /// RdxDesc. This is true if the -enable-strict-reductions flag is passed,
   /// the IsOrdered flag of RdxDesc is set and we do not allow reordering
   /// of FP operations.
   bool useOrderedReductions(const RecurrenceDescriptor &RdxDesc) {
-    return ForceOrderedReductions && !Hints->allowReordering() &&
-           RdxDesc.isOrdered();
+    return !Hints->allowReordering() && RdxDesc.isOrdered();
   }

   /// \returns The smallest bitwidth each instruction can be represented with.
   /// The vector equivalents of these instructions should be truncated to this
   /// type.
   const MapVector<Instruction *, uint64_t> &getMinimalBitwidths() const {
     return MinBWs;
   }
@@ ... @@ if (Hints.isPotentiallyUnsafe() &&
       reportVectorizationFailure(
           "Potentially unsafe FP op prevents vectorization",
           "loop not vectorized due to unsafe FP support.",
           "UnsafeFP", ORE, L);
       Hints.emitRemarkWithHints();
       return false;
     }

-    if (!LVL.canVectorizeFPMath(ForceOrderedReductions)) {
+    bool AllowOrderedReductions;
+    // If the flag is set, use that instead and override the TTI behaviour.
+    if (ForceOrderedReductions.getNumOccurrences() > 0)
+      AllowOrderedReductions = ForceOrderedReductions;
+    else
+      AllowOrderedReductions = TTI->enableOrderedReductions();
+    if (!LVL.canVectorizeFPMath(AllowOrderedReductions)) {
       ORE->emit([&]() {
         auto *ExactFPMathInst = Requirements.getExactFPInst();
         return OptimizationRemarkAnalysisFPCommute(DEBUG_TYPE, "CantReorderFPOps",
                                                    ExactFPMathInst->getDebugLoc(),
                                                    ExactFPMathInst->getParent())
                << "loop not vectorized: cannot prove it is safe to reorder "
                   "floating-point operations";
       });
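Taken together, the LoopVectorize changes choose between the three outcomes that the test RUN lines check for (CHECK-NOT-VECTORIZED, CHECK-UNORDERED, CHECK-ORDERED). Below is a hedged standalone model of that decision. `Outcome`, `classify` and its parameters are hypothetical: `AllowReordering` stands for the loop hints permitting FP reassociation (set in the tests via `-hints-allow-reordering`), `ForceFlag` for `-force-ordered-reductions` (empty when not given), and `TargetEnablesOrdered` for the TTI hook. It assumes a loop with a single ordered FP reduction and ignores the cost model entirely.

```cpp
#include <cassert>
#include <optional>

enum class Outcome { NotVectorized, Unordered, Ordered };

// Hypothetical model of the decision for one in-order FP add reduction.
Outcome classify(bool AllowReordering, std::optional<bool> ForceFlag,
                 bool TargetEnablesOrdered) {
  // Reassociation permitted: an ordinary (unordered) vector reduction is used.
  if (AllowReordering)
    return Outcome::Unordered;
  // Otherwise an ordered reduction is the only legal way to vectorize; an
  // explicit flag overrides the target hook.
  bool AllowOrdered = ForceFlag ? *ForceFlag : TargetEnablesOrdered;
  return AllowOrdered ? Outcome::Ordered : Outcome::NotVectorized;
}
```

With `TargetEnablesOrdered = true` (AArch64 after this patch), the five RUN lines in each test map to NotVectorized, Unordered, Ordered, Unordered and, for the new default case without the flag, Ordered.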

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

 ; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -force-ordered-reductions=false -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-NOT-VECTORIZED
 ; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -force-ordered-reductions=false -hints-allow-reordering=true -S 2>%t | FileCheck %s --check-prefix=CHECK-UNORDERED
 ; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -force-ordered-reductions=true -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-ORDERED
 ; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -force-ordered-reductions=true -hints-allow-reordering=true -S 2>%t | FileCheck %s --check-prefix=CHECK-UNORDERED
-; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-NOT-VECTORIZED
+; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-ORDERED

 define float @fadd_strict(float* noalias nocapture readonly %a, i64 %n) #0 {
 ; CHECK-ORDERED-LABEL: @fadd_strict
 ; CHECK-ORDERED: vector.body:
 ; CHECK-ORDERED: %[[VEC_PHI:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX:.*]], %vector.body ]
 ; CHECK-ORDERED: %[[LOAD:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>*
 ; CHECK-ORDERED: %[[RDX]] = call float @llvm.vector.reduce.fadd.nxv8f32(float %[[VEC_PHI]], <vscale x 8 x float> %[[LOAD]])
 ; CHECK-ORDERED: for.end

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll

 ; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -force-ordered-reductions=false -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-NOT-VECTORIZED
 ; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -force-ordered-reductions=false -hints-allow-reordering=true -S 2>%t | FileCheck %s --check-prefix=CHECK-UNORDERED
 ; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -force-ordered-reductions=true -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-ORDERED
 ; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -force-ordered-reductions=true -hints-allow-reordering=true -S 2>%t | FileCheck %s --check-prefix=CHECK-UNORDERED
-; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-NOT-VECTORIZED
+; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-ORDERED

 define float @fadd_strict(float* noalias nocapture readonly %a, i64 %n) {
 ; CHECK-ORDERED-LABEL: @fadd_strict
 ; CHECK-ORDERED: vector.body:
 ; CHECK-ORDERED: %[[VEC_PHI:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX:.*]], %vector.body ]
 ; CHECK-ORDERED: %[[LOAD:.*]] = load <8 x float>, <8 x float>*
 ; CHECK-ORDERED: %[[RDX]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[VEC_PHI]], <8 x float> %[[LOAD]])
 ; CHECK-ORDERED: for.end
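The CHECK-ORDERED lines above show the essence of an ordered reduction: a scalar phi is threaded through `llvm.vector.reduce.fadd` on every vector iteration, so the lanes are still added strictly left to right, exactly as in the scalar loop. An unordered reduction instead keeps one partial sum per lane and combines them after the loop, which changes the rounding. The C++ sketch below models both for VF = 8 (matching `<8 x float>`). The function names are hypothetical, and the input values are chosen so the rounding difference is visible: 1e8 is exactly representable as a float, but 1e8 + 1 rounds back to 1e8.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference semantics: the original scalar loop, strictly in order.
float scalarSum(const std::vector<float> &A) {
  float Acc = 0.0f;
  for (float X : A)
    Acc += X;
  return Acc;
}

// Ordered reduction of one vector: lanes are added left to right onto the
// incoming scalar, as reduce.fadd does without reassociation flags.
float orderedReduce(float Start, const float *Lanes, std::size_t VF) {
  for (std::size_t L = 0; L < VF; ++L)
    Start += Lanes[L];
  return Start;
}

// Model of the CHECK-ORDERED loop: a scalar phi carried across iterations.
// Assumes A.size() is a multiple of VF (no remainder loop is modelled).
float orderedVectorSum(const std::vector<float> &A, std::size_t VF) {
  float Phi = 0.0f;
  for (std::size_t I = 0; I < A.size(); I += VF)
    Phi = orderedReduce(Phi, &A[I], VF);
  return Phi;
}

// Model of an unordered reduction: one partial sum per lane, combined once
// after the loop. The same inputs are associated differently.
float unorderedVectorSum(const std::vector<float> &A, std::size_t VF) {
  std::vector<float> Partial(VF, 0.0f);
  for (std::size_t I = 0; I < A.size(); ++I)
    Partial[I % VF] += A[I];
  return orderedReduce(0.0f, Partial.data(), VF);
}
```

With the input `{1e8, 1, 1, 1, 1, 1, 1, 1, -1e8, 1, 1, 1, 1, 1, 1, 1}`, the scalar loop and the ordered model both produce 7 (the first seven ones are absorbed by 1e8), while the unordered model produces 14. This is why ordered reductions are needed whenever reassociation is not permitted.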
