This is an archive of the discontinued LLVM Phabricator instance.


Differential D95245

[SVE] Add support for scalable vectorization of loops with int/fast FP reductions
ClosedPublic

Authored by kmclaughlin on Jan 22 2021, 9:28 AM.

Details

Reviewers

sdesmalen
david-arm
greened
fhahn
frasercrmck
efriedma
dmgreen

Commits

rGba1e150d03ca: [SVE] Add support for scalable vectorization of loops with int/fast FP…

Summary

This patch enables scalable vectorization of loops with integer and fast floating-point reductions, e.g.:

unsigned sum = 0;
for (int i = 0; i < n; ++i) {
  sum += a[i];
}

A new TTI interface, isLegalToVectorizeReduction, has been added to prevent
reductions which are not supported for scalable types from vectorizing.
If the reduction is not supported for a given scalable VF,
computeFeasibleMaxVF will fall back to using fixed-width vectorization.
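
The fallback described above can be modelled in a few lines. This is a minimal sketch, not the code from the patch: `RecurKind`, `ElementCount`, `isLegalToVectorizeReduction` and `pickVF` are simplified stand-ins written for this example, and the assumption that multiplicative reductions are the ones rejected for scalable vectors is based on the `@mul` test discussed later in the review. The real vectorizer computes the fixed-width VF itself; the sketch simply reuses the minimum element count.

```cpp
#include <algorithm>
#include <vector>

// Simplified stand-ins for the LLVM types; hypothetical, for illustration only.
enum class RecurKind { Add, Mul, And, Or, Xor, FAdd, FMul };

struct ElementCount {
  unsigned MinVal; // known minimum number of elements
  bool Scalable;   // true for <vscale x MinVal x ty>
};

// Stand-in for the TTI hook: fixed-width VFs are always accepted; for
// scalable VFs this sketch assumes multiplicative reductions are rejected.
bool isLegalToVectorizeReduction(RecurKind Kind, ElementCount VF) {
  if (!VF.Scalable)
    return true;
  return Kind != RecurKind::Mul && Kind != RecurKind::FMul;
}

// Keep the requested scalable VF only if every reduction in the loop is
// legal for it; otherwise fall back to a fixed-width VF.
ElementCount pickVF(const std::vector<RecurKind> &Reductions,
                    ElementCount UserVF) {
  bool AllLegal = std::all_of(
      Reductions.begin(), Reductions.end(),
      [&](RecurKind K) { return isLegalToVectorizeReduction(K, UserVF); });
  if (UserVF.Scalable && !AllLegal)
    return {UserVF.MinVal, /*Scalable=*/false};
  return UserVF;
}
```

With these definitions, a loop containing only an add reduction keeps a requested `<vscale x 4 x ...>` VF, while a loop that also contains a mul reduction is given a fixed-width VF instead.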

Event Timeline

kmclaughlin created this revision.Jan 22 2021, 9:28 AM

Herald added a reviewer: efriedma.Jan 22 2021, 9:28 AM

Herald added subscribers: NickHung, bmahjour, psnobl and 2 others.

kmclaughlin requested review of this revision.Jan 22 2021, 9:28 AM

Herald added a project: Restricted Project.Jan 22 2021, 9:28 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B86309: Diff 318549.Jan 22 2021, 10:08 AM

cameron.mcinally added a subscriber: cameron.mcinally.Jan 22 2021, 11:07 AM

bmahjour removed a subscriber: bmahjour.Jan 22 2021, 11:09 AM

Matt added a subscriber: Matt.Jan 22 2021, 11:13 AM

timsmith78 added a subscriber: timsmith78.Jan 22 2021, 12:18 PM

Hey Kerry,
Thank you for this patch.
I found some nits and I have some suggestions about InstructionCost.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4695	So once we start to use scalable vectors and we start to use VF.getKnownMinValue(), shouldn't this be multiplied by getMaxVScale()?
4743	Same here, should we not need to multiply by getMaxVScale()?
6192	I believe we can use LoopCost.isValid(), here!
6213	Can you change SmallLoopCost to be instruction cost as LoopCost, so you don't need to use *LoopCost.getValue()? And I believe that in the std::min you will not need to use getValue
7700	nit
9470	nit
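
The question raised at lines 4695 and 4743 is whether a bound derived from the known minimum element count also has to account for vscale. A minimal sketch of that arithmetic follows; `ElementCount` and `maxLanes` are hypothetical names for this example and are not the LLVM classes. For SVE, registers are multiples of 128 bits up to 2048 bits, so the largest possible vscale is 16.

```cpp
#include <cstdint>

// Hypothetical, simplified model of an element count; not the LLVM class.
struct ElementCount {
  unsigned KnownMin; // what getKnownMinValue() refers to in the review
  bool Scalable;     // true for <vscale x KnownMin x ty>
};

// Upper bound on the number of lanes, given the largest vscale the target
// can have. For a fixed-width count the known minimum is already exact.
std::uint64_t maxLanes(ElementCount EC, unsigned MaxVScale) {
  return EC.Scalable ? std::uint64_t(EC.KnownMin) * MaxVScale : EC.KnownMin;
}
```

Under these assumptions, `<vscale x 4 x i32>` can hold up to 64 lanes on an SVE target, whereas a fixed `<4 x i32>` always holds 4.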

david-arm added inline comments.Jan 25 2021, 1:37 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1121	Just a thought - if we're excluding FMul from reductions is it worth having an assert here that the op is not fmul?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1519	It might be worth printing out the recurrence kind here. Do we also want to emit a remark here to help the user understand why it failed to vectorise?
4695	This is for vectorise of induction variables. I think we'll have to use a runtime VF that I introduced in D95139 here. I don't think Kerry has to fix this in her patch.
6192	I think since we're changing LoopCost to be InstructionCost we can change the line above too from LoopCost = *expectedCost(VF).first.getValue(); to LoopCost = expectedCost(VF).first;
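
The InstructionCost suggestions above come down to keeping the cost as an object that knows whether it is valid, rather than unwrapping it with `*LoopCost.getValue()` at each use. The following is a minimal sketch of that idea using a hypothetical `Cost` class; it is not `llvm::InstructionCost`, and the rule that an invalid cost never compares as cheaper is an assumption made for this example.

```cpp
// Hypothetical stand-in for a cost that may be invalid; not llvm::InstructionCost.
class Cost {
  int Value;
  bool Valid;
  Cost(int V, bool IsValid) : Value(V), Valid(IsValid) {}

public:
  Cost(int V) : Value(V), Valid(true) {}
  static Cost invalid() { return Cost(0, false); }

  bool isValid() const { return Valid; }

  // Assumption of this sketch: an invalid cost is never cheaper than
  // anything, and any valid cost is cheaper than an invalid one.
  friend bool operator<(const Cost &L, const Cost &R) {
    if (!L.Valid)
      return false;
    if (!R.Valid)
      return true;
    return L.Value < R.Value;
  }
};

// Caller written in the suggested style: test validity on the object and
// compare costs directly instead of extracting the raw integer first.
bool isCheaperThan(const Cost &LoopCost, const Cost &Threshold) {
  return LoopCost.isValid() && LoopCost < Threshold;
}
```

The point of the design is that callers never dereference a value that might be absent; the validity check and the comparison both live on the cost type.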

david-arm added inline comments.Jan 25 2021, 1:37 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9485	Similar to an earlier comment, a remark here would be good I think.
llvm/test/Transforms/LoopVectorize/scalable_reductions.ll
1 ↗	(On Diff #318549)	Needs a "REQUIRES: asserts" here I think because you're relying upon debug output. Also, since you're explicitly adding "-mattr=+sve" here I think you'll either have to: (1) make the test generic so it works for all targets (this test will fail on some builds due to lack of AArch64 support), or (2) move the test to LoopVectorize/AArch64.
16 ↗	(On Diff #318549)	I wonder if it's worth adding CHECK lines for the resulting IR to show we've vectorised the loop using reductions and checking we have the right structure, i.e. vector.body, middle.block, etc?

fhahn added inline comments.Jan 25 2021, 2:02 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
1309	This should probably have a comment.
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3924 ↗	(On Diff #318549)	Can you add a test for this? Also, this seems completely unrelated, can you split it off?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1271	those changes could also be submitted separately?
1513	This also needs a comment. And the name could probably be improved. Maybe `canVectorizeReductions`?
7691	This should only be checked in the code handling `UserVF` below? Also, this seems like a property that generally limits the vectorization factor to fixed-width vectorization factors and would be good to check beforehand. Would it be possible to just limit vectorization factors to fixed-width factors in `computeFeasibleMaxVF`? This way, we won't need extra checks once automatically picked VFs are supported. You also won't need any extra code in the caller of `::plan`. This is similar to how we deal with other 'legality' properties that depend on the vectorization factor, like dependencies that may limit the vectorization factor.
9483	This message seems a bit odd. I think the cost model should just be responsible for assigning a cost, not deciding whether it is possible to vectorize or not; that's the job of the legality checks. Please see my comment above, this could probably be done in `computeFeasibleMaxVF`, which technically is part of the cost model, but is the first step and applies other legality constraints as well which limit the vectorization factor.
llvm/test/Transforms/LoopVectorize/scalable_reductions.ll
9 ↗	(On Diff #318549)	Personally I don't think the C source code adds much value. The IR is very compact and it should be obvious from the IR & test name what is going on. Also, the IR that clang generates can change, clang options may change, pragmas may change and so on.
20 ↗	(On Diff #318549)	this should not be needed for the test.
23 ↗	(On Diff #318549)	this should not be needed for the test, you can just pass `%n` as `i64`.
27 ↗	(On Diff #318549)	nit: can strip `indvars` from the name to make things more compact.

Removed changes to LoopVectorizationPlanner::plan and instead check whether reductions can be vectorized in computeFeasibleMaxVF. If any reduction in the loop cannot be vectorized with a scalable VF, we fall back on fixed-width vectorization.

Changes to have VectorizationFactor use InstructionCost were not necessary to the patch after the above change and have also been removed.

Improved the tests in scalable_reductions.ll based on suggestions from @fhahn & @david-arm

Thanks for reviewing this patch, all!

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3924 ↗	(On Diff #318549)	I've removed this from the patch, I don't think it's required for the tests here.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
6213	Hi @CarolineConcatto, thanks for your suggestions on InstructionCost! I didn't change the SmallLoopCost flag to be an instruction cost in the last revision as this caused tests which use -small-loop-cost to fail (e.g. LoopVectorize/unroll_novec.ll)
7691	Thanks for this suggestion, @fhahn. I've moved the canVectorizeReductions check to `computeFeasibleMaxVF` & updated the affected test in scalable_reductions.ll, where we can use fixed-width vectorization instead (`@mul`)
llvm/test/Transforms/LoopVectorize/scalable_reductions.ll
1 ↗	(On Diff #318549)	Added `REQUIRES: asserts` & moved the test to `Transforms/LoopVectorize/AArch64`

david-arm added inline comments.Jan 27 2021, 1:36 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1513	nit: Perhaps use `///` here instead of '//' in line with other function comments?
5706	I wonder if it's worth bailing out even earlier, i.e. in the same place as above where you check initially? I think the main benefit to bailing out here is if you can reduce the VF to something smaller so that it becomes legal. However, I think for reductions changing the VF won't make a difference in practice.
5715	nit: Perhaps use "operations" here instead of types? I'm thinking that the user probably isn't aware of the RecurrenceKind so type might not make as much sense?
llvm/test/Transforms/LoopVectorize/AArch64/scalable_reductions.ll
2 ↗	(On Diff #319303)	I think you can reduce the number of RUN lines here by piping stderr for the first RUN line to a temporary file, e.g. something like ; RUN: opt < %s -loop-vectorize -transform-warning -mtriple aarch64-unknown-linux-gnu -mattr=+sve -debug-only=loop-vectorize -S 2>%t | FileCheck %s -check-prefix=CHECK ; RUN: cat %t | FileCheck %s -check-prefix=CHECK-DEBUG
4 ↗	(On Diff #319303)	Is it worth changing this to check for the new remark instead? You can use something like this: ; RUN: opt < %s -loop-vectorize -pass-remarks='loop-vectorize' -disable-output -mtriple aarch64-unknown-linux-gnu -mattr=+sve -S 2>&1 | ...
223 ↗	(On Diff #319303)	I'm a bit surprised this vectorises to be honest, since there is no 'fast' flag here! Perhaps for IEEE math you have to add specific attributes to the function?

Moved the canVectorizeReductions check to earlier in computeFeasibleMaxVF
Updated the RUN lines in scalable_reductions.ll
Removed duplicate test for FAdd

kmclaughlin marked an inline comment as not done.Feb 1 2021, 10:11 AM

kmclaughlin added inline comments.

llvm/test/Transforms/LoopVectorize/AArch64/scalable_reductions.ll
223 ↗	(On Diff #319303)	I think what happened here is that the hints used to enable vectorization have allowed reordering, similar to using -Ofast. I found this comment at the top of allowReordering() in LoopVectorizationLegality: // When enabling loop hints are provided we allow the vectorizer to change // the order of operations that is given by the scalar loop. This is not // enabled by default because can be unsafe or inefficient. For example, // reordering floating-point operations will change the way round-off // error accumulates in the loop. This behaviour was queried on the mailing list last year: https://lists.llvm.org/pipermail/llvm-dev/2020-June/142697.html
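
The round-off concern in the quoted comment can be shown with a small self-contained example. This is only an illustrative sketch: `orderedSum` and `twoLaneSum` are names invented here, and the two-accumulator loop is a simplified model of how a two-lane vectorized reduction reorders the additions, not the code the vectorizer emits.

```cpp
#include <cstddef>
#include <vector>

// Sum in source order, as the scalar loop would.
float orderedSum(const std::vector<float> &A) {
  float Sum = 0.0f;
  for (float X : A)
    Sum += X;
  return Sum;
}

// Sum with two interleaved partial accumulators that are combined at the
// end; this changes the order in which the additions are performed.
float twoLaneSum(const std::vector<float> &A) {
  float Lane[2] = {0.0f, 0.0f};
  for (std::size_t I = 0; I < A.size(); ++I)
    Lane[I % 2] += A[I];
  return Lane[0] + Lane[1];
}
```

For the input `{1e20f, 1.0f, -1e20f, 1.0f}`, the ordered sum loses the first `1.0f` when it is added to `1e20f` and ends at 1, while the two-lane sum cancels the large values in one lane and ends at 2. This is why reordering is only allowed for `fast` operations or when hints explicitly permit it.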

dcaballe added a subscriber: dcaballe.Feb 1 2021, 10:17 AM

dmgreen added a subscriber: dmgreen.Feb 1 2021, 10:26 AM

dmgreen added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
1310	Does this need to check the type? Does an i128 reduction work, for example? I presume if a <vscale x 4 x float> reduction works then any <vscale x ? x float> will work?

Thanks for making the changes @kmclaughlin! Just a couple more comments ...

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5659–5660	I think this looks much better now you're just checking reductions only once and early on - thanks for this! However, I think you might need to move this check down to line 5677 where we return UserVF. So the reason I think this is because if we have a loop that contains memory dependences and reductions in the same loop we want to ensure we always do the reduction checks regardless. For example, Legal->isSafeForAnyVectorWidth() could return false and then in the code below we may successfully reduce the UserVF from <vscale x 8 x float> to <vscale x 4 x float> without ever calling canVectorizeReductions.
llvm/test/Transforms/LoopVectorize/AArch64/scalable_reductions.ll
1 ↗	(On Diff #320512)	Thanks for the RUN line changes here, it looks a lot neater now! If it's not too difficult I think it would be great if you could test the remark here too, since this is user-facing rather than debug. If you want you can even test the remark instead of the debug output; this would also mean you can remove the "REQUIRES: asserts" line above too.

sdesmalen added inline comments.Feb 2 2021, 4:16 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1094	nit: bail out early to reduce indentation. if (!Scalable) return true;
1111	nit: can be removed if you add the early bail out.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1516	nit: use `return llvm::all_of(....)` with lambda, instead of loop?
1519	Is it worth just passing the whole Recurrence descriptor and the whole of VF? When passing the whole Recurrence descriptor, in the future the function can also determine whether it can vectorize an ordered reduction (e.g. ordered fadd) in the loop body using some instruction.

fhahn added inline comments.Feb 2 2021, 4:39 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5659–5660	please also add a test for this scenario.

Moved the Legal->isSafeForAnyVectorWidth() check in computeFeasibleMaxVF further down so that we always check the reductions even if the loop contains memory dependencies. Added a test for this scenario to scalable_reductions.ll.

Changed isLegalToVectorizeReduction so that the whole RecurrenceDescriptor and VF are passed in, and added a check of the recurrence type.

Replaced the loop in canVectorizeReductions with lambda

Removed REQUIRES: asserts from the test file and added -pass-remarks-analysis/missed flags to the RUN line
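
The ordering issue addressed in this update (reductions must be checked even when memory dependences later reduce the VF) can be sketched as follows. This is a simplified model under stated assumptions, not the code in `computeFeasibleMaxVF`: `LoopFacts` and `feasibleUserVF` are hypothetical, and clamping is applied to the known minimum element count only.

```cpp
struct ElementCount {
  unsigned MinVal; // known minimum number of elements
  bool Scalable;   // true for <vscale x MinVal x ty>
};

// Hypothetical summary of what legality analysis knows about one loop.
struct LoopFacts {
  bool ReductionsLegalForScalable; // outcome of a canVectorizeReductions-style check
  bool SafeForAnyVectorWidth;      // false when memory dependences bound the VF
  unsigned MaxSafeElements;        // bound imposed by those dependences
};

ElementCount feasibleUserVF(const LoopFacts &L, ElementCount UserVF) {
  // 1. Reductions are checked first, independently of memory dependences.
  if (UserVF.Scalable && !L.ReductionsLegalForScalable)
    UserVF = {UserVF.MinVal, /*Scalable=*/false};

  // 2. Only afterwards may the "safe for any width" shortcut return early.
  if (L.SafeForAnyVectorWidth)
    return UserVF;

  // 3. Otherwise clamp the VF to what the dependences allow.
  if (UserVF.MinVal > L.MaxSafeElements)
    UserVF.MinVal = L.MaxSafeElements;
  return UserVF;
}
```

With the reduction check placed first, a loop that has both an unsupported reduction and a memory dependence can no longer leave this function with a clamped scalable VF such as `<vscale x 4 x float>`.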

llvm/include/llvm/Analysis/TargetTransformInfo.h
1310	Hi @dmgreen, thanks for taking a look at this! I've added a check of the recurrence type to isLegalToVectorizeReduction. I think any <vscale x ? x float> reduction will work, I added some tests for legalization of vector reductions as part of D93050.

LGTM! Thanks for making all changes. Perhaps wait a while before merging in case others want a look?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1516	nit: I think you can remove the '(' and ')' surrounding the llvm::all_of call here.

LGTM. Forgot to click "Accept Revision" before. Doh!

This revision is now accepted and ready to land.Feb 3 2021, 8:40 AM

dmgreen added inline comments.Feb 4 2021, 12:42 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1098	Thanks. This looks like it should work for most current types. Are bfloats always supported? It may be better to be more specific in case other smaller-than-64bit float types are added in the future.

david-arm added inline comments.Feb 4 2021, 1:00 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1098	Is this needed though? If bfloats are in the scalar IR it means that the user has explicitly written code using the SVE ACLE so I'd imagine that all bets are off anyway if they didn't build with bf16 support. I'd also imagine that these would be flagged up as illegal types earlier on in the vectoriser too I think?

dmgreen added inline comments.Feb 4 2021, 3:46 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1098	Hmm. I guess I don't see the advantage of getting it wrong. Clang isn't the only frontend and the vectorizer needs to take any valid input and not crash or produce code that will later crash. Being specific about which types are supported seems like a better idea to me than hoping it works and hoping that won't change in the future.

david-arm added inline comments.Feb 4 2021, 3:58 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1098	No that's a fair point and happy for @kmclaughlin to add the check. However, we can't test such a scenario even with hand written IR because the vectoriser crashes without bfloat support: LLVM ERROR: Cannot legalize this vector #8 0x0000ffff959efad8 llvm::TargetLoweringBase::getTypeConversion(llvm::LLVMContext&, llvm::EVT) const (.localalias) (/home/davshe01/upstream/llvm-project/build2/bin/../lib/libLLVMSupport.so.13git+0xcfad8) #9 0x0000ffff959efbd8 llvm::TargetLoweringBase::getTypeLegalizationCost(llvm::DataLayout const&, llvm::Type*) const (/home/davshe01/upstream/llvm-project/build2/bin/../lib/libLLVMSupport.so.13git+0xcfbd8)

Added a function called isLegalScalarTypeForSVE which checks that the reduction type is supported & added a new test which uses bfloat to scalable-reductions.ll

Nice one. Thanks for the change. LGTM

sdesmalen added inline comments.Feb 4 2021, 2:09 PM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1121	The same should hold for integer Mul. nit: you can better add that to the switch statement below as: case Instruction::Mul: case Instruction::FMul: assert(!isa<ScalableVectorType>(Ty) && "Unexpected ..."); LLVM_FALLTHROUGH; case Instruction::FAdd: ...
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
189	Can you merge this function with `isLegalScalarTypeForSVEMaskedMemOp` and name it `isLegalElementTypeForSVE`? I think their implementation should be the same (including your check here for `hasBF16`)

sdesmalen added inline comments.Feb 5 2021, 1:54 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h
190–191	I forgot to mention that there are no reduction instructions for bfloat, so you'll need to catch out that specific case in `isLegalToVectorizeReduction`

Merged isLegalScalarTypeForSVEMaskedMemOp & isLegalScalarTypeForSVE
Return false from isLegalToVectorizeReduction for bfloat types
Included isa<ScalableVectorType>(Ty) in the switch statement conditions of useReductionIntrinsic
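
The type checks described in this update can be summarised with a small model. It is a sketch under assumptions: `ElemTy` is a hypothetical tag type standing in for `llvm::Type` queries, `HasBF16` stands in for the subtarget's `hasBF16()`, and `isLegalReductionElementType` is a name invented here for the type part of `isLegalToVectorizeReduction`.

```cpp
// Hypothetical element-type tags; the real code queries llvm::Type.
enum class ElemTy { I8, I16, I32, I64, I128, Half, BFloat, Float, Double, Pointer };

// Mirrors the merged helper: pointers, half/float/double and 8 to 64-bit
// integers are legal; bfloat is legal only when the BF16 feature is present.
bool isLegalElementTypeForSVE(ElemTy Ty, bool HasBF16) {
  switch (Ty) {
  case ElemTy::Pointer:
  case ElemTy::Half:
  case ElemTy::Float:
  case ElemTy::Double:
  case ElemTy::I8:
  case ElemTy::I16:
  case ElemTy::I32:
  case ElemTy::I64:
    return true;
  case ElemTy::BFloat:
    return HasBF16;
  case ElemTy::I128:
    return false;
  }
  return false;
}

// Reductions are stricter: there are no reduction instructions for bfloat,
// so it is rejected even when it is otherwise a legal element type.
bool isLegalReductionElementType(ElemTy Ty, bool HasBF16) {
  if (Ty == ElemTy::BFloat)
    return false;
  return isLegalElementTypeForSVE(Ty, HasBF16);
}
```

Keeping the bfloat exclusion in the reduction query rather than in the shared helper means masked loads and stores of bfloat stay legal when BF16 is available.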

Thanks for the changes. I only have some more comments about the tests now.

llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
101	nit: remove `dso_local` here and in other definitions.
340	This CHECK-DEBUG (with its own RUN line) is not checking which function is not vectorizing, it could just as well be emitted for one of the other functions. I'd suggest explicitly adding checks for `@mul` and adding a CHECK-DEBUG line for the other tests as well.
376	Same as above. Can you also add a comment saying why you're testing a `memory_dependence` issue in a test file called `scalable-reductions.ll` ?
424	These two fmin/fmax tests are not very useful, because the loop doesn't fail to vectorize because of code added in this patch.
470	nit: use `nnan` directly in the fp operation instead of an attribute.

david-arm added inline comments.Feb 8 2021, 12:38 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1121	Hi @sdesmalen, just for information the reason I'd asked for an assert here is that if we're still intending to create a target reduction intrinsic at this point with a mul or fmul then something has gone badly wrong and is almost certainly a bug. This is because this function is only ever called at the point where you've already decided that it's legal to reduce a scalable mul operation. The two places where this is called are from SLPVectorizer.cpp:createSimpleTargetReduction and InnerLoopVectorizer::fixReduction (via createTargetReduction).

david-arm added inline comments.Feb 8 2021, 1:14 AM

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp
1121	Sorry, please ignore my comment! For some reason I hadn't seen the assert in there.

kmclaughlin mentioned this in D96350: [SVE][LoopVectorize] Enable vectorization of fmin/fmax with nnan.Feb 9 2021, 9:09 AM

Changes to the tests in scalable-reductions.ll:

Removed dso_local from definitions
Added a comment on the purpose of the memory_dependence test
Added CHECK-REMARK lines for each test in the file
Removed the unnecessary fmin/fmax tests where we can't vectorize

llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
470	Hi @sdesmalen, these tests for fmin/fmax fail without the `no-nans-fp-math` attribute, I think because `RecurrenceDescriptor::isRecurrenceInstr` is just checking for the function attribute and not the flags on the instruction. I've created a separate patch (D96350) to try and address this.

Rebased changes

LGTM! Latest version looks good and I think you've addressed @sdesmalen's comments. Thanks!

llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
377	nit: Perhaps you could make it clear you're testing the ordering, i.e. with something like: This test was added to ensure we always check the legality of reductions (and emit a warning if necessary) before checking for memory dependencies

LV changes LGTM, thanks for the updates!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1514	nit: `.` at end of sentence.
1516	nit: `llvm::` should not be required
5665	I think you should be able to use `reportVectorizationFailure` to print to `dbgs()` and generate a remark with the same message
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
21	nit: those checks should not be needed.

Closed by commit rGba1e150d03ca: [SVE] Add support for scalable vectorization of loops with int/fast FP… (authored by kmclaughlin).Feb 16 2021, 5:50 AM

This revision was automatically updated to reflect the committed changes.

kmclaughlin marked 5 inline comments as done.

kmclaughlin added a commit: rGba1e150d03ca: [SVE] Add support for scalable vectorization of loops with int/fast FP….

Thanks all for reviewing these changes!

sdesmalen mentioned this in D96021: [LoopVectorize] NFC: Move UserVF feasibility checks to separate function..Feb 16 2021, 6:39 AM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	11 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	5 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	5 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	15 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp	46 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	33 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll	476 lines

Diff 321749

llvm/include/llvm/Analysis/TargetTransformInfo.h

Show All 15 Lines
/// This file defines #2, which is the interface that IR-level transformations		/// This file defines #2, which is the interface that IR-level transformations
/// use for querying the codegen.		/// use for querying the codegen.
///		///
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_ANALYSIS_TARGETTRANSFORMINFO_H		#ifndef LLVM_ANALYSIS_TARGETTRANSFORMINFO_H
#define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H		#define LLVM_ANALYSIS_TARGETTRANSFORMINFO_H

		#include "llvm/Analysis/IVDescriptors.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/PassManager.h"		#include "llvm/IR/PassManager.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/AtomicOrdering.h"		#include "llvm/Support/AtomicOrdering.h"
#include "llvm/Support/DataTypes.h"		#include "llvm/Support/DataTypes.h"
#include "llvm/Support/InstructionCost.h"		#include "llvm/Support/InstructionCost.h"
#include <functional>		#include <functional>
▲ Show 20 Lines • Show All 1,268 Lines • ▼ Show 20 Lines
/// \returns True if it is legal to vectorize the given load chain.		/// \returns True if it is legal to vectorize the given load chain.
bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes, Align Alignment,		bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes, Align Alignment,
unsigned AddrSpace) const;		unsigned AddrSpace) const;

/// \returns True if it is legal to vectorize the given store chain.		/// \returns True if it is legal to vectorize the given store chain.
bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment,		bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment,
unsigned AddrSpace) const;		unsigned AddrSpace) const;

		/// \returns True if it is legal to vectorize the given reduction kind.
		bool isLegalToVectorizeReduction(RecurrenceDescriptor RdxDesc,
		ElementCount VF) const;

/// \returns The new vector factor value if the target doesn't support \p		/// \returns The new vector factor value if the target doesn't support \p
/// SizeInBytes loads or has a better vector factor.		/// SizeInBytes loads or has a better vector factor.
unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,		unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const;		VectorType *VecTy) const;

/// \returns The new vector factor value if the target doesn't support \p		/// \returns The new vector factor value if the target doesn't support \p
/// SizeInBytes stores or has a better vector factor.		/// SizeInBytes stores or has a better vector factor.
▲ Show 20 Lines • Show All 323 Lines • ▼ Show 20 Lines	public:
virtual bool isLegalToVectorizeLoad(LoadInst *LI) const = 0;		virtual bool isLegalToVectorizeLoad(LoadInst *LI) const = 0;
virtual bool isLegalToVectorizeStore(StoreInst *SI) const = 0;		virtual bool isLegalToVectorizeStore(StoreInst *SI) const = 0;
virtual bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes,		virtual bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes,
Align Alignment,		Align Alignment,
unsigned AddrSpace) const = 0;		unsigned AddrSpace) const = 0;
virtual bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes,		virtual bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes,
Align Alignment,		Align Alignment,
unsigned AddrSpace) const = 0;		unsigned AddrSpace) const = 0;
		virtual bool isLegalToVectorizeReduction(RecurrenceDescriptor RdxDesc,
		ElementCount VF) const = 0;
virtual unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,		virtual unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const = 0;		VectorType *VecTy) const = 0;
virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const = 0;		VectorType *VecTy) const = 0;
virtual bool useReductionIntrinsic(unsigned Opcode, Type *Ty,		virtual bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
ReductionFlags) const = 0;		ReductionFlags) const = 0;
▲ Show 20 Lines • Show All 510 Lines • ▼ Show 20 Lines	bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes, Align Alignment,
return Impl.isLegalToVectorizeLoadChain(ChainSizeInBytes, Alignment,		return Impl.isLegalToVectorizeLoadChain(ChainSizeInBytes, Alignment,
AddrSpace);		AddrSpace);
}		}
bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment,		bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment,
unsigned AddrSpace) const override {		unsigned AddrSpace) const override {
return Impl.isLegalToVectorizeStoreChain(ChainSizeInBytes, Alignment,		return Impl.isLegalToVectorizeStoreChain(ChainSizeInBytes, Alignment,
AddrSpace);		AddrSpace);
}		}
		bool isLegalToVectorizeReduction(RecurrenceDescriptor RdxDesc,
		ElementCount VF) const override {
		return Impl.isLegalToVectorizeReduction(RdxDesc, VF);
		}
unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,		unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const override {		VectorType *VecTy) const override {
return Impl.getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);		return Impl.getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);
}		}
unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const override {		VectorType *VecTy) const override {
▲ Show 20 Lines • Show All 135 Lines • Show Last 20 Lines

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

Show First 20 Lines • Show All 680 Lines • ▼ Show 20 Lines	bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes, Align Alignment,
return true;		return true;
}		}

bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment,		bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment,
unsigned AddrSpace) const {		unsigned AddrSpace) const {
return true;		return true;
}		}

		bool isLegalToVectorizeReduction(RecurrenceDescriptor RdxDesc,
		ElementCount VF) const {
		return true;
		}

unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,		unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const {		VectorType *VecTy) const {
return VF;		return VF;
}		}

unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
▲ Show 20 Lines • Show All 418 Lines • Show Last 20 Lines

llvm/lib/Analysis/TargetTransformInfo.cpp

	Show First 20 Lines • Show All 1,029 Lines • ▼ Show 20 Lines
	}			}

	bool TargetTransformInfo::isLegalToVectorizeStoreChain(			bool TargetTransformInfo::isLegalToVectorizeStoreChain(
	unsigned ChainSizeInBytes, Align Alignment, unsigned AddrSpace) const {			unsigned ChainSizeInBytes, Align Alignment, unsigned AddrSpace) const {
	return TTIImpl->isLegalToVectorizeStoreChain(ChainSizeInBytes, Alignment,			return TTIImpl->isLegalToVectorizeStoreChain(ChainSizeInBytes, Alignment,
	AddrSpace);			AddrSpace);
	}			}

				bool TargetTransformInfo::isLegalToVectorizeReduction(
				RecurrenceDescriptor RdxDesc, ElementCount VF) const {
				return TTIImpl->isLegalToVectorizeReduction(RdxDesc, VF);
				}

	unsigned TargetTransformInfo::getLoadVectorFactor(unsigned VF,			unsigned TargetTransformInfo::getLoadVectorFactor(unsigned VF,
	unsigned LoadSize,			unsigned LoadSize,
	unsigned ChainSizeInBytes,			unsigned ChainSizeInBytes,
	VectorType *VecTy) const {			VectorType *VecTy) const {
	return TTIImpl->getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);			return TTIImpl->getLoadVectorFactor(VF, LoadSize, ChainSizeInBytes, VecTy);
	}			}

	unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,			unsigned TargetTransformInfo::getStoreVectorFactor(unsigned VF,
	▲ Show 20 Lines • Show All 428 Lines • Show Last 20 Lines
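
The three hunks above (public wrapper, `Concept`/`Model` pair, and default implementation) follow one forwarding pattern. The sketch below reproduces that pattern in isolation so the layering is easier to see. Everything in it is hypothetical and heavily simplified: `Query`, `DefaultImpl`, `SveLikeImpl` and `TTISketch` are names made up for this example, and the rule used by `SveLikeImpl` is an assumption for illustration.

```cpp
#include <memory>
#include <utility>

// Hypothetical query type standing in for (RecurrenceDescriptor, ElementCount).
struct Query {
  bool Scalable;
  bool IsMul;
};

// Default implementation: every reduction is considered legal.
struct DefaultImpl {
  bool isLegalToVectorizeReduction(const Query &) const { return true; }
};

// A target that overrides the default; the rule here is assumed.
struct SveLikeImpl {
  bool isLegalToVectorizeReduction(const Query &Q) const {
    return !(Q.Scalable && Q.IsMul);
  }
};

class TTISketch {
  // Abstract interface that hides the concrete target type.
  struct Concept {
    virtual ~Concept() = default;
    virtual bool isLegalToVectorizeReduction(const Query &Q) const = 0;
  };

  // Adapter that forwards each virtual call to the wrapped implementation.
  template <typename T> struct Model final : Concept {
    T Impl;
    explicit Model(T I) : Impl(std::move(I)) {}
    bool isLegalToVectorizeReduction(const Query &Q) const override {
      return Impl.isLegalToVectorizeReduction(Q);
    }
  };

  std::unique_ptr<Concept> TTIImpl;

public:
  template <typename T>
  explicit TTISketch(T Impl) : TTIImpl(new Model<T>(std::move(Impl))) {}

  // Public entry point used by passes; forwards through the interface.
  bool isLegalToVectorizeReduction(const Query &Q) const {
    return TTIImpl->isLegalToVectorizeReduction(Q);
  }
};
```

Because the default returns true, targets that do not override the hook keep their existing behaviour, and only a target that opts in can veto a reduction.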

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

Show First 20 Lines • Show All 180 Lines • ▼ Show 20 Lines	public:
void getPeelingPreferences(Loop *L, ScalarEvolution &SE,		void getPeelingPreferences(Loop *L, ScalarEvolution &SE,
TTI::PeelingPreferences &PP);		TTI::PeelingPreferences &PP);

Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,		Value *getOrCreateResultFromMemIntrinsic(IntrinsicInst *Inst,
Type *ExpectedType);		Type *ExpectedType);

bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info);		bool getTgtMemIntrinsic(IntrinsicInst *Inst, MemIntrinsicInfo &Info);

bool isLegalScalarTypeForSVEMaskedMemOp(Type *Ty) const {		bool isLegalElementTypeForSVE(Type *Ty) const {
if (Ty->isPointerTy())		if (Ty->isPointerTy())
return true;		return true;
		sdesmalen (Done): I forgot to mention that there are no reduction instructions for bfloat, so you'll need to catch that specific case in `isLegalToVectorizeReduction`.

if (Ty->isBFloatTy() || Ty->isHalfTy() ||		if (Ty->isBFloatTy() && ST->hasBF16())
Ty->isFloatTy() || Ty->isDoubleTy())		return true;

		if (Ty->isHalfTy() || Ty->isFloatTy() || Ty->isDoubleTy())
return true;		return true;

if (Ty->isIntegerTy(8) || Ty->isIntegerTy(16) ||		if (Ty->isIntegerTy(8) || Ty->isIntegerTy(16) ||
Ty->isIntegerTy(32) || Ty->isIntegerTy(64))		Ty->isIntegerTy(32) || Ty->isIntegerTy(64))
return true;		return true;

return false;		return false;
}		}

bool isLegalMaskedLoadStore(Type *DataType, Align Alignment) {		bool isLegalMaskedLoadStore(Type *DataType, Align Alignment) {
if (isa<FixedVectorType>(DataType) || !ST->hasSVE())		if (isa<FixedVectorType>(DataType) || !ST->hasSVE())
return false;		return false;

return isLegalScalarTypeForSVEMaskedMemOp(DataType->getScalarType());		return isLegalElementTypeForSVE(DataType->getScalarType());
}		}

bool isLegalMaskedLoad(Type *DataType, Align Alignment) {		bool isLegalMaskedLoad(Type *DataType, Align Alignment) {
return isLegalMaskedLoadStore(DataType, Alignment);		return isLegalMaskedLoadStore(DataType, Alignment);
}		}

bool isLegalMaskedStore(Type *DataType, Align Alignment) {		bool isLegalMaskedStore(Type *DataType, Align Alignment) {
return isLegalMaskedLoadStore(DataType, Alignment);		return isLegalMaskedLoadStore(DataType, Alignment);
}		}

bool isLegalMaskedGatherScatter(Type *DataType) const {		bool isLegalMaskedGatherScatter(Type *DataType) const {
if (isa<FixedVectorType>(DataType) || !ST->hasSVE())		if (isa<FixedVectorType>(DataType) || !ST->hasSVE())
return false;		return false;

return isLegalScalarTypeForSVEMaskedMemOp(DataType->getScalarType());		return isLegalElementTypeForSVE(DataType->getScalarType());
}		}

bool isLegalMaskedGather(Type *DataType, Align Alignment) const {		bool isLegalMaskedGather(Type *DataType, Align Alignment) const {
return isLegalMaskedGatherScatter(DataType);		return isLegalMaskedGatherScatter(DataType);
}		}
bool isLegalMaskedScatter(Type *DataType, Align Alignment) const {		bool isLegalMaskedScatter(Type *DataType, Align Alignment) const {
return isLegalMaskedGatherScatter(DataType);		return isLegalMaskedGatherScatter(DataType);
}		}
Show All 29 Lines	public:
bool shouldExpandReduction(const IntrinsicInst *II) const { return false; }		bool shouldExpandReduction(const IntrinsicInst *II) const { return false; }

unsigned getGISelRematGlobalCost() const {		unsigned getGISelRematGlobalCost() const {
return 2;		return 2;
}		}

bool supportsScalableVectors() const { return ST->hasSVE(); }		bool supportsScalableVectors() const { return ST->hasSVE(); }

		bool isLegalToVectorizeReduction(RecurrenceDescriptor RdxDesc,
		ElementCount VF) const;

bool useReductionIntrinsic(unsigned Opcode, Type *Ty,		bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const;		TTI::ReductionFlags Flags) const;

int getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,		int getArithmeticReductionCost(unsigned Opcode, VectorType *Ty,
bool IsPairwiseForm,		bool IsPairwiseForm,
TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput);		TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput);

int getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp, int Index,		int getShuffleCost(TTI::ShuffleKind Kind, VectorType *Tp, int Index,
VectorType *SubTp);		VectorType *SubTp);
/// @}		/// @}
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_AARCH64_AARCH64TARGETTRANSFORMINFO_H		#endif // LLVM_LIB_TARGET_AARCH64_AARCH64TARGETTRANSFORMINFO_H
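
The merged element-type predicate in the right-hand column can be modelled outside LLVM as a small standalone sketch. `ElemKind` and `isLegalElementTypeForSVESketch` are hypothetical stand-ins for the `llvm::Type` queries and the real member function, and `HasBF16` stands in for `ST->hasBF16()`. This only illustrates the shape of the check under those assumptions; it is not the AArch64 backend's code.

```cpp
#include <cassert>

// Hypothetical stand-in for the llvm::Type queries used in the diff.
enum class ElemKind {
  Pointer, BFloat, Half, Float, Double,
  Int1, Int8, Int16, Int32, Int64, Int128
};

// Sketch of the new column: bfloat is accepted only when the subtarget
// reports BF16 support; the other listed types are always accepted.
bool isLegalElementTypeForSVESketch(ElemKind K, bool HasBF16) {
  switch (K) {
  case ElemKind::Pointer:
    return true;
  case ElemKind::BFloat:
    return HasBF16; // models Ty->isBFloatTy() && ST->hasBF16()
  case ElemKind::Half:
  case ElemKind::Float:
  case ElemKind::Double:
    return true;
  case ElemKind::Int8:
  case ElemKind::Int16:
  case ElemKind::Int32:
  case ElemKind::Int64:
    return true;
  default:
    return false; // any other width in this model, e.g. i1 or i128
  }
}
```

In this model, a bfloat element is rejected unless `HasBF16` is true, which is the behavioural difference between the old and new columns.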

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Show First 20 Lines • Show All 1,083 Lines • ▼ Show 20 Lines	if (const GetElementPtrInst *GEPInst = dyn_cast<GetElementPtrInst>(U)) {
AllowPromotionWithoutCommonHeader = true;		AllowPromotionWithoutCommonHeader = true;
break;		break;
}		}
}		}
}		}
return Considerable;		return Considerable;
}		}

		bool AArch64TTIImpl::isLegalToVectorizeReduction(RecurrenceDescriptor RdxDesc,
		ElementCount VF) const {
		if (!VF.isScalable())
		sdesmalen (Done): nit: bail out early to reduce indentation: if (!Scalable) return true;
		return true;

		Type *Ty = RdxDesc.getRecurrenceType();
		if (Ty->isBFloatTy() || !isLegalElementTypeForSVE(Ty))
		dmgreen (Not Done): Thanks. This looks like it should work for most current types. Are bfloats always supported? It may be better to be more specific in case other smaller-than-64bit float types are added in the future.
		david-arm (Not Done): Is this needed though? If bfloats are in the scalar IR it means that the user has explicitly written code using the SVE ACLE, so I'd imagine that all bets are off anyway if they didn't build with bf16 support. I'd also imagine that these would be flagged up as illegal types earlier on in the vectoriser too, I think?
		dmgreen (Not Done): Hmm. I guess I don't see the advantage of getting it wrong. Clang isn't the only frontend and the vectorizer needs to take any valid input and not crash or produce code that will later crash. Being specific about which types are supported seems like a better idea to me than hoping it works and hoping that won't change in the future.
		david-arm (Not Done): No, that's a fair point and I'm happy for @kmclaughlin to add the check. However, we can't test such a scenario even with hand written IR because the vectoriser crashes without bfloat support:
		LLVM ERROR: Cannot legalize this vector
		#8 0x0000ffff959efad8 llvm::TargetLoweringBase::getTypeConversion(llvm::LLVMContext&, llvm::EVT) const (.localalias) (/home/davshe01/upstream/llvm-project/build2/bin/../lib/libLLVMSupport.so.13git+0xcfad8)
		#9 0x0000ffff959efbd8 llvm::TargetLoweringBase::getTypeLegalizationCost(llvm::DataLayout const&, llvm::Type) const (/home/davshe01/upstream/llvm-project/build2/bin/../lib/libLLVMSupport.so.13git+0xcfbd8)
		return false;

		switch (RdxDesc.getRecurrenceKind()) {
		case RecurKind::Add:
		case RecurKind::FAdd:
		case RecurKind::And:
		case RecurKind::Or:
		case RecurKind::Xor:
		case RecurKind::SMin:
		case RecurKind::SMax:
		case RecurKind::UMin:
		case RecurKind::UMax:
		case RecurKind::FMin:
		sdesmalen (Done): nit: can be removed if you add the early bail out.
		case RecurKind::FMax:
		return true;
		default:
		return false;
		}
		}

bool AArch64TTIImpl::useReductionIntrinsic(unsigned Opcode, Type *Ty,		bool AArch64TTIImpl::useReductionIntrinsic(unsigned Opcode, Type *Ty,
TTI::ReductionFlags Flags) const {		TTI::ReductionFlags Flags) const {
auto *VTy = cast<VectorType>(Ty);		auto *VTy = cast<VectorType>(Ty);
		david-arm (Done): Just a thought: if we're excluding FMul from reductions, is it worth having an assert here that the op is not fmul?
		sdesmalen (Not Done): The same should hold for integer Mul. nit: you can better add that to the switch statement below as:
		  case Instruction::Mul:
		  case Instruction::FMul:
		    assert(!isa<ScalableVectorType>(Ty) && "Unexpected ...");
		    LLVM_FALLTHROUGH;
		  case Instruction::FAdd:
		  ...
		david-arm (Not Done): Hi @sdesmalen, just for information, the reason I'd asked for an assert here is that if we're still intending to create a target reduction intrinsic at this point with a mul or fmul then something has gone badly wrong and it is almost certainly a bug. This is because this function is only ever called at the point where you've already decided that it's legal to reduce a scalable mul operation. The two places where this is called are SLPVectorizer.cpp:createSimpleTargetReduction and InnerLoopVectorizer::fixReduction (via createTargetReduction).
		david-arm (Not Done): Sorry, please ignore my comment! For some reason I hadn't seen the assert in there.
unsigned ScalarBits = Ty->getScalarSizeInBits();		unsigned ScalarBits = Ty->getScalarSizeInBits();
		bool Scalable = isa<ScalableVectorType>(Ty);
switch (Opcode) {		switch (Opcode) {
case Instruction::FAdd:		case Instruction::Mul:
case Instruction::FMul:		case Instruction::FMul:
		assert(!Scalable && "Unexpected reduction opcode");
		LLVM_FALLTHROUGH;
		case Instruction::FAdd:
case Instruction::And:		case Instruction::And:
case Instruction::Or:		case Instruction::Or:
case Instruction::Xor:		case Instruction::Xor:
case Instruction::Mul:		return Scalable;
return false;
case Instruction::Add:		case Instruction::Add:
return ScalarBits * cast<FixedVectorType>(VTy)->getNumElements() >= 128;		return Scalable ||
case Instruction::ICmp:
return (ScalarBits < 64) &&
(ScalarBits * cast<FixedVectorType>(VTy)->getNumElements() >= 128);		(ScalarBits * cast<FixedVectorType>(VTy)->getNumElements() >= 128);
		case Instruction::ICmp:
		return Scalable ||
		((ScalarBits < 64) &&
		(ScalarBits * cast<FixedVectorType>(VTy)->getNumElements() >= 128));
case Instruction::FCmp:		case Instruction::FCmp:
return Flags.NoNaN;		return Scalable || Flags.NoNaN;
default:		default:
llvm_unreachable("Unhandled reduction opcode");		llvm_unreachable("Unhandled reduction opcode");
}		}
return false;		return false;
}		}

int AArch64TTIImpl::getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,		int AArch64TTIImpl::getMinMaxReductionCost(VectorType *Ty, VectorType *CondTy,
bool IsPairwise, bool IsUnsigned,		bool IsPairwise, bool IsUnsigned,
▲ Show 20 Lines • Show All 150 Lines • Show Last 20 Lines
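
The legality check added in `AArch64TTIImpl::isLegalToVectorizeReduction` can be summarised as a standalone sketch. `RecurKindSketch`, `RdxTypeSketch` and the function name below are hypothetical stand-ins for `RecurKind`, the recurrence type queries and the real member function; the sketch assumes the caller has already computed whether the element type is bfloat and whether it is a legal SVE element type.

```cpp
#include <cassert>

// Hypothetical mirror of the recurrence kinds discussed in this review.
enum class RecurKindSketch {
  Add, Mul, FAdd, FMul, And, Or, Xor, SMin, SMax, UMin, UMax, FMin, FMax
};

// Hypothetical summary of what the check needs to know about the type.
struct RdxTypeSketch {
  bool IsBFloat;          // models Ty->isBFloatTy()
  bool IsLegalSVEElement; // models isLegalElementTypeForSVE(Ty)
};

bool isLegalToVectorizeReductionSketch(RecurKindSketch Kind, RdxTypeSketch Ty,
                                       bool ScalableVF) {
  // Fixed-width factors are not restricted by this hook.
  if (!ScalableVF)
    return true;

  // Per the review, there are no reduction instructions for bfloat.
  if (Ty.IsBFloat || !Ty.IsLegalSVEElement)
    return false;

  switch (Kind) {
  case RecurKindSketch::Add:
  case RecurKindSketch::FAdd:
  case RecurKindSketch::And:
  case RecurKindSketch::Or:
  case RecurKindSketch::Xor:
  case RecurKindSketch::SMin:
  case RecurKindSketch::SMax:
  case RecurKindSketch::UMin:
  case RecurKindSketch::UMax:
  case RecurKindSketch::FMin:
  case RecurKindSketch::FMax:
    return true;
  default:
    return false; // Mul and FMul are rejected for scalable VFs
  }
}
```

Under these assumptions a `Mul` reduction is accepted for a fixed-width VF but rejected for a scalable one, which is what triggers the fixed-width fallback discussed in the LoopVectorize.cpp comments.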

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


Show First 20 Lines • Show All 1,262 Lines • ▼ Show 20 Lines	public:
/// that needs to be vectorized. We ignore values that remain scalar such as		/// that needs to be vectorized. We ignore values that remain scalar such as
/// 64 bit loop indices.		/// 64 bit loop indices.
std::pair<unsigned, unsigned> getSmallestAndWidestTypes();		std::pair<unsigned, unsigned> getSmallestAndWidestTypes();

/// \return The desired interleave count.		/// \return The desired interleave count.
/// If interleave count has been specified by metadata it will be returned.		/// If interleave count has been specified by metadata it will be returned.
/// Otherwise, the interleave count is computed and returned. VF and LoopCost		/// Otherwise, the interleave count is computed and returned. VF and LoopCost
/// are the selected vectorization factor and the cost of the selected VF.		/// are the selected vectorization factor and the cost of the selected VF.
unsigned selectInterleaveCount(ElementCount VF, unsigned LoopCost);		unsigned selectInterleaveCount(ElementCount VF, unsigned LoopCost);
		fhahn (Done): Those changes could also be submitted separately?

/// Memory access instruction may be vectorized in more than one way.		/// Memory access instruction may be vectorized in more than one way.
/// Form of instruction after vectorization depends on cost.		/// Form of instruction after vectorization depends on cost.
/// This function takes cost-based decisions for Load/Store instructions		/// This function takes cost-based decisions for Load/Store instructions
/// and collects them in a map. This decisions map is used for building		/// and collects them in a map. This decisions map is used for building
/// the lists of loop-uniform and loop-scalar instructions.		/// the lists of loop-uniform and loop-scalar instructions.
/// The calculated cost is saved with widening decision in order to		/// The calculated cost is saved with widening decision in order to
/// avoid redundant calculations.		/// avoid redundant calculations.
▲ Show 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	bool isLegalGatherOrScatter(Value *V) {
if (!LI && !SI)		if (!LI && !SI)
return false;		return false;
auto *Ty = getMemInstValueType(V);		auto *Ty = getMemInstValueType(V);
Align Align = getLoadStoreAlignment(V);		Align Align = getLoadStoreAlignment(V);
return (LI && isLegalMaskedGather(Ty, Align)) ||		return (LI && isLegalMaskedGather(Ty, Align)) ||
(SI && isLegalMaskedScatter(Ty, Align));		(SI && isLegalMaskedScatter(Ty, Align));
}		}

		/// Returns true if the target machine supports all of the reduction
		fhahn (Done): This also needs a comment. And the name could probably be improved. Maybe `canVectorizeReductions`?
		david-arm (Done): nit: Perhaps use `///` here instead of `//`, in line with other function comments?
		/// variables found for the given VF
		fhahn (Done): nit: `.` at end of sentence.
		bool canVectorizeReductions(ElementCount VF) {
		return (llvm::all_of(Legal->getReductionVars(), [&](auto &Reduction) -> bool {
		sdesmalen (Done): nit: use `return llvm::all_of(....)` with lambda, instead of loop?
		david-arm (Not Done): nit: I think you can remove the '(' and ')' surrounding the llvm::all_of call here.
		fhahn (Done): nit: `llvm::` should not be required.
		RecurrenceDescriptor RdxDesc = Reduction.second;
		return TTI.isLegalToVectorizeReduction(RdxDesc, VF);
		}));
		david-arm (Not Done): It might be worth printing out the recurrence kind here. Do we also want to emit a remark here to help the user understand why it failed to vectorise?
		sdesmalen (Done): Is it worth just passing the whole Recurrence descriptor and the whole of VF? When passing the whole Recurrence descriptor, in the future the function can also determine whether it can vectorize an ordered reduction (e.g. ordered fadd) in the loop body using some instruction.
		}

/// Returns true if \p I is an instruction that will be scalarized with		/// Returns true if \p I is an instruction that will be scalarized with
/// predication. Such instructions include conditional stores and		/// predication. Such instructions include conditional stores and
/// instructions that may divide by zero.		/// instructions that may divide by zero.
/// If a non-zero VF has been calculated, we check if I will be scalarized		/// If a non-zero VF has been calculated, we check if I will be scalarized
/// predication for that VF.		/// predication for that VF.
bool isScalarWithPredication(Instruction *I,		bool isScalarWithPredication(Instruction *I,
ElementCount VF = ElementCount::getFixed(1));		ElementCount VF = ElementCount::getFixed(1));

▲ Show 20 Lines • Show All 3,061 Lines • ▼ Show 20 Lines	if (VF.isVector() && IsPtrLoopInvariant && IsIndexLoopInvariant.all()) {
}		}
}		}
}		}

void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN,		void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN,
RecurrenceDescriptor *RdxDesc,		RecurrenceDescriptor *RdxDesc,
Value *StartV, unsigned UF,		Value *StartV, unsigned UF,
ElementCount VF) {		ElementCount VF) {
assert(!VF.isScalable() && "scalable vectors not yet supported.");
PHINode *P = cast<PHINode>(PN);		PHINode *P = cast<PHINode>(PN);
if (EnableVPlanNativePath) {		if (EnableVPlanNativePath) {
// Currently we enter here in the VPlan-native path for non-induction		// Currently we enter here in the VPlan-native path for non-induction
// PHIs where all control flow is uniform. We simply widen these PHIs.		// PHIs where all control flow is uniform. We simply widen these PHIs.
// Create a vector phi with no operands - the vector phi operands will be		// Create a vector phi with no operands - the vector phi operands will be
// set at the end of vector code generation.		// set at the end of vector code generation.
Type *VecTy =		Type *VecTy =
(VF.isScalar()) ? PN->getType() : VectorType::get(PN->getType(), VF);		(VF.isScalar()) ? PN->getType() : VectorType::get(PN->getType(), VF);
▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	case InductionDescriptor::IK_PtrInduction: {

if (Cost->isScalarAfterVectorization(P, VF)) {		if (Cost->isScalarAfterVectorization(P, VF)) {
// This is the normalized GEP that starts counting at zero.		// This is the normalized GEP that starts counting at zero.
Value *PtrInd =		Value *PtrInd =
Builder.CreateSExtOrTrunc(Induction, II.getStep()->getType());		Builder.CreateSExtOrTrunc(Induction, II.getStep()->getType());
// Determine the number of scalars we need to generate for each unroll		// Determine the number of scalars we need to generate for each unroll
// iteration. If the instruction is uniform, we only need to generate the		// iteration. If the instruction is uniform, we only need to generate the
// first lane. Otherwise, we generate all VF values.		// first lane. Otherwise, we generate all VF values.
unsigned Lanes =		unsigned Lanes =
CarolineConcatto (Not Done): So once we start to use scalable vectors and we start to use VF.getKnownMinValue(), shouldn't this be multiplied by getMaxVScale()?
david-arm (Not Done): This is for vectorisation of induction variables. I think we'll have to use a runtime VF that I introduced in D95139 here. I don't think Kerry has to fix this in her patch.
Cost->isUniformAfterVectorization(P, VF) ? 1 : VF.getKnownMinValue();		Cost->isUniformAfterVectorization(P, VF) ? 1 : VF.getKnownMinValue();
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
for (unsigned Lane = 0; Lane < Lanes; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
Constant *Idx = ConstantInt::get(PtrInd->getType(),		Constant *Idx = ConstantInt::get(PtrInd->getType(),
Lane + Part * VF.getKnownMinValue());		Lane + Part * VF.getKnownMinValue());
Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);		Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);
Value *SclrGep =		Value *SclrGep =
emitTransformedIndex(Builder, GlobalIdx, PSE.getSE(), DL, II);		emitTransformedIndex(Builder, GlobalIdx, PSE.getSE(), DL, II);
Show All 31 Lines	case InductionDescriptor::IK_PtrInduction: {
NewPointerPhi->addIncoming(InductionGEP, LoopLatch);		NewPointerPhi->addIncoming(InductionGEP, LoopLatch);

// Create UF many actual address geps that use the pointer		// Create UF many actual address geps that use the pointer
// phi as base and a vectorized version of the step value		// phi as base and a vectorized version of the step value
// (<step0, ..., stepN>) as offset.		// (<step0, ..., stepN>) as offset.
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
SmallVector<Constant *, 8> Indices;		SmallVector<Constant *, 8> Indices;
// Create a vector of consecutive numbers from zero to VF.		// Create a vector of consecutive numbers from zero to VF.
for (unsigned i = 0; i < VF.getKnownMinValue(); ++i)		for (unsigned i = 0; i < VF.getKnownMinValue(); ++i)
CarolineConcatto (Not Done): Same here, shouldn't we multiply by getMaxVScale()?
Indices.push_back(		Indices.push_back(
ConstantInt::get(PhiType, i + Part * VF.getKnownMinValue()));		ConstantInt::get(PhiType, i + Part * VF.getKnownMinValue()));
Constant *StartOffset = ConstantVector::get(Indices);		Constant *StartOffset = ConstantVector::get(Indices);

Value *GEP = Builder.CreateGEP(		Value *GEP = Builder.CreateGEP(
ScStValueType->getPointerElementType(), NewPointerPhi,		ScStValueType->getPointerElementType(), NewPointerPhi,
Builder.CreateMul(		Builder.CreateMul(
StartOffset,		StartOffset,
▲ Show 20 Lines • Show All 891 Lines • ▼ Show 20 Lines	ORE->emit([&]() {
<< "Ignoring VF=" << ore::NV("UserVF", UserVF)		<< "Ignoring VF=" << ore::NV("UserVF", UserVF)
<< " because target does not support scalable vectors.";		<< " because target does not support scalable vectors.";
});		});
}		}

// Beyond this point two scenarios are handled. If UserVF isn't specified		// Beyond this point two scenarios are handled. If UserVF isn't specified
// then a suitable VF is chosen. If UserVF is specified and there are		// then a suitable VF is chosen. If UserVF is specified and there are
// dependencies, check if it's legal. However, if a UserVF is specified and		// dependencies, check if it's legal. However, if a UserVF is specified and
// there are no dependencies, then there's nothing to do.		// there are no dependencies, then there's nothing to do.
if (UserVF.isNonZero() && !IgnoreScalableUserVF &&		if (UserVF.isNonZero() && !IgnoreScalableUserVF) {
		david-arm (Done): I think this looks much better now you're just checking reductions only once and early on. Thanks for this! However, I think you might need to move this check down to line 5677 where we return UserVF. The reason is that if we have a loop that contains both memory dependences and reductions, we want to ensure we always do the reduction checks regardless. For example, Legal->isSafeForAnyVectorWidth() could return false and then in the code below we may successfully reduce the UserVF from <vscale x 8 x float> to <vscale x 4 x float> without ever calling canVectorizeReductions.
		fhahn (Done): Please also add a test for this scenario.
Legal->isSafeForAnyVectorWidth())		if (!canVectorizeReductions(UserVF)) {
		LLVM_DEBUG(dbgs() << "LV: Scalable vectorization not supported for the "
		"reduction operations found in this loop. "
		"Using fixed-width vectorization instead.\n");
		ORE->emit([&]() {
		fhahn (Done): I think you should be able to use `reportVectorizationFailure` to print to `dbgs()` and generate a remark with the same message.
		return OptimizationRemarkAnalysis(DEBUG_TYPE, "ScalableVFUnfeasible",
		TheLoop->getStartLoc(),
		TheLoop->getHeader())
		<< "Scalable vectorization not supported for the "
		<< "reduction operations found in this loop. "
		<< "Using fixed-width vectorization instead.";
		});
		return computeFeasibleMaxVF(
		ConstTripCount, ElementCount::getFixed(UserVF.getKnownMinValue()));
		}

		if (Legal->isSafeForAnyVectorWidth())
return UserVF;		return UserVF;
		}

MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);		MinBWs = computeMinimumValueSizes(TheLoop->getBlocks(), *DB, &TTI);
unsigned SmallestType, WidestType;		unsigned SmallestType, WidestType;
std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();		std::tie(SmallestType, WidestType) = getSmallestAndWidestTypes();
unsigned WidestRegister = TTI.getRegisterBitWidth(true);		unsigned WidestRegister = TTI.getRegisterBitWidth(true);

// Get the maximum safe dependence distance in bits computed by LAA.		// Get the maximum safe dependence distance in bits computed by LAA.
// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from		// It is computed by MaxVF * sizeOf(type) * 8, where type is taken from
Show All 10 Lines	if (UserVF.isNonZero() && !IgnoreScalableUserVF) {

if (UserVF.isScalable()) {		if (UserVF.isScalable()) {
Optional<unsigned> MaxVScale = TTI.getMaxVScale();		Optional<unsigned> MaxVScale = TTI.getMaxVScale();

// Scale VF by vscale before checking if it's safe.		// Scale VF by vscale before checking if it's safe.
MaxSafeVF = ElementCount::getScalable(		MaxSafeVF = ElementCount::getScalable(
MaxVScale ? (MaxSafeElements / MaxVScale.getValue()) : 0);		MaxVScale ? (MaxSafeElements / MaxVScale.getValue()) : 0);

if (MaxSafeVF.isZero()) {		if (MaxSafeVF.isZero()) {
		david-arm (Done): I wonder if it's worth bailing out even earlier, i.e. in the same place as above where you check initially? I think the main benefit to bailing out here is if you can reduce the VF to something smaller so that it becomes legal. However, I think for reductions changing the VF won't make a difference in practice.
// The dependence distance is too small to use scalable vectors,		// The dependence distance is too small to use scalable vectors,
// fallback on fixed.		// fallback on fixed.
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "LV: Max legal vector width too small, scalable vectorization "		<< "LV: Max legal vector width too small, scalable vectorization "
"unfeasible. Using fixed-width vectorization instead.\n");		"unfeasible. Using fixed-width vectorization instead.\n");
ORE->emit([&]() {		ORE->emit([&]() {
return OptimizationRemarkAnalysis(DEBUG_TYPE, "ScalableVFUnfeasible",		return OptimizationRemarkAnalysis(DEBUG_TYPE, "ScalableVFUnfeasible",
TheLoop->getStartLoc(),		TheLoop->getStartLoc(),
		david-arm (Done): nit: Perhaps use "operations" here instead of types? I'm thinking that the user probably isn't aware of the RecurrenceKind, so type might not make as much sense?
TheLoop->getHeader())		TheLoop->getHeader())
<< "Max legal vector width too small, scalable vectorization "		<< "Max legal vector width too small, scalable vectorization "
<< "unfeasible. Using fixed-width vectorization instead.";		<< "unfeasible. Using fixed-width vectorization instead.";
});		});
return computeFeasibleMaxVF(		return computeFeasibleMaxVF(
ConstTripCount, ElementCount::getFixed(UserVF.getKnownMinValue()));		ConstTripCount, ElementCount::getFixed(UserVF.getKnownMinValue()));
}		}
}		}
▲ Show 20 Lines • Show All 460 Lines • ▼ Show 20 Lines	unsigned LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,

// If we did not calculate the cost for VF (because the user selected the VF)		// If we did not calculate the cost for VF (because the user selected the VF)
// then we calculate the cost of VF here.		// then we calculate the cost of VF here.
if (LoopCost == 0) {		if (LoopCost == 0) {
assert(expectedCost(VF).first.isValid() && "Expected a valid cost");		assert(expectedCost(VF).first.isValid() && "Expected a valid cost");
LoopCost = *expectedCost(VF).first.getValue();		LoopCost = *expectedCost(VF).first.getValue();
}		}

assert(LoopCost && "Non-zero loop cost expected");		assert(LoopCost && "Non-zero loop cost expected");
		CarolineConcatto (Done): I believe we can use LoopCost.isValid() here!
		david-arm (Done): I think since we're changing LoopCost to be InstructionCost we can change the line above too, from `LoopCost = *expectedCost(VF).first.getValue();` to `LoopCost = expectedCost(VF).first;`

// Interleave if we vectorized this loop and there is a reduction that could		// Interleave if we vectorized this loop and there is a reduction that could
// benefit from interleaving.		// benefit from interleaving.
if (VF.isVector() && HasReductions) {		if (VF.isVector() && HasReductions) {
LLVM_DEBUG(dbgs() << "LV: Interleaving because of reductions.\n");		LLVM_DEBUG(dbgs() << "LV: Interleaving because of reductions.\n");
return IC;		return IC;
}		}

// Note that if we've already vectorized the loop we will have done the		// Note that if we've already vectorized the loop we will have done the
// runtime check and so interleaving won't require further checks.		// runtime check and so interleaving won't require further checks.
bool InterleavingRequiresRuntimePointerCheck =		bool InterleavingRequiresRuntimePointerCheck =
(VF.isScalar() && Legal->getRuntimePointerChecking()->Need);		(VF.isScalar() && Legal->getRuntimePointerChecking()->Need);

// We want to interleave small loops in order to reduce the loop overhead and		// We want to interleave small loops in order to reduce the loop overhead and
// potentially expose ILP opportunities.		// potentially expose ILP opportunities.
LLVM_DEBUG(dbgs() << "LV: Loop cost is " << LoopCost << '\n'		LLVM_DEBUG(dbgs() << "LV: Loop cost is " << LoopCost << '\n'
<< "LV: IC is " << IC << '\n'		<< "LV: IC is " << IC << '\n'
<< "LV: VF is " << VF << '\n');		<< "LV: VF is " << VF << '\n');
const bool AggressivelyInterleaveReductions =		const bool AggressivelyInterleaveReductions =
TTI.enableAggressiveInterleaving(HasReductions);		TTI.enableAggressiveInterleaving(HasReductions);
if (!InterleavingRequiresRuntimePointerCheck && LoopCost < SmallLoopCost) {		if (!InterleavingRequiresRuntimePointerCheck && LoopCost < SmallLoopCost) {
		CarolineConcatto (Not Done): Can you change SmallLoopCost to be an InstructionCost like LoopCost, so you don't need to use LoopCost.getValue()? And I believe that in the std::min you will not need to use getValue.
		kmclaughlin (Author, Done): Hi @CarolineConcatto, thanks for your suggestions on InstructionCost! I didn't change the SmallLoopCost flag to be an instruction cost in the last revision as this caused tests which use -small-loop-cost to fail (e.g. LoopVectorize/unroll_novec.ll).
    // We assume that the cost overhead is 1 and we use the cost model
    // to estimate the cost of the loop and interleave until the cost of the
    // loop overhead is about 5% of the cost of the loop.
    unsigned SmallIC =
        std::min(IC, (unsigned)PowerOf2Floor(SmallLoopCost / LoopCost));
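The clamp in the statement above can be sketched outside LLVM as follows. This is a minimal standalone illustration, not the vectorizer's code: `powerOf2Floor` and `smallInterleaveCount` are hypothetical helpers written for this note, and plain unsigned integers stand in for the cost types discussed in the inline comments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: largest power of two that is <= X, or 0 when X == 0.
inline uint64_t powerOf2Floor(uint64_t X) {
  uint64_t P = 0;
  for (uint64_t Bit = 1; Bit != 0 && Bit <= X; Bit <<= 1)
    P = Bit;
  return P;
}

// Hypothetical helper: cap the interleave count IC by the power-of-two floor
// of SmallLoopCost / LoopCost, mirroring the shape of the SmallIC expression.
inline unsigned smallInterleaveCount(unsigned IC, unsigned SmallLoopCost,
                                     unsigned LoopCost) {
  assert(LoopCost > 0 && "loop cost must be positive");
  uint64_t Ratio = SmallLoopCost / LoopCost; // integer division
  return static_cast<unsigned>(
      std::min<uint64_t>(IC, powerOf2Floor(Ratio)));
}
```

With illustrative values IC = 8, SmallLoopCost = 20 and LoopCost = 3, the ratio is 6, its power-of-two floor is 4, and the result is 4. A cheaper loop (LoopCost = 1) gives a floor of 16, so the result stays at the original IC of 8.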

    // Interleave until store/load ports (estimated by max interleave count) are
    // saturated.
[... 1,461 lines not shown ...]
    LLVM_DEBUG(
               "which requires masked-interleaved support.\n");
    if (CM.InterleaveInfo.invalidateGroups())
      // Invalidating interleave groups also requires invalidating all decisions
      // based on them, which includes widening decisions and uniform and scalar
      // values.
      CM.invalidateCostModelingDecisions();
  }

  ElementCount MaxVF = MaybeMaxVF.getValue();
fhahn (Not Done): Should this only be checked in the code handling `UserVF` below? Also, this seems like a property that generally limits the vectorization factor to fixed-width vectorization factors, and it would be good to check it beforehand. Would it be possible to just limit vectorization factors to fixed-width factors in `computeFeasibleMaxVF`? This way, we won't need extra checks once automatically picked VFs are supported. You also won't need any extra code in the caller of `::plan`. This is similar to how we deal with other 'legality' properties that depend on the vectorization factor, like dependencies that may limit the vectorization factor.
kmclaughlin (Author, Done): Thanks for this suggestion, @fhahn. I've moved the canVectorizeReductions check to `computeFeasibleMaxVF` and updated the affected test in scalable-reductions.ll, where we can use fixed-width vectorization instead (`@mul`).
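The fallback agreed on in this thread can be sketched roughly as below. This is a simplified standalone model under stated assumptions, not the code in `computeFeasibleMaxVF`: `VecFactor`, `ReductionKind`, `isLegalForScalable` and `clampMaxVF` are hypothetical names, and the predicate rejects only multiply reductions because that is the one case the `@mul` test in this revision expects to fall back.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::ElementCount: a minimum lane count plus a
// flag saying whether that count is multiplied by vscale at run time.
struct VecFactor {
  unsigned MinLanes;
  bool Scalable;
};

// Hypothetical subset of reduction kinds exercised by the test file.
enum class ReductionKind { Add, Or, And, Xor, SMin, UMax, Mul };

// Hypothetical legality predicate (assumption: only Mul is unsupported for
// scalable vectors, matching what the @mul test expects).
inline bool isLegalForScalable(ReductionKind K) {
  return K != ReductionKind::Mul;
}

// If the requested VF is scalable but the reduction is not legal for scalable
// vectors, keep the same minimum lane count and drop the scalable flag.
inline VecFactor clampMaxVF(VecFactor Requested, ReductionKind K) {
  if (Requested.Scalable && !isLegalForScalable(K))
    return {Requested.MinLanes, /*Scalable=*/false};
  return Requested;
}
```

In this model a request for `vscale x 8` with a multiply reduction comes back as a fixed width of 8, which corresponds to the `<8 x i32>` CHECK lines for `@mul`, while an add reduction keeps its scalable factor.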
  assert(MaxVF.isNonZero() && "MaxVF is zero.");

  bool UserVFIsLegal = ElementCount::isKnownLE(UserVF, MaxVF);
  if (!UserVF.isZero() &&
      (UserVFIsLegal || (UserVF.isScalable() && MaxVF.isScalable()))) {
    // FIXME: MaxVF is temporarily used inplace of UserVF for illegal scalable
    // VFs here, this should be reverted to only use legal UserVFs once the
    // loop below supports scalable VFs.
    ElementCount VF = UserVFIsLegal ? UserVF : MaxVF;
CarolineConcatto (Done): nit
    LLVM_DEBUG(dbgs() << "LV: Using " << (UserVFIsLegal ? "user" : "max")
                      << " VF " << VF << ".\n");
    assert(isPowerOf2_32(VF.getKnownMinValue()) &&
           "VF needs to be a power of two");
    // Collect the instructions (and their associated costs) that will be more
    // profitable to scalarize.
    CM.selectUserVectorizationFactor(VF);
    CM.collectInLoopReductions();
[... 1,753 lines not shown ...]
#endif /* NDEBUG */

  VectorizationFactor VF = VectorizationFactor::Disabled();
  unsigned IC = 1;

  if (MaybeVF) {
    VF = *MaybeVF;
    // Select the interleave count.
    IC = CM.selectInterleaveCount(VF.Width, VF.Cost);
  }
CarolineConcatto (Done): nit

  // Identify the diagnostic messages that should be produced.
  std::pair<StringRef, std::string> VecDiagMsg, IntDiagMsg;
  bool VectorizeLoop = true, InterleaveLoop = true;
  if (Requirements.doesNotMeet(F, L, Hints)) {
    LLVM_DEBUG(dbgs() << "LV: Not vectorizing: loop did not meet vectorization "
                         "requirements.\n");
    Hints.emitRemarkWithHints();
    return false;
  }

  if (VF.Width.isScalar()) {
    LLVM_DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n");
fhahn (Done): This message seems a bit odd. I think the cost model should just be responsible for assigning a cost, not deciding whether it is possible to vectorize or not; that's the job of the legality checks. Please see my comment above: this could probably be done in `computeFeasibleMaxVF`, which technically is part of the cost model, but is the first step and also applies other legality constraints that limit the vectorization factor.
    VecDiagMsg = std::make_pair(
        "VectorizationNotBeneficial",
david-arm (Not Done): Similar to an earlier comment, a remark here would be good I think.
        "the cost-model indicates that vectorization is not beneficial");
    VectorizeLoop = false;
  }

  if (!MaybeVF && UserIC > 1) {
    // Tell the user interleaving was avoided up-front, despite being explicitly
    // requested.
    LLVM_DEBUG(dbgs() << "LV: Ignoring UserIC, because vectorization and "
[... 274 lines not shown ...]

llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -pass-remarks-analysis=loop-vectorize -pass-remarks-missed=loop-vectorize -mtriple aarch64-unknown-linux-gnu -mattr=+sve -S 2>%t | FileCheck %s -check-prefix=CHECK
				; RUN: cat %t | FileCheck %s -check-prefix=CHECK-DEBUG

				; Reduction can be vectorized

				; ADD

				define dso_local i32 @add(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @add
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[ADD1:.*]] = add <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[ADD2:.*]] = add <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[ADD:.*]] = add <vscale x 8 x i32> %[[ADD2]], %[[ADD1]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.add.nxv8i32(<vscale x 8 x i32> %[[ADD]])
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				fhahn (Done): nit: those checks should not be needed.
				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi i32 [ 2, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%add = add nsw i32 %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ 2, %entry ], [ %add, %for.body ]
				ret i32 %sum.0.lcssa
				}
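The CHECK lines for `@add` expect two vector loads and two vector adds in `vector.body` (the loop metadata requests an interleave count of 2), followed in `middle.block` by one add that combines the two accumulators and a single horizontal `llvm.vector.reduce.add`. A scalar C++ model of that shape is sketched below. It is an illustration only: `interleavedSum` is a hypothetical function, `VF` stands for whatever lane count the vector has at run time, placing the start value in lane 0 of the first accumulator is one common arrangement and is an assumption here, and unsigned arithmetic is used so that wrap-around stays well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of an add reduction with VF lanes, interleaved 2x.
inline unsigned interleavedSum(const std::vector<unsigned> &A, unsigned Init,
                               std::size_t VF) {
  assert(VF > 0 && "need at least one lane");
  // Two independent vector accumulators, one per interleaved copy.
  std::vector<unsigned> Acc1(VF, 0u), Acc2(VF, 0u);
  Acc1[0] = Init; // assumption: start value sits in one lane, identity elsewhere

  // "vector.body": each iteration consumes 2 * VF elements.
  std::size_t I = 0;
  for (; I + 2 * VF <= A.size(); I += 2 * VF) {
    for (std::size_t L = 0; L < VF; ++L) {
      Acc1[L] += A[I + L];
      Acc2[L] += A[I + VF + L];
    }
  }

  // "middle.block": combine the accumulators lane-wise, then reduce the lanes.
  unsigned Sum = 0;
  for (std::size_t L = 0; L < VF; ++L)
    Sum += Acc2[L] + Acc1[L];

  // Scalar remainder loop for the elements the vector loop did not cover.
  for (; I < A.size(); ++I)
    Sum += A[I];
  return Sum;
}
```

Because integer addition is associative and commutative, the result matches the sequential loop for any `VF`, which is why this kind of reduction can be vectorized without knowing `vscale` at compile time.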

				; OR

				define dso_local i32 @or(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @or
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[OR1:.*]] = or <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[OR2:.*]] = or <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[OR:.*]] = or <vscale x 8 x i32> %[[OR2]], %[[OR1]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.or.nxv8i32(<vscale x 8 x i32> %[[OR]])
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi i32 [ 2, %entry ], [ %or, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%or = or i32 %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ 2, %entry ], [ %or, %for.body ]
				ret i32 %sum.0.lcssa
				}

				; AND

				define dso_local i32 @and(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @and
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[AND1:.*]] = and <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[AND2:.*]] = and <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[AND:.*]] = and <vscale x 8 x i32> %[[AND2]], %[[AND1]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.and.nxv8i32(<vscale x 8 x i32> %[[AND]])
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi i32 [ 2, %entry ], [ %and, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%and = and i32 %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ 2, %entry ], [ %and, %for.body ]
				ret i32 %sum.0.lcssa
				}

				; XOR

				define dso_local i32 @xor(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				sdesmalen (Done): nit: remove `dso_local` here and in other definitions.
				; CHECK-LABEL: @xor
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[XOR1:.*]] = xor <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[XOR2:.*]] = xor <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[XOR:.*]] = xor <vscale x 8 x i32> %[[XOR2]], %[[XOR1]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.xor.nxv8i32(<vscale x 8 x i32> %[[XOR]])
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi i32 [ 2, %entry ], [ %xor, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%xor = xor i32 %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ 2, %entry ], [ %xor, %for.body ]
				ret i32 %sum.0.lcssa
				}

				; SMIN

				define dso_local i32 @smin(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @smin
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[ICMP1:.*]] = icmp slt <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[ICMP2:.*]] = icmp slt <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: %[[SEL1:.*]] = select <vscale x 8 x i1> %[[ICMP1]], <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[SEL2:.*]] = select <vscale x 8 x i1> %[[ICMP2]], <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[ICMP:.*]] = icmp slt <vscale x 8 x i32> %[[SEL1]], %[[SEL2]]
				; CHECK-NEXT: %[[SEL:.*]] = select <vscale x 8 x i1> %[[ICMP]], <vscale x 8 x i32> %[[SEL1]], <vscale x 8 x i32> %[[SEL2]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.smin.nxv8i32(<vscale x 8 x i32> %[[SEL]])
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.010 = phi i32 [ 2, %entry ], [ %.sroa.speculated, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%cmp.i = icmp slt i32 %0, %sum.010
				%.sroa.speculated = select i1 %cmp.i, i32 %0, i32 %sum.010
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				%sum.0.lcssa = phi i32 [ 1, %entry ], [ %.sroa.speculated, %for.body ]
				ret i32 %sum.0.lcssa
				}

				; UMAX

				define dso_local i32 @umax(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @umax
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[ICMP1:.*]] = icmp ugt <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[ICMP2:.*]] = icmp ugt <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: %[[SEL1:.*]] = select <vscale x 8 x i1> %[[ICMP1]], <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[SEL2:.*]] = select <vscale x 8 x i1> %[[ICMP2]], <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[ICMP:.*]] = icmp ugt <vscale x 8 x i32> %[[SEL1]], %[[SEL2]]
				; CHECK-NEXT: %[[SEL:.*]] = select <vscale x 8 x i1> %[[ICMP]], <vscale x 8 x i32> %[[SEL1]], <vscale x 8 x i32> %[[SEL2]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.umax.nxv8i32(<vscale x 8 x i32> %[[SEL]])
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.010 = phi i32 [ 2, %entry ], [ %.sroa.speculated, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%cmp.i = icmp ugt i32 %0, %sum.010
				%.sroa.speculated = select i1 %cmp.i, i32 %0, i32 %sum.010
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				%sum.0.lcssa = phi i32 [ 1, %entry ], [ %.sroa.speculated, %for.body ]
				ret i32 %sum.0.lcssa
				}

				; FADD (FAST)

				define dso_local float @fadd_fast(float* noalias nocapture readonly %a, i64 %n) {
				; CHECK-LABEL: @fadd_fast
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x float>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x float>
				; CHECK: %[[ADD1:.*]] = fadd fast <vscale x 8 x float> %[[LOAD1]]
				; CHECK: %[[ADD2:.*]] = fadd fast <vscale x 8 x float> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[ADD:.*]] = fadd fast <vscale x 8 x float> %[[ADD2]], %[[ADD1]]
				; CHECK-NEXT: call fast float @llvm.vector.reduce.fadd.nxv8f32(float -0.000000e+00, <vscale x 8 x float> %[[ADD]])
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%add = fadd fast float %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				%sum.0.lcssa = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
				ret float %sum.0.lcssa
				}

				define dso_local bfloat @fadd_fast_bfloat(bfloat* noalias nocapture readonly %a, i64 %n) {
				; CHECK-LABEL: @fadd_fast_bfloat
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <8 x bfloat>
				; CHECK: %[[LOAD2:.*]] = load <8 x bfloat>
				; CHECK: %[[MUL1:.*]] = fadd fast <8 x bfloat> %[[LOAD1]]
				; CHECK: %[[MUL2:.*]] = fadd fast <8 x bfloat> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[RDX1:.*]] = fadd fast <8 x bfloat> %[[MUL2]], %[[MUL1]]
				; CHECK: %[[SHUFFLE1:.*]] = shufflevector <8 x bfloat> %[[RDX1]], <8 x bfloat> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK: %[[RDX2:.*]] = fadd fast <8 x bfloat> %[[RDX1]], %[[SHUFFLE1]]
				; CHECK: %[[SHUFFLE2:.*]] = shufflevector <8 x bfloat> %[[RDX2]], <8 x bfloat> poison, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK: %[[RDX3:.*]] = fadd fast <8 x bfloat> %[[RDX2]], %[[SHUFFLE2]]
				; CHECK: %[[SHUFFLE3:.*]] = shufflevector <8 x bfloat> %[[RDX3]], <8 x bfloat> poison, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK: %[[RDX:.*]] = fadd fast <8 x bfloat> %[[RDX3]], %[[SHUFFLE3]]
				; CHECK: %[[EXTRACT:.*]] = extractelement <8 x bfloat> %[[RDX]], i32 0
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi bfloat [ 0.000000e+00, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds bfloat, bfloat* %a, i64 %iv
				%0 = load bfloat, bfloat* %arrayidx, align 4
				%add = fadd fast bfloat %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				%sum.0.lcssa = phi bfloat [ 0.000000e+00, %entry ], [ %add, %for.body ]
				ret bfloat %sum.0.lcssa
				}

				; FMIN (FAST)

				define dso_local float @fmin_fast(float* noalias nocapture readonly %a, i64 %n) #0 {
				; CHECK-LABEL: @fmin_fast
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x float>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x float>
				; CHECK: %[[ICMP1:.*]] = fcmp fast olt <vscale x 8 x float> %[[LOAD1]]
				; CHECK: %[[ICMP2:.*]] = fcmp fast olt <vscale x 8 x float> %[[LOAD2]]
				; CHECK: %[[SEL1:.*]] = select <vscale x 8 x i1> %[[ICMP1]], <vscale x 8 x float> %[[LOAD1]]
				; CHECK: %[[SEL2:.*]] = select <vscale x 8 x i1> %[[ICMP2]], <vscale x 8 x float> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[ICMP:.*]] = fcmp fast olt <vscale x 8 x float> %[[SEL1]], %[[SEL2]]
				; CHECK-NEXT: %[[SEL:.*]] = select fast <vscale x 8 x i1> %[[ICMP]], <vscale x 8 x float> %[[SEL1]], <vscale x 8 x float> %[[SEL2]]
				; CHECK-NEXT: call fast float @llvm.vector.reduce.fmin.nxv8f32(<vscale x 8 x float> %[[SEL]])
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi float [ 0.000000e+00, %entry ], [ %.sroa.speculated, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%cmp.i = fcmp fast olt float %0, %sum.07
				%.sroa.speculated = select i1 %cmp.i, float %0, float %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				%sum.0.lcssa = phi float [ 0.000000e+00, %entry ], [ %.sroa.speculated, %for.body ]
				ret float %sum.0.lcssa
				}

				; FMAX (FAST)

				define dso_local float @fmax_fast(float* noalias nocapture readonly %a, i64 %n) #0 {
				; CHECK-LABEL: @fmax_fast
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x float>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x float>
				; CHECK: %[[ICMP1:.*]] = fcmp fast ogt <vscale x 8 x float> %[[LOAD1]]
				; CHECK: %[[ICMP2:.*]] = fcmp fast ogt <vscale x 8 x float> %[[LOAD2]]
				; CHECK: %[[SEL1:.*]] = select <vscale x 8 x i1> %[[ICMP1]], <vscale x 8 x float> %[[LOAD1]]
				; CHECK: %[[SEL2:.*]] = select <vscale x 8 x i1> %[[ICMP2]], <vscale x 8 x float> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[ICMP:.*]] = fcmp fast ogt <vscale x 8 x float> %[[SEL1]], %[[SEL2]]
				; CHECK-NEXT: %[[SEL:.*]] = select fast <vscale x 8 x i1> %[[ICMP]], <vscale x 8 x float> %[[SEL1]], <vscale x 8 x float> %[[SEL2]]
				; CHECK-NEXT: call fast float @llvm.vector.reduce.fmax.nxv8f32(<vscale x 8 x float> %[[SEL]])
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi float [ 0.000000e+00, %entry ], [ %.sroa.speculated, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%cmp.i = fcmp fast ogt float %0, %sum.07
				%.sroa.speculated = select i1 %cmp.i, float %0, float %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				%sum.0.lcssa = phi float [ 0.000000e+00, %entry ], [ %.sroa.speculated, %for.body ]
				ret float %sum.0.lcssa
				}

				; Reduction cannot be vectorized

				; MUL

				; CHECK-DEBUG: Scalable vectorization not supported for the reduction operations found in this loop. Using fixed-width vectorization instead.
				sdesmalen (Done): This CHECK-DEBUG (with its own RUN line) is not checking which function is not vectorizing; it could just as well be emitted for one of the other functions. I'd suggest explicitly adding checks for `@mul` and adding a CHECK-DEBUG line for the other tests as well.
				define dso_local i32 @mul(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @mul
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <8 x i32>
				; CHECK: %[[MUL1:.*]] = mul <8 x i32> %[[LOAD1]]
				; CHECK: %[[MUL2:.*]] = mul <8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[RDX1:.*]] = mul <8 x i32> %[[MUL2]], %[[MUL1]]
				; CHECK: %[[SHUFFLE1:.*]] = shufflevector <8 x i32> %[[RDX1]], <8 x i32> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK: %[[RDX2:.*]] = mul <8 x i32> %[[RDX1]], %[[SHUFFLE1]]
				; CHECK: %[[SHUFFLE2:.*]] = shufflevector <8 x i32> %[[RDX2]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK: %[[RDX3:.*]] = mul <8 x i32> %[[RDX2]], %[[SHUFFLE2]]
				; CHECK: %[[SHUFFLE3:.*]] = shufflevector <8 x i32> %[[RDX3]], <8 x i32> poison, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK: %[[RDX:.*]] = mul <8 x i32> %[[RDX3]], %[[SHUFFLE3]]
				; CHECK: %[[EXTRACT:.*]] = extractelement <8 x i32> %[[RDX]], i32 0
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi i32 [ 2, %entry ], [ %mul, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%mul = mul nsw i32 %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				%sum.0.lcssa = phi i32 [ 2, %entry ], [ %mul, %for.body ]
				ret i32 %sum.0.lcssa
				}
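The `middle.block` CHECK lines for `@mul` describe a fixed-width reduction expanded as a shuffle tree: the upper half of the live lanes is shuffled down and multiplied into the lower half three times (lanes 4..7, then 2..3, then 1), after which lane 0 is extracted. A scalar sketch of that pattern, assuming exactly 8 lanes as in the test, might look like this. `shuffleTreeMul` is a hypothetical name, and unsigned arithmetic is used so overflow wraps.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of the log2(8) = 3 step shuffle-and-multiply tree.
inline uint32_t shuffleTreeMul(std::array<uint32_t, 8> V) {
  // Half takes the values 4, 2, 1: each step folds lanes [Half, 2*Half)
  // into lanes [0, Half), as the shufflevector masks in the test do.
  for (std::size_t Half = 4; Half >= 1; Half /= 2)
    for (std::size_t L = 0; L < Half; ++L)
      V[L] *= V[L + Half];
  return V[0]; // corresponds to the final extractelement of lane 0
}
```

The tree needs the lane count at compile time to emit a fixed number of shuffle steps, which is consistent with the test expecting `<8 x i32>` rather than a scalable type for this reduction.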

				; CHECK-DEBUG: Scalable vectorization not supported for the reduction operations found in this loop. Using fixed-width vectorization instead.
				sdesmalen (Done): Same as above. Can you also add a comment saying why you're testing a `memory_dependence` issue in a test file called `scalable-reductions.ll`?
				define dso_local i32 @memory_dependence(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i64 %n) {
				david-arm (Done): nit: Perhaps you could make it clear you're testing the ordering, i.e. with something like: "This test was added to ensure we always check the legality of reductions (and emit a warning if necessary) before checking for memory dependencies."
				; CHECK-LABEL: @memory_dependence
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <8 x i32>
				; CHECK: %[[LOAD3:.*]] = load <8 x i32>
				; CHECK: %[[LOAD4:.*]] = load <8 x i32>
				; CHECK: %[[ADD1:.*]] = add nsw <8 x i32> %[[LOAD3]], %[[LOAD1]]
				; CHECK: %[[ADD2:.*]] = add nsw <8 x i32> %[[LOAD4]], %[[LOAD2]]
				; CHECK: %[[MUL1:.*]] = mul <8 x i32> %[[LOAD3]]
				; CHECK: %[[MUL2:.*]] = mul <8 x i32> %[[LOAD4]]
				; CHECK: middle.block:
				; CHECK: %[[RDX1:.*]] = mul <8 x i32> %[[MUL2]], %[[MUL1]]
				; CHECK: %[[SHUFFLE1:.*]] = shufflevector <8 x i32> %[[RDX1]], <8 x i32> poison, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK: %[[RDX2:.*]] = mul <8 x i32> %[[RDX1]], %[[SHUFFLE1]]
				; CHECK: %[[SHUFFLE2:.*]] = shufflevector <8 x i32> %[[RDX2]], <8 x i32> poison, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK: %[[RDX3:.*]] = mul <8 x i32> %[[RDX2]], %[[SHUFFLE2]]
				; CHECK: %[[SHUFFLE3:.*]] = shufflevector <8 x i32> %[[RDX3]], <8 x i32> poison, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
				; CHECK: %[[RDX:.*]] = mul <8 x i32> %[[RDX3]], %[[SHUFFLE3]]
				; CHECK: %[[EXTRACT:.*]] = extractelement <8 x i32> %[[RDX]], i32 0
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body:
				%i = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%sum = phi i32 [ %mul, %for.body ], [ 2, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx1 = getelementptr inbounds i32, i32* %b, i64 %i
				%1 = load i32, i32* %arrayidx1, align 4
				%add = add nsw i32 %1, %0
				%add2 = add nuw nsw i64 %i, 32
				%arrayidx3 = getelementptr inbounds i32, i32* %a, i64 %add2
				store i32 %add, i32* %arrayidx3, align 4
				%mul = mul nsw i32 %1, %sum
				%inc = add nuw nsw i64 %i, 1
				%exitcond.not = icmp eq i64 %inc, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				%sum.0.lcssa = phi i32 [ 2, %entry ], [ %mul, %for.body ]
				ret i32 %sum.0.lcssa
				}
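For readers less used to IR, the loop in `@memory_dependence` corresponds roughly to the C++ below. This is a hand translation for illustration only: `memoryDependence` is a hypothetical name, `std::vector` replaces the raw pointers, and unsigned arithmetic replaces the `nsw` operations so that overflow wraps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C++ rendering of the @memory_dependence loop body.
inline unsigned memoryDependence(std::vector<unsigned> &A,
                                 const std::vector<unsigned> &B,
                                 std::size_t N) {
  assert(A.size() >= N + 32 && B.size() >= N && "buffers too small");
  unsigned Sum = 2; // start value of the multiply reduction
  for (std::size_t I = 0; I < N; ++I) {
    // The store is 32 elements ahead of the load from the same array, so an
    // iteration can read what an iteration 32 steps earlier wrote.
    A[I + 32] = B[I] + A[I];
    // Multiply reduction over B, the part that is not legal for scalable VFs.
    Sum *= B[I];
  }
  return Sum;
}
```

The loop therefore combines two constraints on the vectorization factor, a dependence distance of 32 elements through `a` and a multiply reduction, and the test's CHECK lines expect fixed-width `<8 x i32>` code for it.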

				; FMIN

				; CHECK-DEBUG: loop not vectorized: value that could not be identified as reduction is used outside the loop
				sdesmalen (Done): These two fmin/fmax tests are not very useful, because the loop doesn't fail to vectorize because of code added in this patch.
				define dso_local float @fmin(float* noalias nocapture readonly %a, i64 %n) {
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi float [ 0.000000e+00, %entry ], [ %.sroa.speculated, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%cmp.i = fcmp olt float %0, %sum.07
				%.sroa.speculated = select i1 %cmp.i, float %0, float %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				%sum.0.lcssa = phi float [ 0.000000e+00, %entry ], [ %.sroa.speculated, %for.body ]
				ret float %sum.0.lcssa
				}

				; FMAX

				; CHECK-DEBUG: loop not vectorized: value that could not be identified as reduction is used outside the loop
				define dso_local float @fmax(float* noalias nocapture readonly %a, i64 %n) {
				entry:
				%cmp6 = icmp sgt i64 %n, 0
				br i1 %cmp6, label %for.body, label %for.end

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi float [ 0.000000e+00, %entry ], [ %.sroa.speculated, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%cmp.i = fcmp ogt float %0, %sum.07
				%.sroa.speculated = select i1 %cmp.i, float %0, float %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				%sum.0.lcssa = phi float [ 0.000000e+00, %entry ], [ %.sroa.speculated, %for.body ]
				ret float %sum.0.lcssa
				}

				attributes #0 = { "no-nans-fp-math"="true" }
				sdesmalen (Not Done): nit: use `nnan` directly in the fp operation instead of an attribute.
				kmclaughlin (Author, Done): Hi @sdesmalen, these tests for fmin/fmax fail without the `no-nans-fp-math` attribute. I think this is because `RecurrenceDescriptor::isRecurrenceInstr` only checks for the function attribute and not the flags on the instruction. I've created a separate patch (D96350) to try to address this.

				!0 = distinct !{!0, !1, !2, !3, !4}
				!1 = !{!"llvm.loop.vectorize.width", i32 8}
				!2 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
				!3 = !{!"llvm.loop.interleave.count", i32 2}
				!4 = !{!"llvm.loop.vectorize.enable", i1 true}