This is an archive of the discontinued LLVM Phabricator instance.

Differential D98435

[LoopVectorize] Add strict in-order reduction support for fixed-width vectorization
Closed, Public

Authored by kmclaughlin on Mar 11 2021, 9:47 AM.

Details

Reviewers

sdesmalen
dmgreen
fhahn
david-arm
peterwaller-arm
spatel

Commits

rG7344f3d39a0d: [LoopVectorize] Add strict in-order reduction support for fixed-width…

Summary

Previously we could only vectorize FP reductions if fast math was enabled, as this allows us to
reorder FP operations. However, it may still be beneficial to vectorize the loop by moving
the reduction inside the vectorized loop and making sure that the scalar reduction value
is an input to the horizontal reduction, e.g.:

%phi = phi float [ 0.0, %entry ], [ %reduction, %vector_body ]
%load = load <8 x float>
%reduction = call float @llvm.vector.reduce.fadd.v8f32(float %phi, <8 x float> %load)

This patch adds a new flag (IsOrdered) to RecurrenceDescriptor and makes use of the changes added
by D75069 as much as possible, which already teaches the vectorizer about in-loop reductions.
For now in-order reduction support is off by default and controlled with the -enable-strict-reductions flag.
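
The difference between a strict in-order reduction and a reassociated one can be modelled in plain C++. This is only a sketch, not code from the patch: the function names (`scalarSum`, `strictVectorSum`, `reassociatedVectorSum`) are hypothetical, the "vector" is simulated with an inner loop over `VF` lanes, and the input length is assumed to be a multiple of `VF` so that no scalar remainder loop is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference loop: adds every element strictly left to right.
float scalarSum(const std::vector<float> &A, float Start) {
  float Acc = Start;
  for (float V : A)
    Acc += V;
  return Acc;
}

// Strict in-order vectorization (modelled): each vector iteration covers VF
// elements, and the running scalar (the phi) seeds a sequential fold over
// the lanes, like the start operand of a non-reassociating vector.reduce.fadd.
float strictVectorSum(const std::vector<float> &A, float Start,
                      std::size_t VF) {
  assert(VF > 0 && A.size() % VF == 0 && "sketch omits the remainder loop");
  float Phi = Start;
  for (std::size_t I = 0; I < A.size(); I += VF)
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      Phi += A[I + Lane];
  return Phi;
}

// Fast-math style vectorization (modelled): one partial sum per lane,
// combined after the loop. This reassociates the additions.
float reassociatedVectorSum(const std::vector<float> &A, float Start,
                            std::size_t VF) {
  assert(VF > 0 && A.size() % VF == 0 && "sketch omits the remainder loop");
  std::vector<float> Partial(VF, -0.0f); // -0.0 is the neutral start for fadd
  for (std::size_t I = 0; I < A.size(); I += VF)
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      Partial[Lane] += A[I + Lane];
  float Acc = Start;
  for (float P : Partial)
    Acc += P;
  return Acc;
}
```

With IEEE-754 single-precision evaluation and the input `{1e8f, 1.0f, -1e8f, 1.0f}` at `VF = 2`, the scalar and strict versions perform the same additions in the same order and agree, while the reassociated version produces a different sum because the two small values are added together before meeting the large ones.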

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kmclaughlin created this revision. Mar 11 2021, 9:47 AM

Herald added a subscriber: hiraditya. Mar 11 2021, 9:47 AM

kmclaughlin requested review of this revision. Mar 11 2021, 9:47 AM

Herald added a project: Restricted Project. Mar 11 2021, 9:47 AM

Herald added a subscriber: llvm-commits.

Some minor nits. Ran out of time for a more thorough review at this moment.

llvm/lib/Transforms/Utils/LoopUtils.cpp
1081	Nit: this variable appears to be unused?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
334	nit: As 'reduction' is spelt out on the adjacent arguments I would prefer consistency over brevity unless there is a reason to differ here.

fhahn added inline comments. Mar 11 2021, 2:02 PM

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
2	`+sve` is not needed for the test? Can you also add a negative test?
119	I don't think this test is testing what you intended it to. The condition `> 0.5f` is in the loop pre-header. I think it should be in the loop body, so it tests generating the correct mask? (The C source here is a bit misleading IMO)

Harbormaster completed remote builds in B93333: Diff 329992. Mar 11 2021, 4:47 PM

david-arm added inline comments. Mar 12 2021, 12:53 AM

llvm/lib/Analysis/IVDescriptors.cpp
215	Hi @kmclaughlin, I think you can combine the two `IsOrdered =` statements into one by declaring the variable below the `if (auto *EIP = ...) { ... }` block.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4394–4405	nit: I think you can actually now just kill the `{` brace here and the other one `}` at line 4377. That will avoid the additional indentation.
9235	nit: I think you can just do: Value *C = Builder.getInt1(1);
9246–9250	nit: You could avoid some of the extra indentation below if you changed the `} else {` to something like: } else if (IsOrdered) NextInChain = NewRed; } else { ... }
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
119	Hi @fhahn, I suspect it's because when compiled with clang the condition is hoisted out before we reach the vectoriser, but you're right I think that the condition should be in the loop. At the moment this test is really the same as `@fadd_strict`.

Hello. Fantastic to see this getting used in more cases. In-order reductions sound like a great use of it.

llvm/include/llvm/Analysis/IVDescriptors.h
258	Am I correct in saying that Ordered reductions are a subset of floating point reductions that:
- don't have AllowReassoc on the fadd
- have only a single fadd added in each loop iteration (but possibly predicated to be one of several additions). So either fadd(redphi(..)) or phi(fadd(redphi), redphi) or phi(fadd(redphi), fadd(redphi))?
And that we can't easily get this info from just the FMF or the ExactFPMathInst?
llvm/lib/Analysis/IVDescriptors.cpp
204	I think this may need to check all branches of the phi, if I'm understanding what this is doing exactly. They could have different instructions down each path with different flags (although that would be quite rare I suspect). There might also be selects, if this is matching on if-block phis. I'm not sure if that is handled in AddReductionVar though.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4406	`Cost->isInLoopReduction(Phi)` -> `IsInLoopReductionPhi`. Does this work in-order for UF > 1? I feel like the order of the adds will be changed, so that it's no longer in-order. If not, is this code needed, if it can only be using UF == 1 and ReducedPartRdx is set to State.get(LoopExitInstDef, 0) above?
9237	I'm suspecting that the MaskOfOnes isn't needed, and any masking needed would be handled by the Select created above? Otherwise it can use the Mask from the Cond operand to detect when the instruction needs to be predicated and when not.
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
9	Note that the identity value for a fadd (without nsz) should be -0.0, not 0.0. Otherwise 0.0 + n doesn't always equal n. https://alive2.llvm.org/ce/z/LS2RK3 I believe the select would only be needed if the reduction is predicated though, as mentioned above.
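
The point about the identity value can be checked with a few lines of C++. This is a minimal sketch with a hypothetical helper name (`startIsNeutralFor`), assuming IEEE-754 floats, the default round-to-nearest mode, and compilation without fast-math flags. It tests whether `Start + X` gives back `X` including the sign of a zero result.

```cpp
#include <cassert>
#include <cmath>

// True if Start + X yields X exactly, including the sign of a zero result.
bool startIsNeutralFor(float Start, float X) {
  assert(!std::isnan(X) && "NaN never compares equal to itself");
  float Sum = Start + X;
  return Sum == X && std::signbit(Sum) == std::signbit(X);
}
```

Calling `startIsNeutralFor(0.0f, -0.0f)` returns false because `0.0 + -0.0` is `+0.0`, so the sign of the input is lost. With `-0.0f` as the start value the helper returns true for `-0.0f`, `0.0f`, and ordinary finite values such as `1.5f`.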

spatel added inline comments. Mar 16 2021, 4:49 AM

llvm/include/llvm/Analysis/IVDescriptors.h
258	I have the same question. I was looking at this code recently (36a489d19475, 1bee549737ac), but I'm still not sure if we are behaving correctly or optimally. The "ExactFPMathInst" seems to provide the same information. I experimented with changing that to a bool. The only reason we appear to save an instruction pointer rather than a bool is so we can provide a prettier optimization remark in the case a loop is not vectorized. I.e., we say something like: "The FP math at line 42 is not associative" rather than "The loop starting at line 39 requires exact FP math".

david-arm added inline comments. Mar 16 2021, 5:02 AM

llvm/include/llvm/Analysis/IVDescriptors.h
258	I think the IsOrdered flag here is more of a convenience so that we can avoid calling the more expensive `checkOrderedReduction` function every time we may want to use strict, in-order reduction intrinsics. The `checkOrderedReduction` does cast the instruction to a `FPMathOperator` and look for the allows-reassoc flag.

david-arm added inline comments. Mar 16 2021, 9:06 AM

llvm/include/llvm/Analysis/IVDescriptors.h
258	Hi @spatel, oh I see now what you mean. I didn't realise we now had an ExactFPMathInst in RecurrenceDescriptor. It looks like you're saying instead of setting the IsOrdered flag we can just set ExactFPMathInst to the instruction in ReduxDesc, which then gets passed on to the RecurrenceDescriptor.

dmgreen added inline comments. Mar 16 2021, 9:26 AM

llvm/include/llvm/Analysis/IVDescriptors.h
258	As far as I understand, we still can not vectorize in-order reductions with multiple adds in the loop. Something like https://godbolt.org/z/c9qd1v. The ordering would change if we tried. So the flag may still be needed for the second bullet point above, which is a large part of the checkOrderedReduction. Either that or a way to compute that there is only a single fadd (possibly through a select/if block phi).
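
Why multiple adds in the loop are a problem can be sketched as follows. This is an illustration only and is not taken from the linked example: it assumes a loop of the shape `sum += A[i]; sum += B[i];`, and the function names (`scalarTwoAdds`, `naiveVectorTwoAdds`) are hypothetical. The naive version applies one in-order reduction per fadd, so `VF` elements of `A` are folded before any element of `B`, which differs from the scalar interleaving.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar loop with two fadds per iteration: sum += A[i]; sum += B[i];
float scalarTwoAdds(const std::vector<float> &A, const std::vector<float> &B,
                    float Start) {
  assert(A.size() == B.size() && "inputs must have equal length");
  float Sum = Start;
  for (std::size_t I = 0; I < A.size(); ++I) {
    Sum += A[I];
    Sum += B[I];
  }
  return Sum;
}

// Hypothetical naive in-loop vectorization: one ordered reduction per fadd.
// All VF lanes of A are folded first, then all VF lanes of B.
float naiveVectorTwoAdds(const std::vector<float> &A,
                         const std::vector<float> &B, float Start,
                         std::size_t VF) {
  assert(A.size() == B.size() && VF > 0 && A.size() % VF == 0);
  float Sum = Start;
  for (std::size_t I = 0; I < A.size(); I += VF) {
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      Sum += A[I + Lane];
    for (std::size_t Lane = 0; Lane < VF; ++Lane)
      Sum += B[I + Lane];
  }
  return Sum;
}
```

With `A = {1e8f, -1e8f}`, `B = {1.0f, 1.0f}` and `VF = 2`, the two functions return different values under IEEE-754 single precision, whereas at `VF = 1` they perform identical additions and agree.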

Changed the name of the flag to -enable-strict-reductions
Removed unnecessary changes to create a mask as this was already handled by VPReductionRecipe::execute
Simplified checkOrderedReductions for this patch. For now, if Exit is a Phi node we will not set IsOrdered to true.

Removed +sve from the RUN line of the tests
Added a negative test (@fadd_multiple)
Added more tests

Thanks all for reviewing this!

llvm/include/llvm/Analysis/IVDescriptors.h
258	Hi @dmgreen & @spatel, I also didn't realise that ExactFPMathInst is very similar and already checks the hasAllowReassoc flag. I've left the IsOrdered flag in this patch as I think there is still a need for it in the case mentioned above where there is a chain of fadds (I've added a test for this to `strict-fadd.ll`). I also changed the conditions in checkOrderedReduction slightly, to check if `Exit == ExactFPMathInst`.
llvm/lib/Analysis/IVDescriptors.cpp
204	I realised whilst trying to add a better test than `@fadd_conditional_rdx` for conditional reductions that with the tests I have written this part of the function is redundant. I've removed it from this patch for now since I don't think I can write any tests for it yet - I believe in addition to looking through Phi nodes here and checking all branches, I would also need to change `getReductionOpChain` to look through Phis and Selects in order to vectorize with inloop reductions, is this correct?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4406	The changes I made to `VPReductionRecipe::execute` should ensure that ordering is enforced when the UF > 1, through the way the chain is constructed for each unrolled part. I was missing a change to fixReduction to correctly set the incoming values of Phi for each unrolled part in my first patch, however I believe this is fixed now and the test for unrolling (`@fadd_strict_unroll`) has been updated accordingly.
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
2	Hi @fhahn, I've added a negative test where we have multiple fadds in the loop (@fadd_multiple) & removed `+sve`
9	I've created a separate patch which changes the identity value to -0.0 (D98963)
119	I tried to write a test where the condition is in the loop, but I don't think this is possible yet with just the changes in this patch (I believe there would also be some changes needed to getReductionOpChain to try and look through Phis before I could try this). Since this isn't testing anything that `@fadd_strict` doesn't already, I've removed it and instead added `@fadd_predicated` (which uses loop.vectorize.predicate.enable) and `@fadd_conditional` to test the masking in `VPReductionRecipe::execute` with in-order reductions.

Harbormaster completed remote builds in B94735: Diff 331917. Mar 19 2021, 11:42 AM

dmgreen added inline comments. Mar 24 2021, 1:15 AM

llvm/lib/Analysis/IVDescriptors.cpp
204	Oh yeah, that sounds like it might be correct. We handle predicated reductions, but were more interested in tail-folding for MVE. Predication through a select/if phi might well need more work.
207	IsOrdered &= (LHS == Phi) || (RHS == Phi);
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4301	Can this use isScalar(), or is it to handle scalable single items too? We generate multiple phis, but only use the first one? The others get DCE'd?
4406	Yeah, looks OK I think. All the back-to-back operations might be slow, forced down a critical path, but the order seems good.
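
The effect of chaining the unrolled parts, as discussed for UF > 1, can be modelled in C++. This is a sketch under stated assumptions rather than the vectorizer's code: the names (`chainedUnrolledSum`, `independentPartsSum`) are hypothetical, lanes are simulated with loops, and the input length is assumed to be a multiple of `VF * UF`. In the chained version each part's reduction starts from the previous part's result, which keeps the original order but serializes the additions. The second version gives each part its own accumulator, which is how reorderable reductions are usually unrolled, and it changes the order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-order unrolling: part N's ordered reduction is seeded with the result
// of part N-1, so the additions form one chain in source order.
float chainedUnrolledSum(const std::vector<float> &A, float Start,
                         std::size_t VF, std::size_t UF) {
  assert(VF > 0 && UF > 0 && A.size() % (VF * UF) == 0);
  float Phi = Start;
  for (std::size_t I = 0; I < A.size(); I += VF * UF) {
    float Chain = Phi;
    for (std::size_t Part = 0; Part < UF; ++Part)
      for (std::size_t Lane = 0; Lane < VF; ++Lane)
        Chain += A[I + Part * VF + Lane];
    Phi = Chain;
  }
  return Phi;
}

// Reordering unrolling: each part keeps an independent accumulator and the
// accumulators are only combined after the loop.
float independentPartsSum(const std::vector<float> &A, float Start,
                          std::size_t VF, std::size_t UF) {
  assert(VF > 0 && UF > 0 && A.size() % (VF * UF) == 0);
  std::vector<float> PartAcc(UF, -0.0f); // -0.0 is the neutral start for fadd
  for (std::size_t I = 0; I < A.size(); I += VF * UF)
    for (std::size_t Part = 0; Part < UF; ++Part)
      for (std::size_t Lane = 0; Lane < VF; ++Lane)
        PartAcc[Part] += A[I + Part * VF + Lane];
  float Acc = Start;
  for (float P : PartAcc)
    Acc += P;
  return Acc;
}
```

For `{1e8f, 1.0f, -1e8f, 1.0f}` with `VF = 2` and `UF = 2`, the chained version matches a plain left-to-right scalar sum, while the independent-accumulator version does not. The chain also illustrates the critical-path concern raised above: every addition depends on the one before it.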

david-arm added inline comments. Mar 24 2021, 2:32 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4301	Yes, I think in this case we can just use State.VF.isVector() since that also covers the case when VF=(1,scalable)

Addressing review comments & fixing clang-format warnings

kmclaughlin added inline comments. Mar 24 2021, 8:25 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4301	Yes, we generate multiple phis and the unused phis are removed by InstCombine. Changed this to use `State.VF.isVector()` as suggested above.

kmclaughlin added a parent revision: D98963: [LoopVectorize] Change the identity element for FAdd. Mar 24 2021, 11:45 AM

Harbormaster completed remote builds in B95500: Diff 332999. Mar 24 2021, 3:23 PM

Thanks. This LGTM, if there are no other comments.

llvm/include/llvm/Analysis/IVDescriptors.h
258	Can we clarify the comment, to specify that it means this reduction can be treated like an in-order reduction, and what that currently means?

LGTM as well! Can address Dave's comment before merging I think?

This revision is now accepted and ready to land. Mar 26 2021, 1:34 AM

This revision was landed with ongoing or failed builds. Apr 6 2021, 6:46 AM

Closed by commit rG7344f3d39a0d: [LoopVectorize] Add strict in-order reduction support for fixed-width… (authored by kmclaughlin).

This revision was automatically updated to reflect the committed changes.

kmclaughlin marked an inline comment as done.

kmclaughlin added a commit: rG7344f3d39a0d: [LoopVectorize] Add strict in-order reduction support for fixed-width….

fhahn added inline comments. Apr 8 2021, 4:36 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
333	What's the plan for this option going forward? Would it be safe to remove it and let the cost model decide whether it is worthwhile to vectorize with strict reductions?
4301	I think there's test coverage missing for this code. All tests pass if this change gets removed. @kmclaughlin could you add a test?

kmclaughlin mentioned this in D100385: [NFC] Add tests for scalable vectorization of loops with in-order reductions. Apr 13 2021, 7:06 AM

kmclaughlin added inline comments. Apr 15 2021, 8:50 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
333	Hi @fhahn, I think there is more testing needed before we can be confident in removing the `-enable-strict-reductions` flag, for instance adding more tests for scalable types and running LNT with the flag enabled. After this, the plan is to remove the option (after making any necessary changes to the cost model to decide if using in-order reductions is beneficial).
4301	The `@fadd_strict_unroll` test should have been testing this part of fixReduction(), but there was a mistake in the CHECK lines I added for the vector.reduce.fadd intrinsics. I've pushed rG93f54fae9dda to address this, and also created D100570 to try and prevent multiple unused Phis from being generated to begin with.

craig.topper mentioned this in D99509: [RISCV] Add legality check for vectoring reduction. Apr 15 2021, 9:24 AM

kmclaughlin mentioned this in rG62ee638a8700: [NFC] Add tests for scalable vectorization of loops with in-order reductions. Apr 19 2021, 3:17 AM

Ayal mentioned this in D100113: [LV] Move reduction PHI node fixup to VPlan::execute (NFC). May 3 2021, 12:47 AM

kmclaughlin mentioned this in D101836: [LoopVectorize] Enable strict reductions when allowReordering() returns false. May 4 2021, 7:38 AM

kmclaughlin mentioned this in rG9f76a8526010: [LoopVectorize] Enable strict reductions when allowReordering() returns false. May 26 2021, 6:06 AM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/IVDescriptors.h	13 lines
llvm/include/llvm/Transforms/Utils/LoopUtils.h	5 lines
llvm/lib/Analysis/IVDescriptors.cpp	31 lines
llvm/lib/Transforms/Utils/LoopUtils.cpp	11 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	38 lines
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll	280 lines

Diff 335502

llvm/include/llvm/Analysis/IVDescriptors.h

[... 63 lines not shown ...]

/// This struct holds information about recurrence variables.		/// This struct holds information about recurrence variables.
class RecurrenceDescriptor {		class RecurrenceDescriptor {
public:		public:
RecurrenceDescriptor() = default;		RecurrenceDescriptor() = default;

RecurrenceDescriptor(Value *Start, Instruction *Exit, RecurKind K,		RecurrenceDescriptor(Value *Start, Instruction *Exit, RecurKind K,
FastMathFlags FMF, Instruction *ExactFP, Type *RT,		FastMathFlags FMF, Instruction *ExactFP, Type *RT,
bool Signed, SmallPtrSetImpl<Instruction *> &CI)		bool Signed, bool Ordered,
		SmallPtrSetImpl<Instruction *> &CI)
: StartValue(Start), LoopExitInstr(Exit), Kind(K), FMF(FMF),		: StartValue(Start), LoopExitInstr(Exit), Kind(K), FMF(FMF),
ExactFPMathInst(ExactFP), RecurrenceType(RT), IsSigned(Signed) {		ExactFPMathInst(ExactFP), RecurrenceType(RT), IsSigned(Signed),
		IsOrdered(Ordered) {
CastInsts.insert(CI.begin(), CI.end());		CastInsts.insert(CI.begin(), CI.end());
}		}

/// This POD struct holds information about a potential recurrence operation.		/// This POD struct holds information about a potential recurrence operation.
class InstDesc {		class InstDesc {
public:		public:
InstDesc(bool IsRecur, Instruction *I, Instruction *ExactFP = nullptr)		InstDesc(bool IsRecur, Instruction *I, Instruction *ExactFP = nullptr)
: IsRecurrence(IsRecur), PatternLastInst(I),		: IsRecurrence(IsRecur), PatternLastInst(I),
[... 140 lines not shown ...]	public:

/// Returns a reference to the instructions used for type-promoting the		/// Returns a reference to the instructions used for type-promoting the
/// recurrence.		/// recurrence.
const SmallPtrSet<Instruction *, 8> &getCastInsts() const { return CastInsts; }		const SmallPtrSet<Instruction *, 8> &getCastInsts() const { return CastInsts; }

/// Returns true if all source operands of the recurrence are SExtInsts.		/// Returns true if all source operands of the recurrence are SExtInsts.
bool isSigned() const { return IsSigned; }		bool isSigned() const { return IsSigned; }

		/// Expose an ordered FP reduction to the instance users.
		bool isOrdered() const { return IsOrdered; }

/// Attempts to find a chain of operations from Phi to LoopExitInst that can		/// Attempts to find a chain of operations from Phi to LoopExitInst that can
/// be treated as a set of reductions instructions for in-loop reductions.		/// be treated as a set of reductions instructions for in-loop reductions.
SmallVector<Instruction *, 4> getReductionOpChain(PHINode *Phi,		SmallVector<Instruction *, 4> getReductionOpChain(PHINode *Phi,
Loop *L) const;		Loop *L) const;

private:		private:
// The starting value of the recurrence.		// The starting value of the recurrence.
// It does not have to be zero!		// It does not have to be zero!
TrackingVH<Value> StartValue;		TrackingVH<Value> StartValue;
// The instruction who's value is used outside the loop.		// The instruction who's value is used outside the loop.
Instruction *LoopExitInstr = nullptr;		Instruction *LoopExitInstr = nullptr;
// The kind of the recurrence.		// The kind of the recurrence.
RecurKind Kind = RecurKind::None;		RecurKind Kind = RecurKind::None;
// The fast-math flags on the recurrent instructions. We propagate these		// The fast-math flags on the recurrent instructions. We propagate these
// fast-math flags into the vectorized FP instructions we generate.		// fast-math flags into the vectorized FP instructions we generate.
FastMathFlags FMF;		FastMathFlags FMF;
// First instance of non-reassociative floating-point in the PHI's use-chain.		// First instance of non-reassociative floating-point in the PHI's use-chain.
Instruction *ExactFPMathInst = nullptr;		Instruction *ExactFPMathInst = nullptr;
// The type of the recurrence.		// The type of the recurrence.
Type *RecurrenceType = nullptr;		Type *RecurrenceType = nullptr;
// True if all source operands of the recurrence are SExtInsts.		// True if all source operands of the recurrence are SExtInsts.
bool IsSigned = false;		bool IsSigned = false;
		// True if this recurrence can be treated as an in-order reduction.
		// Currently only a non-reassociative FAdd can be considered in-order,
		// if it is also the only FAdd in the PHI's use chain.
		bool IsOrdered = false;
// Instructions used for type-promoting the recurrence.		// Instructions used for type-promoting the recurrence.
SmallPtrSet<Instruction *, 8> CastInsts;		SmallPtrSet<Instruction *, 8> CastInsts;
};		};

/// A struct for saving information about induction variables.		/// A struct for saving information about induction variables.
class InductionDescriptor {		class InductionDescriptor {
public:		public:
/// This enum represents the kinds of inductions that we support.		/// This enum represents the kinds of inductions that we support.
[... 90 lines not shown ...]

llvm/include/llvm/Transforms/Utils/LoopUtils.h

	[... 383 lines not shown ...]

	/// Create a generic target reduction using a recurrence descriptor \p Desc			/// Create a generic target reduction using a recurrence descriptor \p Desc
	/// The target is queried to determine if intrinsics or shuffle sequences are			/// The target is queried to determine if intrinsics or shuffle sequences are
	/// required to implement the reduction.			/// required to implement the reduction.
	/// Fast-math-flags are propagated using the RecurrenceDescriptor.			/// Fast-math-flags are propagated using the RecurrenceDescriptor.
	Value *createTargetReduction(IRBuilderBase &B, const TargetTransformInfo *TTI,			Value *createTargetReduction(IRBuilderBase &B, const TargetTransformInfo *TTI,
	RecurrenceDescriptor &Desc, Value *Src);			RecurrenceDescriptor &Desc, Value *Src);

				/// Create an ordered reduction intrinsic using the given recurrence
				/// descriptor \p Desc.
				Value *createOrderedReduction(IRBuilderBase &B, RecurrenceDescriptor &Desc,
				Value *Src, Value *Start);

	/// Get the intersection (logical and) of all of the potential IR flags			/// Get the intersection (logical and) of all of the potential IR flags
	/// of each scalar operation (VL) that will be converted into a vector (I).			/// of each scalar operation (VL) that will be converted into a vector (I).
	/// If OpValue is non-null, we only consider operations similar to OpValue			/// If OpValue is non-null, we only consider operations similar to OpValue
	/// when intersecting.			/// when intersecting.
	/// Flag set: NSW, NUW, exact, and all of fast-math.			/// Flag set: NSW, NUW, exact, and all of fast-math.
	void propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue = nullptr);			void propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue = nullptr);

	/// Returns true if we can prove that \p S is defined and always negative in			/// Returns true if we can prove that \p S is defined and always negative in
	[... 122 lines not shown ...]

llvm/lib/Analysis/IVDescriptors.cpp

[... 183 lines not shown ...]	while (!Worklist.empty()) {
// we haven't yet visited.		// we haven't yet visited.
for (Value *O : cast<User>(Val)->operands())		for (Value *O : cast<User>(Val)->operands())
if (auto *I = dyn_cast<Instruction>(O))		if (auto *I = dyn_cast<Instruction>(O))
if (TheLoop->contains(I) && !Visited.count(I))		if (TheLoop->contains(I) && !Visited.count(I))
Worklist.push_back(I);		Worklist.push_back(I);
}		}
}		}

		// Check if a given Phi node can be recognized as an ordered reduction for
		// vectorizing floating point operations without unsafe math.
		static bool checkOrderedReduction(RecurKind Kind, Instruction *ExactFPMathInst,
		Instruction *Exit, PHINode *Phi) {
		// Currently only FAdd is supported
		if (Kind != RecurKind::FAdd)
		return false;

		bool IsOrdered =
		Exit->getOpcode() == Instruction::FAdd && Exit == ExactFPMathInst;

		// The only pattern accepted is the one in which the reduction PHI
		// is used as one of the operands of the exit instruction
		auto *LHS = Exit->getOperand(0);
		auto *RHS = Exit->getOperand(1);
		IsOrdered &= ((LHS == Phi) || (RHS == Phi));

		if (!IsOrdered)
		return false;

		LLVM_DEBUG(dbgs() << "LV: Found an ordered reduction: Phi: " << *Phi
		<< ", ExitInst: " << *Exit << "\n");

		return true;
		}

bool RecurrenceDescriptor::AddReductionVar(PHINode *Phi, RecurKind Kind,		bool RecurrenceDescriptor::AddReductionVar(PHINode *Phi, RecurKind Kind,
Loop *TheLoop, FastMathFlags FuncFMF,		Loop *TheLoop, FastMathFlags FuncFMF,
RecurrenceDescriptor &RedDes,		RecurrenceDescriptor &RedDes,
DemandedBits *DB,		DemandedBits *DB,
AssumptionCache *AC,		AssumptionCache *AC,
DominatorTree *DT) {		DominatorTree *DT) {
if (Phi->getNumIncomingValues() != 2)		if (Phi->getNumIncomingValues() != 2)
return false;		return false;
[... 211 lines not shown ...]	bool RecurrenceDescriptor::AddReductionVar(PHINode *Phi, RecurKind Kind,
// This means we have seen one but not the other instruction of the		// This means we have seen one but not the other instruction of the
// pattern or more than just a select and cmp.		// pattern or more than just a select and cmp.
if (isMinMaxRecurrenceKind(Kind) && NumCmpSelectPatternInst != 2)		if (isMinMaxRecurrenceKind(Kind) && NumCmpSelectPatternInst != 2)
return false;		return false;

if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)		if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)
return false;		return false;

		const bool IsOrdered = checkOrderedReduction(
		Kind, ReduxDesc.getExactFPMathInst(), ExitInstruction, Phi);

if (Start != Phi) {		if (Start != Phi) {
// If the starting value is not the same as the phi node, we speculatively		// If the starting value is not the same as the phi node, we speculatively
// looked through an 'and' instruction when evaluating a potential		// looked through an 'and' instruction when evaluating a potential
// arithmetic reduction to determine if it may have been type-promoted.		// arithmetic reduction to determine if it may have been type-promoted.
//		//
// We now compute the minimal bit width that is required to represent the		// We now compute the minimal bit width that is required to represent the
// reduction. If this is the same width that was indicated by the 'and', we		// reduction. If this is the same width that was indicated by the 'and', we
// can represent the reduction in the smaller type. The 'and' instruction		// can represent the reduction in the smaller type. The 'and' instruction
[... 38 lines not shown ...]	bool RecurrenceDescriptor::AddReductionVar(PHINode *Phi, RecurKind Kind,
// only have a single instruction with out-of-loop users.		// only have a single instruction with out-of-loop users.

// The ExitInstruction(Instruction which is allowed to have out-of-loop users)		// The ExitInstruction(Instruction which is allowed to have out-of-loop users)
// is saved as part of the RecurrenceDescriptor.		// is saved as part of the RecurrenceDescriptor.

// Save the description of this reduction variable.		// Save the description of this reduction variable.
RecurrenceDescriptor RD(RdxStart, ExitInstruction, Kind, FMF,		RecurrenceDescriptor RD(RdxStart, ExitInstruction, Kind, FMF,
ReduxDesc.getExactFPMathInst(), RecurrenceType,		ReduxDesc.getExactFPMathInst(), RecurrenceType,
IsSigned, CastInsts);		IsSigned, IsOrdered, CastInsts);
RedDes = RD;		RedDes = RD;

return true;		return true;
}		}

RecurrenceDescriptor::InstDesc		RecurrenceDescriptor::InstDesc
RecurrenceDescriptor::isMinMaxSelectCmpPattern(Instruction *I,		RecurrenceDescriptor::isMinMaxSelectCmpPattern(Instruction *I,
const InstDesc &Prev) {		const InstDesc &Prev) {
▲ Show 20 Lines • Show All 746 Lines • Show Last 20 Lines

llvm/lib/Transforms/Utils/LoopUtils.cpp

Show First 20 Lines • Show All 1,064 Lines • ▼ Show 20 Lines	Value *llvm::createTargetReduction(IRBuilderBase &B,
// TODO: Support in-order reductions based on the recurrence descriptor.		// TODO: Support in-order reductions based on the recurrence descriptor.
// All ops in the reduction inherit fast-math-flags from the recurrence		// All ops in the reduction inherit fast-math-flags from the recurrence
// descriptor.		// descriptor.
IRBuilderBase::FastMathFlagGuard FMFGuard(B);		IRBuilderBase::FastMathFlagGuard FMFGuard(B);
B.setFastMathFlags(Desc.getFastMathFlags());		B.setFastMathFlags(Desc.getFastMathFlags());
return createSimpleTargetReduction(B, TTI, Src, Desc.getRecurrenceKind());		return createSimpleTargetReduction(B, TTI, Src, Desc.getRecurrenceKind());
}		}

		Value *llvm::createOrderedReduction(IRBuilderBase &B,
		RecurrenceDescriptor &Desc, Value *Src,
		Value *Start) {
		auto Kind = Desc.getRecurrenceKind();
		assert(Kind == RecurKind::FAdd && "Unexpected reduction kind");
		assert(Src->getType()->isVectorTy() && "Expected a vector type");
		assert(!Start->getType()->isVectorTy() && "Expected a scalar type");

		return B.CreateFAddReduce(Start, Src);
		peterwaller-arm (Done): Nit: this variable appears to be unused?
		}

void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue) {		void llvm::propagateIRFlags(Value *I, ArrayRef<Value *> VL, Value *OpValue) {
auto *VecOp = dyn_cast<Instruction>(I);		auto *VecOp = dyn_cast<Instruction>(I);
if (!VecOp)		if (!VecOp)
return;		return;
auto *Intersection = (OpValue == nullptr) ? dyn_cast<Instruction>(VL[0])		auto *Intersection = (OpValue == nullptr) ? dyn_cast<Instruction>(VL[0])
: dyn_cast<Instruction>(OpValue);		: dyn_cast<Instruction>(OpValue);
if (!Intersection)		if (!Intersection)
return;		return;
▲ Show 20 Lines • Show All 802 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


Show First 20 Lines • Show All 324 Lines • ▼ Show 20 Lines	cl::desc("The maximum interleave count to use when interleaving a scalar "
"reduction in a nested loop."));		"reduction in a nested loop."));

static cl::opt<bool>		static cl::opt<bool>
PreferInLoopReductions("prefer-inloop-reductions", cl::init(false),		PreferInLoopReductions("prefer-inloop-reductions", cl::init(false),
cl::Hidden,		cl::Hidden,
cl::desc("Prefer in-loop vector reductions, "		cl::desc("Prefer in-loop vector reductions, "
"overriding the targets preference."));		"overriding the targets preference."));

		cl::opt<bool> EnableStrictReductions(
		fhahn (Not Done): What's the plan for this option going forward? Would it be safe to remove it and let the cost model decide whether it is worthwhile to vectorize with strict reductions only?
		kmclaughlin (Author, Not Done): Hi @fhahn, I think more testing is needed before we can be confident in removing the `-enable-strict-reductions` flag, for instance adding more tests for scalable types and running LNT with the flag enabled. After this, the plan is to remove the option (after making any necessary changes to the cost model to decide if using in-order reductions is beneficial).
		"enable-strict-reductions", cl::init(false), cl::Hidden,
		peterwaller-arm (Done): nit: As 'reduction' is spelt out on the adjacent arguments I would prefer consistency over brevity unless there is a reason to differ here.
		cl::desc("Enable the vectorisation of loops with in-order (strict) "
		"FP reductions"));

static cl::opt<bool> PreferPredicatedReductionSelect(		static cl::opt<bool> PreferPredicatedReductionSelect(
"prefer-predicated-reduction-select", cl::init(false), cl::Hidden,		"prefer-predicated-reduction-select", cl::init(false), cl::Hidden,
cl::desc(		cl::desc(
"Prefer predicating a reduction operation over an after loop select."));		"Prefer predicating a reduction operation over an after loop select."));

cl::opt<bool> EnableVPlanNativePath(		cl::opt<bool> EnableVPlanNativePath(
"enable-vplan-native-path", cl::init(false), cl::Hidden,		"enable-vplan-native-path", cl::init(false), cl::Hidden,
cl::desc("Enable VPlan-native vectorization path with "		cl::desc("Enable VPlan-native vectorization path with "
▲ Show 20 Lines • Show All 3,913 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi,
// scalar epilogue); in that case, the exiting path through middle will be		// scalar epilogue); in that case, the exiting path through middle will be
// dynamically dead and the value picked for the phi doesn't matter.		// dynamically dead and the value picked for the phi doesn't matter.
for (PHINode &LCSSAPhi : LoopExitBlock->phis())		for (PHINode &LCSSAPhi : LoopExitBlock->phis())
if (any_of(LCSSAPhi.incoming_values(),		if (any_of(LCSSAPhi.incoming_values(),
[Phi](Value *V) { return V == Phi; }))		[Phi](Value *V) { return V == Phi; }))
LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop, LoopMiddleBlock);		LCSSAPhi.addIncoming(ExtractForPhiUsedOutsideLoop, LoopMiddleBlock);
}		}

		static bool useOrderedReductions(RecurrenceDescriptor &RdxDesc) {
		return EnableStrictReductions && RdxDesc.isOrdered();
		}

void InnerLoopVectorizer::fixReduction(PHINode *Phi, VPTransformState &State) {		void InnerLoopVectorizer::fixReduction(PHINode *Phi, VPTransformState &State) {
// Get it's reduction variable descriptor.		// Get it's reduction variable descriptor.
assert(Legal->isReductionVariable(Phi) &&		assert(Legal->isReductionVariable(Phi) &&
"Unable to find the reduction variable");		"Unable to find the reduction variable");
RecurrenceDescriptor RdxDesc = Legal->getReductionVars()[Phi];		RecurrenceDescriptor RdxDesc = Legal->getReductionVars()[Phi];

RecurKind RK = RdxDesc.getRecurrenceKind();		RecurKind RK = RdxDesc.getRecurrenceKind();
TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();		TrackingVH<Value> ReductionStartValue = RdxDesc.getRecurrenceStartValue();
Show All 13 Lines	void InnerLoopVectorizer::fixReduction(PHINode *Phi, VPTransformState &State) {
// Reductions do not have to start at zero. They can start with		// Reductions do not have to start at zero. They can start with
// any loop invariant values.		// any loop invariant values.
BasicBlock *Latch = OrigLoop->getLoopLatch();		BasicBlock *Latch = OrigLoop->getLoopLatch();
Value *LoopVal = Phi->getIncomingValueForBlock(Latch);		Value *LoopVal = Phi->getIncomingValueForBlock(Latch);

for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *VecRdxPhi = State.get(State.Plan->getVPValue(Phi), Part);		Value *VecRdxPhi = State.get(State.Plan->getVPValue(Phi), Part);
Value *Val = State.get(State.Plan->getVPValue(LoopVal), Part);		Value *Val = State.get(State.Plan->getVPValue(LoopVal), Part);
		if (IsInLoopReductionPhi && useOrderedReductions(RdxDesc) &&
		State.VF.isVector())
		dmgreen (Not Done): Can this use isScalar(), or is it to handle scalable single items too? We generate multiple phi's, but only use the first one? The others get DCE'd?
		david-arm (Done): Yes, I think in this case we can just use State.VF.isVector() since that also covers the case when VF=(1,scalable)
		kmclaughlin (Author, Done): Yes, we generate multiple phis and the unused phis are removed by InstCombine. Changed this to use `State.VF.isVector()` as suggested above.
		fhahn (Not Done): I think there's test coverage missing for this code. All tests pass if this change gets removed. @kmclaughlin could you add a test?
		kmclaughlin (Author, Not Done): The `@fadd_strict_unroll` test should have been testing this part of fixReduction(), but there was a mistake in the CHECK lines I added for the vector.reduce.fadd intrinsics. I've pushed rG93f54fae9dda to address this, and also created D100570 to try and prevent multiple unused Phis from being generated to begin with.
		Val = State.get(State.Plan->getVPValue(LoopVal), UF - 1);
cast<PHINode>(VecRdxPhi)		cast<PHINode>(VecRdxPhi)
->addIncoming(Val, LI->getLoopFor(LoopVectorBody)->getLoopLatch());		->addIncoming(Val, LI->getLoopFor(LoopVectorBody)->getLoopLatch());
}		}

// Before each round, move the insertion point right between		// Before each round, move the insertion point right between
// the PHIs and the values we are going to write.		// the PHIs and the values we are going to write.
// This allows us to write both PHINodes and the extractelement		// This allows us to write both PHINodes and the extractelement
// instructions.		// instructions.
▲ Show 20 Lines • Show All 75 Lines • ▼ Show 20 Lines	void InnerLoopVectorizer::fixReduction(PHINode *Phi, VPTransformState &State) {
// The middle block terminator has already been assigned a DebugLoc here (the		// The middle block terminator has already been assigned a DebugLoc here (the
// OrigLoop's single latch terminator). We want the whole middle block to		// OrigLoop's single latch terminator). We want the whole middle block to
// appear to execute on this line because: (a) it is all compiler generated,		// appear to execute on this line because: (a) it is all compiler generated,
// (b) these instructions are always executed after evaluating the latch		// (b) these instructions are always executed after evaluating the latch
// conditional branch, and (c) other passes may add new predecessors which		// conditional branch, and (c) other passes may add new predecessors which
// terminate on this line. This is the easiest way to ensure we don't		// terminate on this line. This is the easiest way to ensure we don't
// accidentally cause an extra step back into the loop while debugging.		// accidentally cause an extra step back into the loop while debugging.
setDebugLocFromInst(Builder, LoopMiddleBlock->getTerminator());		setDebugLocFromInst(Builder, LoopMiddleBlock->getTerminator());
{		if (IsInLoopReductionPhi && useOrderedReductions(RdxDesc))
		ReducedPartRdx = State.get(LoopExitInstDef, UF - 1);
		else {
// Floating-point operations should have some FMF to enable the reduction.		// Floating-point operations should have some FMF to enable the reduction.
IRBuilderBase::FastMathFlagGuard FMFG(Builder);		IRBuilderBase::FastMathFlagGuard FMFG(Builder);
Builder.setFastMathFlags(RdxDesc.getFastMathFlags());		Builder.setFastMathFlags(RdxDesc.getFastMathFlags());
for (unsigned Part = 1; Part < UF; ++Part) {		for (unsigned Part = 1; Part < UF; ++Part) {
Value *RdxPart = State.get(LoopExitInstDef, Part);		Value *RdxPart = State.get(LoopExitInstDef, Part);
if (Op != Instruction::ICmp && Op != Instruction::FCmp) {		if (Op != Instruction::ICmp && Op != Instruction::FCmp) {
ReducedPartRdx = Builder.CreateBinOp(		ReducedPartRdx = Builder.CreateBinOp(
(Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");		(Instruction::BinaryOps)Op, RdxPart, ReducedPartRdx, "bin.rdx");
} else {		} else {
		david-arm (Done): nit: I think you can actually now just kill the `{` brace here and the other one `}` at line 4377. That will avoid the additional indentation.
ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);		ReducedPartRdx = createMinMaxOp(Builder, RK, ReducedPartRdx, RdxPart);
		dmgreen (Not Done): `Cost->isInLoopReduction(Phi)` -> `IsInLoopReductionPhi`. Does this work in-order for UF > 1? I feel like the order of the adds will be changed, so that it's no longer in-order. If not, is this code needed, if it can only be using UF == 1 and ReducedPartRdx is set to State.get(LoopExitInstDef, 0) above?
		kmclaughlin (Author, Done): The changes I made to `VPReductionRecipe::execute` should ensure that ordering is enforced when the UF > 1, through the way the chain is constructed for each unrolled part. I was missing a change to fixReduction to correctly set the incoming values of Phi for each unrolled part in my first patch; however, I believe this is fixed now and the test for unrolling (`@fadd_strict_unroll`) has been updated accordingly.
		dmgreen (Not Done): Yeah, looks OK I think. All the back-to-back operations might be slow, forced down a critical path, but the order seems good.
}		}
}		}
}		}

// Create the reduction after the loop. Note that inloop reductions create the		// Create the reduction after the loop. Note that inloop reductions create the
// target reduction in the loop using a Reduction recipe.		// target reduction in the loop using a Reduction recipe.
if (VF.isVector() && !IsInLoopReductionPhi) {		if (VF.isVector() && !IsInLoopReductionPhi) {
ReducedPartRdx =		ReducedPartRdx =
▲ Show 20 Lines • Show All 1,672 Lines • ▼ Show 20 Lines	for (Instruction &I : BB->instructionsWithoutDebug()) {
continue;		continue;

// Examine PHI nodes that are reduction variables. Update the type to		// Examine PHI nodes that are reduction variables. Update the type to
// account for the recurrence type.		// account for the recurrence type.
if (auto *PN = dyn_cast<PHINode>(&I)) {		if (auto *PN = dyn_cast<PHINode>(&I)) {
if (!Legal->isReductionVariable(PN))		if (!Legal->isReductionVariable(PN))
continue;		continue;
RecurrenceDescriptor RdxDesc = Legal->getReductionVars()[PN];		RecurrenceDescriptor RdxDesc = Legal->getReductionVars()[PN];
if (PreferInLoopReductions \|\|		if (PreferInLoopReductions \|\| useOrderedReductions(RdxDesc) \|\|
TTI.preferInLoopReduction(RdxDesc.getOpcode(),		TTI.preferInLoopReduction(RdxDesc.getOpcode(),
RdxDesc.getRecurrenceType(),		RdxDesc.getRecurrenceType(),
TargetTransformInfo::ReductionFlags()))		TargetTransformInfo::ReductionFlags()))
continue;		continue;
T = RdxDesc.getRecurrenceType();		T = RdxDesc.getRecurrenceType();
}		}

// Examine the stored values.		// Examine the stored values.
▲ Show 20 Lines • Show All 1,562 Lines • ▼ Show 20 Lines	for (auto &Reduction : Legal->getReductionVars()) {

// We don't collect reductions that are type promoted (yet).		// We don't collect reductions that are type promoted (yet).
if (RdxDesc.getRecurrenceType() != Phi->getType())		if (RdxDesc.getRecurrenceType() != Phi->getType())
continue;		continue;

// If the target would prefer this reduction to happen "in-loop", then we		// If the target would prefer this reduction to happen "in-loop", then we
// want to record it as such.		// want to record it as such.
unsigned Opcode = RdxDesc.getOpcode();		unsigned Opcode = RdxDesc.getOpcode();
if (!PreferInLoopReductions &&		if (!PreferInLoopReductions && !useOrderedReductions(RdxDesc) &&
!TTI.preferInLoopReduction(Opcode, Phi->getType(),		!TTI.preferInLoopReduction(Opcode, Phi->getType(),
TargetTransformInfo::ReductionFlags()))		TargetTransformInfo::ReductionFlags()))
continue;		continue;

// Check that we can correctly put the reductions into the loop, by		// Check that we can correctly put the reductions into the loop, by
// finding the chain of operations that leads from the phi to the loop		// finding the chain of operations that leads from the phi to the loop
// exit value.		// exit value.
SmallVector<Instruction *, 4> ReductionOperations =		SmallVector<Instruction *, 4> ReductionOperations =
▲ Show 20 Lines • Show All 1,526 Lines • ▼ Show 20 Lines
void VPInterleaveRecipe::execute(VPTransformState &State) {		void VPInterleaveRecipe::execute(VPTransformState &State) {
assert(!State.Instance && "Interleave group being replicated.");		assert(!State.Instance && "Interleave group being replicated.");
State.ILV->vectorizeInterleaveGroup(IG, definedValues(), State, getAddr(),		State.ILV->vectorizeInterleaveGroup(IG, definedValues(), State, getAddr(),
getStoredValues(), getMask());		getStoredValues(), getMask());
}		}

void VPReductionRecipe::execute(VPTransformState &State) {		void VPReductionRecipe::execute(VPTransformState &State) {
assert(!State.Instance && "Reduction being replicated.");		assert(!State.Instance && "Reduction being replicated.");
		Value *PrevInChain = State.get(getChainOp(), 0);
for (unsigned Part = 0; Part < State.UF; ++Part) {		for (unsigned Part = 0; Part < State.UF; ++Part) {
RecurKind Kind = RdxDesc->getRecurrenceKind();		RecurKind Kind = RdxDesc->getRecurrenceKind();
		bool IsOrdered = useOrderedReductions(*RdxDesc);
Value *NewVecOp = State.get(getVecOp(), Part);		Value *NewVecOp = State.get(getVecOp(), Part);
if (VPValue *Cond = getCondOp()) {		if (VPValue *Cond = getCondOp()) {
Value *NewCond = State.get(Cond, Part);		Value *NewCond = State.get(Cond, Part);
VectorType *VecTy = cast<VectorType>(NewVecOp->getType());		VectorType *VecTy = cast<VectorType>(NewVecOp->getType());
Constant *Iden = RecurrenceDescriptor::getRecurrenceIdentity(		Constant *Iden = RecurrenceDescriptor::getRecurrenceIdentity(
Kind, VecTy->getElementType(), RdxDesc->getFastMathFlags());		Kind, VecTy->getElementType(), RdxDesc->getFastMathFlags());
Constant *IdenVec =		Constant *IdenVec =
ConstantVector::getSplat(VecTy->getElementCount(), Iden);		ConstantVector::getSplat(VecTy->getElementCount(), Iden);
Value *Select = State.Builder.CreateSelect(NewCond, NewVecOp, IdenVec);		Value *Select = State.Builder.CreateSelect(NewCond, NewVecOp, IdenVec);
NewVecOp = Select;		NewVecOp = Select;
}		}
Value *NewRed =		Value *NewRed;
createTargetReduction(State.Builder, TTI, *RdxDesc, NewVecOp);
Value *PrevInChain = State.get(getChainOp(), Part);
Value *NextInChain;		Value *NextInChain;
		if (IsOrdered) {
		NewRed = createOrderedReduction(State.Builder, *RdxDesc, NewVecOp,
		david-arm (Done): nit: I think you can just do: `Value *C = Builder.getInt1(1);`
		PrevInChain);
		PrevInChain = NewRed;
		dmgreen (Done): I'm suspecting that the MaskOfOnes isn't needed, and any masking needed would be handled by the Select created above? Otherwise it can use the Mask from the Cond operand to detect when the instruction needs to be predicated and when not.
		} else {
		PrevInChain = State.get(getChainOp(), Part);
		NewRed = createTargetReduction(State.Builder, TTI, *RdxDesc, NewVecOp);
		}
if (RecurrenceDescriptor::isMinMaxRecurrenceKind(Kind)) {		if (RecurrenceDescriptor::isMinMaxRecurrenceKind(Kind)) {
NextInChain =		NextInChain =
createMinMaxOp(State.Builder, RdxDesc->getRecurrenceKind(),		createMinMaxOp(State.Builder, RdxDesc->getRecurrenceKind(),
NewRed, PrevInChain);		NewRed, PrevInChain);
} else {		} else if (IsOrdered)
		NextInChain = NewRed;
		else {
NextInChain = State.Builder.CreateBinOp(		NextInChain = State.Builder.CreateBinOp(
(Instruction::BinaryOps)getUnderlyingInstr()->getOpcode(), NewRed,		(Instruction::BinaryOps)getUnderlyingInstr()->getOpcode(), NewRed,
		david-arm (Done): nit: You could avoid some of the extra indentation below if you changed the `} else {` to something like: `} else if (IsOrdered) NextInChain = NewRed; } else { ... }`
PrevInChain);		PrevInChain);
}		}
State.set(this, NextInChain, Part);		State.set(this, NextInChain, Part);
}		}
}		}

void VPReplicateRecipe::execute(VPTransformState &State) {		void VPReplicateRecipe::execute(VPTransformState &State) {
if (State.Instance) { // Generate a single instance.		if (State.Instance) { // Generate a single instance.
▲ Show 20 Lines • Show All 801 Lines • Show Last 20 Lines
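
The ordering question raised in the review of `VPReductionRecipe::execute` (whether UF > 1 keeps the adds in order) can be illustrated with a small standalone model. This is a sketch only: `orderedReduceFAdd`, `scalarSum`, `strictVectorSum` and `reassociatedSum` are hypothetical names that do not exist in LLVM, and plain arrays stand in for vector registers. The strict form threads one scalar value (`PrevInChain`) through every unrolled part, so it performs the same additions in the same order as the scalar loop; the reassociated form keeps one accumulator per lane and sums them at the end, which is only valid under fast-math.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an ordered llvm.vector.reduce.fadd: adds the
// lanes to Start strictly from lane 0 upwards, with no reassociation.
template <std::size_t VF>
float orderedReduceFAdd(float Start, const std::array<float, VF> &Lanes) {
  float Acc = Start;
  for (float Lane : Lanes)
    Acc += Lane;
  return Acc;
}

// Scalar reference loop.
float scalarSum(float Start, const std::vector<float> &Data) {
  float Acc = Start;
  for (float X : Data)
    Acc += X;
  return Acc;
}

// Strict form: a single scalar chain is passed from part to part, so the
// additions happen in the original element order even when UF > 1.
template <std::size_t VF, std::size_t UF>
float strictVectorSum(float Start, const std::vector<float> &Data) {
  float PrevInChain = Start;
  std::size_t I = 0;
  for (; I + VF * UF <= Data.size(); I += VF * UF) {
    for (std::size_t Part = 0; Part < UF; ++Part) {
      std::array<float, VF> Lanes;
      for (std::size_t L = 0; L < VF; ++L)
        Lanes[L] = Data[I + Part * VF + L];
      PrevInChain = orderedReduceFAdd(PrevInChain, Lanes);
    }
  }
  for (; I < Data.size(); ++I) // scalar epilogue for the remainder
    PrevInChain += Data[I];
  return PrevInChain;
}

// Reassociated form (fast-math style): one accumulator per lane, combined
// after the loop. The order of additions differs from the scalar loop.
template <std::size_t VF>
float reassociatedSum(float Start, const std::vector<float> &Data) {
  std::array<float, VF> Acc{}; // lane accumulators start at zero
  std::size_t I = 0;
  for (; I + VF <= Data.size(); I += VF)
    for (std::size_t L = 0; L < VF; ++L)
      Acc[L] += Data[I + L];
  float Result = Start;
  for (float Lane : Acc)
    Result += Lane;
  for (; I < Data.size(); ++I)
    Result += Data[I];
  return Result;
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` with a start value of 0, the scalar loop and both strict variants (UF = 1 and UF = 2) produce 1, while the lane-wise form produces 2, because the small addends are absorbed by the large intermediate value only when the original order is kept.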

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -instcombine -mtriple aarch64-unknown-linux-gnu -enable-strict-reductions -S | FileCheck %s -check-prefix=CHECK

				fhahn (Done): `+sve` is not needed for the test? Can you also add a negative test?
				kmclaughlin (Author, Done): Hi @fhahn, I've added a negative test where we have multiple fadds in the loop (@fadd_multiple) & removed `+sve`
				define float @fadd_strict(float* noalias nocapture readonly %a, i64 %n) {
				; CHECK-LABEL: @fadd_strict
				; CHECK: vector.body:
				; CHECK: %[[VEC_PHI:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX:.*]], %vector.body ]
				; CHECK: %[[LOAD:.*]] = load <8 x float>, <8 x float>*
				; CHECK: %[[RDX]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[VEC_PHI]], <8 x float> %[[LOAD]])
				; CHECK: for.end
				dmgreen (Not Done): Note that the identity value for a fadd (without nsz) should be -0.0, not 0.0. Otherwise 0.0 + n doesn't always equal n. https://alive2.llvm.org/ce/z/LS2RK3 I believe the select would only be needed if the reduction is predicated though, as mentioned above.
				kmclaughlin (Author, Done): I've created a separate patch which changes the identity value to -0.0 (D98963)
				; CHECK: %[[PHI:.*]] = phi float [ %[[SCALAR:.*]], %for.body ], [ %[[RDX]], %middle.block ]
				; CHECK: ret float %[[PHI]]
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%add = fadd float %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret float %add
				}
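
The identity-value remark in the review thread on this test can be checked with a small standalone model of a predicated ordered reduction. This is a sketch under stated assumptions: `maskedOrderedFAdd` is a hypothetical helper, not an LLVM API, and it models the select-with-identity that `VPReductionRecipe::execute` emits for a conditional reduction by substituting `Identity` for every masked-off lane before the in-order add.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical model of select(Mask, Lanes, splat(Identity)) followed by an
// ordered fadd reduction seeded with Start.
template <std::size_t VF>
float maskedOrderedFAdd(float Start, const std::array<float, VF> &Lanes,
                        const std::array<bool, VF> &Mask, float Identity) {
  float Acc = Start;
  for (std::size_t L = 0; L < VF; ++L)
    Acc += Mask[L] ? Lanes[L] : Identity; // inactive lanes add the identity
  return Acc;
}
```

With every lane masked off and a start value of -0.0, using +0.0 as the identity turns the result into +0.0 (since -0.0 + 0.0 is +0.0 under round-to-nearest), whereas using -0.0 leaves the start value unchanged. That is the reason -0.0 is the correct identity for fadd when `nsz` is absent.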

				define float @fadd_strict_unroll(float* noalias nocapture readonly %a, i64 %n) {
				; CHECK-LABEL: @fadd_strict_unroll
				; CHECK: vector.body:
				; CHECK: %[[VEC_PHI1:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX4:.*]], %vector.body ]
				; CHECK: %[[LOAD1:.*]] = load <8 x float>, <8 x float>*
				; CHECK: %[[LOAD2:.*]] = load <8 x float>, <8 x float>*
				; CHECK: %[[LOAD3:.*]] = load <8 x float>, <8 x float>*
				; CHECK: %[[LOAD4:.*]] = load <8 x float>, <8 x float>*
				; CHECK: %[[RDX1:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[VEC_PHI1]], <8 x float> %[[LOAD1]])
				; CHECK: %[[RDX2:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[RDX1]], <8 x float> %[[LOAD2]])
				; CHECK: %[[RDX3:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[RDX2]], <8 x float> %[[LOAD3]])
				; CHECK: %[[RDX4:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[RDX3]], <8 x float> %[[LOAD4]])
				; CHECK: for.end
				; CHECK: %[[PHI:.*]] = phi float [ %[[SCALAR:.*]], %for.body ], [ %[[RDX4]], %middle.block ]
				; CHECK: ret float %[[PHI]]
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%add = fadd float %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !1

				for.end:
				ret float %add
				}

				define void @fadd_strict_interleave(float* noalias nocapture readonly %a, float* noalias nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @fadd_strict_interleave
				; CHECK: entry
				; CHECK: %[[ARRAYIDX:.*]] = getelementptr inbounds float, float* %a, i64 1
				; CHECK: %[[LOAD1:.*]] = load float, float* %a
				; CHECK: %[[LOAD2:.*]] = load float, float* %[[ARRAYIDX]]
				; CHECK: vector.body
				; CHECK: %[[VEC_PHI1:.*]] = phi float [ %[[LOAD2]], %vector.ph ], [ %[[RDX2:.*]], %vector.body ]
				; CHECK: %[[VEC_PHI2:.*]] = phi float [ %[[LOAD1]], %vector.ph ], [ %[[RDX1:.*]], %vector.body ]
				; CHECK: %[[WIDE_LOAD:.*]] = load <8 x float>, <8 x float>*
				; CHECK: %[[STRIDED1:.*]] = shufflevector <8 x float> %[[WIDE_LOAD]], <8 x float> poison, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
				; CHECK: %[[STRIDED2:.*]] = shufflevector <8 x float> %[[WIDE_LOAD]], <8 x float> poison, <4 x i32> <i32 1, i32 3, i32 5, i32 7>
				; CHECK: %[[RDX1]] = call float @llvm.vector.reduce.fadd.v4f32(float %[[VEC_PHI2]], <4 x float> %[[STRIDED1]])
				; CHECK: %[[RDX2]] = call float @llvm.vector.reduce.fadd.v4f32(float %[[VEC_PHI1]], <4 x float> %[[STRIDED2]])
				; CHECK: for.end
				; CHECK ret void
				entry:
				%arrayidxa = getelementptr inbounds float, float* %a, i64 1
				%a1 = load float, float* %a, align 4
				%a2 = load float, float* %arrayidxa, align 4
				br label %for.body

				for.body:
				%add.phi1 = phi float [ %a2, %entry ], [ %add2, %for.body ]
				%add.phi2 = phi float [ %a1, %entry ], [ %add1, %for.body ]
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%arrayidxb1 = getelementptr inbounds float, float* %b, i64 %iv
				%0 = load float, float* %arrayidxb1, align 4
				%add1 = fadd float %0, %add.phi2
				%or = or i64 %iv, 1
				%arrayidxb2 = getelementptr inbounds float, float* %b, i64 %or
				%1 = load float, float* %arrayidxb2, align 4
				%add2 = fadd float %1, %add.phi1
				%iv.next = add nuw nsw i64 %iv, 2
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !2

				for.end:
				store float %add1, float* %a, align 4
				store float %add2, float* %arrayidxa, align 4
				ret void
				}

				define float @fadd_invariant(float* noalias nocapture readonly %a, float* noalias nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @fadd_invariant
				; CHECK: vector.body
				; CHECK: %[[VEC_PHI1:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX:.*]], %vector.body ]
				; CHECK: %[[LOAD1:.*]] = load <4 x float>, <4 x float>*
				; CHECK: %[[LOAD2:.*]] = load <4 x float>, <4 x float>*
				; CHECK: %[[ADD:.*]] = fadd <4 x float> %[[LOAD1]], %[[LOAD2]]
				; CHECK: %[[RDX]] = call float @llvm.vector.reduce.fadd.v4f32(float %[[VEC_PHI1]], <4 x float> %[[ADD]])
				; CHECK: for.end.loopexit
				; CHECK: %[[EXIT_PHI:.*]] = phi float [ %[[SCALAR:.*]], %for.body ], [ %[[RDX]], %middle.block ]
				; CHECK: for.end
				; CHECK: %[[PHI:.*]] = phi float [ 0.000000e+00, %entry ], [ %[[EXIT_PHI]], %for.end.loopexit ]
				; CHECK: ret float %[[PHI]]
				entry:
				%arrayidx = getelementptr inbounds float, float* %a, i64 1
				%0 = load float, float* %arrayidx, align 4
				fhahn (Not Done): I don't think this test is testing what you intended it to. The condition `> 0.5f` is in the loop pre-header. I think it should be in the loop body, so it tests generating the correct mask? (The C source here is a bit misleading IMO)
				david-arm (Not Done): Hi @fhahn, I suspect it's because when compiled with clang the condition is hoisted out before we reach the vectoriser, but you're right I think that the condition should be in the loop. At the moment this test is really the same as `@fadd_strict`.
				kmclaughlin (Author, Done): I tried to write a test where the condition is in the loop, but I don't think this is possible yet with just the changes in this patch (I believe there would also be some changes needed to getReductionOpChain to try and look through Phis before I could try this). Since this isn't testing anything that `@fadd_strict` doesn't already, I've removed it and instead added `@fadd_predicated` (which uses loop.vectorize.predicate.enable) and `@fadd_conditional` to test the masking in `VPReductionRecipe::execute` with in-order reductions.
				%cmp1 = fcmp ogt float %0, 5.000000e-01
				br i1 %cmp1, label %for.body, label %for.end

				for.body: ; preds = %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%res.014 = phi float [ 0.000000e+00, %entry ], [ %rdx, %for.body ]
				%arrayidx2 = getelementptr inbounds float, float* %a, i64 %iv
				%1 = load float, float* %arrayidx2, align 4
				%arrayidx4 = getelementptr inbounds float, float* %b, i64 %iv
				%2 = load float, float* %arrayidx4, align 4
				%add = fadd float %1, %2
				%rdx = fadd float %res.014, %add
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !2

				for.end: ; preds = %for.body, %entry
				%res = phi float [ 0.000000e+00, %entry ], [ %rdx, %for.body ]
				ret float %res
				}

				define float @fadd_conditional(float* noalias nocapture readonly %a, float* noalias nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @fadd_conditional
				; CHECK: vector.body:
				; CHECK: %[[PHI:.*]] = phi float [ 1.000000e+00, %vector.ph ], [ %[[RDX:.*]], %pred.load.continue6 ]
				; CHECK: %[[LOAD1:.*]] = load <4 x float>, <4 x float>*
				; CHECK: %[[FCMP1:.*]] = fcmp une <4 x float> %[[LOAD1]], zeroinitializer
				; CHECK: %[[EXTRACT:.*]] = extractelement <4 x i1> %[[FCMP1]], i32 0
				; CHECK: br i1 %[[EXTRACT]], label %pred.load.if, label %pred.load.continue
				; CHECK: pred.load.continue6
				; CHECK: %[[PHI1:.*]] = phi <4 x float> [ %[[PHI0:.*]], %pred.load.continue4 ], [ %[[INS_ELT:.*]], %pred.load.if5 ]
				; CHECK: %[[PRED:.*]] = select <4 x i1> %[[FCMP1]], <4 x float> %[[PHI1]], <4 x float> <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>
				; CHECK: %[[RDX]] = call float @llvm.vector.reduce.fadd.v4f32(float %[[PHI]], <4 x float> %[[PRED]])
				; CHECK: for.body
				; CHECK: %[[RES_PHI:.*]] = phi float [ %[[MERGE_RDX:.*]], %scalar.ph ], [ %[[FADD:.*]], %for.inc ]
				; CHECK: %[[LOAD2:.*]] = load float, float
				; CHECK: %[[FCMP2:.*]] = fcmp une float %[[LOAD2]], 0.000000e+00
				; CHECK: br i1 %[[FCMP2]], label %if.then, label %for.inc
				; CHECK: if.then
				; CHECK: %[[LOAD3:.*]] = load float, float
				; CHECK: br label %for.inc
				; CHECK: for.inc
				; CHECK: %[[PHI2:.*]] = phi float [ %[[LOAD3]], %if.then ], [ 3.000000e+00, %for.body ]
				; CHECK: %[[FADD]] = fadd float %[[RES_PHI]], %[[PHI2]]
				; CHECK: for.end
				; CHECK: %[[RDX_PHI:.*]] = phi float [ %[[FADD]], %for.inc ], [ %[[RDX]], %middle.block ]
				; CHECK: ret float %[[RDX_PHI]]
				entry:
				br label %for.body

				for.body: ; preds = %for.inc, %entry
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.inc ]
				%res = phi float [ 1.000000e+00, %entry ], [ %fadd, %for.inc ]
				%arrayidx = getelementptr inbounds float, float* %b, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%tobool = fcmp une float %0, 0.000000e+00
				br i1 %tobool, label %if.then, label %for.inc

				if.then: ; preds = %for.body
				%arrayidx2 = getelementptr inbounds float, float* %a, i64 %iv
				%1 = load float, float* %arrayidx2, align 4
				br label %for.inc

				for.inc:
				%phi = phi float [ %1, %if.then ], [ 3.000000e+00, %for.body ]
				%fadd = fadd float %res, %phi
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !2

				for.end:
				%rdx = phi float [ %fadd, %for.inc ]
				ret float %rdx
				}

				; Test to check that masking is correct, using the "llvm.loop.vectorize.predicate.enable" attribute
				define float @fadd_predicated(float* noalias nocapture %a, i64 %n) {
				; CHECK-LABEL: @fadd_predicated
				; CHECK: vector.ph
				; CHECK: %[[TRIP_MINUS_ONE:.*]] = add i64 %n, -1
				; CHECK: %[[BROADCAST_INS:.*]] = insertelement <2 x i64> poison, i64 %[[TRIP_MINUS_ONE]], i32 0
				; CHECK: %[[SPLAT:.*]] = shufflevector <2 x i64> %[[BROADCAST_INS]], <2 x i64> poison, <2 x i32> zeroinitializer
				; CHECK: vector.body
				; CHECK: %[[RDX_PHI:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX:.*]], %pred.load.continue2 ]
				; CHECK: pred.load.continue2
				; CHECK: %[[PHI:.*]] = phi <2 x float> [ %[[PHI0:.*]], %pred.load.continue ], [ %[[INS_ELT:.*]], %pred.load.if1 ]
				; CHECK: %[[MASK:.*]] = select <2 x i1> %0, <2 x float> %[[PHI]], <2 x float> <float -0.000000e+00, float -0.000000e+00>
				; CHECK: %[[RDX]] = call float @llvm.vector.reduce.fadd.v2f32(float %[[RDX_PHI]], <2 x float> %[[MASK]])
				; CHECK: for.end:
				; CHECK: %[[RES_PHI:.*]] = phi float [ undef, %for.body ], [ %[[RDX]], %middle.block ]
				; CHECK: ret float %[[RES_PHI]]
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
				%sum.02 = phi float [ %l7, %for.body ], [ 0.000000e+00, %entry ]
				%l2 = getelementptr inbounds float, float* %a, i64 %iv
				%l3 = load float, float* %l2, align 4
				%l7 = fadd float %sum.02, %l3
				%iv.next = add i64 %iv, 1
				%exitcond = icmp eq i64 %iv.next, %n
				br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3

				for.end: ; preds = %for.body
				%sum.0.lcssa = phi float [ %l7, %for.body ]
				ret float %sum.0.lcssa
				}

				; Negative test - loop contains multiple fadds which we cannot safely reorder
				define float @fadd_multiple(float* noalias nocapture %a, float* noalias nocapture %b, i64 %n) {
				; CHECK-LABEL: @fadd_multiple
				; CHECK: vector.body
				; CHECK: %[[PHI:.*]] = phi <8 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD2:.*]], %vector.body ]
				; CHECK: %[[VEC_LOAD1:.*]] = load <8 x float>, <8 x float>
				; CHECK: %[[VEC_FADD1:.*]] = fadd <8 x float> %[[PHI]], %[[VEC_LOAD1]]
				; CHECK: %[[VEC_LOAD2:.*]] = load <8 x float>, <8 x float>
				; CHECK: %[[VEC_FADD2]] = fadd <8 x float> %[[VEC_FADD1]], %[[VEC_LOAD2]]
				; CHECK: middle.block
				; CHECK: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> %[[VEC_FADD2]])
				; CHECK: for.body
				; CHECK: %[[SUM:.*]] = phi float [ %bc.merge.rdx, %scalar.ph ], [ %[[FADD2:.*]], %for.body ]
				; CHECK: %[[LOAD1:.*]] = load float, float
				; CHECK: %[[FADD1:.*]] = fadd float %sum, %[[LOAD1]]
				; CHECK: %[[LOAD2:.*]] = load float, float
				; CHECK: %[[FADD2]] = fadd float %[[FADD1]], %[[LOAD2]]
				; CHECK: for.end
				; CHECK: %[[RET:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]
				; CHECK: ret float %[[RET]]
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum = phi float [ -0.000000e+00, %entry ], [ %add3, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%add = fadd float %sum, %0
				%arrayidx2 = getelementptr inbounds float, float* %b, i64 %iv
				%1 = load float, float* %arrayidx2, align 4
				%add3 = fadd float %add, %1
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body
				%rdx = phi float [ %add3, %for.body ]
				ret float %rdx
				}

				!0 = distinct !{!0, !4, !7, !9}
				!1 = distinct !{!1, !4, !8, !9}
				!2 = distinct !{!2, !5, !7, !9}
				!3 = distinct !{!3, !6, !7, !9, !10}
				!4 = !{!"llvm.loop.vectorize.width", i32 8}
				!5 = !{!"llvm.loop.vectorize.width", i32 4}
				!6 = !{!"llvm.loop.vectorize.width", i32 2}
				!7 = !{!"llvm.loop.interleave.count", i32 1}
				!8 = !{!"llvm.loop.interleave.count", i32 4}
				!9 = !{!"llvm.loop.vectorize.enable", i1 true}
				!10 = !{!"llvm.loop.vectorize.predicate.enable", i1 true}
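
As a reading aid for the `@fadd_predicated` and `@fadd_multiple` checks above, here is a minimal C++ sketch of the arithmetic they rely on. It is a simplified model and not the vectorizer's implementation: all function names (`scalarSum`, `orderedChunkedSum`, `reassociatedSum`, `identical`, `checkExample`) are hypothetical, and the vectorization factor is passed as a plain parameter. It assumes that folding each chunk into the running scalar from left to right, with inactive lanes replaced by -0.0 (the fadd identity seen in the `select` of `@fadd_predicated`), reproduces the scalar loop's result, whereas per-lane partial sums (the reassociated form checked in `@fadd_multiple`) may not.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalar reference: strict left-to-right accumulation, like the scalar loop.
float scalarSum(const std::vector<float> &A, float Start) {
  float Sum = Start;
  for (float V : A)
    Sum += V;
  return Sum;
}

// Model of an in-loop ordered reduction with vectorization factor VF.
// Lanes past the end of A are treated as masked off and contribute -0.0f,
// which leaves any accumulator value (including -0.0f) unchanged.
float orderedChunkedSum(const std::vector<float> &A, float Start,
                        std::size_t VF) {
  float Acc = Start;
  for (std::size_t Base = 0; Base < A.size(); Base += VF) {
    for (std::size_t Lane = 0; Lane < VF; ++Lane) {
      std::size_t Idx = Base + Lane;
      float Elt = Idx < A.size() ? A[Idx] : -0.0f; // select(mask, elt, -0.0)
      Acc += Elt;                                  // in-order fold into scalar
    }
  }
  return Acc;
}

// Model of a reassociated reduction: VF independent partial sums that are
// only combined after the loop. This changes the order of the additions.
float reassociatedSum(const std::vector<float> &A, float Start,
                      std::size_t VF) {
  std::vector<float> Partial(VF, -0.0f);
  for (std::size_t I = 0; I < A.size(); ++I)
    Partial[I % VF] += A[I];
  float Sum = Start;
  for (float P : Partial)
    Sum += P;
  return Sum;
}

// True when both values compare equal and agree on the sign of zero.
bool identical(float X, float Y) {
  return X == Y && std::signbit(X) == std::signbit(Y);
}

// Small self-check on inputs chosen so that rounding makes the order matter.
void checkExample() {
  std::vector<float> A = {1e8f, 1.0f, -1e8f, 1.0f};
  float Ref = scalarSum(A, 0.0f);
  assert(identical(orderedChunkedSum(A, 0.0f, 3), Ref)); // masked tail lanes
  assert(!identical(reassociatedSum(A, 0.0f, 2), Ref));  // order changed
}
```

Calling `checkExample()` from a `main` function exercises both cases. The choice of -0.0 rather than +0.0 for masked lanes matters only for signed zeros: adding +0.0 to an accumulator holding -0.0 would produce +0.0.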
