This is an archive of the discontinued LLVM Phabricator instance.

Differential D101836

[LoopVectorize] Enable strict reductions when allowReordering() returns false
ClosedPublic

Authored by kmclaughlin on May 4 2021, 7:38 AM.

Details

Reviewers

sdesmalen
david-arm
dmgreen
fhahn
peterwaller-arm
spatel

Commits

rG9f76a8526010: [LoopVectorize] Enable strict reductions when allowReordering() returns false

Summary

When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:

bool allowReordering() const {
  // When enabling loop hints are provided we allow the vectorizer to change
  // the order of operations that is given by the scalar loop. This is not
  // enabled by default because can be unsafe or inefficient.

The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.

This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.
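
As a rough illustration of the hint behaviour described above, the sketch below models allowReordering() together with the new flag. `ForceKind`, `HintsModel` and `allowReorderingModel` are hypothetical stand-ins rather than the LLVM types; only the condition itself (force-enabled or a width hint greater than 1, gated by -hints-allow-reordering) is taken from this patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for LoopVectorizeHints; not the real LLVM types.
enum class ForceKind { Undefined, Disabled, Enabled };

struct HintsModel {
  ForceKind Force;        // models the "force vectorization" hint
  unsigned WidthKnownMin; // models the width hint; 0 means "no hint"
};

// Models allowReordering() as changed by this patch: enabling hints permit
// reordering of FP operations unless -hints-allow-reordering=false is passed.
bool allowReorderingModel(const HintsModel &H, bool HintsAllowReorderingFlag) {
  return HintsAllowReorderingFlag &&
         (H.Force == ForceKind::Enabled || H.WidthKnownMin > 1);
}

// A few spot checks of the model.
void checkAllowReorderingModel() {
  assert(allowReorderingModel({ForceKind::Enabled, 0}, true));    // forced
  assert(allowReorderingModel({ForceKind::Undefined, 4}, true));  // width hint
  assert(!allowReorderingModel({ForceKind::Undefined, 0}, true)); // no hints
  assert(!allowReorderingModel({ForceKind::Enabled, 4}, false));  // flag off
}
```

With the flag set to false, hints no longer imply reordering, so a test can still pick its VF through metadata while exercising the ordered-reduction path.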

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kmclaughlin created this revision. May 4 2021, 7:38 AM

Herald added a subscriber: hiraditya. May 4 2021, 7:38 AM

kmclaughlin requested review of this revision. May 4 2021, 7:38 AM

Herald added a project: Restricted Project. May 4 2021, 7:38 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B102525: Diff 342733. May 4 2021, 8:19 AM

sdesmalen added inline comments. May 4 2021, 1:38 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9848–9849	nit: indentation, please use clang-format.
9854	Why do all of the reductions have to be ordered for the LV to be able to vectorize FP math? (e.g. if there is an integer reduction and an ordered FP reduction, it would now choose not to vectorize based on this condition)
9963–9964	This condition is a bit odd. Should `canVectorizeOrderedFPMath` just contain the call to `Requirements.canVectorizeFPMath` instead? i.e. in order to vectorize ordered FP math, it must at least be able to vectorize FP math.

david-arm added inline comments. May 5 2021, 12:49 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9854	I guess it might be worth adding a test for this too then, i.e. having a loop with both an integer and FP reduction and ensure we vectorise with ordered reductions.
9963–9964	I think the existing canVectorizeFPMath function is badly named because it actually checks for reordering:

  bool canVectorizeFPMath(const LoopVectorizeHints &Hints) const {
    return !ExactFPMathInst || Hints.allowReordering();
  }

so the logic in Kerry's patch is something like this:

  Is this an exact FP math instruction? If not -> vectorise, else
  Do hints permit reordering? If so -> vectorise, else
  Can we vectorise with ordered reductions? If not -> emit remark.

It probably is possible to combine these into a single LoopVectorizationLegality::canVectorizeFPMath function that does all the above, since that class does have access to the Requirements I think.
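
The three-step logic described in this comment can be sketched as a small standalone function. This is only a model under stated assumptions: `LoopFPInfo`, `FPDecision` and `decideFPMath` are hypothetical names, and the three booleans stand in for what the real legality analysis would compute.

```cpp
#include <cassert>

// Hypothetical summary of what legality analysis found in a loop.
struct LoopFPInfo {
  bool HasExactFPMathInst;      // some FP operation must keep its order
  bool HintsAllowReordering;    // result of Hints.allowReordering()
  bool CanUseOrderedReductions; // in-order (strict) reductions are possible
};

enum class FPDecision { Vectorize, VectorizeOrdered, EmitRemark };

// Walks the three questions in the order given in the review comment.
FPDecision decideFPMath(const LoopFPInfo &L) {
  if (!L.HasExactFPMathInst)
    return FPDecision::Vectorize; // nothing order-sensitive in the loop
  if (L.HintsAllowReordering)
    return FPDecision::Vectorize; // hints permit reordering
  if (L.CanUseOrderedReductions)
    return FPDecision::VectorizeOrdered; // keep the scalar order in-loop
  return FPDecision::EmitRemark;         // not safe to vectorize
}

// Spot checks, one per branch.
void checkDecideFPMath() {
  assert(decideFPMath({false, false, false}) == FPDecision::Vectorize);
  assert(decideFPMath({true, true, false}) == FPDecision::Vectorize);
  assert(decideFPMath({true, false, true}) == FPDecision::VectorizeOrdered);
  assert(decideFPMath({true, false, false}) == FPDecision::EmitRemark);
}
```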

Addressing review comments from @sdesmalen & @david-arm:

Merged canVectorizeFPMath with canVectorizeOrderedFPMath in LoopVectorizationLegality
Only check the IsOrdered flag of the RecurrenceDescriptor if hasExactFPMath() is true.
Added a test with different types of reductions (integer add & FP add) that we should be able to vectorize with the -enable-strict-reductions flag

Also added an extra RUN line to both strict-fadd.ll & scalable-strict-fadd.ll to test the changes made to allowReordering (i.e. changing EC.getKnownMinValue() > 1 to EC.isScalar()).

kmclaughlin added inline comments. May 5 2021, 7:50 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9854	Hi @sdesmalen, we should only need the FP reductions in the loop to be ordered. I've changed this so that only reductions where `hasExactFPMath()` is true need to be ordered & added a test for this scenario to strict-fadd.ll

david-arm added inline comments. May 5 2021, 7:57 AM

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
278	I think we can avoid passing in the Hints here as they are already a member of the class with the same name?

sdesmalen added inline comments. May 5 2021, 8:06 AM

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
161–162	is this change necessary?
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
908	Should a hint of VF=1 really lead to the diagnostic `"loop not vectorized: cannot prove it is safe to reorder floating-point operations"`?
908–909	This will also return false if there are no reductions at all, or if all reductions are unordered?
911–914	How about returning true if, for each reduction variable, any of the following conditions is true:
  There is no ExactFPMath instruction for the reduction.
  The reduction is unordered.
  EnableStrictReductions is true.
915	I'd prefer the default case to be `return false;`, i.e. when we cannot explicitly determine it is safe, we assume it isn't safe. That would handle the case where `Requirements->getExactFPInst()` is true, but it isn't an instruction used in the reduction. (although I don't know if that would ever happen?)
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9963	You don't need to pass EnableStrictReductions, since it is defined in the same file?

david-arm added inline comments. May 5 2021, 8:10 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
15	Thanks a lot for adding the VF=vscale x 1 case here, but perhaps `CHECK-SCALAR` should be `CHECK-VF1U1`, since we're still vectorising? Also, it's probably worth adding an extra CHECK line for at least one instruction that shows the "<vscale x 1 ..." - maybe the `call float ...` instruction?

david-arm added inline comments. May 5 2021, 8:17 AM

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
161–162	It's fixing a missing case where we weren't previously allowing reordering for scalable VF=vscale x 1. I think it's worth fixing, but maybe it doesn't have to live in this patch?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9963	I think it has to be passed since `EnableStrictReductions` lives in LoopVectorize.cpp and canVectorizeFPMath lives in LoopVectorizationLegality.cpp.

sdesmalen added inline comments. May 5 2021, 8:30 AM

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
161–162	@kmclaughlin can this be pulled out into a separate patch, or does it depend on changes in this patch in order to test it? I find the way the condition is written very confusing. It looks like the condition is synonymous, but it isn't. How about writing `EC.isVector()` instead?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9963	You're right, I didn't realise that, thanks!
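
To make the discussion above about `EC.getKnownMinValue() > 1` versus `EC.isVector()` concrete, here is a toy model of an element count. `ElementCountModel` is hypothetical and is not llvm::ElementCount; its `isScalar()` and `isVector()` definitions are an assumption about how those predicates behave, written out so the difference for `vscale x 1` is visible.

```cpp
#include <cassert>

// Toy model of an element count: KnownMin lanes, multiplied by the runtime
// vscale when Scalable is true. Hypothetical; not the real llvm::ElementCount.
struct ElementCountModel {
  unsigned KnownMin;
  bool Scalable;

  bool isScalar() const { return !Scalable && KnownMin == 1; }
  bool isVector() const { return (Scalable && KnownMin != 0) || KnownMin > 1; }
};

// The width condition used by allowReordering() in this patch.
bool knownMinGreaterThanOne(const ElementCountModel &EC) {
  return EC.KnownMin > 1;
}

void checkElementCountModel() {
  ElementCountModel Fixed1{1, false}, Fixed4{4, false};
  ElementCountModel VScaleX1{1, true}, NoHint{0, false};

  // Fixed widths: both conditions agree.
  assert(!knownMinGreaterThanOne(Fixed1) && !Fixed1.isVector());
  assert(knownMinGreaterThanOne(Fixed4) && Fixed4.isVector());

  // vscale x 1 is a vector, but its known minimum is not greater than 1.
  assert(!knownMinGreaterThanOne(VScaleX1) && VScaleX1.isVector());

  // A zero width (no hint) is neither scalar nor vector in this model, so
  // "!isScalar()" is not a synonym for "isVector()".
  assert(!NoHint.isScalar() && !NoHint.isVector());
}
```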

Harbormaster completed remote builds in B102745: Diff 343044. May 5 2021, 8:39 AM

Removed changes to allowReordering() from this patch
Removed Hints.getWidth().isScalar() check from canVectorizeFPMath()
Changed canVectorizeFPMath to also look at induction variables, as we should not vectorize if the loop has any exact floating-point induction operators and we do not allow reassociation.
Added more tests to strict-fadd.ll which include floating-point induction variables to test the above changes.
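
The induction-variable check described in this update can be sketched roughly as follows. `InductionModel` and `hasExactFPInduction` are hypothetical names; the single flag stands in for whether an induction's update is an FP operation that must not be reassociated.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for an induction descriptor.
struct InductionModel {
  bool HasExactFPMathInst; // true if the update is an exact FP operation
};

// Returns true if any induction variable carries an exact FP operation.
// In that case the loop is rejected when reordering is not allowed, because
// ordered FP inductions are not supported.
bool hasExactFPInduction(const std::vector<InductionModel> &Inductions) {
  return std::any_of(
      Inductions.begin(), Inductions.end(),
      [](const InductionModel &Ind) { return Ind.HasExactFPMathInst; });
}

void checkHasExactFPInduction() {
  std::vector<InductionModel> IntegerOnly{{false}, {false}};
  std::vector<InductionModel> WithStrictFP{{false}, {true}};
  assert(!hasExactFPInduction(IntegerOnly));
  assert(hasExactFPInduction(WithStrictFP));
}
```

Because `std::any_of` returns false for an empty range, no separate emptiness test is needed before the call.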

kmclaughlin added inline comments. May 10 2021, 8:44 AM

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
161–162	@sdesmalen this doesn't depend on any other changes here so I've removed it from this patch
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
908	I've removed this check as I don't think it's necessary, but it was added to be consistent with allowReordering(), which (absent a forcing hint) returns true only if `EC.getKnownMinValue() > 1`

Harbormaster completed remote builds in B103497: Diff 344065. May 10 2021, 8:55 AM

sdesmalen added inline comments. May 10 2021, 9:28 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
916–922	is it sufficient to write: if (any_of(..... { })) return false; (i.e. if `ExactIndVars` is true, then `!getInductionVars().empty()` must also be true)
927	Is it still necessary to iterate through the reduction variables at this point? Given that EnableStrictReductions is true, and that reductions are the only other operations that can have exact FPMath instructions, I think you can just return `true`.

david-arm added inline comments. May 11 2021, 12:32 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
927	We don't support vectorising all of these reductions, for example we don't support strict reductions involving `fmul` and we don't support chains of `fadds` currently either. That's why in the code below we check `!RdxDesc.isOrdered()` because the ordered flag is only set for cases we can support at the moment. I think @kmclaughlin has added a test for this case as well below called `fast_induction_unordered_reduction` which shows how there are both fmul and fadd reductions in the same loop.

Removed the !getInductionVars().empty() test from canVectorizeFPMath()

david-arm added inline comments. May 12 2021, 6:13 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
930	nit: This is just a suggestion, but you could rename `ExactRdxVars` to `HasExactRdxVar` and then here simply do: return !HasExactRdxVar; since I think when the list is empty that variable should be false?

Harbormaster completed remote builds in B104031: Diff 344786. May 12 2021, 6:26 AM

Removed the getReductionVars().empty() test from canVectorizeFPMath() and renamed ExactRdxVar to HasExactRdxVar

Harbormaster completed remote builds in B104076: Diff 344868. May 12 2021, 11:27 AM

sdesmalen added inline comments. May 14 2021, 9:01 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
925	Can you rewrite this to:

  // We can now only vectorize if all reductions with Exact FP math also
  // have the isOrdered flag set, which indicates that we can move the
  // reduction operations in-loop.
  return all_of(getReductionVars(), [&](auto &Reduction) -> bool {
    RecurrenceDescriptor RdxDesc = Reduction.second;
    return !RdxDesc.hasExactFPMath() || RdxDesc.isOrdered();
  });

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
6–32	Should all test functions have check lines for all VF8UF1, VF8UF4, etc. ? Conversely, is it sufficient to just pass the interleave-count hint (not the vector width) via metadata and have 1 RUN line for VF8UF1, VF8UF4, VF4UF1? Which also makes me wonder, what is the additional value of having both VF8UF1 and VF4UF1 ?

Removes HasExactRdxVar from canVectorizeFPMath() and instead returns the result from all_of(getReductionVars()...
Reduce the number of RUN lines in the tests
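
The all_of form adopted in this update can be sketched outside LLVM as below. `ReductionModel` and `allExactReductionsOrdered` are hypothetical names; the two flags stand in for `hasExactFPMath()` and `isOrdered()` on a reduction descriptor.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for a reduction descriptor.
struct ReductionModel {
  bool HasExactFPMath; // reduction uses an FP op that must keep its order
  bool IsOrdered;      // reduction can be performed in-loop, in order
};

// Every reduction must either have no exact FP math or be marked ordered.
bool allExactReductionsOrdered(const std::vector<ReductionModel> &Rdxs) {
  return std::all_of(Rdxs.begin(), Rdxs.end(), [](const ReductionModel &R) {
    return !R.HasExactFPMath || R.IsOrdered;
  });
}

void checkAllExactReductionsOrdered() {
  // An integer add plus an ordered FP add: acceptable.
  std::vector<ReductionModel> Mixed{{false, false}, {true, true}};
  // An ordered FP add plus an exact but unordered reduction: rejected.
  std::vector<ReductionModel> Unsupported{{true, true}, {true, false}};
  assert(allExactReductionsOrdered(Mixed));
  assert(!allExactReductionsOrdered(Unsupported));
}
```

Since `std::all_of` returns true for an empty range, a loop with no reductions passes without the separate empty() test or the HasExactRdxVar flag that earlier revisions carried.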

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
6–32	I think it should be sufficient to pass the interleave count via metadata. I've changed VF4UF1 to VF8UF1 as there was no additional benefit in having both, similarly I've changed VF4UF1 in the strict-fadd.ll test as well to reduce the number of RUN lines.

david-arm added inline comments. May 17 2021, 6:18 AM

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
123–180	Hi @kmclaughlin, I think maybe this is meant to be CHECK-VF8UF4?

Harbormaster completed remote builds in B104805: Diff 345849. May 17 2021, 6:49 AM

Fixed incorrect CHECK lines in the @fadd_strict_unroll_last_val test in strict-fadd.ll (CHECK-VF8UF2 -> CHECK-VF8UF4)

Harbormaster completed remote builds in B104818: Diff 345867. May 17 2021, 8:03 AM

sdesmalen added inline comments. May 17 2021, 1:32 PM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
122	Why does this test need to be vectorized with VF=vscale x 8 instead of VF=vscale x 4? Is that because it needs to be driven using the cmdline flags to circumvent the "hint allows reordering" behaviour? (and so that the 1 RUN line covers all tests?) If that's the case, can you do a NFC patch where you first change the test to use the new VF, and then rebase this patch on top?
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
0–3	The first test, `@fadd_strict`, has check lines that match the first RUN line, and the 4th RUN line, but none of the others. Why is that? And are all these RUN lines needed?
701–702	nit: ; Strict reduction could be performed in-loop, but ordered FP induction variables are not supported.
725	nit: this PHI is unnecessary? (same for the tests below)
729	Can you be more explicit in the comment? i.e. ; As above, but with the FP induction being unordered (fast), the loop can be vectorized.

kmclaughlin added inline comments. May 18 2021, 6:40 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
122	Hi @sdesmalen, it is the case that I changed the VF so that the test could be covered by one RUN line, and to try and circumvent the hints allow reordering behaviour. I will move changes to the RUN lines into a new patch so that this patch only adds new tests.
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
0–3	Since the allowReordering() function returns false unless `EC.getKnownMinValue() > 1` (or vectorization is force-enabled), I thought it was worth making sure that we don't vectorize a VF of 1 for at least one of the tests, which is why I added the extra RUN line to `@fadd_strict`. The RUN lines are needed so that we can pass the different VFs & interleave counts needed for each of the tests (e.g. `@fadd_strict_unroll` needs a UF > 1) and I didn't want to change the 'allow reordering' behaviour by passing hints through metadata. Though I think I could remove the `CHECK-PRED` line since `@fadd_predicated` does rely on metadata if this would help at all?

sdesmalen added inline comments. May 19 2021, 1:35 AM

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
122	I think it's worth adding a flag to the vectorizer to disable this weird behaviour for testing purposes, so that we don't need to change this test, and so that you don't need the multiple RUN lines in the other test in favour of using metadata to control the VF per individual test.

kmclaughlin mentioned this in D102774: [LoopVectorize] Add a flag to prevent reordering of FP operations with hints. May 19 2021, 6:52 AM

Rebased on the dependent changes, D102774
Removed the separate RUN lines from both strict-fadd.ll & scalable-strict-fadd.ll for different VF/UFs.
The tests now use one RUN line with the -hints-allow-reordering=false flag. This uses the existing CHECK lines in the tests, which prior to these changes only tested vectorization where reordering was allowed.

kmclaughlin added a parent revision: D102774: [LoopVectorize] Add a flag to prevent reordering of FP operations with hints. May 19 2021, 7:18 AM

Harbormaster completed remote builds in B105233: Diff 346450. May 19 2021, 7:18 AM

As discussed with @sdesmalen, I have made the following changes to this patch so that the tests are clearer:

Combined this patch with D102774, adding the -hints-allow-reordering flag here
Added CHECK-ORDERED and CHECK-UNORDERED lines to the tests in D103015
Updated the RUN lines in this patch to use the -hints-allow-reordering flag and also added a RUN line for CHECK-NOT-VECTORIZED. The tests themselves remain unchanged, with the exception of the new tests added for induction variables.

Harbormaster completed remote builds in B105890: Diff 347364. May 24 2021, 6:18 AM

kmclaughlin edited parent revisions, added: D103015: [NFC] Add CHECK lines for unordered FP reductions; removed: D102774: [LoopVectorize] Add a flag to prevent reordering of FP operations with hints. May 24 2021, 6:20 AM

kmclaughlin edited the summary of this revision. May 24 2021, 6:26 AM

sdesmalen added inline comments. May 24 2021, 6:30 AM

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
161–162	I'd suggest moving the implementation of `allowReordering` to `LoopVectorizationLegality.cpp`, so that you don't need to add the extern+include above.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
909–910	nit: can you fold this into the condition below: if (!EnableStrictReductions || any_of(...)) return false;
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
3	Can you also add a RUN line for `-enable-strict-reductions=true -hints-allow-reordering=true` (which I think can reuse prefix CHECK-UNORDERED)

david-arm added inline comments. May 24 2021, 7:21 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
909–910	I guess the comment below will need updating too if the check is moved.
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
632	Are these new tests missing hints that the other tests seem to use? I just wondered if it was better to be consistent here that's all. The reason I mention this is because I was expecting the UNORDERED case to vectorise due to the `-hints-allow-reordering=true` flag.

david-arm added inline comments. May 24 2021, 7:37 AM

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
632	I think I see now @kmclaughlin - you're testing the productisation of `-enable-strict-reductions` so you were adding some tests deliberately without hints, which also makes sense. In this case I'd also be happy if you left these tests as they are and just added some comments explaining why we expect the CHECK-UNORDERED case to not vectorise.

Moved the implementation of allowReordering() to LoopVectorizationLegality.cpp
Added a comment above the new tests in strict-fadd.ll to explain why the CHECK-UNORDERED case should not vectorize

Harbormaster completed remote builds in B106080: Diff 347653. May 25 2021, 5:56 AM

kmclaughlin added inline comments. May 25 2021, 5:57 AM

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
3	Hi @sdesmalen, I haven't added another RUN line for `-enable-strict-reductions=true -hints-allow-reordering=true` as this will cause failures with the fadd_multiple test. For most of the tests, the output where both flags are true will match the CHECK-ORDERED lines, since we will always use strict reductions where possible if this flag is set. For fadd_multiple, we cannot use strict reductions and so the value of `-hints-allow-reordering` will change whether or not the test vectorizes. As we discussed previously, I will follow this up with a patch which ensures we only choose strict reductions if we do not allow reordering. At this point I can add a RUN line as you've suggested and reuse the `CHECK-UNORDERED` prefix.
632	Hi @david-arm, I've added a comment above these tests to explain why the CHECK-UNORDERED case shouldn't vectorize.

Other than my request for a FIXME, I'm happy with the patch. LGTM!

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll
3	Fair enough, thanks for explaining. Can you just add a FIXME above the `cl::opt` for `-enable-strict-reductions` that this flag reverses the default behaviour we have now when hints are passed?

This revision is now accepted and ready to land. May 25 2021, 2:12 PM

This revision was landed with ongoing or failed builds. May 26 2021, 6:06 AM

Closed by commit rG9f76a8526010: [LoopVectorize] Enable strict reductions when allowReordering() returns false (authored by kmclaughlin).

This revision was automatically updated to reflect the committed changes.

kmclaughlin added a commit: rG9f76a8526010: [LoopVectorize] Enable strict reductions when allowReordering() returns false.

Thank you @sdesmalen & @david-arm for the reviews and comments!

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h	24 lines
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp	42 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	6 lines
llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll	45 lines
llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll	186 lines

Diff 347943

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

@@ 152 lines hidden @@ public:
   /// \return true if scalable vectorization has been explicitly disabled.
   bool isScalableVectorizationDisabled() const {
     return Scalable.Value == SK_FixedWidthOnly;
   }

   /// If hints are provided that force vectorization, use the AlwaysPrint
   /// pass name to force the frontend to print the diagnostic.
   const char *vectorizeAnalysisPassName() const;

-  bool allowReordering() const {
-    // When enabling loop hints are provided we allow the vectorizer to change
-    // the order of operations that is given by the scalar loop. This is not
-    // enabled by default because can be unsafe or inefficient. For example,
-    // reordering floating-point operations will change the way round-off
-    // error accumulates in the loop.
-    ElementCount EC = getWidth();
-    return getForce() == LoopVectorizeHints::FK_Enabled ||
-           EC.getKnownMinValue() > 1;
-  }
+  /// When enabling loop hints are provided we allow the vectorizer to change
+  /// the order of operations that is given by the scalar loop. This is not
+  /// enabled by default because can be unsafe or inefficient. For example,
+  /// reordering floating-point operations will change the way round-off
+  /// error accumulates in the loop.
+  bool allowReordering() const;

   bool isPotentiallyUnsafe() const {
     // Avoid FP vectorization if the target is unsure about proper support.
     // This may be related to the SIMD unit in the target not handling
     // IEEE 754 FP ops properly, or bad single-to-double promotions.
     // Otherwise, a sequence of vectorized loops, even without reduction,
     // could lead to different end results on the destination vectors.
     return getForce() != LoopVectorizeHints::FK_Enabled && PotentiallyUnsafe;
@@ 34 lines hidden @@ void addExactFPMathInst(Instruction *I) {
     if (I && !ExactFPMathInst)
       ExactFPMathInst = I;
   }

   void addRuntimePointerChecks(unsigned Num) { NumRuntimePointerChecks = Num; }


   Instruction *getExactFPInst() { return ExactFPMathInst; }
-  bool canVectorizeFPMath(const LoopVectorizeHints &Hints) const {
-    return !ExactFPMathInst || Hints.allowReordering();
-  }

   unsigned getNumRuntimePointerChecks() const {
     return NumRuntimePointerChecks;
   }

 private:
   unsigned NumRuntimePointerChecks = 0;
   Instruction *ExactFPMathInst = nullptr;
@@ 41 lines hidden @@ public:
   /// This does not mean that it is profitable to vectorize this
   /// loop, only that it is legal to do so.
   /// Temporarily taking UseVPlanNativePath parameter. If true, take
   /// the new code path being implemented for outer loop vectorization
   /// (should be functional for inner loop vectorization) based on VPlan.
   /// If false, good old LV code.
   bool canVectorize(bool UseVPlanNativePath);

+  /// Returns true if it is legal to vectorize the FP math operations in this
+  /// loop. Vectorizing is legal if we allow reordering of FP operations, or if
+  /// we can use in-order reductions.
+  bool canVectorizeFPMath(bool EnableStrictReductions);
+
   /// Return true if we can vectorize this loop while folding its tail by
   /// masking, and mark all respective loads/stores for masking.
   /// This object's state is only modified iff this function returns true.
   bool prepareToFoldTailByMasking();

   /// Returns the primary induction variable.
   PHINode *getPrimaryInduction() { return PrimaryInduction; }

@@ 272 lines hidden @@

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

Show All 31 Lines
#define DEBUG_TYPE LV_NAME		#define DEBUG_TYPE LV_NAME

extern cl::opt<bool> EnableVPlanPredication;		extern cl::opt<bool> EnableVPlanPredication;

static cl::opt<bool>		static cl::opt<bool>
EnableIfConversion("enable-if-conversion", cl::init(true), cl::Hidden,		EnableIfConversion("enable-if-conversion", cl::init(true), cl::Hidden,
cl::desc("Enable if-conversion during vectorization."));		cl::desc("Enable if-conversion during vectorization."));

		namespace llvm {
		cl::opt<bool>
		HintsAllowReordering("hints-allow-reordering", cl::init(true), cl::Hidden,
		cl::desc("Allow enabling loop hints to reorder "
		"FP operations during vectorization."));
		}

// TODO: Move size-based thresholds out of legality checking, make cost based		// TODO: Move size-based thresholds out of legality checking, make cost based
// decisions instead of hard thresholds.		// decisions instead of hard thresholds.
static cl::opt<unsigned> VectorizeSCEVCheckThreshold(		static cl::opt<unsigned> VectorizeSCEVCheckThreshold(
"vectorize-scev-check-threshold", cl::init(16), cl::Hidden,		"vectorize-scev-check-threshold", cl::init(16), cl::Hidden,
cl::desc("The maximum number of SCEV checks allowed."));		cl::desc("The maximum number of SCEV checks allowed."));

static cl::opt<unsigned> PragmaVectorizeSCEVCheckThreshold(		static cl::opt<unsigned> PragmaVectorizeSCEVCheckThreshold(
"pragma-vectorize-scev-check-threshold", cl::init(128), cl::Hidden,		"pragma-vectorize-scev-check-threshold", cl::init(128), cl::Hidden,
▲ Show 20 Lines • Show All 158 Lines • ▼ Show 20 Lines	if (getWidth() == ElementCount::getFixed(1))
return LV_NAME;		return LV_NAME;
if (getForce() == LoopVectorizeHints::FK_Disabled)		if (getForce() == LoopVectorizeHints::FK_Disabled)
return LV_NAME;		return LV_NAME;
if (getForce() == LoopVectorizeHints::FK_Undefined && getWidth().isZero())		if (getForce() == LoopVectorizeHints::FK_Undefined && getWidth().isZero())
return LV_NAME;		return LV_NAME;
return OptimizationRemarkAnalysis::AlwaysPrint;		return OptimizationRemarkAnalysis::AlwaysPrint;
}		}

		bool LoopVectorizeHints::allowReordering() const {
		// Allow the vectorizer to change the order of operations if enabling
		// loop hints are provided
		ElementCount EC = getWidth();
		return HintsAllowReordering &&
		(getForce() == LoopVectorizeHints::FK_Enabled \|\|
		EC.getKnownMinValue() > 1);
		}

void LoopVectorizeHints::getHintsFromMetadata() {		void LoopVectorizeHints::getHintsFromMetadata() {
MDNode *LoopID = TheLoop->getLoopID();		MDNode *LoopID = TheLoop->getLoopID();
if (!LoopID)		if (!LoopID)
return;		return;

// First operand should refer to the loop id itself.		// First operand should refer to the loop id itself.
assert(LoopID->getNumOperands() > 0 && "requires at least one operand");		assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
assert(LoopID->getOperand(0) == LoopID && "invalid loop id");		assert(LoopID->getOperand(0) == LoopID && "invalid loop id");
▲ Show 20 Lines • Show All 657 Lines • ▼ Show 20 Lines	if (LAI->hasDependenceInvolvingLoopInvariantAddress()) {
return false;		return false;
}		}
Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());		Requirements->addRuntimePointerChecks(LAI->getNumRuntimePointerChecks());
PSE.addPredicate(LAI->getPSE().getUnionPredicate());		PSE.addPredicate(LAI->getPSE().getUnionPredicate());

return true;		return true;
}		}

		bool LoopVectorizationLegality::canVectorizeFPMath(
		bool EnableStrictReductions) {

		// First check if there is any ExactFP math or if we allow reassociations
		if (!Requirements->getExactFPInst() \|\| Hints->allowReordering())
		return true;
		sdesmalenUnsubmitted Not Done Reply Inline Actions Should a hint of VF=1 really lead to the diagnostic `"loop not vectorized: cannot prove it is safe to reorder floating-point operations"`? sdesmalen: Should a hint of VF=1 really lead to the diagnostic `"loop not vectorized: cannot prove it is…
		kmclaughlinAuthorUnsubmitted Done Reply Inline Actions I've removed this check as I don't think it's necessary, but it was added to be consistent with allowReordering() which returns false if `EC.getKnownMinValue() > 1` kmclaughlin: I've removed this check as I don't think it's necessary, but it was added to be consistent with…

		sdesmalenUnsubmitted Not Done Reply Inline Actions This will also return false if there are no reductions at all, or if all reductions are unordered? sdesmalen: This will also return false if there are no reductions at all, or if all reductions are…
		// If the above is false, we have ExactFPMath & do not allow reordering.
		sdesmalen (Done): nit: can you fold this into the condition below: `if (!EnableStrictReductions || any_of(...)) return false;`
		david-arm (Done): I guess the comment below will need updating too if the check is moved.
		// If the EnableStrictReductions flag is set, first check if we have any
		// Exact FP induction vars, which we cannot vectorize.
		if (!EnableStrictReductions ||
		any_of(getInductionVars(), [&](auto &Induction) -> bool {
		sdesmalen (Not Done): How about returning true if, for each reduction variable, any of the following conditions is true: there is no ExactFPMath instruction for the reduction; the reduction is unordered; EnableStrictReductions is true.
		InductionDescriptor IndDesc = Induction.second;
		sdesmalen (Not Done): I'd prefer the default case to be `return false;`, i.e. when we cannot explicitly determine it is safe, we assume it isn't safe. That would handle the case where `Requirements->getExactFPInst()` is true, but it isn't an instruction used in the reduction (although I don't know if that would ever happen?).
		return IndDesc.getExactFPMathInst();
		}))
		return false;

		// We can now only vectorize if all reductions with Exact FP math also
		// have the isOrdered flag set, which indicates that we can move the
		// reduction operations in-loop.
		sdesmalen (Done): is it sufficient to write: `if (any_of(..... { })) return false;` (i.e. if `ExactIndVars` is true, then `!getInductionVars().empty()` must also be true)
		return (all_of(getReductionVars(), [&](auto &Reduction) -> bool {
		RecurrenceDescriptor RdxDesc = Reduction.second;
		return !RdxDesc.hasExactFPMath() || RdxDesc.isOrdered();
		sdesmalen (Done): Can you rewrite this to:
		    // We can now only vectorize if all reductions with Exact FP math also
		    // have the isOrdered flag set, which indicates that we can move the
		    // reduction operations in-loop.
		    return all_of(getReductionVars(), [&](auto &Reduction) -> bool {
		      RecurrenceDescriptor RdxDesc = Reduction.second;
		      return !RdxDesc.hasExactFPMath() || RdxDesc.isOrdered();
		    });
		}));
		}
		sdesmalen (Not Done): Is it still necessary to iterate through the reduction variables at this point? Given that EnableStrictReductions is true, and that reductions are the only other operations that can have exact FPMath instructions, I think you can just return `true`.
		david-arm (Not Done): We don't support vectorising all of these reductions, for example we don't support strict reductions involving `fmul` and we don't support chains of `fadds` currently either. That's why in the code below we check `!RdxDesc.isOrdered()`, because the ordered flag is only set for cases we can support at the moment. I think @kmclaughlin has added a test for this case as well below called `fast_induction_unordered_reduction`, which shows how there are both fmul and fadd reductions in the same loop.
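The legality check discussed in this thread can be modelled outside LLVM. What follows is a minimal sketch, not the code from the patch: `MockReduction`, `MockInduction` and `canVectorizeFPMathSketch` are hypothetical names, and the plain boolean parameters stand in for the queries on `Requirements`, `Hints` and the `EnableStrictReductions` flag. It only mirrors the order of the checks shown in the diff above.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the per-reduction facts the legality check reads.
struct MockReduction {
  bool HasExactFPMath; // FP operations that may not be reassociated
  bool IsOrdered;      // can be emitted as an in-loop ordered reduction
};

// Hypothetical stand-in for the per-induction facts.
struct MockInduction {
  bool HasExactFPMath;
};

// Sketch of the decision order: cheap "always fine" cases first, then the
// cases that can never be handled, then the per-reduction test.
bool canVectorizeFPMathSketch(bool HasExactFPInst, bool HintsAllowReordering,
                              bool EnableStrictReductions,
                              const std::vector<MockInduction> &Inductions,
                              const std::vector<MockReduction> &Reductions) {
  // No exact FP math, or reordering is permitted: nothing to preserve.
  if (!HasExactFPInst || HintsAllowReordering)
    return true;

  // Ordered reductions disabled, or an exact FP induction variable exists.
  if (!EnableStrictReductions ||
      std::any_of(Inductions.begin(), Inductions.end(),
                  [](const MockInduction &I) { return I.HasExactFPMath; }))
    return false;

  // Every reduction with exact FP math must be expressible in order.
  // Reductions without exact FP math (e.g. integer ones) do not block.
  return std::all_of(Reductions.begin(), Reductions.end(),
                     [](const MockReduction &R) {
                       return !R.HasExactFPMath || R.IsOrdered;
                     });
}
```

Under these assumptions, a loop with one ordered FP reduction and one integer reduction passes the check, while a loop that also contains an exact FP reduction that is not ordered (such as the `fmul` case mentioned above) does not.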

bool LoopVectorizationLegality::isInductionPhi(const Value *V) {		bool LoopVectorizationLegality::isInductionPhi(const Value *V) {
Value *In0 = const_cast<Value *>(V);		Value *In0 = const_cast<Value *>(V);
		david-arm (Done): nit: This is just a suggestion, but you could rename `ExactRdxVars` to `HasExactRdxVar` and then here simply do `return !HasExactRdxVar;`, since I think when the list is empty that variable should be false?
PHINode *PN = dyn_cast_or_null<PHINode>(In0);		PHINode *PN = dyn_cast_or_null<PHINode>(In0);
if (!PN)		if (!PN)
return false;		return false;

return Inductions.count(PN);		return Inductions.count(PN);
}		}

bool LoopVectorizationLegality::isCastedInductionVariable(const Value *V) {		bool LoopVectorizationLegality::isCastedInductionVariable(const Value *V) {
[... 387 lines not shown ...]

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


[... 325 lines not shown ...]
cl::desc("The maximum interleave count to use when interleaving a scalar "		cl::desc("The maximum interleave count to use when interleaving a scalar "
"reduction in a nested loop."));		"reduction in a nested loop."));

static cl::opt<bool>		static cl::opt<bool>
PreferInLoopReductions("prefer-inloop-reductions", cl::init(false),		PreferInLoopReductions("prefer-inloop-reductions", cl::init(false),
cl::Hidden,		cl::Hidden,
cl::desc("Prefer in-loop vector reductions, "		cl::desc("Prefer in-loop vector reductions, "
"overriding the targets preference."));		"overriding the targets preference."));

		// FIXME: When loop hints are passed which allow reordering of FP operations,
		// we still choose to use strict reductions with this flag. We should instead
		// use the default behaviour of vectorizing with unordered reductions if
		// reordering is allowed.
cl::opt<bool> EnableStrictReductions(		cl::opt<bool> EnableStrictReductions(
"enable-strict-reductions", cl::init(false), cl::Hidden,		"enable-strict-reductions", cl::init(false), cl::Hidden,
cl::desc("Enable the vectorisation of loops with in-order (strict) "		cl::desc("Enable the vectorisation of loops with in-order (strict) "
"FP reductions"));		"FP reductions"));
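Why the flag distinguishes in-order (strict) reductions from the default can be shown without LLVM. The following is an illustrative sketch only: `sumOrdered` and `sumUnordered4` are hypothetical helpers, and a fixed width of four lanes is an assumption made to keep the example short. The first keeps the association of the scalar loop; the second accumulates per-lane partial sums and combines them at the end, which is the shape an unordered vector reduction takes.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// In-order (strict) sum: same association as the original scalar loop.
float sumOrdered(const std::vector<float> &A) {
  float Sum = 0.0f;
  for (float X : A)
    Sum += X;
  return Sum;
}

// Unordered sum: four independent partial sums (one per "lane"), combined
// after the main loop. This reassociates the additions.
float sumUnordered4(const std::vector<float> &A) {
  std::array<float, 4> Lanes{0.0f, 0.0f, 0.0f, 0.0f};
  std::size_t I = 0;
  for (; I + 4 <= A.size(); I += 4)
    for (std::size_t L = 0; L < 4; ++L)
      Lanes[L] += A[I + L];

  // Horizontal combine of the partial sums.
  float Lo = Lanes[0] + Lanes[1];
  float Hi = Lanes[2] + Lanes[3];
  float Sum = Lo + Hi;

  // Scalar remainder for lengths that are not a multiple of four.
  for (; I < A.size(); ++I)
    Sum += A[I];
  return Sum;
}
```

With IEEE-754 single precision and no fast-math options, the input `{1e8f, 1.0f, -1e8f, 1.0f}` gives `1.0f` from `sumOrdered` and `0.0f` from `sumUnordered4`, because `1e8f + 1.0f` rounds back to `1e8f`. A difference of this kind is what an ordered reduction avoids.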

static cl::opt<bool> PreferPredicatedReductionSelect(		static cl::opt<bool> PreferPredicatedReductionSelect(
"prefer-predicated-reduction-select", cl::init(false), cl::Hidden,		"prefer-predicated-reduction-select", cl::init(false), cl::Hidden,
cl::desc(		cl::desc(
[... 9,494 lines not shown ...]
while (!Worklist.empty()) {		while (!Worklist.empty()) {
for (Use &Op : I->operands())		for (Use &Op : I->operands())
if (auto *OpI = dyn_cast<Instruction>(Op))		if (auto *OpI = dyn_cast<Instruction>(Op))
Worklist.push_back(OpI);		Worklist.push_back(OpI);
}		}
}		}

LoopVectorizePass::LoopVectorizePass(LoopVectorizeOptions Opts)		LoopVectorizePass::LoopVectorizePass(LoopVectorizeOptions Opts)
: InterleaveOnlyWhenForced(Opts.InterleaveOnlyWhenForced ||		: InterleaveOnlyWhenForced(Opts.InterleaveOnlyWhenForced ||
!EnableLoopInterleaving),		!EnableLoopInterleaving),
VectorizeOnlyWhenForced(Opts.VectorizeOnlyWhenForced ||		VectorizeOnlyWhenForced(Opts.VectorizeOnlyWhenForced ||
		sdesmalen (Done): nit: indentation, please use clang-format.
!EnableLoopVectorization) {}		!EnableLoopVectorization) {}

bool LoopVectorizePass::processLoop(Loop *L) {		bool LoopVectorizePass::processLoop(Loop *L) {
assert((EnableVPlanNativePath || L->isInnermost()) &&		assert((EnableVPlanNativePath || L->isInnermost()) &&
"VPlan-native path is not enabled. Only process inner loops.");		"VPlan-native path is not enabled. Only process inner loops.");
		sdesmalen (Not Done): Why do all of the reductions have to be ordered for the LV to be able to vectorize FP math? (e.g. if there is an integer reduction and an ordered FP reduction, it would now choose not to vectorize based on this condition)
		david-arm (Done): I guess it might be worth adding a test for this too then, i.e. having a loop with both an integer and FP reduction and ensure we vectorise with ordered reductions.
		kmclaughlin (Author, Done): Hi @sdesmalen, we should only need the FP reductions in the loop to be ordered. I've changed this so that only reductions where `hasExactFPMath()` is true need to be ordered & added a test for this scenario to strict-fadd.ll

#ifndef NDEBUG		#ifndef NDEBUG
const std::string DebugLocStr = getDebugLocString(L);		const std::string DebugLocStr = getDebugLocString(L);
#endif /* NDEBUG */		#endif /* NDEBUG */

LLVM_DEBUG(dbgs() << "\nLV: Checking a loop in \""		LLVM_DEBUG(dbgs() << "\nLV: Checking a loop in \""
<< L->getHeader()->getParent()->getName() << "\" from "		<< L->getHeader()->getParent()->getName() << "\" from "
<< DebugLocStr << "\n");		<< DebugLocStr << "\n");
[... 92 lines not shown ...]
if (Hints.isPotentiallyUnsafe() &&		if (Hints.isPotentiallyUnsafe() &&
reportVectorizationFailure(		reportVectorizationFailure(
"Potentially unsafe FP op prevents vectorization",		"Potentially unsafe FP op prevents vectorization",
"loop not vectorized due to unsafe FP support.",		"loop not vectorized due to unsafe FP support.",
"UnsafeFP", ORE, L);		"UnsafeFP", ORE, L);
Hints.emitRemarkWithHints();		Hints.emitRemarkWithHints();
return false;		return false;
}		}

if (!Requirements.canVectorizeFPMath(Hints)) {		if (!LVL.canVectorizeFPMath(EnableStrictReductions)) {
		sdesmalen (Not Done): You don't need to pass EnableStrictReductions, since it is defined in the same file?
		david-arm (Not Done): I think it has to be passed since `EnableStrictReductions` lives in LoopVectorize.cpp and canVectorizeFPMath lives in LoopVectorizationLegality.cpp.
		sdesmalen (Not Done): You're right, I didn't realise that, thanks!
ORE->emit([&]() {		ORE->emit([&]() {
		sdesmalen (Not Done): This condition is a bit odd. Should `canVectorizeOrderedFPMath` just contain the call to `Requirements.canVectorizeFPMath` instead? i.e. in order to vectorize ordered FP math, it must at least be able to vectorize FP math.
		david-arm (Not Done): I think the existing canVectorizeFPMath function is badly named because it actually checks for reordering:
		    bool canVectorizeFPMath(const LoopVectorizeHints &Hints) const {
		      return !ExactFPMathInst || Hints.allowReordering();
		    }
		so the logic in Kerry's patch is something like this:
		    1. Is this an exact FP math instruction? If not -> vectorise, else
		    2. Do hints permit reordering? If so -> vectorise, else
		    3. Can we vectorise with ordered reductions? If not -> emit remark.
		It probably is possible to combine these into a single LoopVectorizationLegality::canVectorizeFPMath function that does all the above, since that class does have access to the Requirements I think.
auto *ExactFPMathInst = Requirements.getExactFPInst();		auto *ExactFPMathInst = Requirements.getExactFPInst();
return OptimizationRemarkAnalysisFPCommute(DEBUG_TYPE, "CantReorderFPOps",		return OptimizationRemarkAnalysisFPCommute(DEBUG_TYPE, "CantReorderFPOps",
ExactFPMathInst->getDebugLoc(),		ExactFPMathInst->getDebugLoc(),
ExactFPMathInst->getParent())		ExactFPMathInst->getParent())
<< "loop not vectorized: cannot prove it is safe to reorder "		<< "loop not vectorized: cannot prove it is safe to reorder "
"floating-point operations";		"floating-point operations";
});		});
LLVM_DEBUG(dbgs() << "LV: loop not vectorized: cannot prove it is safe to "		LLVM_DEBUG(dbgs() << "LV: loop not vectorized: cannot prove it is safe to "
[... 341 lines not shown ...]

llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll

; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -enable-strict-reductions=false -S 2>%t | FileCheck %s --check-prefix=CHECK-UNORDERED		; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -enable-strict-reductions=false -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-NOT-VECTORIZED
; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -enable-strict-reductions=true -S 2>%t | FileCheck %s --check-prefix=CHECK-ORDERED		; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -enable-strict-reductions=false -hints-allow-reordering=true -S 2>%t | FileCheck %s --check-prefix=CHECK-UNORDERED
		; RUN: opt < %s -loop-vectorize -scalable-vectorization=on -mtriple aarch64-unknown-linux-gnu -mattr=+sve -enable-strict-reductions=true -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-ORDERED

define float @fadd_strict(float* noalias nocapture readonly %a, i64 %n) {		define float @fadd_strict(float* noalias nocapture readonly %a, i64 %n) {
; CHECK-ORDERED-LABEL: @fadd_strict		; CHECK-ORDERED-LABEL: @fadd_strict
; CHECK-ORDERED: vector.body:		; CHECK-ORDERED: vector.body:
; CHECK-ORDERED: %[[VEC_PHI:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX:.*]], %vector.body ]		; CHECK-ORDERED: %[[VEC_PHI:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX:.*]], %vector.body ]
; CHECK-ORDERED: %[[LOAD:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>*		; CHECK-ORDERED: %[[LOAD:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>*
; CHECK-ORDERED: %[[RDX]] = call float @llvm.vector.reduce.fadd.nxv8f32(float %[[VEC_PHI]], <vscale x 8 x float> %[[LOAD]])		; CHECK-ORDERED: %[[RDX]] = call float @llvm.vector.reduce.fadd.nxv8f32(float %[[VEC_PHI]], <vscale x 8 x float> %[[LOAD]])
; CHECK-ORDERED: for.end		; CHECK-ORDERED: for.end
; CHECK-ORDERED: %[[PHI:.*]] = phi float [ %[[SCALAR:.*]], %for.body ], [ %[[RDX]], %middle.block ]		; CHECK-ORDERED: %[[PHI:.*]] = phi float [ %[[SCALAR:.*]], %for.body ], [ %[[RDX]], %middle.block ]
; CHECK-ORDERED: ret float %[[PHI]]		; CHECK-ORDERED: ret float %[[PHI]]

; CHECK-UNORDERED-LABEL: @fadd_strict		; CHECK-UNORDERED-LABEL: @fadd_strict
		david-arm (Done): Thanks a lot for adding the VF=vscale x 1 case here, but perhaps `CHECK-SCALAR` should be `CHECK-VF1U1`, since we're still vectorising? Also, it's probably worth adding an extra CHECK line for at least one instruction that shows the "<vscale x 1 ..." - maybe the `call float ...` instruction?
; CHECK-UNORDERED: vector.body		; CHECK-UNORDERED: vector.body
; CHECK-UNORDERED: %[[VEC_PHI:.*]] = phi <vscale x 8 x float> [ insertelement (<vscale x 8 x float> shufflevector (<vscale x 8 x float> insertelement (<vscale x 8 x float> undef, float -0.000000e+00, i32 0), <vscale x 8 x float> undef, <vscale x 8 x i32> zeroinitializer), float 0.000000e+00, i32 0), %vector.ph ], [ %[[FADD_VEC:.*]], %vector.body ]		; CHECK-UNORDERED: %[[VEC_PHI:.*]] = phi <vscale x 8 x float> [ insertelement (<vscale x 8 x float> shufflevector (<vscale x 8 x float> insertelement (<vscale x 8 x float> undef, float -0.000000e+00, i32 0), <vscale x 8 x float> undef, <vscale x 8 x i32> zeroinitializer), float 0.000000e+00, i32 0), %vector.ph ], [ %[[FADD_VEC:.*]], %vector.body ]
; CHECK-UNORDERED: %[[LOAD_VEC:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>*		; CHECK-UNORDERED: %[[LOAD_VEC:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>*
; CHECK-UNORDERED: %[[FADD_VEC]] = fadd <vscale x 8 x float> %[[LOAD_VEC]], %[[VEC_PHI]]		; CHECK-UNORDERED: %[[FADD_VEC]] = fadd <vscale x 8 x float> %[[LOAD_VEC]], %[[VEC_PHI]]
; CHECK-UNORDERED-NOT: call float @llvm.vector.reduce.fadd		; CHECK-UNORDERED-NOT: call float @llvm.vector.reduce.fadd
; CHECK-UNORDERED: middle.block		; CHECK-UNORDERED: middle.block
; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.nxv8f32(float -0.000000e+00, <vscale x 8 x float> %[[FADD_VEC]])		; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.nxv8f32(float -0.000000e+00, <vscale x 8 x float> %[[FADD_VEC]])
; CHECK-UNORDERED: for.body		; CHECK-UNORDERED: for.body
; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*		; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*
; CHECK-UNORDERED: %[[FADD:.*]] = fadd float %[[LOAD]], {{.*}}		; CHECK-UNORDERED: %[[FADD:.*]] = fadd float %[[LOAD]], {{.*}}
; CHECK-UNORDERED: for.end		; CHECK-UNORDERED: for.end
; CHECK-UNORDERED: %[[RES:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX]], %middle.block ]		; CHECK-UNORDERED: %[[RES:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX]], %middle.block ]
; CHECK-UNORDERED: ret float %[[RES]]		; CHECK-UNORDERED: ret float %[[RES]]

		; CHECK-NOT-VECTORIZED-LABEL: @fadd_strict
		; CHECK-NOT-VECTORIZED-NOT: vector.body

		sdesmalen (Done): Should all test functions have check lines for all VF8UF1, VF8UF4, etc.? Conversely, is it sufficient to just pass the interleave-count hint (not the vector width) via metadata and have 1 RUN line for VF8UF1, VF8UF4, VF4UF1? Which also makes me wonder, what is the additional value of having both VF8UF1 and VF4UF1?
		kmclaughlin (Author, Not Done): I think it should be sufficient to pass the interleave count via metadata. I've changed VF4UF1 to VF8UF1 as there was no additional benefit in having both, similarly I've changed VF4UF1 in the strict-fadd.ll test as well to reduce the number of RUN lines.
entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]		%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
%arrayidx = getelementptr inbounds float, float* %a, i64 %iv		%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
%0 = load float, float* %arrayidx, align 4		%0 = load float, float* %arrayidx, align 4
[... 44 lines not shown ...]
; CHECK-UNORDERED: %[[BIN_RDX3:.*]] = fadd <vscale x 8 x float> %[[VEC_FADD4]], %[[BIN_RDX2]]		; CHECK-UNORDERED: %[[BIN_RDX3:.*]] = fadd <vscale x 8 x float> %[[VEC_FADD4]], %[[BIN_RDX2]]
; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.nxv8f32(float -0.000000e+00, <vscale x 8 x float> %[[BIN_RDX3]])		; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.nxv8f32(float -0.000000e+00, <vscale x 8 x float> %[[BIN_RDX3]])
; CHECK-UNORDERED: for.body		; CHECK-UNORDERED: for.body
; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*		; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*
; CHECK-UNORDERED: %[[FADD:.*]] = fadd float %[[LOAD]], {{.*}}		; CHECK-UNORDERED: %[[FADD:.*]] = fadd float %[[LOAD]], {{.*}}
; CHECK-UNORDERED: for.end		; CHECK-UNORDERED: for.end
; CHECK-UNORDERED: %[[RES:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX]], %middle.block ]		; CHECK-UNORDERED: %[[RES:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX]], %middle.block ]
; CHECK-UNORDERED: ret float %[[RES]]		; CHECK-UNORDERED: ret float %[[RES]]

		; CHECK-NOT-VECTORIZED-LABEL: @fadd_strict_unroll
		; CHECK-NOT-VECTORIZED-NOT: vector.body

entry:		entry:
br label %for.body		br label %for.body

for.body:		for.body:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]		%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
%arrayidx = getelementptr inbounds float, float* %a, i64 %iv		%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
%0 = load float, float* %arrayidx, align 4		%0 = load float, float* %arrayidx, align 4
[... 9 lines not shown ...]
define void @fadd_strict_interleave(float* noalias nocapture readonly %a, float* noalias nocapture readonly %b, i64 %n) {		define void @fadd_strict_interleave(float* noalias nocapture readonly %a, float* noalias nocapture readonly %b, i64 %n) {
; CHECK-ORDERED-LABEL: @fadd_strict_interleave		; CHECK-ORDERED-LABEL: @fadd_strict_interleave
; CHECK-ORDERED: entry		; CHECK-ORDERED: entry
; CHECK-ORDERED: %[[ARRAYIDX:.*]] = getelementptr inbounds float, float* %a, i64 1		; CHECK-ORDERED: %[[ARRAYIDX:.*]] = getelementptr inbounds float, float* %a, i64 1
; CHECK-ORDERED: %[[LOAD1:.*]] = load float, float* %a		; CHECK-ORDERED: %[[LOAD1:.*]] = load float, float* %a
; CHECK-ORDERED: %[[LOAD2:.*]] = load float, float* %[[ARRAYIDX]]		; CHECK-ORDERED: %[[LOAD2:.*]] = load float, float* %[[ARRAYIDX]]
; CHECK-ORDERED: vector.ph		; CHECK-ORDERED: vector.ph
; CHECK-ORDERED: %[[STEPVEC1:.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()		; CHECK-ORDERED: %[[STEPVEC1:.*]] = call <vscale x 4 x i64> @llvm.experimental.stepvector.nxv4i64()
; CHECK-ORDERED: %[[STEPVEC_ADD1:.*]] = add <vscale x 4 x i64> %[[STEPVEC1]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 0, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)		; CHECK-ORDERED: %[[STEPVEC_ADD1:.*]] = add <vscale x 4 x i64> %[[STEPVEC1]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 0, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
		sdesmalen (Not Done): Why does this test need to be vectorized with VF=vscale x 8 instead of VF=vscale x 4? Is that because it needs to be driven using the cmdline flags to circumvent the "hint allows reordering" behaviour? (and so that the 1 RUN line covers all tests?) If that's the case, can you do a NFC patch where you first change the test to use the new VF, and then rebase this patch on top?
		kmclaughlin (Author, Done): Hi @sdesmalen, it is the case that I changed the VF so that the test could be covered by one RUN line, and to try and circumvent the hints allow reordering behaviour. I will move changes to the RUN lines into a new patch so that this patch only adds new tests.
		sdesmalen (Done): I think it's worth adding a flag to the vectorizer to disable this weird behaviour for testing purposes, so that we don't need to change this test, and so that you don't need the multiple RUN lines in the other test in favour of using metadata to control the VF per individual test.
; CHECK-ORDERED: %[[STEPVEC_MUL:.*]] = mul <vscale x 4 x i64> %[[STEPVEC_ADD1]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 2, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)		; CHECK-ORDERED: %[[STEPVEC_MUL:.*]] = mul <vscale x 4 x i64> %[[STEPVEC_ADD1]], shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 2, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer)
; CHECK-ORDERED: %[[INDUCTION:.*]] = add <vscale x 4 x i64> shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 0, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer), %[[STEPVEC_MUL]]		; CHECK-ORDERED: %[[INDUCTION:.*]] = add <vscale x 4 x i64> shufflevector (<vscale x 4 x i64> insertelement (<vscale x 4 x i64> poison, i64 0, i32 0), <vscale x 4 x i64> poison, <vscale x 4 x i32> zeroinitializer), %[[STEPVEC_MUL]]
; CHECK-ORDERED: vector.body		; CHECK-ORDERED: vector.body
; CHECK-ORDERED: %[[VEC_PHI2:.*]] = phi float [ %[[LOAD2]], %vector.ph ], [ %[[RDX2:.*]], %vector.body ]		; CHECK-ORDERED: %[[VEC_PHI2:.*]] = phi float [ %[[LOAD2]], %vector.ph ], [ %[[RDX2:.*]], %vector.body ]
; CHECK-ORDERED: %[[VEC_PHI1:.*]] = phi float [ %[[LOAD1]], %vector.ph ], [ %[[RDX1:.*]], %vector.body ]		; CHECK-ORDERED: %[[VEC_PHI1:.*]] = phi float [ %[[LOAD1]], %vector.ph ], [ %[[RDX1:.*]], %vector.body ]
; CHECK-ORDERED: %[[VEC_IND:.*]] = phi <vscale x 4 x i64> [ %[[INDUCTION]], %vector.ph ], [ {{.*}}, %vector.body ]		; CHECK-ORDERED: %[[VEC_IND:.*]] = phi <vscale x 4 x i64> [ %[[INDUCTION]], %vector.ph ], [ {{.*}}, %vector.body ]
; CHECK-ORDERED: %[[GEP1:.*]] = getelementptr inbounds float, float* %b, <vscale x 4 x i64> %[[VEC_IND]]		; CHECK-ORDERED: %[[GEP1:.*]] = getelementptr inbounds float, float* %b, <vscale x 4 x i64> %[[VEC_IND]]
; CHECK-ORDERED: %[[MGATHER1:.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> %[[GEP1]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> undef, i1 true, i32 0), <vscale x 4 x i1> undef, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> undef)		; CHECK-ORDERED: %[[MGATHER1:.*]] = call <vscale x 4 x float> @llvm.masked.gather.nxv4f32.nxv4p0f32(<vscale x 4 x float*> %[[GEP1]], i32 4, <vscale x 4 x i1> shufflevector (<vscale x 4 x i1> insertelement (<vscale x 4 x i1> undef, i1 true, i32 0), <vscale x 4 x i1> undef, <vscale x 4 x i32> zeroinitializer), <vscale x 4 x float> undef)
[... 38 lines not shown ...]
; CHECK-UNORDERED: %[[FADD2:.*]] = fadd float %[[LOAD4]], {{.*}}		; CHECK-UNORDERED: %[[FADD2:.*]] = fadd float %[[LOAD4]], {{.*}}
; CHECK-UNORDERED: for.end		; CHECK-UNORDERED: for.end
; CHECK-UNORDERED: %[[RDX1:.*]] = phi float [ %[[FADD1]], %for.body ], [ %[[VEC_RDX1]], %middle.block ]		; CHECK-UNORDERED: %[[RDX1:.*]] = phi float [ %[[FADD1]], %for.body ], [ %[[VEC_RDX1]], %middle.block ]
; CHECK-UNORDERED: %[[RDX2:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[VEC_RDX2]], %middle.block ]		; CHECK-UNORDERED: %[[RDX2:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[VEC_RDX2]], %middle.block ]
; CHECK-UNORDERED: store float %[[RDX1]], float* %a		; CHECK-UNORDERED: store float %[[RDX1]], float* %a
; CHECK-UNORDERED: store float %[[RDX2]], float* {{.*}}		; CHECK-UNORDERED: store float %[[RDX2]], float* {{.*}}
; CHECK-UNORDERED: ret void		; CHECK-UNORDERED: ret void

		; CHECK-NOT-VECTORIZED-LABEL: @fadd_strict_interleave
		; CHECK-NOT-VECTORIZED-NOT: vector.body

entry:		entry:
%arrayidxa = getelementptr inbounds float, float* %a, i64 1		%arrayidxa = getelementptr inbounds float, float* %a, i64 1
%a1 = load float, float* %a, align 4		%a1 = load float, float* %a, align 4
%a2 = load float, float* %arrayidxa, align 4		%a2 = load float, float* %arrayidxa, align 4
br label %for.body		br label %for.body

for.body:		for.body:
%add.phi1 = phi float [ %a2, %entry ], [ %add2, %for.body ]		%add.phi1 = phi float [ %a2, %entry ], [ %add2, %for.body ]
[... 46 lines not shown ...]
; CHECK-UNORDERED: %[[FADD1:.*]] = fadd float %[[LOAD1]], %[[LOAD2]]		; CHECK-UNORDERED: %[[FADD1:.*]] = fadd float %[[LOAD1]], %[[LOAD2]]
; CHECK-UNORDERED: %[[FADD2:.*]] = fadd float {{.*}}, %[[FADD1]]		; CHECK-UNORDERED: %[[FADD2:.*]] = fadd float {{.*}}, %[[FADD1]]
; CHECK-UNORDERED: for.end.loopexit		; CHECK-UNORDERED: for.end.loopexit
; CHECK-UNORDERED: %[[EXIT:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]		; CHECK-UNORDERED: %[[EXIT:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]
; CHECK-UNORDERED: for.end		; CHECK-UNORDERED: for.end
; CHECK-UNORDERED: %[[SUM:.*]] = phi float [ 0.000000e+00, %entry ], [ %[[EXIT]], %for.end.loopexit ]		; CHECK-UNORDERED: %[[SUM:.*]] = phi float [ 0.000000e+00, %entry ], [ %[[EXIT]], %for.end.loopexit ]
; CHECK-UNORDERED: ret float %[[SUM]]		; CHECK-UNORDERED: ret float %[[SUM]]

		; CHECK-NOT-VECTORIZED-LABEL: @fadd_of_sum
		; CHECK-NOT-VECTORIZED-NOT: vector.body

entry:		entry:
%arrayidx = getelementptr inbounds float, float* %a, i64 1		%arrayidx = getelementptr inbounds float, float* %a, i64 1
%0 = load float, float* %arrayidx, align 4		%0 = load float, float* %arrayidx, align 4
%cmp1 = fcmp ogt float %0, 5.000000e-01		%cmp1 = fcmp ogt float %0, 5.000000e-01
br i1 %cmp1, label %for.body, label %for.end		br i1 %cmp1, label %for.body, label %for.end

for.body: ; preds = %for.body		for.body: ; preds = %for.body
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
[... 51 lines not shown ...]
; CHECK-UNORDERED: for.body		; CHECK-UNORDERED: for.body
; CHECK-UNORDERED: %[[RES:.*]] = phi float [ %bc.merge.rdx, %scalar.ph ], [ %[[FADD:.*]], %for.inc ]		; CHECK-UNORDERED: %[[RES:.*]] = phi float [ %bc.merge.rdx, %scalar.ph ], [ %[[FADD:.*]], %for.inc ]
; CHECK-UNORDERED: for.inc		; CHECK-UNORDERED: for.inc
; CHECK-UNORDERED: %[[FADD]] = fadd float %[[RES]], {{.*}}		; CHECK-UNORDERED: %[[FADD]] = fadd float %[[RES]], {{.*}}
; CHECK-UNORDERED: for.end		; CHECK-UNORDERED: for.end
; CHECK-UNORDERED: %[[RDX_PHI:.*]] = phi float [ %[[FADD]], %for.inc ], [ %[[RDX]], %middle.block ]		; CHECK-UNORDERED: %[[RDX_PHI:.*]] = phi float [ %[[FADD]], %for.inc ], [ %[[RDX]], %middle.block ]
; CHECK-UNORDERED: ret float %[[RDX_PHI]]		; CHECK-UNORDERED: ret float %[[RDX_PHI]]

		; CHECK-NOT-VECTORIZED-LABEL: @fadd_conditional
		; CHECK-NOT-VECTORIZED-NOT: vector.body

entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body		for.body: ; preds = %for.body
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.inc ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.inc ]
%res = phi float [ 1.000000e+00, %entry ], [ %fadd, %for.inc ]		%res = phi float [ 1.000000e+00, %entry ], [ %fadd, %for.inc ]
%arrayidx = getelementptr inbounds float, float* %b, i64 %iv		%arrayidx = getelementptr inbounds float, float* %b, i64 %iv
%0 = load float, float* %arrayidx, align 4		%0 = load float, float* %arrayidx, align 4
[... 13 lines not shown ...]
for.inc:		for.inc:
br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !2		br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !2

for.end:		for.end:
%rdx = phi float [ %fadd, %for.inc ]		%rdx = phi float [ %fadd, %for.inc ]
ret float %rdx		ret float %rdx
}		}

; Negative test - loop contains multiple fadds which we cannot safely reorder		; Negative test - loop contains multiple fadds which we cannot safely reorder
; Note: This test vectorizes the loop with a non-strict implementation, which reorders the FAdd operations.
; This is happening because we are using hints, where allowReordering returns true.
define float @fadd_multiple(float* noalias nocapture %a, float* noalias nocapture %b, i64 %n) {		define float @fadd_multiple(float* noalias nocapture %a, float* noalias nocapture %b, i64 %n) {
; CHECK-ORDERED-LABEL: @fadd_multiple		; CHECK-ORDERED-LABEL: @fadd_multiple
; CHECK-ORDERED: vector.body		; CHECK-ORDERED-NOT: vector.body
; CHECK-ORDERED: %[[PHI:.*]] = phi <vscale x 8 x float> [ insertelement (<vscale x 8 x float> shufflevector (<vscale x 8 x float> insertelement (<vscale x 8 x float> undef, float -0.000000e+00, i32 0), <vscale x 8 x float> undef, <vscale x 8 x i32> zeroinitializer), float -0.000000e+00, i32 0), %vector.ph ], [ %[[VEC_FADD2:.*]], %vector.body ]
; CHECK-ORDERED: %[[VEC_LOAD1:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>
; CHECK-ORDERED: %[[VEC_FADD1:.*]] = fadd <vscale x 8 x float> %[[PHI]], %[[VEC_LOAD1]]
; CHECK-ORDERED: %[[VEC_LOAD2:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>
; CHECK-ORDERED: %[[VEC_FADD2]] = fadd <vscale x 8 x float> %[[VEC_FADD1]], %[[VEC_LOAD2]]
; CHECK-ORDERED: middle.block
; CHECK-ORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.nxv8f32(float -0.000000e+00, <vscale x 8 x float> %[[VEC_FADD2]])
; CHECK-ORDERED: for.body
; CHECK-ORDERED: %[[SUM:.*]] = phi float [ %bc.merge.rdx, %scalar.ph ], [ %[[FADD2:.*]], %for.body ]
; CHECK-ORDERED: %[[LOAD1:.*]] = load float, float*
; CHECK-ORDERED: %[[FADD1:.*]] = fadd float %[[SUM]], %[[LOAD1]]
; CHECK-ORDERED: %[[LOAD2:.*]] = load float, float*
; CHECK-ORDERED: %[[FADD2]] = fadd float %[[FADD1]], %[[LOAD2]]
; CHECK-ORDERED: for.end
; CHECK-ORDERED: %[[RET:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]
; CHECK-ORDERED: ret float %[[RET]]

; CHECK-UNORDERED-LABEL: @fadd_multiple		; CHECK-UNORDERED-LABEL: @fadd_multiple
; CHECK-UNORDERED: vector.body		; CHECK-UNORDERED: vector.body
; CHECK-UNORDERED: %[[PHI:.*]] = phi <vscale x 8 x float> [ insertelement (<vscale x 8 x float> shufflevector (<vscale x 8 x float> insertelement (<vscale x 8 x float> undef, float -0.000000e+00, i32 0), <vscale x 8 x float> undef, <vscale x 8 x i32> zeroinitializer), float -0.000000e+00, i32 0), %vector.ph ], [ %[[VEC_FADD2:.*]], %vector.body ]		; CHECK-UNORDERED: %[[PHI:.*]] = phi <vscale x 8 x float> [ insertelement (<vscale x 8 x float> shufflevector (<vscale x 8 x float> insertelement (<vscale x 8 x float> undef, float -0.000000e+00, i32 0), <vscale x 8 x float> undef, <vscale x 8 x i32> zeroinitializer), float -0.000000e+00, i32 0), %vector.ph ], [ %[[VEC_FADD2:.*]], %vector.body ]
; CHECK-UNORDERED: %[[VEC_LOAD1:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>		; CHECK-UNORDERED: %[[VEC_LOAD1:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>
; CHECK-UNORDERED: %[[VEC_FADD1:.*]] = fadd <vscale x 8 x float> %[[PHI]], %[[VEC_LOAD1]]		; CHECK-UNORDERED: %[[VEC_FADD1:.*]] = fadd <vscale x 8 x float> %[[PHI]], %[[VEC_LOAD1]]
; CHECK-UNORDERED: %[[VEC_LOAD2:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>		; CHECK-UNORDERED: %[[VEC_LOAD2:.*]] = load <vscale x 8 x float>, <vscale x 8 x float>
; CHECK-UNORDERED: %[[VEC_FADD2]] = fadd <vscale x 8 x float> %[[VEC_FADD1]], %[[VEC_LOAD2]]		; CHECK-UNORDERED: %[[VEC_FADD2]] = fadd <vscale x 8 x float> %[[VEC_FADD1]], %[[VEC_LOAD2]]
; CHECK-UNORDERED: middle.block		; CHECK-UNORDERED: middle.block
; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.nxv8f32(float -0.000000e+00, <vscale x 8 x float> %[[VEC_FADD2]])		; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.nxv8f32(float -0.000000e+00, <vscale x 8 x float> %[[VEC_FADD2]])
; CHECK-UNORDERED: for.body		; CHECK-UNORDERED: for.body
; CHECK-UNORDERED: %[[SUM:.*]] = phi float [ %bc.merge.rdx, %scalar.ph ], [ %[[FADD2:.*]], %for.body ]		; CHECK-UNORDERED: %[[SUM:.*]] = phi float [ %bc.merge.rdx, %scalar.ph ], [ %[[FADD2:.*]], %for.body ]
; CHECK-UNORDERED: %[[LOAD1:.*]] = load float, float*		; CHECK-UNORDERED: %[[LOAD1:.*]] = load float, float*
; CHECK-UNORDERED: %[[FADD1:.*]] = fadd float %[[SUM]], %[[LOAD1]]		; CHECK-UNORDERED: %[[FADD1:.*]] = fadd float %[[SUM]], %[[LOAD1]]
; CHECK-UNORDERED: %[[LOAD2:.*]] = load float, float*		; CHECK-UNORDERED: %[[LOAD2:.*]] = load float, float*
; CHECK-UNORDERED: %[[FADD2]] = fadd float %[[FADD1]], %[[LOAD2]]		; CHECK-UNORDERED: %[[FADD2]] = fadd float %[[FADD1]], %[[LOAD2]]
; CHECK-UNORDERED: for.end		; CHECK-UNORDERED: for.end
; CHECK-UNORDERED: %[[RET:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]		; CHECK-UNORDERED: %[[RET:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]
; CHECK-UNORDERED: ret float %[[RET]]		; CHECK-UNORDERED: ret float %[[RET]]

		; CHECK-NOT-VECTORIZED-LABEL: @fadd_multiple
		; CHECK-NOT-VECTORIZED-NOT: vector.body

entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %entry, %for.body		for.body: ; preds = %entry, %for.body
%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
%sum = phi float [ -0.000000e+00, %entry ], [ %add3, %for.body ]		%sum = phi float [ -0.000000e+00, %entry ], [ %add3, %for.body ]
%arrayidx = getelementptr inbounds float, float* %a, i64 %iv		%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
%0 = load float, float* %arrayidx, align 4		%0 = load float, float* %arrayidx, align 4
[22 lines not shown]

llvm/test/Transforms/LoopVectorize/AArch64/strict-fadd.ll

	; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -enable-strict-reductions=false -S 2>%t | FileCheck %s --check-prefix=CHECK-UNORDERED			; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -enable-strict-reductions=false -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-NOT-VECTORIZED
	; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -enable-strict-reductions=true -S 2>%t | FileCheck %s --check-prefix=CHECK-ORDERED			; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -enable-strict-reductions=false -hints-allow-reordering=true -S 2>%t | FileCheck %s --check-prefix=CHECK-UNORDERED
				; RUN: opt < %s -loop-vectorize -mtriple aarch64-unknown-linux-gnu -enable-strict-reductions=true -hints-allow-reordering=false -S 2>%t | FileCheck %s --check-prefix=CHECK-ORDERED
sdesmalen: The first test, `@fadd_strict`, has check lines that match the first RUN line and the 4th RUN line, but none of the others. Why is that? And are all these RUN lines needed?

kmclaughlin: Since the allowReordering() function returns false if `EC.getKnownMinValue() > 1`, I thought it was worth making sure that we don't vectorize a VF of 1 for at least one of the tests, which is why I added the extra RUN line to `@fadd_strict`. The RUN lines are needed so that we can pass the different VFs & interleave counts needed for each of the tests (e.g. `@fadd_strict_unroll` needs a UF > 1), and I didn't want to change the 'allow reordering' behaviour by passing hints through metadata. Though I think I could remove the `CHECK-PRED` line, since `@fadd_predicated` does rely on metadata, if this would help at all?

sdesmalen: Can you also add a RUN line for `-enable-strict-reductions=true -hints-allow-reordering=true` (which I think can reuse prefix CHECK-UNORDERED)?

kmclaughlin: Hi @sdesmalen, I haven't added another RUN line for `-enable-strict-reductions=true -hints-allow-reordering=true` as this will cause failures with the fadd_multiple test. For most of the tests, the output where both flags are true will match the CHECK-ORDERED lines, since we will always use strict reductions where possible if this flag is set. For fadd_multiple, we cannot use strict reductions and so the value of `-hints-allow-reordering` will change whether or not the test vectorizes. As we discussed previously, I will follow this up with a patch which ensures we only choose strict reductions if we do not allow reordering. At that point I can add a RUN line as you've suggested and reuse the `CHECK-UNORDERED` prefix.

sdesmalen: Fair enough, thanks for explaining. Can you just add a FIXME above the `cl::opt` for `-enable-strict-reductions` that this flag reverses the default behaviour we have now when hints are passed?
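
The flag interaction discussed in this thread can be summarised with a small model. This is only a sketch inferred from the RUN lines and the comments above, not code from the patch. The names `Outcome`, `expectedOutcome` and `canUseStrict` are hypothetical, and `canUseStrict` stands in for whatever legality check decides that a loop's FP reduction can be vectorized in order. It assumes a loop with vectorization hints and no fast-math flags, and the behaviour at the time of this revision (strict reductions are always preferred when the flag is set).

```cpp
// Hypothetical model of how the two flags decide the vectorizer's behaviour
// for a hinted loop containing a floating-point add reduction.
enum class Outcome { NotVectorized, Ordered, Unordered };

// strictReductions : value of -enable-strict-reductions
// hintsAllowReorder: value of -hints-allow-reordering
// canUseStrict     : assumption; true when the reduction can be vectorized
//                    in order (false for a loop such as @fadd_multiple)
Outcome expectedOutcome(bool strictReductions, bool hintsAllowReorder,
                        bool canUseStrict) {
  // Strict (in-order) reductions win whenever they are enabled and legal.
  if (strictReductions && canUseStrict)
    return Outcome::Ordered;
  // Otherwise the loop is only vectorized if reordering is permitted.
  if (hintsAllowReorder)
    return Outcome::Unordered;
  return Outcome::NotVectorized;
}
```

Under this model, `@fadd_multiple` is not vectorized for the `CHECK-ORDERED` RUN line but would be vectorized with reordering if both flags were true, while the other tests would still produce ordered reductions. That is consistent with the explanation above that no single existing prefix would match every test for the both-true combination.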

	define float @fadd_strict(float* noalias nocapture readonly %a, i64 %n) {			define float @fadd_strict(float* noalias nocapture readonly %a, i64 %n) {
	; CHECK-ORDERED-LABEL: @fadd_strict			; CHECK-ORDERED-LABEL: @fadd_strict
	; CHECK-ORDERED: vector.body:			; CHECK-ORDERED: vector.body:
	; CHECK-ORDERED: %[[VEC_PHI:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX:.*]], %vector.body ]			; CHECK-ORDERED: %[[VEC_PHI:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX:.*]], %vector.body ]
	; CHECK-ORDERED: %[[LOAD:.*]] = load <8 x float>, <8 x float>*			; CHECK-ORDERED: %[[LOAD:.*]] = load <8 x float>, <8 x float>*
	; CHECK-ORDERED: %[[RDX]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[VEC_PHI]], <8 x float> %[[LOAD]])			; CHECK-ORDERED: %[[RDX]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[VEC_PHI]], <8 x float> %[[LOAD]])
	; CHECK-ORDERED: for.end			; CHECK-ORDERED: for.end
	[10 lines not shown]
	; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> %[[FADD_VEC]])			; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> %[[FADD_VEC]])
	; CHECK-UNORDERED: for.body			; CHECK-UNORDERED: for.body
	; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*			; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*
	; CHECK-UNORDERED: %[[FADD:.*]] = fadd float %[[LOAD]], {{.*}}			; CHECK-UNORDERED: %[[FADD:.*]] = fadd float %[[LOAD]], {{.*}}
	; CHECK-UNORDERED: for.end			; CHECK-UNORDERED: for.end
	; CHECK-UNORDERED: %[[RES:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX]], %middle.block ]			; CHECK-UNORDERED: %[[RES:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX]], %middle.block ]
	; CHECK-UNORDERED: ret float %[[RES]]			; CHECK-UNORDERED: ret float %[[RES]]

				; CHECK-NOT-VECTORIZED-LABEL: @fadd_strict
				; CHECK-NOT-VECTORIZED-NOT: vector.body

	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]			%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %a, i64 %iv			%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
	%0 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	[45 lines not shown]
	; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> %[[BIN_RDX3]])			; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> %[[BIN_RDX3]])
	; CHECK-UNORDERED: for.body			; CHECK-UNORDERED: for.body
	; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*			; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*
	; CHECK-UNORDERED: %[[FADD:.*]] = fadd float %[[LOAD]], {{.*}}			; CHECK-UNORDERED: %[[FADD:.*]] = fadd float %[[LOAD]], {{.*}}
	; CHECK-UNORDERED: for.end			; CHECK-UNORDERED: for.end
	; CHECK-UNORDERED: %[[RES:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX]], %middle.block ]			; CHECK-UNORDERED: %[[RES:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX]], %middle.block ]
	; CHECK-UNORDERED: ret float %[[RES]]			; CHECK-UNORDERED: ret float %[[RES]]

				; CHECK-NOT-VECTORIZED-LABEL: @fadd_strict_unroll
				; CHECK-NOT-VECTORIZED-NOT: vector.body

	entry:			entry:
	br label %for.body			br label %for.body

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]			%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %a, i64 %iv			%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
	%0 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	[10 lines not shown]
	; float sum = 0, sum2;			; float sum = 0, sum2;
	; for(int i=0; i<N; ++i) {			; for(int i=0; i<N; ++i) {
	; sum += ptr[i];			; sum += ptr[i];
	; *ptr2 = sum + 42;			; *ptr2 = sum + 42;
	; }			; }
	; return sum;			; return sum;

	define float @fadd_strict_unroll_last_val(float* noalias nocapture readonly %a, float* noalias nocapture readonly %b, i64 %n) {			define float @fadd_strict_unroll_last_val(float* noalias nocapture readonly %a, float* noalias nocapture readonly %b, i64 %n) {
	; CHECK-ORDERED-LABEL: @fadd_strict_unroll_last_val			; CHECK-ORDERED-LABEL: @fadd_strict_unroll_last_val
	; CHECK-ORDERED: vector.body			; CHECK-ORDERED: vector.body
	; CHECK-ORDERED: %[[VEC_PHI1:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX4:.*]], %vector.body ]			; CHECK-ORDERED: %[[VEC_PHI1:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX4:.*]], %vector.body ]
	; CHECK-ORDERED-NOT: phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX4]], %vector.body ]			; CHECK-ORDERED-NOT: phi float [ 0.000000e+00, %vector.ph ], [ %[[RDX4]], %vector.body ]
	; CHECK-ORDERED: %[[LOAD1:.*]] = load <8 x float>, <8 x float>*			; CHECK-ORDERED: %[[LOAD1:.*]] = load <8 x float>, <8 x float>*
	; CHECK-ORDERED: %[[LOAD2:.*]] = load <8 x float>, <8 x float>*			; CHECK-ORDERED: %[[LOAD2:.*]] = load <8 x float>, <8 x float>*
	; CHECK-ORDERED: %[[LOAD3:.*]] = load <8 x float>, <8 x float>*			; CHECK-ORDERED: %[[LOAD3:.*]] = load <8 x float>, <8 x float>*
	; CHECK-ORDERED: %[[LOAD4:.*]] = load <8 x float>, <8 x float>*			; CHECK-ORDERED: %[[LOAD4:.*]] = load <8 x float>, <8 x float>*
	; CHECK-ORDERED: %[[RDX1:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[VEC_PHI1]], <8 x float> %[[LOAD1]])			; CHECK-ORDERED: %[[RDX1:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[VEC_PHI1]], <8 x float> %[[LOAD1]])
	; CHECK-ORDERED: %[[RDX2:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[RDX1]], <8 x float> %[[LOAD2]])			; CHECK-ORDERED: %[[RDX2:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[RDX1]], <8 x float> %[[LOAD2]])
	; CHECK-ORDERED: %[[RDX3:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[RDX2]], <8 x float> %[[LOAD3]])			; CHECK-ORDERED: %[[RDX3:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[RDX2]], <8 x float> %[[LOAD3]])
	; CHECK-ORDERED: %[[RDX4]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[RDX3]], <8 x float> %[[LOAD4]])			; CHECK-ORDERED: %[[RDX4]] = call float @llvm.vector.reduce.fadd.v8f32(float %[[RDX3]], <8 x float> %[[LOAD4]])
	; CHECK-ORDERED: for.body			; CHECK-ORDERED: for.body
	; CHECK-ORDERED: %[[SUM_PHI:.*]] = phi float [ %[[FADD:.*]], %for.body ], [ {{.*}}, %scalar.ph ]			; CHECK-ORDERED: %[[SUM_PHI:.*]] = phi float [ %[[FADD:.*]], %for.body ], [ {{.*}}, %scalar.ph ]
	; CHECK-ORDERED: %[[LOAD5:.*]] = load float, float*			; CHECK-ORDERED: %[[LOAD5:.*]] = load float, float*
	; CHECK-ORDERED: %[[FADD]] = fadd float %[[SUM_PHI]], %[[LOAD5]]			; CHECK-ORDERED: %[[FADD]] = fadd float %[[SUM_PHI]], %[[LOAD5]]
	; CHECK-ORDERED: for.cond.cleanup			; CHECK-ORDERED: for.cond.cleanup
	; CHECK-ORDERED: %[[FADD_LCSSA:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX4]], %middle.block ]			; CHECK-ORDERED: %[[FADD_LCSSA:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX4]], %middle.block ]
	; CHECK-ORDERED: %[[FADD_42:.*]] = fadd float %[[FADD_LCSSA]], 4.200000e+01			; CHECK-ORDERED: %[[FADD_42:.*]] = fadd float %[[FADD_LCSSA]], 4.200000e+01
	; CHECK-ORDERED: store float %[[FADD_42]], float* %b			; CHECK-ORDERED: store float %[[FADD_42]], float* %b
	; CHECK-ORDERED: for.end			; CHECK-ORDERED: for.end
	; CHECK-ORDERED: %[[SUM_LCSSA:.*]] = phi float [ %[[FADD_LCSSA]], %for.cond.cleanup ], [ 0.000000e+00, %entry ]			; CHECK-ORDERED: %[[SUM_LCSSA:.*]] = phi float [ %[[FADD_LCSSA]], %for.cond.cleanup ], [ 0.000000e+00, %entry ]
	; CHECK-ORDERED: ret float %[[SUM_LCSSA]]			; CHECK-ORDERED: ret float %[[SUM_LCSSA]]

	; CHECK-UNORDERED-LABEL: @fadd_strict_unroll_last_val			; CHECK-UNORDERED-LABEL: @fadd_strict_unroll_last_val
	; CHECK-UNORDERED: vector.body			; CHECK-UNORDERED: vector.body
	; CHECK-UNORDERED: %[[VEC_PHI1:.*]] = phi <8 x float> [ <float 0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD1:.*]], %vector.body ]			; CHECK-UNORDERED: %[[VEC_PHI1:.*]] = phi <8 x float> [ <float 0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD1:.*]], %vector.body ]
	; CHECK-UNORDERED: %[[VEC_PHI2:.*]] = phi <8 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD2:.*]], %vector.body ]			; CHECK-UNORDERED: %[[VEC_PHI2:.*]] = phi <8 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD2:.*]], %vector.body ]
	; CHECK-UNORDERED: %[[VEC_PHI3:.*]] = phi <8 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD3:.*]], %vector.body ]			; CHECK-UNORDERED: %[[VEC_PHI3:.*]] = phi <8 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD3:.*]], %vector.body ]
	; CHECK-UNORDERED: %[[VEC_PHI4:.*]] = phi <8 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD4:.*]], %vector.body ]			; CHECK-UNORDERED: %[[VEC_PHI4:.*]] = phi <8 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD4:.*]], %vector.body ]
	; CHECK-UNORDERED: %[[VEC_LOAD1:.*]] = load <8 x float>, <8 x float>*			; CHECK-UNORDERED: %[[VEC_LOAD1:.*]] = load <8 x float>, <8 x float>*
	; CHECK-UNORDERED: %[[VEC_LOAD2:.*]] = load <8 x float>, <8 x float>*			; CHECK-UNORDERED: %[[VEC_LOAD2:.*]] = load <8 x float>, <8 x float>*
	; CHECK-UNORDERED: %[[VEC_LOAD3:.*]] = load <8 x float>, <8 x float>*			; CHECK-UNORDERED: %[[VEC_LOAD3:.*]] = load <8 x float>, <8 x float>*
	; CHECK-UNORDERED: %[[VEC_LOAD4:.*]] = load <8 x float>, <8 x float>*			; CHECK-UNORDERED: %[[VEC_LOAD4:.*]] = load <8 x float>, <8 x float>*
	; CHECK-UNORDERED: %[[VEC_FADD1]] = fadd <8 x float> %[[VEC_PHI1]], %[[VEC_LOAD1]]			; CHECK-UNORDERED: %[[VEC_FADD1]] = fadd <8 x float> %[[VEC_PHI1]], %[[VEC_LOAD1]]
	; CHECK-UNORDERED: %[[VEC_FADD2]] = fadd <8 x float> %[[VEC_PHI2]], %[[VEC_LOAD2]]			; CHECK-UNORDERED: %[[VEC_FADD2]] = fadd <8 x float> %[[VEC_PHI2]], %[[VEC_LOAD2]]
	; CHECK-UNORDERED: %[[VEC_FADD3]] = fadd <8 x float> %[[VEC_PHI3]], %[[VEC_LOAD3]]			; CHECK-UNORDERED: %[[VEC_FADD3]] = fadd <8 x float> %[[VEC_PHI3]], %[[VEC_LOAD3]]
	; CHECK-UNORDERED: %[[VEC_FADD4]] = fadd <8 x float> %[[VEC_PHI4]], %[[VEC_LOAD4]]			; CHECK-UNORDERED: %[[VEC_FADD4]] = fadd <8 x float> %[[VEC_PHI4]], %[[VEC_LOAD4]]
	; CHECK-UNORDERED-NOT: call float @llvm.vector.reduce.fadd			; CHECK-UNORDERED-NOT: call float @llvm.vector.reduce.fadd
	; CHECK-UNORDERED: middle.block			; CHECK-UNORDERED: middle.block
	; CHECK-UNORDERED: %[[BIN_RDX1:.*]] = fadd <8 x float> %[[VEC_FADD2]], %[[VEC_FADD1]]			; CHECK-UNORDERED: %[[BIN_RDX1:.*]] = fadd <8 x float> %[[VEC_FADD2]], %[[VEC_FADD1]]
	; CHECK-UNORDERED: %[[BIN_RDX2:.*]] = fadd <8 x float> %[[VEC_FADD3]], %[[BIN_RDX1]]			; CHECK-UNORDERED: %[[BIN_RDX2:.*]] = fadd <8 x float> %[[VEC_FADD3]], %[[BIN_RDX1]]
	; CHECK-UNORDERED: %[[BIN_RDX3:.*]] = fadd <8 x float> %[[VEC_FADD4]], %[[BIN_RDX2]]			; CHECK-UNORDERED: %[[BIN_RDX3:.*]] = fadd <8 x float> %[[VEC_FADD4]], %[[BIN_RDX2]]
	; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> %[[BIN_RDX3]])			; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> %[[BIN_RDX3]])
	; CHECK-UNORDERED: for.body			; CHECK-UNORDERED: for.body
	; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*			; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*
	; CHECK-UNORDERED: %[[FADD:.*]] = fadd float {{.*}}, %[[LOAD]]			; CHECK-UNORDERED: %[[FADD:.*]] = fadd float {{.*}}, %[[LOAD]]
	; CHECK-UNORDERED: for.cond.cleanup			; CHECK-UNORDERED: for.cond.cleanup
	; CHECK-UNORDERED: %[[FADD_LCSSA:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX]], %middle.block ]			; CHECK-UNORDERED: %[[FADD_LCSSA:.*]] = phi float [ %[[FADD]], %for.body ], [ %[[RDX]], %middle.block ]
	; CHECK-UNORDERED: %[[FADD_42:.*]] = fadd float %[[FADD_LCSSA]], 4.200000e+01			; CHECK-UNORDERED: %[[FADD_42:.*]] = fadd float %[[FADD_LCSSA]], 4.200000e+01
	; CHECK-UNORDERED: store float %[[FADD_42]], float* %b			; CHECK-UNORDERED: store float %[[FADD_42]], float* %b
	; CHECK-UNORDERED: for.end			; CHECK-UNORDERED: for.end
	; CHECK-UNORDERED: %[[SUM_LCSSA:.*]] = phi float [ %[[FADD_LCSSA]], %for.cond.cleanup ], [ 0.000000e+00, %entry ]			; CHECK-UNORDERED: %[[SUM_LCSSA:.*]] = phi float [ %[[FADD_LCSSA]], %for.cond.cleanup ], [ 0.000000e+00, %entry ]
	; CHECK-UNORDERED: ret float %[[SUM_LCSSA]]			; CHECK-UNORDERED: ret float %[[SUM_LCSSA]]

				; CHECK-NOT-VECTORIZED-LABEL: @fadd_strict_unroll_last_val
				; CHECK-NOT-VECTORIZED-NOT: vector.body

david-arm: Hi @kmclaughlin, I think maybe this is meant to be CHECK-VF8UF4?
	entry:			entry:
	%cmp = icmp sgt i64 %n, 0			%cmp = icmp sgt i64 %n, 0
	br i1 %cmp, label %for.body, label %for.end			br i1 %cmp, label %for.body, label %for.end

	for.body:			for.body:
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%sum = phi float [ 0.000000e+00, %entry ], [ %fadd, %for.body ]			%sum = phi float [ 0.000000e+00, %entry ], [ %fadd, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %a, i64 %iv			%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
	[57 lines not shown]
	; CHECK-UNORDERED: %[[FADD2:.*]] = fadd float %[[LOAD2]], {{.*}}			; CHECK-UNORDERED: %[[FADD2:.*]] = fadd float %[[LOAD2]], {{.*}}
	; CHECK-UNORDERED: for.end			; CHECK-UNORDERED: for.end
	; CHECK-UNORDERED: %[[SUM1:.*]] = phi float [ %[[FADD1]], %for.body ], [ %[[RDX1]], %middle.block ]			; CHECK-UNORDERED: %[[SUM1:.*]] = phi float [ %[[FADD1]], %for.body ], [ %[[RDX1]], %middle.block ]
	; CHECK-UNORDERED: %[[SUM2:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX2]], %middle.block ]			; CHECK-UNORDERED: %[[SUM2:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX2]], %middle.block ]
	; CHECK-UNORDERED: store float %[[SUM1]]			; CHECK-UNORDERED: store float %[[SUM1]]
	; CHECK-UNORDERED: store float %[[SUM2]]			; CHECK-UNORDERED: store float %[[SUM2]]
	; CHECK-UNORDERED: ret void			; CHECK-UNORDERED: ret void

				; CHECK-NOT-VECTORIZED-LABEL: @fadd_strict_interleave
				; CHECK-NOT-VECTORIZED-NOT: vector.body

	entry:			entry:
	%arrayidxa = getelementptr inbounds float, float* %a, i64 1			%arrayidxa = getelementptr inbounds float, float* %a, i64 1
	%a1 = load float, float* %a, align 4			%a1 = load float, float* %a, align 4
	%a2 = load float, float* %arrayidxa, align 4			%a2 = load float, float* %arrayidxa, align 4
	br label %for.body			br label %for.body

	for.body:			for.body:
	%add.phi1 = phi float [ %a2, %entry ], [ %add2, %for.body ]			%add.phi1 = phi float [ %a2, %entry ], [ %add2, %for.body ]
	[46 lines not shown]
	; CHECK-UNORDERED: %[[FADD1:.*]] = fadd float %[[LOAD1]], %[[LOAD2]]			; CHECK-UNORDERED: %[[FADD1:.*]] = fadd float %[[LOAD1]], %[[LOAD2]]
	; CHECK-UNORDERED: %[[FADD2:.*]] = fadd float {{.*}}, %[[FADD1]]			; CHECK-UNORDERED: %[[FADD2:.*]] = fadd float {{.*}}, %[[FADD1]]
	; CHECK-UNORDERED: for.end.loopexit			; CHECK-UNORDERED: for.end.loopexit
	; CHECK-UNORDERED: %[[EXIT:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]			; CHECK-UNORDERED: %[[EXIT:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]
	; CHECK-UNORDERED: for.end			; CHECK-UNORDERED: for.end
	; CHECK-UNORDERED: %[[SUM:.*]] = phi float [ 0.000000e+00, %entry ], [ %[[EXIT]], %for.end.loopexit ]			; CHECK-UNORDERED: %[[SUM:.*]] = phi float [ 0.000000e+00, %entry ], [ %[[EXIT]], %for.end.loopexit ]
	; CHECK-UNORDERED: ret float %[[SUM]]			; CHECK-UNORDERED: ret float %[[SUM]]

				; CHECK-NOT-VECTORIZED-LABEL: @fadd_of_sum
				; CHECK-NOT-VECTORIZED-NOT: vector.body

	entry:			entry:
	%arrayidx = getelementptr inbounds float, float* %a, i64 1			%arrayidx = getelementptr inbounds float, float* %a, i64 1
	%0 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	%cmp1 = fcmp ogt float %0, 5.000000e-01			%cmp1 = fcmp ogt float %0, 5.000000e-01
	br i1 %cmp1, label %for.body, label %for.end			br i1 %cmp1, label %for.body, label %for.end

	for.body: ; preds = %for.body			for.body: ; preds = %for.body
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	[64 lines not shown]
	; CHECK-UNORDERED: %[[LOAD3:.*]] = load float, float*			; CHECK-UNORDERED: %[[LOAD3:.*]] = load float, float*
	; CHECK-UNORDERED: for.inc			; CHECK-UNORDERED: for.inc
	; CHECK-UNORDERED: %[[PHI:.*]] = phi float [ %[[LOAD3]], %if.then ], [ 3.000000e+00, %for.body ]			; CHECK-UNORDERED: %[[PHI:.*]] = phi float [ %[[LOAD3]], %if.then ], [ 3.000000e+00, %for.body ]
	; CHECK-UNORDERED: %[[FADD]] = fadd float %[[RES_PHI]], %[[PHI]]			; CHECK-UNORDERED: %[[FADD]] = fadd float %[[RES_PHI]], %[[PHI]]
	; CHECK-UNORDERED: for.end			; CHECK-UNORDERED: for.end
	; CHECK-UNORDERED: %[[RDX_PHI:.*]] = phi float [ %[[FADD]], %for.inc ], [ %[[RDX]], %middle.block ]			; CHECK-UNORDERED: %[[RDX_PHI:.*]] = phi float [ %[[FADD]], %for.inc ], [ %[[RDX]], %middle.block ]
	; CHECK-UNORDERED: ret float %[[RDX_PHI]]			; CHECK-UNORDERED: ret float %[[RDX_PHI]]

				; CHECK-NOT-VECTORIZED-LABEL: @fadd_conditional
				; CHECK-NOT-VECTORIZED-NOT: vector.body

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %for.body			for.body: ; preds = %for.body
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.inc ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.inc ]
	%res = phi float [ 1.000000e+00, %entry ], [ %fadd, %for.inc ]			%res = phi float [ 1.000000e+00, %entry ], [ %fadd, %for.inc ]
	%arrayidx = getelementptr inbounds float, float* %b, i64 %iv			%arrayidx = getelementptr inbounds float, float* %b, i64 %iv
	%0 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	[50 lines not shown]
	; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v2f32(float -0.000000e+00, <2 x float> %[[MASK]])			; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v2f32(float -0.000000e+00, <2 x float> %[[MASK]])
	; CHECK-UNORDERED: for.body			; CHECK-UNORDERED: for.body
	; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*			; CHECK-UNORDERED: %[[LOAD:.*]] = load float, float*
	; CHECK-UNORDERED: %[[FADD2:.*]] = fadd float {{.*}}, %[[LOAD]]			; CHECK-UNORDERED: %[[FADD2:.*]] = fadd float {{.*}}, %[[LOAD]]
	; CHECK-UNORDERED: for.end			; CHECK-UNORDERED: for.end
	; CHECK-UNORDERED: %[[SUM:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]			; CHECK-UNORDERED: %[[SUM:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]
	; CHECK-UNORDERED: ret float %[[SUM]]			; CHECK-UNORDERED: ret float %[[SUM]]

				; CHECK-NOT-VECTORIZED-LABEL: @fadd_predicated
				; CHECK-NOT-VECTORIZED-NOT: vector.body

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]			%iv = phi i64 [ %iv.next, %for.body ], [ 0, %entry ]
	%sum.02 = phi float [ %l7, %for.body ], [ 0.000000e+00, %entry ]			%sum.02 = phi float [ %l7, %for.body ], [ 0.000000e+00, %entry ]
	%l2 = getelementptr inbounds float, float* %a, i64 %iv			%l2 = getelementptr inbounds float, float* %a, i64 %iv
	%l3 = load float, float* %l2, align 4			%l3 = load float, float* %l2, align 4
	%l7 = fadd float %sum.02, %l3			%l7 = fadd float %sum.02, %l3
	%iv.next = add i64 %iv, 1			%iv.next = add i64 %iv, 1
	%exitcond = icmp eq i64 %iv.next, %n			%exitcond = icmp eq i64 %iv.next, %n
	br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3			br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	%sum.0.lcssa = phi float [ %l7, %for.body ]			%sum.0.lcssa = phi float [ %l7, %for.body ]
	ret float %sum.0.lcssa			ret float %sum.0.lcssa
	}			}

	; Negative test - loop contains multiple fadds which we cannot safely reorder			; Negative test - loop contains multiple fadds which we cannot safely reorder
	; Note: This test vectorizes the loop with a non-strict implementation, which reorders the FAdd operations.
	; This is happening because we are using hints, where allowReordering returns true.
	define float @fadd_multiple(float* noalias nocapture %a, float* noalias nocapture %b, i64 %n) {			define float @fadd_multiple(float* noalias nocapture %a, float* noalias nocapture %b, i64 %n) {
	; CHECK-ORDERED-LABEL: @fadd_multiple			; CHECK-ORDERED-LABEL: @fadd_multiple
	; CHECK-ORDERED: vector.body			; CHECK-ORDERED-NOT: vector.body
	; CHECK-ORDERED: %[[PHI:.*]] = phi <8 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD2:.*]], %vector.body ]
	; CHECK-ORDERED: %[[VEC_LOAD1:.*]] = load <8 x float>, <8 x float>
	; CHECK-ORDERED: %[[VEC_FADD1:.*]] = fadd <8 x float> %[[PHI]], %[[VEC_LOAD1]]
	; CHECK-ORDERED: %[[VEC_LOAD2:.*]] = load <8 x float>, <8 x float>
	; CHECK-ORDERED: %[[VEC_FADD2]] = fadd <8 x float> %[[VEC_FADD1]], %[[VEC_LOAD2]]
	; CHECK-ORDERED: middle.block
	; CHECK-ORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> %[[VEC_FADD2]])
	; CHECK-ORDERED: for.body
	; CHECK-ORDERED: %[[SUM:.*]] = phi float [ %bc.merge.rdx, %scalar.ph ], [ %[[FADD2:.*]], %for.body ]
	; CHECK-ORDERED: %[[LOAD1:.*]] = load float, float*
	; CHECK-ORDERED: %[[FADD1:.*]] = fadd float %sum, %[[LOAD1]]
	; CHECK-ORDERED: %[[LOAD2:.*]] = load float, float*
	; CHECK-ORDERED: %[[FADD2]] = fadd float %[[FADD1]], %[[LOAD2]]
	; CHECK-ORDERED: for.end
	; CHECK-ORDERED: %[[RET:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]
	; CHECK-ORDERED: ret float %[[RET]]

	; CHECK-UNORDERED-LABEL: @fadd_multiple			; CHECK-UNORDERED-LABEL: @fadd_multiple
	; CHECK-UNORDERED: vector.body			; CHECK-UNORDERED: vector.body
	; CHECK-UNORDERED: %[[PHI:.*]] = phi <8 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD2:.*]], %vector.body ]			; CHECK-UNORDERED: %[[PHI:.*]] = phi <8 x float> [ <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %vector.ph ], [ %[[VEC_FADD2:.*]], %vector.body ]
	; CHECK-UNORDERED: %[[VEC_LOAD1:.*]] = load <8 x float>, <8 x float>			; CHECK-UNORDERED: %[[VEC_LOAD1:.*]] = load <8 x float>, <8 x float>
	; CHECK-UNORDERED: %[[VEC_FADD1:.*]] = fadd <8 x float> %[[PHI]], %[[VEC_LOAD1]]			; CHECK-UNORDERED: %[[VEC_FADD1:.*]] = fadd <8 x float> %[[PHI]], %[[VEC_LOAD1]]
	; CHECK-UNORDERED: %[[VEC_LOAD2:.*]] = load <8 x float>, <8 x float>			; CHECK-UNORDERED: %[[VEC_LOAD2:.*]] = load <8 x float>, <8 x float>
	; CHECK-UNORDERED: %[[VEC_FADD2]] = fadd <8 x float> %[[VEC_FADD1]], %[[VEC_LOAD2]]			; CHECK-UNORDERED: %[[VEC_FADD2]] = fadd <8 x float> %[[VEC_FADD1]], %[[VEC_LOAD2]]
	; CHECK-UNORDERED: middle.block			; CHECK-UNORDERED: middle.block
	; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> %[[VEC_FADD2]])			; CHECK-UNORDERED: %[[RDX:.*]] = call float @llvm.vector.reduce.fadd.v8f32(float -0.000000e+00, <8 x float> %[[VEC_FADD2]])
	; CHECK-UNORDERED: for.body			; CHECK-UNORDERED: for.body
	; CHECK-UNORDERED: %[[SUM:.*]] = phi float [ %bc.merge.rdx, %scalar.ph ], [ %[[FADD2:.*]], %for.body ]			; CHECK-UNORDERED: %[[SUM:.*]] = phi float [ %bc.merge.rdx, %scalar.ph ], [ %[[FADD2:.*]], %for.body ]
	; CHECK-UNORDERED: %[[LOAD1:.*]] = load float, float*			; CHECK-UNORDERED: %[[LOAD1:.*]] = load float, float*
	; CHECK-UNORDERED: %[[FADD1:.*]] = fadd float %sum, %[[LOAD1]]			; CHECK-UNORDERED: %[[FADD1:.*]] = fadd float %sum, %[[LOAD1]]
	; CHECK-UNORDERED: %[[LOAD2:.*]] = load float, float*			; CHECK-UNORDERED: %[[LOAD2:.*]] = load float, float*
	; CHECK-UNORDERED: %[[FADD2]] = fadd float %[[FADD1]], %[[LOAD2]]			; CHECK-UNORDERED: %[[FADD2]] = fadd float %[[FADD1]], %[[LOAD2]]
	; CHECK-UNORDERED: for.end			; CHECK-UNORDERED: for.end
	; CHECK-UNORDERED: %[[RET:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]			; CHECK-UNORDERED: %[[RET:.*]] = phi float [ %[[FADD2]], %for.body ], [ %[[RDX]], %middle.block ]
	; CHECK-UNORDERED: ret float %[[RET]]			; CHECK-UNORDERED: ret float %[[RET]]

				; CHECK-NOT-VECTORIZED-LABEL: @fadd_multiple
				; CHECK-NOT-VECTORIZED-NOT: vector.body

	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]			%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
	%sum = phi float [ -0.000000e+00, %entry ], [ %add3, %for.body ]			%sum = phi float [ -0.000000e+00, %entry ], [ %add3, %for.body ]
	%arrayidx = getelementptr inbounds float, float* %a, i64 %iv			%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
	%0 = load float, float* %arrayidx, align 4			%0 = load float, float* %arrayidx, align 4
	%add = fadd float %sum, %0			%add = fadd float %sum, %0
	%arrayidx2 = getelementptr inbounds float, float* %b, i64 %iv			%arrayidx2 = getelementptr inbounds float, float* %b, i64 %iv
	%1 = load float, float* %arrayidx2, align 4			%1 = load float, float* %arrayidx2, align 4
	%add3 = fadd float %add, %1			%add3 = fadd float %add, %1
	%iv.next = add nuw nsw i64 %iv, 1			%iv.next = add nuw nsw i64 %iv, 1
	%exitcond.not = icmp eq i64 %iv.next, %n			%exitcond.not = icmp eq i64 %iv.next, %n
	br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0			br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	%rdx = phi float [ %add3, %for.body ]			%rdx = phi float [ %add3, %for.body ]
	ret float %rdx			ret float %rdx
	}			}

				; Tests with both a floating point reduction & induction, e.g.
				;
				;float fp_iv_rdx_loop(float *values, float init, float * __restrict__ A, int N) {
				; float fp_inc = 2.0;
				; float x = init;
				; float sum = 0.0;
				; for (int i=0; i < N; ++i) {
				; A[i] = x;
				; x += fp_inc;
				; sum += values[i];
				; }
				; return sum;
				;}
				;
				; Note: These tests do not use metadata hints, and as such we should not expect the CHECK-UNORDERED case to vectorize, even
				; with the -hints-allow-reordering flag set to true.

				; Strict reduction could be performed in-loop, but ordered FP induction variables are not supported
				define float @induction_and_reduction(float* nocapture readonly %values, float %init, float* noalias nocapture %A, i64 %N) {
				; CHECK-ORDERED-LABEL: @induction_and_reduction
				; CHECK-ORDERED-NOT: vector.body

				; CHECK-UNORDERED-LABEL: @induction_and_reduction
				; CHECK-UNORDERED-NOT: vector.body

				; CHECK-NOT-VECTORIZED-LABEL: @induction_and_reduction
				; CHECK-NOT-VECTORIZED-NOT: vector.body

				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.015 = phi float [ 0.000000e+00, %entry ], [ %add3, %for.body ]
				%x.014 = phi float [ %init, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %A, i64 %iv
				store float %x.014, float* %arrayidx, align 4
				%add = fadd float %x.014, 2.000000e+00
				%arrayidx2 = getelementptr inbounds float, float* %values, i64 %iv
				%0 = load float, float* %arrayidx2, align 4
				%add3 = fadd float %sum.015, %0
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %N
				br i1 %exitcond.not, label %for.end, label %for.body

				for.end:
				ret float %add3
				}

				; As above, but with the FP induction being unordered (fast) the loop can be vectorized with strict reductions
				define float @fast_induction_and_reduction(float* nocapture readonly %values, float %init, float* noalias nocapture %A, i64 %N) {
				; CHECK-ORDERED-LABEL: @fast_induction_and_reduction
				; CHECK-ORDERED: vector.ph
				; CHECK-ORDERED: %[[INDUCTION:.*]] = fadd fast <4 x float> {{.*}}, <float 0.000000e+00, float 2.000000e+00, float 4.000000e+00, float 6.000000e+00>
				; CHECK-ORDERED: vector.body
				; CHECK-ORDERED: %[[RDX_PHI:.*]] = phi float [ 0.000000e+00, %vector.ph ], [ %[[FADD2:.*]], %vector.body ]
				; CHECK-ORDERED: %[[IND_PHI:.*]] = phi <4 x float> [ %[[INDUCTION]], %vector.ph ], [ %[[VEC_IND_NEXT:.*]], %vector.body ]
				; CHECK-ORDERED: %[[STEP_ADD:.*]] = fadd fast <4 x float> %[[IND_PHI]], <float 8.000000e+00, float 8.000000e+00, float 8.000000e+00, float 8.000000e+00>
				; CHECK-ORDERED: %[[LOAD1:.*]] = load <4 x float>, <4 x float>*
				; CHECK-ORDERED: %[[LOAD2:.*]] = load <4 x float>, <4 x float>*
				; CHECK-ORDERED: %[[FADD1:.*]] = call float @llvm.vector.reduce.fadd.v4f32(float %[[RDX_PHI]], <4 x float> %[[LOAD1]])
				; CHECK-ORDERED: %[[FADD2]] = call float @llvm.vector.reduce.fadd.v4f32(float %[[FADD1]], <4 x float> %[[LOAD2]])
				; CHECK-ORDERED: %[[VEC_IND_NEXT]] = fadd fast <4 x float> %[[STEP_ADD]], <float 8.000000e+00, float 8.000000e+00, float 8.000000e+00, float 8.000000e+00>
				; CHECK-ORDERED: for.body
				; CHECK-ORDERED: %[[RDX_SUM_PHI:.*]] = phi float [ {{.*}}, %scalar.ph ], [ %[[FADD3:.*]], %for.body ]
				; CHECK-ORDERED: %[[IND_SUM_PHI:.*]] = phi fast float [ {{.*}}, %scalar.ph ], [ %[[ADD_IND:.*]], %for.body ]
				; CHECK-ORDERED: store float %[[IND_SUM_PHI]], float*
				; CHECK-ORDERED: %[[ADD_IND]] = fadd fast float %[[IND_SUM_PHI]], 2.000000e+00
				; CHECK-ORDERED: %[[LOAD3:.*]] = load float, float*
				; CHECK-ORDERED: %[[FADD3]] = fadd float %[[RDX_SUM_PHI]], %[[LOAD3]]
				; CHECK-ORDERED: for.end
				; CHECK-ORDERED: %[[RES_PHI:.*]] = phi float [ %[[FADD3]], %for.body ], [ %[[FADD2]], %middle.block ]
				; CHECK-ORDERED: ret float %[[RES_PHI]]

				; CHECK-UNORDERED-LABEL: @fast_induction_and_reduction
				; CHECK-UNORDERED-NOT: vector.body

				; CHECK-NOT-VECTORIZED-LABEL: @fast_induction_and_reduction
				; CHECK-NOT-VECTORIZED-NOT: vector.body

				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.015 = phi float [ 0.000000e+00, %entry ], [ %add3, %for.body ]
				%x.014 = phi fast float [ %init, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %A, i64 %iv
				store float %x.014, float* %arrayidx, align 4
				%add = fadd fast float %x.014, 2.000000e+00
				%arrayidx2 = getelementptr inbounds float, float* %values, i64 %iv
				%0 = load float, float* %arrayidx2, align 4
				%add3 = fadd float %sum.015, %0
				david-arm: Are these new tests missing hints that the other tests seem to use? I just wondered whether it would be better to be consistent here, that's all. The reason I mention this is that I was expecting the UNORDERED case to vectorise due to the `-hints-allow-reordering=true` flag.
				david-arm: I think I see now @kmclaughlin - you're testing the productisation of `-enable-strict-reductions`, so you were deliberately adding some tests without hints, which also makes sense. In this case I'd also be happy if you left these tests as they are and just added some comments explaining why we expect the CHECK-UNORDERED case not to vectorise.
				kmclaughlin: Hi @david-arm, I've added a comment above these tests to explain why the CHECK-UNORDERED case shouldn't vectorize.
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %N
				br i1 %exitcond.not, label %for.end, label %for.body

				for.end:
				ret float %add3
				}

				; The FP induction is fast, but here we can't vectorize as only one of the reductions is an FAdd that can be performed in-loop
				define float @fast_induction_unordered_reduction(float* nocapture readonly %values, float %init, float* noalias nocapture %A, float* noalias nocapture %B, i64 %N) {

				; CHECK-ORDERED-LABEL: @fast_induction_unordered_reduction
				; CHECK-ORDERED-NOT: vector.body

				; CHECK-UNORDERED-LABEL: @fast_induction_unordered_reduction
				; CHECK-UNORDERED-NOT: vector.body

				; CHECK-NOT-VECTORIZED-LABEL: @fast_induction_unordered_reduction
				; CHECK-NOT-VECTORIZED-NOT: vector.body

				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum2.023 = phi float [ 3.000000e+00, %entry ], [ %mul, %for.body ]
				%sum.022 = phi float [ 0.000000e+00, %entry ], [ %add3, %for.body ]
				%x.021 = phi float [ %init, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %A, i64 %iv
				store float %x.021, float* %arrayidx, align 4
				%add = fadd fast float %x.021, 2.000000e+00
				%arrayidx2 = getelementptr inbounds float, float* %values, i64 %iv
				%0 = load float, float* %arrayidx2, align 4
				%add3 = fadd float %sum.022, %0
				%mul = fmul float %sum2.023, %0
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %N
				br i1 %exitcond.not, label %for.end, label %for.body

				for.end:
				%add6 = fadd float %add3, %mul
				ret float %add6
				}

	!0 = distinct !{!0, !4, !7, !9}			!0 = distinct !{!0, !4, !7, !9}
	!1 = distinct !{!1, !4, !8, !9}			!1 = distinct !{!1, !4, !8, !9}
	!2 = distinct !{!2, !5, !7, !9}			!2 = distinct !{!2, !5, !7, !9}
	!3 = distinct !{!3, !6, !7, !9, !10}			!3 = distinct !{!3, !6, !7, !9, !10}
	!4 = !{!"llvm.loop.vectorize.width", i32 8}			!4 = !{!"llvm.loop.vectorize.width", i32 8}
	!5 = !{!"llvm.loop.vectorize.width", i32 4}			!5 = !{!"llvm.loop.vectorize.width", i32 4}
	!6 = !{!"llvm.loop.vectorize.width", i32 2}			!6 = !{!"llvm.loop.vectorize.width", i32 2}
	!7 = !{!"llvm.loop.interleave.count", i32 1}			!7 = !{!"llvm.loop.interleave.count", i32 1}
	!8 = !{!"llvm.loop.interleave.count", i32 4}			!8 = !{!"llvm.loop.interleave.count", i32 4}
	!9 = !{!"llvm.loop.vectorize.enable", i1 true}			!9 = !{!"llvm.loop.vectorize.enable", i1 true}
	!10 = !{!"llvm.loop.vectorize.predicate.enable", i1 true}			!10 = !{!"llvm.loop.vectorize.predicate.enable", i1 true}
				sdesmalen: nit: this PHI is unnecessary? (same for the tests below)
				sdesmalen: nit: ; Strict reduction could be performed in-loop, but ordered FP induction variables are not supported.
				sdesmalen: Can you be more explicit in the comment? i.e. ; As above, but with the FP induction being unordered (fast), the loop can be vectorized.
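
For readers of this archive: the CHECK-ORDERED and CHECK-UNORDERED prefixes in these tests differ because floating-point addition is not associative, so a reordered reduction can return a different value from the scalar loop. The following is a minimal C++ sketch of that effect. It is not part of the patch, and `sumOrdered` and `sumReordered` are hypothetical names used only for illustration. It models a reordered vector loop as a set of per-lane partial sums combined after the loop, which is a simplification of what the vectorizer emits.

```cpp
#include <cstddef>
#include <vector>

// Strict (ordered) reduction: adds the elements in source order,
// exactly as the scalar loop would.
float sumOrdered(const std::vector<float> &v) {
  float acc = 0.0f;
  for (float x : v)
    acc += x;
  return acc;
}

// Reordered reduction (illustrative model only): keeps `lanes` independent
// partial sums, one per vector lane, and combines them after the loop.
float sumReordered(const std::vector<float> &v, std::size_t lanes) {
  std::vector<float> partial(lanes, 0.0f);
  for (std::size_t i = 0; i < v.size(); ++i)
    partial[i % lanes] += v[i];
  float acc = 0.0f;
  for (float p : partial)
    acc += p;
  return acc;
}
```

With the input `{1e8f, 1.0f, -1e8f, 1.0f}` and default (non-fast-math) compilation, the ordered sum loses the first `1.0f` when it is added to `1e8f`, while a two-lane reordered sum adds the two `1.0f` values together first, so the two functions return different results.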


Diff 347943
