This is an archive of the discontinued LLVM Phabricator instance.


Differential D122126

[LoopVectorize] Don't interleave when the number of runtime checks exceeds the threshold
Closed, Public

Authored by TiehuZhang on Mar 21 2022, 4:41 AM.


Details

Reviewers

dmgreen
spatel
mdchen
sdesmalen
fhahn

Commits

rG3ed9f603fd59: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the…

Summary

The runtime check threshold should also restrict interleave count. Otherwise, too many runtime checks will be generated for some cases.
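As a rough illustration of the intent (not the patch itself), the sketch below models the decision with a hypothetical `LoopPlan` struct and a hypothetical `applyRuntimeCheckLimit` helper. The threshold value of 8 is an assumption picked for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for the vectorizer's decision; not an LLVM type.
struct LoopPlan {
  unsigned VF; // vectorization factor, 1 means scalar
  unsigned IC; // interleave count, 1 means no interleaving
};

// Assumed threshold, used for this example only.
constexpr unsigned ExampleCheckThreshold = 8;

// If the loop would need more runtime pointer checks than the threshold,
// fall back to VF = 1 and IC = 1 so that no checks are generated at all.
inline LoopPlan applyRuntimeCheckLimit(LoopPlan Plan, unsigned NumChecks) {
  if (NumChecks > ExampleCheckThreshold)
    return LoopPlan{1, 1};
  return Plan;
}
```

The case the summary describes is a plan such as `LoopPlan{1, 4}`: the VF is already scalar, so a limit applied only to vector VFs would leave the interleave count, and with it the runtime checks, in place.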

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,680 ms	x64 debian > cfi-standalone-lld-x86_64.cross-dso/icall::dlopen.cpp
	60,160 ms	x64 debian > libFuzzer.libFuzzer::cleanse.test
	60,150 ms	x64 debian > libFuzzer.libFuzzer::compressed.test
	60,290 ms	x64 debian > libFuzzer.libFuzzer::cross_over_uniform_dist.test
	60,070 ms	x64 debian > libFuzzer.libFuzzer::fork.test
		View Full Test Results (10 Failed)

Event Timeline

TiehuZhang created this revision. Mar 21 2022, 4:41 AM

Herald added a project: Restricted Project. Mar 21 2022, 4:41 AM

Herald added subscribers: bmahjour, hiraditya.

TiehuZhang requested review of this revision. Mar 21 2022, 4:41 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B155353: Diff 416894. Mar 21 2022, 5:43 AM

ping

dmgreen added a reviewer: fhahn. Mar 23 2022, 2:21 AM

dmgreen added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7594	isTooManyRuntimeChecks -> hasTooManyRuntimeChecks ?
7675	Why does this check SelectedVF.Width > 1? Can we just remove it or does that not help?
llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll
1	If you use -debug, it needs `REQUIRES: asserts`. It may be a simpler test to just show that the codegen is not vectorized/interleaved though, with a comment explaining that it would be too much overhead.
3	CHECK-LABEL
10	There are quite a lot of extra blocks in this test. Can a lot of them be removed?

fhahn added inline comments. Mar 23 2022, 2:28 AM

llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll
76	Please avoid using `undef` in the test unless necessary.

TiehuZhang updated this revision to Diff 417901. Mar 24 2022, 6:06 AM

Harbormaster completed remote builds in B156043: Diff 417901. Mar 24 2022, 6:56 AM

TiehuZhang updated this revision to Diff 418565. Mar 28 2022, 6:17 AM

Why does this check SelectedVF.Width > 1? Can we just remove it or does that not help?

Does this mean this didn't help? It seems that code would not alter the IC in any case, as it is already returning VectorizationFactor::Disabled.

Just as a point of cleanup, is it possible to move the logic into selectInterleaveCount, so that it returns a value of 1 if there are too many runtime checks? I'm not sure all the info needed would be easily available inside the costmodel though.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7596	Should this be checking Hints.allowReordering like the other uses of PragmaVectorizeMemoryCheckThreshold?
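For reference, the condition used in `plan()` combines the two thresholds with the reordering hint. The sketch below reproduces only that shape in isolation. The function name `tooManyChecks` is hypothetical, and the values 8 and 128 are assumed defaults chosen for illustration.

```cpp
#include <cassert>

// Assumed default values, used here only for illustration.
constexpr unsigned ExampleRuntimeThreshold = 8;
constexpr unsigned ExamplePragmaThreshold = 128;

// Same shape as the condition in plan(): the regular threshold is ignored
// when reordering is allowed, while the pragma threshold always applies.
inline bool tooManyChecks(unsigned NumChecks, bool AllowReordering) {
  bool PragmaThresholdReached = NumChecks > ExamplePragmaThreshold;
  bool ThresholdReached = NumChecks > ExampleRuntimeThreshold;
  return (ThresholdReached && !AllowReordering) || PragmaThresholdReached;
}
```

With these assumed values, 20 checks are rejected without the hint and accepted with it, while 200 checks are rejected either way.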

Harbormaster completed remote builds in B156547: Diff 418565. Mar 29 2022, 10:58 AM

TiehuZhang updated this revision to Diff 419321. Mar 30 2022, 8:31 PM

In D122126#3413513, @dmgreen wrote:

Why does this check SelectedVF.Width > 1? Can we just remove it or does that not help?

Does this mean this didn't help? It seems that code would not alter the IC in any case, as it is already returning VectorizationFactor::Disabled.

Just as a point of cleanup, is it possible to move the logic into selectInterleaveCount, so that it returns a value of 1 if there are too many runtime checks? I'm not sure all the info needed would be easily available inside the costmodel though.

Requirements can not be accessed in selectInterleaveCount, so additional parameters need to be added to selectInterleaveCount. Do you think it is necessary?
I'm not sure if I need to keep SelectedVF.Width.getKnownMinValue() > 1, but if it is not necessary, we could remove it in a separate patch.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7675	Probably to reduce compilation time (although the impact is small)?
llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll
1	Done, thanks very much!
76	Done, thanks very much!

Requirements can not be accessed in selectInterleaveCount, so additional parameters need to be added to selectInterleaveCount. Do you think it is necessary?

Yeah - it might need to pass Requirements.getNumRuntimePointerChecks() through to selectInterleaveCount. I think it's OK as you have it right now, but Florian (or anyone else) can complain if they disagree.
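A minimal sketch of that alternative, assuming the count is passed in as an extra parameter: `CostModelSketch` and its members are hypothetical stand-ins, not the LLVM cost model.

```cpp
#include <cassert>

// Hypothetical stand-in for the cost model; not the LLVM class.
struct CostModelSketch {
  unsigned PreferredIC; // what the cost model would pick with no limit

  // The extra parameters carry what selectInterleaveCount cannot see itself.
  unsigned selectInterleaveCount(unsigned NumRuntimePointerChecks,
                                 unsigned Threshold) const {
    if (NumRuntimePointerChecks > Threshold)
      return 1; // too many checks: do not interleave
    return PreferredIC;
  }
};
```

The trade-off is the one raised in the thread: the caller stays simple, but the cost model's interface grows by the parameters it needs.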

I'm not sure if I need to keep SelectedVF.Width.getKnownMinValue() > 1, but if it is not necessary, we could remove it in a separate patch.

I was just hoping that if it got to that block then it would prevent the interleaving. It looks like it doesn't work like that though.

LGTM. Perhaps wait a day or two in case anyone else has comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7681–7682	Maybe just use a single if now: `if (SelectedVF.Width.getKnownMinValue() > 1 && hasTooManyRuntimeChecks()) {`

This revision is now accepted and ready to land. Mar 31 2022, 4:25 AM

TiehuZhang updated this revision to Diff 419415. Mar 31 2022, 6:09 AM

Harbormaster completed remote builds in B157156: Diff 419415. Mar 31 2022, 11:56 AM

fhahn added inline comments. Mar 31 2022, 2:51 PM

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
322	It would be good to document the helper. Also, `requiresTooManyRuntimeChecks` may be slightly better, because `has` seems to imply that the checks are already there to me.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10470	This check here should be sufficient, there should be no need to also check in `selectInterleaveCount`. Could you just move the remark generation & early exit from `::plan` here? You might want to skip those checks if there's a UserVF or UserIC used, with those I think we should always vectorize if possible. It also might be good to add a check line to your test which forces an interleave count > 1.
llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll
2	As this relies on the PPC cost-model, this needs to go into the `PowerPC` subdirectory.
13	nit: can change `%z_e_8676_1962` to `i64` and remove the first load. Also, names could be tidied up a bit.
14	nit: it would be good to clean up the names of the blocks a bit more.

In D122126#3418927, @dmgreen wrote:

Requirements can not be accessed in selectInterleaveCount, so additional parameters need to be added to selectInterleaveCount. Do you think it is necessary?

Yeah - it might need to pass Requirements.getNumRuntimePointerChecks() through to selectInterleaveCount. I think it's OK as you have it right now, but Florian (or anyone else) can complain if they disagree.

I'm not sure if I need to keep SelectedVF.Width.getKnownMinValue() > 1, but if it is not necessary, we could remove it in a separate patch.

I was just hoping that if it got to that block then it would prevent the interleaving. It looks like it doesn't work like that though.

LGTM. Perhaps wait a day or two in case anyone else has comments.

I think SelectedVF.Width.getKnownMinValue() > 1 is used to avoid the check when VF is scalar, but ElementCount::isScalar() seems better.
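To make the difference between the two spellings concrete, here is a small sketch with a simplified stand-in for `ElementCount`. `ElementCountSketch` is hypothetical, and its `isScalar()` definition (not scalable and exactly one element) is my understanding of the real class rather than something stated in this review.

```cpp
#include <cassert>

// Simplified stand-in for llvm::ElementCount; it only models the two
// queries discussed here.
struct ElementCountSketch {
  unsigned MinVal; // known minimum number of elements
  bool Scalable;   // true for counts that are a multiple of vscale

  unsigned getKnownMinValue() const { return MinVal; }
  // Scalar means exactly one element and not scalable.
  bool isScalar() const { return !Scalable && MinVal == 1; }
};
```

Under that assumption the two tests agree for fixed-width factors but not for a scalable factor with a minimum of 1: it is not scalar, yet `getKnownMinValue() > 1` is also false.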

I don't think the position of the runtime check is good.
Currently, the code does the following:
1. If UserVF is OK, vectorize using UserVF.
2. Populate VFCandidates.
3. Collect information for VPlans.
4. Build VPlans for all VF candidates and select the best VF.
5. Check the number of runtime checks for the selected VF unless it is scalar. If there are too many, stop vectorization.

Would the order 1, 5, 2, 3, 4 be better?
1. If UserVF is OK, vectorize using UserVF.
5. Check the number of runtime checks (no need to consider the VF, because the number of runtime checks is the same for all VFs). If there are too many, stop vectorization.
2. Populate VFCandidates.
3. Collect information for VPlans.
4. Build VPlans for all VF candidates.

I'm not sure about this. If it's wrong, I'd be happy to get a response :)
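The proposed reordering can be sketched as follows. Everything here is hypothetical: steps 2 to 4 are reduced to a counter, and the sketch relies on the comment's claim that the number of runtime checks is the same for every VF. It also ignores the scalar-VF exception in the current code.

```cpp
#include <cassert>

struct PlanResult {
  bool Vectorize;    // final decision
  unsigned StepsRun; // how many of the costly steps 2-4 were executed
};

// Current order: steps 2-4 run first, the check count is tested last.
inline PlanResult planCheckLast(unsigned NumChecks, unsigned Threshold) {
  unsigned Steps = 3; // populate candidates, collect info, build VPlans
  bool TooMany = NumChecks > Threshold;
  return PlanResult{!TooMany, Steps};
}

// Proposed order: test the check count first, skip steps 2-4 on failure.
inline PlanResult planCheckFirst(unsigned NumChecks, unsigned Threshold) {
  if (NumChecks > Threshold)
    return PlanResult{false, 0};
  return PlanResult{true, 3};
}
```

Both orders reach the same decision; the only difference in this model is the work done before rejecting a loop.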

TiehuZhang updated this revision to Diff 419716. Apr 1 2022, 4:41 AM

Herald added a subscriber: nemanjai. Apr 1 2022, 4:41 AM

Harbormaster completed remote builds in B157387: Diff 419716. Apr 1 2022, 5:18 AM

Thanks for the update!

In D122126#3421022, @TKaipeng wrote:

I don't think the position of the runtime check is good.
Currently, the code does the following:
1. If UserVF is OK, vectorize using UserVF.
2. Populate VFCandidates.
3. Collect information for VPlans.
4. Build VPlans for all VF candidates and select the best VF.
5. Check the number of runtime checks for the selected VF unless it is scalar. If there are too many, stop vectorization.

Would the order 1, 5, 2, 3, 4 be better?
1. If UserVF is OK, vectorize using UserVF.
5. Check the number of runtime checks (no need to consider the VF, because the number of runtime checks is the same for all VFs). If there are too many, stop vectorization.
2. Populate VFCandidates.
3. Collect information for VPlans.
4. Build VPlans for all VF candidates.

I'm not sure about this. If it's wrong, I'd be happy to get a response :)

It's indeed not the optimal position if we focus solely on compile time. But for this patch, I think it would be good to roughly keep the original order, especially if we manage to check & error at a single place.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10470	Can the handling be merged into a single check & diagnostic?

fhahn mentioned this in D109368: [LV] Vectorize cases with larger number of RT checks, execute only if profitable. Apr 5 2022, 9:45 AM

TiehuZhang updated this revision to Diff 420798. Apr 6 2022, 5:04 AM

Harbormaster completed remote builds in B158187: Diff 420798. Apr 6 2022, 6:21 AM

Fix the failed case (optimization-remark-options.c), because the remark info should be updated

Herald added a project: Restricted Project. May 5 2022, 6:31 AM

Herald added a subscriber: cfe-commits.

Harbormaster completed remote builds in B162903: Diff 427300. May 5 2022, 9:17 AM

TiehuZhang added a reviewer: weiwei. May 12 2022, 6:00 AM

TiehuZhang edited reviewers, added: Weiwei-2021; removed: weiwei.

TiehuZhang removed a reviewer: Weiwei-2021.

(Updated)
Difference with accepted version: Move memory runtime checks to processLoop to control both VF and IC

The code has been updated since it was accepted. Please review it again. Thank you very much! @fhahn @dmgreen

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7681–7682	Done. Thanks for your review!
10470	Hi, @fhahn, thanks for your review! It sounds similar to doesNotMeet (https://reviews.llvm.org/D98634), but the difference is that I need to use UserIC and UserVF to control whether this check needs to be performed, right? E.g. `if (!UserVF && LVP.requiresTooManyRuntimeChecks()) { /* generate remarks */ VF = VectorizationFactor::Disabled(); }` and `if (!UserIC && LVP.requiresTooManyRuntimeChecks()) { /* generate remarks */ IC = 1; }`. (In reply to: "Could you just move the remark generation & early exit from ::plan here? You might want to skip those checks if there's a UserVF or UserIC used, with those I think we should always vectorize if possible. It also might be good to add a check line to your test which forces an interleave count > 1.")
10470	Hi, @fhahn, thanks for your reply! Does the current version meet the requirements?
10470	Hi, @fhahn, is there any other problem with this patch? ping
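The two guarded checks proposed in the 10470 thread can be modelled in isolation as below. `Decision` and `limitByRuntimeChecks` are hypothetical names, a VF of 1 stands in for `VectorizationFactor::Disabled()`, and a user value of 0 is assumed to mean "not forced by the user".

```cpp
#include <cassert>

struct Decision {
  unsigned VF; // 1 stands in for VectorizationFactor::Disabled()
  unsigned IC; // interleave count
};

// UserVF / UserIC of 0 mean "not forced by the user" in this sketch.
inline Decision limitByRuntimeChecks(Decision D, unsigned UserVF,
                                     unsigned UserIC, bool TooManyChecks) {
  if (!UserVF && TooManyChecks)
    D.VF = 1; // the real code would also emit remarks here
  if (!UserIC && TooManyChecks)
    D.IC = 1;
  return D;
}
```

A user-forced VF survives the limit while the interleave count is still reset, which is the behaviour the two separate conditions were meant to give.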

Harbormaster completed remote builds in B164094: Diff 428930. May 12 2022, 8:02 AM

LGTM with additional suggestions inline, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10474	I think we usually try to use early exits to reduce the indentation, so it might be worth doing something like

    if (LVP.requiresTooManyRuntimeChecks()) {
      ORE->emit([&]() {
        return OptimizationRemarkAnalysisAliasing(
                   DEBUG_TYPE, "CantReorderMemOps", L->getStartLoc(),
                   L->getHeader())
               << "loop not vectorized: cannot prove it is safe to reorder "
                  "memory operations";
      });
      LLVM_DEBUG(dbgs() << "LV: Too many memory checks needed.\n");
      Hints.emitRemarkWithHints();
      return false;
    }

    // Select the interleave count.
    IC = CM.selectInterleaveCount(VF.Width, *VF.Cost.getValue());

(this has the added benefit of not checking for `!LVP.requiresTooManyRuntimeChecks()` but the unnegated version, which is slightly more straightforward)
llvm/test/Transforms/LoopVectorize/PowerPC/interleaved-pointer-runtime-check-unprofitable.ll
3 ↗	(On Diff #428930)	Might be good to precommit the test case and then just show the difference in this diff (without the fix `; CHECK: vector.memcheck`)
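The early-exit shape suggested for LoopVectorize.cpp above can be tried out in a self-contained form. `processLoopSketch` is a hypothetical reduction of the real function: the remark and debug output are replaced by a comment, and the cost model's choice is passed in as a plain number.

```cpp
#include <cassert>

// Returns false when the loop is rejected, mirroring an early `return false`.
// IC is only written on the success path.
inline bool processLoopSketch(bool RequiresTooManyChecks, unsigned CostModelIC,
                              unsigned &IC) {
  if (RequiresTooManyChecks) {
    // The real code would emit the remark and debug output here.
    return false;
  }
  // Select the interleave count.
  IC = CostModelIC;
  return true;
}
```

Because the rejection path returns before the interleave count is selected, one check covers both the VF and the IC, and the success path needs no extra nesting.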

TiehuZhang updated this revision to Diff 429110. May 12 2022, 6:29 PM

Harbormaster completed remote builds in B164222: Diff 429110. May 12 2022, 7:19 PM

Still LGTM, thanks! The remaining suggestion can be addressed directly before committing the patch.

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h
323	nit: could be turned into `const` if possible.

In D122126#3515070, @fhahn wrote:

Still LGTM, thanks! The remaining suggestion can be addressed directly before committing the patch.

Thanks, @fhahn! I'll turn the function into const and add the precommit test when committing the patch

This revision was landed with ongoing or failed builds. May 19 2022, 8:29 AM

Closed by commit rG3ed9f603fd59: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the… (authored by TiehuZhang, committed by mdchen).

This revision was automatically updated to reflect the committed changes.

mdchen mentioned this in rG94a2bd5a270b: [LoopVectorize] Precommit a test for D122126.

mdchen added a commit: rG3ed9f603fd59: [LoopVectorize] Don't interleave when the number of runtime checks exceeds the….

Thanks!

Allen added a subscriber: Allen. May 21 2022, 10:30 PM

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h	2 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	13 lines
llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll	111 lines

Diff 416894

llvm/lib/Transforms/Vectorize/LoopVectorizationPlanner.h

[313 lines not shown; hunk context: #endif]

   /// Test a \p Predicate on a \p Range of VF's. Return the value of applying
   /// \p Predicate on Range.Start, possibly decreasing Range.End such that the
   /// returned value holds for the entire \p Range.
   static bool
   getDecisionAndClampRange(const std::function<bool(ElementCount)> &Predicate,
                            VFRange &Range);

+  bool isTooManyRuntimeChecks();
+
 protected:
   /// Collect the instructions from the original loop that would be trivially
   /// dead in the vectorized loop if generated.
   void collectTriviallyDeadInstructions(
       SmallPtrSetImpl<Instruction *> &DeadInstructions);

   /// Build VPlans for power-of-2 VF's between \p MinVF and \p MaxVF inclusive,
   /// according to the information gathered by Legal when it checked if it is
[33 lines not shown]

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


[7,585 lines not shown; hunk context: LoopVectorizationPlanner::planInVPlanNativePath(ElementCount UserVF) {]
   }

   LLVM_DEBUG(
       dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "
                 "VPlan-native path.\n");
   return VectorizationFactor::Disabled();
 }

+bool LoopVectorizationPlanner::isTooManyRuntimeChecks() {
+  unsigned NumRuntimePointerChecks = Requirements.getNumRuntimePointerChecks();
+  bool PragmaThresholdReached =
+      NumRuntimePointerChecks > PragmaVectorizeMemoryCheckThreshold;
+  bool ThresholdReached =
+      NumRuntimePointerChecks > VectorizerParams::RuntimeMemoryCheckThreshold;
+  return PragmaThresholdReached && ThresholdReached;
+}
+
 Optional<VectorizationFactor>
 LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
   assert(OrigLoop->isInnermost() && "Inner loop expected.");
   FixedScalableVFPair MaxFactors = CM.computeMaxVF(UserVF, UserIC);
   if (!MaxFactors) // Cases that should not to be vectorized nor interleaved.
     return None;

   // Invalidate interleave groups if all blocks of loop will be predicated.
[56 lines not shown; hunk context: LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {]
   if (!MaxFactors.hasVector())
     return VectorizationFactor::Disabled();

   // Select the optimal vectorization factor.
   auto SelectedVF = CM.selectVectorizationFactor(VFCandidates);

   // Check if it is profitable to vectorize with runtime checks.
   unsigned NumRuntimePointerChecks = Requirements.getNumRuntimePointerChecks();
   if (SelectedVF.Width.getKnownMinValue() > 1 && NumRuntimePointerChecks) {
     bool PragmaThresholdReached =
         NumRuntimePointerChecks > PragmaVectorizeMemoryCheckThreshold;
     bool ThresholdReached =
         NumRuntimePointerChecks > VectorizerParams::RuntimeMemoryCheckThreshold;
     if ((ThresholdReached && !Hints.allowReordering()) ||
         PragmaThresholdReached) {
       ORE->emit([&]() {
         return OptimizationRemarkAnalysisAliasing(
                    DEBUG_TYPE, "CantReorderMemOps", OrigLoop->getStartLoc(),
                    OrigLoop->getHeader())
                << "loop not vectorized: cannot prove it is safe to reorder "
                   "memory operations";
       });
       LLVM_DEBUG(dbgs() << "LV: Too many memory checks needed.\n");
       Hints.emitRemarkWithHints();
[2,770 lines not shown; hunk context: #endif /* NDEBUG */]
   Optional<VectorizationFactor> MaybeVF = LVP.plan(UserVF, UserIC);

   VectorizationFactor VF = VectorizationFactor::Disabled();
   unsigned IC = 1;

   if (MaybeVF) {
     VF = *MaybeVF;
     // Select the interleave count.
+    if (!LVP.isTooManyRuntimeChecks()) {
       IC = CM.selectInterleaveCount(VF.Width, *VF.Cost.getValue());
     }
+  }

   // Identify the diagnostic messages that should be produced.
   std::pair<StringRef, std::string> VecDiagMsg, IntDiagMsg;
   bool VectorizeLoop = true, InterleaveLoop = true;
   if (VF.Width.isScalar()) {
     LLVM_DEBUG(dbgs() << "LV: Vectorization is possible but not beneficial.\n");
     VecDiagMsg = std::make_pair(
         "VectorizationNotBeneficial",
         "the cost-model indicates that vectorization is not beneficial");
     VectorizeLoop = false;
[326 lines not shown]

llvm/test/Transforms/LoopVectorize/interleaved-pointer-runtime-check-unprofitable.ll

This file was added.

				; RUN: opt -mtriple=powerpc64-unknown-linux-gnu -mcpu=a2 -S -loop-vectorize -debug-only=loop-vectorize < %s 2>&1 | FileCheck %s

				; CHECK-LABLE: LV: Checking a loop in "eddy_diff_caleddy_"
				; CHECK: LV: Interleaving is not beneficial.

				define fastcc void @eddy_diff_caleddy_(i64* %wet_cl, i64* %web_cl, i64* %jtbu_cl, i64* %jbbu_cl, i64* %n2ht_cl, i64* %n2hb_cl, i64* %lwp_cl, i64* %0, double* %1, i64 %2, i64* %3, double* %4, i64* %5, double* %6, i64* %7, double* %8, i64* %9, double* %10, i64* %11, double* %12, i64* %13, double* %14, i64* %15, double* %16, i64* %17, double* %18, i64* %19, double* %20, i64* %21, double* %22, i64* %23, double* %24, i64* %25, double* %26, i64* %27, double* %28) {
				L.LB7_2232.preheader:
				br label %L.LB7_2232

				L.LB7_2232: ; preds = %L.LB7_3015, %L.LB7_2232.preheader
				br label %vector.ph

				vector.ph: ; preds = %L.LB7_2232
				br label %middle.block.unr-lcssa

				middle.block.unr-lcssa: ; preds = %vector.ph
				br label %middle.block

				middle.block: ; preds = %middle.block.unr-lcssa
				br label %L.LB7_3015

				L.LB7_3015: ; preds = %middle.block
				br i1 false, label %L.LB7_2238.preheader165, label %L.LB7_2232

				L.LB7_2238.preheader165: ; preds = %L.LB7_3015
				br label %L.LB7_2238

				L.LB7_2238: ; preds = %L.LB7_2242.loopexit, %L.LB7_2238.preheader165
				br label %L.LB7_2241.preheader

				L.LB7_2241.preheader: ; preds = %L.LB7_2238
				br label %L.LB7_2241

				L.LB7_2241: ; preds = %L.LB7_2241, %L.LB7_2241.preheader
				br i1 false, label %L.LB7_2242.loopexit.loopexit, label %L.LB7_2241

				L.LB7_2242.loopexit.loopexit: ; preds = %L.LB7_2241
				br label %L.LB7_2242.loopexit

				L.LB7_2242.loopexit: ; preds = %L.LB7_2242.loopexit.loopexit
				br i1 false, label %L.LB7_2249.preheader, label %L.LB7_2238

				L.LB7_2249.preheader: ; preds = %L.LB7_2242.loopexit
				%29 = mul i64 0, 0
				br label %L.LB7_2249

				L.LB7_2249: ; preds = %L.LB7_2249, %L.LB7_2249.preheader
				%indvars.iv774 = phi i64 [ 0, %L.LB7_2249.preheader ], [ %indvars.iv.next775, %L.LB7_2249 ]
				%30 = add nsw i64 0, 0
				%31 = getelementptr i64, i64* %wet_cl, i64 undef
				%32 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %1, align 8
				%33 = add i64 0, 0
				%34 = getelementptr i64, i64* %wet_cl, i64 %2
				%35 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %4, align 8
				%36 = add i64 %indvars.iv774, undef
				%37 = getelementptr i64, i64* %wet_cl, i64 %36
				%38 = bitcast i64* %37 to double*
				store double 0.000000e+00, double* %38, align 8
				%39 = getelementptr i64, i64* %wet_cl, i64 undef
				%40 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %6, align 8
				%41 = getelementptr i64, i64* %wet_cl, i64 undef
				%42 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %8, align 8
				%43 = getelementptr i64, i64* %wet_cl, i64 undef
				%44 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %10, align 8
				%45 = getelementptr i64, i64* %wet_cl, i64 undef
				%46 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %12, align 8
				%47 = getelementptr i64, i64* %wet_cl, i64 undef
				%48 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %14, align 8
				%49 = getelementptr i64, i64* %wet_cl, i64 undef
				%50 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %16, align 8
				%51 = getelementptr i64, i64* %wet_cl, i64 undef
				%52 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %18, align 8
				%53 = getelementptr i64, i64* %wet_cl, i64 undef
				%54 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %20, align 8
				%55 = add i64 %indvars.iv774, 0
				%56 = getelementptr i64, i64* %wet_cl, i64 %55
				%57 = bitcast i64* %56 to double*
				store double 0.000000e+00, double* %57, align 8
				%58 = getelementptr i64, i64* %wet_cl, i64 undef
				%59 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %22, align 8
				%60 = getelementptr i64, i64* %web_cl, i64 undef
				%61 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %24, align 8
				%62 = getelementptr i64, i64* %web_cl, i64 %2
				%63 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %26, align 8
				%64 = getelementptr i64, i64* %web_cl, i64 %36
				%65 = bitcast i64* %64 to double*
				store double 0.000000e+00, double* %65, align 8
				%66 = getelementptr i64, i64* %web_cl, i64 undef
				%67 = bitcast i64* %wet_cl to double*
				store double 0.000000e+00, double* %28, align 8
				%indvars.iv.next775 = add nuw nsw i64 %indvars.iv774, 1
				%exitcond778.not = icmp eq i64 %indvars.iv.next775, 0
				br i1 %exitcond778.not, label %L.LB7_2330.preheader, label %L.LB7_2249

				L.LB7_2330.preheader: ; preds = %L.LB7_2249
				ret void
				}