This is an archive of the discontinued LLVM Phabricator instance.

Differential D12765

[LV] Allow vectorization of loops with induction post-inc expressions
Abandoned, Public

Authored by kuhar on Sep 10 2015, 8:59 AM.

Details

Reviewers: None

Summary

This patch teaches the LoopVectorizer not to bail out on loops with induction post-inc expressions.
Example:

for (int i = x; i < n; ++i) *ptr++ = *p++;
puts(ptr); // outside use

Previously the vectorizer wasn't able to vectorize similar code.
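
To make the "outside use" concrete, here is a minimal hand-written sketch, not the vectorizer's actual output. `copyStripMined` imitates a vectorized loop with a hypothetical factor `VF = 4`: a chunked body followed by a scalar remainder that resumes from the pointer values reached after the chunked body. The function names and the value of `VF` are assumptions made for illustration only.

```cpp
#include <cassert>

// Scalar reference: the post-incremented destination pointer is the value
// that is used after the loop.
char *copyScalar(const char *p, char *ptr, int x, int n) {
  for (int i = x; i < n; ++i)
    *ptr++ = *p++;
  return ptr; // outside use of the post-inc expression
}

// Hand-written stand-in for a vectorized loop (VF is a hypothetical factor).
char *copyStripMined(const char *p, char *ptr, int x, int n) {
  assert(p != nullptr && ptr != nullptr);
  constexpr int VF = 4;
  const int trip = n > x ? n - x : 0;
  const int vecTrip = trip - trip % VF; // iterations covered by whole chunks

  for (int i = 0; i < vecTrip; i += VF)
    for (int lane = 0; lane < VF; ++lane) // stands in for one vector copy
      ptr[i + lane] = p[i + lane];

  // End values of both pointer inductions after the chunked body; the scalar
  // remainder resumes from them.
  const char *pResume = p + vecTrip;
  char *ptrResume = ptr + vecTrip;
  for (int i = vecTrip; i < trip; ++i)
    *ptrResume++ = *pResume++;

  return ptrResume; // must equal the pointer the scalar loop exposes
}
```

Both functions should return the same pointer for any `x` and `n`; keeping that value correct for the user outside the loop is the property the patch has to preserve.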

Diff Detail

Repository: rL LLVM

Event Timeline

kuhar updated this revision to Diff 34450 (Sep 10 2015, 8:59 AM).

kuhar retitled this revision to Allow vectorization of loops with induction post-inc expressions.

kuhar updated this object.

kuhar set the repository for this revision to rL LLVM.

kuhar added a subscriber: llvm-commits.

kuhar retitled this revision from Allow vectorization of loops with induction post-inc expressions to [LV] Allow vectorization of loops with induction post-inc expressions (Sep 10 2015, 9:08 AM).

mcrosier added a subscriber: mssimpso (Sep 10 2015, 11:52 AM).

Hi Jakub,

How did you end up with such a loop? I see the issue when I run vectorizer on your IR tests, but I can't reproduce the issue with the source C code. I wonder if this patch actually just papers over a problem in another pass (IndVarSimplify?). Could you share more details on this please?

Thanks,
Michael

Hello Michael,

here is complete C code that is being vectorized now and wasn't previously. One could probably reduce it further, but this version preserves the main idea:

#include <stdio.h>
#include <stdlib.h>

#define BUFF_SIZE 4096

void __attribute__((noinline))
    foo(int y, char *restrict src, char *restrict dest) {
  char *p = src;
  char *ptr = dest;
  int n = (y - 1) < (dest + BUFF_SIZE - ptr - 1)
              ? (y - 1)
              : (dest + BUFF_SIZE - ptr - 1);
  for (int vv = 0; vv < n; ++vv)
    *ptr++ = *p++;

  *ptr++ = *p++;

  printf("%p\n", (void *)ptr); /* outside use of the post-incremented pointer */
}

int main() {
  char S[] = "01234567890abcdefghij123456789abcdefghij0123456789XXX";
  char D[128] = { 0 };
  foo(33, S, D);
  printf("%s\n", D);
}

I need this patch to perform some other optimizations (more advanced loop unrolling/vectorizing).

This patch triggers in real-life code. For example, it showed 24.54% improvement in lnt.MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan and 7.60% improvement in lnt.SingleSource/Benchmarks/BenchmarkGame/recursive on A57 in aarch64.

Hi Jakub,

I was able to reproduce the original issue with the test you posted, thanks! However, it still looks like the problem isn't in the vectorizer. E.g. if you replace int n = y < BUFF_SIZE ? (y - 1) : (BUFF_SIZE - 1) (which is equivalent to the expression in your test) with something else, like int n = y or int n = y*7%19, the loop is vectorized. Looking at the dumps, I see that earlier passes (starting from MergedLoadStoreMotion) behave differently with these values of the upper bound, which doesn't look right, and we should fix those passes, not the vectorizer.
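
The equivalence claimed above can be checked directly. The sketch below evaluates the bound as written in the posted test (where `ptr` still equals `dest` when `n` is computed) next to the simplified form. The function names are made up for this illustration, and `dest` is assumed to point at a buffer of at least `BUFF_SIZE` bytes so the pointer arithmetic stays in range.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::ptrdiff_t BUFF_SIZE = 4096;

// Bound as written in the posted test; ptr has not moved yet.
int boundFromTest(int y, char *dest) {
  char *ptr = dest;
  return (y - 1) < (dest + BUFF_SIZE - ptr - 1)
             ? (y - 1)
             : static_cast<int>(dest + BUFF_SIZE - ptr - 1);
}

// Simplified bound quoted in the review comment.
int boundSimplified(int y) {
  return y < BUFF_SIZE ? (y - 1) : static_cast<int>(BUFF_SIZE - 1);
}
```

Since `dest + BUFF_SIZE - ptr - 1` folds to `BUFF_SIZE - 1`, both forms compute `min(y - 1, BUFF_SIZE - 1)`.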

Thanks,
Michael

Hi Michael,

I followed your comments and did some investigation on the difference between the IR with int n = y < BUFF_SIZE ? (y - 1) : (BUFF_SIZE - 1) and with int n = (y*9)%17. As you mentioned, they start to differ after the IndVarSimplify pass - with the second expression (y*9)%17 it replaces a phi node with a gep generated by SCEVExpander. It's not expensive, because it's essentially a reuse of a previous value (%rem).

When it comes to the first code, it generates such BB when entering IndVarSimplify:

entry:
  %cmp = icmp slt i32 %y, 4096
  %sub = add nsw i32 %y, -1
  %cond = select i1 %cmp, i32 %sub, i32 4095
  %cmp1.4 = icmp sgt i32 %cond, 0
  br i1 %cmp1.4, label %for.body.lr.ph, label %for.end

...

for.end:                                          ; preds = %for.cond.for.end_crit_edge, %entry
  %ptr.0.lcssa = phi i8* [ %incdec.ptr2, %for.cond.for.end_crit_edge ], [ %dest, %entry ]

The problem is that the SCEV for backedge taken count looks like this: (1 + (zext i32 (-3 + (-1 * (-4097 smax (-1 + (-1 * %y))))<nsw>) to i64) + %dest). IndVarSimplify makes sure that it would be cheap to expand it with SCEVExpander and here the real problems start. SCEVExpander thinks that smax is a costly operation, so this whole expression is also considered costly.
I don't know how hard it would be to make IndVarSimplify or SCEVExpander look around the function for an already generated equivalent expression, but I don't think it would be trivial. IndVarSimplify certainly doesn't know anything about loop vectorization, so it's not able to determine that even though some SCEV is expensive to expand, it would be beneficial to do so because of the speed-up from later vectorization.

I'm not yet convinced that it'd be better to follow this path of fixing IndVarSimplify - performing this optimization in LoopVectorizer is quite easy and doesn't seem hacky to me.

Hi Jakub,

The problem with the current approach is that it only fixes one case, which is caused by a different problem (as you showed, the actual problem is in IndVarSimplify/SCEVExpander). We fix the consequences, not the root cause itself. Maybe next week someone will come up with a patch that fixes the same issue in LoopUnroller, and the week after in LoopInterchange or any other loop transformation.

It might be non-trivial to fix it in SCEV/IndVarSimplify, but I think it should be possible, and that's what we need to try first. For this case we just need to teach SCEV/IndVarSimplify to take loop invariants into account. E.g. the entire expression (zext i32 (-3 + (-1 * (-4097 smax (-1 + (-1 * %y))))<nsw>) to i64) is loop invariant, so it shouldn't have a high cost.

The fix in the vectorizer might be simple, but I believe it's a (small) step in the wrong direction.

Michael

Hi Michael,

I've spent the last two days trying to come up with an IndVarSimplify patch that would enable the unmodified LoopVectorizer to work on my code.
The problem is that currently IndVarSimplify::RewriteExitValues explicitly works only on expressions which are loop invariants and there is a cost check inside.
When I tried commenting out this HighCost check, benchmark scores were not great: the only benchmark that improved significantly was automotive-susan (~21%), and there were many regressions, e.g. (all scores on A57/aarch64):
lnt.MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk 27.03%
lnt.MultiSource/Benchmarks/sim/sim 8.96%
lnt.MultiSource/Benchmarks/Olden/bh/bh 6.13%
lnt.MultiSource/Benchmarks/BitBench/uudecode/uudecode 6.07%
lnt.SingleSource/Benchmarks/Stanford/Puzzle 5.76%
lnt.MultiSource/Benchmarks/Prolangs-C++/ocean/ocean 5.34%
lnt.MultiSource/Applications/lemon/lemon 4.84%
lnt.MultiSource/Benchmarks/TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl 4.73%
lnt.MultiSource/Benchmarks/FreeBench/pcompress2/pcompress2 4.15%
There were also some severe regressions on our internal benchmarks.

I tried to come up with a heuristic approach, such as recursively counting the number of expensive operations coming from SCEV expansion and comparing it with the number of instructions in the whole loop. After playing with it for some time I managed to get rid of the most serious regressions in lnt - the only (serious) ones left were:
lnt.MultiSource/Benchmarks/BitBench/uudecode/uudecode 6.14%
lnt.MultiSource/Benchmarks/Olden/bh/bh 5.39%
lnt.SingleSource/Benchmarks/BenchmarkGame/recursive 3.44%
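
The heuristic is only described in words here, so the following is a toy sketch of what such a count could look like and not the code that was benchmarked. `Expr` is a stand-in for a SCEV tree (it does not use the LLVM classes), the choice of which operations count as expensive is an assumption, and the `InstsPerExpensiveOp` threshold is entirely hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for a SCEV expression tree.
enum class Op { Constant, Unknown, Add, Mul, ZExt, SMax, UDiv };

struct Expr {
  Op Kind;
  std::vector<std::shared_ptr<Expr>> Operands;
};

std::shared_ptr<Expr> make(Op Kind,
                           std::vector<std::shared_ptr<Expr>> Ops = {}) {
  return std::make_shared<Expr>(Expr{Kind, std::move(Ops)});
}

// Assumption: only smax and udiv are treated as expensive in this model.
bool isExpensive(Op Kind) { return Kind == Op::SMax || Kind == Op::UDiv; }

// Recursively counts the expensive operations in the expression.
unsigned countExpensiveOps(const Expr &E) {
  unsigned N = isExpensive(E.Kind) ? 1 : 0;
  for (const auto &Child : E.Operands)
    N += countExpensiveOps(*Child);
  return N;
}

// Hypothetical policy: allow the expansion when the loop is large enough
// relative to the number of expensive operations.
bool worthExpanding(const Expr &E, unsigned LoopInstCount,
                    unsigned InstsPerExpensiveOp = 8) {
  return countExpensiveOps(E) * InstsPerExpensiveOp <= LoopInstCount;
}

// Shape of (-3 + (-1 * (-4097 smax (-1 + (-1 * %y))))) from this thread.
std::shared_ptr<Expr> buildThreadExample() {
  auto NegY = make(Op::Mul, {make(Op::Constant), make(Op::Unknown)});
  auto Inner = make(Op::Add, {make(Op::Constant), NegY});
  auto Max = make(Op::SMax, {make(Op::Constant), Inner});
  auto NegMax = make(Op::Mul, {make(Op::Constant), Max});
  return make(Op::Add, {make(Op::Constant), NegMax});
}
```

In this model the expression from the thread contains one expensive operation, so the decision depends only on the loop size and the chosen threshold.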

The problem was that there were also not as many improvements (and only small ones), and there were still some serious regressions in our benchmarks. Another thing is that even in the most aggressive configuration (i.e. always assume cheap expansion) I was not able to reproduce the improvements coming from my original LoopVectorizer patch. I suspect that the difference comes from the fact that IndVarSimplify actually creates some new instructions, and my changes in the LoopVectorizer only reuse some existing values.

I think the problem is that in IndVarSimplify it's too early to determine whether it's beneficial to rewrite some exit values, and that the rewriting leads to new instructions being emitted - it's so easy to regress some stuff. Maybe some other optimization pass (like the recently discussed LEV) could do some cleanup after loop vectorization, I'm not sure... Anyway, I think the biggest argument for patching the LoopVectorizer is that it regresses code very little (from what I've seen running multiple benchmarks).
Do you have some other ideas on performing the changes in IndVarSimplify, Michael?

Cheers,
Jakub

Hi Jakub,

I also spent some time in debugger today, and now I'm convinced even more that we should try to handle it in IndVarSimplify/SCEV. As you correctly noted, the problem is that IndVarSimplify thinks that smax expression is expensive to expand. While it's indeed expensive to expand smax/smin, we don't need to do that in our case, as this particular expansion already exists in our code - variable %n holds this value. Moreover, SCEV already tries to find existing expansion (see SCEVExpander::findExistingExpansion), but fails in our case. The issue we should be solving here is how to make SCEVExpander not fail.

Now, how does it try to find existing expansion, and why does it fail? The logic is pretty simple here: we examine all checks in exiting blocks of our loop - LHS and RHS of these comparisons already have existing expansions by definition (they are the expansions). If their SCEV expression happens to be exactly the same as the one we're looking for (S) - we're done! But here is where our problem resides. In our case we are looking for the following expression:

(lldb) p S->dump()
(-3 + (-1 * (-4097 smax (-1 + (-1 * %y))))<nsw>)

But the most similar one that we have is this one:

(lldb) p SE.getSCEV(RHS)->dump()
(-2 + (-1 * (-4097 smax (-1 + (-1 * %y))))<nsw>)

So, we fail to see that we have almost the same existing one, so we think that we need to expand it from scratch - and that's expensive, so we bail out. Please note though that if we for some reason rewrote this expression as (1+ -3 + (-1 * (-4097 smax (-1 + (-1 * %y))))<nsw>), the current logic would catch it.
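
The relation between the two dumped expressions can be checked with plain arithmetic. The sketch below evaluates both in 64-bit integers, so it ignores the i32 wrap-around and the `<nsw>` flags, and compares them with the trip count `select(y < 4096, y - 1, 4095)` from the IR shown earlier. The function names are invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Upper bound of the loop: select(y < 4096, y - 1, 4095).
int64_t tripCount(int64_t y) { return y < 4096 ? y - 1 : 4095; }

// S = (-3 + (-1 * (-4097 smax (-1 + (-1 * y)))))
int64_t scevS(int64_t y) {
  return -3 + (-1 * std::max<int64_t>(-4097, -1 + (-1 * y)));
}

// SCEV of RHS = (-2 + (-1 * (-4097 smax (-1 + (-1 * y)))))
int64_t scevRHS(int64_t y) {
  return -2 + (-1 * std::max<int64_t>(-4097, -1 + (-1 * y)));
}
```

For every `y` in this model `scevRHS(y)` equals the trip count and `scevS(y)` is exactly one less, which is why an existing value for RHS is so close to what the expander needs.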

With that in mind, I tried the following patch:

diff --git a/lib/Analysis/ScalarEvolutionExpander.cpp b/lib/Analysis/ScalarEvolutionExpander.cpp
index ed7386b..e4af9dc 100644
--- a/lib/Analysis/ScalarEvolutionExpander.cpp
+++ b/lib/Analysis/ScalarEvolutionExpander.cpp
@@ -1829,10 +1829,16 @@ Value *SCEVExpander::findExistingExpansion(const SCEV *S,
                     TrueBB, FalseBB)))
       continue;

-    if (SE.getSCEV(LHS) == S && SE.DT.dominates(LHS, At))
+    const SCEV *LHSSE = SE.getSCEV(LHS);
+    if (LHSSE->getType() == S->getType() &&
+        isa<SCEVConstant>(SE.getMinusSCEV(LHSSE, S)) &&
+        SE.DT.dominates(LHS, At))
       return LHS;

-    if (SE.getSCEV(RHS) == S && SE.DT.dominates(RHS, At))
+    const SCEV *RHSSE = SE.getSCEV(RHS);
+    if (RHSSE->getType() == S->getType() &&
+        isa<SCEVConstant>(SE.getMinusSCEV(RHSSE, S)) &&
+        SE.DT.dominates(RHS, At))
       return RHS;
   }

The idea is that when an expression only differs by a constant from an existing one, we can say that it also exists (to be precise, there is a cheap way to expand it). However, I think it might be incorrect in some corner cases, so I'd like to check it with SCEV experts first. But anyway, even if it's not correct in this particular form, I think it should be possible to make it correct if we add some sanity checks, and I believe that's the way to go.
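
As a rough illustration of the matching rule, and not of the real SCEVExpander code, the toy model below splits an expression into a constant offset plus an opaque symbolic part. Two expressions then differ by a constant exactly when their symbolic parts are equal. `ToyExpr`, `Reuse` and `findWithinConstant` are hypothetical names, and the model ignores the type check that the patch performs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy expression: a constant addend plus everything else as an opaque string.
struct ToyExpr {
  int64_t Offset;
  std::string Symbolic;
};

// Result of a successful lookup: Wanted == Existing[Index] + Adjust.
struct Reuse {
  std::size_t Index;
  int64_t Adjust;
};

// Returns true and fills Out when some existing expression differs from
// Wanted only by a constant.
bool findWithinConstant(const std::vector<ToyExpr> &Existing,
                        const ToyExpr &Wanted, Reuse &Out) {
  for (std::size_t I = 0; I < Existing.size(); ++I) {
    if (Existing[I].Symbolic != Wanted.Symbolic)
      continue; // the difference would not fold to a constant
    Out = Reuse{I, Wanted.Offset - Existing[I].Offset};
    return true;
  }
  return false;
}
```

With the two expressions from the debugger session, the lookup finds the existing value with an adjustment of -1. Note that a caller still has to emit that adjustment; finding the match alone does not reuse the value.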

Thanks,
Michael

PS: I haven't done any testing of this patch except on the provided test case, which was successfully vectorized.

> The idea is that when an expression only differs by a constant from an existing one, we can say that it also exists (to be precise, there is a cheap way to expand it). However, I think it might be incorrect in some corner cases, so I'd like to check it with SCEV experts first. But anyway, even if it's not correct in this particular form, I think it should be possible to make it correct if we add some sanity checks, and I believe that's the way to go.

I agree that this is a good thing to do. Can you post the patch for this? I'd like to get Sanjoy to look at this.

That having been said, do you have a theoretical argument that IndVarSimplify can always fix this?

Hi Hal,

> I agree that this is a good thing to do. Can you post the patch for this? I'd like to get Sanjoy to look at this.

Actually, Sanjoy has already commented on the patch I posted above on the mailing list, but it didn't get to phabricator for some reason. I can create a separate phab-review for this patch, but I'd like to run some testing before that, which I hope I can do next week.

> That having been said, do you have a theoretical argument that IndVarSimplify can always fix this?

Nope, I can't prove that. However, if this will fix all the cases we care about, then I'd rather not add new entities to vectorizer - it's already complicated enough.

Thanks,
Michael

In D12765#249249, @mzolotukhin wrote:

> Hi Hal,
>
>> I agree that this is a good thing to do. Can you post the patch for this? I'd like to get Sanjoy to look at this.
>
> Actually, Sanjoy has already commented on the patch I posted above on the mailing list, but it didn't get to phabricator for some reason. I can create a separate phab-review for this patch, but I'd like to run some testing before that, which I hope I can do next week.

Great, thanks!

>> That having been said, do you have a theoretical argument that IndVarSimplify can always fix this?
>
> Nope, I can't prove that. However, if this will fix all the cases we care about, then I'd rather not add new entities to vectorizer - it's already complicated enough.

I doubt it is complicated enough ;) -- but that does not necessarily mean it needs this enhancement.

Thanks,
Michael

Hi everyone,

Just to give an update on this: I tried the patch I posted before, but I'm not yet satisfied with it. The problem seems to be that we're detecting similar expressions now, but we're not reusing them. For the record, here is the test I'm playing with:

target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"

; Function Attrs: noinline nounwind ssp uwtable
define i8* @foo(i32 %y, i8* noalias %src, i8* noalias %dst) {
entry:
  %cmp = icmp slt i32 %y, 4096
  %sub = add nsw i32 %y, -1
  %tripcount = select i1 %cmp, i32 %sub, i32 4095
  %loop.entry.cond = icmp sgt i32 %tripcount, 0
  br i1 %loop.entry.cond, label %loop.ph, label %loop.exit

loop.ph:                                   ; preds = %entry
  br label %loop.body

loop.body:                                         ; preds = %loop.body, %loop.ph
  %iv = phi i32 [ 0, %loop.ph ], [ %iv.next, %loop.body ]
  %src.iv = phi i8* [ %src, %loop.ph ], [ %src.iv.next, %loop.body ]
  %dst.iv = phi i8* [ %dst, %loop.ph ], [ %dst.iv.next, %loop.body ]

  %tmp = load i8, i8* %src.iv, align 1
  store i8 %tmp, i8* %dst.iv, align 1

  %src.iv.next = getelementptr inbounds i8, i8* %src.iv, i64 1
  %dst.iv.next = getelementptr inbounds i8, i8* %dst.iv, i64 1
  %iv.next = add nsw i32 %iv, 1

  %loop.cond = icmp slt i32 %iv.next, %tripcount
  br i1 %loop.cond, label %loop.body, label %loop.exit

loop.exit:                                 ; preds = %loop.body, %entry
  %src.iv.lcssa = phi i8* [ %src.iv.next, %loop.body ], [ %src, %entry ]
  ret i8* undef
}

!llvm.module.flags = !{!0, !1}
!llvm.ident = !{!2}

!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = !{i32 1, !"PIC Level", i32 2}
!2 = !{!"clang version 3.8.0 (trunk 247767) (llvm/trunk 247769)"}

What is expected from indvars on this test is to rewrite %src.iv.lcssa with a loop-invariant value. The patch does that, but it fills the loop preheader with code similar to what we have in entry: basic block, which seems unnecessary. My plan is to investigate it further, and hopefully fix it.
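
To spell out which loop-invariant value is meant, the sketch below walks the loop the way the IR above does and compares the result with a closed form. It tracks the offset of `%src.iv.lcssa` from `%src` instead of a real pointer. The function names are invented, and the closed form `max(tripcount, 0)` is derived here from the IR above as an illustration; it is not taken from the indvars output.

```cpp
#include <cassert>
#include <cstdint>

// %tripcount = select (icmp slt %y, 4096), %y - 1, 4095
int32_t tripCountI32(int32_t y) { return y < 4096 ? y - 1 : 4095; }

// Executes the loop step by step and returns %src.iv.lcssa - %src.
int64_t exitOffsetByLoop(int32_t y) {
  const int32_t n = tripCountI32(y);
  int64_t srcOffset = 0;
  if (n > 0) { // %loop.entry.cond
    int32_t iv = 0;
    do {
      ++srcOffset;    // %src.iv.next
      ++iv;           // %iv.next
    } while (iv < n); // %loop.cond
  }
  return srcOffset; // phi [ %src.iv.next, %loop.body ], [ %src, %entry ]
}

// Loop-invariant form of the same value.
int64_t exitOffsetClosedForm(int32_t y) {
  const int32_t n = tripCountI32(y);
  return n > 0 ? n : 0;
}
```

On the path through the loop the exit value is `%src + %tripcount`, which only needs the already computed `%tripcount`; this is why re-deriving the select in the preheader looks unnecessary.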

Michael

Hi Jakub and others,

As I mentioned before, my previous patch helped vectorize the original test, but in the process we duplicated some code into the loop preheader. That happened because I fixed the findExistingExpansion function to look for similar expressions too, but I didn't fix SCEVExpander::expand to behave correspondingly. That's a bit more work than I thought, so I have to postpone it until I have more time. I'm still sure that it's the right approach, and I hope to do it eventually if no one else does it first. Jakub, do you have any plans on trying to implement this?

I filed a PR for this: https://llvm.org/bugs/show_bug.cgi?id=24920

Thanks,
Michael

Hello,

In D12765#252401, @mzolotukhin wrote:

> Jakub, do you have any plans on trying to implement this?

If no one picks it up, then I'd like to do it, but it won't be earlier than 1 or 2 weeks from now - tomorrow is the last day of my internship and I'm returning to my university next week.

mzolotukhin mentioned this in D12494: New IR pass: LoopExitValues (Sep 25 2015, 7:20 PM).

mzolotukhin mentioned this in D15559: [SCEVExpander] Make findExistingExpansion smarter (Dec 23 2015, 10:15 AM).

kuhar abandoned this revision (Jul 9 2017, 10:19 PM).

Revision Contents

Path                                          Size
include/llvm/Transforms/Utils/LoopUtils.h     10 lines
lib/Transforms/Utils/LoopUtils.cpp            14 lines
lib/Transforms/Vectorize/LoopVectorize.cpp    77 lines
test/Transforms/LoopVectorize/post-incs.ll    145 lines

Diff 34450

include/llvm/Transforms/Utils/LoopUtils.h

[... 249 lines not shown ...]
 public:
   /// This enum represents the kinds of inductions that we support.
   enum InductionKind {
     IK_NoInduction,  ///< Not an induction variable.
     IK_IntInduction, ///< Integer induction variable. Step = C.
     IK_PtrInduction  ///< Pointer induction var. Step = C / sizeof(elem).
   };

 public:
   /// Default constructor - creates an invalid induction.
   InductionDescriptor()
-      : StartValue(nullptr), IK(IK_NoInduction), StepValue(nullptr) {}
+      : StartValue(nullptr), LoopExitInstr(nullptr), IK(IK_NoInduction),
+        StepValue(nullptr) {}

   /// Get the consecutive direction. Returns:
   ///   0 - unknown or non-consecutive.
   ///   1 - consecutive and increasing.
   ///  -1 - consecutive and decreasing.
   int getConsecutiveDirection() const;

   /// Compute the transformed value of Index at offset StartValue using step
   /// StepValue.
   /// For integer induction, returns StartValue + Index * StepValue.
   /// For pointer induction, returns StartValue[Index * StepValue].
   /// FIXME: The newly created binary instructions should contain nsw/nuw
   /// flags, which can be found from the original scalar operations.
   Value *transform(IRBuilder<> &B, Value *Index) const;

   Value *getStartValue() const { return StartValue; }
+  Instruction *getLoopExitInstr() const { return LoopExitInstr; }
   InductionKind getKind() const { return IK; }
   ConstantInt *getStepValue() const { return StepValue; }

   static bool isInductionPHI(PHINode *Phi, ScalarEvolution *SE,
                              InductionDescriptor &D);

 private:
   /// Private constructor - used by \c isInductionPHI.
-  InductionDescriptor(Value *Start, InductionKind K, ConstantInt *Step);
+  InductionDescriptor(Value *Start, Instruction *Exit, InductionKind K,
+                      ConstantInt *Step);

   /// Start value.
   TrackingVH<Value> StartValue;
+  // The instruction which value is used outside of the loop.
+  Instruction *LoopExitInstr;
   /// Induction kind.
   InductionKind IK;
   /// Step value.
   ConstantInt *StepValue;
 };

 BasicBlock *InsertPreheaderForLoop(Loop *L, Pass *P);

[... 82 lines not shown ...]

lib/Transforms/Utils/LoopUtils.cpp

[... 590 lines not shown ...]
   if (RK == MRK_FloatMin || RK == MRK_FloatMax)
     Cmp = Builder.CreateFCmp(P, Left, Right, "rdx.minmax.cmp");
   else
     Cmp = Builder.CreateICmp(P, Left, Right, "rdx.minmax.cmp");

   Value *Select = Builder.CreateSelect(Cmp, Left, Right, "rdx.minmax.select");
   return Select;
 }

-InductionDescriptor::InductionDescriptor(Value *Start, InductionKind K,
-                                         ConstantInt *Step)
-    : StartValue(Start), IK(K), StepValue(Step) {
+InductionDescriptor::InductionDescriptor(Value *Start, Instruction *Exit,
+                                         InductionKind K, ConstantInt *Step)
+    : StartValue(Start), LoopExitInstr(Exit), IK(K), StepValue(Step) {
   assert(IK != IK_NoInduction && "Not an induction");
   assert(StartValue && "StartValue is null");
   assert(StepValue && !StepValue->isZero() && "StepValue is zero");
   assert((IK != IK_PtrInduction || StartValue->getType()->isPointerTy()) &&
          "StartValue is not a pointer for pointer induction");
   assert((IK != IK_IntInduction || StartValue->getType()->isIntegerTy()) &&
          "StartValue is not an integer for integer induction");
   assert(StepValue->getType()->isIntegerTy() &&
[... 46 lines not shown ...]
   if (!AR) {
     DEBUG(dbgs() << "LV: PHI is not a poly recurrence.\n");
     return false;
   }

   assert(AR->getLoop()->getHeader() == Phi->getParent() &&
          "PHI is an AddRec for a different loop?!");
   Value *StartValue =
       Phi->getIncomingValueForBlock(AR->getLoop()->getLoopPreheader());

+  auto *ExitInstr = dyn_cast<Instruction>(
+      Phi->getIncomingValueForBlock(AR->getLoop()->getLoopLatch()));
+
   const SCEV *Step = AR->getStepRecurrence(*SE);
   // Calculate the pointer stride and check if it is consecutive.
   const SCEVConstant *C = dyn_cast<SCEVConstant>(Step);
   if (!C)
     return false;

   ConstantInt *CV = C->getValue();
   if (PhiTy->isIntegerTy()) {
-    D = InductionDescriptor(StartValue, IK_IntInduction, CV);
+    D = InductionDescriptor(StartValue, ExitInstr, IK_IntInduction, CV);
     return true;
   }

   assert(PhiTy->isPointerTy() && "The PHI must be a pointer");
   Type *PointerElementType = PhiTy->getPointerElementType();
   // The pointer stride cannot be determined if the pointer element type is not
   // sized.
   if (!PointerElementType->isSized())
     return false;

   const DataLayout &DL = Phi->getModule()->getDataLayout();
   int64_t Size = static_cast<int64_t>(DL.getTypeAllocSize(PointerElementType));
   if (!Size)
     return false;

   int64_t CVSize = CV->getSExtValue();
   if (CVSize % Size)
     return false;
   auto *StepValue = ConstantInt::getSigned(CV->getType(), CVSize / Size);

-  D = InductionDescriptor(StartValue, IK_PtrInduction, StepValue);
+  D = InductionDescriptor(StartValue, ExitInstr, IK_PtrInduction, StepValue);
   return true;
 }

 /// \brief Returns the instructions that use values defined in the loop.
 SmallVector<Instruction *, 8> llvm::findDefsUsedOutsideOfLoop(Loop *L) {
   SmallVector<Instruction *, 8> UsedOutside;

   for (auto *Block : L->getBlocks())
[... 13 lines not shown ...]

lib/Transforms/Vectorize/LoopVectorize.cpp

[... 991 lines not shown ...]
   if (L->getParentLoop())
     L->getParentLoop()->addBasicBlockToLoop(NewBB, *LI);
   ReplaceInstWithInst(BB->getTerminator(),
                       BranchInst::Create(Bypass, NewBB, MemRuntimeCheck));
   LoopBypassBlocks.push_back(BB);
   AddedSafetyChecks = true;
 }

+struct PostIncInfo {
+  PHINode *Induction;
+  // If a loop is in LCSSA form there can only be one outside user. It's always
+  // a PHI node in the loop exit block.
+  PHINode *OutsideUser;
+};
+
+// Check if given ExitVal is an induction post-inc expression.
+static Optional<PostIncInfo> GetPostIncInfo(Instruction *ExitVal,
+                                            PHINode *IndVar, Loop *OrigLoop,
+                                            LoopVectorizationLegality *Legal) {
+  if (!ExitVal)
+    return None;
+
+  // We need to find an outside user for the ExitVal. Loop is in LCSSA form, so
+  // there is only one outside user that is a phi node.
+  PHINode *OutsideUser = nullptr;
+  for (auto *U : ExitVal->users()) {
+    if (!OrigLoop->contains(cast<Instruction>(U))) {
+      assert(!OutsideUser && "More than one outside user - not LCSSA form");
+
+      OutsideUser = cast<PHINode>(U);
+      assert(OutsideUser->getParent() == OrigLoop->getExitBlock() &&
+             "Not LCSSA form");
+    }
+  }
+  if (!OutsideUser)
+    return None;
+
+  return PostIncInfo{ IndVar, OutsideUser };
+}
+
 void InnerLoopVectorizer::createEmptyLoop() {
   /*
    In this function we generate a new loop. The new loop will contain
    the vectorized instructions while the old loop will continue to run the
    scalar remainder.

        [ ] <-- loop iteration number check.
[... 18 lines not shown ...]
    |   [ ] \
    |   [ ]_|   <-- old scalar loop to handle remainder.
     \   |
      \  v
       >[ ]     <-- exit block.
    ...
    */

+  assert(OrigLoop->isLCSSAForm(*DT));
   BasicBlock *OldBasicBlock = OrigLoop->getHeader();
   BasicBlock *VectorPH = OrigLoop->getLoopPreheader();
   BasicBlock *ExitBlock = OrigLoop->getExitBlock();
   assert(VectorPH && "Invalid loop structure");
   assert(ExitBlock && "Must have an exit block");

+  // If a loop exit values is a induction post-inc expression, map it to it's
+  // PostIncInfo.
+  DenseMap<Instruction *, PostIncInfo> PostIncExpressions;
+  for (auto &IndInfo : *Legal->getInductionVars()) {
+    // All induction variables are phi nodes.
+    auto *InductionVar = cast<PHINode>(IndInfo.first);
+
+    auto *ExitVal = IndInfo.second.getLoopExitInstr();
+    Optional<PostIncInfo> OptPostIncInfo =
+        GetPostIncInfo(ExitVal, InductionVar, OrigLoop, Legal);
+    if (OptPostIncInfo.hasValue())
+      PostIncExpressions[ExitVal] = OptPostIncInfo.getValue();
+  }
+
   // Some loops have a single integer induction variable, while other loops
   // don't. One example is c++ iterators that often have multiple pointer
   // induction variables. In the code below we also support a case where we
   // don't have a single induction variable.
   //
   // We try to obtain an induction variable from the original loop as hard
   // as possible. However if we don't find one that:
   //   - is an integer
[... 60 lines not shown ...]
   // We are going to resume the execution of the scalar loop.
   // Go over all of the induction variables that we found and fix the
   // PHIs that are left in the scalar version of the loop.
   // The starting values of PHI nodes depend on the counter of the last
   // iteration in the vectorized loop.
   // If we come from a bypass edge then we need to start from the original
   // start value.

+  // Map end values of induction variables to their induction variables;
+  // used later as resume values of post-inc expressions.
+  DenseMap<Instruction *, Value *> IndVarEndVals;
+
   // This variable saves the new starting index for the scalar loop. It is used
   // to test if there are any tail iterations left once the vector loop has
   // completed.
   LoopVectorizationLegality::InductionList::iterator I, E;
   LoopVectorizationLegality::InductionList *List = Legal->getInductionVars();
   for (I = List->begin(), E = List->end(); I != E; ++I) {
     PHINode *OrigPhi = I->first;
     InductionDescriptor II = I->second;
[... 10 lines not shown ...]
       IRBuilder<> B(LoopBypassBlocks.back()->getTerminator());
       Value *CRD = B.CreateSExtOrTrunc(CountRoundDown,
                                        II.getStepValue()->getType(),
                                        "cast.crd");
       EndValue = II.transform(B, CRD);
       EndValue->setName("ind.end");
     }

+    // Associate instruction with its end value - after the vector loop.
+    IndVarEndVals[OrigPhi] = EndValue;
+
     // The new PHI merges the original incoming value, in case of a bypass,
     // or the value at the end of the vectorized loop.
     BCResumeVal->addIncoming(EndValue, MiddleBlock);

     // Fix the scalar body counter (PHI node).
     unsigned BlockIdx = OrigPhi->getBasicBlockIndex(ScalarPH);

     // The old induction's phi node in the scalar body needs the truncated
[... 10 lines not shown ...]
                                 CountRoundDown, "cmp.n",
                                 MiddleBlock->getTerminator());
   ReplaceInstWithInst(MiddleBlock->getTerminator(),
	BranchInst::Create(ExitBlock, ScalarPH, CmpN));			BranchInst::Create(ExitBlock, ScalarPH, CmpN));

	// Get ready to start creating new instructions into the vectorized body.			// Get ready to start creating new instructions into the vectorized body.
	Builder.SetInsertPoint(VecBody->getFirstInsertionPt());			Builder.SetInsertPoint(VecBody->getFirstInsertionPt());

				// Now we have to repair broken post-inc expressions. We need to add new
				// incoming arc to the PHI nodes that are their outside users. The incoming
				// arcs are from MiddleBlock and have the value of the associated induction
				// variables at the end of newly created vector loop.
				for (auto &PIE : PostIncExpressions) {
				auto *PHI = PIE.second.OutsideUser;
				PHI->addIncoming(IndVarEndVals[PIE.second.Induction], MiddleBlock);
				}

	// Save the state.			// Save the state.
	LoopVectorPreHeader = Lp->getLoopPreheader();			LoopVectorPreHeader = Lp->getLoopPreheader();
	LoopScalarPreHeader = ScalarPH;			LoopScalarPreHeader = ScalarPH;
	LoopMiddleBlock = MiddleBlock;			LoopMiddleBlock = MiddleBlock;
	LoopExitBlock = ExitBlock;			LoopExitBlock = ExitBlock;
	LoopVectorBody.push_back(VecBody);			LoopVectorBody.push_back(VecBody);
	LoopScalarBody = OldBasicBlock;			LoopScalarBody = OldBasicBlock;

	▲ Show 20 Lines • Show All 1,074 Lines • ▼ Show 20 Lines
	if (Ty0->getScalarSizeInBits() > Ty1->getScalarSizeInBits())			if (Ty0->getScalarSizeInBits() > Ty1->getScalarSizeInBits())
	return Ty0;			return Ty0;
	return Ty1;			return Ty1;
	}			}

	/// \brief Check that the instruction has outside loop users and is not an			/// \brief Check that the instruction has outside loop users and is not an
	/// identified reduction variable.			/// identified reduction variable.
	static bool hasOutsideLoopUser(const Loop *TheLoop, Instruction *Inst,			static bool hasOutsideLoopUser(const Loop *TheLoop, Instruction *Inst,
	SmallPtrSetImpl<Value *> &Reductions) {			const SmallPtrSetImpl<Value *> &Reductions) {
	// Reduction instructions are allowed to have exit users. All other			// Reduction instructions are allowed to have exit users. All other
	// instructions must not have external users.			// instructions must not have external users.
	if (!Reductions.count(Inst))			if (!Reductions.count(Inst))
	//Check that all of the users of the loop are inside the BB.			//Check that all of the users of the loop are inside the BB.
	for (User *U : Inst->users()) {			for (User *U : Inst->users()) {
	Instruction *UI = cast<Instruction>(U);			Instruction *UI = cast<Instruction>(U);
	// This user may be a reduction exit value.			// This user may be a reduction exit value.
	if (!TheLoop->contains(UI)) {			if (!TheLoop->contains(UI)) {
	▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines
	<< "control flow not understood by vectorizer");			<< "control flow not understood by vectorizer");
	DEBUG(dbgs() << "LV: Found an invalid PHI.\n");			DEBUG(dbgs() << "LV: Found an invalid PHI.\n");
	return false;			return false;
	}			}

	InductionDescriptor ID;			InductionDescriptor ID;
	if (InductionDescriptor::isInductionPHI(Phi, SE, ID)) {			if (InductionDescriptor::isInductionPHI(Phi, SE, ID)) {
	Inductions[Phi] = ID;			Inductions[Phi] = ID;

				if (ID.getLoopExitInstr() && TheLoop->isLCSSAForm(*DT))
				AllowedExit.insert(ID.getLoopExitInstr());

	// Get the widest type.			// Get the widest type.
	if (!WidestIndTy)			if (!WidestIndTy)
	WidestIndTy = convertPointerToIntegerType(DL, PhiTy);			WidestIndTy = convertPointerToIntegerType(DL, PhiTy);
	else			else
	WidestIndTy = getWiderType(DL, PhiTy, WidestIndTy);			WidestIndTy = getWiderType(DL, PhiTy, WidestIndTy);

	// Int inductions are special because we only allow one IV.			// Int inductions are special because we only allow one IV.
	if (ID.getKind() == InductionDescriptor::IK_IntInduction &&			if (ID.getKind() == InductionDescriptor::IK_IntInduction &&
	▲ Show 20 Lines • Show All 85 Lines • ▼ Show 20 Lines
	if (EnableMemAccessVersioning)			if (EnableMemAccessVersioning)
	collectStridedAccess(ST);			collectStridedAccess(ST);
	}			}

	if (EnableMemAccessVersioning)			if (EnableMemAccessVersioning)
	if (LoadInst *LI = dyn_cast<LoadInst>(it))			if (LoadInst *LI = dyn_cast<LoadInst>(it))
	collectStridedAccess(LI);			collectStridedAccess(LI);

	// Reduction instructions are allowed to have exit users.			// Reduction instructions and post-inc expressions are allowed to have
	// All other instructions must not have external users.			// exit users. All other instructions must not have external users.
	if (hasOutsideLoopUser(TheLoop, it, AllowedExit)) {			if (hasOutsideLoopUser(TheLoop, it, AllowedExit)) {
	emitAnalysis(VectorizationReport(it) <<			// Check if an instruction could be a post-inc expression.
	"value cannot be used outside the loop");			emitAnalysis(VectorizationReport(it)
				<< "value cannot be used outside the loop");
	return false;			return false;
	}			}

	} // next instr.			} // next instr.

	}			}

	if (!Induction) {			if (!Induction) {
	▲ Show 20 Lines • Show All 991 Lines • Show Last 20 Lines

test/Transforms/LoopVectorize/post-incs.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -force-vector-width=4 -S | FileCheck %s

				target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"

				; Ensure that simple LCSSA loop with one induction post-inc expression gets
				; vectorized.
				; CHECK-LABEL: @t0
				; CHECK: %[[END:.*]] = getelementptr i8, i8* %dest,
				; CHECK: vector.body
				; CHECK: for.end.loopexit:
				; CHECK: %incdec.ptr.lcssa = phi i8* [ %incdec.ptr, %for.body ], [ %[[END]], %middle.block ]
				; CHECK: %ptr.lcssa = phi i8* [ %dest, %entry ], [ %incdec.ptr.lcssa, %for.end.loopexit ]
				define void @t0(i32 %y, i32 %num, i8* noalias %dest) {
				entry:
				%cond = icmp sgt i32 %num, 0;
				br i1 %cond, label %for.body.preheader, label %for.end

				for.body.preheader:
				br label %for.body

				for.body:
				%vv = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
				%ptr = phi i8* [ %incdec.ptr, %for.body ], [ %dest, %for.body.preheader ]
				%incdec.ptr = getelementptr inbounds i8, i8* %ptr, i64 1
				store i8 0, i8* %ptr, align 1
				%inc = add nuw nsw i32 %vv, 1
				%cmp = icmp slt i32 %inc, %num
				br i1 %cmp, label %for.body, label %for.end.loopexit

				for.end.loopexit:
				%incdec.ptr.lcssa = phi i8* [ %incdec.ptr, %for.body ]
				br label %for.end

				for.end:
				%ptr.lcssa = phi i8* [ %dest, %entry ], [ %incdec.ptr.lcssa, %for.end.loopexit ]
				ret void
				}

				; Ensure that simple loop with one induction post-inc expression gets vectorized.
				; CHECK-LABEL: @t1
				; CHECK: %[[END:.*]] = getelementptr i8, i8* %dest,
				; CHECK: vector.body
				; CHECK: for.end.loopexit:
				; CHECK: %[[LCSSA:.*]] = phi i8* [ %incdec.ptr, %for.body ], [ %[[END]], %middle.block ]
				; CHECK: phi i8* [ %dest, %entry ], [ %[[LCSSA]], %for.end.loopexit ]
				define void @t1(i32 %y, i8* noalias %dest, i32 %num) {
				entry:
				%cond = icmp sgt i32 %num, 0;
				br i1 %cond, label %for.body.preheader, label %for.end

				for.body.preheader:
				br label %for.body

				for.body:
				%vv = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
				%ptr = phi i8* [ %incdec.ptr, %for.body ], [ %dest, %for.body.preheader ]
				%incdec.ptr = getelementptr inbounds i8, i8* %ptr, i64 1
				store i8 0, i8* %ptr, align 1
				%inc = add nuw nsw i32 %vv, 1
				%cmp = icmp slt i32 %inc, %num
				br i1 %cmp, label %for.body, label %for.end.loopexit

				for.end.loopexit:
				br label %for.end

				for.end:
				%ptr.lcssa = phi i8* [ %dest, %entry ], [ %incdec.ptr, %for.end.loopexit ]
				ret void
				}

				; Ensure that a loop with 2 post-inc expressions gets vectorized
				; and that resume values are correct.
				; CHECK-LABEL: @t2
				; CHECK: %[[SEND:.*]] = getelementptr i8, i8* %src,
				; CHECK: %[[DEND:.*]] = getelementptr i8, i8* %dest,
				; CHECK: vector.body:
				; CHECK: for.end.loopexit:
				; CHECK: %[[LCSSA1:.*]] = phi i8* [ %incdec.ptr1, %for.body ], [ %[[DEND]], %middle.block ]
				; CHECK-NEXT: %[[LCSSA:.*]] = phi i8* [ %incdec.ptr, %for.body ], [ %[[SEND]], %middle.block ]
				; CHECK: %ptr.lcssa = phi i8* [ %src, %entry ], [ %[[LCSSA]], %for.end.loopexit ]
				; CHECK-NEXT: %ptr1.lcssa = phi i8* [ %dest, %entry ], [ %[[LCSSA1]], %for.end.loopexit ]
				define void @t2(i32 %y, i8* noalias %src, i8* noalias %dest, i32 %num) {
				entry:
				%cond = icmp sgt i32 %num, 0;
				br i1 %cond, label %for.body.preheader, label %for.end

				for.body.preheader:
				br label %for.body

				for.body:
				%vv = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
				%p = phi i8* [ %incdec.ptr, %for.body ], [ %src, %for.body.preheader ]
				%ptr = phi i8* [ %incdec.ptr1, %for.body ], [ %dest, %for.body.preheader ]
				%incdec.ptr = getelementptr inbounds i8, i8* %p, i64 1
				%0 = load i8, i8* %p, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %ptr, i64 1
				store i8 %0, i8* %ptr, align 1
				%inc = add nuw nsw i32 %vv, 1
				%cmp = icmp slt i32 %inc, %num
				br i1 %cmp, label %for.body, label %for.end.loopexit

				for.end.loopexit:
				br label %for.end

				for.end:
				%ptr.lcssa = phi i8* [ %src, %entry ], [ %incdec.ptr, %for.end.loopexit ]
				%ptr1.lcssa = phi i8* [ %dest, %entry ], [ %incdec.ptr1, %for.end.loopexit ]
				ret void
				}

				; Ensure that a simple loop with integer post-inc expression gets vectorized,
				; even though the exit value has different value when coming from the entry
				; block than the induction start value.
				; CHECK-LABEL: @t3
				; CHECK: %[[END:.*]] = add i32 0,
				; CHECK: vector.body:
				; CHECK: for.body:
				; CHECK: %[[IND_VV:.*]] = phi i32 [ %[[INC:.*]], %for.body ], [ %[[RES:.*]], %scalar.ph ]
				; CHECK: for.end.loopexit:
				; CHECK: %[[LCSSA:.*]] = phi i32 [ %[[INC]], %for.body ], [ %[[END]], %middle.block ]
				; CHECK: phi i32 [ %num, %entry ], [ %[[LCSSA]], %for.end.loopexit ]
				define void @t3(i32 %y, i8* noalias %dest, i32 %num) {
				entry:
				%cond = icmp sgt i32 %num, 0;
				br i1 %cond, label %for.body.preheader, label %for.end

				for.body.preheader:
				br label %for.body

				for.body:
				%vv = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
				%ptr = phi i8* [ %incdec.ptr1, %for.body ], [ %dest, %for.body.preheader ]
				%incdec.ptr1 = getelementptr inbounds i8, i8* %ptr, i64 1
				store i8 0, i8* %ptr, align 1
				%inc = add nuw nsw i32 %vv, 1
				%cmp = icmp slt i32 %inc, %num
				br i1 %cmp, label %for.body, label %for.end.loopexit

				for.end.loopexit:
				br label %for.end

				for.end:
				%n.lcssa = phi i32 [ %num, %entry ], [ %inc, %for.end.loopexit ]
				ret void
				}
				No newline at end of file
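
The resume-value repair in LoopVectorize.cpp and the CHECK lines in post-incs.ll both rest on one idea: when the middle block branches straight to the exit, an outside user of a post-inc expression must receive the induction's end value ("ind.end"), and otherwise the scalar remainder loop resumes from that value. The following is a minimal hand-written C++ model of that control flow, not code from the patch. The names `copyScalar`, `copyVectorized` and the constant `VF` are hypothetical and chosen only for illustration; the inner lane loop stands in for a real vector instruction.

```cpp
#include <cstddef>

// Hypothetical vectorization factor, chosen only for this illustration.
constexpr std::size_t VF = 4;

// Scalar reference: copies n bytes and returns the post-incremented
// destination pointer, i.e. the value that is used outside the loop.
char *copyScalar(const char *src, char *dest, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    *dest++ = *src++;
  return dest;
}

// Model of the vectorized control flow (vector body, middle block,
// scalar remainder). Must return the same pointer as copyScalar.
char *copyVectorized(const char *src, char *dest, std::size_t n) {
  // Iterations covered by the vector loop (the role of CountRoundDown).
  const std::size_t countRoundDown = n - n % VF;

  // "vector.body": VF elements per trip; the lane loop models one
  // wide load and one wide store.
  for (std::size_t i = 0; i < countRoundDown; i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      dest[i + lane] = src[i + lane];

  // "ind.end": value of each pointer induction after the vector loop.
  const char *srcEnd = src + countRoundDown;
  char *destEnd = dest + countRoundDown;

  // "middle.block": no tail iterations left, so the outside user takes
  // the end value directly (the extra incoming arc added to the PHI).
  if (countRoundDown == n)
    return destEnd;

  // Scalar remainder: resumes from the end values, and the outside user
  // takes the post-incremented pointer produced by this loop instead.
  for (std::size_t i = countRoundDown; i < n; ++i)
    *destEnd++ = *srcEnd++;
  return destEnd;
}
```

For any `n`, both functions should return `dest + n`; with `n` a multiple of `VF` the result comes from the middle-block path, otherwise from the scalar remainder.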
