This is an archive of the discontinued LLVM Phabricator instance.

Differential D148841

[LV] Use SCEV for uniformity analysis across VF
ClosedPublic

Authored by fhahn on Apr 20 2023, 2:02 PM.

Details

Reviewers

reames
Ayal
gilr
vporpo

Commits

rG572cfa3fde54: [LV] Use SCEV for uniformity analysis across VF

Summary

This patch uses SCEV to check if a value is uniform across a given VF.

The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of the first vector lane
(offset 0) and one for the last vector lane (VF - 1). If they are equal,
consider the expression uniform.

While rewriting expressions, we also need to catch expressions for which
we cannot determine uniformity (e.g. SCEVUnknown).

I might be missing something that makes this approach unworkable in
practice, but it may be an alternative to D147735.
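
The rewrite described in the summary can be illustrated on plain integers. This is a minimal sketch and not the patch's code: `AddRec`, `evaluateAt`, `rewriteForLane` and `firstAndLastLaneAgree` are hypothetical names that model an affine AddRec {Start,+,Step} and the adjusted recurrence {Start + Lane * Step,+,Step * VF} for one lane. The patch compares SCEV expressions symbolically; the sketch only compares values numerically over a bounded number of vector iterations.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for an affine SCEV AddRec {Start,+,Step}; not an LLVM type.
struct AddRec {
  int64_t Start;
  int64_t Step;
};

// Value of the recurrence in iteration N (N = 0, 1, 2, ...).
int64_t evaluateAt(const AddRec &AR, int64_t N) {
  return AR.Start + N * AR.Step;
}

// Model of the rewrite: scale the step by VF and advance the start by
// Lane * Step, giving the value seen by lane `Lane` of the vector loop.
AddRec rewriteForLane(const AddRec &AR, int64_t VF, int64_t Lane) {
  assert(VF > 0 && Lane >= 0 && Lane < VF && "lane must be within VF");
  return {AR.Start + Lane * AR.Step, AR.Step * VF};
}

// Returns true if IV / Divisor has the same value in lane 0 and lane VF - 1
// for the first NumVectorIters vector iterations.
bool firstAndLastLaneAgree(const AddRec &IV, int64_t VF, int64_t Divisor,
                           int64_t NumVectorIters) {
  AddRec First = rewriteForLane(IV, VF, 0);
  AddRec Last = rewriteForLane(IV, VF, VF - 1);
  for (int64_t N = 0; N < NumVectorIters; ++N)
    if (evaluateAt(First, N) / Divisor != evaluateAt(Last, N) / Divisor)
      return false;
  return true;
}
```

With IV = {0,+,1} and VF = 4, lane 3 becomes {3,+,4}, so `IV / 4` agrees between lane 0 and lane 3 while `IV / 2` does not. An IV starting at 1 breaks the agreement for `IV / 2` even at VF = 2, because the groups of lanes are no longer aligned with the division.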

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Apr 20 2023, 2:02 PM

Herald added a project: Restricted Project. Apr 20 2023, 2:02 PM

Herald added subscribers: StephenFan, javed.absar, hiraditya.

fhahn requested review of this revision. Apr 20 2023, 2:02 PM

Herald added subscribers: llvm-commits, pcwang-thead.

fhahn mentioned this in D147735: [LV] Adds simple analysis that improves VF-aware uniformity checks. Apr 20 2023, 2:06 PM

mingmingl added a subscriber: mingmingl. Apr 21 2023, 3:30 PM

Herald added a subscriber: hoy. Apr 21 2023, 3:30 PM

fhahn mentioned this in D147734: [LV][NFC] Precommit test for a follow-up patch that introduces uniformity for a specific VF. Apr 25 2023, 1:34 PM

fhahn added reviewers: reames, Ayal, gilr, vporpo. Apr 25 2023, 1:41 PM

Rebase on top of extra tests from D147734.

The patch should be ready for review now.

nikic added a subscriber: nikic. Apr 25 2023, 2:08 PM

nikic added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2556	Assert that the addrec loop is correct?
2563	Why is it valid to preserve nowrap flags here?

Address comments and fix broken variable names.

fhahn marked 2 inline comments as done. Apr 25 2023, 2:23 PM

fhahn added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2556	Yep that's safer, added the assert. Should help due to the invariance check in ::visit().
2563	It should be valid for the vector loop, as we check that the vector induction doesn't overflow/wrap; although the loop we are dealing with here is still the original scalar loop... But it's not needed so I removed it, thanks!

vporpo added inline comments. Apr 25 2023, 2:56 PM

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
350–352	I think the description comment needs to be updated. It should mention that it can now check uniformity within the VF, which should explain why we need the VF argument.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2536	A short description of what this rewriter is for?
2653	Can't think of any case where this would trigger, but if there is such a corner case we may still miss it if it is only checked in the debug build, silently causing miscompilations. So perhaps we should not guard it under NDEBUG and use an llvm_unreachable instead of an assertion so that it is also checked in release builds? Wdyt?
2654	nit: Use seq<> ? `for (auto I : seq<unsigned>(1, VF->getKnownMinValue()-1))`

Harbormaster completed remote builds in B228096: Diff 516906. Apr 25 2023, 3:11 PM

fhahn mentioned this in rG883eb88caed0: [LV] Add extra uniformity tests with LSHR and AND. Apr 26 2023, 11:52 AM

Address comments, thanks! Also rebased on top of the committed tests and extra tests with AND and LSHR added in 883eb88caed04b269da7ba69265fd7c4dc815231.

This version of the patch also includes a change to the rewriter to track if we have seen a UDiv expression and we skip rewriting the second expression if there's no UDiv. This is to keep compile-time as low as possible.

With the latest version geomean compile-time increases by +0.01%: https://llvm-compile-time-tracker.com/compare.php?from=883eb88caed04b269da7ba69265fd7c4dc815231&to=943232d7acbeb48b1f2ed613903c77a161f80807&stat=instructions:u

Without the UDiv restriction, that goes up to between +0.08% and +0.10%.

fhahn marked an inline comment as done. Apr 26 2023, 12:30 PM

fhahn added inline comments.

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
350–352	Thanks, extended the comment.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2536	Added, thanks!
2653	IIUC this should be an invariant, but I added the assertion to give us a chance to catch any violations and investigate. I think having this check only when assertions are enabled is in line with how assertions are used widely in the LLVM codebase (for better or worse). But if people prefer to err on the side of caution, we can always run the checks, at the cost of extra compile-time overhead.
2654	Thanks, updated!

nikic added inline comments. Apr 26 2023, 12:36 PM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2586–2588	Maybe?
2660	In line with the other comparison?

Simplify code as suggested, thanks!

fhahn marked 2 inline comments as done. Apr 26 2023, 12:51 PM

fhahn added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2586–2588	Yep that's more compact, thanks!
2660	Updated! Originally the other check was also using `SE->isKnownPredicate` but it was increasing compile-time while not being needed for the first set of motivating cases.

Thanks for working on this @fhahn , and for adding all these new test cases, it looks good.
@nikic any more comments?

Harbormaster completed remote builds in B228365: Diff 517273. Apr 26 2023, 1:34 PM

nikic added inline comments. Apr 27 2023, 1:28 PM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2636	Why do GEPs require special handling?

Remove special case for GEP which isn't needed in the latest version, also re-add isLoopInvariant to visit().

fhahn marked an inline comment as done. Apr 28 2023, 3:30 AM

fhahn added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2636	There's no need with the latest version, I removed it, thanks!

Harbormaster completed remote builds in B228778: Diff 517856. Apr 28 2023, 5:07 AM

Thinking about this a bit, can't the form check be performed in terms of the original IV? Instead of computing the adjusted IV with the scaled index and an offset, can't we simply reason in terms of the relevant iterations of the original IV? I think this simply reduces to asking whether OrigIV mod VF is a loop invariant value and the high bits (OrigIV div VF) are fixed between iterations 0 and VF.

Having said that, the latter clause is slightly trickier than 0 and VF. It's any OrigIV with OrigIV mod VF == 0, and its corresponding OrigIV + VF - 1. (Which is a complicated subtraction expression involving the mod.) Unless maybe it's solving this part which leads to the current solution?

In D148841#4305388, @reames wrote:

Thinking about this a bit, can't the form check be performed in terms of the original IV? Instead of computing the adjusted IV with the scaled index and an offset, can't we simply reason in terms of the relevant iterations of the original IV? I think this simply reduces to asking whether OrigIV mod VF is a loop invariant value and the high bits (OrigIV div VF) are fixed between iterations 0 and VF.

Having said that, the latter clause is slightly trickier than 0 and VF. It's any OrigIV with OrigIV mod VF == 0, and its corresponding OrigIV + VF - 1. (Which is a complicated subtraction expression involving the mod.) Unless maybe it's solving this part which leads to the current solution?

I played around with this a bit before as well. I might be missing something, but if we have OrigIV as AddRec {0,+,1}<nuw><nsw><%loop>, then wouldn't doing OrigIV mod VF always result in an AddRec that cycles through the remainders (for VF = 2, zext i1 {false,+,true}<%loop> to i64)? Also, if we needed to identify the AddRec sub-expressions, then we would also need to walk the whole expression as the rewriter does, I think.

One other way I could think of to reason about this would be to evaluate the AddRec at the start and VF-1, but that would only prove it for a specific value.

Ayal added inline comments. May 4 2023, 6:21 AM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2566–2575	nit: would this look better?
2605	nit: a bit discrepant for canAnalyze() to mean more than CanAnalyze. Perhaps the latter should be CannotAnalyze.
2637	nit: worth setting `auto FixedVF = VF->getKnownMinValue();`.
2646	nit: suffice to set IsUniform to FirstLaneExpr == LastLaneExpr and assert that the latter is also canAnalyze if they're equal.
2649	Ahh, could URem with "VF-1" lead to equal expressions for first and last lanes, but not for all lanes in between? E.g., along with FoundUDiv: ((i++)/2)%3 and VF=8.

inclyc added a subscriber: inclyc. May 4 2023, 10:23 AM

Ayal mentioned this in D142895: [VPlan] Move mayHaveSideeffects for FORs check to VPlan. May 10 2023, 9:26 AM

Ayal mentioned this in D144491: [VPlan] Use isUniformAfterVec in VPReplicateRecipe::execute. May 11 2023, 8:49 AM

fhahn mentioned this in rG01efcec6dbd1: [LV] Add extra uniformity tests with UDIV and UREM. May 18 2023, 3:35 AM

fhahn marked 5 inline comments as done. May 18 2023, 4:17 AM

fhahn added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2566–2575	Indeed, updated!
2605	Changed `CanAnalyze -> CannotAnalyze`, thanks!
2637	Updated, thanks!
2646	Adjusted, thanks!
2649	In theory there could be such expressions, I think, but that particular one isn't handled incorrectly at the moment, possibly due to limitations in SCEV reasoning. Added additional tests in 01efcec6dbd1. I checked the compile-time impact of checking all expressions and there was no notable increase: https://llvm-compile-time-tracker.com/compare.php?from=9c1d65054818cd2fd9187cd7e7ef703d98b5c5e2&to=825bea6827d6558ab61c8e139f6d7ba4b007a69b&stat=instructions:u so I updated the code to check all lanes in between as well.
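
The concern raised for line 2649 can be checked numerically. The sketch below is an illustration under assumptions, not code from the patch: `IndexFn`, `firstAndLastLaneEqual`, `allLanesEqual` and `divThenRem` are hypothetical names, the induction variable is assumed to start at 0 with step 1, and values are compared over a bounded trip count instead of as SCEV expressions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using IndexFn = std::function<uint64_t(uint64_t)>;

// Compares only lane 0 and lane VF - 1 of every vector iteration.
bool firstAndLastLaneEqual(const IndexFn &F, uint64_t VF, uint64_t TripCount) {
  assert(VF > 0 && "VF must be positive");
  for (uint64_t Base = 0; Base + VF <= TripCount; Base += VF)
    if (F(Base) != F(Base + VF - 1))
      return false;
  return true;
}

// Compares lane 0 against every other lane 1 .. VF - 1.
bool allLanesEqual(const IndexFn &F, uint64_t VF, uint64_t TripCount) {
  assert(VF > 0 && "VF must be positive");
  for (uint64_t Base = 0; Base + VF <= TripCount; Base += VF)
    for (uint64_t Lane = 1; Lane < VF; ++Lane)
      if (F(Base) != F(Base + Lane))
        return false;
  return true;
}

// The index expression from the review comment: (i / 2) % 3.
uint64_t divThenRem(uint64_t I) { return (I / 2) % 3; }
```

For VF = 8 the lanes of the first vector iteration see 0, 0, 1, 1, 2, 2, 0, 0. Lane 0 and lane 7 agree, and they keep agreeing in later iterations because (8k + 7) / 2 = 4k + 3, which has the same remainder modulo 3 as 4k. The lanes in between differ, so only the all-lanes check rejects the expression.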

Address latest comments, thanks!

Harbormaster completed remote builds in B232832: Diff 523336. May 18 2023, 5:56 AM

+@simoll.

This hopefully impacts which VPlans are built across the possible VF range, say prefer VF=2 over VF=4 when the former has more uniforms than the latter (but none are fully invariant, as detected today, so covered by same VPlan), although undetected by current tests?

Adding various nits.

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
591–594
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
351	ditto
354–359	nit: w/o VF, uniformity falls back to loop invariance.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2536	"Rewriter is designed to build the SCEVs for each of the VF lanes in the expected vectorized loop, which can then be compared to detect their uniformity. This is done by replacing the AddRec SCEVs of the original scalar loop with new AddRecs ..." Should the name "SCEVAddRecRewriter" convey what it's for?
2554	nit: explain why a non-loop-invariant uniform value is expected to involve a UDiv. Would an SDiv also work? This saves time by potentially bailing out after building a single (UDiv-free) expression for FirstLane, w/o building another expression for another lane, rather than saving in building an expression itself.
2565	nit: right, point (of error message) is that such addrec's should have been checked earlier?
2593	nit: return a SCEVCouldNotCompute instead, SCEV's inherent 'CannotAnalyze'?
2603	nit: can return SCEVCouldNotCompute (or null) at the end if not FoundUDiv.
2637	Update comment, fold LastLane into an additional VF-1 iteration of the loop, drop the max(1,VF-2).
2655	nit: suffice to check that SCEVs are equal?
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430	note: VF=4 is mandated, previous UF=2 decision with 8 loads is now UF=4 with 4 (uniform) loads & broadcasts. The load from test_base[i/8] could further fold two 'parts' of VF=4 together, as it is uniform across VF=8 / VF=4,UF=2. Worth leaving behind some assume, if not folding directly? I.e., record the largest VF for which uniformity was detected, even if a smaller VF is selected. Worth optimizing the analysis across VF's, i.e., if a value is known not to be uniform for VF avoid checking for 2VF? OTOH, if a value is known to be uniform for VF, check only two SCEVs for 2VF?
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll
79	note: this load from A[iv & -2] is now recognized as uniform across VF=2.
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll
6–7	note: load from A[(iv/2)%3] rightfully not recognized as uniform for VF=8.
296–305	note: this load from A[(iv / 8) % 3] is now recognized as uniform for VF=8.
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll
110	note: this load from A[iv >> 1] is now recognized as uniform for VF=2. Check that it is not considered uniform for VF=4?
224–225	note: load from A[iv>>2] recognized as uniform for VF=2, should also hold for VF=4.
858–859	note: load from A[1+i>>1] not recognized as uniform for VF=2 due to alignment.
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
148–149	note: load from A[iv/2 + iv2/2] i.e. A[2*(iv/2)] recognized as uniform for VF=2, but should not for VF > 2.
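
The notes on the test files can be cross-checked with a brute-force model. This is a sketch under stated assumptions and not how LoopVectorize decides uniformity: `isUniformAcrossVF` and `largestUniformVF` are hypothetical helpers, the induction variable is assumed to start at 0 with step 1, only power-of-two VFs are tried, and the early exit assumes that a value which is not uniform for VF is not uniform for 2 * VF either.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

using IndexFn = std::function<uint64_t(uint64_t)>;

// True if F yields one value for all lanes of every full group of VF
// consecutive iterations, with groups starting at multiples of VF.
bool isUniformAcrossVF(const IndexFn &F, uint64_t VF, uint64_t TripCount) {
  assert(VF > 0 && "VF must be positive");
  for (uint64_t Base = 0; Base + VF <= TripCount; Base += VF)
    for (uint64_t Lane = 1; Lane < VF; ++Lane)
      if (F(Base + Lane) != F(Base))
        return false;
  return true;
}

// Largest power-of-two VF <= MaxVF for which F is uniform. VF = 1 is
// trivially uniform. Stops at the first failing VF, assuming that a value
// which is not uniform for VF is not uniform for 2 * VF either.
uint64_t largestUniformVF(const IndexFn &F, uint64_t MaxVF,
                          uint64_t TripCount) {
  uint64_t Best = 1;
  for (uint64_t VF = 2; VF <= MaxVF; VF *= 2) {
    if (!isUniformAcrossVF(F, VF, TripCount))
      break;
    Best = VF;
  }
  return Best;
}
```

Under these assumptions the model agrees with the notes: `iv & -2`, `iv >> 1` and `2 * (iv / 2)` are uniform up to VF = 2, `iv >> 2` up to VF = 4, `(iv / 8) % 3` up to VF = 8, and `(1 + iv) >> 1` is not uniform even for VF = 2.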

fhahn mentioned this in rG280656eae95a: [LV] Add check line with VF=4 to uniformity test. May 28 2023, 12:01 PM

Address latest comments, thanks!

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
591–594	Clarified, thanks!
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
351	updated, thanks!
354–359	Updated with similar wording to isUniform, thanks!
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2536	Adjusted the comment, thanks! Updated name to `SCEVAddRecForUniformityRewriter`
2554	Expanded the comment, thanks!
2565	Extended message to try to make this clear.
2593	Unfortunately that doesn't work without additional work, as the returned value may be used to construct the parent SCEV by the rewriter.
2603	Unfortunately that doesn't work without additional work, as the returned value may be used to construct the parent SCEV by the rewriter.
2637	Simplified the code, thanks!
2655	Simplified, thanks!
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430	"The load from test_base[i/8] could further fold two 'parts' of VF=4 together, as it is uniform across VF=8 / VF=4,UF=2." I think that would be good as a follow-up. "Worth optimizing the analysis across VF's, i.e., if a value is known not to be uniform for VF avoid checking for 2*VF" I was thinking about evaluating something like that as a follow-up optimization. WDYT?
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll
79	Added as a comment, thanks!
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll
6–7	Added comment, thanks!
296–305	added comment
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll
110	Add check lines for VF=4 as well separately.
224–225	added comment, thanks!
858–859	Added note, thanks!
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
148–149	added note, thanks!

Harbormaster completed remote builds in B235077: Diff 526339. May 28 2023, 1:29 PM

Ayal added inline comments. May 29 2023, 6:19 AM

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2583–2585
2603	ok, this is fine. I see SCEVInitRewriter, SCEVPostIncRewriter, SCEVShiftRewriter indeed also record a similar SeenLoopVariantSCEVUnknown/Valid which their `rewrite()` queries after visit() to return SCEVCouldNotCompute. Worth doing the same, wrapping constructor/visit()/canAnalyze()?
2646	nit: "1 .. FixedVF-1" nit: "first lane" >> "lane 0" nit: why "reverse"?
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430	Sure, TODOs can be added to record potential follow-ups. Note also that uniformity could be improved beyond comparing equal SCEV expressions, by using Divergence Analysis which propagates uniformity also through uniform branches.
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll
1	line dropped intentionally?
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
7	lines changed intentionally?
149–150	lines dropped intentionally?

Address latest comments, thanks!

fhahn marked an inline comment as done. May 29 2023, 11:29 AM

fhahn added inline comments.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2583–2585	Merged, thanks!
2603	Updated, thanks!
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430	Added a TODO, thanks!
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll
1	No that was an accident, added back, thanks!
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
7	Those were left over from the patch that added new run lines, removed in fcc135a8d6a7.
149–150	Those were left over from the patch that added new run lines, removed in fcc135a8d6a7.

Harbormaster completed remote builds in B235170: Diff 526461. May 29 2023, 11:29 AM

fhahn mentioned this in D151658: [LV] Check if value was already not uniform for previous VF. May 29 2023, 12:38 PM

This looks good to me, thanks!
Adding last couple of minor nits.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2558
2568	Constructor can now also be private.
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430	Thanks! Plus following-up with D151658!

This revision is now accepted and ready to land. May 29 2023, 1:49 PM

BTW, does the VF parameter need to be "optional", or should all callers of Legal/LAI::isUniform[MemOp]() be asked to provide a VF? Otherwise have them call isInvariant() instead. To encourage passing VF where available.

What do you think about checking for udiv presence upfront? That seems to eliminate the compile-time impact entirely for me. Something like this:

bool HasUDiv =
    SCEVExprContains(S, [](const SCEV *S) { return isa<SCEVUDivExpr>(S); });
if (!HasUDiv)
  return false;
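
The idea behind this pre-check can be shown on a toy expression tree. The sketch below is only an illustration: `Kind`, `Expr`, `makeExpr` and `containsUDiv` are hypothetical stand-ins and not LLVM's SCEV classes or the `SCEVExprContains` helper used in the snippet above.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy expression kinds; a stand-in for SCEV expression types, not LLVM's.
enum class Kind { Constant, Unknown, AddRec, Add, Mul, UDiv };

struct Expr {
  Kind K;
  std::vector<std::shared_ptr<Expr>> Ops;
};

std::shared_ptr<Expr> makeExpr(Kind K,
                               std::vector<std::shared_ptr<Expr>> Ops = {}) {
  return std::make_shared<Expr>(Expr{K, std::move(Ops)});
}

// Returns true if E or any of its sub-expressions is a UDiv. A caller can
// use this as a cheap filter and skip the per-lane rewriting when it is
// false.
bool containsUDiv(const Expr &E) {
  if (E.K == Kind::UDiv)
    return true;
  for (const auto &Op : E.Ops) {
    assert(Op && "operands must be non-null");
    if (containsUDiv(*Op))
      return true;
  }
  return false;
}
```

The walk visits each node at most once and allocates nothing, which is consistent with the observation above that an upfront check keeps the compile-time cost low for UDiv-free expressions.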

Rebase so this can be applied directly on main and check for UDiv separately as suggested, thanks!

I am planning on landing this soon.

In D148841#4383992, @fhahn wrote:

Rebase so this can be applied directly on main and check for UDiv separately as suggested, thanks!

I am planning on landing this soon.

Have rewrite() take care of optimizing the pre-check for UDiv?

Are the std::optional<ElementCount> VF = std::nullopt really needed/desired?

Harbormaster completed remote builds in B235551: Diff 527011. May 31 2023, 7:12 AM

This revision was landed with ongoing or failed builds. May 31 2023, 8:01 AM

Closed by commit rG572cfa3fde54: [LV] Use SCEV for uniformity analysis across VF (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rG572cfa3fde54: [LV] Use SCEV for uniformity analysis across VF.

In D148841#4384056, @Ayal wrote:

Have rewrite() take care of optimizing the pre-check for UDiv?

Taken care of in the committed version.

Are the std::optional<ElementCount> VF = std::nullopt really needed/desired?

Will disentangle isInvariant/isUniform separately.
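
The behaviour of the optional VF parameter discussed here can be modelled in a few lines. This is a hedged sketch, not the LLVM interface: `ValueFn`, `isInvariant` and this `isUniform` are hypothetical, operate on a plain function of the iteration number, and assume C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>

using ValueFn = std::function<uint64_t(uint64_t)>;

// True if F has the same value in every iteration 0 .. TripCount - 1.
bool isInvariant(const ValueFn &F, uint64_t TripCount) {
  for (uint64_t I = 1; I < TripCount; ++I)
    if (F(I) != F(0))
      return false;
  return true;
}

// With a VF: uniform across each full group of VF lanes.
// Without a VF: falls back to invariance across all iterations.
bool isUniform(const ValueFn &F, uint64_t TripCount,
               std::optional<uint64_t> VF = std::nullopt) {
  if (!VF)
    return isInvariant(F, TripCount);
  assert(*VF > 0 && "VF must be positive");
  for (uint64_t Base = 0; Base + *VF <= TripCount; Base += *VF)
    for (uint64_t Lane = 1; Lane < *VF; ++Lane)
      if (F(Base + Lane) != F(Base))
        return false;
  return true;
}
```

A caller that omits the VF silently gets the stricter invariance answer, which is the behaviour the question above is probing. Splitting the two queries into separate functions makes the choice explicit at each call site.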

Hi Florian!

This commit triggers failed asserts in my builds, reproduced as below:

$ cat repro.c 
typedef struct {
  int a;
  short b[]
} c;
void d() {
  c *e = d;
  for (int f = 1; f < 56; f++) {
    int g = f * f / 6;
    e->b[g] = f;
  }
}
$ clang -target x86_64-linux-gnu -w -c repro.c -O2
clang: /home/martin/code/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:3674: const llvm::SCEV* llvm::ScalarEvolution::getAddRecExpr(llvm::SmallVectorImpl<const llvm::SCEV*>&, const llvm::Loop*, llvm::SCEV::NoWrapFlags): Assertion `isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!"' failed.

Can you have a look, and revert if it takes a while to get it fixed?

Thanks, should be fixed by 3b912e269a52

In D148841#4387363, @fhahn wrote:

Thanks, should be fixed by 3b912e269a52

Yes, the regression seems to be fixed in the original, non-reduced case now as well. Thanks!

fhahn mentioned this in rGe48b1e87a319: [LV] Split off invariance check from isUniform (NFCI). Jun 1 2023, 11:09 AM

fhahn mentioned this in rGe19297471a09: [LV] Check if value was already not uniform for previous VF. Jun 4 2023, 12:31 PM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/LoopAccessAnalysis.h	5 lines
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h	11 lines
llvm/lib/Analysis/LoopAccessAnalysis.cpp	113 lines
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp	10 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	6 lines
llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll	127 lines
llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll	2 lines
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll	29 lines
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll	29 lines
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll	56 lines
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll	95 lines
llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll	34 lines

Diff 527045

llvm/include/llvm/Analysis/LoopAccessAnalysis.h

unsigned getNumRuntimePointerChecks() const {

return PtrRtChecking->getNumberOfChecks();

}

/// Return true if the block BB needs to be predicated in order for the loop

/// to be vectorized.

static bool blockNeedsPredication(BasicBlock *BB, Loop *TheLoop,

DominatorTree *DT);

- /// Returns true if the value V is uniform within the loop.
- bool isUniform(Value *V) const;
+ /// Returns true if value \p V is uniform across \p VF lanes, when \p VF is
+ /// provided, and otherwise if \p V is invariant across all loop iterations.
+ bool isUniform(Value *V, std::optional<ElementCount> VF = std::nullopt) const;

Ayal:

DominatorTree *DT);
- /// Returns true if the value V is uniform within the loop. If \p VF is
- /// provided, check if \p V is uniform across \p VF.
+ /// Returns true if value V is uniform across \p VF lanes, when \p VF is
+ /// provided, and otherwise if V is invariant across all loop iterations.
bool isUniform(Value *V, std::optional<ElementCount> VF = std::nullopt) const;

fhahn: Clarified, thanks!

uint64_t getMaxSafeDepDistBytes() const { return MaxSafeDepDistBytes; }

unsigned getNumStores() const { return NumStores; }

unsigned getNumLoads() const { return NumLoads;}

/// The diagnostics report generated for the analysis. E.g. why we

/// couldn't analyze the loop.

const OptimizationRemarkAnalysis *getReport() const { return Report.get(); }

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

public:

/// Returns:

/// 0 - Stride is unknown or non-consecutive.

/// 1 - Address is consecutive.

/// -1 - Address is consecutive, and decreasing.

/// NOTE: This method must only be used before modifying the original scalar

/// loop. Do not use after invoking 'createVectorizedLoopSkeleton' (PR34965).

int isConsecutivePtr(Type *AccessTy, Value *Ptr) const;

- /// Returns true if the value V is uniform within the loop.
- bool isUniform(Value *V) const;
+ /// Returns true if value V is uniform across \p VF lanes, when \p VF is
+ /// provided, and otherwise if \p V is invariant across all loop iterations.

Ayal: ditto

fhahn: updated, thanks!

+ bool isUniform(Value *V, std::optional<ElementCount> VF = std::nullopt) const;

vporpo: I think the description comment needs to be updated. It should mention that it can now check uniformity within the VF, which should explain why we need the VF argument.

fhahn: Thanks, extended the comment.

/// A uniform memory op is a load or store which accesses the same memory
- /// location on all lanes.
- bool isUniformMemOp(Instruction &I) const;
+ /// location on all \p VF lanes, if \p VF is provided and otherwise if the
+ /// memory location is invariant.
+ bool isUniformMemOp(Instruction &I,
+ std::optional<ElementCount> VF = std::nullopt) const;

Ayal:

bool isUniform(Value *V, std::optional<ElementCount> VF = std::nullopt) const;
/// A uniform memory op is a load or store which accesses the same memory
- /// location on all lanes. If \p VF is provided, check if \p I is uniform
- /// across \p VF.
- bool isUniformMemOp(Instruction &I,
+ /// location on all \p VF lanes. If \p VF is not provided, check if \p I is
+ /// invariant across all loop iterations.
+ bool isUniformMemOp(Instruction &I,

nit: w/o VF, uniformity falls back to loop invariance.

fhahn: Updated with similar wording to isUniform, thanks!

/// Returns the information that we collected about runtime memory check.

const RuntimePointerChecking *getRuntimePointerChecking() const {

return LAI->getRuntimePointerChecking();

}

const LoopAccessInfo *getLAI() const { return LAI; }

bool isSafeForAnyVectorWidth() const {

llvm/lib/Analysis/LoopAccessAnalysis.cpp

if (I->getDebugLoc())
DL = I->getDebugLoc();
}

Report = std::make_unique<OptimizationRemarkAnalysis>(DEBUG_TYPE, RemarkName, DL,
CodeRegion);
return *Report;
}

- bool LoopAccessInfo::isUniform(Value *V) const {
+ namespace {

/// A rewriter to build the SCEVs for each of the VF lanes in the expected
/// vectorized loop, which can then be compared to detect their uniformity. This
/// is done by replacing the AddRec SCEVs of the original scalar loop (TheLoop)
/// with new AddRecs where the step is multiplied by StepMultiplier and Offset *
/// Step is added. Also checks if all sub-expressions are analyzable w.r.t.
/// uniformity.

vporpo: A short description of what this rewriter is for?

fhahn: Added, thanks!

Ayal: "Rewriter is designed to build the SCEVs for each of the VF lanes in the expected vectorized loop, which can then be compared to detect their uniformity. This is done by replacing the AddRec SCEVs of the original scalar loop with new AddRecs ..."

Should the name "SCEVAddRecRewriter" convey what it's for?

fhahn: Adjusted the comment, thanks! Updated name to `SCEVAddRecForUniformityRewriter`

class SCEVAddRecForUniformityRewriter

: public SCEVRewriteVisitor<SCEVAddRecForUniformityRewriter> {

/// Multiplier to be applied to the step of AddRecs in TheLoop.

unsigned StepMultiplier;

/// Offset to be added to the AddRecs in TheLoop.

unsigned Offset;

/// Loop for which to rewrite AddRecsFor.

Loop *TheLoop;

/// Is any sub-expressions not analyzable w.r.t. uniformity?

bool CannotAnalyze = false;

Ayal: nit: explain why a non-loop-invariant uniform value is expected to involve a UDiv. Would an SDiv also work?
This saves time by potentially bailing out after building a single (UDiv-free) expression for FirstLane, w/o building another expression for another lane, rather than saving in building an expression itself.

fhahn: Expanded the comment, thanks!

bool canAnalyze() const { return !CannotAnalyze; }

nikic: Assert that the addrec loop is correct?

fhahn: Yep that's safer, added the assert. Should help due to the invariance check in ::visit().

public:

Ayal:

/// which are not loop invariant require operations to strip out the lowest
- /// bits. For now just look for UDivs and use it to avoid re-writing UDIV-free
+ /// bits. For now just look for UDivs and use it to avoid re-writing UDiv-free
/// expressions for other lanes to limit compile time.

SCEVAddRecForUniformityRewriter(ScalarEvolution &SE, unsigned StepMultiplier,

unsigned Offset, Loop *TheLoop)

: SCEVRewriteVisitor(SE), StepMultiplier(StepMultiplier), Offset(Offset),

TheLoop(TheLoop) {}

nikic: Why is it valid to preserve nowrap flags here?

fhahn: It should be valid for the vector loop, as we check that the vector induction doesn't overflow/wrap; although the loop we are dealing with here is still the original scalar loop... But it's not needed so I removed it, thanks!

const SCEV *visitAddRecExpr(const SCEVAddRecExpr *Expr) {

assert(Expr->getLoop() == TheLoop &&

AyalUnsubmitted

Done

nit: right, point (of error message) is that such addrec's should have been checked earlier?

fhahnAuthorUnsubmitted

Done

Extended message to try to make this clear.

"addrec outside of TheLoop must be invariant and should have been "

"handled earlier");

// Build a new AddRec by multiplying the step by StepMultiplier and

AyalUnsubmitted

Not Done

Constructor can now also be private.

// incrementing the start by Offset * step.

Type *Ty = Expr->getType();

auto *Step = Expr->getStepRecurrence(SE);

auto *NewStep = SE.getMulExpr(Step, SE.getConstant(Ty, StepMultiplier));

auto *ScaledOffset = SE.getMulExpr(Step, SE.getConstant(Ty, Offset));

auto *NewStart = SE.getAddExpr(Expr->getStart(), ScaledOffset);

return SE.getAddRecExpr(NewStart, NewStep, TheLoop, SCEV::FlagAnyWrap);

AyalUnsubmitted

Done

"addrec outside of TheLoop must be invariant");

- // Build a new AddRec by multiplying the step by StepMultiplier and adding

- // Offset * Step to the resulting AddRec.

+ // Build a new AddRec by multiplying the step by StepMultiplier and

+ // incrementing the start by Offset * step.

auto *Ty = Expr->getType();

- auto *StepC = SE.getConstant(Ty, StepMultiplier);

- auto *OffsetC = SE.getConstant(Ty, Offset);

- return SE.getAddExpr(

- SE.getAddRecExpr(Expr->getStart(),

- SE.getMulExpr(StepC, Expr->getStepRecurrence(SE)),

- Expr->getLoop(), SCEV::FlagAnyWrap),

- SE.getMulExpr(OffsetC, Expr->getStepRecurrence(SE)));

+ auto *Step = Expr->getStepRecurrence(SE);

+ auto *NewStep = SE.getMulExpr(Step, SE.getConstant(Ty, StepMultiplier));

+ auto *ScaledOffset = SE.getMulExpr(Step, SE.getConstant(Ty, Offset));

+ auto *NewStart = SE.getAddExpr(Expr->getStart(), ScaledOffset);

+ return SE.getAddRecExpr(NewStart, NewStep, TheLoop, SCEV::FlagAnyWrap);

}

const SCEV *visit(const SCEV *S) {

nit: would this look better?

fhahnAuthorUnsubmitted

Done

Indeed, updated!
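
The rewrite settled on in this thread (multiply the step by StepMultiplier, advance the start by Offset * Step) can be sanity-checked numerically. The following is only a standalone sketch, not LLVM code: `ToyAddRec`, `rewriteForLane` and `matchesScalarLoop` are hypothetical names, and the toy recurrence uses plain integers instead of SCEV expressions.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for an affine AddRec {Start,+,Step}: its value on scalar
// iteration N is Start + N * Step. This is not the LLVM class.
struct ToyAddRec {
  int64_t Start;
  int64_t Step;
  int64_t at(int64_t N) const { return Start + N * Step; }
};

// Mirrors the rewrite discussed above: the step is multiplied by
// StepMultiplier (the VF) and the start is advanced by Offset * Step
// (Offset selects the lane).
ToyAddRec rewriteForLane(ToyAddRec AR, int64_t StepMultiplier, int64_t Offset) {
  return {AR.Start + Offset * AR.Step, AR.Step * StepMultiplier};
}

// Checks that vector iteration K of the rewritten recurrence produces the
// value of scalar iteration K * VF + Lane of the original recurrence.
bool matchesScalarLoop(ToyAddRec AR, int64_t VF, int64_t Lane,
                       int64_t NumVectorIters) {
  ToyAddRec Lanewise = rewriteForLane(AR, VF, Lane);
  for (int64_t K = 0; K < NumVectorIters; ++K)
    if (Lanewise.at(K) != AR.at(K * VF + Lane))
      return false;
  return true;
}
```

For example, under these assumptions `rewriteForLane({3, 2}, 4, 3)` yields the recurrence with start 9 and step 8, which is what lane 3 of a VF=4 loop over `{3,+,2}` would see.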

}

const SCEV *visit(const SCEV *S) {

if (CannotAnalyze || SE.isLoopInvariant(S, TheLoop))

return S;

return SCEVRewriteVisitor<SCEVAddRecForUniformityRewriter>::visit(S);

}

const SCEV *visitUnknown(const SCEVUnknown *S) {

if (SE.isLoopInvariant(S, TheLoop))

AyalUnsubmitted

Done

const SCEV *visit(const SCEV *S) {

- if (CannotAnalyze)

- return S;

- if (SE.isLoopInvariant(S, TheLoop))

+ if (CannotAnalyze || SE.isLoopInvariant(S, TheLoop))

return S;

fhahnAuthorUnsubmitted

Done

Merged, thanks!

return S;

// The value could vary across iterations.

CannotAnalyze = true;

nikicUnsubmitted

Done

FoundUDiv = true;

- return SE.getUDivExpr(

- SCEVRewriteVisitor<SCEVAddRecRewriter>::visit(S->getOperand(0)),

- SCEVRewriteVisitor<SCEVAddRecRewriter>::visit(S->getOperand(1)));

+ return SCEVRewriteVisitor<SCEVAddRecRewriter>::visitUDiv(S);

}

const SCEV *visitUnknown(const SCEVUnknown *S) {

Maybe?

fhahnAuthorUnsubmitted

Done

Yep that's more compact, thanks!

return S;

}

const SCEV *visitCouldNotCompute(const SCEVCouldNotCompute *S) {

// Could not analyze the expression.

AyalUnsubmitted

Done

nit: return a SCEVCouldNotCompute instead, SCEV's inherent 'CannotAnalyze'?

fhahnAuthorUnsubmitted

Done

Unfortunately that doesn't work without additional work, as the returned value may be used to construct the parent SCEV by the rewriter.

CannotAnalyze = true;

return S;

}

static const SCEV *rewrite(const SCEV *S, ScalarEvolution &SE,

unsigned StepMultiplier, unsigned Offset,

Loop *TheLoop) {

/// Bail out if the expression does not contain an UDiv expression.

/// Uniform values which are not loop invariant require operations to strip

/// out the lowest bits. For now just look for UDivs and use it to avoid

AyalUnsubmitted

Done

nit: can return SCEVCouldNotCompute (or null) at the end if not FoundUDiv.

fhahnAuthorUnsubmitted

Done

Unfortunately that doesn't work without additional work, as the returned value may be used to construct the parent SCEV by the rewriter.

AyalUnsubmitted

Done

ok, this is fine.

I see SCEVInitRewriter, SCEVPostIncRewriter, SCEVShiftRewriter indeed also record a similar SeenLoopVariantSCEVUnknown/Valid which their rewrite() queries after visit() to return SCEVCouldNotCompute. Worth doing the same, wrapping constructor/visit()/canAnalyze()?

fhahnAuthorUnsubmitted

Done

Updated, thanks!
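
The pattern agreed on in this thread is that visit() keeps returning a usable expression (so the parent can still be rebuilt) while a flag records failure, and a static rewrite() wrapper turns the flag into a "could not compute" result at the end. A minimal standalone sketch of that shape follows. It is not LLVM code: `ToyExpr`, `DoublingRewriter` and the helper functions are hypothetical, and a null pointer stands in for SCEVCouldNotCompute.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct ToyExpr;
using ExprPtr = std::shared_ptr<const ToyExpr>;

// Hypothetical miniature expression tree; it only stands in for SCEV.
struct ToyExpr {
  enum Kind { Constant, Unknown, Add };
  Kind K;
  int Value;   // meaningful for Constant only
  ExprPtr LHS; // meaningful for Add only
  ExprPtr RHS; // meaningful for Add only
};

ExprPtr makeConstant(int V) {
  return std::make_shared<ToyExpr>(ToyExpr{ToyExpr::Constant, V, nullptr, nullptr});
}
ExprPtr makeUnknown() {
  return std::make_shared<ToyExpr>(ToyExpr{ToyExpr::Unknown, 0, nullptr, nullptr});
}
ExprPtr makeAdd(ExprPtr L, ExprPtr R) {
  return std::make_shared<ToyExpr>(
      ToyExpr{ToyExpr::Add, 0, std::move(L), std::move(R)});
}

// Doubles every constant. Unknown leaves cannot be analyzed: visit() still
// returns them unchanged so the parent Add can be rebuilt, and only records
// the failure in CannotAnalyze.
class DoublingRewriter {
  bool CannotAnalyze = false;

  ExprPtr visit(const ExprPtr &E) {
    switch (E->K) {
    case ToyExpr::Constant:
      return makeConstant(E->Value * 2);
    case ToyExpr::Add:
      return makeAdd(visit(E->LHS), visit(E->RHS));
    case ToyExpr::Unknown:
      CannotAnalyze = true;
      return E;
    }
    return E;
  }

public:
  // Only entry point: returns nullptr if any sub-expression was not analyzable.
  static ExprPtr rewrite(const ExprPtr &E) {
    DoublingRewriter Rewriter;
    ExprPtr Result = Rewriter.visit(E);
    if (Rewriter.CannotAnalyze)
      return nullptr;
    return Result;
  }
};

// Evaluates Constant/Add trees; Unknown leaves contribute their stored 0.
int evaluate(const ExprPtr &E) {
  if (E->K == ToyExpr::Add)
    return evaluate(E->LHS) + evaluate(E->RHS);
  return E->Value;
}
```

Keeping the constructor and visit() private, as in this sketch, means callers can only observe the checked result of rewrite().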

/// re-writing UDIV-free expressions for other lanes to limit compile time.

if (!SCEVExprContains(S,

AyalUnsubmitted

Done

nit: a bit discrepant for canAnalyze() to mean more than CanAnalyze. Perhaps the latter should be CannotAnalyze.

fhahnAuthorUnsubmitted

Done

Changed CanAnalyze -> CannotAnalyze, thanks!

[](const SCEV *S) { return isa<SCEVUDivExpr>(S); }))

return SE.getCouldNotCompute();

SCEVAddRecForUniformityRewriter Rewriter(SE, StepMultiplier, Offset,

TheLoop);

const SCEV *Result = Rewriter.visit(S);

if (Rewriter.canAnalyze())

return Result;

return SE.getCouldNotCompute();

}

};

} // namespace

bool LoopAccessInfo::isUniform(Value *V, std::optional<ElementCount> VF) const {

auto *SE = PSE->getSE();

// Since we rely on SCEV for uniformity, if the type is not SCEVable, it is

// never considered uniform.

// TODO: Is this really what we want? Even without FP SCEV, we may want some

// trivially loop-invariant FP values to be considered uniform.

if (!SE->isSCEVable(V->getType()))

return false;

- return (SE->isLoopInvariant(SE->getSCEV(V), TheLoop));

+ const SCEV *S = SE->getSCEV(V);

if (SE->isLoopInvariant(S, TheLoop))

return true;

if (!VF || VF->isScalable())

return false;

if (VF->isScalar())

return true;

nikicUnsubmitted

Done

Why do GEPs require special handling?

fhahnAuthorUnsubmitted

Done

There's no need with the latest version, I removed it, thanks!

// Rewrite AddRecs in TheLoop to step by VF and check if the expression for

AyalUnsubmitted

Done

nit: worth setting auto FixedVF = VF->getKnownMinValue();.

fhahnAuthorUnsubmitted

Done

Updated, thanks!

AyalUnsubmitted

Done

Update comment, fold LastLane into an additional VF-1 iteration of the loop, drop the max(1,VF-2).

fhahnAuthorUnsubmitted

Done

Simplified the code, thanks!

// lane 0 matches the expressions for all other lanes.

unsigned FixedVF = VF->getKnownMinValue();

const SCEV *FirstLaneExpr =

SCEVAddRecForUniformityRewriter::rewrite(S, *SE, FixedVF, 0, TheLoop);

if (isa<SCEVCouldNotCompute>(FirstLaneExpr))

return false;

// Make sure the expressions for lanes FixedVF-1..1 match the expression for

// lane 0. We check lanes in reverse order for compile-time, as frequently

AyalUnsubmitted

Done

nit: suffice to set IsUniform to FirstLaneExpr == LastLaneExpr and assert that the latter is also canAnalyze if they're equal.

fhahnAuthorUnsubmitted

Done

Adjusted, thanks!

AyalUnsubmitted

Done

nit: "1 .. FixedVF-1"
nit: "first lane" >> "lane 0"
nit: why "reverse"?

// checking the last lane is sufficient to rule out uniformity.

return all_of(reverse(seq<unsigned>(1, FixedVF)), [&](unsigned I) {

const SCEV *IthLaneExpr =

AyalUnsubmitted

Done

Ahh, could URem with "VF-1" lead to equal expressions for first and last lanes, but not for all lanes in between? E.g., along with FoundUDiv: ((i++)/2)%3 and VF=8.

fhahnAuthorUnsubmitted

Done

In theory there could be such expressions I think, but that particular one isn't handled incorrectly at the moment, possibly due to limitations in SCEV reasoning. Added additional tests in 01efcec6dbd1.

But I checked compile-time impact of checking all expressions and there was no notable increase: https://llvm-compile-time-tracker.com/compare.php?from=9c1d65054818cd2fd9187cd7e7ef703d98b5c5e2&to=825bea6827d6558ab61c8e139f6d7ba4b007a69b&stat=instructions:u

Updated the code to check all lanes in between as well.
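
The concern raised in this thread can be reproduced by evaluating the example directly. The sketch below is an illustration only: it compares concrete values per lane rather than SCEV expressions, so it says nothing about what SCEV itself would conclude, and the names `firstAndLastLaneAgree`, `allLanesAgree` and `divThenRem` are hypothetical. It assumes an induction variable starting at 0 with step 1.

```cpp
#include <cassert>
#include <cstdint>

using IndexFn = uint64_t (*)(uint64_t);

// True if lane 0 and lane VF-1 compute the same value on every one of the
// first NumVectorIters vector iterations.
bool firstAndLastLaneAgree(IndexFn F, uint64_t VF, uint64_t NumVectorIters) {
  for (uint64_t K = 0; K < NumVectorIters; ++K)
    if (F(K * VF) != F(K * VF + VF - 1))
      return false;
  return true;
}

// True if every lane 1..VF-1 computes the same value as lane 0 on every one
// of the first NumVectorIters vector iterations.
bool allLanesAgree(IndexFn F, uint64_t VF, uint64_t NumVectorIters) {
  for (uint64_t K = 0; K < NumVectorIters; ++K)
    for (uint64_t Lane = 1; Lane < VF; ++Lane)
      if (F(K * VF + Lane) != F(K * VF))
        return false;
  return true;
}

// The expression from the comment above: (i / 2) % 3.
uint64_t divThenRem(uint64_t I) { return (I / 2) % 3; }

// A genuinely uniform-per-8 expression for comparison: i / 8.
uint64_t divBy8(uint64_t I) { return I / 8; }
```

With VF=8, lanes 0 and 7 of `divThenRem` see indices 8k and 8k+7, i.e. (4k)%3 and (4k+3)%3, which are always equal, while lane 2 already differs from lane 0 in the first vector iteration. Checking only the first and last lane would therefore not be sufficient for this value.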

SCEVAddRecForUniformityRewriter::rewrite(S, *SE, FixedVF, I, TheLoop);

return FirstLaneExpr == IthLaneExpr;

});

}

vporpoUnsubmitted

Done

Can't think of any case where this would trigger, but if there is such a corner case we may still miss it if it is only checked in the debug build, silently causing miscompilations. So perhaps we should not guard it under NDEBUG and use an llvm_unreachable instead of an assertion so that it is also checked in release builds? Wdyt?

fhahnAuthorUnsubmitted

Done

IIUC this should be an invariant, but I added the assertion to give us a chance to catch any violations and investigate. I think having this check only when assertions are enabled is in line with how assertions are used widely in the LLVM codebase (for better or worse). But if people prefer to err on the side of caution, we can always run the checks, at the cost of extra compile-time overhead.

vporpoUnsubmitted

Done

nit: Use seq<> ? for (auto I : seq<unsigned>(1, VF->getKnownMinValue()-1))

fhahnAuthorUnsubmitted

Done

Thanks, updated!

/// Find the operand of the GEP that should be checked for consecutive

AyalUnsubmitted

Done

nit: suffice to check that SCEVs are equal?

fhahnAuthorUnsubmitted

Done

Simplified, thanks!

/// stores. This ignores trailing indices that have no effect on the final

/// pointer.

static unsigned getGEPInductionOperand(const GetElementPtrInst *Gep) {

const DataLayout &DL = Gep->getModule()->getDataLayout();

unsigned LastOperand = Gep->getNumOperands() - 1;

nikicUnsubmitted

Done

IthLaneRewriter.canAnalyze() &&

- SE->isKnownPredicate(CmpInst::ICMP_EQ, FirstLaneExpr, IthLaneExpr) &&

+ FirstLaneExpr == IthLaneExpr &&

"first and last lane are equal, but not all lanes in between");

In line with the other comparison?

fhahnAuthorUnsubmitted

Done

Updated! Originally the other check was also using SE->isKnownPredicate but it was increasing compile-time while not being needed for the first set of motivating cases.

TypeSize GEPAllocSize = DL.getTypeAllocSize(Gep->getResultElementType());

// Walk backwards and try to peel off zeros.

while (LastOperand > 1 && match(Gep->getOperand(LastOperand), m_Zero())) {

// Find the type we're currently indexing into.

gep_type_iterator GEPTI = gep_type_begin(Gep);

std::advance(GEPTI, LastOperand - 2);

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

int LoopVectorizationLegality::isConsecutivePtr(Type *AccessTy,
    bool CanAddPredicate = !OptForSize;
    int Stride = getPtrStride(PSE, AccessTy, Ptr, TheLoop, Strides,
                              CanAddPredicate, false).value_or(0);
    if (Stride == 1 || Stride == -1)
      return Stride;
    return 0;
  }

- bool LoopVectorizationLegality::isUniform(Value *V) const {
-   return LAI->isUniform(V);
+ bool LoopVectorizationLegality::isUniform(
+     Value *V, std::optional<ElementCount> VF) const {
+   return LAI->isUniform(V, VF);
  }

- bool LoopVectorizationLegality::isUniformMemOp(Instruction &I) const {
+ bool LoopVectorizationLegality::isUniformMemOp(
+     Instruction &I, std::optional<ElementCount> VF) const {
    Value *Ptr = getLoadStorePointerOperand(&I);
    if (!Ptr)
      return false;
    // Note: There's nothing inherent which prevents predicated loads and
    // stores from being uniform. The current lowering simply doesn't handle
    // it; in particular, the cost model distinguishes scatter/gather from
    // scalar w/predication, and we currently rely on the scalar path.
-   return isUniform(Ptr) && !blockNeedsPredication(I.getParent());
+   return isUniform(Ptr, VF) && !blockNeedsPredication(I.getParent());
  }

  bool LoopVectorizationLegality::canVectorizeOuterLoop() {
    assert(!TheLoop->isInnermost() && "We are not vectorizing an outer loop.");
    // Store the result and return it at the end instead of exiting early, in case
    // allowExtraAnalysis is used to report multiple reasons for not vectorizing.
    bool Result = true;
    bool DoExtraAnalysis = ORE->allowExtraAnalysis(DEBUG_TYPE);

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
    // uniform.
    auto *Cmp = dyn_cast<Instruction>(Latch->getTerminator()->getOperand(0));
    if (Cmp && TheLoop->contains(Cmp) && Cmp->hasOneUse())
      addToWorklistIfAllowed(Cmp);

    // Return true if all lanes perform the same memory operation, and we can
    // thus chose to execute only one.
    auto isUniformMemOpUse = [&](Instruction *I) {
-     if (!Legal->isUniformMemOp(*I))
+     if (!Legal->isUniformMemOp(*I, VF))
        return false;
      if (isa<LoadInst>(I))
        // Loading the same address always produces the same result - at least
        // assuming aliasing and ordering which have already been checked.
        return true;
      // Storing the same value on every iteration.
      return TheLoop->isLoopInvariant(cast<StoreInst>(I)->getValueOperand());
    };

if (Reverse)
      Cost += TTI.getShuffleCost(TargetTransformInfo::SK_Reverse, VectorTy,
                                 std::nullopt, CostKind, 0);
    return Cost;
  }

  InstructionCost
  LoopVectorizationCostModel::getUniformMemOpCost(Instruction *I,
                                                  ElementCount VF) {
-   assert(Legal->isUniformMemOp(*I));
+   assert(Legal->isUniformMemOp(*I, VF));

    Type *ValTy = getLoadStoreType(I);
    auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));
    const Align Alignment = getLoadStoreAlignment(I);
    unsigned AS = getLoadStoreAddressSpace(I);
    enum TTI::TargetCostKind CostKind = TTI::TCK_RecipThroughput;
    if (isa<LoadInst>(I)) {
      return TTI.getAddressComputationCost(ValTy) +

for (Instruction &I : *BB) {

    // TODO: We should generate better code and update the cost model for
    // predicated uniform stores. Today they are treated as any other
    // predicated store (see added test cases in
    // invariant-store-vectorization.ll).
    if (isa<StoreInst>(&I) && isScalarWithPredication(&I, VF))
      NumPredStores++;

-   if (Legal->isUniformMemOp(I)) {
+   if (Legal->isUniformMemOp(I, VF)) {
      auto isLegalToScalarize = [&]() {
        if (!VF.isScalable())
          // Scalarization of fixed length vectors "just works".
          return true;

        // We have dedicated lowering for unpredicated uniform loads and
        // stores. Note that even with tail folding we know that at least
        // one lane is active (i.e. generalized predication is not possible

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll

loopexit:
ret void		ret void
}		}


declare void @init(ptr)		declare void @init(ptr)

;; Count the number of bits set in a bit vector -- key point of relevance is		;; Count the number of bits set in a bit vector -- key point of relevance is
;; that the byte load is uniform across 8 iterations at a time.		;; that the byte load is uniform across 8 iterations at a time.
		;; TODO: At the moment, this is vectorized with VF=4 and UF=4. The load is
		;; considered uniform across VF=4, but should be considered uniform across
		;; VF=8/VF=4,UF=2.
define i32 @test_count_bits(ptr %test_base) {		define i32 @test_count_bits(ptr %test_base) {
; CHECK-LABEL: @test_count_bits(		; CHECK-LABEL: @test_count_bits(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [4096 x i32], align 4		; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [4096 x i32], align 4
; CHECK-NEXT: call void @init(ptr [[ALLOCA]])		; CHECK-NEXT: call void @init(ptr [[ALLOCA]])
; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP50:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[VEC_PHI:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP36:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_PHI2:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP51:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[VEC_PHI4:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP37:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_PHI5:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP38:%.*]], [[VECTOR_BODY]] ]
		; CHECK-NEXT: [[VEC_PHI6:%.*]] = phi <4 x i32> [ zeroinitializer, [[VECTOR_PH]] ], [ [[TMP39:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>		; CHECK-NEXT: [[STEP_ADD:%.*]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-NEXT: [[STEP_ADD1:%.*]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
		; CHECK-NEXT: [[STEP_ADD2:%.*]] = add <4 x i64> [[STEP_ADD1]], <i64 4, i64 4, i64 4, i64 4>
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 1		; CHECK-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 4
; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 2		; CHECK-NEXT: [[TMP2:%.*]] = add i64 [[INDEX]], 8
; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 3		; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[INDEX]], 12
; CHECK-NEXT: [[TMP4:%.*]] = add i64 [[INDEX]], 4		; CHECK-NEXT: [[TMP4:%.*]] = udiv i64 [[TMP0]], 8
; CHECK-NEXT: [[TMP5:%.*]] = add i64 [[INDEX]], 5		; CHECK-NEXT: [[TMP5:%.*]] = udiv i64 [[TMP1]], 8
; CHECK-NEXT: [[TMP6:%.*]] = add i64 [[INDEX]], 6		; CHECK-NEXT: [[TMP6:%.*]] = udiv i64 [[TMP2]], 8
; CHECK-NEXT: [[TMP7:%.*]] = add i64 [[INDEX]], 7		; CHECK-NEXT: [[TMP7:%.*]] = udiv i64 [[TMP3]], 8
; CHECK-NEXT: [[TMP8:%.*]] = udiv i64 [[TMP0]], 8		; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE:%.*]], i64 [[TMP4]]
; CHECK-NEXT: [[TMP9:%.*]] = udiv i64 [[TMP1]], 8		; CHECK-NEXT: [[TMP9:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[TMP5]]
; CHECK-NEXT: [[TMP10:%.*]] = udiv i64 [[TMP2]], 8		; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[TMP6]]
; CHECK-NEXT: [[TMP11:%.*]] = udiv i64 [[TMP3]], 8		; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[TMP7]]
; CHECK-NEXT: [[TMP12:%.*]] = udiv i64 [[TMP4]], 8		; CHECK-NEXT: [[TMP12:%.*]] = load i8, ptr [[TMP8]], align 1
; CHECK-NEXT: [[TMP13:%.*]] = udiv i64 [[TMP5]], 8		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i8> poison, i8 [[TMP12]], i64 0
; CHECK-NEXT: [[TMP14:%.*]] = udiv i64 [[TMP6]], 8		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT]], <4 x i8> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP15:%.*]] = udiv i64 [[TMP7]], 8		; CHECK-NEXT: [[TMP13:%.*]] = load i8, ptr [[TMP9]], align 1
; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE:%.*]], i64 [[TMP8]]		; CHECK-NEXT: [[BROADCAST_SPLATINSERT7:%.*]] = insertelement <4 x i8> poison, i8 [[TMP13]], i64 0
; CHECK-NEXT: [[TMP17:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[TMP9]]		; CHECK-NEXT: [[BROADCAST_SPLAT8:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT7]], <4 x i8> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[TMP10]]		; CHECK-NEXT: [[TMP14:%.*]] = load i8, ptr [[TMP10]], align 1
; CHECK-NEXT: [[TMP19:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[TMP11]]		; CHECK-NEXT: [[BROADCAST_SPLATINSERT9:%.*]] = insertelement <4 x i8> poison, i8 [[TMP14]], i64 0
; CHECK-NEXT: [[TMP20:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[TMP12]]		; CHECK-NEXT: [[BROADCAST_SPLAT10:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT9]], <4 x i8> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP21:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[TMP13]]		; CHECK-NEXT: [[TMP15:%.*]] = load i8, ptr [[TMP11]], align 1
; CHECK-NEXT: [[TMP22:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[TMP14]]		; CHECK-NEXT: [[BROADCAST_SPLATINSERT11:%.*]] = insertelement <4 x i8> poison, i8 [[TMP15]], i64 0
; CHECK-NEXT: [[TMP23:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[TMP15]]		; CHECK-NEXT: [[BROADCAST_SPLAT12:%.*]] = shufflevector <4 x i8> [[BROADCAST_SPLATINSERT11]], <4 x i8> poison, <4 x i32> zeroinitializer
; CHECK-NEXT: [[TMP24:%.*]] = load i8, ptr [[TMP16]], align 1		; CHECK-NEXT: [[TMP16:%.*]] = urem <4 x i64> [[VEC_IND]], <i64 8, i64 8, i64 8, i64 8>
; CHECK-NEXT: [[TMP25:%.*]] = load i8, ptr [[TMP17]], align 1		; CHECK-NEXT: [[TMP17:%.*]] = urem <4 x i64> [[STEP_ADD]], <i64 8, i64 8, i64 8, i64 8>
; CHECK-NEXT: [[TMP26:%.*]] = load i8, ptr [[TMP18]], align 1		; CHECK-NEXT: [[TMP18:%.*]] = urem <4 x i64> [[STEP_ADD1]], <i64 8, i64 8, i64 8, i64 8>
; CHECK-NEXT: [[TMP27:%.*]] = load i8, ptr [[TMP19]], align 1		; CHECK-NEXT: [[TMP19:%.*]] = urem <4 x i64> [[STEP_ADD2]], <i64 8, i64 8, i64 8, i64 8>
; CHECK-NEXT: [[TMP28:%.*]] = insertelement <4 x i8> poison, i8 [[TMP24]], i32 0		; CHECK-NEXT: [[TMP20:%.*]] = trunc <4 x i64> [[TMP16]] to <4 x i8>
; CHECK-NEXT: [[TMP29:%.*]] = insertelement <4 x i8> [[TMP28]], i8 [[TMP25]], i32 1		; CHECK-NEXT: [[TMP21:%.*]] = trunc <4 x i64> [[TMP17]] to <4 x i8>
; CHECK-NEXT: [[TMP30:%.*]] = insertelement <4 x i8> [[TMP29]], i8 [[TMP26]], i32 2		; CHECK-NEXT: [[TMP22:%.*]] = trunc <4 x i64> [[TMP18]] to <4 x i8>
; CHECK-NEXT: [[TMP31:%.*]] = insertelement <4 x i8> [[TMP30]], i8 [[TMP27]], i32 3		; CHECK-NEXT: [[TMP23:%.*]] = trunc <4 x i64> [[TMP19]] to <4 x i8>
; CHECK-NEXT: [[TMP32:%.*]] = load i8, ptr [[TMP20]], align 1		; CHECK-NEXT: [[TMP24:%.*]] = lshr <4 x i8> [[BROADCAST_SPLAT]], [[TMP20]]
; CHECK-NEXT: [[TMP33:%.*]] = load i8, ptr [[TMP21]], align 1		; CHECK-NEXT: [[TMP25:%.*]] = lshr <4 x i8> [[BROADCAST_SPLAT8]], [[TMP21]]
; CHECK-NEXT: [[TMP34:%.*]] = load i8, ptr [[TMP22]], align 1		; CHECK-NEXT: [[TMP26:%.*]] = lshr <4 x i8> [[BROADCAST_SPLAT10]], [[TMP22]]
; CHECK-NEXT: [[TMP35:%.*]] = load i8, ptr [[TMP23]], align 1		; CHECK-NEXT: [[TMP27:%.*]] = lshr <4 x i8> [[BROADCAST_SPLAT12]], [[TMP23]]
; CHECK-NEXT: [[TMP36:%.*]] = insertelement <4 x i8> poison, i8 [[TMP32]], i32 0		; CHECK-NEXT: [[TMP28:%.*]] = and <4 x i8> [[TMP24]], <i8 1, i8 1, i8 1, i8 1>
; CHECK-NEXT: [[TMP37:%.*]] = insertelement <4 x i8> [[TMP36]], i8 [[TMP33]], i32 1		; CHECK-NEXT: [[TMP29:%.*]] = and <4 x i8> [[TMP25]], <i8 1, i8 1, i8 1, i8 1>
; CHECK-NEXT: [[TMP38:%.*]] = insertelement <4 x i8> [[TMP37]], i8 [[TMP34]], i32 2		; CHECK-NEXT: [[TMP30:%.*]] = and <4 x i8> [[TMP26]], <i8 1, i8 1, i8 1, i8 1>
; CHECK-NEXT: [[TMP39:%.*]] = insertelement <4 x i8> [[TMP38]], i8 [[TMP35]], i32 3		; CHECK-NEXT: [[TMP31:%.*]] = and <4 x i8> [[TMP27]], <i8 1, i8 1, i8 1, i8 1>
; CHECK-NEXT: [[TMP40:%.*]] = urem <4 x i64> [[VEC_IND]], <i64 8, i64 8, i64 8, i64 8>		; CHECK-NEXT: [[TMP32:%.*]] = zext <4 x i8> [[TMP28]] to <4 x i32>
; CHECK-NEXT: [[TMP41:%.*]] = urem <4 x i64> [[STEP_ADD]], <i64 8, i64 8, i64 8, i64 8>		; CHECK-NEXT: [[TMP33:%.*]] = zext <4 x i8> [[TMP29]] to <4 x i32>
; CHECK-NEXT: [[TMP42:%.*]] = trunc <4 x i64> [[TMP40]] to <4 x i8>		; CHECK-NEXT: [[TMP34:%.*]] = zext <4 x i8> [[TMP30]] to <4 x i32>
; CHECK-NEXT: [[TMP43:%.*]] = trunc <4 x i64> [[TMP41]] to <4 x i8>		; CHECK-NEXT: [[TMP35:%.*]] = zext <4 x i8> [[TMP31]] to <4 x i32>
; CHECK-NEXT: [[TMP44:%.*]] = lshr <4 x i8> [[TMP31]], [[TMP42]]		; CHECK-NEXT: [[TMP36]] = add <4 x i32> [[VEC_PHI]], [[TMP32]]
; CHECK-NEXT: [[TMP45:%.*]] = lshr <4 x i8> [[TMP39]], [[TMP43]]		; CHECK-NEXT: [[TMP37]] = add <4 x i32> [[VEC_PHI4]], [[TMP33]]
; CHECK-NEXT: [[TMP46:%.*]] = and <4 x i8> [[TMP44]], <i8 1, i8 1, i8 1, i8 1>		; CHECK-NEXT: [[TMP38]] = add <4 x i32> [[VEC_PHI5]], [[TMP34]]
; CHECK-NEXT: [[TMP47:%.*]] = and <4 x i8> [[TMP45]], <i8 1, i8 1, i8 1, i8 1>		; CHECK-NEXT: [[TMP39]] = add <4 x i32> [[VEC_PHI6]], [[TMP35]]
; CHECK-NEXT: [[TMP48:%.*]] = zext <4 x i8> [[TMP46]] to <4 x i32>		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
		Ayal: Note: VF=4 is mandated; the previous UF=2 decision with 8 loads is now UF=4 with 4 (uniform) loads and broadcasts. The load from test_base[i/8] could further fold two 'parts' of VF=4 together, as it is uniform across VF=8 / VF=4,UF=2. Worth leaving behind some assume, if not folding directly? I.e., record the largest VF for which uniformity was detected, even if a smaller VF is selected. Worth optimizing the analysis across VFs, i.e., if a value is known not to be uniform for VF, avoid checking for 2VF? OTOH, if a value is known to be uniform for VF, check only two SCEVs for 2VF?
		fhahn (author):
		> The load from test_base[i/8] could further fold two 'parts' of VF=4 together, as it is uniform across VF=8 / VF=4,UF=2.
		I think that would be good as a follow-up.
		> Worth optimizing the analysis across VF's, i.e., if a value is known not to be uniform for VF avoid checking for 2VF
		I was thinking about evaluating something like that as a follow-up optimization. WDYT?
		Ayal: Sure, TODOs can be added to record potential follow-ups. Note also that uniformity could be improved beyond comparing equal SCEV expressions by using Divergence Analysis, which also propagates uniformity through uniform branches.
		fhahn (author): Added a TODO, thanks!
		Ayal: Thanks! Also following up with D151658.
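		The relationship between uniformity at VF and at 2VF raised in this thread can be illustrated with a brute-force check. The sketch below is not the SCEV-based analysis from this patch: it assumes an induction variable that starts at 0 with step 1, and the helper names (`isUniformForVF`, `largestUniformVF`, `checkLargestUniformVF`) are hypothetical, not LLVM API. It enumerates the lanes of each vector iteration and records the largest power-of-two VF for which an index expression is uniform, stopping at the first VF that fails, since a group of 2VF lanes contains two groups of VF lanes and cannot be uniform if either half is not.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Index expression evaluated on the scalar induction variable.
using IndexFn = std::function<uint64_t(uint64_t)>;

// Brute-force check: in every full vector iteration within [0, TripCount),
// all VF lanes must compute the same index as lane 0.
// Assumption: iv starts at 0 and steps by 1 (hypothetical helper).
bool isUniformForVF(const IndexFn &Index, unsigned VF, uint64_t TripCount) {
  for (uint64_t Base = 0; Base + VF <= TripCount; Base += VF) {
    const uint64_t Lane0 = Index(Base);
    for (unsigned Lane = 1; Lane < VF; ++Lane)
      if (Index(Base + Lane) != Lane0)
        return false;
  }
  return true;
}

// Largest power-of-two VF <= MaxVF for which Index is uniform; 1 if none.
unsigned largestUniformVF(const IndexFn &Index, unsigned MaxVF,
                          uint64_t TripCount) {
  unsigned Best = 1;
  for (unsigned VF = 2; VF <= MaxVF; VF *= 2) {
    if (!isUniformForVF(Index, VF, TripCount))
      break; // Not uniform for VF, so it cannot be uniform for 2 * VF.
    Best = VF;
  }
  return Best;
}

// Example: the index i/8 used by the load from test_base[i/8].
void checkLargestUniformVF() {
  IndexFn DivBy8 = [](uint64_t IV) { return IV / 8; };
  assert(isUniformForVF(DivBy8, 4, 4096));
  assert(largestUniformVF(DivBy8, 16, 4096) == 8);
}
```

		For the index i/8 over 4096 iterations, the sketch reports 8 as the largest uniform VF when capped at 16, which matches the observation that the load is uniform across VF=8 and therefore across VF=4 with UF=2.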
; CHECK-NEXT: [[TMP49:%.*]] = zext <4 x i8> [[TMP47]] to <4 x i32>		; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
; CHECK-NEXT: [[TMP50]] = add <4 x i32> [[VEC_PHI]], [[TMP48]]		; CHECK-NEXT: [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
; CHECK-NEXT: [[TMP51]] = add <4 x i32> [[VEC_PHI2]], [[TMP49]]		; CHECK-NEXT: br i1 [[TMP40]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD]], <i64 4, i64 4, i64 4, i64 4>
; CHECK-NEXT: [[TMP52:%.*]] = icmp eq i64 [[INDEX_NEXT]], 4096
; CHECK-NEXT: br i1 [[TMP52]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP19:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP51]], [[TMP50]]		; CHECK-NEXT: [[BIN_RDX:%.*]] = add <4 x i32> [[TMP37]], [[TMP36]]
; CHECK-NEXT: [[TMP53:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX]])		; CHECK-NEXT: [[BIN_RDX13:%.*]] = add <4 x i32> [[TMP38]], [[BIN_RDX]]
		; CHECK-NEXT: [[BIN_RDX14:%.*]] = add <4 x i32> [[TMP39]], [[BIN_RDX13]]
		; CHECK-NEXT: [[TMP41:%.*]] = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> [[BIN_RDX14]])
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 4096, 4096
; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]		; CHECK-NEXT: br i1 [[CMP_N]], label [[LOOP_EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 4096, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP53]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[BC_MERGE_RDX:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[TMP41]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]		; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]], [[LOOP]] ]		; CHECK-NEXT: [[ACCUM:%.*]] = phi i32 [ [[BC_MERGE_RDX]], [[SCALAR_PH]] ], [ [[ACCUM_NEXT:%.*]], [[LOOP]] ]
; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1		; CHECK-NEXT: [[IV_NEXT]] = add i64 [[IV]], 1
; CHECK-NEXT: [[BYTE:%.*]] = udiv i64 [[IV]], 8		; CHECK-NEXT: [[BYTE:%.*]] = udiv i64 [[IV]], 8
; CHECK-NEXT: [[TEST_ADDR:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[BYTE]]		; CHECK-NEXT: [[TEST_ADDR:%.*]] = getelementptr inbounds i8, ptr [[TEST_BASE]], i64 [[BYTE]]
; CHECK-NEXT: [[EARLYCND:%.*]] = load i8, ptr [[TEST_ADDR]], align 1		; CHECK-NEXT: [[EARLYCND:%.*]] = load i8, ptr [[TEST_ADDR]], align 1
; CHECK-NEXT: [[BIT:%.*]] = urem i64 [[IV]], 8		; CHECK-NEXT: [[BIT:%.*]] = urem i64 [[IV]], 8
; CHECK-NEXT: [[BIT_TRUNC:%.*]] = trunc i64 [[BIT]] to i8		; CHECK-NEXT: [[BIT_TRUNC:%.*]] = trunc i64 [[BIT]] to i8
; CHECK-NEXT: [[MASK:%.*]] = lshr i8 [[EARLYCND]], [[BIT_TRUNC]]		; CHECK-NEXT: [[MASK:%.*]] = lshr i8 [[EARLYCND]], [[BIT_TRUNC]]
; CHECK-NEXT: [[TEST:%.*]] = and i8 [[MASK]], 1		; CHECK-NEXT: [[TEST:%.*]] = and i8 [[MASK]], 1
; CHECK-NEXT: [[VAL:%.*]] = zext i8 [[TEST]] to i32		; CHECK-NEXT: [[VAL:%.*]] = zext i8 [[TEST]] to i32
; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[VAL]]		; CHECK-NEXT: [[ACCUM_NEXT]] = add i32 [[ACCUM]], [[VAL]]
; CHECK-NEXT: [[EXIT:%.*]] = icmp ugt i64 [[IV]], 4094		; CHECK-NEXT: [[EXIT:%.*]] = icmp ugt i64 [[IV]], 4094
; CHECK-NEXT: br i1 [[EXIT]], label [[LOOP_EXIT]], label [[LOOP]], !llvm.loop [[LOOP20:![0-9]+]]		; CHECK-NEXT: br i1 [[EXIT]], label [[LOOP_EXIT]], label [[LOOP]], !llvm.loop [[LOOP20:![0-9]+]]
; CHECK: loop_exit:		; CHECK: loop_exit:
; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[LOOP]] ], [ [[TMP53]], [[MIDDLE_BLOCK]] ]		; CHECK-NEXT: [[ACCUM_NEXT_LCSSA:%.*]] = phi i32 [ [[ACCUM_NEXT]], [[LOOP]] ], [ [[TMP41]], [[MIDDLE_BLOCK]] ]
; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]		; CHECK-NEXT: ret i32 [[ACCUM_NEXT_LCSSA]]
;		;
entry:		entry:
%alloca = alloca [4096 x i32]		%alloca = alloca [4096 x i32]
call void @init(ptr %alloca)		call void @init(ptr %alloca)
br label %loop		br label %loop
loop:		loop:
%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]		%iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
[... 143 lines not shown ...]

llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll

	[... 58 lines not shown ...]
	; CHECK-NEXT: br label [[LOOP:%.*]]			; CHECK-NEXT: br label [[LOOP:%.*]]
	; CHECK: loop:			; CHECK: loop:
	; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]			; CHECK-NEXT: [[IV:%.*]] = phi i32 [ [[IV_NEXT:%.*]], [[LOOP]] ], [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ]
	; CHECK-NEXT: [[CONV6_US_US_US:%.*]] = zext i1 false to i32			; CHECK-NEXT: [[CONV6_US_US_US:%.*]] = zext i1 false to i32
	; CHECK-NEXT: store i32 [[CONV6_US_US_US]], ptr @f.e, align 1			; CHECK-NEXT: store i32 [[CONV6_US_US_US]], ptr @f.e, align 1
	; CHECK-NEXT: store i8 10, ptr [[TMP1]], align 1			; CHECK-NEXT: store i8 10, ptr [[TMP1]], align 1
	; CHECK-NEXT: [[IV_NEXT]] = add nsw i32 [[IV]], 1			; CHECK-NEXT: [[IV_NEXT]] = add nsw i32 [[IV]], 1
	; CHECK-NEXT: [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 500			; CHECK-NEXT: [[EC:%.*]] = icmp eq i32 [[IV_NEXT]], 500
	; CHECK-NEXT: br i1 [[EC]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP7:![0-9]+]]			; CHECK-NEXT: br i1 [[EC]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP8:![0-9]+]]
	; CHECK: exit:			; CHECK: exit:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	br label %outer.header			br label %outer.header

	outer.header: ; preds = %cleanup, %entry			outer.header: ; preds = %cleanup, %entry
	%0 = load ptr, ptr @d, align 1			%0 = load ptr, ptr @d, align 1
	[... 33 lines not shown ...]

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
Ayal: Line dropped intentionally?
fhahn (author): No, that was an accident; added back, thanks!
; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=2 %s -S | FileCheck %s		; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=2 %s -S | FileCheck %s

; Tests for checking uniformity within a VF.		; Tests for checking uniformity within a VF.

; for (iv = 0 ; ; iv += 1) B[iv] = A[iv/1] + 42;		; for (iv = 0 ; ; iv += 1) B[iv] = A[iv/1] + 42;
define void @ld_div1_step1_start0_ind1(ptr noalias %A, ptr noalias %B) {		define void @ld_div1_step1_start0_ind1(ptr noalias %A, ptr noalias %B) {
; CHECK-LABEL: define void @ld_div1_step1_start0_ind1		; CHECK-LABEL: define void @ld_div1_step1_start0_ind1
; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
[... 48 lines not shown; resuming in loop: ...]
%iv_next = add nsw i64 %iv, 1		%iv_next = add nsw i64 %iv, 1
%cond = icmp eq i64 %iv_next, 1000		%cond = icmp eq i64 %iv_next, 1000
br i1 %cond, label %exit, label %loop		br i1 %cond, label %exit, label %loop
exit:		exit:
ret void		ret void
}		}

; for (iv = 0 ; ; iv += 1) B[iv] = A[iv/2] + 42;		; for (iv = 0 ; ; iv += 1) B[iv] = A[iv/2] + 42;
		; A[iv/2] is uniform for VF=2.
define void @ld_div2_step1_start0_ind1(ptr noalias %A, ptr noalias %B) {		define void @ld_div2_step1_start0_ind1(ptr noalias %A, ptr noalias %B) {
; CHECK-LABEL: define void @ld_div2_step1_start0_ind1		; CHECK-LABEL: define void @ld_div2_step1_start0_ind1
; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; CHECK-NEXT: [[TMP1:%.*]] = udiv <2 x i64> [[VEC_IND]], <i64 2, i64 2>		; CHECK-NEXT: [[TMP1:%.*]] = udiv i64 [[TMP0]], 2
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[TMP1]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[TMP2]], align 8
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i64 0
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP4]]		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP6:%.*]] = load i64, ptr [[TMP3]], align 8		; CHECK-NEXT: [[TMP4:%.*]] = add nsw <2 x i64> [[BROADCAST_SPLAT]], <i64 42, i64 42>
; CHECK-NEXT: [[TMP7:%.*]] = load i64, ptr [[TMP5]], align 8		; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> poison, i64 [[TMP6]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP5]], i32 0
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> [[TMP8]], i64 [[TMP7]], i32 1		; CHECK-NEXT: store <2 x i64> [[TMP4]], ptr [[TMP6]], align 8
; CHECK-NEXT: [[TMP10:%.*]] = add nsw <2 x i64> [[TMP9]], <i64 42, i64 42>
; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[TMP11]], i32 0
; CHECK-NEXT: store <2 x i64> [[TMP10]], ptr [[TMP12]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>		; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000		; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]		; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]		; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
[... 1,118 lines not shown ...]

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll

[... 57 lines not shown; resuming in loop: ...]
%iv_next = add nsw i64 %iv, 1		%iv_next = add nsw i64 %iv, 1
%cond = icmp eq i64 %iv_next, 1000		%cond = icmp eq i64 %iv_next, 1000
br i1 %cond, label %exit, label %loop		br i1 %cond, label %exit, label %loop
exit:		exit:
ret void		ret void
}		}

; for (iv = 0 ; ; iv += 1) B[iv] = A[iv&-2] + 42;		; for (iv = 0 ; ; iv += 1) B[iv] = A[iv&-2] + 42;
		; A[iv&-2] is uniform for VF=2.
define void @ld_and_neg2_step1_start0_ind1(ptr noalias %A, ptr noalias %B) {		define void @ld_and_neg2_step1_start0_ind1(ptr noalias %A, ptr noalias %B) {
; CHECK-LABEL: define void @ld_and_neg2_step1_start0_ind1		; CHECK-LABEL: define void @ld_and_neg2_step1_start0_ind1
; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; CHECK-NEXT: [[TMP1:%.*]] = and <2 x i64> [[VEC_IND]], <i64 -2, i64 -2>		; CHECK-NEXT: [[TMP1:%.*]] = and i64 [[TMP0]], -2
; CHECK-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[TMP1]], i32 0		; CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]		; CHECK-NEXT: [[TMP3:%.*]] = load i64, ptr [[TMP2]], align 8
		Ayal: Note: this load from A[iv & -2] is now recognized as uniform across VF=2.
		fhahn (author): Added as a comment, thanks!
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i64 0
; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP4]]		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP6:%.*]] = load i64, ptr [[TMP3]], align 8		; CHECK-NEXT: [[TMP4:%.*]] = add nsw <2 x i64> [[BROADCAST_SPLAT]], <i64 42, i64 42>
; CHECK-NEXT: [[TMP7:%.*]] = load i64, ptr [[TMP5]], align 8		; CHECK-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> poison, i64 [[TMP6]], i32 0		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP5]], i32 0
; CHECK-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> [[TMP8]], i64 [[TMP7]], i32 1		; CHECK-NEXT: store <2 x i64> [[TMP4]], ptr [[TMP6]], align 8
; CHECK-NEXT: [[TMP10:%.*]] = add nsw <2 x i64> [[TMP9]], <i64 42, i64 42>
; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[TMP11]], i32 0
; CHECK-NEXT: store <2 x i64> [[TMP10]], ptr [[TMP12]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>		; CHECK-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
; CHECK-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000		; CHECK-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]		; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]		; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
[... 655 lines not shown ...]

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=8 %s -S | FileCheck %s		; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=8 %s -S | FileCheck %s

; Tests for checking uniformity within a VF.		; Tests for checking uniformity within a VF.

; for (iv = 0 ; ; iv += 1) B[iv] = A[(iv/2)%3];		; for (iv = 0 ; ; iv += 1) B[iv] = A[(iv/2)%3];
		; A[(iv/2)%3] is not uniform for VF=8.
		Ayal: Note: the load from A[(iv/2)%3] is rightfully not recognized as uniform for VF=8.
		fhahn (author): Added comment, thanks!
define void @ld_div2_urem3_1(ptr noalias %A, ptr noalias %B) {		define void @ld_div2_urem3_1(ptr noalias %A, ptr noalias %B) {
; CHECK-LABEL: define void @ld_div2_urem3_1		; CHECK-LABEL: define void @ld_div2_urem3_1
; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
[... 261 lines not shown; resuming in loop: ...]
%iv_next = add nsw i64 %iv, 1		%iv_next = add nsw i64 %iv, 1
%cond = icmp eq i64 %iv_next, 1000		%cond = icmp eq i64 %iv_next, 1000
br i1 %cond, label %exit, label %loop		br i1 %cond, label %exit, label %loop
exit:		exit:
ret void		ret void
}		}

; for (iv = 0 ; ; iv += 1) B[iv] = A[(iv/8)%3];		; for (iv = 0 ; ; iv += 1) B[iv] = A[(iv/8)%3];
		; A[(iv/8)%3] is uniform for VF=8.
define void @ld_div8_urem3(ptr noalias %A, ptr noalias %B) {		define void @ld_div8_urem3(ptr noalias %A, ptr noalias %B) {
; CHECK-LABEL: define void @ld_div8_urem3		; CHECK-LABEL: define void @ld_div8_urem3
; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; CHECK-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; CHECK: vector.ph:		; CHECK: vector.ph:
; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]		; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
; CHECK: vector.body:		; CHECK: vector.body:
; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; CHECK-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[VEC_IND:%.*]] = phi <8 x i64> [ <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; CHECK-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; CHECK-NEXT: [[TMP1:%.*]] = udiv <8 x i64> [[VEC_IND]], <i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8>		; CHECK-NEXT: [[TMP1:%.*]] = udiv i64 [[TMP0]], 8
; CHECK-NEXT: [[TMP2:%.*]] = urem <8 x i64> [[TMP1]], <i64 3, i64 3, i64 3, i64 3, i64 3, i64 3, i64 3, i64 3>		; CHECK-NEXT: [[TMP2:%.*]] = urem i64 [[TMP1]], 3
; CHECK-NEXT: [[TMP3:%.*]] = extractelement <8 x i64> [[TMP2]], i32 0		; CHECK-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]
; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP3]]		; CHECK-NEXT: [[TMP4:%.*]] = load i64, ptr [[TMP3]], align 8
; CHECK-NEXT: [[TMP5:%.*]] = extractelement <8 x i64> [[TMP2]], i32 1		; CHECK-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <8 x i64> poison, i64 [[TMP4]], i64 0
; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP5]]		; CHECK-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i64> [[BROADCAST_SPLATINSERT]], <8 x i64> poison, <8 x i32> zeroinitializer
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <8 x i64> [[TMP2]], i32 2		; CHECK-NEXT: [[TMP5:%.*]] = add nsw <8 x i64> [[BROADCAST_SPLAT]], <i64 42, i64 42, i64 42, i64 42, i64 42, i64 42, i64 42, i64 42>
; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP7]]		; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP9:%.*]] = extractelement <8 x i64> [[TMP2]], i32 3		; CHECK-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[TMP6]], i32 0
; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP9]]		; CHECK-NEXT: store <8 x i64> [[TMP5]], ptr [[TMP7]], align 8
		Ayal: Note: this load from A[(iv / 8) % 3] is now recognized as uniform for VF=8.
		fhahn (author): Added comment.
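		The two div/urem cases in this file can be checked numerically. This is a minimal sketch using plain enumeration rather than the SCEV rewriting from the patch; it assumes iv starts at 0 with step 1, and `laneIndex`, `allLanesEqual`, `endLanesEqual` and `checkDivUremExamples` are hypothetical helpers. It also shows that for (iv/2)%3 with VF=8 the numeric values of lane 0 and lane 7 agree in every vector iteration, since (8n/2)%3 and ((8n+7)/2)%3 both reduce to (4n)%3, while the middle lanes differ. Comparing only the two end-lane values would therefore not be enough for this expression. That is an observation about concrete values only, not a claim about how the symbolic SCEV comparison behaves.

```cpp
#include <cassert>
#include <cstdint>

// Index used by lane `Lane` of vector iteration `Iter` for A[(iv / Div) % Rem].
// Assumption for this sketch: iv starts at 0 and steps by 1.
uint64_t laneIndex(uint64_t Iter, unsigned Lane, unsigned VF, uint64_t Div,
                   uint64_t Rem) {
  return ((Iter * VF + Lane) / Div) % Rem;
}

// True if every lane matches lane 0 in each of the first NumIters vector
// iterations.
bool allLanesEqual(unsigned VF, uint64_t Div, uint64_t Rem, uint64_t NumIters) {
  for (uint64_t Iter = 0; Iter < NumIters; ++Iter)
    for (unsigned Lane = 1; Lane < VF; ++Lane)
      if (laneIndex(Iter, Lane, VF, Div, Rem) !=
          laneIndex(Iter, 0, VF, Div, Rem))
        return false;
  return true;
}

// True if the first and last lanes match in each vector iteration; the
// lanes in between are deliberately ignored.
bool endLanesEqual(unsigned VF, uint64_t Div, uint64_t Rem, uint64_t NumIters) {
  for (uint64_t Iter = 0; Iter < NumIters; ++Iter)
    if (laneIndex(Iter, VF - 1, VF, Div, Rem) !=
        laneIndex(Iter, 0, VF, Div, Rem))
      return false;
  return true;
}

// 1000 scalar iterations at VF=8 give 125 vector iterations.
void checkDivUremExamples() {
  assert(allLanesEqual(8, 8, 3, 125));  // A[(iv/8)%3]: uniform for VF=8
  assert(!allLanesEqual(8, 2, 3, 125)); // A[(iv/2)%3]: not uniform for VF=8
  assert(endLanesEqual(8, 2, 3, 125));  // ...although lanes 0 and 7 agree
}
```

		In the first vector iteration, the eight lanes of (iv/2)%3 compute the indices 0, 0, 1, 1, 2, 2, 0, 0, which is why the end lanes match while the vector as a whole is not uniform.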
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <8 x i64> [[TMP2]], i32 4
; CHECK-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP11]]
; CHECK-NEXT: [[TMP13:%.*]] = extractelement <8 x i64> [[TMP2]], i32 5
; CHECK-NEXT: [[TMP14:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP13]]
; CHECK-NEXT: [[TMP15:%.*]] = extractelement <8 x i64> [[TMP2]], i32 6
; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP15]]
; CHECK-NEXT: [[TMP17:%.*]] = extractelement <8 x i64> [[TMP2]], i32 7
; CHECK-NEXT: [[TMP18:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP17]]
; CHECK-NEXT: [[TMP19:%.*]] = load i64, ptr [[TMP4]], align 8
; CHECK-NEXT: [[TMP20:%.*]] = load i64, ptr [[TMP6]], align 8
; CHECK-NEXT: [[TMP21:%.*]] = load i64, ptr [[TMP8]], align 8
; CHECK-NEXT: [[TMP22:%.*]] = load i64, ptr [[TMP10]], align 8
; CHECK-NEXT: [[TMP23:%.*]] = load i64, ptr [[TMP12]], align 8
; CHECK-NEXT: [[TMP24:%.*]] = load i64, ptr [[TMP14]], align 8
; CHECK-NEXT: [[TMP25:%.*]] = load i64, ptr [[TMP16]], align 8
; CHECK-NEXT: [[TMP26:%.*]] = load i64, ptr [[TMP18]], align 8
; CHECK-NEXT: [[TMP27:%.*]] = insertelement <8 x i64> poison, i64 [[TMP19]], i32 0
; CHECK-NEXT: [[TMP28:%.*]] = insertelement <8 x i64> [[TMP27]], i64 [[TMP20]], i32 1
; CHECK-NEXT: [[TMP29:%.*]] = insertelement <8 x i64> [[TMP28]], i64 [[TMP21]], i32 2
; CHECK-NEXT: [[TMP30:%.*]] = insertelement <8 x i64> [[TMP29]], i64 [[TMP22]], i32 3
; CHECK-NEXT: [[TMP31:%.*]] = insertelement <8 x i64> [[TMP30]], i64 [[TMP23]], i32 4
; CHECK-NEXT: [[TMP32:%.*]] = insertelement <8 x i64> [[TMP31]], i64 [[TMP24]], i32 5
; CHECK-NEXT: [[TMP33:%.*]] = insertelement <8 x i64> [[TMP32]], i64 [[TMP25]], i32 6
; CHECK-NEXT: [[TMP34:%.*]] = insertelement <8 x i64> [[TMP33]], i64 [[TMP26]], i32 7
; CHECK-NEXT: [[TMP35:%.*]] = add nsw <8 x i64> [[TMP34]], <i64 42, i64 42, i64 42, i64 42, i64 42, i64 42, i64 42, i64 42>
; CHECK-NEXT: [[TMP36:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; CHECK-NEXT: [[TMP37:%.*]] = getelementptr inbounds i64, ptr [[TMP36]], i32 0
; CHECK-NEXT: store <8 x i64> [[TMP35]], ptr [[TMP37]], align 8
; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8		; CHECK-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
; CHECK-NEXT: [[VEC_IND_NEXT]] = add <8 x i64> [[VEC_IND]], <i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8, i64 8>		; CHECK-NEXT: [[TMP8:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
; CHECK-NEXT: [[TMP38:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000		; CHECK-NEXT: br i1 [[TMP8]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK-NEXT: br i1 [[TMP38]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
; CHECK: middle.block:		; CHECK: middle.block:
; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000		; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]		; CHECK-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; CHECK: scalar.ph:		; CHECK: scalar.ph:
; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]		; CHECK-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]		; CHECK-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
[... 30 lines not shown ...]

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll

[... 98 lines not shown; resuming in loop: ...]
%iv_next = add nsw i64 %iv, 1		%iv_next = add nsw i64 %iv, 1
%cond = icmp eq i64 %iv_next, 1000		%cond = icmp eq i64 %iv_next, 1000
br i1 %cond, label %exit, label %loop		br i1 %cond, label %exit, label %loop
exit:		exit:
ret void		ret void
}		}

; for (iv = 0 ; ; iv += 1) B[iv] = A[iv>>1] + 42;		; for (iv = 0 ; ; iv += 1) B[iv] = A[iv>>1] + 42;
		; A[iv>>1] is uniform for VF=2 but not VF=4.
define void @ld_lshr1_step1_start0_ind1(ptr noalias %A, ptr noalias %B) {		define void @ld_lshr1_step1_start0_ind1(ptr noalias %A, ptr noalias %B) {
; VF2-LABEL: define void @ld_lshr1_step1_start0_ind1		; VF2-LABEL: define void @ld_lshr1_step1_start0_ind1
; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
		Ayal: Note: this load from A[iv >> 1] is now recognized as uniform for VF=2. Check that it is not considered uniform for VF=4?
		fhahn (author): Added separate check lines for VF=4 as well.
; VF2-NEXT: entry:		; VF2-NEXT: entry:
; VF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; VF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; VF2: vector.ph:		; VF2: vector.ph:
; VF2-NEXT: br label [[VECTOR_BODY:%.*]]		; VF2-NEXT: br label [[VECTOR_BODY:%.*]]
; VF2: vector.body:		; VF2: vector.body:
; VF2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; VF2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; VF2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; VF2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; VF2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; VF2-NEXT: [[TMP1:%.*]] = lshr <2 x i64> [[VEC_IND]], <i64 1, i64 1>		; VF2-NEXT: [[TMP1:%.*]] = lshr i64 [[TMP0]], 1
; VF2-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[TMP1]], i32 0		; VF2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
; VF2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]		; VF2-NEXT: [[TMP3:%.*]] = load i64, ptr [[TMP2]], align 8
; VF2-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1		; VF2-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i64 0
; VF2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP4]]		; VF2-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
; VF2-NEXT: [[TMP6:%.*]] = load i64, ptr [[TMP3]], align 8		; VF2-NEXT: [[TMP4:%.*]] = add nsw <2 x i64> [[BROADCAST_SPLAT]], <i64 42, i64 42>
; VF2-NEXT: [[TMP7:%.*]] = load i64, ptr [[TMP5]], align 8		; VF2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; VF2-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> poison, i64 [[TMP6]], i32 0		; VF2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP5]], i32 0
; VF2-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> [[TMP8]], i64 [[TMP7]], i32 1		; VF2-NEXT: store <2 x i64> [[TMP4]], ptr [[TMP6]], align 8
; VF2-NEXT: [[TMP10:%.*]] = add nsw <2 x i64> [[TMP9]], <i64 42, i64 42>
; VF2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; VF2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[TMP11]], i32 0
; VF2-NEXT: store <2 x i64> [[TMP10]], ptr [[TMP12]], align 8
; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2		; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; VF2-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>		; VF2-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
; VF2-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000		; VF2-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; VF2-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; VF2: middle.block:		; VF2: middle.block:
; VF2-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000		; VF2-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
; VF2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]		; VF2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; VF2: scalar.ph:		; VF2: scalar.ph:
; VF2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]		; VF2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; VF2-NEXT: br label [[LOOP:%.*]]		; VF2-NEXT: br label [[LOOP:%.*]]
; VF2: loop:		; VF2: loop:
; VF2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]		; VF2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
[... 77 lines not shown; resuming in loop: ...]
%iv_next = add nsw i64 %iv, 1		%iv_next = add nsw i64 %iv, 1
%cond = icmp eq i64 %iv_next, 1000		%cond = icmp eq i64 %iv_next, 1000
br i1 %cond, label %exit, label %loop		br i1 %cond, label %exit, label %loop
exit:		exit:
ret void		ret void
}		}

; for (iv = 0 ; ; iv += 1) B[iv] = A[iv>>2] + 42;		; for (iv = 0 ; ; iv += 1) B[iv] = A[iv>>2] + 42;
		; A[iv>>2] is uniform for VF=2 and VF=4.
define void @ld_lshr2_step1_start0_ind1(ptr noalias %A, ptr noalias %B) {		define void @ld_lshr2_step1_start0_ind1(ptr noalias %A, ptr noalias %B) {
; VF2-LABEL: define void @ld_lshr2_step1_start0_ind1		; VF2-LABEL: define void @ld_lshr2_step1_start0_ind1
Ayal: note: load from A[iv>>2] recognized as uniform for VF=2, should also hold for VF=4.
fhahn: added comment, thanks!
; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
; VF2-NEXT: entry:		; VF2-NEXT: entry:
; VF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; VF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; VF2: vector.ph:		; VF2: vector.ph:
; VF2-NEXT: br label [[VECTOR_BODY:%.*]]		; VF2-NEXT: br label [[VECTOR_BODY:%.*]]
; VF2: vector.body:		; VF2: vector.body:
; VF2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; VF2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; VF2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; VF2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; VF2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; VF2-NEXT: [[TMP1:%.*]] = lshr <2 x i64> [[VEC_IND]], <i64 2, i64 2>		; VF2-NEXT: [[TMP1:%.*]] = lshr i64 [[TMP0]], 2
; VF2-NEXT: [[TMP2:%.*]] = extractelement <2 x i64> [[TMP1]], i32 0		; VF2-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
; VF2-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]		; VF2-NEXT: [[TMP3:%.*]] = load i64, ptr [[TMP2]], align 8
; VF2-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP1]], i32 1		; VF2-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[TMP3]], i64 0
; VF2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP4]]		; VF2-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
; VF2-NEXT: [[TMP6:%.*]] = load i64, ptr [[TMP3]], align 8		; VF2-NEXT: [[TMP4:%.*]] = add nsw <2 x i64> [[BROADCAST_SPLAT]], <i64 42, i64 42>
; VF2-NEXT: [[TMP7:%.*]] = load i64, ptr [[TMP5]], align 8		; VF2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; VF2-NEXT: [[TMP8:%.*]] = insertelement <2 x i64> poison, i64 [[TMP6]], i32 0		; VF2-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP5]], i32 0
; VF2-NEXT: [[TMP9:%.*]] = insertelement <2 x i64> [[TMP8]], i64 [[TMP7]], i32 1		; VF2-NEXT: store <2 x i64> [[TMP4]], ptr [[TMP6]], align 8
; VF2-NEXT: [[TMP10:%.*]] = add nsw <2 x i64> [[TMP9]], <i64 42, i64 42>
; VF2-NEXT: [[TMP11:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; VF2-NEXT: [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[TMP11]], i32 0
; VF2-NEXT: store <2 x i64> [[TMP10]], ptr [[TMP12]], align 8
; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2		; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; VF2-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>		; VF2-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
; VF2-NEXT: [[TMP13:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000		; VF2-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; VF2-NEXT: br i1 [[TMP13]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; VF2: middle.block:		; VF2: middle.block:
; VF2-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000		; VF2-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
; VF2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]		; VF2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; VF2: scalar.ph:		; VF2: scalar.ph:
; VF2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]		; VF2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; VF2-NEXT: br label [[LOOP:%.*]]		; VF2-NEXT: br label [[LOOP:%.*]]
; VF2: loop:		; VF2: loop:
; VF2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]		; VF2-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
; VF4-LABEL: define void @ld_lshr2_step1_start0_ind1		; VF4-LABEL: define void @ld_lshr2_step1_start0_ind1
; VF4-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; VF4-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
; VF4-NEXT: entry:		; VF4-NEXT: entry:
; VF4-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; VF4-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; VF4: vector.ph:		; VF4: vector.ph:
; VF4-NEXT: br label [[VECTOR_BODY:%.*]]		; VF4-NEXT: br label [[VECTOR_BODY:%.*]]
; VF4: vector.body:		; VF4: vector.body:
; VF4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; VF4-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; VF4-NEXT: [[VEC_IND:%.*]] = phi <4 x i64> [ <i64 0, i64 1, i64 2, i64 3>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; VF4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; VF4-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; VF4-NEXT: [[TMP1:%.*]] = lshr <4 x i64> [[VEC_IND]], <i64 2, i64 2, i64 2, i64 2>		; VF4-NEXT: [[TMP1:%.*]] = lshr i64 [[TMP0]], 2
; VF4-NEXT: [[TMP2:%.*]] = extractelement <4 x i64> [[TMP1]], i32 0		; VF4-NEXT: [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP1]]
; VF4-NEXT: [[TMP3:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP2]]		; VF4-NEXT: [[TMP3:%.*]] = load i64, ptr [[TMP2]], align 8
; VF4-NEXT: [[TMP4:%.*]] = extractelement <4 x i64> [[TMP1]], i32 1		; VF4-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <4 x i64> poison, i64 [[TMP3]], i64 0
; VF4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP4]]		; VF4-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <4 x i64> [[BROADCAST_SPLATINSERT]], <4 x i64> poison, <4 x i32> zeroinitializer
; VF4-NEXT: [[TMP6:%.*]] = extractelement <4 x i64> [[TMP1]], i32 2		; VF4-NEXT: [[TMP4:%.*]] = add nsw <4 x i64> [[BROADCAST_SPLAT]], <i64 42, i64 42, i64 42, i64 42>
; VF4-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP6]]		; VF4-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; VF4-NEXT: [[TMP8:%.*]] = extractelement <4 x i64> [[TMP1]], i32 3		; VF4-NEXT: [[TMP6:%.*]] = getelementptr inbounds i64, ptr [[TMP5]], i32 0
; VF4-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP8]]		; VF4-NEXT: store <4 x i64> [[TMP4]], ptr [[TMP6]], align 8
; VF4-NEXT: [[TMP10:%.*]] = load i64, ptr [[TMP3]], align 8
; VF4-NEXT: [[TMP11:%.*]] = load i64, ptr [[TMP5]], align 8
; VF4-NEXT: [[TMP12:%.*]] = load i64, ptr [[TMP7]], align 8
; VF4-NEXT: [[TMP13:%.*]] = load i64, ptr [[TMP9]], align 8
; VF4-NEXT: [[TMP14:%.*]] = insertelement <4 x i64> poison, i64 [[TMP10]], i32 0
; VF4-NEXT: [[TMP15:%.*]] = insertelement <4 x i64> [[TMP14]], i64 [[TMP11]], i32 1
; VF4-NEXT: [[TMP16:%.*]] = insertelement <4 x i64> [[TMP15]], i64 [[TMP12]], i32 2
; VF4-NEXT: [[TMP17:%.*]] = insertelement <4 x i64> [[TMP16]], i64 [[TMP13]], i32 3
; VF4-NEXT: [[TMP18:%.*]] = add nsw <4 x i64> [[TMP17]], <i64 42, i64 42, i64 42, i64 42>
; VF4-NEXT: [[TMP19:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; VF4-NEXT: [[TMP20:%.*]] = getelementptr inbounds i64, ptr [[TMP19]], i32 0
; VF4-NEXT: store <4 x i64> [[TMP18]], ptr [[TMP20]], align 8
; VF4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4		; VF4-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
; VF4-NEXT: [[VEC_IND_NEXT]] = add <4 x i64> [[VEC_IND]], <i64 4, i64 4, i64 4, i64 4>		; VF4-NEXT: [[TMP7:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
; VF4-NEXT: [[TMP21:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000		; VF4-NEXT: br i1 [[TMP7]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; VF4-NEXT: br i1 [[TMP21]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
; VF4: middle.block:		; VF4: middle.block:
; VF4-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000		; VF4-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
; VF4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]		; VF4-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; VF4: scalar.ph:		; VF4: scalar.ph:
; VF4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]		; VF4-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; VF4-NEXT: br label [[LOOP:%.*]]		; VF4-NEXT: br label [[LOOP:%.*]]
; VF4: loop:		; VF4: loop:
; VF4-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]		; VF4-NEXT: [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
loop:
store i64 %calc, ptr %gep_st, align 8		store i64 %calc, ptr %gep_st, align 8
%iv_next = add nsw i64 %iv, 3		%iv_next = add nsw i64 %iv, 3
%cond = icmp eq i64 %iv_next, 1000		%cond = icmp eq i64 %iv_next, 1000
br i1 %cond, label %exit, label %loop		br i1 %cond, label %exit, label %loop
exit:		exit:
ret void		ret void
}		}

; for (iv = 1 ; ; iv += 1) B[iv] = A[iv>>1] + 42;		; for (iv = 1 ; ; iv += 1) B[iv] = A[iv>>1] + 42;
		; A[iv>>1] not uniform for VF=2 due to alignment (iv starts at 1).
Ayal: note: load from A[1+i>>1] not recognized as uniform for VF=2 due to alignment.
fhahn: Added note, thanks!
define void @ld_lshr1_step1_start1_ind1(ptr noalias %A, ptr noalias %B) {		define void @ld_lshr1_step1_start1_ind1(ptr noalias %A, ptr noalias %B) {
; VF2-LABEL: define void @ld_lshr1_step1_start1_ind1		; VF2-LABEL: define void @ld_lshr1_step1_start1_ind1
; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
; VF2-NEXT: entry:		; VF2-NEXT: entry:
; VF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; VF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; VF2: vector.ph:		; VF2: vector.ph:
; VF2-NEXT: br label [[VECTOR_BODY:%.*]]		; VF2-NEXT: br label [[VECTOR_BODY:%.*]]
; VF2: vector.body:		; VF2: vector.body:
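The test comments in this file state that `A[iv>>2]` (with `iv` starting at 0) is uniform for VF=2 and VF=4, while `A[iv>>1]` with `iv` starting at 1 is not uniform for VF=2. The following is a minimal brute-force sketch of that property in Python. It enumerates iterations rather than reasoning symbolically with SCEV as the patch does, and the helper name `is_uniform_across_vf` and the `trip_count` default are illustrative assumptions, not part of the patch.

```python
def is_uniform_across_vf(index_fn, start, vf, trip_count=1000):
    """Return True if index_fn yields a single value for every group of
    vf consecutive scalar iterations, i.e. for every vector iteration.

    Hypothetical helper for illustration only; only full groups of vf
    iterations are checked.
    """
    full_groups = trip_count // vf
    for group in range(full_groups):
        base = start + group * vf
        # Collect the index computed by each lane of this vector iteration.
        lane_values = {index_fn(base + lane) for lane in range(vf)}
        if len(lane_values) != 1:
            return False
    return True


# A[iv >> 2] with iv starting at 0.
shr2_start0_vf2 = is_uniform_across_vf(lambda iv: iv >> 2, start=0, vf=2)
shr2_start0_vf4 = is_uniform_across_vf(lambda iv: iv >> 2, start=0, vf=4)

# A[iv >> 1] with iv starting at 1: lanes (1, 2) map to indices (0, 1).
shr1_start1_vf2 = is_uniform_across_vf(lambda iv: iv >> 1, start=1, vf=2)

# Same expression with iv starting at 0, for contrast.
shr1_start0_vf2 = is_uniform_across_vf(lambda iv: iv >> 1, start=0, vf=2)
```

In this sketch the start value matters because the groups of `vf` lanes must line up with the ranges over which the shifted value stays constant, which is what the reviewer's "due to alignment" note refers to.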

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=2 %s -S \| FileCheck --check-prefix=VF2 %s		; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=2 %s -S \| FileCheck --check-prefix=VF2 %s
; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=4 %s -S \| FileCheck --check-prefix=VF4 %s		; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=4 %s -S \| FileCheck --check-prefix=VF4 %s

; for (iv = 0, iv2 = 0 ; ; iv += 1, iv2 += 1) B[iv] = A[iv/1 + iv2/1] + 42;		; for (iv = 0, iv2 = 0 ; ; iv += 1, iv2 += 1) B[iv] = A[iv/1 + iv2/1] + 42;
define void @ld_div1_step1_start0_ind2(ptr noalias %A, ptr noalias %B) {		define void @ld_div1_step1_start0_ind2(ptr noalias %A, ptr noalias %B) {
; VF2-LABEL: define void @ld_div1_step1_start0_ind2		; VF2-LABEL: define void @ld_div1_step1_start0_ind2
Ayal: lines changed intentionally?
fhahn: Those were left over from the patch that added new run lines, removed in fcc135a8d6a7.
; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
; VF2-NEXT: entry:		; VF2-NEXT: entry:
; VF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; VF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; VF2: vector.ph:		; VF2: vector.ph:
; VF2-NEXT: br label [[VECTOR_BODY:%.*]]		; VF2-NEXT: br label [[VECTOR_BODY:%.*]]
; VF2: vector.body:		; VF2: vector.body:
; VF2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; VF2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; VF2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]		; VF2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
loop:
%iv2_next = add nsw i64 %iv2, 1		%iv2_next = add nsw i64 %iv2, 1
%iv_next = add nsw i64 %iv, 1		%iv_next = add nsw i64 %iv, 1
%cond = icmp eq i64 %iv_next, 1000		%cond = icmp eq i64 %iv_next, 1000
br i1 %cond, label %exit, label %loop		br i1 %cond, label %exit, label %loop
exit:		exit:
ret void		ret void
}		}

; for (iv = 0, iv2 = 0 ; ; iv += 1, iv2 += 1) B[iv] = A[iv/2 + iv2/2] + 42;		; for (iv = 0, iv2 = 0 ; ; iv += 1, iv2 += 1) B[iv] = A[iv/2 + iv2/2] + 42;
		; A[iv/2 + iv2/2] is uniform for VF=2 but not for VF=4.
Ayal: note: load from A[iv/2 + iv2/2] i.e. A[2*(iv/2)] recognized as uniform for VF=2, but should not for VF > 2.
fhahn: added note, thanks!
define void @ld_div2_step1_start0_ind2(ptr noalias %A, ptr noalias %B) {		define void @ld_div2_step1_start0_ind2(ptr noalias %A, ptr noalias %B) {
; VF2-LABEL: define void @ld_div2_step1_start0_ind2		; VF2-LABEL: define void @ld_div2_step1_start0_ind2
Ayal: lines dropped intentionally?
fhahn: Those were left over from the patch that added new run lines, removed in fcc135a8d6a7.
; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {		; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
; VF2-NEXT: entry:		; VF2-NEXT: entry:
; VF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]		; VF2-NEXT: br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
; VF2: vector.ph:		; VF2: vector.ph:
; VF2-NEXT: br label [[VECTOR_BODY:%.*]]		; VF2-NEXT: br label [[VECTOR_BODY:%.*]]
; VF2: vector.body:		; VF2: vector.body:
; VF2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]		; VF2-NEXT: [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
; VF2-NEXT: [[VEC_IND:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT:%.*]], [[VECTOR_BODY]] ]
; VF2-NEXT: [[VEC_IND2:%.*]] = phi <2 x i64> [ <i64 0, i64 1>, [[VECTOR_PH]] ], [ [[VEC_IND_NEXT3:%.*]], [[VECTOR_BODY]] ]
; VF2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0		; VF2-NEXT: [[TMP0:%.*]] = add i64 [[INDEX]], 0
; VF2-NEXT: [[TMP1:%.*]] = udiv <2 x i64> [[VEC_IND]], <i64 2, i64 2>		; VF2-NEXT: [[TMP1:%.*]] = add i64 [[INDEX]], 0
; VF2-NEXT: [[TMP2:%.*]] = udiv <2 x i64> [[VEC_IND2]], <i64 2, i64 2>		; VF2-NEXT: [[TMP2:%.*]] = udiv i64 [[TMP1]], 2
; VF2-NEXT: [[TMP3:%.*]] = add <2 x i64> [[TMP1]], [[TMP2]]		; VF2-NEXT: [[TMP3:%.*]] = udiv i64 [[TMP0]], 2
; VF2-NEXT: [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0		; VF2-NEXT: [[TMP4:%.*]] = add i64 [[TMP2]], [[TMP3]]
; VF2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP4]]		; VF2-NEXT: [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP4]]
; VF2-NEXT: [[TMP6:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1		; VF2-NEXT: [[TMP6:%.*]] = load i64, ptr [[TMP5]], align 8
; VF2-NEXT: [[TMP7:%.*]] = getelementptr inbounds i64, ptr [[A]], i64 [[TMP6]]		; VF2-NEXT: [[BROADCAST_SPLATINSERT:%.*]] = insertelement <2 x i64> poison, i64 [[TMP6]], i64 0
; VF2-NEXT: [[TMP8:%.*]] = load i64, ptr [[TMP5]], align 8		; VF2-NEXT: [[BROADCAST_SPLAT:%.*]] = shufflevector <2 x i64> [[BROADCAST_SPLATINSERT]], <2 x i64> poison, <2 x i32> zeroinitializer
; VF2-NEXT: [[TMP9:%.*]] = load i64, ptr [[TMP7]], align 8		; VF2-NEXT: [[TMP7:%.*]] = add nsw <2 x i64> [[BROADCAST_SPLAT]], <i64 42, i64 42>
; VF2-NEXT: [[TMP10:%.*]] = insertelement <2 x i64> poison, i64 [[TMP8]], i32 0		; VF2-NEXT: [[TMP8:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP1]]
; VF2-NEXT: [[TMP11:%.*]] = insertelement <2 x i64> [[TMP10]], i64 [[TMP9]], i32 1		; VF2-NEXT: [[TMP9:%.*]] = getelementptr inbounds i64, ptr [[TMP8]], i32 0
; VF2-NEXT: [[TMP12:%.*]] = add nsw <2 x i64> [[TMP11]], <i64 42, i64 42>		; VF2-NEXT: store <2 x i64> [[TMP7]], ptr [[TMP9]], align 8
; VF2-NEXT: [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[B]], i64 [[TMP0]]
; VF2-NEXT: [[TMP14:%.*]] = getelementptr inbounds i64, ptr [[TMP13]], i32 0
; VF2-NEXT: store <2 x i64> [[TMP12]], ptr [[TMP14]], align 8
; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2		; VF2-NEXT: [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 2
; VF2-NEXT: [[VEC_IND_NEXT]] = add <2 x i64> [[VEC_IND]], <i64 2, i64 2>		; VF2-NEXT: [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
; VF2-NEXT: [[VEC_IND_NEXT3]] = add <2 x i64> [[VEC_IND2]], <i64 2, i64 2>		; VF2-NEXT: br i1 [[TMP10]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; VF2-NEXT: [[TMP15:%.*]] = icmp eq i64 [[INDEX_NEXT]], 1000
; VF2-NEXT: br i1 [[TMP15]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
; VF2: middle.block:		; VF2: middle.block:
; VF2-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000		; VF2-NEXT: [[CMP_N:%.*]] = icmp eq i64 1000, 1000
; VF2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]		; VF2-NEXT: br i1 [[CMP_N]], label [[EXIT:%.*]], label [[SCALAR_PH]]
; VF2: scalar.ph:		; VF2: scalar.ph:
; VF2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]		; VF2-NEXT: [[BC_RESUME_VAL:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY:%.*]] ]
; VF2-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]		; VF2-NEXT: [[BC_RESUME_VAL1:%.*]] = phi i64 [ 1000, [[MIDDLE_BLOCK]] ], [ 0, [[ENTRY]] ]
; VF2-NEXT: br label [[LOOP:%.*]]		; VF2-NEXT: br label [[LOOP:%.*]]
; VF2: loop:		; VF2: loop:
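The note on `ld_div2_step1_start0_ind2` says `A[iv/2 + iv2/2]` is uniform for VF=2 but not for VF=4. The patch summary describes comparing the expression for the first vector lane (offset 0) with the one for the last lane (VF - 1). A rough numeric analogue of that comparison is sketched below in Python. It evaluates concrete values instead of building SCEV expressions, and relies on the index expression being non-decreasing in `iv`, so that equal first and last lanes imply all lanes are equal. The names `first_and_last_lane_match` and `div2_two_inductions`, and the `vector_iterations` default, are hypothetical.

```python
def first_and_last_lane_match(index_fn, vf, vector_iterations=250):
    """Compare the index of lane 0 with the index of lane vf - 1 for each
    vector iteration, assuming the scalar induction starts at 0, steps by 1.

    Illustrative only: valid as a uniformity check when index_fn is
    non-decreasing in iv.
    """
    for part in range(vector_iterations):
        first_lane = index_fn(part * vf)
        last_lane = index_fn(part * vf + vf - 1)
        if first_lane != last_lane:
            return False
    return True


def div2_two_inductions(iv):
    # iv and iv2 both start at 0 and step by 1, so they are always equal.
    iv2 = iv
    return iv // 2 + iv2 // 2


div2_uniform_vf2 = first_and_last_lane_match(div2_two_inductions, vf=2)
div2_uniform_vf4 = first_and_last_lane_match(div2_two_inductions, vf=4)
```

For VF=4 the first vector iteration already disagrees: lane 0 computes index 0 while lane 3 computes `3 // 2 + 3 // 2 = 2`.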

Revision Contents

Diff 527045