This is an archive of the discontinued LLVM Phabricator instance.

[LV] Use SCEV for uniformity analysis across VF
ClosedPublic

Authored by fhahn on Apr 20 2023, 2:02 PM.

Details

Summary

This patch uses SCEV to check if a value is uniform across a given VF.

The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of the first vector lane
(offset 0) and one for the last vector lane (VF - 1). If they are equal,
consider the expression uniform.

While re-writing expressions, we also need to catch expressions we
cannot determine uniformity (e.g. SCEVUnknown).

I might be missing something that makes this approach unworkable in
practice, but it may be an alternative to D147735.

Diff Detail

Event Timeline

fhahn created this revision.Apr 20 2023, 2:02 PM
fhahn requested review of this revision.Apr 20 2023, 2:02 PM
fhahn updated this revision to Diff 516890.Apr 25 2023, 1:42 PM

Rebase on top of extra tests from D147734.

The patch should be ready for review now.

nikic added a subscriber: nikic.Apr 25 2023, 2:08 PM
nikic added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2556

Assert that the addrec loop is correct?

2563

Why is it valid to preserve nowrap flags here?

fhahn updated this revision to Diff 516906.Apr 25 2023, 2:21 PM

Address comments and fix broken variable names.

fhahn marked 2 inline comments as done.Apr 25 2023, 2:23 PM
fhahn added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2556

Yep that's safer, added the assert. Should help due to the invariance check in ::visit().

2563

It should be valid for the vector loop, as we check that the vector induction doesn't overflow/wrap; although the loop we are dealing with here is still the original scalar loop... But it's not needed so I removed it, thanks!

vporpo added inline comments.Apr 25 2023, 2:56 PM
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
350–352

I think the description comment needs to be updated. It should mention that can now check uniformity within the VF, which should explain why we need the VF argument.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2536

A short description what this rewriter is for?

2645

Can't think of any case where this would trigger, but if there is such a corner case we may still miss it if it is only checked in the debug build, silently causing miscompilations. So perhaps we should not guard it under NDEBUG and use an llvm_unreachable instead of an assertion so that it is also checked in release builds? Wdyt?

2646

nit: Use seq<> ? for (auto I : seq<unsigned>(1, VF->getKnownMinValue()-1))

fhahn updated this revision to Diff 517267.Apr 26 2023, 12:30 PM
fhahn marked 5 inline comments as done.

Adress comments thanks! Also rebased on top of the committed tests and extra tests with AND and LSHR added in 883eb88caed04b269da7ba69265fd7c4dc815231.

This version of the patch also includes a change to the rewriter to track if we have seen a UDiv expression and we skip rewriting the second expression if there's no UDiv. This is to keep compile-time as low as possible.

With the latest version geomean compile-time increases by +0.01%: https://llvm-compile-time-tracker.com/compare.php?from=883eb88caed04b269da7ba69265fd7c4dc815231&to=943232d7acbeb48b1f2ed613903c77a161f80807&stat=instructions:u

Without the UDiv restriction that goes to +0.08% - +0.10%

fhahn marked an inline comment as done.Apr 26 2023, 12:30 PM
fhahn added inline comments.
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
350–352

Thanks, extended the comment.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2536

Added, thanks!

2645

IIUC this should be an invariant, but I added the assertion to give us a chance to catch any violations and investigate. I think having this check only when assertions are enabled is in line with how assertions are used widely in the LLVM codebase (for better or worse). But if people prefer to err on the side of caution, we can always run the checks, at the cost of extra compile-time overhead.

2646

Thanks, updated!

nikic added inline comments.Apr 26 2023, 12:36 PM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2586–2588

Maybe?

2652

In line with the other comparison?

fhahn updated this revision to Diff 517273.Apr 26 2023, 12:50 PM
fhahn marked an inline comment as done.

Simplify code as suggested, thanks!

fhahn marked 2 inline comments as done.Apr 26 2023, 12:51 PM
fhahn added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2586–2588

Yep that's more compact, thanks!

2652

Updated! Originally the other check was also using SE->isKnownPredicate but it was increasing compile-time while not being needed for the first set of motivating cases.

Thanks for working on this @fhahn , and for adding all these new test cases, it looks good.
@nikic any more comments?

nikic added inline comments.Apr 27 2023, 1:28 PM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2628

Why do GEPs require special handling?

fhahn updated this revision to Diff 517856.Apr 28 2023, 3:26 AM
fhahn marked 2 inline comments as done.

Remove special case for GEP which isn't needed in the latest version, also re-add isLoopInvariant to visit().

fhahn marked an inline comment as done.Apr 28 2023, 3:30 AM
fhahn added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2628

There's no need with the latest version, I removed it, thanks!

Thinking about this a bit, can't the form check be performed in terms of the original IV? Instead of computing the adjusted IV with the scaled index and an offset, can't we simply reason in terms of the relevant iterations of the original IV? I think this simply reduces to asking whether OrigIV mod VF is a loop invariant value and the high bits (OrigIV div VF) are fixed between iterations 0 and VF.

Saying that, I the later clause is slightly trickier than 0 and VF. It's any OrigIV mod VF == 0, and it's correspond OrigIV + VF -1. (Which is complicated subtraction expression involving the mod.) Unless maybe it's solving this part which leads to the current solution?

fhahn marked an inline comment as done.Apr 28 2023, 12:05 PM

Thinking about this a bit, can't the form check be performed in terms of the original IV? Instead of computing the adjusted IV with the scaled index and an offset, can't we simply reason in terms of the relevant iterations of the original IV? I think this simply reduces to asking whether OrigIV mod VF is a loop invariant value and the high bits (OrigIV div VF) are fixed between iterations 0 and VF.

Saying that, I the later clause is slightly trickier than 0 and VF. It's any OrigIV mod VF == 0, and it's correspond OrigIV + VF -1. (Which is complicated subtraction expression involving the mod.) Unless maybe it's solving this part which leads to the current solution?

I played around with this a bit before as well before. I might be missing something, but if we have OrigIV as AddRec {0,+,1}<nuw><nsw><%loop>, then wouldn't doing OrigIV mod VF always result in an AddRec that cycle through the remainder (for VF = 2, zext i1 {false,+,true}<%loop> to i64)? Also, if we would need to identify the AddRec sub-expressions, then we would also need a walk the whole expression as the re-writer does I think.

One other way I could think about reasoning about this would be to evaluate the AddRec at the start and VF-1, but that would only prove it for a specific value.

Ayal added inline comments.May 4 2023, 6:21 AM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2566–2575

nit: would this look better?

2605

nit: a bit discrepant for canAnalyze() to mean more than CanAnalyze. Perhaps the latter should be CannotAnalyze.

2629

nit: worth setting auto FixedVF = VF->getKnownMinValue();.

2638

nit: suffice to set IsUniform to FirstLaneExpr == LastLaneExpr and assert that the latter is also canAnalyze if they're equal.

2641

Ahh, could URem with "VF-1" lead to equal expressions for first and last lanes, but not for all lanes inbetween? E.g., along with FoundUDiv: ((i++)/2)%3) and VF=8.

inclyc added a subscriber: inclyc.May 4 2023, 10:23 AM
fhahn marked 5 inline comments as done.May 18 2023, 4:17 AM
fhahn added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2566–2575

Indeed, updated!

2605

Changed CanAnalyze -> CannotAnalyze, thanks!

2629

Updated, thanks!

2638

Adjusted, thanks!

2641

In theory there could be such expressions I think, but that particular one isn't handled incorrectly at the moment, possibly due to limitations in SCEV reasoning. Added additional tests in 01efcec6dbd1

But I checked compile-time impact of checking all expressions and there was no notable increase: https://llvm-compile-time-tracker.com/compare.php?from=9c1d65054818cd2fd9187cd7e7ef703d98b5c5e2&to=825bea6827d6558ab61c8e139f6d7ba4b007a69b&stat=instructions:u

updated the code to check all lanes in between as well.

fhahn updated this revision to Diff 523336.May 18 2023, 4:17 AM
fhahn marked 5 inline comments as done.

Address latest comments, thanks!

Ayal added a subscriber: simoll.May 24 2023, 2:14 AM

+@simoll.

This hopefully impacts which VPlans are built across the possible VF range, say prefer VF=2 over VF=4 when the former has more uniforms than the latter (but none are fully invariant, as detected today, so covered by same VPlan), although undetected by current tests?

Adding various nits.

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
591–594
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
351

ditto

354–359

nit: w/o VF, uniformity falls back to loop invariance.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2536

"Rewriter is designed to build the SCEVs for each of the VF lanes in the expected vectorized loop, which can then be compared to detect their uniformity.
This is done by replacing the AddRec SCEVs of the original scalar loop with new AddRecs ..."

Should the name "SCEVAddRecRewriter" convey what it's for?

2554

nit: explain why a non-loop-invariant uniform value is expected to involve a UDiv. Would an SDiv also work?
This saves time by potentially bailing out after building a single (UDiv-free) expression for FirstLane, w/o building another expression for another lane, rather than saving in building an expression itself.

2565

nit: right, point (of error message) is that such addrec's should have been checked earlier?

2593

nit: return a SCEVCouldNotCompute instead, SCEV's inherent 'CannotAnalyze'?

2603

nit: can return SCEVCouldNotCompute (or null) at the end if not FoundUDiv.

2629

Update comment, fold LastLane into an additional VF-1 iteration of the loop, drop the max(1,VF-2).

2647

nit: suffice to check that SCEVs are equal?

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430

note: VF=4 is mandated, previous UF=2 decision with 8 loads is now UF=4 with 4 (uniform) loads & broadcasts.

The load from test_base[i/8] could further fold two 'parts' of VF=4 together, as it is uniform across VF=8 / VF=4,UF=2.
Worth leaving behind some assume, if not folding directly? I.e., record the largest VF for which uniformity was detected, even if a smaller VF is selected.
Worth optimizing the analysis across VF's, i.e., if a value is known not to be uniform for VF avoid checking for 2*VF? OTOH, if a value is known to be uniform for VF, check only two SCEVs for 2*VF?

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll
79

note: this load from A[iv & -2] is now recognized as uniform across VF=2.

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll
6–7

note: load from A[(iv/2)%3] rightfully not recognized as uniform for VF=8.

296–305

note: this load from A[(iv / 8) % 3] is now recognized as uniform for VF=8.

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll
110

note: this load from A[iv >> 1] is now recognized as uniform for VF=2.

Check that it is not considered uniform for VF=4?

224–225

note: load from A[iv>>2] recognized as uniform for VF=2, should also hold for VF=4.

858–859

note: load from A[1+i>>1] not recognized as uniform for VF=2 due to alignment.

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
148–149

note: load from A[iv/2 + iv2/2] i.e. A[2*(iv/2)] recognized as uniform for VF=2, but should not for VF > 2.

fhahn updated this revision to Diff 526339.May 28 2023, 12:03 PM
fhahn marked 17 inline comments as done.

Address latest comments, thanks!

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
591–594

Clarified, thanks!

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
351

updated, thanks!

354–359

Updated with similar wording to isUniform, thanks!

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2536

Adjusted the comment, thanks!

Updated name to SCEVAddRecForUniformityRewriter

2554

Expanded the comment, thanks!

2565

Extended message to try to make this clear.

2593

Unfortunately that doesn't work without additional work, as the returned value may be used to construct the parent SCEV but the rewriter.

2603

Unfortunately that doesn't work without additional work, as the returned value may be used to construct the parent SCEV but the rewriter.

2629

Simplified the code, thanks!

2647

Simplified, thanks!

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430

The load from test_base[i/8] could further fold two 'parts' of VF=4 together, as it is uniform across VF=8 / VF=4,UF=2.

I think that would be good as follow-up.

Worth optimizing the analysis across VF's, i.e., if a value is known not to be uniform for VF avoid checking for 2*VF

I was thinking about evaluating something like that as follow-up optimization. WDYT?

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll
79

Added as a comment, thanks!

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll
6–7

Added comment, thanks!

296–305

added comment

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll
110

Add check lines for VF=4 as well separately.

224–225

added comment, thanks!

858–859

Added note, thanks!

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
148–149

added note, thanks!

Ayal added inline comments.May 29 2023, 6:19 AM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2583–2585
2603

ok, this is fine.

I see SCEVInitRewriter, SCEVPostIncRewriter, SCEVShiftRewriter indeed also record a similar SeenLoopVariantSCEVUnknown/Valid which their rewrite() queries after visit() to return SCEVCouldNotCompute. Worth doing the same, wrapping constructor/visit()/canAnalyze()?

2638

nit: "1 .. FixedVF-1"
nit: "first lane" >> "lane 0"
nit: why "reverse"?

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430

Sure, TODOs can be added to record potential follow-ups.

Note also that uniformity could be improved beyond comparing equal SCEV expressions, by using Divergence Analysis which propagates uniformity also through uniform branches.

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll
1

line dropped intentionally?

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
7

lines changed intentionally?

149–150

lines dropped intentionally?

fhahn updated this revision to Diff 526461.May 29 2023, 11:28 AM
fhahn marked 6 inline comments as done.

Address latest comments, thanks!

fhahn marked an inline comment as done.May 29 2023, 11:29 AM
fhahn added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2583–2585

Merged, thanks!

2603

Updated, thanks!

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430

Added a TOOD, thanks!

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll
1

No that was an accident, added back, thanks!

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
7

Those were left over from the patch that added new run lines, removed in fcc135a8d6a7.

149–150

Those were left over from the patch that added new run lines, removed in fcc135a8d6a7.

Ayal accepted this revision.May 29 2023, 1:49 PM

This looks good to me, thanks!
Adding last couple of minor nits.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2558
2568

Constructor can now also be private.

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430

Thanks! Plus following-up with D151658!

This revision is now accepted and ready to land.May 29 2023, 1:49 PM
Ayal added a comment.May 29 2023, 2:06 PM

BTW, does the VF parameter need to be "optional", or should all callers of Legal/LAI::isUniform[MemOp]() be asked to provide a VF? Otherwise have them call isInvariant() instead. To encourage passing VF where available.

nikic added a comment.May 30 2023, 4:51 AM

What do you think about checking for udiv presence upfront? That seems to eliminate the compile-time impact entirely for me. Something like this:

bool HasUDiv =
    SCEVExprContains(S, [](const SCEV *S) { return isa<SCEVUDivExpr>(S); });
if (!HasUDiv)
  return false;
fhahn updated this revision to Diff 527011.May 31 2023, 6:28 AM
fhahn marked an inline comment as done.

Rebase so this can be applied directly on main and check for UDiv separately as suggested, thanks!

I am planning on landing this soon.

Ayal added a comment.May 31 2023, 6:38 AM

Rebase so this can be applied directly on main and check for UDiv separately as suggested, thanks!

I am planning on landing this soon.

Have rewrite() take care of optimizing the pre-check for UDiv?

Are the std::optional<ElementCount> VF = std::nullopt really needed/desired?

This revision was landed with ongoing or failed builds.May 31 2023, 8:01 AM
This revision was automatically updated to reflect the committed changes.

Rebase so this can be applied directly on main and check for UDiv separately as suggested, thanks!

I am planning on landing this soon.

Have rewrite() take care of optimizing the pre-check for UDiv?

Taken care of in the committed version,

Are the std::optional<ElementCount> VF = std::nullopt really needed/desired?

Will disentangle isInvariant/isUniform separately.

Hi Florian!

This commit triggers failed asserts in my builds, reproduced as below:

$ cat repro.c 
typedef struct {
  int a;
  short b[]
} c;
void d() {
  c *e = d;
  for (int f = 1; f < 56; f++) {
    int g = f * f / 6;
    e->b[g] = f;
  }
}
$ clang -target x86_64-linux-gnu -w -c repro.c -O2
clang: /home/martin/code/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:3674: const llvm::SCEV* llvm::ScalarEvolution::getAddRecExpr(llvm::SmallVectorImpl<const llvm::SCEV*>&, const llvm::Loop*, llvm::SCEV::NoWrapFlags): Assertion `isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!"' failed.

Can you have a look, and revert if it takes a while to get it fixed?

fhahn added a comment.Jun 1 2023, 8:16 AM

Hi Florian!

This commit triggers failed asserts in my builds, reproduced as below:

$ cat repro.c 
typedef struct {
  int a;
  short b[]
} c;
void d() {
  c *e = d;
  for (int f = 1; f < 56; f++) {
    int g = f * f / 6;
    e->b[g] = f;
  }
}
$ clang -target x86_64-linux-gnu -w -c repro.c -O2
clang: /home/martin/code/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:3674: const llvm::SCEV* llvm::ScalarEvolution::getAddRecExpr(llvm::SmallVectorImpl<const llvm::SCEV*>&, const llvm::Loop*, llvm::SCEV::NoWrapFlags): Assertion `isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!"' failed.

Can you have a look, and revert if it takes a while to get it fixed?

Thanks, should be fixed by 3b912e269a52

Hi Florian!

This commit triggers failed asserts in my builds, reproduced as below:

$ cat repro.c 
typedef struct {
  int a;
  short b[]
} c;
void d() {
  c *e = d;
  for (int f = 1; f < 56; f++) {
    int g = f * f / 6;
    e->b[g] = f;
  }
}
$ clang -target x86_64-linux-gnu -w -c repro.c -O2
clang: /home/martin/code/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:3674: const llvm::SCEV* llvm::ScalarEvolution::getAddRecExpr(llvm::SmallVectorImpl<const llvm::SCEV*>&, const llvm::Loop*, llvm::SCEV::NoWrapFlags): Assertion `isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!"' failed.

Can you have a look, and revert if it takes a while to get it fixed?

Thanks, should be fixed by 3b912e269a52

Yes, the regression seems to be fixed in the original, non-reduced case now as well. Thanks!