Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[LV] Use SCEV for uniformity analysis across VF
ClosedPublic

Authored by fhahn on Apr 20 2023, 2:02 PM.

Details

Summary

This patch uses SCEV to check if a value is uniform across a given VF.

The basic idea is to construct SCEVs where the AddRecs of the loop are
adjusted to reflect the version in the vectorized loop (Step multiplied
by VF). We construct a SCEV for the value of the first vector lane
(offset 0) and one for the last vector lane (VF - 1). If they are equal,
consider the expression uniform.

While re-writing expressions, we also need to catch expressions we
cannot determine uniformity (e.g. SCEVUnknown).

I might be missing something that makes this approach unworkable in
practice, but it may be an alternative to D147735.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fhahn marked 5 inline comments as done.May 18 2023, 4:17 AM
fhahn added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2566–2575

Indeed, updated!

2605

Changed CanAnalyze -> CannotAnalyze, thanks!

2637

Updated, thanks!

2646

Adjusted, thanks!

2649

In theory there could be such expressions I think, but that particular one isn't handled incorrectly at the moment, possibly due to limitations in SCEV reasoning. Added additional tests in 01efcec6dbd1

But I checked compile-time impact of checking all expressions and there was no notable increase: https://llvm-compile-time-tracker.com/compare.php?from=9c1d65054818cd2fd9187cd7e7ef703d98b5c5e2&to=825bea6827d6558ab61c8e139f6d7ba4b007a69b&stat=instructions:u

updated the code to check all lanes in between as well.

fhahn updated this revision to Diff 523336.May 18 2023, 4:17 AM
fhahn marked 5 inline comments as done.

Address latest comments, thanks!

Ayal added a subscriber: simoll.May 24 2023, 2:14 AM

+@simoll.

This hopefully impacts which VPlans are built across the possible VF range, say prefer VF=2 over VF=4 when the former has more uniforms than the latter (but none are fully invariant, as detected today, so covered by same VPlan), although undetected by current tests?

Adding various nits.

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
591–594
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
351

ditto

354–359

nit: w/o VF, uniformity falls back to loop invariance.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2536

"Rewriter is designed to build the SCEVs for each of the VF lanes in the expected vectorized loop, which can then be compared to detect their uniformity.
This is done by replacing the AddRec SCEVs of the original scalar loop with new AddRecs ..."

Should the name "SCEVAddRecRewriter" convey what it's for?

2554

nit: explain why a non-loop-invariant uniform value is expected to involve a UDiv. Would an SDiv also work?
This saves time by potentially bailing out after building a single (UDiv-free) expression for FirstLane, w/o building another expression for another lane, rather than saving in building an expression itself.

2565

nit: right, point (of error message) is that such addrec's should have been checked earlier?

2593

nit: return a SCEVCouldNotCompute instead, SCEV's inherent 'CannotAnalyze'?

2603

nit: can return SCEVCouldNotCompute (or null) at the end if not FoundUDiv.

2637

Update comment, fold LastLane into an additional VF-1 iteration of the loop, drop the max(1,VF-2).

2655

nit: suffice to check that SCEVs are equal?

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430

note: VF=4 is mandated, previous UF=2 decision with 8 loads is now UF=4 with 4 (uniform) loads & broadcasts.

The load from test_base[i/8] could further fold two 'parts' of VF=4 together, as it is uniform across VF=8 / VF=4,UF=2.
Worth leaving behind some assume, if not folding directly? I.e., record the largest VF for which uniformity was detected, even if a smaller VF is selected.
Worth optimizing the analysis across VF's, i.e., if a value is known not to be uniform for VF avoid checking for 2*VF? OTOH, if a value is known to be uniform for VF, check only two SCEVs for 2*VF?

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll
79

note: this load from A[iv & -2] is now recognized as uniform across VF=2.

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll
6–7

note: load from A[(iv/2)%3] rightfully not recognized as uniform for VF=8.

296–305

note: this load from A[(iv / 8) % 3] is now recognized as uniform for VF=8.

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll
110

note: this load from A[iv >> 1] is now recognized as uniform for VF=2.

Check that it is not considered uniform for VF=4?

224–225

note: load from A[iv>>2] recognized as uniform for VF=2, should also hold for VF=4.

858–859

note: load from A[1+i>>1] not recognized as uniform for VF=2 due to alignment.

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
148–149

note: load from A[iv/2 + iv2/2] i.e. A[2*(iv/2)] recognized as uniform for VF=2, but should not for VF > 2.

fhahn updated this revision to Diff 526339.May 28 2023, 12:03 PM
fhahn marked 17 inline comments as done.

Address latest comments, thanks!

llvm/include/llvm/Analysis/LoopAccessAnalysis.h
591–594

Clarified, thanks!

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h
351

updated, thanks!

354–359

Updated with similar wording to isUniform, thanks!

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2536

Adjusted the comment, thanks!

Updated name to SCEVAddRecForUniformityRewriter

2554

Expanded the comment, thanks!

2565

Extended message to try to make this clear.

2593

Unfortunately that doesn't work without additional work, as the returned value may be used to construct the parent SCEV but the rewriter.

2603

Unfortunately that doesn't work without additional work, as the returned value may be used to construct the parent SCEV but the rewriter.

2637

Simplified the code, thanks!

2655

Simplified, thanks!

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430

The load from test_base[i/8] could further fold two 'parts' of VF=4 together, as it is uniform across VF=8 / VF=4,UF=2.

I think that would be good as follow-up.

Worth optimizing the analysis across VF's, i.e., if a value is known not to be uniform for VF avoid checking for 2*VF

I was thinking about evaluating something like that as follow-up optimization. WDYT?

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll
79

Added as a comment, thanks!

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll
6–7

Added comment, thanks!

296–305

added comment

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll
110

Add check lines for VF=4 as well separately.

224–225

added comment, thanks!

858–859

Added note, thanks!

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
148–149

added note, thanks!

Ayal added inline comments.May 29 2023, 6:19 AM
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2583–2585
2603

ok, this is fine.

I see SCEVInitRewriter, SCEVPostIncRewriter, SCEVShiftRewriter indeed also record a similar SeenLoopVariantSCEVUnknown/Valid which their rewrite() queries after visit() to return SCEVCouldNotCompute. Worth doing the same, wrapping constructor/visit()/canAnalyze()?

2646

nit: "1 .. FixedVF-1"
nit: "first lane" >> "lane 0"
nit: why "reverse"?

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430

Sure, TODOs can be added to record potential follow-ups.

Note also that uniformity could be improved beyond comparing equal SCEV expressions, by using Divergence Analysis which propagates uniformity also through uniform branches.

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll
1

line dropped intentionally?

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
7

lines changed intentionally?

149–150

lines dropped intentionally?

fhahn updated this revision to Diff 526461.May 29 2023, 11:28 AM
fhahn marked 6 inline comments as done.

Address latest comments, thanks!

fhahn marked an inline comment as done.May 29 2023, 11:29 AM
fhahn added inline comments.
llvm/lib/Analysis/LoopAccessAnalysis.cpp
2583–2585

Merged, thanks!

2603

Updated, thanks!

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430

Added a TOOD, thanks!

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll
1

No that was an accident, added back, thanks!

llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll
7

Those were left over from the patch that added new run lines, removed in fcc135a8d6a7.

149–150

Those were left over from the patch that added new run lines, removed in fcc135a8d6a7.

Ayal accepted this revision.May 29 2023, 1:49 PM

This looks good to me, thanks!
Adding last couple of minor nits.

llvm/lib/Analysis/LoopAccessAnalysis.cpp
2558
2568

Constructor can now also be private.

llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll
430

Thanks! Plus following-up with D151658!

This revision is now accepted and ready to land.May 29 2023, 1:49 PM
Ayal added a comment.May 29 2023, 2:06 PM

BTW, does the VF parameter need to be "optional", or should all callers of Legal/LAI::isUniform[MemOp]() be asked to provide a VF? Otherwise have them call isInvariant() instead. To encourage passing VF where available.

nikic added a comment.May 30 2023, 4:51 AM

What do you think about checking for udiv presence upfront? That seems to eliminate the compile-time impact entirely for me. Something like this:

bool HasUDiv =
    SCEVExprContains(S, [](const SCEV *S) { return isa<SCEVUDivExpr>(S); });
if (!HasUDiv)
  return false;
fhahn updated this revision to Diff 527011.May 31 2023, 6:28 AM
fhahn marked an inline comment as done.

Rebase so this can be applied directly on main and check for UDiv separately as suggested, thanks!

I am planning on landing this soon.

Ayal added a comment.May 31 2023, 6:38 AM

Rebase so this can be applied directly on main and check for UDiv separately as suggested, thanks!

I am planning on landing this soon.

Have rewrite() take care of optimizing the pre-check for UDiv?

Are the std::optional<ElementCount> VF = std::nullopt really needed/desired?

This revision was landed with ongoing or failed builds.May 31 2023, 8:01 AM
This revision was automatically updated to reflect the committed changes.

Rebase so this can be applied directly on main and check for UDiv separately as suggested, thanks!

I am planning on landing this soon.

Have rewrite() take care of optimizing the pre-check for UDiv?

Taken care of in the committed version,

Are the std::optional<ElementCount> VF = std::nullopt really needed/desired?

Will disentangle isInvariant/isUniform separately.

Hi Florian!

This commit triggers failed asserts in my builds, reproduced as below:

$ cat repro.c 
typedef struct {
  int a;
  short b[]
} c;
void d() {
  c *e = d;
  for (int f = 1; f < 56; f++) {
    int g = f * f / 6;
    e->b[g] = f;
  }
}
$ clang -target x86_64-linux-gnu -w -c repro.c -O2
clang: /home/martin/code/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:3674: const llvm::SCEV* llvm::ScalarEvolution::getAddRecExpr(llvm::SmallVectorImpl<const llvm::SCEV*>&, const llvm::Loop*, llvm::SCEV::NoWrapFlags): Assertion `isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!"' failed.

Can you have a look, and revert if it takes a while to get it fixed?

fhahn added a comment.Jun 1 2023, 8:16 AM

Hi Florian!

This commit triggers failed asserts in my builds, reproduced as below:

$ cat repro.c 
typedef struct {
  int a;
  short b[]
} c;
void d() {
  c *e = d;
  for (int f = 1; f < 56; f++) {
    int g = f * f / 6;
    e->b[g] = f;
  }
}
$ clang -target x86_64-linux-gnu -w -c repro.c -O2
clang: /home/martin/code/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:3674: const llvm::SCEV* llvm::ScalarEvolution::getAddRecExpr(llvm::SmallVectorImpl<const llvm::SCEV*>&, const llvm::Loop*, llvm::SCEV::NoWrapFlags): Assertion `isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!"' failed.

Can you have a look, and revert if it takes a while to get it fixed?

Thanks, should be fixed by 3b912e269a52

Hi Florian!

This commit triggers failed asserts in my builds, reproduced as below:

$ cat repro.c 
typedef struct {
  int a;
  short b[]
} c;
void d() {
  c *e = d;
  for (int f = 1; f < 56; f++) {
    int g = f * f / 6;
    e->b[g] = f;
  }
}
$ clang -target x86_64-linux-gnu -w -c repro.c -O2
clang: /home/martin/code/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:3674: const llvm::SCEV* llvm::ScalarEvolution::getAddRecExpr(llvm::SmallVectorImpl<const llvm::SCEV*>&, const llvm::Loop*, llvm::SCEV::NoWrapFlags): Assertion `isLoopInvariant(Operands[i], L) && "SCEVAddRecExpr operand is not loop-invariant!"' failed.

Can you have a look, and revert if it takes a while to get it fixed?

Thanks, should be fixed by 3b912e269a52

Yes, the regression seems to be fixed in the original, non-reduced case now as well. Thanks!