This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Add stride->constant VPlan mapping at construction.
ClosedPublic

Authored by fhahn on Apr 7 2023, 5:02 AM.

Details

Summary

Add constant VPValues for versioned strides on VPlan construction. This
ensures the constant strides will be used directly.

Potential alternative to D147378.

Diff Detail

Event Timeline

fhahn created this revision.Apr 7 2023, 5:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2023, 5:02 AM
fhahn requested review of this revision.Apr 7 2023, 5:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2023, 5:02 AM
peixin added a comment.Apr 7 2023, 6:04 AM

D147378 made changes for 12 lines for strided-accesses.ll.

llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll
300–301

This symbolic stride is still here.

peixin added a comment.Apr 7 2023, 6:11 AM

Found one more problem, the symbolic strides in vector.ph are not replaced, either, although it's negligible.

Ayal added inline comments.Apr 9 2023, 1:30 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8970

nit: can alternatively do

Plan->getOrAddVPValue(Stride);
assert(Plan->getVPValue(Stride)->getLiveInIRValue() == CI &&
       "Adding different constants for the same stride!");

?

llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll
300–301

Right, this seems to catch (++IV)*stride cases but miss IV+=stride ones...

peixin added inline comments.Apr 9 2023, 2:11 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8970

I am thinking if it is really needed? The symblic strides are stored in one ValueToValueMap. It seems to be impossible to may different constants for the same stride.

fhahn updated this revision to Diff 512034.Apr 9 2023, 1:07 PM

Rebase on top of D147892 & D147891 which allows us to catch missed cases in strided-accesses.ll.

fhahn marked 4 inline comments as done.Apr 9 2023, 1:11 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8970

We need to map Stride -> VPValue(CI) so if we want to use getOrAddVPValue we would need to add a variant that takes an IR value and a VPValue.

I am thinking if it is really needed? The symblic strides are stored in one ValueToValueMap. It seems to be impossible to may different constants for the same stride.

The map just stores the IR value that's versioned, but doesn't contain the value it is versioned to (this is only in PSE). At the moment, we only version strides to 1, but in the future this may change and the assert may help to catch inconsistencies.

llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll
300–301

This should be handled in the latest version. The issue was the the code handling offsets here was using the VPExternalDefs mapping. D147892 removes that mapping and updates all code to use the Value2VPValue mapping, catching this case.

The end result is better than just replacing %stride with 1, as we can just use the canonical IV.

peixin added inline comments.Apr 9 2023, 6:21 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8944–8945

Don't get why this cannot capture the cases in preheader?

A couple points here.

  1. The restriction to constant strides is unfortunate. We only speculate on constant equality today, but there are other reasonable speculations we could make. For instance, given a loop with two independent strided IVs, we might speculate they are equal.
  1. This feels somewhat like you're reinventing the wheel. The whole point of PSE is that you can ask it to simplify a SCEV expression. Anywhere we have a SCEV expression, we should probably be using it to optimize the result before generating vectorized code. The stride speculation is only one of multiple assumption types PSE has. I suspect - though don't have a test case - that the overflow flags might also lead to interesting speculations.

On (2), be warned. I am not real comfortable with the PSE code quality. I'm reasonable sure we have bugs lurking that might be exposed if we started using it significantly more widely. Most of our usage in LV today is for generating checks and costing, adding it to the code generation path might cause some fallout. (And this is probably the biggest argument for not using PSE. I'm open to that argument, I'd just like to see it discussed explicitly.)

fhahn updated this revision to Diff 516880.Apr 25 2023, 1:18 PM
fhahn marked 2 inline comments as done.

Rebase and ping :)

A couple points here.

Thanks for taking a look!

  1. The restriction to constant strides is unfortunate. We only speculate on constant equality today, but there are other reasonable speculations we could make. For instance, given a loop with two independent strided IVs, we might speculate they are equal.

There is no need to restrict this to constant strides; if it is not constant (and not SCEVUnknown) then we would need to expand the SCEV, which can be done by introducing a VPExpandSCEVRecipe to take care of that.

  1. This feels somewhat like you're reinventing the wheel. The whole point of PSE is that you can ask it to simplify a SCEV expression. Anywhere we have a SCEV expression, we should probably be using it to optimize the result before generating vectorized code. The stride speculation is only one of multiple assumption types PSE has. I suspect - though don't have a test case - that the overflow flags might also lead to interesting speculations.

We are moving towards modeling SCEV expansion explicitly in VPlan (this is already the case for expanding steps of inductions used in the plan). One the one hand this takes care of the issues we have with querying/expanding SCEVs while we already modified the CFG (and it is in an temporary invalid state). On the other hand this would also make it very easy to re-query PSE for all used SCEVs down the line.

For this particular case, there's no SCEV expressions for the strides if it is used outside of inductions, so taking care of that using a mapping seems good first step to me. WDYT?

fhahn marked an inline comment as done.Apr 25 2023, 1:20 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8944–8945

The issue is that the strides in the preheader come from SCEV expressions, as @reames mentioned and that's not yet captured in the VPlan scope.

There's another patch (D147965) that moves step expansion used in the pre-header to VPlan as well. We should be then able to take care of this here as well.

fhahn updated this revision to Diff 523140.May 17 2023, 12:50 PM
fhahn marked an inline comment as done.

Rebased on top of e41dce4d4974, ping :)

reames requested changes to this revision.May 18 2023, 2:31 PM
reames added inline comments.
llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll
217

You're deleting - not updating - test lines here. I think you may need to split an existing check clause so that a delta can be seen.

734

This deletion looks weird. I don't know a reason that update_test_checks.py would produce this deletion.

This revision now requires changes to proceed.May 18 2023, 2:31 PM
fhahn updated this revision to Diff 523824.May 19 2023, 9:32 AM

Fix test lines.

fhahn marked 2 inline comments as done.May 19 2023, 9:33 AM
fhahn added inline comments.
llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll
217

Hm, not sure what went wrong there. Should be fixed now.

734

Hm, not sure what went wrong there. Should be fixed now.

LGTM. I'm not super familiar with VPlan internals through, so you may want to wait for another reviewer. I'll defer to you on that.

reames added inline comments.May 25 2023, 5:02 PM
llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll
236

Completely unrelated to this, but it looks like we've got opportunity to fold these multiplies now. Is this coming from an IRBuilder or a SCEVExpander? If the former, we may just need to adjust the folder at construction. (Assuming that eager folding doesn't interact badly with something inside LV/FPlan - which you'll have a much better idea than I do.)

Ayal added inline comments.May 29 2023, 12:50 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8971

This is essentially short-circuiting an RAUW. When VPlan will take care of the check for unit stride, only users inside the vector loop (, its preheader, middle block, and epilog vector loop?) will need to be replaced, the compare (and scalar loop) should continue to use the original stride of-course.
Worth setting it up as an RAUW of original stride VPValue with "1" VPValue, after all users of the former have been set up?

llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll
236

Yes, there's room for some constant folding here, and also the add 0. Would be good to ensure LV's cost model for such predicated unit stride accesses is accurate, and potentially apply these foldings in VPlan, independent of foldings at IR generation.

316

nit: such appearances of STRIDE in vector.ph should also be removed once Induction Resume Values are created via recipes, right?

fhahn updated this revision to Diff 530340.Jun 11 2023, 3:02 PM
fhahn marked 2 inline comments as done.

Update to RAUW and remove uneeded hasVPValueFor.

fhahn marked 4 inline comments as done.Jun 11 2023, 3:28 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8971

Updated to do RAUW which means the hasVPValueFor helper isn't needed any longer. This required an update to the VPlan transform optimizing inductions to use the existing step, instead of creating a new one: df357a71dd7d10ee4bdc0bd22e6be048c5ad7088

llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll
236

The redundant add comes from recipe expansion. I put up D152660 and D152659 to simplify during IR construction, as there are multiple places where this should benefit. It would also make sense to do this based on recipes on the future.

316

yes exactly.

Ayal accepted this revision.Jun 12 2023, 8:33 AM

LGTM. I'm not super familiar with VPlan internals through, so you may want to wait for another reviewer. I'll defer to you on that.

Looks good to me too, ship it!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9122

nit: suggests that getSymbolicStrides() should return SCEVUnknowns.

9129

LGTM!

This is an optimization that relies on Value2VPValue mapping, so placed here as late as possible? Maybe worth adding to comment.

Would be good to outline stuff out of the excessive tryToBuildVPlanWithVPRecipes()...

This revision was not accepted when it landed; it landed in state Needs Review.Jun 13 2023, 12:27 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked 3 inline comments as done.

It introduced an assert in csky target, could you please reproduce it? @fhahn
./bin/clang --target=csky-unknown-linux -DNDEBUG -mfloat-abi=hard -O2 -DSMALL_PROBLEM_SIZE -w -Werror=date-time -fcommon -D__USE_LARGEFILE64 -D_FILE_OFFSET_BITS=64 -o erc_do_p.c.o -c ~/llvm-test-suite/MultiSource/Applications/JM/ldecod/erc_do_p.c

fhahn added a comment.Jun 14 2023, 2:26 AM

It introduced an assert in csky target, could you please reproduce it? @fhahn
./bin/clang --target=csky-unknown-linux -DNDEBUG -mfloat-abi=hard -O2 -DSMALL_PROBLEM_SIZE -w -Werror=date-time -fcommon -D__USE_LARGEFILE64 -D_FILE_OFFSET_BITS=64 -o erc_do_p.c.o -c ~/llvm-test-suite/MultiSource/Applications/JM/ldecod/erc_do_p.c

Unfortunately I have no system with a csky libc. Could you share the IR before the crash? (e.g. by passing -mllvm -print-on-crash -mllvm -print-module-scope)

zixuan-wu added a comment.EditedJun 14 2023, 2:35 AM

It introduced an assert in csky target, could you please reproduce it? @fhahn
./bin/clang --target=csky-unknown-linux -DNDEBUG -mfloat-abi=hard -O2 -DSMALL_PROBLEM_SIZE -w -Werror=date-time -fcommon -D__USE_LARGEFILE64 -D_FILE_OFFSET_BITS=64 -o erc_do_p.c.o -c ~/llvm-test-suite/MultiSource/Applications/JM/ldecod/erc_do_p.c

Unfortunately I have no system with a csky libc. Could you share the IR before the crash? (e.g. by passing -mllvm -print-on-crash -mllvm -print-module-scope)

OK, thx.


./bin/clang --target=csky-unknown-linux -DNDEBUG -mfloat-abi=hard -O2 -w -Werror=date-time -fcommon test.ll

fhahn added a comment.Jun 14 2023, 9:21 AM

It introduced an assert in csky target, could you please reproduce it? @fhahn
./bin/clang --target=csky-unknown-linux -DNDEBUG -mfloat-abi=hard -O2 -DSMALL_PROBLEM_SIZE -w -Werror=date-time -fcommon -D__USE_LARGEFILE64 -D_FILE_OFFSET_BITS=64 -o erc_do_p.c.o -c ~/llvm-test-suite/MultiSource/Applications/JM/ldecod/erc_do_p.c

Unfortunately I have no system with a csky libc. Could you share the IR before the crash? (e.g. by passing -mllvm -print-on-crash -mllvm -print-module-scope)

OK, thx.


./bin/clang --target=csky-unknown-linux -DNDEBUG -mfloat-abi=hard -O2 -w -Werror=date-time -fcommon test.ll

Thanks, should be fixed by ea6ca9cb2b6