This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Only generate single instr for unpredicated stores of varying value to invariant address
ClosedPublic

Authored by reames on Sep 9 2022, 8:28 AM.

Details

Summary

This extends the previously added uniform store case to handle stores of loop varying values to a loop invariant address. Note that the placement of this code only allows unpredicated stores; this is important for correctness. (That is "IsPredicated" is always false at this point in the function.)

This patch does not include scalable types. The diff felt "large enough" as it were; I'll handle that in a separate patch. (It requires some changes to cost modeling.)

Diff Detail

Event Timeline

reames created this revision.Sep 9 2022, 8:28 AM
reames requested review of this revision.Sep 9 2022, 8:28 AM
reames added inline comments.Sep 9 2022, 8:29 AM
llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll
1088 ↗(On Diff #459077)

This diff confused me at first since it seems to involve tail folding - which would involve predication. However, it looks like this is a case where an existing vectorizer bug causes a loop to be vectorized with a scalar epilogue even when the command line says to tail fold or don't vectorize. So, this isn't actually tail folded codegen.

Ayal added a subscriber: Ayal.Sep 11 2022, 12:34 PM
Ayal added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9669

@reames , @fhahn - rather than detect individual redundant instances at this late stage to suppress their code-gen, would be better to record such uniformity earlier in VPlan - thereby potentially facilitating the detection of additional redundancies, improving cost estimation, and simplifying code-gen?

llvm/test/Transforms/LoopVectorize/AArch64/sve-illegal-type.ll
89

(Just noting that this is a somewhat odd test, original CHECK was missing the final store of element 63 presumably.)

llvm/test/Transforms/LoopVectorize/pr47343-expander-lcssa-after-cfg-update.ll
50

Should this have been caught earlier by IsUniform?

reames added inline comments.Sep 12 2022, 8:04 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9669

Can we not let perfection be the enemy of the good here? I've tried implementing this a couple of different ways, and this is the one which made it through review. If you have a different direction you'd like to see, I'm fine making changes in that direction once we have agreement, but I would *strongly* prefer we don't block forward progress on a rework.

As for your actual suggestion, I would not be opposed to having a true notion of uniformity. However, it's hard to motivate the complexity beyond uniform loads and stores. Most uniform computation will have been hoisted by LICM. You only end up with uniform loads and dependent computations or uniform stores stuck around due to control dependence or aliasing.

I will note that cost model changes have previously been delicate and very tricky to get through review. So unless you're *strongly* in favor of this direction, I'd really rather not.

Ayal accepted this revision.Sep 22 2022, 1:24 AM

This makes good forward progress, adding a couple of nits.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9669

nit: " needs a single copy of the store." >> " needs only the last copy of the store."

9675

nit: another option is to fold this into the loops below by setting their loop bounds to be last-part and last-lane, akin to a recent patch by @fhahn.

This revision is now accepted and ready to land.Sep 22 2022, 1:24 AM
This revision was landed with ongoing or failed builds.Sep 22 2022, 8:54 AM
This revision was automatically updated to reflect the committed changes.