This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Use DUP for BUILD_VECTOR with few different elements.
ClosedPublic

Authored by fhahn on Oct 27 2020, 8:05 AM.

Details

Summary

If most elements of BUILD_VECTOR are the same, with a few different
elements, it is better to use DUP for the common elements and
INSERT_VECTOR_ELT for the different elements.

Currently this transform is guarded quite restrictively to only trigger
in clearly beneficial cases.

With D90176, the lowering for patterns originating from code like
float32x4_t y = {a,a,a,0}; (common in 3D apps) are lowered even
better (unnecessary fmov is removed).

Diff Detail

Event Timeline

fhahn created this revision.Oct 27 2020, 8:05 AM
fhahn requested review of this revision.Oct 27 2020, 8:05 AM
efriedma added inline comments.Oct 27 2020, 12:13 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9157

I think NumElts >= 4 is redundant? If NumElts == 2, there is no value of NumDifferentLanes that makes this pass.

llvm/test/CodeGen/AArch64/arm64-vector-insertion.ll
41

The test coverage here seems a bit lacking.

dmgreen added inline comments.Oct 28 2020, 12:20 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9193

This can be removed.

9205

Could this just create a DUP? Or is this so that it does not have to figure out what kind of DUP/DUPLANE to use?

llvm/test/CodeGen/AArch64/arm64-vector-insertion.ll
33

Comment can be updated too

fhahn updated this revision to Diff 301293.Oct 28 2020, 8:18 AM

Remove unnecessary check, spurious comment and fixed TODO, thanks!

fhahn marked an inline comment as done.Oct 28 2020, 8:21 AM
fhahn added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
9205

We could, but I am not sure how much extra logic would be required to handle constant and non-constant splats and other cases optimized by LowerBUILD_VECTOR. This seems to also be in-line with other parts of this function that also delegate back to LowerBUILD_VECTOR.

llvm/test/CodeGen/AArch64/arm64-vector-insertion.ll
33

yes, removed!

41

I added more positive and negative tests in 5febc535c4f8.

efriedma accepted this revision.Oct 28 2020, 11:18 AM

Maybe add a testcase for -0 in case someone else makes the same mistake? Otherwise LGTM

This revision is now accepted and ready to land.Oct 28 2020, 11:18 AM

Maybe add a testcase for -0 in case someone else makes the same mistake?

Err, I guess the -0 thing was the other patch; nevermind. ;)

This revision was landed with ongoing or failed builds.Oct 28 2020, 12:52 PM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Oct 28 2020, 2:53 PM

Thank you very much for the review!