Page MenuHomePhabricator

[ARM] Split 128-bit vectors in BUILD_VECTOR lowering

Authored by efriedma on Dec 9 2016, 11:25 AM.



Given that INSERT_VECTOR_ELT operates on D registers anyway, combining 64-bit vectors into a 128-bit vector is basically free. Therefore, try to split BUILD_VECTOR nodes before giving up and lowering them to a series of INSERT_VECTOR_ELT instructions. Sometimes this allows dramatically better lowerings; see testcases for examples. Inspired by similar code in the x86 backend for AVX.

For the @vcombine_vdup, I'm not happy with the DAGCombine transforms which produce a BUILD_VECTOR in the first place; we're taking splat shuffles which were carefully preserved in the IR, and destroying them in DAGCombine by transforming concat_vec(splat(a), splat(a)) -> concat_vec(build_vector(a,a,a,a), build_vector(b,b,b,b)) -> build_vector(a,a,a,a,b,b,b,b). Maybe we could improve this somehow?

Diff Detail


Event Timeline

efriedma updated this revision to Diff 80918.Dec 9 2016, 11:25 AM
efriedma retitled this revision from to [ARM] Split 128-bit vectors in BUILD_VECTOR lowering.
efriedma updated this object.
efriedma added reviewers: ab, t.p.northover, jmolloy.
efriedma set the repository for this revision to rL LLVM.

Hi Eli,

This change looks good to me. I just have a small comment inline.


6232 ↗(On Diff #80918)

I'm assuming LowerBUILD_VECTOR can return SDNode() if there's no need for it, thus the concat only happening if both were lowered.

Is this is your strategy around "we might discover a better way to lower it"?

How can we avoid cases where it doesn't?

efriedma added inline comments.Dec 14 2016, 10:40 AM
6232 ↗(On Diff #80918)

LowerBUILD_VECTOR can return SDNode() in some cases... I think in practice, though, it only fails for splats and constants.

Mostly, I'm just depending on the fact that the worst case isn't any worse than what we would do anyway; a series of INSERT_VECTOR_ELT operations on two 64-bit vectors is roughly equivalent to a series of INSERT_VECTOR_ELT operations on one 128-bit vector.

rengolin accepted this revision.Dec 14 2016, 11:20 AM
rengolin added a reviewer: rengolin.

LGTM. Thanks!

6232 ↗(On Diff #80918)

Makes sense.

This revision is now accepted and ready to land.Dec 14 2016, 11:20 AM
This revision was automatically updated to reflect the committed changes.