Page MenuHomePhabricator

[ARM] Match dual lane vmovs from insert_vector_elt

Authored by dmgreen on Dec 3 2020, 1:15 AM.



MVE has a dual lane vector move instruction, capable of moving two general purpose registers into lanes of a vector register. They look like one of:

vmov q0[2], q0[0], r2, r0
vmov q0[3], q0[1], r3, r1

They only accept these lane indices though (and only insert into an i32), either moving lanes 1 and 3, or 0 and 2.

This patch adds some tablegen patterns for them, selecting from vector inserts elements. Because the insert_elements are knows to be canonicalized to ascending order there are several patterns that we need to select. These lane indices are:

3 2 1 0    -> vmovqrr 31; vmovqrr 20
3 2 1      -> vmovqrr 31; vmov 2
3 1        -> vmovqrr 31
2 1 0      -> vmovqrr 20; vmov 1
2 0        -> vmovqrr 20

With the top one being the most common. All other potential patterns of lane indices will be matched by a combination of these and the individual vmov pattern already present. This does mean that we are selecting several machine instructions at once due to the need to re-arrange the inserts, but in this case there is at least nothing else that will attempt to match an insert_vector_elt node.

Diff Detail

Event Timeline

dmgreen created this revision.Dec 3 2020, 1:15 AM
dmgreen requested review of this revision.Dec 3 2020, 1:15 AM

Now that I think about it more, perhaps the X86 code changes should be handled differently. There were three tests that changed, two of which (avx512-insert-extract.ll and haddsub-2.ll) are the same instructions in a different order. The other (avx512-mask-op.ll:test18) has a different series of kshiftl/kshiftr's with a single extra instruction.

Much of the ISD::BUILD_VECTOR creation code tries to use this and relies on sorted insertions chains - would it be better just to add a TLI virtual method that disables this at the target level for specific valuetypes?

Hmm. Yeah there might be some code that relies on this but isn't tested. It was added in, but that didn't have any tests itself.

I could probably just do this entirely in tablegen. There are only actually 5 patterns when you get down to it, so long as you know the nodes are in-order.. It's not very pretty, but I'll look into doing it that way.

dmgreen updated this revision to Diff 309540.Dec 4 2020, 8:25 AM
dmgreen retitled this revision from [ARM][X86] Match dual lane vmovs from insert_vector_elt to [ARM] Match dual lane vmovs from insert_vector_elt.
dmgreen edited the summary of this revision. (Show Details)

Now just using tablegen patterns.

LGTM (with one minor) but an ARM guru should really approve it.


Missing assert message.

SjoerdMeijer accepted this revision.Dec 9 2020, 6:10 AM

LGTM (with one minor) but an ARM guru should really approve it.

This puts some pressure on things ;-) .... but this looks like nice codegen changes, so LGTM too.

This revision is now accepted and ready to land.Dec 9 2020, 6:10 AM
This revision was automatically updated to reflect the committed changes.