This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Combine small-element shuffles of scalar_to_vector in terms of the wider scalar.
AbandonedPublic

Authored by ab on Apr 7 2015, 5:16 PM.

Details

Summary

Continuing on D8883 and D8884, now that (if?) we decided on the (vector_shuffle (bitcast (scalar_to_vector))) form, we can further try to use the wider shuffle element type that was the input of the scalar_to_vector, if the shuffle mask permits it.

In practice this lets us recognize special patterns (see the DUP testcase) without needing to teach everything to deal with "this might be an i32 DUP shuffle, but is expressed in terms of i8 <0,1,2,3> sequences"

Diff Detail

Event Timeline

ab updated this revision to Diff 23384.Apr 7 2015, 5:16 PM
ab retitled this revision from to [CodeGen] Combine small-element shuffles of scalar_to_vector in terms of the wider scalar..
ab updated this object.
ab edited the test plan for this revision. (Show Details)
ab added reviewers: qcolombet, t.p.northover, spatel.
ab added a subscriber: Unknown Object (MLST).
RKSimon added a subscriber: RKSimon.Apr 8 2015, 5:59 AM

There is already an optimization that attempts to shuffle 'scalar source' input operands into a single BUILD_VECTOR just above this code. Might it make more sense to add the ability to peek through bitcasts there?

ab added a comment.Apr 8 2015, 9:52 AM

I'm not sure: I initially dismissed it because it works on multiple scalar inputs, for which it makes sense to do a BUILD_VECTOR directly. For a single scalar input, I found it more straightforward to shuffle a SCALAR_TO_VECTOR, but I agree it's not very consistent.

Also, I'm not sure what happens to the BUILD_VECTOR later on, but I can imagine it being expanded to the shuffle anyway, so maybe it does make sense to do that instead. Let me look into it!

-Ahmed

In D8885#153311, @ab wrote:

I'm not sure: I initially dismissed it because it works on multiple scalar inputs, for which it makes sense to do a BUILD_VECTOR directly. For a single scalar input, I found it more straightforward to shuffle a SCALAR_TO_VECTOR, but I agree it's not very consistent.

If you look at the early versions of the patch on D8516 we did have a version that would create a SCALAR_TO_VECTOR if only the first scalar was defined, but there wasn't a good use case for it so it got dropped. Readding that path should be trivial.

ab updated this revision to Diff 23457.Apr 8 2015, 5:47 PM

Try combining to BUILD_VECTOR instead, adding to the 2-input scalar shuffle combine.

I didn't look at two bitcasts at the same time because this only fires because of D8884: otherwise, the vector_shuffle combine is way too early (and the operands are legalized to BUILD_VECTOR/SCALAR_TO_VECTOR right after), and either the operands get lowered between combine rounds, or other combines catch this without needing to peak through bitcasts (usually concat_vectors if you have two vector inputs).

This does change one ARM test, where we replace:

vmov r0, r1, d16
vmov.32 d16[0], r0
vmov.32 d16[1], r1
vext.8 q8, q8, q8, #4

with:

vmov r0, r1, d16
vmov.32 d16[0], r1

Which looks ..interesting, but is an improvement nonetheless.

-Ahmed

ab added a comment.Apr 8 2015, 5:50 PM

If you look at the early versions of the patch on D8516 we did have a version that would create a SCALAR_TO_VECTOR if only the first scalar was defined, but there wasn't a good use case for it so it got dropped. Readding that path should be trivial.

I don't think we need that after all; I agree with your comment there that it's fine to "rely on a target's lowering logic to do the right thing."

-Ahmed

ab abandoned this revision.Apr 9 2015, 6:32 PM

I only hit this from the SCALAR_TO_VECTOR generated by D8884. With BUILD_VECTOR, this isn't needed after all.

Or rather, I couldn't tickle this into triggering with D8948. Marking as abandoned, will update if I can come up with an actual testcase (the ARM one is pretty uninteresting IMO, but might be a good start).

-Ahmed