This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Add a combine for a shuffle of similar bitcasts
ClosedPublic

Authored by sanwou01 on Feb 24 2021, 9:22 AM.

Details

Summary

Some intrinsics wrapper code has the habit of ignoring the type of the
elements in vectors, thinking of vector registers as a "bag of bits". As
a consequence, some operations are shared between vectors of different
types are shared. For example, functions that rearrange elements in a
vector can be shared between vectors of int32 and float.

This can result in bitcasts in awkward places that prevent the backend
from recognizing some instructions. For AArch64 in particular, it
inhibits the selection of dup from a general purpose register (GPR), and
mov from GPR to a vector lane.

This patch adds a pattern in InstCombine to move the bitcasts past the
shufflevector if this is possible. Sometimes this even allows
InstCombine to remove the bitcast entirely, as in the included tests.

Alternatively this could be done with a few extra patterns in the
AArch64 backend, but InstCombine seems like a better place for this.

Diff Detail

Event Timeline

sanwou01 created this revision.Feb 24 2021, 9:22 AM
sanwou01 requested review of this revision.Feb 24 2021, 9:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2021, 9:22 AM
lebedev.ri added a subscriber: lebedev.ri.
lebedev.ri added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2297

One of the original bitcasts needs to be one-use, else we'll increase instruction count.

2299

Can we do anything for the case where we have element count mismatches?

2301

Whoops, phab did something weird

sanwou01 added inline comments.Feb 24 2021, 9:43 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2297

Good point! I can bail out if neither is one-use.

2299

Yes, I'm being fairly conservative here. I think the following example would be legal to transform similarly. Is that what you had in mind?

%0 = bitcast <4 x i32> %a to <4 x float>
%1 = bitcast <2 x i32> %b to <2 x float>
%2 = shufflevector <4 x float> %0, <2 x float> %1, <2 x i32> <i32 3, i32 5>
2301

Nice, thanks!

lebedev.ri added inline comments.Feb 24 2021, 9:46 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2299

I actually don't have anything particular in mind, just asking.
That being said, i think both operands of a shufflevector must have the same type
(including vector element count), so i'm not sure that example works?

We intentionally do not create new shuffle masks in instcombine because we can't guarantee that codegen can lower arbitrary masks efficiently, but this patch seems fine since it just re-uses the existing mask.
If there is motivation to handle casts of different-sized elements (and therefore requires a new mask), you might look at building on VectorCombine::foldBitcastShuf(). We use the cost model there to avoid creating unsupported shuffles.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2299

It should be ok to handle a length-changing shuffle, but we need to add type checks to make it safe.
Note that as-is this patch doesn't have the right combination - this crashes:

define <4 x double> @bc_shuf(<4 x i32> %x, <4 x i32> %y) {
  %xb = bitcast <4 x i32> %x to <2 x double>
  %yb = bitcast <4 x i32> %y to <2 x double>
  %r = shufflevector <2 x double> %xb, <2 x double> %yb, <4 x i32> <i32 1, i32 3, i32 0, i32 2>
  ret <4 x double> %r
}

So definitely add some negative tests to make sure we don't break things.

We intentionally do not create new shuffle masks in instcombine because we can't guarantee that codegen can lower arbitrary masks efficiently, but this patch seems fine since it just re-uses the existing mask.
If there is motivation to handle casts of different-sized elements (and therefore requires a new mask), you might look at building on VectorCombine::foldBitcastShuf(). We use the cost model there to avoid creating unsupported shuffles.

I don't have a motivation to handle bitcasts that change the element size (or scalar to vector, for that matter), so I'd prefer to keep this simple.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2299

Good catch, I've added tests for length-changing shuffles (which as you point out we can handle without changing the mask) and some negative tests for bitcasts that change the element size.

sanwou01 marked an inline comment as done.Mar 1 2021, 5:07 AM
sanwou01 updated this revision to Diff 327083.Mar 1 2021, 5:12 AM

Address comments.

sanwou01 retitled this revision from [InstCombine] Add a combine for a shuffle of identical bitcasts to [InstCombine] Add a combine for a shuffle of similar bitcasts.Mar 1 2021, 5:12 AM

I'd like to see some test improvements:

  1. There are no tests with extra uses on bitcasts
  2. There are no tests with something like
%xb = bitcast <2 x half> %x to <2 x i16>
%yb = bitcast <2 x bfloat> %y to <2 x i16>
%r = shufflevector <2 x i16> %xb, <2 x i16> %yb, <4 x i16> <i16 3, i16 2, i16 1, i16 0>

I suspect this will miscompile?

  1. Some tests have unneeded stuff. They should only contain 2 bitcasts and a shufflevector (and a ret), nothing more.

I'd like to see some test improvements:

  1. There are no tests with extra uses on bitcasts
  2. There are no tests with something like
%xb = bitcast <2 x half> %x to <2 x i16>
%yb = bitcast <2 x bfloat> %y to <2 x i16>
%r = shufflevector <2 x i16> %xb, <2 x i16> %yb, <4 x i16> <i16 3, i16 2, i16 1, i16 0>

I suspect this will miscompile?

  1. Some tests have unneeded stuff. They should only contain 2 bitcasts and a shufflevector (and a ret), nothing more.

Thanks for the review, @lebedev.ri !

  1. Fair enough, I'll add some tests for that.
  2. I'll add this as a test, but I don't think it miscompiles: I do check that the input types of the two bitcasts are identical.
  3. Can do. The first two tests were derived from my motivating examples, but I can simplify them some more as you point out, thanks.
sanwou01 updated this revision to Diff 327126.Mar 1 2021, 8:27 AM

Add a few more tests and some further test reduction.

@spatel @lebedev.ri thanks for the comments so far. Any other comments, or is this okay as is?

This revision is now accepted and ready to land.Mar 5 2021, 8:50 AM
spatel accepted this revision.Mar 5 2021, 9:46 AM

LGTM

This revision was landed with ongoing or failed builds.Mar 8 2021, 8:37 AM
This revision was automatically updated to reflect the committed changes.