This is an archive of the discontinued LLVM Phabricator instance.

Differential D97397

[InstCombine] Add a combine for a shuffle of similar bitcasts
ClosedPublic

Authored by sanwou01 on Feb 24 2021, 9:22 AM.

Details

Reviewers

lebedev.ri
spatel
dmgreen
SjoerdMeijer
fhahn

Commits

rG05a6e2eb9a41: [InstCombine] Add a combine for a shuffle of similar bitcasts

Summary

Some intrinsics wrapper code has the habit of ignoring the type of the
elements in vectors, thinking of vector registers as a "bag of bits". As
a consequence, some operations are shared between vectors of different
types. For example, functions that rearrange elements in a
vector can be shared between vectors of int32 and float.

This can result in bitcasts in awkward places that prevent the backend
from recognizing some instructions. For AArch64 in particular, it
inhibits the selection of dup from a general purpose register (GPR), and
mov from GPR to a vector lane.

This patch adds a pattern in InstCombine to move the bitcasts past the
shufflevector if this is possible. Sometimes this even allows
InstCombine to remove the bitcast entirely, as in the included tests.
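
As a rough illustration of why the bitcasts can be moved past the shuffle, the sketch below models vectors as plain arrays. It is not LLVM code: `bitcastToFloat`, `shuffle`, `sameBits` and `foldPreservesBits` are hypothetical helpers, and the sketch assumes a 32-bit `float` and a constant, in-range mask. Because the casts leave the lane size unchanged, shuffling first and casting afterwards should yield the same bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical model of `bitcast <N x i32> %v to <N x float>`:
// every 32-bit lane is reinterpreted, no bits change.
std::vector<float> bitcastToFloat(const std::vector<std::uint32_t> &V) {
  std::vector<float> Out(V.size());
  if (!V.empty())
    std::memcpy(Out.data(), V.data(), V.size() * sizeof(std::uint32_t));
  return Out;
}

// Hypothetical model of shufflevector with a constant, in-range mask:
// indices below A.size() pick lanes of A, the rest pick lanes of B.
template <typename T>
std::vector<T> shuffle(const std::vector<T> &A, const std::vector<T> &B,
                       const std::vector<int> &Mask) {
  std::vector<T> Out;
  Out.reserve(Mask.size());
  for (int Idx : Mask) {
    std::size_t I = static_cast<std::size_t>(Idx);
    Out.push_back(I < A.size() ? A[I] : B[I - A.size()]);
  }
  return Out;
}

// Compare two float vectors by bit pattern rather than by value.
bool sameBits(const std::vector<float> &L, const std::vector<float> &R) {
  return L.size() == R.size() &&
         (L.empty() ||
          std::memcmp(L.data(), R.data(), L.size() * sizeof(float)) == 0);
}

// shuffle (bitcast X), (bitcast Y), Mask  vs.  bitcast (shuffle X, Y, Mask)
bool foldPreservesBits(const std::vector<std::uint32_t> &X,
                       const std::vector<std::uint32_t> &Y,
                       const std::vector<int> &Mask) {
  return sameBits(shuffle(bitcastToFloat(X), bitcastToFloat(Y), Mask),
                  bitcastToFloat(shuffle(X, Y, Mask)));
}
```

Calling `foldPreservesBits` with two 2-lane inputs and the mask `{3, 2, 1, 0}` mirrors the length-changing test in this revision; the mask is reused unchanged on both sides.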

Alternatively this could be done with a few extra patterns in the
AArch64 backend, but InstCombine seems like a better place for this.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sanwou01 created this revision. Feb 24 2021, 9:22 AM

Herald added subscribers: hiraditya, kristof.beyls. Feb 24 2021, 9:22 AM

sanwou01 requested review of this revision. Feb 24 2021, 9:22 AM

Herald added a project: Restricted Project. Feb 24 2021, 9:22 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B90628: Diff 326116. Feb 24 2021, 9:23 AM

sanwou01 added reviewers: spatel, dmgreen, SjoerdMeijer, fhahn. Feb 24 2021, 9:24 AM

lebedev.ri edited reviewers, added: lebedev.ri; removed: spatel, dmgreen, SjoerdMeijer, fhahn. Feb 24 2021, 9:32 AM

lebedev.ri added a subscriber: lebedev.ri.

lebedev.ri added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2297	One of the original bitcasts needs to be one-use, else we'll increase instruction count.
2299	Can we do anything for the case where we have element count mismatches?
2301

Whoops, phab did something weird

sanwou01 added inline comments. Feb 24 2021, 9:43 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2297	Good point! I can bail out if neither is one-use.
2299	Yes, I'm being fairly conservative here. I think the following example would be legal to transform similarly. Is that what you had in mind?
	%0 = bitcast <4 x i32> %a to <4 x float>
	%1 = bitcast <2 x i32> %b to <2 x float>
	%2 = shufflevector <4 x float> %0, <2 x float> %1, <2 x i32> <i32 3, i32 5>
2301	Nice, thanks!
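
The one-use point above can be checked with simple counting. The sketch below is a hypothetical model, not code from the patch: it assumes two distinct bitcast instructions feed a single shufflevector, and that a bitcast with no other users is deleted once the shuffle is rewritten.

```cpp
#include <cassert>

// Before the fold: two bitcasts plus one shufflevector.
unsigned instructionsBefore() { return 3; }

// After the fold: one new shufflevector plus one new bitcast, and every
// original bitcast that still has other users stays alive.
unsigned instructionsAfter(bool LHSHasOneUse, bool RHSHasOneUse) {
  unsigned Count = 2;
  if (!LHSHasOneUse)
    ++Count;
  if (!RHSHasOneUse)
    ++Count;
  return Count;
}

// The bail-out discussed in the thread: fold only if at least one of the
// original bitcasts has a single use.
bool foldAllowed(bool LHSHasOneUse, bool RHSHasOneUse) {
  return LHSHasOneUse || RHSHasOneUse;
}
```

Under these assumptions the count only grows (from 3 to 4) when neither bitcast is one-use, which is exactly the case the bail-out rejects.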

lebedev.ri added inline comments. Feb 24 2021, 9:46 AM

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2299	I actually don't have anything particular in mind, just asking. That being said, i think both operands of a `shufflevector` must have the same type (including vector element count), so i'm not sure that example works?

We intentionally do not create new shuffle masks in instcombine because we can't guarantee that codegen can lower arbitrary masks efficiently, but this patch seems fine since it just re-uses the existing mask.
If there is motivation to handle casts of different-sized elements (and therefore requires a new mask), you might look at building on VectorCombine::foldBitcastShuf(). We use the cost model there to avoid creating unsupported shuffles.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2299	It should be ok to handle a length-changing shuffle, but we need to add type checks to make it safe. Note that as-is this patch doesn't have the right combination - this crashes:
	define <4 x double> @bc_shuf(<4 x i32> %x, <4 x i32> %y) {
	  %xb = bitcast <4 x i32> %x to <2 x double>
	  %yb = bitcast <4 x i32> %y to <2 x double>
	  %r = shufflevector <2 x double> %xb, <2 x double> %yb, <4 x i32> <i32 1, i32 3, i32 0, i32 2>
	  ret <4 x double> %r
	}
	So definitely add some negative tests to make sure we don't break things.

In D97397#2585516, @spatel wrote:

We intentionally do not create new shuffle masks in instcombine because we can't guarantee that codegen can lower arbitrary masks efficiently, but this patch seems fine since it just re-uses the existing mask.
If there is motivation to handle casts of different-sized elements (and therefore requires a new mask), you might look at building on VectorCombine::foldBitcastShuf(). We use the cost model there to avoid creating unsupported shuffles.

I don't have a motivation to handle bitcasts that change the element size (or scalar to vector, for that matter), so I'd prefer to keep this simple.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
2299	Good catch, I've added tests for length-changing shuffles (which as you point out we can handle without changing the mask) and some negative tests for bitcasts that change the element size.

sanwou01 marked an inline comment as done. Mar 1 2021, 5:07 AM

Address comments.

sanwou01 retitled this revision from [InstCombine] Add a combine for a shuffle of identical bitcasts to [InstCombine] Add a combine for a shuffle of similar bitcasts. Mar 1 2021, 5:12 AM

Harbormaster completed remote builds in B91316: Diff 327083. Mar 1 2021, 5:12 AM

I'd like to see some test improvements:

There are no tests with extra uses on bitcasts
There are no tests with something like

%xb = bitcast <2 x half> %x to <2 x i16>
%yb = bitcast <2 x bfloat> %y to <2 x i16>
%r = shufflevector <2 x i16> %xb, <2 x i16> %yb, <4 x i16> <i16 3, i16 2, i16 1, i16 0>

I suspect this will miscompile?

Some tests have unneeded stuff. They should only contain 2 bitcasts and a shufflevector (and a ret), nothing more.

In D97397#2594000, @lebedev.ri wrote:
I'd like to see some test improvements:

There are no tests with extra uses on bitcasts

There are no tests with something like
%xb = bitcast <2 x half> %x to <2 x i16>
%yb = bitcast <2 x bfloat> %y to <2 x i16>
%r = shufflevector <2 x i16> %xb, <2 x i16> %yb, <4 x i16> <i16 3, i16 2, i16 1, i16 0>
I suspect this will miscompile?

Some tests have unneeded stuff. They should only contain 2 bitcasts and a shufflevector (and a ret), nothing more.

Thanks for the review, @lebedev.ri !

Fair enough, I'll add some tests for that.
I'll add this as a test, but I don't think it miscompiles: I do check that the input types of the two bitcasts are identical.
Can do. The first two tests were derived from my motivating examples, but I can simplify them some more as you point out, thanks.
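
The type checks discussed in this thread can be summarised in a small stand-alone predicate. This is only a sketch of the condition as it reads in the final diff: `VecTy`, `sameType` and `canDistributeBitcasts` are hypothetical names, the model covers fixed vector types only, and it is not the InstCombine implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a fixed vector type such as <4 x i32>.
struct VecTy {
  std::string Elt;   // element type name, e.g. "i32" or "float"
  unsigned EltBits;  // element size in bits
  unsigned NumElts;  // number of elements
};

bool sameType(const VecTy &A, const VecTy &B) {
  return A.Elt == B.Elt && A.EltBits == B.EltBits && A.NumElts == B.NumElts;
}

// X and Y are the bitcast sources, Result is the type of the shuffle.
// The fold is modelled as legal when:
//   * both sources have the same vector type,
//   * the casts do not change the element size, so the mask can be reused,
//   * at least one of the two bitcasts has a single use.
bool canDistributeBitcasts(const VecTy &X, const VecTy &Y, const VecTy &Result,
                           bool LHSHasOneUse, bool RHSHasOneUse) {
  return sameType(X, Y) && X.EltBits == Result.EltBits &&
         (LHSHasOneUse || RHSHasOneUse);
}
```

In this model the `<2 x half>` / `<2 x bfloat>` case is rejected because the source types differ, and the `<4 x i32>` to `<2 x double>` case is rejected because the element size changes, while a length-changing shuffle of same-sized elements is accepted.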

Add a few more tests and some further test reduction.

Harbormaster completed remote builds in B91345: Diff 327126. Mar 1 2021, 8:27 AM

@spatel @lebedev.ri thanks for the comments so far. Any other comments, or is this okay as is?

LGTM

This revision is now accepted and ready to land. Mar 5 2021, 8:50 AM

LGTM

This revision was landed with ongoing or failed builds. Mar 8 2021, 8:37 AM

Closed by commit rG05a6e2eb9a41: [InstCombine] Add a combine for a shuffle of similar bitcasts (authored by sanwou01).

This revision was automatically updated to reflect the committed changes.

sanwou01 added a commit: rG05a6e2eb9a41: [InstCombine] Add a combine for a shuffle of similar bitcasts.

Revision Contents

Path                                                        Size
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp    20 lines
llvm/test/Transforms/InstCombine/shuffle-cast-dist.ll       21 lines

Diff 327126

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

[...]
    if (auto *V = SimplifyShuffleVectorInst(LHS, RHS, SVI.getShuffleMask(),
      return replaceInstUsesWith(SVI, V);

    // Bail out for scalable vectors
    if (isa<ScalableVectorType>(LHS->getType()))
      return nullptr;

    unsigned VWidth = cast<FixedVectorType>(SVI.getType())->getNumElements();
    unsigned LHSWidth = cast<FixedVectorType>(LHS->getType())->getNumElements();

+   // shuffle (bitcast X), (bitcast Y), Mask --> bitcast (shuffle X, Y, Mask)
+   // if X and Y are of the same (vector) type, and the element size is not
+   // changed by the bitcasts, we can distribute the bitcasts through the
+   // shuffle, hopefully reducing the number of instructions. We make sure that
+   // at least one bitcast only has one use, so we don't *increase* the number of
+   // instructions here.
+   Value *X, *Y;
+   if (match(LHS, m_BitCast(m_Value(X))) && match(RHS, m_BitCast(m_Value(Y))) &&
+       X->getType()->isVectorTy() && X->getType() == Y->getType() &&
+       X->getType()->getScalarSizeInBits() ==
+           SVI.getType()->getScalarSizeInBits() &&
+       (LHS->hasOneUse() || RHS->hasOneUse())) {
+     Value *V = Builder.CreateShuffleVector(X, Y, SVI.getShuffleMask(),
+                                            SVI.getName() + ".uncasted");
+     return new BitCastInst(V, SVI.getType());
+   }

    ArrayRef<int> Mask = SVI.getShuffleMask();
    Type *Int32Ty = Type::getInt32Ty(SVI.getContext());

    // Peek through a bitcasted shuffle operand by scaling the mask. If the
    // simulated shuffle can simplify, then this shuffle is unnecessary:
    // shuf (bitcast X), undef, Mask --> bitcast X'
    // TODO: This could be extended to allow length-changing shuffles.
    // The transform might also be obsoleted if we allowed canonicalization
    // of bitcasted shuffles.
-   Value *X;
    if (match(LHS, m_BitCast(m_Value(X))) && match(RHS, m_Undef()) &&
        X->getType()->isVectorTy() && VWidth == LHSWidth) {
      // Try to create a scaled mask constant.
      auto *XType = cast<FixedVectorType>(X->getType());
      unsigned XNumElts = XType->getNumElements();
      SmallVector<int, 16> ScaledMask;
      if (XNumElts >= VWidth) {
        assert(XNumElts % VWidth == 0 && "Unexpected vector bitcast");
[...]

Inline suggestion (line 2301):

lebedev.ri:
  X->getType() == Y->getType()) {
- Value *V = Builder.CreateShuffleVector(X, Y, SVI.getShuffleMask());
+ Value *V = Builder.CreateShuffleVector(X, Y, SVI.getShuffleMask(), SVI.getName()+".uncasted");
  return new BitCastInst(V, SVI.getType());

sanwou01: Nice, thanks!

llvm/test/Transforms/InstCombine/shuffle-cast-dist.ll

  ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
  ; RUN: opt < %s -instcombine -S | FileCheck %s

  define <2 x float> @vtrn1(<2 x i32> %v)
  ; CHECK-LABEL: @vtrn1(
  ; CHECK-NEXT: entry:
- ; CHECK-NEXT: [[VB1:%.*]] = bitcast <2 x i32> [[V:%.*]] to <2 x float>
- ; CHECK-NEXT: [[VB2:%.*]] = bitcast <2 x i32> [[V]] to <2 x float>
- ; CHECK-NEXT: [[R:%.*]] = shufflevector <2 x float> [[VB1]], <2 x float> [[VB2]], <2 x i32> <i32 0, i32 2>
+ ; CHECK-NEXT: [[R_UNCASTED:%.*]] = shufflevector <2 x i32> [[V:%.*]], <2 x i32> undef, <2 x i32> zeroinitializer
+ ; CHECK-NEXT: [[R:%.*]] = bitcast <2 x i32> [[R_UNCASTED]] to <2 x float>
  ; CHECK-NEXT: ret <2 x float> [[R]]
  ;
  {
  entry:
    %vb1 = bitcast <2 x i32> %v to <2 x float>
    %vb2 = bitcast <2 x i32> %v to <2 x float>
    %r = shufflevector <2 x float> %vb1, <2 x float> %vb2, <2 x i32> <i32 0, i32 2>
    ret <2 x float> %r
  }

  define <2 x float> @vtrn2(<2 x i32> %x, <2 x i32> %y) {
  ; CHECK-LABEL: @vtrn2(
  ; CHECK-NEXT: entry:
- ; CHECK-NEXT: [[XB:%.*]] = bitcast <2 x i32> [[X:%.*]] to <2 x float>
- ; CHECK-NEXT: [[YB:%.*]] = bitcast <2 x i32> [[Y:%.*]] to <2 x float>
- ; CHECK-NEXT: [[R:%.*]] = shufflevector <2 x float> [[XB]], <2 x float> [[YB]], <2 x i32> <i32 1, i32 3>
+ ; CHECK-NEXT: [[R_UNCASTED:%.*]] = shufflevector <2 x i32> [[X:%.*]], <2 x i32> [[Y:%.*]], <2 x i32> <i32 1, i32 3>
+ ; CHECK-NEXT: [[R:%.*]] = bitcast <2 x i32> [[R_UNCASTED]] to <2 x float>
  ; CHECK-NEXT: ret <2 x float> [[R]]
  ;
  entry:
    %xb = bitcast <2 x i32> %x to <2 x float>
    %yb = bitcast <2 x i32> %y to <2 x float>
    %r = shufflevector <2 x float> %xb, <2 x float> %yb, <2 x i32> <i32 1, i32 3>
    ret <2 x float> %r
  }

  define <4 x float> @bc_shuf_lenchange(<2 x i32> %x, <2 x i32> %y) {
  ; CHECK-LABEL: @bc_shuf_lenchange(
- ; CHECK-NEXT: [[XB:%.*]] = bitcast <2 x i32> [[X:%.*]] to <2 x float>
- ; CHECK-NEXT: [[YB:%.*]] = bitcast <2 x i32> [[Y:%.*]] to <2 x float>
- ; CHECK-NEXT: [[R:%.*]] = shufflevector <2 x float> [[XB]], <2 x float> [[YB]], <4 x i32> <i32 3, i32 2, i32 1, i32 0>
+ ; CHECK-NEXT: [[R_UNCASTED:%.*]] = shufflevector <2 x i32> [[X:%.*]], <2 x i32> [[Y:%.*]], <4 x i32> <i32 3, i32 2, i32 1, i32 0>
+ ; CHECK-NEXT: [[R:%.*]] = bitcast <4 x i32> [[R_UNCASTED]] to <4 x float>
  ; CHECK-NEXT: ret <4 x float> [[R]]
  ;
    %xb = bitcast <2 x i32> %x to <2 x float>
    %yb = bitcast <2 x i32> %y to <2 x float>
    %r = shufflevector <2 x float> %xb, <2 x float> %yb, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
    ret <4 x float> %r
  }

[...]
  ;
    %xb = bitcast <4 x i32> %x to <4 x float>
    %r = shufflevector <4 x float> %xb, <4 x float> %xb, <2 x i32> <i32 0, i32 4>
    ret <2 x float> %r
  }

  define <4 x float> @bc_shuf_y_hasoneuse(<4 x i32> %x, <4 x i32> %y){
  ; CHECK-LABEL: @bc_shuf_y_hasoneuse(
  ; CHECK-NEXT: [[XB:%.*]] = bitcast <4 x i32> [[X:%.*]] to <4 x float>
- ; CHECK-NEXT: [[YB:%.*]] = bitcast <4 x i32> [[Y:%.*]] to <4 x float>
- ; CHECK-NEXT: [[SHUF:%.*]] = shufflevector <4 x float> [[XB]], <4 x float> [[YB]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
- ; CHECK-NEXT: [[R:%.*]] = fadd <4 x float> [[SHUF]], [[XB]]
+ ; CHECK-NEXT: [[SHUF_UNCASTED:%.*]] = shufflevector <4 x i32> [[X]], <4 x i32> [[Y:%.*]], <4 x i32> <i32 0, i32 1, i32 4, i32 5>
+ ; CHECK-NEXT: [[SHUF:%.*]] = bitcast <4 x i32> [[SHUF_UNCASTED]] to <4 x float>
+ ; CHECK-NEXT: [[R:%.*]] = fadd <4 x float> [[XB]], [[SHUF]]
  ; CHECK-NEXT: ret <4 x float> [[R]]
  ;
    %xb = bitcast <4 x i32> %x to <4 x float>
    %yb = bitcast <4 x i32> %y to <4 x float>
    %shuf = shufflevector <4 x float> %xb, <4 x float> %yb, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
    %r = fadd <4 x float> %xb, %shuf
    ret <4 x float> %r
  }
[...]
