As discussed in D76983, that patch can turn a chain of insert/extract with scalar trunc ops into bitcast+extract, and existing instcombine vector transforms then create a shuffle out of that (see the PhaseOrdering test for an example). Currently, that process requires at least this pass sequence: -instcombine -early-cse -instcombine.
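For illustration, here is a minimal sketch of the kind of pattern involved (this is not the actual PhaseOrdering test; the types and value names are made up for this example):

  ; hypothetical example: truncate each element of a <2 x i32> via scalar ops
  define <2 x i16> @trunc_chain(<2 x i32> %x) {
    %e0 = extractelement <2 x i32> %x, i32 0
    %e1 = extractelement <2 x i32> %x, i32 1
    %t0 = trunc i32 %e0 to i16
    %t1 = trunc i32 %e1 to i16
    %v0 = insertelement <2 x i16> undef, i16 %t0, i32 0
    %v1 = insertelement <2 x i16> %v0, i16 %t1, i32 1
    ret <2 x i16> %v1
  }

After D76983 plus the existing insert/extract folds (assuming a little-endian target), that can end up as a bitcast and a shuffle that selects the low half of each wide element:

  %bc = bitcast <2 x i32> %x to <4 x i16>
  %r  = shufflevector <4 x i16> %bc, <4 x i16> undef, <2 x i32> <i32 0, i32 2>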
Before D76983, the sequence of insert/extract would reach the SLP vectorizer and become a vector trunc there.
Based on a small sampling of public targets/types, converting the shuffle to a trunc is better for codegen in most cases (and a regression of that form is the reason this was noticed). The trunc is clearly better for IR-level analysis as well.
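As a sketch of the fold (again assuming a little-endian layout and the example types above), a shuffle that extracts the low subelement of each wide element of a bitcast source becomes a vector trunc of the original value:

  %bc = bitcast <2 x i32> %x to <4 x i16>
  %r  = shufflevector <4 x i16> %bc, <4 x i16> undef, <2 x i32> <i32 0, i32 2>
  -->
  %r  = trunc <2 x i32> %x to <2 x i16>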
This means that we can induce "spontaneous vectorization" without invoking any explicit vectorizer passes (at the least, a vector cast op may be created out of scalar casts). That seems to be the right choice, though, given that we started with a chain of insert/extract, and the backend would expand back to that chain if a target does not support the op.
i * TruncRatio could in theory overflow for ridiculous types. Maybe consider using int64_t for LSBIndex?