This is an archive of the discontinued LLVM Phabricator instance.

Differential D48485

[InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806)
ClosedPublic

Authored by spatel on Jun 22 2018, 7:04 AM.

Details

Reviewers

RKSimon
lebedev.ri
efriedma

Commits

rG57bda365bfce: [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806)
rL335888: [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806)

Summary

This is an enhancement to D48401 that was discussed in:
https://bugs.llvm.org/show_bug.cgi?id=37806

We can convert a shift-left-by-constant into a multiply (we canonicalize IR in the other direction because that's generally better of course). This allows us to remove the shuffle as we do in the regular opcodes-are-the-same cases.
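As a minimal sketch of the identity this relies on, the following models 32-bit wrapping arithmetic with plain unsigned integers. The helper names (`shlBy`, `mulBy`, `shiftAmountToMultiplier`) are hypothetical and are not LLVM APIs; the shift amount is assumed to be less than the bit width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not LLVM APIs) modelling wrapping i32 arithmetic.
// The shift amount C is assumed to be in the range [0, 31].
constexpr uint32_t shlBy(uint32_t X, unsigned C) { return X << C; }
constexpr uint32_t mulBy(uint32_t X, uint32_t M) { return X * M; }

// shl X, C --> mul X, (1 << C)
constexpr uint32_t shiftAmountToMultiplier(unsigned C) {
  return uint32_t{1} << C;
}

// Both forms agree, including when the high bits wrap away.
static_assert(shlBy(7u, 5) == mulBy(7u, shiftAmountToMultiplier(5)), "");
static_assert(shlBy(0xFFFFFFFFu, 31) ==
                  mulBy(0xFFFFFFFFu, shiftAmountToMultiplier(31)),
              "");
```

Once both lanes use the same opcode, only the constant operands differ, which is what lets the shuffle be folded into the constant.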

This requires a small hack to make sure we don't mistakenly introduce any extra poison:
https://rise4fun.com/Alive/ZGv
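One way to see the hazard, assuming the usual reading of `nsw` (the result is poison when the exact result does not fit in the signed type), is to model an 8-bit case where the shift amount is BitWidth - 1. The functions below are a hypothetical model written for illustration, not LLVM code, and they do not reproduce the Alive proof linked above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical i8 model of 'nsw' (not an LLVM API): the operation is poison
// when the mathematically exact result does not fit in [-128, 127].
constexpr bool fitsInI8(int32_t V) { return V >= -128 && V <= 127; }

constexpr bool shlNswIsPoison(int8_t X, unsigned C) {
  return !fitsInI8(int32_t(X) * (int32_t(1) << C));
}

constexpr bool mulNswIsPoison(int8_t X, int8_t M) {
  return !fitsInI8(int32_t(X) * int32_t(M));
}

// 1 << C as an i8 constant; for C == 7 the bit pattern 0x80 is -128.
constexpr int8_t multiplierFor(unsigned C) {
  return C == 7 ? int8_t(-128) : int8_t(1 << C);
}

// shl nsw i8 -1, 7 yields -128, which fits: not poison.
static_assert(!shlNswIsPoison(-1, 7), "");
// mul nsw i8 -1, -128 would be +128, which does not fit: poison.
static_assert(mulNswIsPoison(-1, multiplierFor(7)), "");
```

In this model the converted multiply is poison where the original shift was not, so keeping `nsw` on the new `mul` would be unsafe for that shift amount. Smaller shift amounts do not hit this case, which is consistent with the TODO in the patch about checking each vector element.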

The other examples of opcodes where this would work are add+sub and fadd+fsub, but we already canonicalize those subs into adds, so there's nothing to do for those cases AFAICT. Are there other opcode pairs where we can do this kind of transform?

Note that there's a different fold needed if we've already managed to simplify away a binop as seen in the test based on PR37806, but we manage to get that one case here because the fold is positioned above the demanded elements fold currently.
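The overall fold described in this summary can be sketched lane by lane. The model below uses the constants and mask from the `mul_shl` test in this revision; the `Vec4` type and the `shuffle`, `mul`, and `shl` helpers are hypothetical stand-ins for the IR operations, and flags such as `nuw` are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// Hypothetical model of a 4-lane shufflevector: a mask index below 4 picks a
// lane from A, an index of 4..7 picks lane (index - 4) from B.
Vec4 shuffle(const Vec4 &A, const Vec4 &B, const std::array<int, 4> &Mask) {
  Vec4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = Mask[I] < 4 ? A[Mask[I]] : B[Mask[I] - 4];
  return R;
}

Vec4 mul(const Vec4 &X, const Vec4 &C) {
  Vec4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = X[I] * C[I]; // wrapping multiply
  return R;
}

Vec4 shl(const Vec4 &X, const Vec4 &C) {
  Vec4 R{};
  for (int I = 0; I < 4; ++I)
    R[I] = X[I] << C[I]; // shift amounts are assumed to be < 32
  return R;
}

// Before: shuffle (mul X, <1,2,3,4>), (shl X, <5,6,7,8>), <4,5,2,3>
Vec4 beforeFold(const Vec4 &X) {
  const Vec4 C0 = {1, 2, 3, 4};
  const Vec4 C1 = {5, 6, 7, 8};
  const std::array<int, 4> Mask = {4, 5, 2, 3};
  return shuffle(mul(X, C0), shl(X, C1), Mask);
}

// After: mul X, <32, 64, 3, 4>  (lanes 0 and 1 are 1 << 5 and 1 << 6)
Vec4 afterFold(const Vec4 &X) {
  const Vec4 NewC = {32, 64, 3, 4};
  return mul(X, NewC);
}
```

For any input vector, `beforeFold(X) == afterFold(X)` should hold in this model, since each output lane is the same product either way.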

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Jun 22 2018, 7:04 AM

Herald added a subscriber: mcrosier. Jun 22 2018, 7:04 AM

Oops - just noticed a typo that makes this patch wrong.

This revision is now accepted and ready to land. Jun 22 2018, 7:11 AM

spatel planned changes to this revision. Jun 22 2018, 7:11 AM

Patch updated:
In the last rev, I forgot to remove the use of the original opcode when we create the new binop.
So we could have shifts when we should have multiplies (and the tests showed that).

This revision is now accepted and ready to land. Jun 22 2018, 7:18 AM

spatel requested review of this revision. Jun 22 2018, 7:19 AM

RKSimon added inline comments. Jun 22 2018, 7:39 AM

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1177 ↗	(On Diff #152473)	Is this going to scale well? There's likely to be a lot of 'similar' cases (ADD x,x -> SHL x,1 etc.)

spatel added inline comments. Jun 22 2018, 9:22 AM

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1177 ↗	(On Diff #152473)	That's what I wasn't sure about. I was guessing that add/sub was the common case, and we already canonicalize those. Can you list others? We can make some kind of map if there are a lot, but each case requires its own constant adjustment, so we'd end up with a switch I think.

RKSimon added inline comments. Jun 22 2018, 9:58 AM

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1177 ↗	(On Diff #152473)	ADD x,x -> SHL x,1 (or MUL x, 2)
AND x,0 -> MUL x,0 might happen (not sure - it probably disappears too early)
Also, merging OR x, c1 and ADD x, c2 if the carry bits don't clash (sorry, I've forgotten what this is called....) - similarly for OR x,c1 and XOR x,c2
UDIV and LSHR maybe (tricky.....)

spatel added inline comments. Jun 22 2018, 10:13 AM

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1177 ↗	(On Diff #152473)	ADD x, x is...different. We won't have a constant operand. Not sure yet what that patch looks like. The cases that simplify (only 1 binop) will be handled in another patch. Cases where we reverse a logic op into add (no common bits set?) - yes, that should slot in here. Same with udiv (I'm assuming that's a rare case though).

Is this waiting for a review, or are there changes planned?

The logic here seems sound.

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1202–1203 ↗	(On Diff #152473)	This is going to be confusing later on, given that we already have `Opc0` and `Opc1`.

In D48485#1144839, @lebedev.ri wrote:

Is this waiting for a review, or are there changes planned?

It's waiting for further review. (I screwed up the Phab state/history by clicking the wrong 'Add Action...' when I made the first revision.)
As Simon noted, there are other opcode pairs that we can handle. I think they can be added individually as follow-up patches, and the code will evolve into different shapes as needed.

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1202–1203 ↗	(On Diff #152473)	Is it just the abbreviated variable naming (damn that 80-col limit!)? If so, I could spell out 'Opcode' vs. 'Operand'. Or rearrange the logic some way?

In D48485#1144839, @lebedev.ri wrote:

Is this waiting for a review, or are there changes planned?

The logic here seems sound.

My concern is how much this will get refactored as more cases detailed in PR37806 are added

lebedev.ri added inline comments. Jun 27 2018, 5:55 AM

lib/Transforms/InstCombine/InstCombineVectorOps.cpp
1202–1203 ↗	(On Diff #152473)	> damn that 80-col limit!
It's actually great :)
> If so, I could spell out 'Opcode' vs. 'Operand'. Or rearrange the logic some way?
It is perfectly clear that `Opc0` means `op code 0`. What I was talking about is that we have two of them, and yet use the same one in both cases. So maybe doing `assert(Opc0 == Opc1); unsigned Opc = Opc0;` and using it would be cleanest.

Patch updated:
This is just the readability improvement suggested by Roman - make it clear that the opcodes are the same when we do the transform.

The implementation could be substantially different to handle other opcodes. For example, we'll need to call value tracking to determine when 'or' can become 'add'.
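To illustrate the condition mentioned here, 'or' and 'add' give the same result when the two operands have no set bits in common, because no carries can occur. This is a minimal sketch on concrete values; `haveNoCommonBits` and `orAsAdd` are hypothetical names for this example. In the compiler the variable operand is not a known constant, so the check would have to come from an analysis of known bits rather than a direct test like this one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for this sketch: true when X and C share no set bits.
constexpr bool haveNoCommonBits(uint32_t X, uint32_t C) {
  return (X & C) == 0;
}

// With no common bits there are no carries, so X | C equals X + C.
constexpr uint32_t orAsAdd(uint32_t X, uint32_t C) { return X + C; }

// Disjoint bits: the two operations agree.
static_assert(haveNoCommonBits(0xF0u, 0x0Fu), "");
static_assert((0xF0u | 0x0Fu) == orAsAdd(0xF0u, 0x0Fu), "");

// Overlapping bits: the rewrite would change the result.
static_assert(!haveNoCommonBits(0x3u, 0x1u), "");
static_assert((0x3u | 0x1u) != orAsAdd(0x3u, 0x1u), "");
```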

Not sure how far we'll go in that direction, but I'll post a proposal that includes the add/or case as a separate patch, and we can decide if we want to build it up in pieces or add the generalization to allow more folds first.

spatel mentioned this in D48662: [InstCombine] reverse canonicalization of binops to allow more shuffle folding. Jun 27 2018, 10:42 AM

LGTM

This revision is now accepted and ready to land. Jun 28 2018, 8:12 AM

Closed by commit rL335888: [InstCombine] allow shl+mul combos with shuffle (select) fold (PR37806) (authored by spatel). Jun 28 2018, 10:52 AM

This revision was automatically updated to reflect the committed changes.

spatel mentioned this in D48678: [InstCombine] enhance shuffle-of-binops to allow different variable ops (PR37806). Jun 28 2018, 2:27 PM

spatel mentioned this in D48830: [InstCombine] fold shuffle-with-binop and common value. Jul 2 2018, 6:59 AM

spatel mentioned this in rL336128: [InstCombine] reverse canonicalization of add --> or to allow more shuffle…. Jul 2 2018, 10:47 AM

spatel mentioned this in rL336196: [InstCombine] fold shuffle-with-binop and common value. Jul 3 2018, 6:49 AM

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/InstCombine/InstCombineVectorOps.cpp	34 lines
llvm/trunk/test/Transforms/InstCombine/shuffle_select.ll	16 lines

Diff 153354

llvm/trunk/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

[...]
   if (match(B0, m_BinOp(m_Value(X), m_Constant(C0))) &&
       match(B1, m_BinOp(m_Specific(X), m_Constant(C1))))
     ConstantsAreOp1 = true;
   else if (match(B0, m_BinOp(m_Constant(C0), m_Value(X))) &&
            match(B1, m_BinOp(m_Constant(C1), m_Specific(X))))
     ConstantsAreOp1 = false;
   else
     return nullptr;

-  // TODO: There are potential folds where the opcodes do not match (mul+shl).
-  if (B0->getOpcode() != B1->getOpcode())
+  // We need matching binops to fold the lanes together.
+  BinaryOperator::BinaryOps Opc0 = B0->getOpcode();
+  BinaryOperator::BinaryOps Opc1 = B1->getOpcode();
+  bool DropNSW = false;
+  if (ConstantsAreOp1 && Opc0 != Opc1) {
+    // If we have multiply and shift-left-by-constant, convert the shift:
+    // shl X, C --> mul X, 1 << C
+    // TODO: We drop "nsw" if shift is converted into multiply because it may
+    // not be correct when the shift amount is BitWidth - 1. We could examine
+    // each vector element to determine if it is safe to keep that flag.
+    if (Opc0 == Instruction::Mul && Opc1 == Instruction::Shl) {
+      C1 = ConstantExpr::getShl(ConstantInt::get(C1->getType(), 1), C1);
+      Opc1 = Instruction::Mul;
+      DropNSW = true;
+    } else if (Opc0 == Instruction::Shl && Opc1 == Instruction::Mul) {
+      C0 = ConstantExpr::getShl(ConstantInt::get(C0->getType(), 1), C0);
+      Opc0 = Instruction::Mul;
+      DropNSW = true;
+    }
+  }
+
+  if (Opc0 != Opc1)
     return nullptr;

+  // The opcodes must be the same. Use a new name to make that clear.
+  BinaryOperator::BinaryOps BOpc = Opc0;
+
   // Remove a binop and the shuffle by rearranging the constant:
   // shuffle (op X, C0), (op X, C1), M --> op X, C'
   // shuffle (op C0, X), (op C1, X), M --> op C', X
   Constant *NewC = ConstantExpr::getShuffleVector(C0, C1, Shuf.getMask());

   // If the shuffle mask contains undef elements, then the new constant
   // vector will have undefs in those lanes. This could cause the entire
   // binop to be undef.
   if (B0->isIntDivRem())
     NewC = getSafeVectorConstantForIntDivRem(NewC);

-  BinaryOperator::BinaryOps Opc = B0->getOpcode();
-  Instruction *NewBO = ConstantsAreOp1 ? BinaryOperator::Create(Opc, X, NewC) :
-                                         BinaryOperator::Create(Opc, NewC, X);
+  Instruction *NewBO = ConstantsAreOp1 ? BinaryOperator::Create(BOpc, X, NewC) :
+                                         BinaryOperator::Create(BOpc, NewC, X);

   // Flags are intersected from the 2 source binops.
   NewBO->copyIRFlags(B0);
   NewBO->andIRFlags(B1);
+  if (DropNSW)
+    NewBO->setHasNoSignedWrap(false);
   return NewBO;
 }

 Instruction *InstCombiner::visitShuffleVectorInst(ShuffleVectorInst &SVI) {
   Value *LHS = SVI.getOperand(0);
   Value *RHS = SVI.getOperand(1);
   SmallVector<int, 16> Mask = SVI.getShuffleMask();
   Type *Int32Ty = Type::getInt32Ty(SVI.getContext());
[...]

llvm/trunk/test/Transforms/InstCombine/shuffle_select.ll

[...]
 ; CHECK-NEXT: ret <4 x double> [[T3]]
 ;
   %t1 = fdiv <4 x double> <double 1.0, double 2.0, double 3.0, double 4.0>, %v0
   %t2 = fdiv <4 x double> %v1, <double 5.0, double 6.0, double 7.0, double 8.0>
   %t3 = shufflevector <4 x double> %t1, <4 x double> %t2, <4 x i32> <i32 0, i32 1, i32 6, i32 7>
   ret <4 x double> %t3
 }

-; FIXME:
 ; Shift-left with constant shift amount can be converted to mul to enable the fold.

 define <4 x i32> @mul_shl(<4 x i32> %v0) {
 ; CHECK-LABEL: @mul_shl(
-; CHECK-NEXT: [[T1:%.*]] = mul nuw <4 x i32> [[V0:%.*]], <i32 undef, i32 undef, i32 3, i32 4>
-; CHECK-NEXT: [[T2:%.*]] = shl nuw <4 x i32> [[V0]], <i32 5, i32 6, i32 7, i32 8>
-; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
+; CHECK-NEXT: [[T3:%.*]] = mul nuw <4 x i32> [[V0:%.*]], <i32 32, i32 64, i32 3, i32 4>
 ; CHECK-NEXT: ret <4 x i32> [[T3]]
 ;
   %t1 = mul nuw <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>
   %t2 = shl nuw <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>
   %t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 4, i32 5, i32 2, i32 3>
   ret <4 x i32> %t3
 }

+; Try with shift as operand 0 of the shuffle; 'nsw' is dropped for safety, but that could be improved.
+
 define <4 x i32> @shl_mul(<4 x i32> %v0) {
 ; CHECK-LABEL: @shl_mul(
-; CHECK-NEXT: [[T1:%.*]] = shl nsw <4 x i32> [[V0:%.*]], <i32 1, i32 2, i32 3, i32 4>
-; CHECK-NEXT: [[T2:%.*]] = mul nsw <4 x i32> [[V0]], <i32 5, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 4, i32 undef, i32 2, i32 3>
+; CHECK-NEXT: [[T3:%.*]] = mul <4 x i32> [[V0:%.*]], <i32 5, i32 undef, i32 8, i32 16>
 ; CHECK-NEXT: ret <4 x i32> [[T3]]
 ;
   %t1 = shl nsw <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>
   %t2 = mul nsw <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>
   %t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 4, i32 undef, i32 2, i32 3>
   ret <4 x i32> %t3
 }

 ; PR37806 - https://bugs.llvm.org/show_bug.cgi?id=37806
 ; Demanded elements + simplification can remove the mul alone, but that's not the best case.

 define <4 x i32> @mul_is_nop_shl(<4 x i32> %v0) {
 ; CHECK-LABEL: @mul_is_nop_shl(
-; CHECK-NEXT: [[T2:%.*]] = shl <4 x i32> [[V0:%.*]], <i32 5, i32 6, i32 7, i32 8>
-; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[V0]], <4 x i32> [[T2]], <4 x i32> <i32 0, i32 5, i32 6, i32 7>
+; CHECK-NEXT: [[T3:%.*]] = shl <4 x i32> [[V0:%.*]], <i32 0, i32 6, i32 7, i32 8>
 ; CHECK-NEXT: ret <4 x i32> [[T3]]
 ;
   %t1 = mul <4 x i32> %v0, <i32 1, i32 2, i32 3, i32 4>
   %t2 = shl <4 x i32> %v0, <i32 5, i32 6, i32 7, i32 8>
   %t3 = shufflevector <4 x i32> %t1, <4 x i32> %t2, <4 x i32> <i32 0, i32 5, i32 6, i32 7>
   ret <4 x i32> %t3
 }

+; Negative test: shift amount (operand 1) must be constant.
+
 define <4 x i32> @shl_mul_not_constant_shift_amount(<4 x i32> %v0) {
 ; CHECK-LABEL: @shl_mul_not_constant_shift_amount(
 ; CHECK-NEXT: [[T1:%.*]] = shl <4 x i32> <i32 1, i32 2, i32 3, i32 4>, [[V0:%.*]]
 ; CHECK-NEXT: [[T2:%.*]] = mul <4 x i32> [[V0]], <i32 5, i32 6, i32 undef, i32 undef>
 ; CHECK-NEXT: [[T3:%.*]] = shufflevector <4 x i32> [[T1]], <4 x i32> [[T2]], <4 x i32> <i32 4, i32 5, i32 2, i32 3>
 ; CHECK-NEXT: ret <4 x i32> [[T3]]
 ;
   %t1 = shl <4 x i32> <i32 1, i32 2, i32 3, i32 4>, %v0
[...]