
Details

Reviewers

spatel
delena
mkuper
hfinkel

Commits

rG8f509a7044e5: [X86] - Catch extra combine opportunities for redundant imuls.
rL251028: [X86] - Catch extra combine opportunities for redundant imuls.

Summary

When we fold (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2), we bail if (add x, c1) has multiple users, because the fold would leave an extra add instruction. In such cases, this patch adds a check to see whether we can eliminate a multiply instruction in exchange for the extra add. combine-multiplies.ll illustrates this issue in a little more detail.
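As a rough illustration of the trade described in the summary, the sketch below compares the two shapes in plain C++ with wrapping 32-bit arithmetic. It follows the naming in the patch's own comment, (A + c1) * c3 and (A + c2) * c3; the `Pair`, `unfolded`, and `folded` names are hypothetical and are not part of the patch.

```cpp
#include <cstdint>

struct Pair {
  uint32_t first;
  uint32_t second;
};

// Unfolded shape: each add feeds its own multiply by c3,
// so two run-time multiplies are needed.
Pair unfolded(uint32_t a, uint32_t c1, uint32_t c2, uint32_t c3) {
  return {(a + c1) * c3, (a + c2) * c3};
}

// Folded shape: both results share the single product a * c3.
// With constant c1, c2, c3 the products c1 * c3 and c2 * c3 can be
// computed at compile time, leaving one multiply and two adds.
Pair folded(uint32_t a, uint32_t c1, uint32_t c2, uint32_t c3) {
  uint32_t common = a * c3;
  return {common + c1 * c3, common + c2 * c3};
}
```

Because unsigned arithmetic wraps modulo 2^32 and multiplication distributes over addition in that ring, both functions return the same values for every input, including ones that overflow.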

Diff Detail

Event Timeline

zansari updated this revision to Diff 37382. Oct 14 2015, 1:28 PM

zansari retitled this revision to Catch combine opportunities for redundant imuls.

zansari updated this object.

zansari added reviewers: mkuper, delena, spatel, hfinkel.

zansari added a subscriber: llvm-commits.

I agree with this change. I think it should be profitable also for FP operations.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2219	You can use isConstOrConstSplat().

spatel added inline comments. Oct 15 2015, 8:57 AM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2158–2160	Please move this into a helper function or two. That should reduce the indentation, the parameters can take on the new variable names, and the high-level comments can apply to a whole function.
test/CodeGen/X86/combine-multiplies.ll
2	I didn't step through the debug output, but I see that this optimization doesn't fire for x86-64. Did you look into that case?
25	I would prefer to see more of the correct output here rather than a CHECK-NOT. Eg:
CHECK: imull $400, %ecx, %edx # imm = 0x190
CHECK-NEXT: leal (%edx,%eax), %esi
CHECK-NEXT: movl $11, 2020(%esi,%ecx,4)
CHECK-NEXT: movl $22, 2080(%edx,%eax)
CHECK-NEXT: movl $33, 10080(%edx,%eax)
You can commit this test case and check for the current codegen before proceeding with this patch; that will make the functional difference from the new combine clearer.
43	Remove unnecessary attributes.
45	Looks like the test case text was duplicated when you created the diff. Should there be a vector version of the test case?
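The displacements 2020, 2080 and 10080 in the suggested CHECK lines can be reproduced from the address arithmetic of the test's C source (`int a[100][100]` indexing with `LOC = lll + 5`). The sketch below assumes a 4-byte `int`, so one row is 400 bytes; `offsetOf` and `foldedOffset` are hypothetical helper names used only for this check.

```cpp
#include <cstdint>

constexpr int64_t kElemBytes = 4;               // assumed sizeof(int)
constexpr int64_t kRowBytes = 100 * kElemBytes; // one row of int[100] = 400

// Byte offset of a[row][col] from the base pointer, computed directly.
constexpr int64_t offsetOf(int64_t row, int64_t col) {
  return row * kRowBytes + col * kElemBytes;
}

// Folded form: the shared product 400 * lll, an optional index scaled by 4
// (the %ecx,4 part of the first store), and a constant displacement.
constexpr int64_t foldedOffset(int64_t lll, int64_t scaledIndex, int64_t disp) {
  return kRowBytes * lll + kElemBytes * scaledIndex + disp;
}

// Spot check with lll = 3, so LOC = 8.
static_assert(offsetOf(8, 8) == foldedOffset(3, 3, 2020), "a[LOC][LOC]");
static_assert(offsetOf(8, 20) == foldedOffset(3, 0, 2080), "a[LOC][20]");
static_assert(offsetOf(28, 20) == foldedOffset(3, 0, 10080), "a[LOC+20][20]");
```

All three stores reuse the same `400 * lll` term, which is why a single `imull $400` is enough once the adds of 5 and 25 have been folded into the displacements.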

New patch with changes made based on code review comments. Factored out profitability code, and beefed up lit test.

Thanks Elena and Sanjay for the review and comments.

I liked the refactoring comment, and I made those changes. Looks much nicer now. Thanks.
The code doesn't kick in for 64bit stuff, due to intermediate extends (coincidentally, given your recent patch :) ). That's why I didn't include 64bit in the lit test. It might be worth looking into this later to see if anything can be done.
I beefed up the lit test, as you suggested, but I generalized it a little, since I'm not a fan of relying too much on register allocation and symbol table layout.
I have no idea what happened to the lit test to duplicate the code. Sorry about that.
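To illustrate why an intermediate extend can get in the way, the sketch below shows that sign extension does not commute with a wrapping 32-bit add in general, so a combine cannot simply look through the extend unless the add is known not to overflow. This is only a general arithmetic observation, not a description of what the combiner actually checks; the function names are hypothetical.

```cpp
#include <cstdint>

// sext(add x, c): a wrapping 32-bit add, then sign extension to 64 bits.
// The unsigned round trip makes the wrap-around explicit (the narrowing
// conversion back to int32_t is modular on mainstream compilers and is
// guaranteed to be so from C++20 on).
int64_t sextOfAdd(int32_t x, int32_t c) {
  uint32_t sum = static_cast<uint32_t>(x) + static_cast<uint32_t>(c);
  return static_cast<int64_t>(static_cast<int32_t>(sum));
}

// add(sext x, sext c): extend both operands first, then add in 64 bits.
int64_t addOfSext(int32_t x, int32_t c) {
  return static_cast<int64_t>(x) + static_cast<int64_t>(c);
}
```

For small values the two agree, but for `x = INT32_MAX` and `c = 5` the first wraps to a negative number while the second does not.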

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2221	Will this replace both isConstantSplatVector and isa<Constan...> ? Or just the latter? I had a hard time trying to figure out the differences between the former and isConstOrConstSplat to see if they can both be replaced.

RKSimon added a subscriber: RKSimon. Oct 15 2015, 3:30 PM

RKSimon added inline comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2221	You should be able to replace both with isConstOrConstSplat calls. In fact I think you could use isConstantIntBuildVectorOrConstantInt instead and match all constants not just those with a splatted value. This would definitely need a vector test case though.

Thanks, Simon... Let me work on that, and hopefully that will also answer Sanjay's last question regarding needing a vector version in the lit test, which I didn't comment on earlier.

Zia.

In D13740#268529, @zansari wrote:

Thanks, Simon... Let me work on that, and hopefully that will also answer Sanjay's last question regarding needing a vector version in the lit test, which I didn't comment on earlier.

Thanks for the updates, Zia. Yes, we should have a vector test since this code is designed to handle vectors. Alternatively, you could bail out on vectors in this patch, add a TODO comment for that case, and make vector handling a small follow-on patch.

And yes, I'd certainly like to make this work for 64-bit too and see if D13757 will help in that case.

New patch includes changes to use "isConstOrConstSplat" to consolidate the 2 constant checks, as suggested.

Also, as suggested, I added a test to make sure we catch the vector multiply by constant case. I verified that before this patch we generate 4 pmuludqs, compared to just 2 now (i.e. we eliminate an extra vector multiply after the patch).

Thanks,
Zia.

Before:

movdqa  .LCPI1_0, %xmm1         # xmm1 = [11,11,11,11]
paddd   %xmm0, %xmm1
movdqa  .LCPI1_1, %xmm2         # xmm2 = [22,22,22,22]
pshufd  $245, %xmm1, %xmm3      # xmm3 = xmm1[1,1,3,3]
movdqa  %xmm1, x
pmuludq %xmm2, %xmm1
pshufd  $232, %xmm1, %xmm1      # xmm1 = xmm1[0,2,2,3]
pmuludq %xmm2, %xmm3
pshufd  $232, %xmm3, %xmm3      # xmm3 = xmm3[0,2,2,3]
punpckldq       %xmm3, %xmm1    # xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]
movdqa  %xmm1, v2
pshufd  $245, %xmm0, %xmm1      # xmm1 = xmm0[1,1,3,3]
pmuludq %xmm2, %xmm0
pshufd  $232, %xmm0, %xmm0      # xmm0 = xmm0[0,2,2,3]
pmuludq %xmm2, %xmm1
pshufd  $232, %xmm1, %xmm1      # xmm1 = xmm1[0,2,2,3]
punpckldq       %xmm1, %xmm0    # xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
paddd   .LCPI1_2, %xmm0
movdqa  %xmm0, v3
retl

After:

movdqa  .LCPI1_0, %xmm1         # xmm1 = [11,11,11,11]
paddd   %xmm0, %xmm1
movdqa  .LCPI1_1, %xmm2         # xmm2 = [22,22,22,22]
pshufd  $245, %xmm0, %xmm3      # xmm3 = xmm0[1,1,3,3]
pmuludq %xmm2, %xmm0
pshufd  $232, %xmm0, %xmm0      # xmm0 = xmm0[0,2,2,3]
pmuludq %xmm2, %xmm3
pshufd  $232, %xmm3, %xmm2      # xmm2 = xmm3[0,2,2,3]
punpckldq       %xmm2, %xmm0    # xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
movdqa  .LCPI1_2, %xmm2         # xmm2 = [242,242,242,242]
paddd   %xmm0, %xmm2
movdqa  %xmm2, v2
paddd   .LCPI1_3, %xmm0
movdqa  %xmm0, v3
movdqa  %xmm1, x
retl
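The constants in the After listing follow from folding each add into the multiply: 11 * 22 = 242, and 33 * 22 = 726 (the value that the test's `[[C726]]` name suggests for `.LCPI1_3`, which the listing does not annotate). The sketch below models both listings lane by lane with wrapping 32-bit arithmetic; `V4`, `Results`, `before`, and `after` are hypothetical names, and the code models the arithmetic only, not the SSE2 instruction sequence.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

V4 splat(uint32_t v) { return {v, v, v, v}; }

V4 add(const V4 &a, const V4 &b) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
  return r;
}

V4 mul(const V4 &a, const V4 &b) {
  V4 r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] * b[i];
  return r;
}

struct Results {
  V4 v2, v3, x;
};

// Before: two adds, each followed by its own vector multiply.
Results before(const V4 &v1) {
  V4 x = add(v1, splat(11));
  return {mul(x, splat(22)), mul(add(v1, splat(33)), splat(22)), x};
}

// After: one shared vector multiply, then adds of the folded constants.
Results after(const V4 &v1) {
  V4 common = mul(v1, splat(22));
  return {add(common, splat(242)), add(common, splat(726)),
          add(v1, splat(11))};
}
```

Both functions produce identical `v2`, `v3`, and `x` for any input, while `after` performs one vector multiply instead of two, which matches the drop from 4 to 2 `pmuludq` instructions reported above.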

Thanks, Zia. Sadly, the scalar case won't fire on x86-64 even after r250560 because of the sexts, but that can be a follow-on patch.

Just a couple of nitpicks, otherwise LGTM.

I think Simon was concerned with the non-splat vector case though, so I'll let him give this another look if that scenario needs a different test case.

test/CodeGen/X86/combine-multiplies.ll
93	multiple -> multiply
120	Would a "-mattr=sse2" on the RUN line constrain this enough? I prefer to specify necessary attributes rather than CPU models as proxies for those attributes.

Thanks, Sanjay.

I've incorporated your comments.

Also, apologies to Simon as I seem to have missed the point about "non-splatted" cases. I made the suggested change to use the more general vector constant check (these were never being caught before due to a conservative constant check which I had to change), and I also added another test case to check for the non-splatted cases which verifies the desired elimination of the extra two multiplies.

Thanks, again, and I appreciate your patience with me as I'm slowly learning llvm code and process.
Zia.

The vector tests look fine to me, although please can you give them more descriptive names than 'foo'?

Thanks, Simon.. Will do.

Closed by commit rL251028: [X86] - Catch extra combine opportunities for redundant imuls. (authored by zansari). Oct 22 2015, 9:16 AM

This revision was automatically updated to reflect the committed changes.

Diff 37906

lib/CodeGen/SelectionDAG/DAGCombiner.cpp


struct MemOpLink {
LSBaseSDNode *MemNode;		LSBaseSDNode *MemNode;
// Offset from the base ptr.		// Offset from the base ptr.
int64_t OffsetFromBase;		int64_t OffsetFromBase;
// What is the sequence number of this mem node.		// What is the sequence number of this mem node.
// Lowest mem operand in the DAG starts at zero.		// Lowest mem operand in the DAG starts at zero.
unsigned SequenceNum;		unsigned SequenceNum;
};		};

		/// This is a helper function for visitMUL to check the profitability
		/// of folding (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2).
		/// MulNode is the original multiply, AddNode is (add x, c1),
		/// and ConstNode is c2.
		bool isMulAddWithConstProfitable(SDNode *MulNode,
		SDValue &AddNode,
		SDValue &ConstNode);

/// This is a helper function for MergeStoresOfConstantsOrVecElts. Returns a		/// This is a helper function for MergeStoresOfConstantsOrVecElts. Returns a
/// constant build_vector of the stored constant values in Stores.		/// constant build_vector of the stored constant values in Stores.
SDValue getMergedConstantVectorStore(SelectionDAG &DAG,		SDValue getMergedConstantVectorStore(SelectionDAG &DAG,
SDLoc SL,		SDLoc SL,
ArrayRef<MemOpLink> Stores,		ArrayRef<MemOpLink> Stores,
SmallVectorImpl<SDValue> &Chains,		SmallVectorImpl<SDValue> &Chains,
EVT Ty) const;		EVT Ty) const;

if (Sh.getNode()) {
SDValue Mul = DAG.getNode(ISD::MUL, SDLoc(N), VT,		SDValue Mul = DAG.getNode(ISD::MUL, SDLoc(N), VT,
Sh.getOperand(0), Y);		Sh.getOperand(0), Y);
return DAG.getNode(ISD::SHL, SDLoc(N), VT,		return DAG.getNode(ISD::SHL, SDLoc(N), VT,
Mul, Sh.getOperand(1));		Mul, Sh.getOperand(1));
}		}
}		}

// fold (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2)		// fold (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2)
if (N1IsConst && N0.getOpcode() == ISD::ADD && N0.getNode()->hasOneUse() &&		if (isConstantIntBuildVectorOrConstantInt(N1) &&
(isConstantSplatVector(N0.getOperand(1).getNode(), Val) ||		N0.getOpcode() == ISD::ADD &&
isa<ConstantSDNode>(N0.getOperand(1))))		isConstantIntBuildVectorOrConstantInt(N0.getOperand(1)) &&
		isMulAddWithConstProfitable(N, N0, N1))
return DAG.getNode(ISD::ADD, SDLoc(N), VT,		return DAG.getNode(ISD::ADD, SDLoc(N), VT,
DAG.getNode(ISD::MUL, SDLoc(N0), VT,		DAG.getNode(ISD::MUL, SDLoc(N0), VT,
N0.getOperand(0), N1),		N0.getOperand(0), N1),
DAG.getNode(ISD::MUL, SDLoc(N1), VT,		DAG.getNode(ISD::MUL, SDLoc(N1), VT,
N0.getOperand(1), N1));		N0.getOperand(1), N1));

// reassociate mul		// reassociate mul
if (SDValue RMUL = ReassociateOps(ISD::MUL, SDLoc(N), N0, N1))		if (SDValue RMUL = ReassociateOps(ISD::MUL, SDLoc(N), N0, N1))
return RMUL;		return RMUL;

return SDValue();		return SDValue();
}		}

if (N1C && !N1C->isNullValue() && !N1C->isOpaque() &&
(N1C->getAPIntValue().isPowerOf2() ||		(N1C->getAPIntValue().isPowerOf2() ||
(-N1C->getAPIntValue()).isPowerOf2())) {		(-N1C->getAPIntValue()).isPowerOf2())) {
// Target-specific implementation of sdiv x, pow2.		// Target-specific implementation of sdiv x, pow2.
if (SDValue Res = BuildSDIVPow2(N))		if (SDValue Res = BuildSDIVPow2(N))
return Res;		return Res;

unsigned lg2 = N1C->getAPIntValue().countTrailingZeros();		unsigned lg2 = N1C->getAPIntValue().countTrailingZeros();
SDLoc DL(N);		SDLoc DL(N);

// Splat the sign bit into the register		// Splat the sign bit into the register
SDValue SGN =		SDValue SGN =
DAG.getNode(ISD::SRA, DL, VT, N0,		DAG.getNode(ISD::SRA, DL, VT, N0,
DAG.getConstant(VT.getScalarSizeInBits() - 1, DL,		DAG.getConstant(VT.getScalarSizeInBits() - 1, DL,
getShiftAmountTy(N0.getValueType())));		getShiftAmountTy(N0.getValueType())));
AddToWorklist(SGN.getNode());		AddToWorklist(SGN.getNode());

// Add (N0 < 0) ? abs2 - 1 : 0;		// Add (N0 < 0) ? abs2 - 1 : 0;
SDValue SRL =		SDValue SRL =
DAG.getNode(ISD::SRL, DL, VT, SGN,		DAG.getNode(ISD::SRL, DL, VT, SGN,
static BaseIndexOffset match(SDValue Ptr) {
} else IsIndexSignExt = false;		} else IsIndexSignExt = false;

int64_t Off = cast<ConstantSDNode>(Offset)->getSExtValue();		int64_t Off = cast<ConstantSDNode>(Offset)->getSExtValue();
return BaseIndexOffset(Base, Index, Off, IsIndexSignExt);		return BaseIndexOffset(Base, Index, Off, IsIndexSignExt);
}		}
};		};
} // namespace		} // namespace

		// This is a helper function for visitMUL to check the profitability
		// of folding (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2).
		// MulNode is the original multiply, AddNode is (add x, c1),
		// and ConstNode is c2.
		//
		// If the (add x, c1) has multiple uses, we could increase
		// the number of adds if we make this transformation.
		// It would only be worth doing this if we can remove a
		// multiply in the process. Check for that here.
		// To illustrate:
		// (A + c1) * c3
		// (A + c2) * c3
		// We're checking for cases where we have common "c3 * A" expressions.
		bool DAGCombiner::isMulAddWithConstProfitable(SDNode *MulNode,
		SDValue &AddNode,
		SDValue &ConstNode) {
		APInt Val;

		// If the add only has one use, this would be OK to do.
		if (AddNode.getNode()->hasOneUse())
		return true;

		// Walk all the users of the constant with which we're multiplying.
		for (SDNode *Use : ConstNode->uses()) {

		if (Use == MulNode) // This use is the one we're on right now. Skip it.
		continue;

		if (Use->getOpcode() == ISD::MUL) { // We have another multiply use.
		SDNode *OtherOp;
		SDNode *MulVar = AddNode.getOperand(0).getNode();

		// OtherOp is what we're multiplying against the constant.
		if (Use->getOperand(0) == ConstNode)
		OtherOp = Use->getOperand(1).getNode();
		else
		OtherOp = Use->getOperand(0).getNode();

		// Check to see if multiply is with the same operand of our "add".
		//
		// ConstNode = CONST
		// Use = ConstNode * A <-- visiting Use. OtherOp is A.
		// ...
		// AddNode = (A + c1) <-- MulVar is A.
		// = AddNode * ConstNode <-- current visiting instruction.
		//
		// If we make this transformation, we will have a common
		// multiply (ConstNode * A) that we can save.
		if (OtherOp == MulVar)
		return true;

		// Now check to see if a future expansion will give us a common
		// multiply.
		//
		// ConstNode = CONST
		// AddNode = (A + c1)
		// ... = AddNode * ConstNode <-- current visiting instruction.
		// ...
		// OtherOp = (A + c2)
		// Use = OtherOp * ConstNode <-- visiting Use.
		//
		// If we make this transformation, we will have a common
		// multiply (CONST * A) after we also do the same transformation
		// to the "t2" instruction.
		if (OtherOp->getOpcode() == ISD::ADD &&
		isConstantIntBuildVectorOrConstantInt(OtherOp->getOperand(1)) &&
		OtherOp->getOperand(0).getNode() == MulVar)
		return true;
		}
		}

		// Didn't find a case where this would be profitable.
		return false;
		}

SDValue DAGCombiner::getMergedConstantVectorStore(SelectionDAG &DAG,		SDValue DAGCombiner::getMergedConstantVectorStore(SelectionDAG &DAG,
SDLoc SL,		SDLoc SL,
ArrayRef<MemOpLink> Stores,		ArrayRef<MemOpLink> Stores,
SmallVectorImpl<SDValue> &Chains,		SmallVectorImpl<SDValue> &Chains,
EVT Ty) const {		EVT Ty) const {
SmallVector<SDValue, 8> BuildVector;		SmallVector<SDValue, 8> BuildVector;

for (unsigned I = 0, E = Ty.getVectorNumElements(); I != E; ++I) {		for (unsigned I = 0, E = Ty.getVectorNumElements(); I != E; ++I) {

test/CodeGen/X86/combine-multiplies.ll

				; RUN: llc < %s -mattr=sse2 -mtriple=i386-unknown-linux-gnu | FileCheck %s

				; Source file looks something like this:
				;
				; typedef int AAA[100][100];
				;
				; void foo(AAA a,int lll)
				; {
				; int LOC = lll + 5;
				;
				; a[LOC][LOC] = 11;
				;
				; a[LOC][20] = 22;
				; a[LOC+20][20] = 33;
				; }
				;
				; We want to make sure we don't generate 2 multiply instructions,
				; one for a[LOC][] and one for a[LOC+20]. visitMUL in DAGCombiner.cpp
				; should combine the instructions in such a way to avoid the extra
				; multiply.
				;
				; Output looks roughly like this:
				;
				; movl 8(%esp), %eax
				; movl 12(%esp), %ecx
				; imull $400, %ecx, %edx # imm = 0x190
				; leal (%edx,%eax), %esi
				; movl $11, 2020(%esi,%ecx,4)
				; movl $22, 2080(%edx,%eax)
				; movl $33, 10080(%edx,%eax)
				;
				; CHECK-LABEL: foo
				; CHECK: imull $400, [[ARG1:%[a-z]+]], [[MUL:%[a-z]+]] # imm = 0x190
				; CHECK-NEXT: leal ([[MUL]],[[ARG2:%[a-z]+]]), [[LEA:%[a-z]+]]
				; CHECK-NEXT: movl $11, {{[0-9]+}}([[LEA]],[[ARG1]],4)
				; CHECK-NEXT: movl $22, {{[0-9]+}}([[MUL]],[[ARG2]])
				; CHECK-NEXT: movl $33, {{[0-9]+}}([[MUL]],[[ARG2]])
				; CHECK: retl
				;

				; Function Attrs: nounwind
				define void @foo([100 x i32]* nocapture %a, i32 %lll) {
				entry:
				%add = add nsw i32 %lll, 5
				%arrayidx1 = getelementptr inbounds [100 x i32], [100 x i32]* %a, i32 %add, i32 %add
				store i32 11, i32* %arrayidx1, align 4
				%arrayidx3 = getelementptr inbounds [100 x i32], [100 x i32]* %a, i32 %add, i32 20
				store i32 22, i32* %arrayidx3, align 4
				%add4 = add nsw i32 %lll, 25
				%arrayidx6 = getelementptr inbounds [100 x i32], [100 x i32]* %a, i32 %add4, i32 20
				store i32 33, i32* %arrayidx6, align 4
				ret void
				}


				; Test for the same optimization on vector multiplies.
				;
				; Source looks something like this:
				;
				; typedef int v4int __attribute__((__vector_size__(16)));
				;
				; v4int x;
				; v4int v2, v3;
				; void foo_splat(v4int v1) {
				; v2 = (v1 + (v4int){ 11, 11, 11, 11 }) * (v4int) {22, 22, 22, 22};
				; v3 = (v1 + (v4int){ 33, 33, 33, 33 }) * (v4int) {22, 22, 22, 22};
				; x = (v1 + (v4int){ 11, 11, 11, 11 });
				; }
				;
				; Output looks something like this:
				;
				; foo_splat: # @foo_splat
				; # BB#0: # %entry
				; movdqa .LCPI1_0, %xmm1 # xmm1 = [11,11,11,11]
				; paddd %xmm0, %xmm1
				; movdqa .LCPI1_1, %xmm2 # xmm2 = [22,22,22,22]
				; pshufd $245, %xmm0, %xmm3 # xmm3 = xmm0[1,1,3,3]
				; pmuludq %xmm2, %xmm0
				; pshufd $232, %xmm0, %xmm0 # xmm0 = xmm0[0,2,2,3]
				; pmuludq %xmm2, %xmm3
				; pshufd $232, %xmm3, %xmm2 # xmm2 = xmm3[0,2,2,3]
				; punpckldq %xmm2, %xmm0 # xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
				; movdqa .LCPI1_2, %xmm2 # xmm2 = [242,242,242,242]
				; paddd %xmm0, %xmm2
				; paddd .LCPI1_3, %xmm0
				; movdqa %xmm2, v2
				; movdqa %xmm0, v3
				; movdqa %xmm1, x
				; retl
				;
				; Again, we want to make sure we don't generate two different multiplies.
				; We should have a single multiply for "v1 * {22, 22, 22, 22}" (made up of two
				; pmuludq instructions), followed by two adds. Without this optimization, we'd
				; do 2 adds, followed by 2 multiplies (i.e. 4 pmuludq instructions).
				;
				; CHECK-LABEL: foo_splat
				; CHECK: movdqa .LCPI1_0, [[C11:%xmm[0-9]]]
				; CHECK-NEXT: paddd %xmm0, [[C11]]
				; CHECK-NEXT: movdqa .LCPI1_1, [[C22:%xmm[0-9]]]
				; CHECK-NEXT: pshufd $245, %xmm0, [[T1:%xmm[0-9]]]
				; CHECK-NEXT: pmuludq [[C22]], [[T2:%xmm[0-9]]]
				; CHECK-NEXT: pshufd $232, [[T2]], [[T3:%xmm[0-9]]]
				; CHECK-NEXT: pmuludq [[C22]], [[T4:%xmm[0-9]]]
				; CHECK-NEXT: pshufd $232, [[T4]], [[T5:%xmm[0-9]]]
				; CHECK-NEXT: punpckldq [[T5]], [[T6:%xmm[0-9]]]
				; CHECK-NEXT: movdqa .LCPI1_2, [[C242:%xmm[0-9]]]
				; CHECK-NEXT: paddd [[T6]], [[C242]]
				; CHECK-NEXT: paddd .LCPI1_3, [[C726:%xmm[0-9]]]
				; CHECK-NEXT: movdqa [[C242]], v2
				; CHECK-NEXT: [[C726]], v3
				; CHECK-NEXT: [[C11]], x
				; CHECK-NEXT: retl

				@v2 = common global <4 x i32> zeroinitializer, align 16
				@v3 = common global <4 x i32> zeroinitializer, align 16
				@x = common global <4 x i32> zeroinitializer, align 16

				; Function Attrs: nounwind
				define void @foo_splat(<4 x i32> %v1) {
				entry:
				%add1 = add <4 x i32> %v1, <i32 11, i32 11, i32 11, i32 11>
				%mul1 = mul <4 x i32> %add1, <i32 22, i32 22, i32 22, i32 22>
				%add2 = add <4 x i32> %v1, <i32 33, i32 33, i32 33, i32 33>
				%mul2 = mul <4 x i32> %add2, <i32 22, i32 22, i32 22, i32 22>
				store <4 x i32> %mul1, <4 x i32>* @v2, align 16
				store <4 x i32> %mul2, <4 x i32>* @v3, align 16
				store <4 x i32> %add1, <4 x i32>* @x, align 16
				ret void
				}

				; Finally, check the non-splatted vector case. This is very similar
				; to the previous test case, except for the vector values.
				;
				; CHECK-LABEL: foo_non_splat
				; CHECK: movdqa .LCPI2_0, [[C11:%xmm[0-9]]]
				; CHECK-NEXT: paddd %xmm0, [[C11]]
				; CHECK-NEXT: movdqa .LCPI2_1, [[C22:%xmm[0-9]]]
				; CHECK-NEXT: pshufd $245, %xmm0, [[T1:%xmm[0-9]]]
				; CHECK-NEXT: pmuludq [[C22]], [[T2:%xmm[0-9]]]
				; CHECK-NEXT: pshufd $232, [[T2]], [[T3:%xmm[0-9]]]
				; CHECK-NEXT: pshufd $245, [[C22]], [[T7:%xmm[0-9]]]
				; CHECK-NEXT: pmuludq [[T1]], [[T7]]
				; CHECK-NEXT: pshufd $232, [[T7]], [[T5:%xmm[0-9]]]
				; CHECK-NEXT: punpckldq [[T5]], [[T6:%xmm[0-9]]]
				; CHECK-NEXT: movdqa .LCPI2_2, [[C242:%xmm[0-9]]]
				; CHECK-NEXT: paddd [[T6]], [[C242]]
				; CHECK-NEXT: paddd .LCPI2_3, [[C726:%xmm[0-9]]]
				; CHECK-NEXT: movdqa [[C242]], v2
				; CHECK-NEXT: [[C726]], v3
				; CHECK-NEXT: [[C11]], x
				; CHECK-NEXT: retl
				; Function Attrs: nounwind
				define void @foo_non_splat(<4 x i32> %v1) {
				entry:
				%add1 = add <4 x i32> %v1, <i32 11, i32 22, i32 33, i32 44>
				%mul1 = mul <4 x i32> %add1, <i32 22, i32 33, i32 44, i32 55>
				%add2 = add <4 x i32> %v1, <i32 33, i32 44, i32 55, i32 66>
				%mul2 = mul <4 x i32> %add2, <i32 22, i32 33, i32 44, i32 55>
				store <4 x i32> %mul1, <4 x i32>* @v2, align 16
				store <4 x i32> %mul2, <4 x i32>* @v3, align 16
				store <4 x i32> %add1, <4 x i32>* @x, align 16
				ret void
				}
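The profitability check added in DAGCombiner.cpp can be modeled outside LLVM with a toy graph. The sketch below is a minimal stand-in, not the LLVM API: `Node`, `Op`, `connect`, `isAddOfConst`, and `isProfitable` are all hypothetical names, and only the three rules from `isMulAddWithConstProfitable` are mirrored (single-use add, an existing multiply of the same variable by the same constant, or a sibling add of the same variable multiplied by the same constant).

```cpp
#include <vector>

enum class Op { Var, Const, Add, Mul };

struct Node {
  Op op;
  std::vector<Node *> operands;
  std::vector<Node *> users; // nodes that take this node as an operand
  explicit Node(Op o = Op::Var) : op(o) {}
};

// Turn n into the binary node (lhs op rhs) and register it as a user.
void connect(Node &n, Op op, Node &lhs, Node &rhs) {
  n.op = op;
  n.operands = {&lhs, &rhs};
  lhs.users.push_back(&n);
  if (&rhs != &lhs) rhs.users.push_back(&n);
}

bool isAddOfConst(const Node *n) {
  return n->op == Op::Add && n->operands[1]->op == Op::Const;
}

// mul is (add * c) and add is (x + c1). Returns true when rewriting mul as
// (x * c) + (c1 * c) is expected not to cost an extra multiply.
bool isProfitable(const Node *mul, const Node *add, const Node *c) {
  if (add->users.size() == 1) return true; // the add disappears entirely

  const Node *mulVar = add->operands[0];
  for (const Node *use : c->users) {
    if (use == mul || use->op != Op::Mul) continue;
    const Node *other =
        (use->operands[0] == c) ? use->operands[1] : use->operands[0];
    if (other == mulVar) return true; // (x * c) already exists
    if (isAddOfConst(other) && other->operands[0] == mulVar)
      return true; // ((x + c2) * c) will fold to share (x * c)
  }
  return false;
}
```

Building the graph for the splat test (`(v1 + 11) * 22`, `(v1 + 33) * 22`, plus a store of `v1 + 11`) makes the third rule fire: the first add has two users, but the sibling multiply by the same constant means one multiply can be shared after both folds.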

This is an archive of the discontinued LLVM Phabricator instance.

Catch combine opportunities for redundant imuls
ClosedPublic
