Details

Reviewers

power-llvm-team
nemanjai
stefanp
hfinkel
craig.topper

Group Reviewers

Restricted Project

Commits

rGa3ada630d8ab: [DAGCombiner] Combine shifts into multiply-high

Summary

This patch implements a target independent DAG combine to produce multiply-high
instructions from shifts. This DAG combine will combine shifts for any type as
long as the MULH on the narrow type is legal.

For now, it is enabled on PowerPC as PowerPC is the only target that has an
implementation of the isMulhCheaperThanMulShift TLI hook introduced in
D78271.

Moreover, this DAG combine focuses on catching the pattern:

(shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>)

to produce mulhs when we have a sign-extend, and mulhu when we have
a zero-extend.

The patch performs the following checks:

The operation is a right shift, either arithmetic (sra) or logical (srl)
The input to the shift is a multiply
Both operands of the multiply are sext/zext nodes
Both extends feeding the multiply are of the same kind
The narrow type is half the width of the wide type
The shift amount is the width of the narrow type
The respective mulh operation is legal
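
As a sanity check of the arithmetic behind these checks, here is a minimal standalone C++ sketch for the i32/i64 case. It is not code from the patch: `mulhu32`, `mulhs32`, `srlOfZextMul` and `shiftOfSextMul` are hypothetical helper names, and the limb-based `mulhu32` is just one textbook way to compute the high half without a 64-bit multiply, chosen so that the comparison against the wide pattern is not circular.

```cpp
#include <cassert>
#include <cstdint>

// High 32 bits of an unsigned 32x32 product, built from 16-bit limbs so that
// no 64-bit multiply is involved. This models what MULHU computes for i32.
static uint32_t mulhu32(uint32_t a, uint32_t b) {
  uint32_t a0 = a & 0xFFFFu, a1 = a >> 16;
  uint32_t b0 = b & 0xFFFFu, b1 = b >> 16;
  uint32_t t = a0 * b0;
  uint32_t w1 = a1 * b0 + (t >> 16);        // cannot overflow 32 bits
  uint32_t w2 = a0 * b1 + (w1 & 0xFFFFu);   // cannot overflow 32 bits
  return a1 * b1 + (w1 >> 16) + (w2 >> 16);
}

// Signed high half derived from the unsigned one, working on bit patterns
// modulo 2^32. This models what MULHS computes for i32.
static uint32_t mulhs32(uint32_t a, uint32_t b) {
  uint32_t hi = mulhu32(a, b);
  if (a & 0x80000000u) hi -= b;
  if (b & 0x80000000u) hi -= a;
  return hi;
}

// (srl (mul (zext a), (zext b)), 32), truncated back to 32 bits.
static uint32_t srlOfZextMul(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((uint64_t{a} * uint64_t{b}) >> 32);
}

// (mul (sext a), (sext b)) shifted right by 32 and truncated to 32 bits.
// After truncation, srl and sra give the same low 32 bits.
static uint32_t shiftOfSextMul(int32_t a, int32_t b) {
  uint64_t wide = static_cast<uint64_t>(int64_t{a} * int64_t{b});
  return static_cast<uint32_t>(wide >> 32);
}
```

Under these assumptions, `srlOfZextMul(a, b)` agrees with `mulhu32(a, b)` and `shiftOfSextMul(a, b)` agrees with `mulhs32` on the same bit patterns. With any shift amount other than 32 the shifted value is no longer just the high half, which is why the shift amount must equal the narrow width.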

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

amyk created this revision. Apr 15 2020, 10:03 PM

Herald added a project: Restricted Project. Apr 15 2020, 10:03 PM

Herald added subscribers: llvm-commits, shchenz, kbarton, hiraditya.

Harbormaster failed remote builds in B53506: Diff 257956! Apr 15 2020, 10:03 PM

Herald added a subscriber: wuzish. Apr 15 2020, 10:03 PM

lebedev.ri added a subscriber: lebedev.ri. Apr 16 2020, 1:32 AM

lebedev.ri added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15918 ↗	(On Diff #257956)	This doesn't look like the best check for this. Do we not want this transform in general, is it expected to be pessimizing somewhere? Then i'd expect it to be a TLI hook.

amyk marked an inline comment as done. Apr 16 2020, 9:24 PM

amyk added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15918 ↗	(On Diff #257956)	I actually realize now after doing some testing that it may be better to remove this check and to run this transformation in other passes, as well. I will update the patch to reflect this. Thanks for reviewing.

Updated the patch so that the combine for shifts runs not only before type legalization, but in subsequent passes, too.

Harbormaster failed remote builds in B53676: Diff 258231! Apr 16 2020, 9:38 PM

anil9 added a subscriber: anil9. Apr 20 2020, 9:19 PM

anil9 added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15849 ↗	(On Diff #258231)	nit: combineShifttoMULH -> combineShiftToMULH

Address comment to rename combineShifttoMULH to combineShiftToMULH.

amyk marked an inline comment as done. Apr 21 2020, 2:52 PM

Is there anything that would stop us making this a generic combine in DAGCombiner?

The description mentions i32/i64 explicitly, but there does not seem to be anything in the combine that narrows it down to those two types. In fact, it appears that it will combine it for any type (scalar or vector) as long as the MULH on the narrow type is legal.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15876 ↗	(On Diff #259109)	You will probably need something like `(void)WideVT2;` to silence warnings on non-assert builds. Or you can just not define `WideVT2`, not have the assert and trust that the well-formedness of the multiply is adequately checked in target-independent code.
15888 ↗	(On Diff #259109)	I don't understand this. Why do we assume that either `ShiftAmtSrc` is constant or its second operand is constant? What node do you expect it to be if it is not constant? Also, won't this crash on `(srl (mul (zext i32:%a to i64), (zext i32:%b to i64)), %c)`? i.e. something like: unsigned test(unsigned a, unsigned b, unsigned c) { return (unsigned) (((uint64_t)a * b) >> c); }

This revision now requires changes to proceed. Apr 24 2020, 6:08 AM

In D78272#2001767, @nemanjai wrote:

The description mentions i32/i64 explicitly, but there does not seem to be anything in the combine that narrows it down to those two types. In fact, it appears that it will combine it for any type (scalar or vector) as long as the MULH on the narrow type is legal.

Just to be clear, I am suggesting the description should be updated to match the combine, not the other way around.

Addressed review comments, and updated the summary for this patch.

In D78272#1997042, @RKSimon wrote:

Is there anything that would stop us making this a generic combine in DAGCombiner?

Is there a preference to have this target independent instead? It may be possible if there is a preference/demand for it to be.

In D78272#2021795, @amyk wrote:

In D78272#1997042, @RKSimon wrote:

Is there anything that would stop us making this a generic combine in DAGCombiner?

Is there a preference to have this target independent instead? It may be possible if there is a preference/demand for it to be.

@craig.topper - any thoughts? Given how expensive PMULLD/PMULLQ can be, we don't need much to encourage PMULH generation.

In D78272#2022341, @RKSimon wrote:

In D78272#2021795, @amyk wrote:

In D78272#1997042, @RKSimon wrote:

Is there anything that would stop us making this a generic combine in DAGCombiner?

Is there a preference to have this target independent instead? It may be possible if there is a preference/demand for it to be.

@craig.topper - any thoughts? Given how expensive PMULLD/PMULLQ can be, we don't need much to encourage PMULH generation.

We have a combine similar to this in X86, but it starts at the truncate, not the shift.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15882 ↗	(On Diff #262264)	SIGN_EXTEND_INREG will never pass this check will it? The input and output type for that are the same. There's an extra operand carrying the type to extend from.

craig.topper added inline comments. May 6 2020, 12:15 PM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15905 ↗	(On Diff #262264)	I think the extend to use at the end needs to be based on the shift opcode, not the extend opcode. If it's an SRL, you need to put 0s in the upper bits even if the multiply is MULHS.
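
To illustrate this point, here is a small standalone C++ sketch for the i32-in-i64 case. It is not code from the patch; `mulhsBits`, `srlOfSextMul`, `sraOfSextMul`, `zext64` and `sext64` are hypothetical helper names that model the DAG nodes on plain bit patterns. When the operands are sign-extended but the shift is logical, the upper 32 bits of the i64 result are zero, so the MULHS result has to be zero-extended. Only an arithmetic shift calls for a sign extension.

```cpp
#include <cassert>
#include <cstdint>

// Bit pattern of (mul (sext a), (sext b)) as an i64. The product of two
// 32-bit values always fits in 64 bits, so the signed multiply cannot overflow.
static uint64_t wideProduct(int32_t a, int32_t b) {
  return static_cast<uint64_t>(int64_t{a} * int64_t{b});
}

// Bit pattern of (mulhs a, b): the high 32 bits of the signed product.
static uint32_t mulhsBits(int32_t a, int32_t b) {
  return static_cast<uint32_t>(wideProduct(a, b) >> 32);
}

// Zero- and sign-extension of a 32-bit pattern to 64 bits.
static uint64_t zext64(uint32_t x) { return x; }
static uint64_t sext64(uint32_t x) {
  return (x & 0x80000000u) ? (x | 0xFFFFFFFF00000000ull) : uint64_t{x};
}

// (srl (mul (sext a), (sext b)), 32): logical shift, upper half becomes zero.
static uint64_t srlOfSextMul(int32_t a, int32_t b) {
  return wideProduct(a, b) >> 32;
}

// (sra (mul (sext a), (sext b)), 32): arithmetic shift, modelled portably by
// shifting logically and then replicating bit 31 into the upper half.
static uint64_t sraOfSextMul(int32_t a, int32_t b) {
  return sext64(static_cast<uint32_t>(wideProduct(a, b) >> 32));
}
```

For a negative product such as `-2 * 3`, `srlOfSextMul` matches `zext64(mulhsBits(...))` and differs from `sext64(mulhsBits(...))`, while `sraOfSextMul` matches the sign-extended form.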

craig.topper added inline comments. May 6 2020, 9:47 PM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15904 ↗	(On Diff #262264)	I don't see a check that RightOp.getOperand(0) and LeftOp.getOperand(0) are the same type?

@craig.topper Do you think common-ing out the X86/PPC parts to combine to mulh into the target independent combiner is a good idea?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15882 ↗	(On Diff #262264)	That's a good point. I thought I had tests involving SIGN_EXTEND_INREG that works with this, but I realize now that I actually don't and they're all sign extends for this patch. I've decided to move the check for SIGN_EXTEND_INREG.
15904 ↗	(On Diff #262264)	That's true, thank you for pointing that out.
15905 ↗	(On Diff #262264)	You're right, I'll fix that.

In D78272#2025841, @amyk wrote:

@craig.topper Do you think common-ing out the X86/PPC parts to combine to mulh into the target independent combiner is a good idea?

I see a few issues.

X86 doesn't have scalar MULHU/MULHS instructions, but we have vector MULHU/MULHS on vXi16. We currently match from truncate rather than from shift. I tried to move it to shift following your code here, but I got regressions. Primarily because we handled the truncate first and turned it into PACKSS/PACKUS, maybe an AND or SIGN_EXTEND_INREG, and some subvector extracts. Then we match the MULH, but we couldn't fold away the sext/zext and what we had turned the truncate into. We might be able to improve that. I think it is useful to match from the shift since the truncate won't always be there. So we might need matching from both.

The other issue is that for vectors we need to match vectors with more than the legal number of elements before type legalization. Otherwise the extends we match get type legalized into something much harder to match. But checking isOperationLegal won't work for that. Maybe we could walk the type legalization steps to find what it would be legalized to?

Updated the diff of this patch to address comments that were raised previously.

In D78272#2026458, @craig.topper wrote:

In D78272#2025841, @amyk wrote:

@craig.topper Do you think common-ing out the X86/PPC parts to combine to mulh into the target independent combiner is a good idea?

I see a few issues.

X86 doesn't have scalar MULHU/MULHS instructions, but we have vector MULHU/MULHS on vXi16. We currently match from truncate rather than from shift. I tried to move it to shift following your code here, but I got regressions. Primarily because we handled the truncate first and turned it into PACKSS/PACKUS, maybe an AND or SIGN_EXTEND_INREG, and some subvector extracts. Then we match the MULH, but we couldn't fold away the sext/zext and what we had turned the truncate into. We might be able to improve that. I think it is useful to match from the shift since the truncate won't always be there. So we might need matching from both.

It sounds like if we were to put this code into target-independent DAG Combine, you shouldn't see similar regressions since your combines will still run and this patch might pick up some of what couldn't be combined. Or am I misinterpreting your comment?

The other issue is that for vectors we need to match vectors with more than the legal number of elements before type legalization. Otherwise the extends we match get type legalized into something much harder to match. But checking isOperationLegal won't work for that. Maybe we could walk the type legalization steps to find what it would be legalized to?

If I am not mistaken, this runs before legalization as well so it should combine something like
(srl (mul (zext v16i16:$a to v16i32), (zext v16i16:$a to v16i32)), (splat 16 to v16i32))
As long as MULH is legal for v16i16 (even though v16i32 is not a legal type). Of course, the actual types are just for illustration - not to suggest that those are the actual types for any specific target.
Or am I again not reading your comment correctly?
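
The vector case described above can be modelled lane by lane. The following is a rough standalone C++ sketch, not LLVM code: `V16`, `V32`, `widePattern`, `mulhuLanes` and `zextLanes` are hypothetical names, and the lane count of 8 is arbitrary. It shows that the wide pattern with two input extends yields the same lanes as a narrow multiply-high followed by a single extend of the result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 8;  // arbitrary lane count, for illustration only
using V16 = std::array<uint16_t, kLanes>;
using V32 = std::array<uint32_t, kLanes>;

// (srl (mul (zext a), (zext b)), (splat 16)) on the wide vector type.
// The operands are widened explicitly: multiplying two uint16_t values
// directly would promote to int and could overflow.
static V32 widePattern(const V16 &a, const V16 &b) {
  V32 r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = (uint32_t{a[i]} * uint32_t{b[i]}) >> 16;
  return r;
}

// Lane-wise model of mulhu on the narrow type: high 16 bits of each product.
static V16 mulhuLanes(const V16 &a, const V16 &b) {
  V16 r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = static_cast<uint16_t>((uint32_t{a[i]} * uint32_t{b[i]}) >> 16);
  return r;
}

// A single zero-extend applied to the narrow result.
static V32 zextLanes(const V16 &v) {
  V32 r{};
  for (std::size_t i = 0; i < kLanes; ++i)
    r[i] = v[i];
  return r;
}
```

In this model, `widePattern(a, b)` and `zextLanes(mulhuLanes(a, b))` compare equal for any inputs, since each lane's shifted product always fits in 16 bits.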

In D78272#2029885, @nemanjai wrote:

In D78272#2026458, @craig.topper wrote:

In D78272#2025841, @amyk wrote:

@craig.topper Do you think common-ing out the X86/PPC parts to combine to mulh into the target independent combiner is a good idea?

I see a few issues.

X86 doesn't have scalar MULHU/MULHS instructions, but we have vector MULHU/MULHS on vXi16. We currently match from truncate rather than from shift. I tried to move it to shift following your code here, but I got regressions. Primarily because we handled the truncate first and turned it into PACKSS/PACKUS, maybe an AND or SIGN_EXTEND_INREG, and some subvector extracts. Then we match the MULH, but we couldn't fold away the sext/zext and what we had turned the truncate into. We might be able to improve that. I think it is useful to match from the shift since the truncate won't always be there. So we might need matching from both.

It sounds like if we were to put this code into target-independent DAG Combine, you shouldn't see similar regressions since your combines will still run and this patch might pick up some of what couldn't be combined. Or am I misinterpreting your comment?

Yes our combine will still run. I have no issue putting this in target-independent combine. I was answering from the position of why putting it in target combine doesn't get rid of X86 specific code. Which I assume was at least part of @RKSimon's motivation for moving it.

The other issue is that for vectors we need to match vectors with more than the legal number of elements before type legalization. Otherwise the extends we match get type legalized into something much harder to match. But checking isOperationLegal won't work for that. Maybe we could walk the type legalization steps to find what it would be legalized to?

If I am not mistaken, this runs before legalization as well so it should combine something like
(srl (mul (zext v16i16:$a to v16i32), (zext v16i16:$a to v16i32)), (splat 16 to v16i32))
As long as MULH is legal for v16i16 (even though v16i32 is not a legal type). Of course, the actual types are just for illustration - not to suggest that those are the actual types for any specific target.
Or am I again not reading your comment correctly?

But it's still probably worthwhile for illegal types like v128i16 since it turns two extends on the inputs into one on the output. Type legalization will split the v128i16 MULHU/MULHS. This needs to be matched before type legalization makes the v128i32 zero extend hard to spot. Number of elements exaggerated to make it look likely illegal.

In D78272#2029966, @craig.topper wrote:

Yes our combine will still run. I have no issue putting this in target-independent combine. I was answering from the position of why putting it in target combine doesn't get rid of X86 specific code. Which I assume was at least part of @RKSimon's motivation for moving it.

Yes this was a query as to whether it would help on top of the existing x86 specific combines.

Since it would seem that this can't immediately be combined with the only other target that seems to want this, I would recommend that we keep this in the PPC back end for now. If there is interest in commoning it up in the future, we revisit this then.
Does that sound like a good plan?
And of course, thanks for all your feedback @craig.topper @RKSimon!

In D78272#2033067, @nemanjai wrote:

Since it would seem that this can't immediately be combined with the only other target that seems to want this, I would recommend that we keep this in the PPC back end for now. If there is interest in commoning it up in the future, we revisit this then.
Does that sound like a good plan?
And of course, thanks for all your feedback @craig.topper @RKSimon!

No objections from me - sorry for the delay!

In D78272#2033067, @nemanjai wrote:

Since it would seem that this can't immediately be combined with the only other target that seems to want this, I would recommend that we keep this in the PPC back end for now. If there is interest in commoning it up in the future, we revisit this then.
Does that sound like a good plan?
And of course, thanks for all your feedback @craig.topper @RKSimon!

I can see this combine being useful on RISC-V (where we have a mulh[[s]u] instruction) - would it be useful for me to work on some testcases for it?

In D78272#2033514, @lenary wrote:

I can see this combine being useful on RISC-V (where we have a mulh[[s]u] instruction) - would it be useful for me to work on some testcases for it?

Absolutely. If we have a test case (along with invocation and desired codegen) it would make it much easier to move this out to the target independent code and ensure it is doing what it is supposed to. Thank you.

craig.topper mentioned this in D80485: [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook. Use isOperationLegalOrCustom directly instead. May 26 2020, 12:02 AM

Moved the function to combine shifts into multiply high into DAGCombiner.cpp. For now, this combine runs only on PowerPC as PowerPC has an implementation of the isMulhCheaperThanMulShift TLI query.

Herald added subscribers: ecnelises, steven.zhang. May 26 2020, 8:46 AM

LGTM

@nemanjai Any more comments?

LGTM. My only remaining comments are nits about early exits from the function not happening as early as they can, but such reordering is trivial and can happen on the commit. Thank you.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7948	I think the check for the types of the inputs to the two extend nodes belongs here. Generally favour early exits as soon as the necessary information is available.
7960	This check is on the outermost operation (i.e. the shift). We can probably move this early exit towards the very top of the function as it makes no sense to do any other checks if the shift amount isn't constant. I don't think another round of review is required for this though - feel free to address this on the commit.

Now if only I remember to select "Accept Revision"... :)

This revision is now accepted and ready to land. Jun 2 2020, 5:16 AM

Closed by commit rGa3ada630d8ab: [DAGCombiner] Combine shifts into multiply-high (authored by amyk). Jun 2 2020, 1:45 PM

This revision was automatically updated to reflect the committed changes.

Diff 267984

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

// ... (7,905 lines not shown) ...
if (ConstantSDNode *NC1 = isConstOrConstSplat(N->getOperand(1))) {
APInt C0 = N0.getConstantOperandAPInt(0);
APInt C1 = NC1->getAPIntValue();
return DAG.getVScale(DL, VT, C0 << C1);
}

return SDValue();
}

		// Transform a right shift of a multiply into a multiply-high.
		// Examples:
		// (srl (mul (zext i32:$a to i64), (zext i32:$b to i64)), 32) -> (mulhu $a, $b)
		// (sra (mul (sext i32:$a to i64), (sext i32:$b to i64)), 32) -> (mulhs $a, $b)
		static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
		const TargetLowering &TLI) {
		assert((N->getOpcode() == ISD::SRL || N->getOpcode() == ISD::SRA) &&
		"SRL or SRA node is required here!");

		// Check the shift amount. Proceed with the transformation if the shift
		// amount is constant.
		ConstantSDNode *ShiftAmtSrc = isConstOrConstSplat(N->getOperand(1));
		if (!ShiftAmtSrc)
		return SDValue();

		SDLoc DL(N);

		// The operation feeding into the shift must be a multiply.
		SDValue ShiftOperand = N->getOperand(0);
		if (ShiftOperand.getOpcode() != ISD::MUL)
		return SDValue();

		// Both operands must be equivalent extend nodes.
		SDValue LeftOp = ShiftOperand.getOperand(0);
		SDValue RightOp = ShiftOperand.getOperand(1);
		bool IsSignExt = LeftOp.getOpcode() == ISD::SIGN_EXTEND;
		bool IsZeroExt = LeftOp.getOpcode() == ISD::ZERO_EXTEND;

		if ((!(IsSignExt || IsZeroExt)) || LeftOp.getOpcode() != RightOp.getOpcode())
		return SDValue();

		EVT WideVT1 = LeftOp.getValueType();
		EVT WideVT2 = RightOp.getValueType();
		// Proceed with the transformation if the wide types match.
		assert((WideVT1 == WideVT2) &&
		"Cannot have a multiply node with two different operand types.");

		EVT NarrowVT = LeftOp.getOperand(0).getValueType();
		// Check that the two extend nodes are the same type.
		if (NarrowVT != RightOp.getOperand(0).getValueType())
		return SDValue();

		// Only transform into mulh if mulh for the narrow type is cheaper than
		// a multiply followed by a shift. This should also check if mulh is
		// legal for NarrowVT on the target.
		if (!TLI.isMulhCheaperThanMulShift(NarrowVT))
		return SDValue();

		// Proceed with the transformation if the wide type is twice as large
		// as the narrow type.
		unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
		if (WideVT1.getScalarSizeInBits() != 2 * NarrowVTSize)
		return SDValue();

		// Check the shift amount with the narrow type size.
		// Proceed with the transformation if the shift amount is the width
		// of the narrow type.
		unsigned ShiftAmt = ShiftAmtSrc->getZExtValue();
		if (ShiftAmt != NarrowVTSize)
		return SDValue();

		// If the operation feeding into the MUL is a sign extend (sext),
		// we use mulhs. Otherwise, zero extends (zext) use mulhu.
		unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;

		SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
		RightOp.getOperand(0));
		return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)
		: DAG.getZExtOrTrunc(Result, DL, WideVT1));
		}

SDValue DAGCombiner::visitSRA(SDNode *N) {
SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);
if (SDValue V = DAG.simplifyShift(N0, N1))
return V;

EVT VT = N0.getValueType();
unsigned OpSizeInBits = VT.getScalarSizeInBits();
// ... (171 lines not shown) ...
// If the sign bit is known to be zero, switch this to a SRL.
if (DAG.SignBitIsZero(N0))
return DAG.getNode(ISD::SRL, SDLoc(N), VT, N0, N1);

if (N1C && !N1C->isOpaque())
if (SDValue NewSRA = visitShiftByConstant(N))
return NewSRA;

		// Try to transform this shift into a multiply-high if
		// it matches the appropriate pattern detected in combineShiftToMULH.
		if (SDValue MULH = combineShiftToMULH(N, DAG, TLI))
		return MULH;

return SDValue();
}

SDValue DAGCombiner::visitSRL(SDNode *N) {
SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);
if (SDValue V = DAG.simplifyShift(N0, N1))
return V;
// ... (209 lines not shown) ...
else if (Use->getOpcode() == ISD::TRUNCATE && Use->hasOneUse()) {
// Also look pass the truncate.
Use = *Use->use_begin();
if (Use->getOpcode() == ISD::BRCOND)
AddToWorklist(Use);
}
}

		// Try to transform this shift into a multiply-high if
		// it matches the appropriate pattern detected in combineShiftToMULH.
		if (SDValue MULH = combineShiftToMULH(N, DAG, TLI))
		return MULH;

return SDValue();
}

SDValue DAGCombiner::visitFunnelShift(SDNode *N) {
EVT VT = N->getValueType(0);
SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);
SDValue N2 = N->getOperand(2);
// ... (13,516 lines not shown) ...

llvm/test/CodeGen/PowerPC/combine-to-mulh-shift-amount.ll

This file was added.

				; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \
				; RUN: -mcpu=pwr9 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | \
				; RUN: FileCheck %s

				; These tests show that for 32-bit and 64-bit scalars, combining a shift to
				; a single multiply-high is only valid when the shift amount is the same as
				; the width of the narrow type.

				; That is, combining a shift to mulh is only valid for 32-bit when the shift
				; amount is 32.
				; Likewise, combining a shift to mulh is only valid for 64-bit when the shift
				; amount is 64.

				define i32 @test_mulhw(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhw:
				; CHECK: mulld
				; CHECK-NOT: mulhw
				; CHECK: blr
				%1 = sext i32 %a to i64
				%2 = sext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 33
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define i32 @test_mulhu(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhu:
				; CHECK: mulld
				; CHECK-NOT: mulhwu
				; CHECK: blr
				%1 = zext i32 %a to i64
				%2 = zext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 33
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define i64 @test_mulhd(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhd:
				; CHECK: mulhd
				; CHECK: mulld
				; CHECK: blr
				%1 = sext i64 %a to i128
				%2 = sext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 63
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				define i64 @test_mulhdu(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhdu:
				; CHECK: mulhdu
				; CHECK: mulld
				; CHECK: blr
				%1 = zext i64 %a to i128
				%2 = zext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 63
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				define signext i32 @test_mulhw_signext(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhw_signext:
				; CHECK: mulld
				; CHECK-NOT: mulhw
				; CHECK: blr
				%1 = sext i32 %a to i64
				%2 = sext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 33
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define zeroext i32 @test_mulhu_zeroext(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhu_zeroext:
				; CHECK: mulld
				; CHECK-NOT: mulhwu
				; CHECK: blr
				%1 = zext i32 %a to i64
				%2 = zext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 33
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define signext i64 @test_mulhd_signext(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhd_signext:
				; CHECK: mulhd
				; CHECK: mulld
				; CHECK: blr
				%1 = sext i64 %a to i128
				%2 = sext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 63
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				define zeroext i64 @test_mulhdu_zeroext(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhdu_zeroext:
				; CHECK: mulhdu
				; CHECK: mulld
				; CHECK: blr
				%1 = zext i64 %a to i128
				%2 = zext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 63
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

llvm/test/CodeGen/PowerPC/mul-high.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \
				; RUN: -mcpu=pwr9 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | \
				; RUN: FileCheck %s

				; This test case tests multiply high for i32 and i64. When the values are
				; sign-extended, mulh[d|w] is emitted. When values are zero-extended,
				; mulh[d|w]u is emitted instead.

				; The primary goal is transforming the pattern:
				; (shift (mul (ext $a, <wide_type>), (ext $b, <wide_type>)), <narrow_type>)
				; into (mulhs $a, $b) for sign extend, and (mulhu $a, $b) for zero extend,
				; provided that the mulh operation is legal for <narrow_type>.
				; The shift operation can be either the srl or sra operations.

				; When no attribute is present on i32, the shift operation is srl.
				define i32 @test_mulhw(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhw:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhw r3, r3, r4
				; CHECK-NEXT: clrldi r3, r3, 32
				; CHECK-NEXT: blr
				%1 = sext i32 %a to i64
				%2 = sext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 32
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define i32 @test_mulhu(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhu:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhwu r3, r3, r4
				; CHECK-NEXT: clrldi r3, r3, 32
				; CHECK-NEXT: blr
				%1 = zext i32 %a to i64
				%2 = zext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 32
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define i64 @test_mulhd(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhd:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhd r3, r3, r4
				; CHECK-NEXT: blr
				%1 = sext i64 %a to i128
				%2 = sext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 64
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				define i64 @test_mulhdu(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhdu:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhdu r3, r3, r4
				; CHECK-NEXT: blr
				%1 = zext i64 %a to i128
				%2 = zext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 64
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				; When the signext attribute is present on i32, the shift operation is sra.
				; We are actually transforming (sra (mul sext_in_reg, sext_in_reg)) into mulh.
				define signext i32 @test_mulhw_signext(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhw_signext:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhw r3, r3, r4
				; CHECK-NEXT: extsw r3, r3
				; CHECK-NEXT: blr
				%1 = sext i32 %a to i64
				%2 = sext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 32
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define zeroext i32 @test_mulhu_zeroext(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhu_zeroext:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhwu r3, r3, r4
				; CHECK-NEXT: clrldi r3, r3, 32
				; CHECK-NEXT: blr
				%1 = zext i32 %a to i64
				%2 = zext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 32
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define signext i64 @test_mulhd_signext(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhd_signext:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhd r3, r3, r4
				; CHECK-NEXT: blr
				%1 = sext i64 %a to i128
				%2 = sext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 64
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				define zeroext i64 @test_mulhdu_zeroext(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhdu_zeroext:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhdu r3, r3, r4
				; CHECK-NEXT: blr
				%1 = zext i64 %a to i128
				%2 = zext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 64
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Combine shifts into multiply-high
ClosedPublic
