This is an archive of the discontinued LLVM Phabricator instance.

Differential D78272

[DAGCombiner] Combine shifts into multiply-high
ClosedPublic

Authored by amyk on Apr 15 2020, 10:03 PM.

Details

Reviewers

power-llvm-team
nemanjai
stefanp
hfinkel
craig.topper

Group Reviewers

Restricted Project

Commits

rGa3ada630d8ab: [DAGCombiner] Combine shifts into multiply-high

Summary

This patch implements a target independent DAG combine to produce multiply-high
instructions from shifts. This DAG combine will combine shifts for any type as
long as the MULH on the narrow type is legal.

For now, it is enabled on PowerPC as PowerPC is the only target that has an
implementation of the isMulhCheaperThanMulShift TLI hook introduced in
D78271.

Moreover, this DAG combine focuses on catching the pattern:

(shift (mul (ext <narrow_type>:$a to <wide_type>), (ext <narrow_type>:$b to <wide_type>)), <narrow_width>)

to produce mulhs when we have a sign-extend, and mulhu when we have
a zero-extend.

The patch performs the following checks:

Operation is a right shift arithmetic (sra) or logical (srl)
Input to the shift is a multiply
Both operands to the multiply are sext/zext nodes
The extends into the multiply are both the same
The narrow type is half the width of the wide type
The shift amount is the width of the narrow type
The respective mulh operation is legal
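
As a rough source-level illustration of the pattern above (not code from the patch), the following standalone C++ sketch computes the high half of a 32x32-bit product by widening, multiplying, and shifting by the narrow width. The helper names `mulhu32` and `mulhs32` are hypothetical. The signed variant assumes an arithmetic right shift of negative values, which the language guarantees only from C++20 on.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned high half: zero-extend both operands, multiply in 64 bits, and
// shift right by the narrow width (32). This is the source shape that
// corresponds to (srl (mul (zext a), (zext b)), 32).
inline uint32_t mulhu32(uint32_t a, uint32_t b) {
  uint64_t wide = static_cast<uint64_t>(a) * static_cast<uint64_t>(b);
  return static_cast<uint32_t>(wide >> 32);
}

// Signed high half: sign-extend both operands, multiply in 64 bits, and
// shift right by the narrow width (32).
// Assumption: >> on a negative int64_t is an arithmetic shift.
inline int32_t mulhs32(int32_t a, int32_t b) {
  int64_t wide = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  return static_cast<int32_t>(wide >> 32);
}
```

On a target where the narrow multiply-high is legal, each helper is a candidate for a single mulhu or mulhs node rather than a wide multiply followed by a shift.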

Diff Detail

Event Timeline

amyk created this revision. Apr 15 2020, 10:03 PM

Herald added a project: Restricted Project. Apr 15 2020, 10:03 PM

Herald added subscribers: llvm-commits, shchenz, kbarton, hiraditya.

Harbormaster failed remote builds in B53506: Diff 257956! Apr 15 2020, 10:03 PM

Herald added a subscriber: wuzish. Apr 15 2020, 10:03 PM

lebedev.ri added a subscriber: lebedev.ri. Apr 16 2020, 1:32 AM

lebedev.ri added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15921	This doesn't look like the best check for this. Do we not want this transform in general, is it expected to be pessimizing somewhere? Then i'd expect it to be a TLI hook.

amyk marked an inline comment as done. Apr 16 2020, 9:24 PM

amyk added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15921	I actually realize now after doing some testing that it may be better to remove this check and to run this transformation in other passes, as well. I will update the patch to reflect this. Thanks for reviewing.

Updated the patch so that the combine for shifts runs not only before type legalization, but in subsequent passes, too.

Harbormaster failed remote builds in B53676: Diff 258231! Apr 16 2020, 9:38 PM

anil9 added a subscriber: anil9. Apr 20 2020, 9:19 PM

anil9 added inline comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15849	nit: combineShifttoMULH -> combineShiftToMULH

Address comment to rename combineShifttoMULH to combineShiftToMULH.

amyk marked an inline comment as done. Apr 21 2020, 2:52 PM

Is there anything that would stop us making this a generic combine in DAGCombiner?

The description mentions i32/i64 explicitly, but there does not seem to be anything in the combine that narrows it down to those two types. In fact, it appears that it will combine it for any type (scalar or vector) as long as the MULH on the narrow type is legal.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15876	You will probably need something like `(void)WideVT2;` to silence warnings on non-assert builds. Or you can just not define `WideVT2`, not have the assert and trust that the well-formedness of the multiply is adequately checked in target-independent code.
15888	I don't understand this. Why do we assume that either `ShiftAmtSrc` is constant or its second operand is constant? What node do you expect it to be if it is not constant? Also, won't this crash on `(srl (mul (zext i32:%a to i64), (zext i32:%b to i64)), %c)`? i.e. something like: unsigned test(unsigned a, unsigned b, unsigned c) { return (unsigned) (((uint64_t)a * b) >> c); }
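
To make the concern about non-constant shift amounts concrete, here is a small standalone C++ sketch of the reviewer's example (hypothetical helper names, not part of the patch). The widened product shifted by a runtime amount `c` equals the high half only when `c` is 32, so a combine of this kind would need to require a constant shift amount equal to the narrow width.

```cpp
#include <cassert>
#include <cstdint>

// The reviewer's example: the shift amount is a runtime value, so the node
// feeding the shift amount is not a constant.
// Assumption: c < 64; larger shift amounts are undefined behaviour in C++.
inline uint32_t shiftedProduct(uint32_t a, uint32_t b, unsigned c) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> c);
}

// Reference: high half of the 32x32 -> 64 bit unsigned product.
inline uint32_t highHalf(uint32_t a, uint32_t b) {
  return static_cast<uint32_t>((static_cast<uint64_t>(a) * b) >> 32);
}
```

For operands such as 0xFFFFFFFF and 0xFFFFFFFF, `shiftedProduct` with `c == 33` differs from `highHalf`, which is the behaviour the shift-amount tests in this revision check for.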

This revision now requires changes to proceed. Apr 24 2020, 6:08 AM

In D78272#2001767, @nemanjai wrote:

The description mentions i32/i64 explicitly, but there does not seem to be anything in the combine that narrows it down to those two types. In fact, it appears that it will combine it for any type (scalar or vector) as long as the MULH on the narrow type is legal.

Just to be clear, I am suggesting the description should be updated to match the combine, not the other way around.

Addressed review comments, and updated the summary for this patch.

In D78272#1997042, @RKSimon wrote:

Is there anything that would stop us making this a generic combine in DAGCombiner?

Is there a preference to have this target independent instead? It may be possible if there is a preference/demand for it to be.

In D78272#2021795, @amyk wrote:

In D78272#1997042, @RKSimon wrote:

Is there anything that would stop us making this a generic combine in DAGCombiner?

Is there a preference to have this target independent instead? It may be possible if there is a preference/demand for it to be.

@craig.topper - any thoughts? Given how expensive PMULLD/PMULLQ can be, we don't need much to encourage PMULH generation.

In D78272#2022341, @RKSimon wrote:

In D78272#2021795, @amyk wrote:

In D78272#1997042, @RKSimon wrote:

Is there anything that would stop us making this a generic combine in DAGCombiner?

Is there a preference to have this target independent instead? It may be possible if there is a preference/demand for it to be.

@craig.topper - any thoughts? Given how expensive PMULLD/PMULLQ can be, we don't need much to encourage PMULH generation.

We have a combine similar to this in X86, but it starts at the truncate, not the shift.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15882	SIGN_EXTEND_INREG will never pass this check will it? The input and output type for that are the same. There's an extra operand carrying the type to extend from.

craig.topper added inline comments. May 6 2020, 12:15 PM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15905	I think what extend to use at the end needs to be based on the shift opcode, not the extend opcode. If it's an SRL, you need to put 0s in the upper bits even if the multiply is MULHS.
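
A small numeric check may help with this point. The sketch below (standalone C++, hypothetical names, not taken from the patch) models a logical right shift of a sign-extended product and compares it with zero-extending and sign-extending the signed high half. With a logical shift the upper 32 bits of the wide result are zero, so only the zero-extended form matches.

```cpp
#include <cassert>
#include <cstdint>

// Original pattern on the wide type: (srl (mul (sext a), (sext b)), 32).
inline uint64_t srlOfSextMul(int32_t a, int32_t b) {
  int64_t wide = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  return static_cast<uint64_t>(wide) >> 32; // logical shift: zeros shifted in
}

// Signed multiply-high on the narrow type (a model of MULHS on i32).
// Assumption: >> on a negative int64_t is an arithmetic shift (C++20).
inline int32_t mulhs32(int32_t a, int32_t b) {
  return static_cast<int32_t>((static_cast<int64_t>(a) * b) >> 32);
}

// Two candidate ways of widening the narrow result back to 64 bits.
inline uint64_t zextResult(int32_t a, int32_t b) {
  return static_cast<uint64_t>(static_cast<uint32_t>(mulhs32(a, b)));
}
inline uint64_t sextResult(int32_t a, int32_t b) {
  return static_cast<uint64_t>(static_cast<int64_t>(mulhs32(a, b)));
}
```

With `a = -1` and `b = 1`, the shifted product is 0x00000000FFFFFFFF, which equals `zextResult` but not `sextResult`.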

craig.topper added inline comments. May 6 2020, 9:47 PM

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15904	I don't see a check that RightOp.getOperand(0) and LeftOp.getOperand(0) are the same type?

@craig.topper Do you think common-ing out the X86/PPC parts to combine to mulh into the target independent combiner is a good idea?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
15882	That's a good point. I thought I had tests involving SIGN_EXTEND_INREG that work with this, but I realize now that I actually don't and they're all sign extends for this patch. I've decided to move the check for SIGN_EXTEND_INREG.
15904	That's true, thank you for pointing that out.
15905	You're right, I'll fix that.

In D78272#2025841, @amyk wrote:

@craig.topper Do you think common-ing out the X86/PPC parts to combine to mulh into the target independent combiner is a good idea?

I see a few issues.

X86 doesn't have scalar MULHU/MULHS instructions, but we have vector MULHU/MULHS on vXi16. We currently match from truncate rather than from shift. I tried to move it to shift following your code here, but I got regressions, primarily because we handled the truncate first and turned it into PACKSS/PACKUS, maybe an AND or SIGN_EXTEND_INREG, and some subvector extracts. Then we match the MULH, but we couldn't fold away the sext/zext and what we had turned the truncate into. We might be able to improve that. I think it is useful to match from the shift since the truncate won't always be there. So we might need matching from both.

The other issue is that for vectors we need to match vectors with more than the legal number of elements before type legalization. Otherwise the extends we match get type legalized into something much harder to match. But checking isOperationLegal won't work for that. Maybe we could walk the type legalization steps to find what it would be legalized to?

Updated the diff of this patch to address comments that were raised previously.

In D78272#2026458, @craig.topper wrote:

In D78272#2025841, @amyk wrote:

@craig.topper Do you think common-ing out the X86/PPC parts to combine to mulh into the target independent combiner is a good idea?

I see a few issues.

X86 doesn't have scalar MULHU/MULHS instructions, but we have vector MULHU/MULHS on vXi16. We currently match from truncate rather than from shift. I tried to move it to shift following your code here, but I got regressions, primarily because we handled the truncate first and turned it into PACKSS/PACKUS, maybe an AND or SIGN_EXTEND_INREG, and some subvector extracts. Then we match the MULH, but we couldn't fold away the sext/zext and what we had turned the truncate into. We might be able to improve that. I think it is useful to match from the shift since the truncate won't always be there. So we might need matching from both.

It sounds like if we were to put this code into target-independent DAG Combine, you shouldn't see similar regressions since your combines will still run and this patch might pick up some of what couldn't be combined. Or am I misinterpreting your comment?

The other issue is that for vectors we need to match vectors with more than the legal number of elements before type legalization. Otherwise the extends we match get type legalized into something much harder to match. But checking isOperationLegal won't work for that. Maybe we could walk the type legalization steps to find what it would be legalized to?

If I am not mistaken, this runs before legalization as well so it should combine something like
(srl (mul (zext v16i16:$a to v16i32), (zext v16i16:$a to v16i32)), (splat 16 to v16i32))
As long as MULH is legal for v16i16 (even though v16i32 is not a legal type). Of course, the actual types are just for illustration - not to suggest that those are the actual types for any specific target.
Or am I again not reading your comment correctly?

In D78272#2029885, @nemanjai wrote:

In D78272#2026458, @craig.topper wrote:

In D78272#2025841, @amyk wrote:

@craig.topper Do you think common-ing out the X86/PPC parts to combine to mulh into the target independent combiner is a good idea?

I see a few issues.

X86 doesn't have scalar MULHU/MULHS instructions, but we have vector MULHU/MULHS on vXi16. We currently match from truncate rather than from shift. I tried to move it to shift following your code here, but I got regressions, primarily because we handled the truncate first and turned it into PACKSS/PACKUS, maybe an AND or SIGN_EXTEND_INREG, and some subvector extracts. Then we match the MULH, but we couldn't fold away the sext/zext and what we had turned the truncate into. We might be able to improve that. I think it is useful to match from the shift since the truncate won't always be there. So we might need matching from both.

It sounds like if we were to put this code into target-independent DAG Combine, you shouldn't see similar regressions since your combines will still run and this patch might pick up some of what couldn't be combined. Or am I misinterpreting your comment?

Yes, our combine will still run. I have no issue putting this in target-independent combine. I was answering from the position of why putting it in target-independent combine doesn't get rid of X86-specific code, which I assume was at least part of @RKSimon's motivation for moving it.

The other issue is that for vectors we need to match vectors with more than the legal number of elements before type legalization. Otherwise the extends we match get type legalized into something much harder to match. But checking isOperationLegal won't work for that. Maybe we could walk the type legalization steps to find what it would be legalized to?

If I am not mistaken, this runs before legalization as well so it should combine something like
(srl (mul (zext v16i16:$a to v16i32), (zext v16i16:$a to v16i32)), (splat 16 to v16i32))
As long as MULH is legal for v16i16 (even though v16i32 is not a legal type). Of course, the actual types are just for illustration - not to suggest that those are the actual types for any specific target.
Or am I again not reading your comment correctly?

But it's still probably worthwhile for illegal types like v128i16 since it turns two extends on the inputs into one on the output. Type legalization will split the v128i16 MULHU/MULHS. This needs to be matched before type legalization makes the v128i32 zero extend hard to spot. Number of elements exaggerated to make it look likely illegal.

In D78272#2029966, @craig.topper wrote:

Yes, our combine will still run. I have no issue putting this in target-independent combine. I was answering from the position of why putting it in target-independent combine doesn't get rid of X86-specific code, which I assume was at least part of @RKSimon's motivation for moving it.

Yes this was a query as to whether it would help on top of the existing x86 specific combines.

Since it would seem that this can't immediately be combined with the only other target that seems to want this, I would recommend that we keep this in the PPC back end for now. If there is interest in commoning it up in the future, we revisit this then.
Does that sound like a good plan?
And of course, thanks for all your feedback @craig.topper @RKSimon!

In D78272#2033067, @nemanjai wrote:

Since it would seem that this can't immediately be combined with the only other target that seems to want this, I would recommend that we keep this in the PPC back end for now. If there is interest in commoning it up in the future, we revisit this then.
Does that sound like a good plan?
And of course, thanks for all your feedback @craig.topper @RKSimon!

No objections from me - sorry for the delay!

In D78272#2033067, @nemanjai wrote:

Since it would seem that this can't immediately be combined with the only other target that seems to want this, I would recommend that we keep this in the PPC back end for now. If there is interest in commoning it up in the future, we revisit this then.
Does that sound like a good plan?
And of course, thanks for all your feedback @craig.topper @RKSimon!

I can see this combine being useful on RISC-V (where we have a mulh[[s]u] instruction) - would it be useful for me to work on some testcases for it?

In D78272#2033514, @lenary wrote:

I can see this combine being useful on RISC-V (where we have a mulh[[s]u] instruction) - would it be useful for me to work on some testcases for it?

Absolutely. If we have a test case (along with invocation and desired codegen) it would make it much easier to move this out to the target independent code and ensure it is doing what it is supposed to. Thank you.

craig.topper mentioned this in D80485: [DAGCombiner][PowerPC] Remove isMulhCheaperThanMulShift TLI hook. Use isOperationLegalOrCustom directly instead. May 26 2020, 12:02 AM

Moved the function to combine shifts into multiply high into DAGCombiner.cpp. For now, this combine runs only on PowerPC as PowerPC has an implementation of the isMulhCheaperThanMulShift TLI query.

Herald added subscribers: ecnelises, steven.zhang. May 26 2020, 8:46 AM

LGTM

@nemanjai Any more comments?

LGTM. My only remaining comments are nits about early exits from the function not happening as early as they can, but such reordering is trivial and can happen on the commit. Thank you.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
7947 ↗	(On Diff #266233)	I think the check for the types of the inputs to the two extend nodes belongs here. Generally favour early exits as soon as the necessary information is available.
7959 ↗	(On Diff #266233)	This check is on the outermost operation (i.e. the shift). We can probably move this early exit towards the very top of the function as it makes no sense to do any other checks if the shift amount isn't constant. I don't think another round of review is required for this though - feel free to address this on the commit.
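
The early-exit ordering suggested in these two comments can be sketched with a simplified standalone model. This is an illustration only: `ShiftPattern` and `canCombineToMulh` are hypothetical names, the fields are plain values rather than LLVM types, and the order shown is one reading of the review comments (outermost node first, then each check as soon as its inputs are available), not the committed code.

```cpp
#include <cassert>
#include <optional>

// Hypothetical, simplified stand-in for the shift node being inspected.
struct ShiftPattern {
  std::optional<unsigned> constShiftAmt; // empty if the amount is not constant
  bool operandIsMul;      // the shifted value is a multiply
  bool extendsMatch;      // both multiply operands are the same kind of extend
  unsigned narrowBitsLHS; // scalar width of the left extend's input
  unsigned narrowBitsRHS; // scalar width of the right extend's input
  unsigned wideBits;      // scalar width of the multiply
  bool mulhLegal;         // multiply-high is legal on the narrow type
};

inline bool canCombineToMulh(const ShiftPattern &P) {
  // Check the outermost operation first: no constant shift amount, no combine.
  if (!P.constShiftAmt)
    return false;
  if (!P.operandIsMul)
    return false;
  if (!P.extendsMatch)
    return false;
  // Input types of the two extends, checked as soon as both are known.
  if (P.narrowBitsLHS != P.narrowBitsRHS)
    return false;
  if (P.wideBits != 2 * P.narrowBitsLHS)
    return false;
  if (*P.constShiftAmt != P.narrowBitsLHS)
    return false;
  return P.mulhLegal;
}
```

Ordering the checks this way does not change which patterns are accepted; it only avoids inspecting the multiply and its operands when the shift itself already rules the combine out.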

Now if only I remember to select "Accept Revision"... :)

This revision is now accepted and ready to land. Jun 2 2020, 5:16 AM

Closed by commit rGa3ada630d8ab: [DAGCombiner] Combine shifts into multiply-high (authored by amyk). Jun 2 2020, 1:45 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/PowerPC/PPCISelLowering.cpp	78 lines
llvm/test/CodeGen/PowerPC/combine-to-mulh-shift-amount.ll	116 lines
llvm/test/CodeGen/PowerPC/mul-high.ll	125 lines

Diff 262813

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

SDValue PPCTargetLowering::combineSHL(SDNode *N, DAGCombinerInfo &DCI) const {
[...]
  // have an i64.
  if (ShiftBy.getValueType() == MVT::i64)
    ShiftBy = DCI.DAG.getConstant(CN1->getZExtValue(), DL, MVT::i32);

  return DCI.DAG.getNode(PPCISD::EXTSWSLI, DL, MVT::i64, N0->getOperand(0),
                         ShiftBy);
}

// Transform a right shift of a multiply into a multiply-high.
// Examples:
// (srl (mul (zext i32:$a to i64), (zext i32:$b to i64)), 32) -> (mulhu $a, $b)
// (sra (mul (sext i32:$a to i64), (sext i32:$b to i64)), 32) -> (mulhs $a, $b)
static SDValue combineShiftToMULH(SDNode *N, SelectionDAG &DAG,
                                  const TargetLowering &TLI,
                                  const PPCSubtarget &Subtarget) {
  assert((N->getOpcode() == ISD::SRL || N->getOpcode() == ISD::SRA) &&
         "SRL or SRA node is required here!");
  SDLoc DL(N);

  // The operation feeding into the shift must be a multiply.
  SDValue ShiftOperand = N->getOperand(0);
  if (ShiftOperand.getOpcode() != ISD::MUL)
    return SDValue();

  // Both operands must be equivalent extend nodes.
  SDValue LeftOp = ShiftOperand.getOperand(0);
  SDValue RightOp = ShiftOperand.getOperand(1);
  bool IsSignExt = LeftOp.getOpcode() == ISD::SIGN_EXTEND;
  bool IsZeroExt = LeftOp.getOpcode() == ISD::ZERO_EXTEND;

  if ((!(IsSignExt || IsZeroExt)) || LeftOp.getOpcode() != RightOp.getOpcode())
    return SDValue();

  EVT WideVT1 = LeftOp.getValueType();
  EVT WideVT2 = RightOp.getValueType();
  // Proceed with the transformation if the wide types match.
  assert((WideVT1 == WideVT2) &&
         "Cannot have a multiply node with two different operand types.");
  (void)WideVT2;

  EVT NarrowVT = LeftOp.getOperand(0).getValueType();
  // Proceed with the transformation if the wide type is twice as large
  // as the narrow type.
  unsigned NarrowVTSize = NarrowVT.getScalarSizeInBits();
  if (WideVT1.getScalarSizeInBits() != 2 * NarrowVTSize)
    return SDValue();

  // Check the shift amount with the narrow type size.
  // Proceed with the transformation if the shift amount is a constant and
  // if the shift amount is the width of the narrow type.
  ConstantSDNode *ShiftAmtSrc = isConstOrConstSplat(N->getOperand(1));
  if (!ShiftAmtSrc)
    return SDValue();

  unsigned ShiftAmt = ShiftAmtSrc->getZExtValue();
  if (ShiftAmt != NarrowVTSize)
    return SDValue();

  // If the operation feeding into the MUL is a sign extend (sext),
  // we use mulhs. Otherwise, zero extends (zext) use mulhu.
  unsigned MulhOpcode = IsSignExt ? ISD::MULHS : ISD::MULHU;

  if (!TLI.isOperationLegal(MulhOpcode, NarrowVT))
    return SDValue();

  if (NarrowVT != RightOp.getOperand(0).getValueType())
    return SDValue();

  SDValue Result = DAG.getNode(MulhOpcode, DL, NarrowVT, LeftOp.getOperand(0),
                               RightOp.getOperand(0));
  return (N->getOpcode() == ISD::SRA ? DAG.getSExtOrTrunc(Result, DL, WideVT1)
                                     : DAG.getZExtOrTrunc(Result, DL, WideVT1));
}

SDValue PPCTargetLowering::combineSRA(SDNode *N, DAGCombinerInfo &DCI) const {
  if (auto Value = stripModuloOnShift(*this, N, DCI.DAG))
    return Value;

  // On 64-bit PowerPC, try to transform this shift into a multiply-high if
  // it matches the pattern.
  if (Subtarget.isPPC64())
    if (SDValue MULH = combineShiftToMULH(N, DCI.DAG, *this, Subtarget))
      return MULH;

  return SDValue();
}

SDValue PPCTargetLowering::combineSRL(SDNode *N, DAGCombinerInfo &DCI) const {
  if (auto Value = stripModuloOnShift(*this, N, DCI.DAG))
    return Value;

  // On 64-bit PowerPC, try to transform this shift into a multiply-high if
  // it matches the pattern.
  if (Subtarget.isPPC64())
    if (SDValue MULH = combineShiftToMULH(N, DCI.DAG, *this, Subtarget))
      return MULH;

  return SDValue();
}

// Transform (add X, (zext(setne Z, C))) -> (addze X, (addic (addi Z, -C), -1))
// Transform (add X, (zext(sete Z, C))) -> (addze X, (subfic (addi Z, -C), 0))
// When C is zero, the equation (addi Z, -C) can be simplified to Z
// Requirement: -C in [-32768, 32767], X and Z are MVT::i64 types
static SDValue combineADDToADDZE(SDNode *N, SelectionDAG &DAG,
[...]

llvm/test/CodeGen/PowerPC/combine-to-mulh-shift-amount.ll

This file was added.

				; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \
				; RUN: -mcpu=pwr9 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | \
				; RUN: FileCheck %s

				; These tests show that for 32-bit and 64-bit scalars, combining a shift to
				; a single multiply-high is only valid when the shift amount is the same as
				; the width of the narrow type.

				; That is, combining a shift to mulh is only valid for 32-bit when the shift
				; amount is 32.
				; Likewise, combining a shift to mulh is only valid for 64-bit when the shift
				; amount is 64.

				define i32 @test_mulhw(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhw:
				; CHECK: mulld
				; CHECK-NOT: mulhw
				; CHECK: blr
				%1 = sext i32 %a to i64
				%2 = sext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 33
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define i32 @test_mulhu(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhu:
				; CHECK: mulld
				; CHECK-NOT: mulhwu
				; CHECK: blr
				%1 = zext i32 %a to i64
				%2 = zext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 33
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define i64 @test_mulhd(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhd:
				; CHECK: mulhd
				; CHECK: mulld
				; CHECK: blr
				%1 = sext i64 %a to i128
				%2 = sext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 63
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				define i64 @test_mulhdu(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhdu:
				; CHECK: mulhdu
				; CHECK: mulld
				; CHECK: blr
				%1 = zext i64 %a to i128
				%2 = zext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 63
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				define signext i32 @test_mulhw_signext(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhw_signext:
				; CHECK: mulld
				; CHECK-NOT: mulhw
				; CHECK: blr
				%1 = sext i32 %a to i64
				%2 = sext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 33
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define zeroext i32 @test_mulhu_zeroext(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhu_zeroext:
				; CHECK: mulld
				; CHECK-NOT: mulhwu
				; CHECK: blr
				%1 = zext i32 %a to i64
				%2 = zext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 33
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define signext i64 @test_mulhd_signext(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhd_signext:
				; CHECK: mulhd
				; CHECK: mulld
				; CHECK: blr
				%1 = sext i64 %a to i128
				%2 = sext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 63
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				define zeroext i64 @test_mulhdu_zeroext(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhdu_zeroext:
				; CHECK: mulhdu
				; CHECK: mulld
				; CHECK: blr
				%1 = zext i64 %a to i128
				%2 = zext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 63
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

llvm/test/CodeGen/PowerPC/mul-high.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu \
				; RUN: -mcpu=pwr9 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr < %s | \
				; RUN: FileCheck %s

				; This test case tests multiply high for i32 and i64. When the values are
				; sign-extended, mulh[d|w] is emitted. When values are zero-extended,
				; mulh[d|w]u is emitted instead.

				; The primary goal is transforming the pattern:
				; (shift (mul (ext $a, <wide_type>), (ext $b, <wide_type>)), <narrow_type>)
				; into (mulhs $a, $b) for sign extend, and (mulhu $a, $b) for zero extend,
				; provided that the mulh operation is legal for <narrow_type>.
				; The shift operation can be either the srl or sra operations.

				; When no attribute is present on i32, the shift operation is srl.
				define i32 @test_mulhw(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhw:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhw r3, r3, r4
				; CHECK-NEXT: clrldi r3, r3, 32
				; CHECK-NEXT: blr
				%1 = sext i32 %a to i64
				%2 = sext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 32
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define i32 @test_mulhu(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhu:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhwu r3, r3, r4
				; CHECK-NEXT: clrldi r3, r3, 32
				; CHECK-NEXT: blr
				%1 = zext i32 %a to i64
				%2 = zext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 32
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define i64 @test_mulhd(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhd:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhd r3, r3, r4
				; CHECK-NEXT: blr
				%1 = sext i64 %a to i128
				%2 = sext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 64
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				define i64 @test_mulhdu(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhdu:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhdu r3, r3, r4
				; CHECK-NEXT: blr
				%1 = zext i64 %a to i128
				%2 = zext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 64
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				; When the signext attribute is present on i32, the shift operation is sra.
				; We are actually transforming (sra (mul sext_in_reg, sext_in_reg)) into mulh.
				define signext i32 @test_mulhw_signext(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhw_signext:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhw r3, r3, r4
				; CHECK-NEXT: extsw r3, r3
				; CHECK-NEXT: blr
				%1 = sext i32 %a to i64
				%2 = sext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 32
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define zeroext i32 @test_mulhu_zeroext(i32 %a, i32 %b) {
				; CHECK-LABEL: test_mulhu_zeroext:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhwu r3, r3, r4
				; CHECK-NEXT: clrldi r3, r3, 32
				; CHECK-NEXT: blr
				%1 = zext i32 %a to i64
				%2 = zext i32 %b to i64
				%mul = mul i64 %1, %2
				%shr = lshr i64 %mul, 32
				%tr = trunc i64 %shr to i32
				ret i32 %tr
				}

				define signext i64 @test_mulhd_signext(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhd_signext:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhd r3, r3, r4
				; CHECK-NEXT: blr
				%1 = sext i64 %a to i128
				%2 = sext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 64
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}

				define zeroext i64 @test_mulhdu_zeroext(i64 %a, i64 %b) {
				; CHECK-LABEL: test_mulhdu_zeroext:
				; CHECK: # %bb.0:
				; CHECK-NEXT: mulhdu r3, r3, r4
				; CHECK-NEXT: blr
				%1 = zext i64 %a to i128
				%2 = zext i64 %b to i128
				%mul = mul i128 %1, %2
				%shr = lshr i128 %mul, 64
				%tr = trunc i128 %shr to i64
				ret i64 %tr
				}