This is an archive of the discontinued LLVM Phabricator instance.

Differential D93963

[GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO
ClosedPublic

Authored by pdhaliwal on Jan 1 2021, 2:58 AM.

Details

Reviewers

arsenm
foad

Commits

rGd0e5422eb8bf: [GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	350 ms	x64 debian > LLVM.CodeGen/AMDGPU/GlobalISel::legalize-smulo.mir
	380 ms	x64 debian > LLVM.CodeGen/AMDGPU/GlobalISel::legalize-umulo.mir
	290 ms	x64 windows > LLVM.CodeGen/AMDGPU/GlobalISel::legalize-smulo.mir
	240 ms	x64 windows > LLVM.CodeGen/AMDGPU/GlobalISel::legalize-umulo.mir

Event Timeline

pdhaliwal created this revision. Jan 1 2021, 2:58 AM

Herald added subscribers: kerbowa, hiraditya, t-tye and 7 others. Jan 1 2021, 2:58 AM

pdhaliwal requested review of this revision. Jan 1 2021, 2:58 AM

Herald added a project: Restricted Project. Jan 1 2021, 2:58 AM

Herald added subscribers: llvm-commits, wdng.

Harbormaster completed remote builds in B83801: Diff 314215. Jan 1 2021, 3:41 AM

arsenm added inline comments. Jan 4 2021, 12:55 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
602	Would you get better results if you: 1. scalarized vectors first, 2. promoted small scalar types first?
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir
4	Do you need the abort=0s?

Removed global-isel-abort=0

pdhaliwal marked an inline comment as done. Jan 10 2021, 11:45 PM

pdhaliwal added inline comments.

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
602	In both cases, the compiler somehow considers the MULO legal. For example, even for s32 no expansion occurs; the compiler uses the S/UMULO instructions directly. I am investigating this.

Harbormaster completed remote builds in B84638: Diff 315710. Jan 11 2021, 12:26 AM

Moved ops close to ADDO

pdhaliwal added inline comments. Jan 14 2021, 10:26 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
602	It was due to a missing widenScalar definition for this operation. I tried implementing one for UMULO, as it was easier, but didn't see any better result.

Harbormaster completed remote builds in B85295: Diff 316847. Jan 14 2021, 11:10 PM

foad added a subscriber: foad. Jan 15 2021, 1:58 AM

foad added inline comments. Jan 15 2021, 2:15 AM

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir
61	Why do we get ANYEXT followed by AND with 1? In the scalar s32 case we just get ZEXT which is nicer.
116–119	Is there a good reason why this is using UADDO plus ZEXT plus a second ADD, instead of using UADDE?
219	This expansion does an s32 multiply and an s16 multiply. It would be better to just do one s32 multiply -- you can extract all the information you need from the result of that.

Hi, apologies for the late reply; I got sidetracked by some other work.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir
61	This comes from the widening of the BUILD_VECTOR that results from legalizing the ICMP instruction. So, I guess the ZEXT is presumed dead once the BUILD_VECTOR gets legalized.
116–119	This is because of the legalization of UMULH for s64. I am thinking of a separate patch for this, as it affects the narrowing of UMULH.
219	I was able to get the unsigned operation to use a single s32 multiply. The signed case is getting a bit tricky.
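
To make the single-multiply suggestion concrete, here is a minimal C++ sketch of an unsigned 16-bit multiply-with-overflow computed from one 32-bit product. It models the idea with plain integers rather than MIR, and the names `MulResult16` and `umulo16ViaMul32` are hypothetical, not part of LLVM.

```cpp
#include <cstdint>

// Hypothetical result type for illustration; not an LLVM type.
struct MulResult16 {
  uint16_t Value;  // low 16 bits of the product
  bool Overflow;   // true if the product does not fit in 16 bits
};

// Unsigned 16-bit multiply-with-overflow using a single 32-bit multiply.
constexpr MulResult16 umulo16ViaMul32(uint16_t LHS, uint16_t RHS) {
  // Zero-extend both operands. The product of two 16-bit values always
  // fits in 32 bits, so this multiply cannot itself overflow.
  uint32_t Wide = static_cast<uint32_t>(LHS) * static_cast<uint32_t>(RHS);
  // The truncated product is the result value.
  uint16_t Low = static_cast<uint16_t>(Wide);
  // Overflow occurred if any bit above the low 16 is set.
  bool Overflow = (Wide >> 16) != 0;
  return {Low, Overflow};
}
```

Both the value and the overflow flag are read from the same 32-bit product, so no second 16-bit multiply is needed.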

pdhaliwal added a reviewer: foad. Jan 28 2021, 2:27 AM

Scalarize the vectors first
Using widened operation for smaller types

Harbormaster completed remote builds in B87348: Diff 320428. Feb 1 2021, 3:49 AM

Ping!

arsenm added inline comments. Feb 11 2021, 3:35 PM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1829–1830	Why is this using SExtInReg in the signed case, but ZExt in the other? SExtInReg doesn't widen the type

foad added inline comments. Feb 12 2021, 8:46 AM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1826	Maybe assert that WideTy is at least twice as wide as SrcTy, otherwise the trick we use for calculating overflow below does not work.
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulo.mir
3	Can you use -check-prefixes=GCN,GFX8 and GCN,GFX9 so that update_mir_test_checks will common up the identical ones?
113	This looks wrong as Matt noted above. Doesn't G_SEXT_INREG require identical source and result types? Would this fail MIR verification?
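
The distinction the reviewers draw here can be sketched in plain C++. This is only an integer model under stated assumptions: `sextInReg` mimics an operation whose source and result have the same width and which sign-extends from a given low bit count, while the widening itself is done by an ordinary sign-extending cast. The function names are hypothetical.

```cpp
#include <cstdint>

// Models a "sign-extend in register" operation: input and output have the
// same 32-bit type, and the value is sign-extended from its low `Bits` bits.
// Assumes 1 <= Bits <= 32.
constexpr int32_t sextInReg(int32_t V, unsigned Bits) {
  uint32_t U = static_cast<uint32_t>(V);
  uint32_t Mask = (Bits >= 32) ? 0xFFFFFFFFu : ((1u << Bits) - 1u);
  uint32_t SignBit = 1u << (Bits - 1);
  U &= Mask;
  // Standard trick: flip the sign bit, then subtract it back out.
  return static_cast<int32_t>((U ^ SignBit) - SignBit);
}

// Signed 16-bit overflow check via a 32-bit multiply.
constexpr bool smulo16Overflow(int16_t LHS, int16_t RHS) {
  // Widening step: a real sign extension from 16 to 32 bits.
  int32_t Wide = static_cast<int32_t>(LHS) * static_cast<int32_t>(RHS);
  // Overflow occurred if the product differs from the sign extension of
  // its own low 16 bits.
  return Wide != sextInReg(Wide, 16);
}
```

In this model the in-register form is appropriate only for the final comparison on the already wide product; the operands need a genuine widening extension first.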

Addressed review comments.

Can you use -check-prefixes=GCN,GFX8 and GCN,GFX9 so that update_mir_test_checks will common up the identical ones?

It does not work. The script warns: WARNING: Ignoring common prefixes: {'GCN'}: llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir

Harbormaster completed remote builds in B89194: Diff 323683. Feb 15 2021, 1:39 AM

arsenm added inline comments. Feb 15 2021, 2:25 PM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1836	Can you copy some of the comments from the DAG version? Stuff like // Unsigned overflow occurred if the high part is non-zero. and // Signed overflow occurred if the high part does not sign extend the low.
3652–3655	Should just directly extract the reg here, there's no reason to refer to the MachineOperand. This is also using value copies of MachineOperand, which are generally not a good idea
3677–3678	You can hide the createGenericVirtualRegister calls with buildInstr, like buildInstr(Opc, {ResultTy, OverFlowTy}...
3686–3687	You should preserve the boolean type of the incoming, not hardcode to s1. We also have LLT.changeElementType for this
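
The two rules quoted from the DAG version can be illustrated for the full-width 32-bit case with a small C++ sketch. It uses a 64-bit product only to model what a multiply-high operation would return; the function names are hypothetical and this is not the LegalizerHelper code.

```cpp
#include <cstdint>

// Unsigned overflow occurred if the high part is non-zero.
constexpr bool umulo32Overflow(uint32_t LHS, uint32_t RHS) {
  uint64_t Product = static_cast<uint64_t>(LHS) * RHS;
  uint32_t High = static_cast<uint32_t>(Product >> 32);  // models an unsigned multiply-high
  return High != 0;
}

// Signed overflow occurred if the high part does not sign extend the low.
constexpr bool smulo32Overflow(int32_t LHS, int32_t RHS) {
  int64_t Product = static_cast<int64_t>(LHS) * RHS;
  uint64_t Bits = static_cast<uint64_t>(Product);
  uint32_t High = static_cast<uint32_t>(Bits >> 32);  // models a signed multiply-high
  uint32_t Low = static_cast<uint32_t>(Bits);         // models the plain multiply
  // Equivalent to an arithmetic shift of Low by 31: all ones if Low is
  // negative, all zeros otherwise.
  uint32_t SignFill = (Low & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  return High != SignFill;
}
```

The signed check has the same shape as the s32 expansion in legalize-smulo.mir: a multiply-high result compared for inequality against the low product shifted right arithmetically by 31.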

Addressed review comments.

pdhaliwal marked 4 inline comments as done. Feb 15 2021, 11:11 PM

Harbormaster completed remote builds in B89320: Diff 323892. Feb 15 2021, 11:45 PM

foad added inline comments. Feb 17 2021, 12:30 AM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1826	Did you test with assertions enabled? I think this needs to be `>=`.

Fixed the assert.

pdhaliwal added inline comments. Feb 17 2021, 2:37 AM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1826	Thanks for pointing out. I was not testing with assertions enabled so I missed it.

Harbormaster completed remote builds in B89521: Diff 324247. Feb 17 2021, 3:30 AM

arsenm added inline comments. Mar 3 2021, 6:50 PM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1832	Hmm. This is actually going beyond what I expect widenScalar to do. In general widenScalar should try to produce the original opcode. In this case, the DAG does the same overflow opcode in the wider type, and then ors the flag at the end.
1836	The DAG version still has a few more comments ( To determine if the result overflowed in a larger type...)
1842–1845	This is a common enough pattern. The DAG provides a getZExtInReg to help produce masks like this, maybe move this to MIRBuilder?

Rebase & Review comments

Harbormaster completed remote builds in B92812: Diff 329237. Mar 9 2021, 6:53 AM

Ping!

arsenm requested changes to this revision. Mar 17 2021, 3:42 PM

arsenm added inline comments.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1854–1855	I don't think the unsigned case is right. The DAG version inserts a shift here, not a mask

This revision now requires changes to proceed. Mar 17 2021, 3:42 PM

Use SHRL for unsigned case.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1854–1855	I have changed it to use a shift instead of masking. Just curious, why was the previous logic wrong? I thought zeroing the upper bits of the multiplication result and then comparing that with the unmasked result should give the correct answer.

Harbormaster completed remote builds in B94403: Diff 331474. Mar 18 2021, 1:24 AM

foad added inline comments. Mar 19 2021, 3:11 AM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1828	If the wide type is at least twice as wide as the original type, then the widened multiply provably will not overflow, so you don't need the final "or". If we want to support widening to a type that is less than twice as wide as the original type, then remove this assert and only do the final "or" if `WideTy.getScalarSizeInBits() < 2 * SrcBitWidth`. But I'm not sure how you would write a test for that code path.

Removed the assert. The WideTy overflow check is now done only when WideTy is not wide enough.
Added test case for s24 to verify the logic.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1828	That "or" exists to match the existing DAG logic. But you are right that the assert makes "or" redundant. I have removed the assert and added tests for s24 to verify the logic.
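
The s24 case mentioned above is an instance where the wide type is less than twice the source width, so the widened multiply can itself overflow. A minimal C++ sketch of that situation, assuming 24-bit operands held zero-extended in 32-bit values; `Mulo24` and `umulo24ViaUmulo32` are hypothetical names, and the 64-bit product is used only to model the overflow bit that a 32-bit multiply-with-overflow would report.

```cpp
#include <cstdint>

// Hypothetical result type: Value holds the low 24 bits of the product.
struct Mulo24 {
  uint32_t Value;
  bool Overflow;
};

// Unsigned 24-bit multiply-with-overflow performed in a 32-bit wide type.
// Assumes both inputs are already zero-extended 24-bit values.
constexpr Mulo24 umulo24ViaUmulo32(uint32_t LHS, uint32_t RHS) {
  uint64_t Full = static_cast<uint64_t>(LHS) * RHS;
  uint32_t Wide = static_cast<uint32_t>(Full);  // value of the 32-bit multiply
  bool WideOverflow = (Full >> 32) != 0;        // overflow bit of the 32-bit multiply
  uint32_t Low24 = Wide & 0x00FFFFFFu;
  // Overflow visible inside the wide result: bits 24..31 are non-zero.
  bool NarrowOverflow = Wide != Low24;
  // 32 < 2 * 24, so the wide multiply may overflow too; both flags are needed.
  return {Low24, NarrowOverflow || WideOverflow};
}
```

For example, 0x10000 * 0x10000 wraps to zero in 32 bits, so only the wide overflow flag catches it. With a wide type of at least 48 bits that flag could never be set, which is why the "or" becomes redundant in that case.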

I think this looks good, just some nits inline. If there are any further improvements they can be done as follow ups.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1854–1855	I'm pretty sure the previous logic was fine too, it's just a different way of checking the upper part is zero.
1867	Needs a comment that the multiply can't possibly overflow if the wide type is >= 2 * original width. It's a shame that you have to duplicate this check from line 1845. Maybe @arsenm knows a cleaner way to write this.
3650	Personally I would be tempted to generalize LegalizerHelper::fewerElementsVectorMultiEltType into a generic function that can handle any operation that works on vector elements independently. But that does not have to be part of this patch.

arsenm added inline comments. Mar 19 2021, 6:28 AM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1854–1855	Oh yes, I just can't read. Arguably avoiding the shift is better since shifts can be more expensive
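
The agreement reached here, that masking and shifting are two ways of testing whether the upper part is zero, can be checked exhaustively on a small model. The sketch below uses 8-bit operands widened to 16 bits; the helper names are hypothetical and nothing here is LLVM API.

```cpp
#include <cstdint>

// Mask form: overflow if clearing the upper 8 bits changes the product.
constexpr bool overflowByMask(uint16_t Wide) {
  return Wide != static_cast<uint16_t>(Wide & 0x00FFu);
}

// Shift form: overflow if the upper 8 bits are non-zero.
constexpr bool overflowByShift(uint16_t Wide) {
  return static_cast<uint16_t>(Wide >> 8) != 0;
}

// Compare both forms against the true answer for every 8-bit operand pair.
inline bool maskAndShiftAgree() {
  for (unsigned L = 0; L < 256; ++L) {
    for (unsigned R = 0; R < 256; ++R) {
      // 255 * 255 fits in 16 bits, so the widened product never wraps.
      uint16_t Wide = static_cast<uint16_t>(L * R);
      bool Expected = (L * R) > 255u;
      if (overflowByMask(Wide) != Expected || overflowByShift(Wide) != Expected)
        return false;
    }
  }
  return true;
}
```

Calling `maskAndShiftAgree()` returns true when both forms match the reference on all 65,536 operand pairs, so the choice between them comes down to cost rather than correctness.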

Harbormaster completed remote builds in B94682: Diff 331845. Mar 19 2021, 7:06 AM

This version of the patch has a few more changes. Please let me know if it still looks good.

Revert the unsigned case to use Masks.
Simplified the logic for widenScalar

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1854–1855	I have reverted to using masks.

Fix comments.

Still looks good to me.

Harbormaster completed remote builds in B94945: Diff 332214. Mar 22 2021, 2:36 AM

Harbormaster completed remote builds in B94947: Diff 332217. Mar 22 2021, 3:20 AM

This revision was not accepted when it landed; it landed in state Needs Review. Mar 22 2021, 10:46 PM

Closed by commit rGd0e5422eb8bf: [GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO (authored by pdhaliwal).

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rGd0e5422eb8bf: [GlobalISel][AMDGPU] Lower G_UMULO/G_SMULO.

Revision Contents

	Path	Size
	llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h	5 lines
	llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp	95 lines
	llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp	5 lines
	llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulo.mir	521 lines
	llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir	622 lines

Diff 323892

llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h

[...]
private:
LegalizeResult
widenScalarExtract(MachineInstr &MI, unsigned TypeIdx, LLT WideTy);
LegalizeResult
widenScalarInsert(MachineInstr &MI, unsigned TypeIdx, LLT WideTy);
LegalizeResult widenScalarAddSubOverflow(MachineInstr &MI, unsigned TypeIdx,
LLT WideTy);
LegalizeResult widenScalarAddSubShlSat(MachineInstr &MI, unsigned TypeIdx,
LLT WideTy);
		LegalizeResult widenScalarMulo(MachineInstr &MI, unsigned TypeIdx,
		LLT WideTy);

/// Helper function to split a wide generic register into bitwise blocks with
/// the given Type (which implies the number of blocks needed). The generic
/// registers created are appended to Ops, starting at bit 0 of Reg.
void extractParts(Register Reg, LLT Ty, int NumParts,
SmallVectorImpl<Register> &VRegs);

/// Version which handles irregular splits.
[...]
LegalizeResult fewerElementsVectorUnmergeValues(MachineInstr &MI,
unsigned TypeIdx,
LLT NarrowTy);
LegalizeResult fewerElementsVectorMerge(MachineInstr &MI, unsigned TypeIdx,
LLT NarrowTy);
LegalizeResult fewerElementsVectorExtractInsertVectorElt(MachineInstr &MI,
unsigned TypeIdx,
LLT NarrowTy);

		LegalizeResult fewerElementsVectorMulo(MachineInstr &MI, unsigned TypeIdx,
		LLT NarrowTy);

LegalizeResult
reduceLoadStoreWidth(MachineInstr &MI, unsigned TypeIdx, LLT NarrowTy);

/// Legalize an instruction by reducing the operation width, either by
/// narrowing the type of the operation or by reducing the number of elements
/// of a vector.
/// The used strategy (narrow vs. fewerElements) is decided by \p NarrowTy.
/// Narrow is used if the scalar type of \p NarrowTy and \p DstTy differ,
[...]

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

[...]
auto Result = IsSigned ? MIRBuilder.buildAShr(WideTy, WideInst, ShiftK)
: MIRBuilder.buildLShr(WideTy, WideInst, ShiftK);

MIRBuilder.buildTrunc(DstReg, Result);
MI.eraseFromParent();
return Legalized;
}

LegalizerHelper::LegalizeResult
		LegalizerHelper::widenScalarMulo(MachineInstr &MI, unsigned TypeIdx,
		LLT WideTy) {
		if (TypeIdx == 1)
		return UnableToLegalize;

		bool IsSigned = MI.getOpcode() == TargetOpcode::G_SMULO;
		Register Result = MI.getOperand(0).getReg();
		Register Overflow = MI.getOperand(1).getReg();
		Register LHS = MI.getOperand(2).getReg();
		Register RHS = MI.getOperand(3).getReg();
		LLT SrcTy = MRI.getType(LHS);
		unsigned SrcBitWidth = SrcTy.getScalarSizeInBits();
		assert(WideTy.getScalarSizeInBits() == 2 * SrcBitWidth);

		unsigned ExtOp = IsSigned ? TargetOpcode::G_SEXT : TargetOpcode::G_ZEXT;
		auto LeftOperand = MIRBuilder.buildInstr(ExtOp, {WideTy}, {LHS});
		auto RightOperand = MIRBuilder.buildInstr(ExtOp, {WideTy}, {RHS});

		auto Mul = MIRBuilder.buildMul(WideTy, LeftOperand, RightOperand);
		MIRBuilder.buildTrunc(Result, Mul);

		if (IsSigned) {
		// For signed, overflow occurred when the high part does not sign-extend
		// the low part.
		auto SExtResult = MIRBuilder.buildSExtInReg(WideTy, Mul, SrcBitWidth);
		MIRBuilder.buildICmp(CmpInst::ICMP_NE, Overflow, Mul, SExtResult);
		} else {
		// Unsigned overflow occurred if the high part is non-zero
		auto Mask = MIRBuilder.buildConstant(
		WideTy,
		APInt::getLowBitsSet(WideTy.getScalarSizeInBits(), SrcBitWidth));
		auto And = MIRBuilder.buildAnd(WideTy, Mul, Mask);
		MIRBuilder.buildICmp(CmpInst::ICMP_NE, Overflow, Mul, And);
		}

		MI.eraseFromParent();
		return Legalized;
		}

		LegalizerHelper::LegalizeResult
LegalizerHelper::widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) {
switch (MI.getOpcode()) {
default:
return UnableToLegalize;
case TargetOpcode::G_EXTRACT:
return widenScalarExtract(MI, TypeIdx, WideTy);
case TargetOpcode::G_INSERT:
return widenScalarInsert(MI, TypeIdx, WideTy);
case TargetOpcode::G_MERGE_VALUES:
return widenScalarMergeValues(MI, TypeIdx, WideTy);
case TargetOpcode::G_UNMERGE_VALUES:
return widenScalarUnmergeValues(MI, TypeIdx, WideTy);
case TargetOpcode::G_SADDO:
case TargetOpcode::G_SSUBO:
case TargetOpcode::G_UADDO:
case TargetOpcode::G_USUBO:
case TargetOpcode::G_SADDE:
case TargetOpcode::G_SSUBE:
case TargetOpcode::G_UADDE:
case TargetOpcode::G_USUBE:
return widenScalarAddSubOverflow(MI, TypeIdx, WideTy);
		case TargetOpcode::G_UMULO:
		case TargetOpcode::G_SMULO:
		return widenScalarMulo(MI, TypeIdx, WideTy);
case TargetOpcode::G_SADDSAT:
case TargetOpcode::G_SSUBSAT:
case TargetOpcode::G_SSHLSAT:
case TargetOpcode::G_UADDSAT:
case TargetOpcode::G_USUBSAT:
case TargetOpcode::G_USHLSAT:
return widenScalarAddSubShlSat(MI, TypeIdx, WideTy);
case TargetOpcode::G_CTTZ:
[...]
for (int J = 0; J != PartsPerUnmerge; ++J)
MIB.addDef(MI.getOperand(I * PartsPerUnmerge + J).getReg());
MIB.addUse(Unmerge.getReg(I));
}

MI.eraseFromParent();
return Legalized;
}
		LegalizerHelper::LegalizeResult
		LegalizerHelper::fewerElementsVectorMulo(MachineInstr &MI, unsigned TypeIdx,
		LLT NarrowTy) {
		Register Result = MI.getOperand(0).getReg();
		Register Overflow = MI.getOperand(1).getReg();
		Register LHS = MI.getOperand(2).getReg();
		Register RHS = MI.getOperand(3).getReg();

		LLT SrcTy = MRI.getType(LHS);
		if (!SrcTy.isVector())
		return UnableToLegalize;

		LLT ElementType = SrcTy.getElementType();
		LLT OverflowElementTy = MRI.getType(Overflow).getElementType();
		const int NumResult = SrcTy.getNumElements();
		LLT GCDTy = getGCDType(SrcTy, NarrowTy);

		// Unmerge the operands to smaller parts of GCD type.
		auto UnmergeLHS = MIRBuilder.buildUnmerge(GCDTy, LHS);
		auto UnmergeRHS = MIRBuilder.buildUnmerge(GCDTy, RHS);

		const int NumOps = UnmergeLHS->getNumOperands() - 1;
		const int PartsPerUnmerge = NumResult / NumOps;
		LLT OverflowTy = LLT::scalarOrVector(PartsPerUnmerge, OverflowElementTy);
		LLT ResultTy = LLT::scalarOrVector(PartsPerUnmerge, ElementType);

		// Perform the operation over unmerged parts.
		SmallVector<Register, 8> ResultParts;
		SmallVector<Register, 8> OverflowParts;
		for (int I = 0; I != NumOps; ++I) {
		Register Operand1 = UnmergeLHS->getOperand(I).getReg();
		Register Operand2 = UnmergeRHS->getOperand(I).getReg();
		auto PartMul = MIRBuilder.buildInstr(MI.getOpcode(), {ResultTy, OverflowTy},
		{Operand1, Operand2});
		ResultParts.push_back(PartMul->getOperand(0).getReg());
		OverflowParts.push_back(PartMul->getOperand(1).getReg());
		}

		LLT ResultLCMTy = buildLCMMergePieces(SrcTy, NarrowTy, GCDTy, ResultParts);
		LLT OverflowLCMTy =
		LLT::scalarOrVector(ResultLCMTy.getNumElements(), OverflowElementTy);

		// Recombine the pieces to the original result and overflow registers.
		buildWidenedRemergeToDst(Result, ResultLCMTy, ResultParts);
		buildWidenedRemergeToDst(Overflow, OverflowLCMTy, OverflowParts);
		MI.eraseFromParent();
		return Legalized;
		}

// Handle FewerElementsVector a G_BUILD_VECTOR or G_CONCAT_VECTORS that produces
// a vector
//
// Create a G_BUILD_VECTOR or G_CONCAT_VECTORS of NarrowTy pieces, padding with
// undef as necessary.
//
// %3:_(<3 x s16>) = G_BUILD_VECTOR %0, %1, %2
// -> <2 x s16>
[...]
LegalizerHelper::fewerElementsVector(MachineInstr &MI, unsigned TypeIdx,
case G_FSHL:
case G_FSHR:
case G_FREEZE:
case G_SADDSAT:
case G_SSUBSAT:
case G_UADDSAT:
case G_USUBSAT:
return reduceOperationWidth(MI, TypeIdx, NarrowTy);
		case G_UMULO:
		case G_SMULO:
		return fewerElementsVectorMulo(MI, TypeIdx, NarrowTy);
case G_SHL:
case G_LSHR:
case G_ASHR:
case G_SSHLSAT:
case G_USHLSAT:
case G_CTLZ:
case G_CTLZ_ZERO_UNDEF:
case G_CTTZ:
[...]

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

[...]
Mulh
.clampMaxNumElements(0, S8, 2)
.lowerFor({V2S8});
}

Mulh
.scalarize(0)
.lower();

// Report legal for any types we can handle anywhere. For the cases only legal
// on the SALU, RegBankSelect will be able to re-legalize.
getActionDefinitionsBuilder({G_AND, G_OR, G_XOR})
.legalFor({S32, S1, S64, V2S32, S16, V2S16, V4S16})
.clampScalar(0, S32, S64)
.moreElementsIf(isSmallOddVector(0), oneMoreElement(0))
.fewerElementsIf(vectorWiderThan(0, 64), fewerEltsToSize64Vector(0))
.widenScalarToNextPow2(0)
.scalarize(0);
[...]
getActionDefinitionsBuilder(G_FSHR)
.lower();

getActionDefinitionsBuilder(G_READCYCLECOUNTER)
.legalFor({S64});

getActionDefinitionsBuilder(G_FENCE)
.alwaysLegal();

		getActionDefinitionsBuilder({G_SMULO, G_UMULO})
		.scalarize(0)
		.minScalar(0, S32)
		.lower();

getActionDefinitionsBuilder({
// TODO: Verify V_BFI_B32 is generated from expanded bit ops
G_FCOPYSIGN,

G_ATOMIC_CMPXCHG_WITH_SUCCESS,
G_ATOMICRMW_NAND,
G_ATOMICRMW_FSUB,
G_READ_REGISTER,
[...]

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulo.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=fiji -O0 -run-pass=legalizer %s -o - | FileCheck %s --check-prefix=GFX8
				# RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx906 -O0 -run-pass=legalizer %s -o - | FileCheck %s --check-prefix=GFX9

				---
				name: test_smulo_s32
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				; GFX8-LABEL: name: test_smulo_s32
				; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX8: [[SMULH:%[0-9]+]]:_(s32) = G_SMULH [[COPY]], [[COPY1]]
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[COPY]], [[COPY1]]
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 31
				; GFX8: [[ASHR:%[0-9]+]]:_(s32) = G_ASHR [[MUL]], [[C]](s32)
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[SMULH]](s32), [[ASHR]]
				; GFX8: [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[ICMP]](s1)
				; GFX8: $vgpr0 = COPY [[MUL]](s32)
				; GFX8: $vgpr1 = COPY [[SEXT]](s32)
				; GFX9-LABEL: name: test_smulo_s32
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[SMULH:%[0-9]+]]:_(s32) = G_SMULH [[COPY]], [[COPY1]]
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[COPY]], [[COPY1]]
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 31
				; GFX9: [[ASHR:%[0-9]+]]:_(s32) = G_ASHR [[MUL]], [[C]](s32)
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[SMULH]](s32), [[ASHR]]
				; GFX9: [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[ICMP]](s1)
				; GFX9: $vgpr0 = COPY [[MUL]](s32)
				; GFX9: $vgpr1 = COPY [[SEXT]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s32), %3:_(s1) = G_SMULO %0, %1
				%4:_(s32) = G_SEXT %3
				$vgpr0 = COPY %2
				$vgpr1 = COPY %4
				...

				---
				name: test_smulo_v2s32
				body: |
				bb.0:
				liveins: $vgpr0_vgpr1, $vgpr2_vgpr3

				; GFX8-LABEL: name: test_smulo_v2s32
				; GFX8: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX8: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; GFX8: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; GFX8: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; GFX8: [[SMULH:%[0-9]+]]:_(s32) = G_SMULH [[UV]], [[UV2]]
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV2]]
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 31
				; GFX8: [[ASHR:%[0-9]+]]:_(s32) = G_ASHR [[MUL]], [[C]](s32)
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[SMULH]](s32), [[ASHR]]
				; GFX8: [[SMULH1:%[0-9]+]]:_(s32) = G_SMULH [[UV1]], [[UV3]]
				; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV3]]
				; GFX8: [[ASHR1:%[0-9]+]]:_(s32) = G_ASHR [[MUL1]], [[C]](s32)
				; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[SMULH1]](s32), [[ASHR1]]
				; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[MUL]](s32), [[MUL1]](s32)
				; GFX8: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX8: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY [[ANYEXT]](s32)
				; GFX8: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 1
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY [[ANYEXT1]](s32)
				; GFX8: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY3]], 1
				; GFX8: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SEXT_INREG]](s32), [[SEXT_INREG1]](s32)
				; GFX8: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>)
				; GFX8: $vgpr2_vgpr3 = COPY [[BUILD_VECTOR1]](<2 x s32>)
				; GFX9-LABEL: name: test_smulo_v2s32
				; GFX9: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX9: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; GFX9: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; GFX9: [[SMULH:%[0-9]+]]:_(s32) = G_SMULH [[UV]], [[UV2]]
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV2]]
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 31
				; GFX9: [[ASHR:%[0-9]+]]:_(s32) = G_ASHR [[MUL]], [[C]](s32)
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[SMULH]](s32), [[ASHR]]
				; GFX9: [[SMULH1:%[0-9]+]]:_(s32) = G_SMULH [[UV1]], [[UV3]]
				; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV3]]
				; GFX9: [[ASHR1:%[0-9]+]]:_(s32) = G_ASHR [[MUL1]], [[C]](s32)
				; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[SMULH1]](s32), [[ASHR1]]
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[MUL]](s32), [[MUL1]](s32)
				; GFX9: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX9: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[ANYEXT]], 1
				; GFX9: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX9: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[ANYEXT1]], 1
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SEXT_INREG]](s32), [[SEXT_INREG1]](s32)
				; GFX9: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>)
				; GFX9: $vgpr2_vgpr3 = COPY [[BUILD_VECTOR1]](<2 x s32>)
				%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
				%1:_(<2 x s32>) = COPY $vgpr2_vgpr3
				%2:_(<2 x s32>), %3:_(<2 x s1>) = G_SMULO %0, %1
				%4:_(<2 x s32>) = G_SEXT %3
				$vgpr0_vgpr1 = COPY %2
				$vgpr2_vgpr3 = COPY %4
				...

				---
				name: test_smulo_s16
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				; GFX8-LABEL: name: test_smulo_s16
				; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX8: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 16
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX8: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY3]], 16
				foad (inline comment, Not Done): This looks wrong, as Matt noted above. Doesn't G_SEXT_INREG require identical source and result types? Would this fail MIR verification?
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
				; GFX8: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL]], 16
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[SEXT_INREG2]]
				; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX8: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16
				; GFX8: [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[ICMP]](s1)
				; GFX8: $vgpr0 = COPY [[SEXT_INREG3]](s32)
				; GFX8: $vgpr1 = COPY [[SEXT]](s32)
				; GFX9-LABEL: name: test_smulo_s16
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX9: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 16
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX9: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY3]], 16
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
				; GFX9: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL]], 16
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[SEXT_INREG2]]
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX9: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16
				; GFX9: [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[ICMP]](s1)
				; GFX9: $vgpr0 = COPY [[SEXT_INREG3]](s32)
				; GFX9: $vgpr1 = COPY [[SEXT]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s16) = G_TRUNC %0
				%3:_(s16) = G_TRUNC %1
				%4:_(s16), %6:_(s1) = G_SMULO %2, %3
				%5:_(s32) = G_SEXT %4
				%7:_(s32) = G_SEXT %6
				$vgpr0 = COPY %5
				$vgpr1 = COPY %7
				...

				---
				name: test_smulo_s8
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				; GFX8-LABEL: name: test_smulo_s8
				; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX8: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 8
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX8: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY3]], 8
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
				; GFX8: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL]], 8
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[SEXT_INREG2]]
				; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX8: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 8
				; GFX8: [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[ICMP]](s1)
				; GFX8: $vgpr0 = COPY [[SEXT_INREG3]](s32)
				; GFX8: $vgpr1 = COPY [[SEXT]](s32)
				; GFX9-LABEL: name: test_smulo_s8
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX9: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 8
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX9: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY3]], 8
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
				; GFX9: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL]], 8
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[SEXT_INREG2]]
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX9: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 8
				; GFX9: [[SEXT:%[0-9]+]]:_(s32) = G_SEXT [[ICMP]](s1)
				; GFX9: $vgpr0 = COPY [[SEXT_INREG3]](s32)
				; GFX9: $vgpr1 = COPY [[SEXT]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s8) = G_TRUNC %0
				%3:_(s8) = G_TRUNC %1
				%4:_(s8), %6:_(s1) = G_SMULO %2, %3
				%5:_(s32) = G_SEXT %4
				%7:_(s32) = G_SEXT %6
				$vgpr0 = COPY %5
				$vgpr1 = COPY %7
				...

				---
				name: test_smulo_v2s16
				body: |
				bb.0:
				liveins: $vgpr0_vgpr1, $vgpr2_vgpr3
				; GFX8-LABEL: name: test_smulo_v2s16
				; GFX8: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX8: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; GFX8: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; GFX8: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY [[UV]](s32)
				; GFX8: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 16
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY [[UV2]](s32)
				; GFX8: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY3]], 16
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
				; GFX8: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL]], 16
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[SEXT_INREG2]]
				; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[UV1]](s32)
				; GFX8: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16
				; GFX8: [[COPY5:%[0-9]+]]:_(s32) = COPY [[UV3]](s32)
				; GFX8: [[SEXT_INREG4:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 16
				; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG3]], [[SEXT_INREG4]]
				; GFX8: [[SEXT_INREG5:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL1]], 16
				; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[SEXT_INREG5]]
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535
				; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C]]
				; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[MUL1]](s32)
				; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C]]
				; GFX8: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND1]], [[C1]](s32)
				; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND]], [[SHL]]
				; GFX8: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32)
				; GFX8: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX8: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX8: [[COPY8:%[0-9]+]]:_(s32) = COPY [[ANYEXT]](s32)
				; GFX8: [[SEXT_INREG6:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY8]], 1
				; GFX8: [[COPY9:%[0-9]+]]:_(s32) = COPY [[ANYEXT1]](s32)
				; GFX8: [[SEXT_INREG7:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY9]], 1
				; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SEXT_INREG6]](s32), [[SEXT_INREG7]](s32)
				; GFX8: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[BITCAST]](<2 x s16>)
				; GFX8: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C1]](s32)
				; GFX8: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST1]](s32)
				; GFX8: [[SEXT_INREG8:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY10]], 16
				; GFX8: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX8: [[SEXT_INREG9:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY11]], 16
				; GFX8: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SEXT_INREG8]](s32), [[SEXT_INREG9]](s32)
				; GFX8: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR1]](<2 x s32>)
				; GFX8: $vgpr2_vgpr3 = COPY [[BUILD_VECTOR]](<2 x s32>)
				; GFX9-LABEL: name: test_smulo_v2s16
				; GFX9: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX9: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; GFX9: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY [[UV]](s32)
				; GFX9: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 16
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY [[UV2]](s32)
				; GFX9: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY3]], 16
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
				; GFX9: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL]], 16
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[SEXT_INREG2]]
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[UV1]](s32)
				; GFX9: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY [[UV3]](s32)
				; GFX9: [[SEXT_INREG4:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 16
				; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG3]], [[SEXT_INREG4]]
				; GFX9: [[SEXT_INREG5:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL1]], 16
				; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[SEXT_INREG5]]
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[MUL1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX9: [[SEXT_INREG6:%[0-9]+]]:_(s32) = G_SEXT_INREG [[ANYEXT]], 1
				; GFX9: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX9: [[SEXT_INREG7:%[0-9]+]]:_(s32) = G_SEXT_INREG [[ANYEXT1]], 1
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SEXT_INREG6]](s32), [[SEXT_INREG7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST]], [[C]](s32)
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[SEXT_INREG8:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY8]], 16
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[SEXT_INREG9:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY9]], 16
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SEXT_INREG8]](s32), [[SEXT_INREG9]](s32)
				; GFX9: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR1]](<2 x s32>)
				; GFX9: $vgpr2_vgpr3 = COPY [[BUILD_VECTOR]](<2 x s32>)
				%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
				%1:_(<2 x s32>) = COPY $vgpr2_vgpr3
				%2:_(<2 x s16>) = G_TRUNC %0
				%3:_(<2 x s16>) = G_TRUNC %1
				%4:_(<2 x s16>), %6:_(<2 x s1>) = G_SMULO %2, %3
				%7:_(<2 x s32>) = G_SEXT %6
				%5:_(<2 x s32>) = G_SEXT %4
				$vgpr0_vgpr1 = COPY %5
				$vgpr2_vgpr3 = COPY %7
				...


				---
				name: test_smulo_v2s8
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX8-LABEL: name: test_smulo_v2s8
				; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX8: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 8
				; GFX8: [[COPY5:%[0-9]+]]:_(s32) = COPY [[COPY2]](s32)
				; GFX8: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 8
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
				; GFX8: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL]], 8
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[SEXT_INREG2]]
				; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX8: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY6]], 8
				; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[COPY3]](s32)
				; GFX8: [[SEXT_INREG4:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY7]], 8
				; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG3]], [[SEXT_INREG4]]
				; GFX8: [[SEXT_INREG5:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL1]], 8
				; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[SEXT_INREG5]]
				; GFX8: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
				; GFX8: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[MUL]](s32)
				; GFX8: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]]
				; GFX8: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[MUL1]](s32)
				; GFX8: [[AND1:%[0-9]+]]:_(s16) = G_AND [[TRUNC1]], [[C]]
				; GFX8: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
				; GFX8: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[AND1]], [[C1]](s16)
				; GFX8: [[OR:%[0-9]+]]:_(s16) = G_OR [[AND]], [[SHL]]
				; GFX8: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[OR]](s16)
				; GFX8: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX8: [[ANYEXT2:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX8: $vgpr0 = COPY [[ANYEXT]](s32)
				; GFX8: $vgpr1 = COPY [[ANYEXT1]](s32)
				; GFX8: $vgpr2 = COPY [[ANYEXT2]](s32)
				; GFX9-LABEL: name: test_smulo_v2s8
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX9: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 8
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY [[COPY2]](s32)
				; GFX9: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 8
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
				; GFX9: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL]], 8
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[SEXT_INREG2]]
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX9: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY6]], 8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[COPY3]](s32)
				; GFX9: [[SEXT_INREG4:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY7]], 8
				; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG3]], [[SEXT_INREG4]]
				; GFX9: [[SEXT_INREG5:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL1]], 8
				; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[SEXT_INREG5]]
				; GFX9: [[C:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[MUL]](s32)
				; GFX9: [[AND:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C]]
				; GFX9: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[MUL1]](s32)
				; GFX9: [[AND1:%[0-9]+]]:_(s16) = G_AND [[TRUNC1]], [[C]]
				; GFX9: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
				; GFX9: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[AND1]], [[C1]](s16)
				; GFX9: [[OR:%[0-9]+]]:_(s16) = G_OR [[AND]], [[SHL]]
				; GFX9: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[OR]](s16)
				; GFX9: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX9: [[ANYEXT2:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX9: $vgpr0 = COPY [[ANYEXT]](s32)
				; GFX9: $vgpr1 = COPY [[ANYEXT1]](s32)
				; GFX9: $vgpr2 = COPY [[ANYEXT2]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s32) = COPY $vgpr2
				%3:_(s32) = COPY $vgpr3
				%5:_(s8) = G_TRUNC %0
				%6:_(s8) = G_TRUNC %1
				%7:_(s8) = G_TRUNC %2
				%8:_(s8) = G_TRUNC %3
				%11:_(<2 x s8>) = G_BUILD_VECTOR %5, %6
				%12:_(<2 x s8>) = G_BUILD_VECTOR %7, %8
				%13:_(<2 x s8>), %19:_(<2 x s1>) = G_SMULO %11, %12
				%20:_(<2 x s32>) = G_SEXT %19
				%14:_(s8), %15:_(s8) = G_UNMERGE_VALUES %13
				%21:_(s1), %22:_(s1) = G_UNMERGE_VALUES %19
				%17:_(s16) = G_MERGE_VALUES %14, %15
				%18:_(s32) = G_ANYEXT %17
				%23:_(s32) = G_ANYEXT %21
				%24:_(s32) = G_ANYEXT %22
				$vgpr0 = COPY %18
				$vgpr1 = COPY %23
				$vgpr2 = COPY %24
				...

				---
				name: test_smulo_v4s8
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1
				; GFX8-LABEL: name: test_smulo_v4s8
				; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 8
				; GFX8: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C]](s32)
				; GFX8: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX8: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C1]](s32)
				; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 24
				; GFX8: [[LSHR2:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C2]](s32)
				; GFX8: [[LSHR3:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C]](s32)
				; GFX8: [[LSHR4:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C1]](s32)
				; GFX8: [[LSHR5:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C2]](s32)
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX8: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 8
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX8: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY3]], 8
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
				; GFX8: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL]], 8
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[SEXT_INREG2]]
				; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX8: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 8
				; GFX8: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR3]](s32)
				; GFX8: [[SEXT_INREG4:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 8
				; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG3]], [[SEXT_INREG4]]
				; GFX8: [[SEXT_INREG5:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL1]], 8
				; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[SEXT_INREG5]]
				; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX8: [[SEXT_INREG6:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY6]], 8
				; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[LSHR4]](s32)
				; GFX8: [[SEXT_INREG7:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY7]], 8
				; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG6]], [[SEXT_INREG7]]
				; GFX8: [[SEXT_INREG8:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL2]], 8
				; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL2]](s32), [[SEXT_INREG8]]
				; GFX8: [[COPY8:%[0-9]+]]:_(s32) = COPY [[LSHR2]](s32)
				; GFX8: [[SEXT_INREG9:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY8]], 8
				; GFX8: [[COPY9:%[0-9]+]]:_(s32) = COPY [[LSHR5]](s32)
				; GFX8: [[SEXT_INREG10:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY9]], 8
				; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG9]], [[SEXT_INREG10]]
				; GFX8: [[SEXT_INREG11:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL3]], 8
				; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL3]](s32), [[SEXT_INREG11]]
				; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
				; GFX8: [[COPY10:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY10]], [[C3]]
				; GFX8: [[COPY11:%[0-9]+]]:_(s32) = COPY [[MUL1]](s32)
				; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY11]], [[C3]]
				; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND1]], [[C]](s32)
				; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND]], [[SHL]]
				; GFX8: [[COPY12:%[0-9]+]]:_(s32) = COPY [[MUL2]](s32)
				; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY12]], [[C3]]
				; GFX8: [[SHL1:%[0-9]+]]:_(s32) = G_SHL [[AND2]], [[C1]](s32)
				; GFX8: [[OR1:%[0-9]+]]:_(s32) = G_OR [[OR]], [[SHL1]]
				; GFX8: [[COPY13:%[0-9]+]]:_(s32) = COPY [[MUL3]](s32)
				; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY13]], [[C3]]
				; GFX8: [[SHL2:%[0-9]+]]:_(s32) = G_SHL [[AND3]], [[C2]](s32)
				; GFX8: [[OR2:%[0-9]+]]:_(s32) = G_OR [[OR1]], [[SHL2]]
				; GFX8: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX8: $vgpr0 = COPY [[OR2]](s32)
				; GFX8: $vgpr1 = COPY [[ANYEXT]](s32)
				; GFX9-LABEL: name: test_smulo_v4s8
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 8
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C]](s32)
				; GFX9: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C1]](s32)
				; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 24
				; GFX9: [[LSHR2:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C2]](s32)
				; GFX9: [[LSHR3:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C]](s32)
				; GFX9: [[LSHR4:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C1]](s32)
				; GFX9: [[LSHR5:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C2]](s32)
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX9: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 8
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX9: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY3]], 8
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
				; GFX9: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL]], 8
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[SEXT_INREG2]]
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[SEXT_INREG3:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 8
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR3]](s32)
				; GFX9: [[SEXT_INREG4:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY5]], 8
				; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG3]], [[SEXT_INREG4]]
				; GFX9: [[SEXT_INREG5:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL1]], 8
				; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[SEXT_INREG5]]
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[SEXT_INREG6:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY6]], 8
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[LSHR4]](s32)
				; GFX9: [[SEXT_INREG7:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY7]], 8
				; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG6]], [[SEXT_INREG7]]
				; GFX9: [[SEXT_INREG8:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL2]], 8
				; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL2]](s32), [[SEXT_INREG8]]
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY [[LSHR2]](s32)
				; GFX9: [[SEXT_INREG9:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY8]], 8
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY [[LSHR5]](s32)
				; GFX9: [[SEXT_INREG10:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY9]], 8
				; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG9]], [[SEXT_INREG10]]
				; GFX9: [[SEXT_INREG11:%[0-9]+]]:_(s32) = G_SEXT_INREG [[MUL3]], 8
				; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL3]](s32), [[SEXT_INREG11]]
				; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX9: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY10]], [[C3]]
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[MUL1]](s32)
				; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY11]], [[C3]]
				; GFX9: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND1]], [[C]](s32)
				; GFX9: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND]], [[SHL]]
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[MUL2]](s32)
				; GFX9: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY12]], [[C3]]
				; GFX9: [[SHL1:%[0-9]+]]:_(s32) = G_SHL [[AND2]], [[C1]](s32)
				; GFX9: [[OR1:%[0-9]+]]:_(s32) = G_OR [[OR]], [[SHL1]]
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[MUL3]](s32)
				; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY13]], [[C3]]
				; GFX9: [[SHL2:%[0-9]+]]:_(s32) = G_SHL [[AND3]], [[C2]](s32)
				; GFX9: [[OR2:%[0-9]+]]:_(s32) = G_OR [[OR1]], [[SHL2]]
				; GFX9: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX9: $vgpr0 = COPY [[OR2]](s32)
				; GFX9: $vgpr1 = COPY [[ANYEXT]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s8), %3:_(s8), %4:_(s8), %5:_(s8) = G_UNMERGE_VALUES %0
				%6:_(s8), %7:_(s8), %8:_(s8), %9:_(s8) = G_UNMERGE_VALUES %1
				%10:_(<4 x s8>) = G_BUILD_VECTOR %2:_(s8), %3:_(s8), %4:_(s8), %5:_(s8)
				%11:_(<4 x s8>) = G_BUILD_VECTOR %6:_(s8), %7:_(s8), %8:_(s8), %9:_(s8)
				%12:_(<4 x s8>), %18:_(<4 x s1>) = G_SMULO %10:_, %11:_
				%13:_(s8), %14:_(s8), %15:_(s8), %16:_(s8) = G_UNMERGE_VALUES %12:_(<4 x s8>)
				%19:_(s1), %20:_(s1), %21:_(s1), %22:_(s1) = G_UNMERGE_VALUES %18:_(<4 x s1>)
				%17:_(s32) = G_MERGE_VALUES %13, %14, %15, %16
				%23:_(s32) = G_ANYEXT %19
				$vgpr0 = COPY %17
				$vgpr1 = COPY %23
				...
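
The s16 and s8 checks above follow a different pattern from the s32 case: both operands are sign-extended in a 32-bit register, multiplied at 32 bits, and the product is compared against its own in-register sign extension. A rough C++ model of that pattern is sketched below. The names sextInReg and smuloNarrow are hypothetical, the sketch assumes a width between 1 and 16 bits so that the 32-bit product cannot itself overflow, and it relies on two's complement arithmetic with arithmetic right shift (guaranteed since C++20).

```cpp
#include <cstdint>
#include <utility>

// Models G_SEXT_INREG x, bits: sign-extend the low `bits` bits of x.
// Assumes 1 <= bits <= 16 in this sketch.
int32_t sextInReg(int32_t x, unsigned bits) {
  unsigned shift = 32 - bits;
  // Shift left as unsigned to avoid signed overflow, then shift back arithmetically.
  return static_cast<int32_t>(static_cast<uint32_t>(x) << shift) >> shift;
}

// Hypothetical model of the widened signed multiply-with-overflow for s8/s16.
// Returns {sign-extended narrow result, overflow flag}.
std::pair<int32_t, bool> smuloNarrow(int32_t a, int32_t b, unsigned bits) {
  int32_t lhs = sextInReg(a, bits);     // high bits of the inputs are ignored
  int32_t rhs = sextInReg(b, bits);
  int32_t mul = lhs * rhs;              // models G_MUL at s32; fits for bits <= 16
  int32_t narrow = sextInReg(mul, bits);
  // Overflow exactly when the 32-bit product is not representable in `bits` bits.
  return {narrow, mul != narrow};       // models G_ICMP ne
}
```

For example, under this model smuloNarrow(16, 8, 8) reports overflow because 128 does not fit in a signed 8-bit value, while smuloNarrow(0xFF, 0xFF, 8) treats both inputs as -1 and yields 1 without overflow.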

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
				# RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=fiji -O0 -run-pass=legalizer %s -o - | FileCheck %s --check-prefix=GFX8
				# RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx906 -O0 -run-pass=legalizer %s -o - | FileCheck %s --check-prefix=GFX9

				arsenm (inline comment, Done): Do you need the abort=0s?
				---
				name: test_umulo_s32
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				; GFX8-LABEL: name: test_umulo_s32
				; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[COPY]], [[COPY1]]
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[COPY]], [[COPY1]]
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[UMULH]](s32), [[C]]
				; GFX8: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[ICMP]](s1)
				; GFX8: $vgpr0 = COPY [[MUL]](s32)
				; GFX8: $vgpr1 = COPY [[ZEXT]](s32)
				; GFX9-LABEL: name: test_umulo_s32
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[COPY]], [[COPY1]]
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[COPY]], [[COPY1]]
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[UMULH]](s32), [[C]]
				; GFX9: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[ICMP]](s1)
				; GFX9: $vgpr0 = COPY [[MUL]](s32)
				; GFX9: $vgpr1 = COPY [[ZEXT]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s32), %3:_(s1) = G_UMULO %0, %1
				%4:_(s32) = G_ZEXT %3
				$vgpr0 = COPY %2
				$vgpr1 = COPY %4
				...

				---
				name: test_umulo_v2s32
				body: |
				bb.0:
				liveins: $vgpr0_vgpr1, $vgpr2_vgpr3

				; GFX8-LABEL: name: test_umulo_v2s32
				; GFX8: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX8: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; GFX8: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; GFX8: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[UV2]]
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV2]]
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[UMULH]](s32), [[C]]
				; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[UV3]]
				; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV3]]
				; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[UMULH1]](s32), [[C]]
				; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[MUL]](s32), [[MUL1]](s32)
				; GFX8: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX8: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX8: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY [[ANYEXT]](s32)
				foad (inline comment, Not Done): Why do we get ANYEXT followed by AND with 1? In the scalar s32 case we just get ZEXT, which is nicer.
				pdhaliwal (author, inline comment, Done): This comes from the widening of the BUILD_VECTOR that results from legalizing the ICMP instruction. So I guess the ZEXT is presumed dead once the BUILD_VECTOR gets legalized.
				; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY2]], [[C1]]
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY [[ANYEXT1]](s32)
				; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C1]]
				; GFX8: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[AND]](s32), [[AND1]](s32)
				; GFX8: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>)
				; GFX8: $vgpr2_vgpr3 = COPY [[BUILD_VECTOR1]](<2 x s32>)
				; GFX9-LABEL: name: test_umulo_v2s32
				; GFX9: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX9: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; GFX9: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[UV2]]
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 0
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV2]]
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[UMULH]](s32), [[C]]
				; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[UV3]]
				; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV3]]
				; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[UMULH1]](s32), [[C]]
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[MUL]](s32), [[MUL1]](s32)
				; GFX9: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
				; GFX9: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX9: [[AND:%[0-9]+]]:_(s32) = G_AND [[ANYEXT]], [[C1]]
				; GFX9: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[ANYEXT1]], [[C1]]
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[AND]](s32), [[AND1]](s32)
				; GFX9: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>)
				; GFX9: $vgpr2_vgpr3 = COPY [[BUILD_VECTOR1]](<2 x s32>)
				%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
				%1:_(<2 x s32>) = COPY $vgpr2_vgpr3
				%2:_(<2 x s32>), %3:_(<2 x s1>) = G_UMULO %0, %1
				%4:_(<2 x s32>) = G_ZEXT %3
				$vgpr0_vgpr1 = COPY %2
				$vgpr2_vgpr3 = COPY %4
				...
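For reference, the per-lane sequence checked above (G_MUL for the low half, G_UMULH for the high half, then G_ICMP ne against zero) can be modelled in plain C++. This is only an illustrative sketch, not code from the patch: `UMulO32`, `umulh32` and `umulo32` are hypothetical names, and the 64-bit intermediate stands in for the separate G_MUL and G_UMULH instructions.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the two results of G_UMULO: the wrapped product and an overflow bit.
struct UMulO32 {
  uint32_t Product;
  bool Overflow;
};

// Models G_UMULH at s32: the high 32 bits of the full 64-bit unsigned product.
inline uint32_t umulh32(uint32_t A, uint32_t B) {
  return static_cast<uint32_t>((static_cast<uint64_t>(A) * B) >> 32);
}

// Models the checked sequence: G_UMULH, G_MUL, then G_ICMP ne against 0.
inline UMulO32 umulo32(uint32_t A, uint32_t B) {
  uint32_t High = umulh32(A, B);
  // Low 32 bits of the product, i.e. the value G_MUL produces.
  uint32_t Low = static_cast<uint32_t>(static_cast<uint64_t>(A) * B);
  // Unsigned overflow happened exactly when the high half is nonzero.
  return {Low, High != 0};
}
```

The overflow bit is a single s1 value, so extending it with G_ANYEXT and then masking with 1 yields the same 32-bit value as a G_ZEXT would; the two forms in the checks differ in instruction count, not in result.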

				---
				name: test_umulo_s64
				body: |
				bb.0:
				liveins: $vgpr0_vgpr1, $vgpr2_vgpr3

				; GFX8-LABEL: name: test_umulo_s64
				; GFX8: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
				; GFX8: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
				; GFX8: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](s64)
				; GFX8: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](s64)
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV2]]
				; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV3]]
				; GFX8: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[UV2]]
				; GFX8: [[UADDO:%[0-9]+]]:_(s32), [[UADDO1:%[0-9]+]]:_(s1) = G_UADDO [[MUL]], [[MUL1]]
				; GFX8: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO1]](s1)
				; GFX8: [[UADDO2:%[0-9]+]]:_(s32), [[UADDO3:%[0-9]+]]:_(s1) = G_UADDO [[UADDO]], [[UMULH]]
				; GFX8: [[ZEXT1:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO3]](s1)
				; GFX8: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[ZEXT]], [[ZEXT1]]
				; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV3]]
				; GFX8: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[UV2]]
				; GFX8: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[UV3]]
				; GFX8: [[UADDO4:%[0-9]+]]:_(s32), [[UADDO5:%[0-9]+]]:_(s1) = G_UADDO [[MUL2]], [[UMULH1]]
				foad: Is there a good reason why this is using UADDO plus ZEXT plus a second ADD, instead of using UADDE?
				pdhaliwal: This is because of the legalization of UMULH for s64. I am thinking of putting this in a separate patch, as it impacts the narrowing of UMULH.
				; GFX8: [[ZEXT2:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO5]](s1)
				; GFX8: [[UADDO6:%[0-9]+]]:_(s32), [[UADDO7:%[0-9]+]]:_(s1) = G_UADDO [[UADDO4]], [[UMULH2]]
				; GFX8: [[ZEXT3:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO7]](s1)
				; GFX8: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[ZEXT2]], [[ZEXT3]]
				; GFX8: [[UADDO8:%[0-9]+]]:_(s32), [[UADDO9:%[0-9]+]]:_(s1) = G_UADDO [[UADDO6]], [[ADD]]
				; GFX8: [[ZEXT4:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO9]](s1)
				; GFX8: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[ADD1]], [[ZEXT4]]
				; GFX8: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[UV3]]
				; GFX8: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[ADD2]]
				; GFX8: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[UADDO8]](s32), [[ADD3]](s32)
				; GFX8: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
				; GFX8: [[UV4:%[0-9]+]]:_(s32), [[UV5:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](s64)
				; GFX8: [[UV6:%[0-9]+]]:_(s32), [[UV7:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](s64)
				; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UV4]], [[UV6]]
				; GFX8: [[MUL4:%[0-9]+]]:_(s32) = G_MUL [[UV5]], [[UV6]]
				; GFX8: [[MUL5:%[0-9]+]]:_(s32) = G_MUL [[UV4]], [[UV7]]
				; GFX8: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[UV4]], [[UV6]]
				; GFX8: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[MUL4]], [[MUL5]]
				; GFX8: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[ADD4]], [[UMULH4]]
				; GFX8: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[MUL3]](s32), [[ADD5]](s32)
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MV]](s64), [[C]]
				; GFX8: [[ZEXT5:%[0-9]+]]:_(s64) = G_ZEXT [[ICMP]](s1)
				; GFX8: $vgpr0_vgpr1 = COPY [[MV1]](s64)
				; GFX8: $vgpr2_vgpr3 = COPY [[ZEXT5]](s64)
				; GFX9-LABEL: name: test_umulo_s64
				; GFX9: [[COPY:%[0-9]+]]:_(s64) = COPY $vgpr0_vgpr1
				; GFX9: [[COPY1:%[0-9]+]]:_(s64) = COPY $vgpr2_vgpr3
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](s64)
				; GFX9: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](s64)
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV2]]
				; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[UV]], [[UV3]]
				; GFX9: [[UMULH:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[UV2]]
				; GFX9: [[UADDO:%[0-9]+]]:_(s32), [[UADDO1:%[0-9]+]]:_(s1) = G_UADDO [[MUL]], [[MUL1]]
				; GFX9: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO1]](s1)
				; GFX9: [[UADDO2:%[0-9]+]]:_(s32), [[UADDO3:%[0-9]+]]:_(s1) = G_UADDO [[UADDO]], [[UMULH]]
				; GFX9: [[ZEXT1:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO3]](s1)
				; GFX9: [[ADD:%[0-9]+]]:_(s32) = G_ADD [[ZEXT]], [[ZEXT1]]
				; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[UV1]], [[UV3]]
				; GFX9: [[UMULH1:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[UV2]]
				; GFX9: [[UMULH2:%[0-9]+]]:_(s32) = G_UMULH [[UV]], [[UV3]]
				; GFX9: [[UADDO4:%[0-9]+]]:_(s32), [[UADDO5:%[0-9]+]]:_(s1) = G_UADDO [[MUL2]], [[UMULH1]]
				; GFX9: [[ZEXT2:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO5]](s1)
				; GFX9: [[UADDO6:%[0-9]+]]:_(s32), [[UADDO7:%[0-9]+]]:_(s1) = G_UADDO [[UADDO4]], [[UMULH2]]
				; GFX9: [[ZEXT3:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO7]](s1)
				; GFX9: [[ADD1:%[0-9]+]]:_(s32) = G_ADD [[ZEXT2]], [[ZEXT3]]
				; GFX9: [[UADDO8:%[0-9]+]]:_(s32), [[UADDO9:%[0-9]+]]:_(s1) = G_UADDO [[UADDO6]], [[ADD]]
				; GFX9: [[ZEXT4:%[0-9]+]]:_(s32) = G_ZEXT [[UADDO9]](s1)
				; GFX9: [[ADD2:%[0-9]+]]:_(s32) = G_ADD [[ADD1]], [[ZEXT4]]
				; GFX9: [[UMULH3:%[0-9]+]]:_(s32) = G_UMULH [[UV1]], [[UV3]]
				; GFX9: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[ADD2]]
				; GFX9: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[UADDO8]](s32), [[ADD3]](s32)
				; GFX9: [[C:%[0-9]+]]:_(s64) = G_CONSTANT i64 0
				; GFX9: [[UV4:%[0-9]+]]:_(s32), [[UV5:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](s64)
				; GFX9: [[UV6:%[0-9]+]]:_(s32), [[UV7:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](s64)
				; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[UV4]], [[UV6]]
				; GFX9: [[MUL4:%[0-9]+]]:_(s32) = G_MUL [[UV5]], [[UV6]]
				; GFX9: [[MUL5:%[0-9]+]]:_(s32) = G_MUL [[UV4]], [[UV7]]
				; GFX9: [[UMULH4:%[0-9]+]]:_(s32) = G_UMULH [[UV4]], [[UV6]]
				; GFX9: [[ADD4:%[0-9]+]]:_(s32) = G_ADD [[MUL4]], [[MUL5]]
				; GFX9: [[ADD5:%[0-9]+]]:_(s32) = G_ADD [[ADD4]], [[UMULH4]]
				; GFX9: [[MV1:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[MUL3]](s32), [[ADD5]](s32)
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MV]](s64), [[C]]
				; GFX9: [[ZEXT5:%[0-9]+]]:_(s64) = G_ZEXT [[ICMP]](s1)
				; GFX9: $vgpr0_vgpr1 = COPY [[MV1]](s64)
				; GFX9: $vgpr2_vgpr3 = COPY [[ZEXT5]](s64)
				%0:_(s64) = COPY $vgpr0_vgpr1
				%1:_(s64) = COPY $vgpr2_vgpr3
				%2:_(s64), %3:_(s1) = G_UMULO %0, %1
				%4:_(s64) = G_ZEXT %3
				$vgpr0_vgpr1 = COPY %2
				$vgpr2_vgpr3 = COPY %4
				...
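The s64 checks above build the high 64 bits of the 128-bit product out of 32-bit multiplies and adds, and then compare that high half against zero. A rough C++ model of that arithmetic follows. It is a sketch under assumptions rather than the patch's implementation: all names (`umulh64`, `addCarry`, and so on) are hypothetical, and the low 64-bit product is taken with a plain 64-bit multiply instead of the narrowed G_MUL sequence. Carries are kept as explicit 0/1 values and summed separately, which is the UADDO plus ZEXT plus ADD shape questioned in the review; an add with carry-in would fold those steps together.

```cpp
#include <cassert>
#include <cstdint>

struct UMulO64 {
  uint64_t Product;
  bool Overflow;
};

// High 32 bits of a 32x32 unsigned product (the role of G_UMULH at s32).
inline uint32_t umulh32(uint32_t A, uint32_t B) {
  return static_cast<uint32_t>((static_cast<uint64_t>(A) * B) >> 32);
}

// Low 32 bits of a 32x32 unsigned product (the role of G_MUL at s32).
inline uint32_t mul32(uint32_t A, uint32_t B) {
  return static_cast<uint32_t>(static_cast<uint64_t>(A) * B);
}

// 32-bit add that reports its carry-out as 0 or 1 (G_UADDO followed by G_ZEXT).
inline uint32_t addCarry(uint32_t A, uint32_t B, uint32_t &Carry) {
  uint32_t Sum = A + B;
  Carry = Sum < A ? 1u : 0u;
  return Sum;
}

// High 64 bits of the 128-bit product, assembled from 32-bit pieces.
inline uint64_t umulh64(uint64_t A, uint64_t B) {
  uint32_t ALo = static_cast<uint32_t>(A), AHi = static_cast<uint32_t>(A >> 32);
  uint32_t BLo = static_cast<uint32_t>(B), BHi = static_cast<uint32_t>(B >> 32);
  uint32_t C0, C1, C2, C3, C4;

  // Bits 32..63 of the full product; only the carries out of this word matter.
  uint32_t W1 = addCarry(mul32(AHi, BLo), mul32(ALo, BHi), C0);
  W1 = addCarry(W1, umulh32(ALo, BLo), C1);
  uint32_t CarryIntoW2 = C0 + C1;

  // Bits 64..95.
  uint32_t W2 = addCarry(mul32(AHi, BHi), umulh32(AHi, BLo), C2);
  W2 = addCarry(W2, umulh32(ALo, BHi), C3);
  W2 = addCarry(W2, CarryIntoW2, C4);

  // Bits 96..127; this cannot wrap because the product fits in 128 bits.
  uint32_t W3 = umulh32(AHi, BHi) + (C2 + C3 + C4);
  return (static_cast<uint64_t>(W3) << 32) | W2;
}

// Overflow is reported exactly when the high 64 bits are nonzero.
inline UMulO64 umulo64(uint64_t A, uint64_t B) {
  return {A * B, umulh64(A, B) != 0};
}
```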

				---
				name: test_umulo_s16
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				; GFX8-LABEL: name: test_umulo_s16
				; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY2]], [[C]]
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]]
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
				; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[MUL]], [[C]]
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[AND2]]
				; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]]
				; GFX8: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[ICMP]](s1)
				; GFX8: $vgpr0 = COPY [[AND3]](s32)
				; GFX8: $vgpr1 = COPY [[ZEXT]](s32)
				; GFX9-LABEL: name: test_umulo_s16
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				foad: This expansion does an s32 multiply and an s16 multiply. It would be better to just do one s32 multiply -- you can extract all the information you need from the result of that.
				pdhaliwal: I was able to get the unsigned operation to use a single s32 multiply. The signed case is getting a bit tricky.
				; GFX9: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY2]], [[C]]
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]]
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
				; GFX9: [[AND2:%[0-9]+]]:_(s32) = G_AND [[MUL]], [[C]]
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[AND2]]
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]]
				; GFX9: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[ICMP]](s1)
				; GFX9: $vgpr0 = COPY [[AND3]](s32)
				; GFX9: $vgpr1 = COPY [[ZEXT]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s16) = G_TRUNC %0
				%3:_(s16) = G_TRUNC %1
				%4:_(s16), %6:_(s1) = G_UMULO %2, %3
				%5:_(s32) = G_ZEXT %4
				%7:_(s32) = G_ZEXT %6
				$vgpr0 = COPY %5
				$vgpr1 = COPY %7
				...
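The single-multiply form discussed in the review, and visible in the s16 checks above, can be modelled as follows: mask both operands to 16 bits, multiply once in 32 bits, and derive both results from that one product. This is a hedged sketch with hypothetical names (`UMulO16`, `umulo16`); it models only the unsigned case, since the signed case was left open in the discussion.

```cpp
#include <cassert>
#include <cstdint>

struct UMulO16 {
  uint16_t Product;
  bool Overflow;
};

// Models the checked sequence: G_AND on each operand, one G_MUL at s32,
// G_AND on the product, and G_ICMP ne between the product and its masked form.
inline UMulO16 umulo16(uint32_t A, uint32_t B) {
  const uint32_t Mask = 0xFFFFu;
  // The largest possible value is 0xFFFF * 0xFFFF = 0xFFFE0001, which fits in 32 bits.
  uint32_t Wide = (A & Mask) * (B & Mask);
  uint32_t Narrow = Wide & Mask;
  // If masking changed the product, bits above 15 were set, so the s16 result overflowed.
  return {static_cast<uint16_t>(Narrow), Wide != Narrow};
}
```

Because the operands are zero-extended before the multiply, the 32-bit product is exact, which is why no second s16 multiply is needed to recover the truncated result.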

				---
				name: test_umulo_s8
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1

				; GFX8-LABEL: name: test_umulo_s8
				; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY2]], [[C]]
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]]
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
				; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[MUL]], [[C]]
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[AND2]]
				; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]]
				; GFX8: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[ICMP]](s1)
				; GFX8: $vgpr0 = COPY [[AND3]](s32)
				; GFX8: $vgpr1 = COPY [[ZEXT]](s32)
				; GFX9-LABEL: name: test_umulo_s8
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX9: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY2]], [[C]]
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]]
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
				; GFX9: [[AND2:%[0-9]+]]:_(s32) = G_AND [[MUL]], [[C]]
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[AND2]]
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]]
				; GFX9: [[ZEXT:%[0-9]+]]:_(s32) = G_ZEXT [[ICMP]](s1)
				; GFX9: $vgpr0 = COPY [[AND3]](s32)
				; GFX9: $vgpr1 = COPY [[ZEXT]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s8) = G_TRUNC %0
				%3:_(s8) = G_TRUNC %1
				%4:_(s8), %6:_(s1) = G_UMULO %2, %3
				%5:_(s32) = G_ZEXT %4
				%7:_(s32) = G_ZEXT %6
				$vgpr0 = COPY %5
				$vgpr1 = COPY %7
				...

				---
				name: test_umulo_v2s16
				body: \|
				bb.0:
				liveins: $vgpr0_vgpr1, $vgpr2_vgpr3
				; GFX8-LABEL: name: test_umulo_v2s16
				; GFX8: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX8: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; GFX8: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; GFX8: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY [[UV]](s32)
				; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY2]], [[C]]
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY [[UV2]](s32)
				; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]]
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
				; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[MUL]], [[C]]
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[AND2]]
				; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[UV1]](s32)
				; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]]
				; GFX8: [[COPY5:%[0-9]+]]:_(s32) = COPY [[UV3]](s32)
				; GFX8: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C]]
				; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[AND3]], [[AND4]]
				; GFX8: [[AND5:%[0-9]+]]:_(s32) = G_AND [[MUL1]], [[C]]
				; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[AND5]]
				; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX8: [[AND6:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C]]
				; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[MUL1]](s32)
				; GFX8: [[AND7:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C]]
				; GFX8: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND7]], [[C1]](s32)
				; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND6]], [[SHL]]
				; GFX8: [[BITCAST:%[0-9]+]]:_(<2 x s16>) = G_BITCAST [[OR]](s32)
				; GFX8: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX8: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
				; GFX8: [[COPY8:%[0-9]+]]:_(s32) = COPY [[ANYEXT]](s32)
				; GFX8: [[AND8:%[0-9]+]]:_(s32) = G_AND [[COPY8]], [[C2]]
				; GFX8: [[COPY9:%[0-9]+]]:_(s32) = COPY [[ANYEXT1]](s32)
				; GFX8: [[AND9:%[0-9]+]]:_(s32) = G_AND [[COPY9]], [[C2]]
				; GFX8: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[AND8]](s32), [[AND9]](s32)
				; GFX8: [[BITCAST1:%[0-9]+]]:_(s32) = G_BITCAST [[BITCAST]](<2 x s16>)
				; GFX8: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST1]], [[C1]](s32)
				; GFX8: [[COPY10:%[0-9]+]]:_(s32) = COPY [[BITCAST1]](s32)
				; GFX8: [[AND10:%[0-9]+]]:_(s32) = G_AND [[COPY10]], [[C]]
				; GFX8: [[COPY11:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX8: [[AND11:%[0-9]+]]:_(s32) = G_AND [[COPY11]], [[C]]
				; GFX8: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[AND10]](s32), [[AND11]](s32)
				; GFX8: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR1]](<2 x s32>)
				; GFX8: $vgpr2_vgpr3 = COPY [[BUILD_VECTOR]](<2 x s32>)
				; GFX9-LABEL: name: test_umulo_v2s16
				; GFX9: [[COPY:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr0_vgpr1
				; GFX9: [[COPY1:%[0-9]+]]:_(<2 x s32>) = COPY $vgpr2_vgpr3
				; GFX9: [[UV:%[0-9]+]]:_(s32), [[UV1:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY]](<2 x s32>)
				; GFX9: [[UV2:%[0-9]+]]:_(s32), [[UV3:%[0-9]+]]:_(s32) = G_UNMERGE_VALUES [[COPY1]](<2 x s32>)
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY [[UV]](s32)
				; GFX9: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY2]], [[C]]
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY [[UV2]](s32)
				; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]]
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
				; GFX9: [[AND2:%[0-9]+]]:_(s32) = G_AND [[MUL]], [[C]]
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[AND2]]
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[UV1]](s32)
				; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]]
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY [[UV3]](s32)
				; GFX9: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C]]
				; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[AND3]], [[AND4]]
				; GFX9: [[AND5:%[0-9]+]]:_(s32) = G_AND [[MUL1]], [[C]]
				; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[AND5]]
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[MUL1]](s32)
				; GFX9: [[BUILD_VECTOR_TRUNC:%[0-9]+]]:_(<2 x s16>) = G_BUILD_VECTOR_TRUNC [[COPY6]](s32), [[COPY7]](s32)
				; GFX9: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 1
				; GFX9: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX9: [[AND6:%[0-9]+]]:_(s32) = G_AND [[ANYEXT]], [[C1]]
				; GFX9: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX9: [[AND7:%[0-9]+]]:_(s32) = G_AND [[ANYEXT1]], [[C1]]
				; GFX9: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[AND6]](s32), [[AND7]](s32)
				; GFX9: [[BITCAST:%[0-9]+]]:_(s32) = G_BITCAST [[BUILD_VECTOR_TRUNC]](<2 x s16>)
				; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[BITCAST]], [[C2]](s32)
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY [[BITCAST]](s32)
				; GFX9: [[AND8:%[0-9]+]]:_(s32) = G_AND [[COPY8]], [[C]]
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[AND9:%[0-9]+]]:_(s32) = G_AND [[COPY9]], [[C]]
				; GFX9: [[BUILD_VECTOR1:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[AND8]](s32), [[AND9]](s32)
				; GFX9: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR1]](<2 x s32>)
				; GFX9: $vgpr2_vgpr3 = COPY [[BUILD_VECTOR]](<2 x s32>)
				%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
				%1:_(<2 x s32>) = COPY $vgpr2_vgpr3
				%2:_(<2 x s16>) = G_TRUNC %0
				%3:_(<2 x s16>) = G_TRUNC %1
				%4:_(<2 x s16>), %6:_(<2 x s1>) = G_UMULO %2, %3
				%7:_(<2 x s32>) = G_ZEXT %6
				%5:_(<2 x s32>) = G_ZEXT %4
				$vgpr0_vgpr1 = COPY %5
				$vgpr2_vgpr3 = COPY %7
				...


				---
				name: test_umulo_v2s8
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
				; GFX8-LABEL: name: test_umulo_v2s8
				; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
				; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]]
				; GFX8: [[COPY5:%[0-9]+]]:_(s32) = COPY [[COPY2]](s32)
				; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C]]
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
				; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[MUL]], [[C]]
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[AND2]]
				; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C]]
				; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[COPY3]](s32)
				; GFX8: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C]]
				; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[AND3]], [[AND4]]
				; GFX8: [[AND5:%[0-9]+]]:_(s32) = G_AND [[MUL1]], [[C]]
				; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[AND5]]
				; GFX8: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
				; GFX8: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[MUL]](s32)
				; GFX8: [[AND6:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C1]]
				; GFX8: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[MUL1]](s32)
				; GFX8: [[AND7:%[0-9]+]]:_(s16) = G_AND [[TRUNC1]], [[C1]]
				; GFX8: [[C2:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
				; GFX8: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[AND7]], [[C2]](s16)
				; GFX8: [[OR:%[0-9]+]]:_(s16) = G_OR [[AND6]], [[SHL]]
				; GFX8: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[OR]](s16)
				; GFX8: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX8: [[ANYEXT2:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX8: $vgpr0 = COPY [[ANYEXT]](s32)
				; GFX8: $vgpr1 = COPY [[ANYEXT1]](s32)
				; GFX8: $vgpr2 = COPY [[ANYEXT2]](s32)
				; GFX9-LABEL: name: test_umulo_v2s8
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX9: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]]
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY [[COPY2]](s32)
				; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C]]
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
				; GFX9: [[AND2:%[0-9]+]]:_(s32) = G_AND [[MUL]], [[C]]
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[AND2]]
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C]]
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[COPY3]](s32)
				; GFX9: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C]]
				; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[AND3]], [[AND4]]
				; GFX9: [[AND5:%[0-9]+]]:_(s32) = G_AND [[MUL1]], [[C]]
				; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[AND5]]
				; GFX9: [[C1:%[0-9]+]]:_(s16) = G_CONSTANT i16 255
				; GFX9: [[TRUNC:%[0-9]+]]:_(s16) = G_TRUNC [[MUL]](s32)
				; GFX9: [[AND6:%[0-9]+]]:_(s16) = G_AND [[TRUNC]], [[C1]]
				; GFX9: [[TRUNC1:%[0-9]+]]:_(s16) = G_TRUNC [[MUL1]](s32)
				; GFX9: [[AND7:%[0-9]+]]:_(s16) = G_AND [[TRUNC1]], [[C1]]
				; GFX9: [[C2:%[0-9]+]]:_(s16) = G_CONSTANT i16 8
				; GFX9: [[SHL:%[0-9]+]]:_(s16) = G_SHL [[AND7]], [[C2]](s16)
				; GFX9: [[OR:%[0-9]+]]:_(s16) = G_OR [[AND6]], [[SHL]]
				; GFX9: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[OR]](s16)
				; GFX9: [[ANYEXT1:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX9: [[ANYEXT2:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP1]](s1)
				; GFX9: $vgpr0 = COPY [[ANYEXT]](s32)
				; GFX9: $vgpr1 = COPY [[ANYEXT1]](s32)
				; GFX9: $vgpr2 = COPY [[ANYEXT2]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s32) = COPY $vgpr2
				%3:_(s32) = COPY $vgpr3
				%5:_(s8) = G_TRUNC %0
				%6:_(s8) = G_TRUNC %1
				%7:_(s8) = G_TRUNC %2
				%8:_(s8) = G_TRUNC %3
				%11:_(<2 x s8>) = G_BUILD_VECTOR %5, %6
				%12:_(<2 x s8>) = G_BUILD_VECTOR %7, %8
				%13:_(<2 x s8>), %19:_(<2 x s1>) = G_UMULO %11, %12
				%20:_(<2 x s32>) = G_ZEXT %19
				%14:_(s8), %15:_(s8) = G_UNMERGE_VALUES %13
				%21:_(s1), %22:_(s1) = G_UNMERGE_VALUES %19
				%17:_(s16) = G_MERGE_VALUES %14, %15
				%18:_(s32) = G_ANYEXT %17
				%23:_(s32) = G_ANYEXT %21
				%24:_(s32) = G_ANYEXT %22
				$vgpr0 = COPY %18
				$vgpr1 = COPY %23
				$vgpr2 = COPY %24
				...

				---
				name: test_umulo_v4s8
				body: |
				bb.0:
				liveins: $vgpr0, $vgpr1
				; GFX8-LABEL: name: test_umulo_v4s8
				; GFX8: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX8: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX8: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 8
				; GFX8: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C]](s32)
				; GFX8: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX8: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C1]](s32)
				; GFX8: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 24
				; GFX8: [[LSHR2:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C2]](s32)
				; GFX8: [[LSHR3:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C]](s32)
				; GFX8: [[LSHR4:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C1]](s32)
				; GFX8: [[LSHR5:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C2]](s32)
				; GFX8: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
				; GFX8: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX8: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY2]], [[C3]]
				; GFX8: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX8: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C3]]
				; GFX8: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
				; GFX8: [[AND2:%[0-9]+]]:_(s32) = G_AND [[MUL]], [[C3]]
				; GFX8: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[AND2]]
				; GFX8: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX8: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C3]]
				; GFX8: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR3]](s32)
				; GFX8: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C3]]
				; GFX8: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[AND3]], [[AND4]]
				; GFX8: [[AND5:%[0-9]+]]:_(s32) = G_AND [[MUL1]], [[C3]]
				; GFX8: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[AND5]]
				; GFX8: [[COPY6:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX8: [[AND6:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C3]]
				; GFX8: [[COPY7:%[0-9]+]]:_(s32) = COPY [[LSHR4]](s32)
				; GFX8: [[AND7:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C3]]
				; GFX8: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[AND6]], [[AND7]]
				; GFX8: [[AND8:%[0-9]+]]:_(s32) = G_AND [[MUL2]], [[C3]]
				; GFX8: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL2]](s32), [[AND8]]
				; GFX8: [[COPY8:%[0-9]+]]:_(s32) = COPY [[LSHR2]](s32)
				; GFX8: [[AND9:%[0-9]+]]:_(s32) = G_AND [[COPY8]], [[C3]]
				; GFX8: [[COPY9:%[0-9]+]]:_(s32) = COPY [[LSHR5]](s32)
				; GFX8: [[AND10:%[0-9]+]]:_(s32) = G_AND [[COPY9]], [[C3]]
				; GFX8: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[AND9]], [[AND10]]
				; GFX8: [[AND11:%[0-9]+]]:_(s32) = G_AND [[MUL3]], [[C3]]
				; GFX8: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL3]](s32), [[AND11]]
				; GFX8: [[COPY10:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX8: [[AND12:%[0-9]+]]:_(s32) = G_AND [[COPY10]], [[C3]]
				; GFX8: [[COPY11:%[0-9]+]]:_(s32) = COPY [[MUL1]](s32)
				; GFX8: [[AND13:%[0-9]+]]:_(s32) = G_AND [[COPY11]], [[C3]]
				; GFX8: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND13]], [[C]](s32)
				; GFX8: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND12]], [[SHL]]
				; GFX8: [[COPY12:%[0-9]+]]:_(s32) = COPY [[MUL2]](s32)
				; GFX8: [[AND14:%[0-9]+]]:_(s32) = G_AND [[COPY12]], [[C3]]
				; GFX8: [[SHL1:%[0-9]+]]:_(s32) = G_SHL [[AND14]], [[C1]](s32)
				; GFX8: [[OR1:%[0-9]+]]:_(s32) = G_OR [[OR]], [[SHL1]]
				; GFX8: [[COPY13:%[0-9]+]]:_(s32) = COPY [[MUL3]](s32)
				; GFX8: [[AND15:%[0-9]+]]:_(s32) = G_AND [[COPY13]], [[C3]]
				; GFX8: [[SHL2:%[0-9]+]]:_(s32) = G_SHL [[AND15]], [[C2]](s32)
				; GFX8: [[OR2:%[0-9]+]]:_(s32) = G_OR [[OR1]], [[SHL2]]
				; GFX8: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX8: $vgpr0 = COPY [[OR2]](s32)
				; GFX8: $vgpr1 = COPY [[ANYEXT]](s32)
				; GFX9-LABEL: name: test_umulo_v4s8
				; GFX9: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
				; GFX9: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
				; GFX9: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 8
				; GFX9: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C]](s32)
				; GFX9: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
				; GFX9: [[LSHR1:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C1]](s32)
				; GFX9: [[C2:%[0-9]+]]:_(s32) = G_CONSTANT i32 24
				; GFX9: [[LSHR2:%[0-9]+]]:_(s32) = G_LSHR [[COPY]], [[C2]](s32)
				; GFX9: [[LSHR3:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C]](s32)
				; GFX9: [[LSHR4:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C1]](s32)
				; GFX9: [[LSHR5:%[0-9]+]]:_(s32) = G_LSHR [[COPY1]], [[C2]](s32)
				; GFX9: [[C3:%[0-9]+]]:_(s32) = G_CONSTANT i32 255
				; GFX9: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
				; GFX9: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY2]], [[C3]]
				; GFX9: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
				; GFX9: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C3]]
				; GFX9: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
				; GFX9: [[AND2:%[0-9]+]]:_(s32) = G_AND [[MUL]], [[C3]]
				; GFX9: [[ICMP:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL]](s32), [[AND2]]
				; GFX9: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
				; GFX9: [[AND3:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C3]]
				; GFX9: [[COPY5:%[0-9]+]]:_(s32) = COPY [[LSHR3]](s32)
				; GFX9: [[AND4:%[0-9]+]]:_(s32) = G_AND [[COPY5]], [[C3]]
				; GFX9: [[MUL1:%[0-9]+]]:_(s32) = G_MUL [[AND3]], [[AND4]]
				; GFX9: [[AND5:%[0-9]+]]:_(s32) = G_AND [[MUL1]], [[C3]]
				; GFX9: [[ICMP1:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL1]](s32), [[AND5]]
				; GFX9: [[COPY6:%[0-9]+]]:_(s32) = COPY [[LSHR1]](s32)
				; GFX9: [[AND6:%[0-9]+]]:_(s32) = G_AND [[COPY6]], [[C3]]
				; GFX9: [[COPY7:%[0-9]+]]:_(s32) = COPY [[LSHR4]](s32)
				; GFX9: [[AND7:%[0-9]+]]:_(s32) = G_AND [[COPY7]], [[C3]]
				; GFX9: [[MUL2:%[0-9]+]]:_(s32) = G_MUL [[AND6]], [[AND7]]
				; GFX9: [[AND8:%[0-9]+]]:_(s32) = G_AND [[MUL2]], [[C3]]
				; GFX9: [[ICMP2:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL2]](s32), [[AND8]]
				; GFX9: [[COPY8:%[0-9]+]]:_(s32) = COPY [[LSHR2]](s32)
				; GFX9: [[AND9:%[0-9]+]]:_(s32) = G_AND [[COPY8]], [[C3]]
				; GFX9: [[COPY9:%[0-9]+]]:_(s32) = COPY [[LSHR5]](s32)
				; GFX9: [[AND10:%[0-9]+]]:_(s32) = G_AND [[COPY9]], [[C3]]
				; GFX9: [[MUL3:%[0-9]+]]:_(s32) = G_MUL [[AND9]], [[AND10]]
				; GFX9: [[AND11:%[0-9]+]]:_(s32) = G_AND [[MUL3]], [[C3]]
				; GFX9: [[ICMP3:%[0-9]+]]:_(s1) = G_ICMP intpred(ne), [[MUL3]](s32), [[AND11]]
				; GFX9: [[COPY10:%[0-9]+]]:_(s32) = COPY [[MUL]](s32)
				; GFX9: [[AND12:%[0-9]+]]:_(s32) = G_AND [[COPY10]], [[C3]]
				; GFX9: [[COPY11:%[0-9]+]]:_(s32) = COPY [[MUL1]](s32)
				; GFX9: [[AND13:%[0-9]+]]:_(s32) = G_AND [[COPY11]], [[C3]]
				; GFX9: [[SHL:%[0-9]+]]:_(s32) = G_SHL [[AND13]], [[C]](s32)
				; GFX9: [[OR:%[0-9]+]]:_(s32) = G_OR [[AND12]], [[SHL]]
				; GFX9: [[COPY12:%[0-9]+]]:_(s32) = COPY [[MUL2]](s32)
				; GFX9: [[AND14:%[0-9]+]]:_(s32) = G_AND [[COPY12]], [[C3]]
				; GFX9: [[SHL1:%[0-9]+]]:_(s32) = G_SHL [[AND14]], [[C1]](s32)
				; GFX9: [[OR1:%[0-9]+]]:_(s32) = G_OR [[OR]], [[SHL1]]
				; GFX9: [[COPY13:%[0-9]+]]:_(s32) = COPY [[MUL3]](s32)
				; GFX9: [[AND15:%[0-9]+]]:_(s32) = G_AND [[COPY13]], [[C3]]
				; GFX9: [[SHL2:%[0-9]+]]:_(s32) = G_SHL [[AND15]], [[C2]](s32)
				; GFX9: [[OR2:%[0-9]+]]:_(s32) = G_OR [[OR1]], [[SHL2]]
				; GFX9: [[ANYEXT:%[0-9]+]]:_(s32) = G_ANYEXT [[ICMP]](s1)
				; GFX9: $vgpr0 = COPY [[OR2]](s32)
				; GFX9: $vgpr1 = COPY [[ANYEXT]](s32)
				%0:_(s32) = COPY $vgpr0
				%1:_(s32) = COPY $vgpr1
				%2:_(s8), %3:_(s8), %4:_(s8), %5:_(s8) = G_UNMERGE_VALUES %0
				%6:_(s8), %7:_(s8), %8:_(s8), %9:_(s8) = G_UNMERGE_VALUES %1
				%10:_(<4 x s8>) = G_BUILD_VECTOR %2:_(s8), %3:_(s8), %4:_(s8), %5:_(s8)
				%11:_(<4 x s8>) = G_BUILD_VECTOR %6:_(s8), %7:_(s8), %8:_(s8), %9:_(s8)
				%12:_(<4 x s8>), %18:_(<4 x s1>) = G_UMULO %10:_, %11:_
				%13:_(s8), %14:_(s8), %15:_(s8), %16:_(s8) = G_UNMERGE_VALUES %12:_(<4 x s8>)
				%19:_(s1), %20:_(s1), %21:_(s1), %22:_(s1) = G_UNMERGE_VALUES %18:_(<4 x s1>)
				%17:_(s32) = G_MERGE_VALUES %13, %14, %15, %16
				%23:_(s32) = G_ANYEXT %19
				$vgpr0 = COPY %17
				$vgpr1 = COPY %23
				...

Revision Contents

Diff 323892

llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulo.mir

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulo.mir
