
Details

Reviewers

arsenm
Petar.Avramovic
foad
paquette

Commits

rG41d6669f1f16: [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pdhaliwal created this revision. Aug 10 2020, 7:09 AM

Herald added a project: Restricted Project. Aug 10 2020, 7:09 AM

Herald added subscribers: llvm-commits, kerbowa, hiraditya and 3 others.

pdhaliwal requested review of this revision. Aug 10 2020, 7:09 AM

Herald added a subscriber: wdng. Aug 10 2020, 7:09 AM

Removed unneeded changes

Harbormaster completed remote builds in B67708: Diff 284354. Aug 10 2020, 7:41 AM

Harbormaster completed remote builds in B67709: Diff 284356. Aug 10 2020, 7:45 AM

arsenm added inline comments. Aug 10 2020, 12:41 PM

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1764	no auto. Can't this use anyext?
1768	Something seems off to me about introducing a full multiply, and in whatever type the user requested. I think this only works if WideTy == 2 * OriginalType. Can you produce a mulh in the wider type? This seems more like a lowering
1769	Use Register, I would worry about introducing a copy of MachineOperand here
1771	LLT not auto
1775	ShiftAmt?
1775	Why isn't the shift amount WideTy.getSizeInBits() - Size? I don't understand the `- IsSigned`.
1819	Extra newline
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir
68	Can you add an 8 and 24-bit test?
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir
109	Can you add an 8 and 24-bit test?

arsenm added reviewers: Petar.Avramovic, foad, paquette. Aug 10 2020, 12:42 PM

Review comments

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
1764	I doubt G_ANYEXT would work here. According to the docs, it does not define the higher bits.
1768	Yes, it would only work when WideTy == 2 * OriginalType. On reflection, this is more of a lowering operation than a widening, since the user is not always free to choose the wider type.
1775	To accommodate the sign bit in the case of a signed operation.
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir
68	24-bit case won't work as it requires 48-bit MUL op which is not working yet.
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir
109	Added 8-bit case. But, 24-bit case won't work as it requires 48-bit MUL op which is not working yet.
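
The anyext concern above can be illustrated outside LLVM. The following is only a sketch: `anyext16`, `zext16`, `lowHalfOfProduct` and `highHalfOfProduct` are hypothetical helpers, and `junk` stands in for whatever an any-extend happens to leave in the high bits. The low half of the product is unaffected by the junk, but the high half, which is what G_UMULH returns, is not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an any-extend from 16 to 32 bits: the low 16 bits
// carry the value, the high 16 bits are arbitrary (supplied here as `junk`).
constexpr uint32_t anyext16(uint16_t v, uint16_t junk) {
  return (static_cast<uint32_t>(junk) << 16) | v;
}

// Zero-extend: the high 16 bits are guaranteed to be zero.
constexpr uint32_t zext16(uint16_t v) { return v; }

// Low and high 16-bit halves of the low 32 bits of the wide product.
constexpr uint16_t lowHalfOfProduct(uint32_t a, uint32_t b) {
  return static_cast<uint16_t>(a * b);
}
constexpr uint16_t highHalfOfProduct(uint32_t a, uint32_t b) {
  return static_cast<uint16_t>((a * b) >> 16);
}

// 3 * 5 = 15: the low half is 15 regardless of the junk bits...
static_assert(lowHalfOfProduct(zext16(3), zext16(5)) == 15, "low, zext");
static_assert(lowHalfOfProduct(anyext16(3, 1), zext16(5)) == 15, "low, anyext");
// ...but the high half is only correct with a defined extension.
static_assert(highHalfOfProduct(zext16(3), zext16(5)) == 0, "high, zext");
static_assert(highHalfOfProduct(anyext16(3, 1), zext16(5)) == 5, "high, anyext");
```

This is why an any-extend is fine for a plain multiply whose result is truncated again, but not for a multiply-high built from a wider multiply.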

Harbormaster completed remote builds in B67845: Diff 284585. Aug 11 2020, 12:09 AM

pdhaliwal retitled this revision from [GlobalISel] widenScalar G_SMULH/G_UMULH to [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH. Aug 11 2020, 5:18 AM

Herald added subscribers: t-tye, tpr, dstuttard and 2 others. Aug 11 2020, 5:18 AM

foad requested changes to this revision. Aug 11 2020, 5:36 AM

foad added inline comments.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6034	I agree that anyext would not work here.
6037	Would this lowering also work for vector types, if you used LLT::scalarOrVector here?
6044	As Matt said you definitely should not subtract IsSigned here.

This revision now requires changes to proceed. Aug 11 2020, 5:36 AM

Added support for vector types.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6044	I got confused in signed binary multiplication. For this operation, it is not required to subtract IsSigned.
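
The shift-amount point can be checked with ordinary integers. This is a minimal sketch for 16-bit operands widened to 32 bits, not the LLVM lowering itself; `smulh16`, `umulh16` and `smulh16Shift15` are hypothetical names. It assumes `>>` on a negative `int32_t` is an arithmetic shift, which holds on mainstream compilers and is guaranteed from C++20.

```cpp
#include <cassert>
#include <cstdint>

// High half of a signed 16x16 multiply: sign-extend both operands, multiply
// in 32 bits, arithmetic-shift right by the original width (16), truncate.
constexpr int16_t smulh16(int16_t a, int16_t b) {
  return static_cast<int16_t>(
      (static_cast<int32_t>(a) * static_cast<int32_t>(b)) >> 16);
}

// Unsigned counterpart: zero-extend, multiply, logical shift by 16, truncate.
constexpr uint16_t umulh16(uint16_t a, uint16_t b) {
  return static_cast<uint16_t>(
      (static_cast<uint32_t>(a) * static_cast<uint32_t>(b)) >> 16);
}

// Variant matching the first diff, where the signed case shifted by
// Size - IsSigned = 15. Returned as int32_t to avoid a narrowing cast.
constexpr int32_t smulh16Shift15(int16_t a, int16_t b) {
  return (static_cast<int32_t>(a) * static_cast<int32_t>(b)) >> 15;
}

// 0x4000 * 4 = 0x10000, so the high 16 bits are 1.
static_assert(smulh16(0x4000, 4) == 1, "shift by Size");
// Shifting by 15 instead yields 2, which is not the high half.
static_assert(smulh16Shift15(0x4000, 4) == 2, "shift by Size - 1");
// 0xFFFF * 0xFFFF = 0xFFFE0001, so the high 16 bits are 0xFFFE.
static_assert(umulh16(0xFFFF, 0xFFFF) == 0xFFFE, "unsigned high half");
```

The sign extension of the operands already accounts for the sign; the shift amount is the original width in both the signed and the unsigned case.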

Looks OK to me modulo one inline comment, as long as Matt has no further objections.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
6039–6041	Actually it would be neater to use `LLT::changeElementSize`.

This revision is now accepted and ready to land. Aug 11 2020, 9:07 AM

Harbormaster completed remote builds in B67925: Diff 284762. Aug 11 2020, 10:06 AM

arsenm added inline comments. Aug 13 2020, 6:16 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
542 ↗	(On Diff #284762)	The expansion can fully use packed instructions with VOP3P instructions. This should try to clamp the number of elements for 16-bit cases if available before scalarizing
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir
41	Should add <2 x s16>, <3 x s16> and <4 x s16> cases

Updated review comments.

Herald added a subscriber: danielkiss. Aug 30 2020, 9:10 PM

Harbormaster completed remote builds in B70062: Diff 288886. Aug 30 2020, 10:13 PM

@arsenm, let me know if it is good to land.

arsenm added inline comments. Sep 2 2020, 5:07 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
545 ↗	(On Diff #288886)	This isn't the right logic; the intent is to go down to 2 elements for cases that can promote to <2 x i16>. s8 isn't special here.

arsenm requested changes to this revision. Sep 3 2020, 4:23 PM

This revision now requires changes to proceed. Sep 3 2020, 4:23 PM

Updated tests and clamping number of elements to 2

Harbormaster completed remote builds in B72000: Diff 292461. Sep 17 2020, 5:18 AM

arsenm added inline comments. Sep 17 2020, 6:40 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗	(On Diff #292461)	This should be unnecessary

pdhaliwal added inline comments. Sep 17 2020, 9:25 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗	(On Diff #292461)	If I drop this, the <2 x s32> case starts generating worse code. This is due to lowering coming into the picture which promotes the 32-bit mulh to 64-bit mul and then legalizing 64-bit mul. I can use VOP3P instruction only for S8. For others, I need to specify the scalarization.

arsenm added inline comments. Sep 18 2020, 9:40 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗	(On Diff #292461)	This should be an unconditional scalarize. The scalarization shouldn't cause a 64-bit multiply to be used

pdhaliwal added inline comments. Sep 20 2020, 9:14 PM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗	(On Diff #292461)	Hmm, an unconditional scalarize would remove the possibility of using the vector path for <2 x s8>. This is a bit different from other operations like MUL and ADD, where <2 x s16> would have been legal and unconditional scalarization would have worked. The whole point of making the scalarization conditional is that <2 x s8> can easily use the <2 x s16> MUL from the lowering path. And as <2 x s16> is legal for AMDGPU, the lowering will correctly use vector operations. Unconditional scalarization would simply make the logic for using vector ops void.

arsenm added inline comments. Sep 22 2020, 6:19 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗	(On Diff #292461)	You already handled this case with the first fewerElementsIf; the second one just handles everything else. It doesn't need to exclude s8.
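
The ordering argument can be sketched with a toy rule list in which the first matching rule decides the action. Everything here is hypothetical (`TypeQuery`, `Rule`, `firstMatch`, `mulhRulesSketch` and the action strings are illustrations, not LLVM types or the committed AMDGPU rules); it only shows why a later catch-all rule does not need to exclude cases that an earlier rule has already claimed.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not LLVM types.
struct TypeQuery {
  unsigned NumElts; // 1 means scalar
  unsigned EltBits;
};

struct Rule {
  std::function<bool(const TypeQuery &)> Matches;
  std::string Action;
};

// Walk the rules in order and return the action of the first match.
inline std::string firstMatch(const std::vector<Rule> &Rules,
                              const TypeQuery &Q) {
  for (const Rule &R : Rules)
    if (R.Matches(Q))
      return R.Action;
  return "unsupported";
}

// Assumed rule order, loosely following the discussion: s8 vectors are
// handled first, so the final rule can scalarize every remaining vector
// without a "not s8" condition.
inline std::vector<Rule> mulhRulesSketch() {
  return {
      {[](const TypeQuery &Q) { return Q.NumElts == 1 && Q.EltBits == 32; },
       "legal"},
      {[](const TypeQuery &Q) { return Q.NumElts == 2 && Q.EltBits == 8; },
       "lower"},
      {[](const TypeQuery &Q) { return Q.NumElts > 2 && Q.EltBits == 8; },
       "fewer-elements"},
      {[](const TypeQuery &Q) { return Q.NumElts > 1; }, "scalarize"},
  };
}
```

With this order, a query for `{4, 8}` stops at the fewer-elements rule and `{2, 8}` at the lower rule, so neither ever reaches the unconditional scalarize, while `{2, 32}` falls through to it.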

Added lowerFor({V2S8})

Removed unused code

arsenm added inline comments. Sep 22 2020, 7:13 AM

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
597 ↗	(On Diff #293449)	Put the actions on separate lines
600 ↗	(On Diff #293449)	Separate lines
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir
567	Shouldn't use implicit uses of s8 values. I'm trying to fix implicit uses with illegal register types because we can't ultimately legalize these

Harbormaster completed remote builds in B72515: Diff 293449. Sep 22 2020, 7:45 AM

Harbormaster completed remote builds in B72514: Diff 293446.

Formatting and removed implicit uses

pdhaliwal marked 4 inline comments as done. Sep 22 2020, 9:34 PM

Harbormaster completed remote builds in B72618: Diff 293638. Sep 22 2020, 10:10 PM

arsenm accepted this revision. Sep 23 2020, 6:10 AM

This revision is now accepted and ready to land. Sep 23 2020, 6:10 AM

This revision was landed with ongoing or failed builds. Sep 23 2020, 7:26 PM

Closed by commit rG41d6669f1f16: [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH (authored by pdhaliwal).

This revision was automatically updated to reflect the committed changes.

pdhaliwal added a commit: rG41d6669f1f16: [GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH.

Diff 284356

llvm/include/llvm/CodeGen/GlobalISel/LegalizerHelper.h

[163 lines not shown]
  private:
    LegalizeResult
    widenScalarUnmergeValues(MachineInstr &MI, unsigned TypeIdx, LLT WideTy);
    LegalizeResult
    widenScalarExtract(MachineInstr &MI, unsigned TypeIdx, LLT WideTy);
    LegalizeResult
    widenScalarInsert(MachineInstr &MI, unsigned TypeIdx, LLT WideTy);
    LegalizeResult
    widenScalarAddSubShlSat(MachineInstr &MI, unsigned TypeIdx, LLT WideTy);
+   LegalizeResult widenScalarMulh(MachineInstr &MI, unsigned TypeIdx,
+                                  LLT WideTy);

    /// Helper function to split a wide generic register into bitwise blocks with
    /// the given Type (which implies the number of blocks needed). The generic
    /// registers created are appended to Ops, starting at bit 0 of Reg.
    void extractParts(Register Reg, LLT Ty, int NumParts,
                      SmallVectorImpl<Register> &VRegs);

    /// Version which handles irregular splits.
[199 lines not shown]

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp

[1,752 lines not shown]
    auto Result = IsSigned ? MIRBuilder.buildAShr(WideTy, WideInst, ShiftK)
                           : MIRBuilder.buildLShr(WideTy, WideInst, ShiftK);

    MIRBuilder.buildTrunc(DstReg, Result);
    MI.eraseFromParent();
    return Legalized;
  }

  LegalizerHelper::LegalizeResult
+ LegalizerHelper::widenScalarMulh(MachineInstr &MI, unsigned TypeIdx,
+                                  LLT WideTy) {
+   bool IsSigned = MI.getOpcode() == TargetOpcode::G_SMULH;
+   auto ExtOp = IsSigned ? TargetOpcode::G_SEXT : TargetOpcode::G_ZEXT;
    [inline comment] arsenm: no auto. Can't this use anyext?
    [inline comment] pdhaliwal: I am a bit doubtful if G_ANYEXT would work here. From docs, it doesn't take care of higher bits.
+
+   auto LHS = MIRBuilder.buildInstr(ExtOp, {WideTy}, {MI.getOperand(1)});
+   auto RHS = MIRBuilder.buildInstr(ExtOp, {WideTy}, {MI.getOperand(2)});
+   auto Mul = MIRBuilder.buildMul(WideTy, LHS, RHS);
    [inline comment] arsenm: Something seems off to me about introducing a full multiply, and in whatever type the user requested. I think this only works if WideTy == 2 * OriginalType. Can you produce a mulh in the wider type? This seems more like a lowering
    [inline comment] pdhaliwal: Yes, it would only work when WideTy == 2 * OriginalType. And now if I think again it is more of a lowering operation than widening as user is not always free to choose the wider type.
+   auto Result = MI.getOperand(0);
    [inline comment] arsenm: Use Register, I would worry about introducing a copy of MachineOperand here
+
+   auto Ty = MRI.getType(MI.getOperand(0).getReg());
    [inline comment] arsenm: LLT not auto
+   auto Size = Ty.getScalarSizeInBits();
+   auto ShiftOp = IsSigned ? TargetOpcode::G_ASHR : TargetOpcode::G_LSHR;
+
+   auto Shamt = MIRBuilder.buildConstant(WideTy, Size - IsSigned);
    [inline comment] arsenm: ShiftAmt?
    [inline comment] arsenm: Why isn't the shift amount WideTy.getSizeInBits() - Size? I don't understand - IsSigned
    [inline comment] pdhaliwal: To accommodate the sign bit in case of signed operation.
+   auto Shifted = MIRBuilder.buildInstr(ShiftOp, {WideTy}, {Mul, Shamt});
+   MIRBuilder.buildTrunc(Result, Shifted);
+
+   MI.eraseFromParent();
+   return Legalized;
+ }
+
+ LegalizerHelper::LegalizeResult
  LegalizerHelper::widenScalar(MachineInstr &MI, unsigned TypeIdx, LLT WideTy) {
    switch (MI.getOpcode()) {
    default:
      return UnableToLegalize;
    case TargetOpcode::G_EXTRACT:
      return widenScalarExtract(MI, TypeIdx, WideTy);
    case TargetOpcode::G_INSERT:
      return widenScalarInsert(MI, TypeIdx, WideTy);
[19 lines not shown]
      auto AndOp = MIRBuilder.buildAnd(
          WideTy, NewOp, MIRBuilder.buildConstant(WideTy, Mask));
      // There is no overflow if the AndOp is the same as NewOp.
      MIRBuilder.buildICmp(CmpInst::ICMP_NE, MI.getOperand(1), NewOp, AndOp);
      // Now trunc the NewOp to the original result.
      MIRBuilder.buildTrunc(MI.getOperand(0), NewOp);
      MI.eraseFromParent();
      return Legalized;
    }
+
    [inline comment] arsenm: Extra newline
+   case TargetOpcode::G_UMULH:
+   case TargetOpcode::G_SMULH:
+     return widenScalarMulh(MI, TypeIdx, WideTy);
    case TargetOpcode::G_SADDSAT:
    case TargetOpcode::G_SSUBSAT:
    case TargetOpcode::G_SSHLSAT:
    case TargetOpcode::G_UADDSAT:
    case TargetOpcode::G_USUBSAT:
    case TargetOpcode::G_USHLSAT:
      return widenScalarAddSubShlSat(MI, TypeIdx, WideTy);
    case TargetOpcode::G_CTTZ:
[4,190 lines not shown]
  LegalizerHelper::lowerReadWriteRegister(MachineInstr &MI) {

    if (IsRead)
      MIRBuilder.buildCopy(ValReg, PhysReg);
    else
      MIRBuilder.buildCopy(PhysReg, ValReg);

    MI.eraseFromParent();
    return Legalized;
  }
    [inline comment] foad: Would this lowering also work for vector types, if you used LLT::scalarOrVector here?
    [inline comment] foad: I agree that anyext would not work here.
    [inline comment] foad: As Matt said you definitely should not subtract IsSigned here.
    [inline comment] pdhaliwal: I got confused in signed binary multiplication. For this operation, it is not required to subtract IsSigned.
    [inline comment] foad: Actually it would be neater to use `LLT::changeElementSize`.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-smulh.mir

[32 lines not shown]
    bb.0:
      ; CHECK: [[SMULH1:%[0-9]+]]:_(s32) = G_SMULH [[UV1]], [[UV3]]
      ; CHECK: [[BUILD_VECTOR:%[0-9]+]]:_(<2 x s32>) = G_BUILD_VECTOR [[SMULH]](s32), [[SMULH1]](s32)
      ; CHECK: $vgpr0_vgpr1 = COPY [[BUILD_VECTOR]](<2 x s32>)
      %0:_(<2 x s32>) = COPY $vgpr0_vgpr1
      %1:_(<2 x s32>) = COPY $vgpr2_vgpr3
      %2:_(<2 x s32>) = G_SMULH %0, %1
      $vgpr0_vgpr1 = COPY %2
  ...
+
    [inline comment] arsenm: Should add <2 x s16>, <3 x s16> and <4 x s16> cases
+ ---
+ name: test_smulh_s16
+ body: |
+   bb.0:
+     liveins: $vgpr0, $vgpr1
+
+     ; CHECK-LABEL: name: test_smulh_s16
+     ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+     ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+     ; CHECK: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
+     ; CHECK: [[SEXT_INREG:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY2]], 16
+     ; CHECK: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
+     ; CHECK: [[SEXT_INREG1:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY3]], 16
+     ; CHECK: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[SEXT_INREG]], [[SEXT_INREG1]]
+     ; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 15
+     ; CHECK: [[ASHR:%[0-9]+]]:_(s32) = G_ASHR [[MUL]], [[C]](s32)
+     ; CHECK: [[COPY4:%[0-9]+]]:_(s32) = COPY [[ASHR]](s32)
+     ; CHECK: [[SEXT_INREG2:%[0-9]+]]:_(s32) = G_SEXT_INREG [[COPY4]], 16
+     ; CHECK: $vgpr0 = COPY [[SEXT_INREG2]](s32)
+     %0:_(s32) = COPY $vgpr0
+     %1:_(s32) = COPY $vgpr1
+     %2:_(s16) = G_TRUNC %0
+     %3:_(s16) = G_TRUNC %1
+     %4:_(s16) = G_SMULH %2, %3
+     %5:_(s32) = G_SEXT %4
+     $vgpr0 = COPY %5
+ ...
    [inline comment] arsenm: Can you add an 8 and 24-bit test?
    [inline comment] pdhaliwal: 24-bit case won't work as it requires 48-bit MUL op which is not working yet.

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-umulh.mir

[72 lines not shown]
    bb.0:
      ; CHECK: [[ADD3:%[0-9]+]]:_(s32) = G_ADD [[UMULH3]], [[ADD2]]
      ; CHECK: [[MV:%[0-9]+]]:_(s64) = G_MERGE_VALUES [[UADDO8]](s32), [[ADD3]](s32)
      ; CHECK: $vgpr0_vgpr1 = COPY [[MV]](s64)
      %0:_(s64) = COPY $vgpr0_vgpr1
      %1:_(s64) = COPY $vgpr2_vgpr3
      %2:_(s64) = G_UMULH %0, %1
      $vgpr0_vgpr1 = COPY %2
  ...
+
+ ---
+ name: test_umulh_s16
+ body: |
+   bb.0:
+     liveins: $vgpr0, $vgpr1
+
+     ; CHECK-LABEL: name: test_umulh_s16
+     ; CHECK: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+     ; CHECK: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+     ; CHECK: [[C:%[0-9]+]]:_(s32) = G_CONSTANT i32 65535
+     ; CHECK: [[COPY2:%[0-9]+]]:_(s32) = COPY [[COPY]](s32)
+     ; CHECK: [[AND:%[0-9]+]]:_(s32) = G_AND [[COPY2]], [[C]]
+     ; CHECK: [[COPY3:%[0-9]+]]:_(s32) = COPY [[COPY1]](s32)
+     ; CHECK: [[AND1:%[0-9]+]]:_(s32) = G_AND [[COPY3]], [[C]]
+     ; CHECK: [[MUL:%[0-9]+]]:_(s32) = G_MUL [[AND]], [[AND1]]
+     ; CHECK: [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 16
+     ; CHECK: [[LSHR:%[0-9]+]]:_(s32) = G_LSHR [[MUL]], [[C1]](s32)
+     ; CHECK: [[COPY4:%[0-9]+]]:_(s32) = COPY [[LSHR]](s32)
+     ; CHECK: [[AND2:%[0-9]+]]:_(s32) = G_AND [[COPY4]], [[C]]
+     ; CHECK: $vgpr0 = COPY [[AND2]](s32)
+     %0:_(s32) = COPY $vgpr0
+     %1:_(s32) = COPY $vgpr1
+     %2:_(s16) = G_TRUNC %0
+     %3:_(s16) = G_TRUNC %1
+     %4:_(s16) = G_UMULH %2, %3
+     %5:_(s32) = G_ZEXT %4
+     $vgpr0 = COPY %5
+ ...
    [inline comment] arsenm: Can you add an 8 and 24-bit test?
    [inline comment] pdhaliwal: Added 8-bit case. But, 24-bit case won't work as it requires 48-bit MUL op which is not working yet.
    [inline comment] arsenm: Shouldn't use implicit uses of s8 values. I'm trying to fix implicit uses with illegal register types because we can't ultimately legalize these

This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel][AMDGPU] Lower G_SMULH/G_UMULH
ClosedPublic
