This is an archive of the discontinued LLVM Phabricator instance.

Differential D103321

[AMDGPU] Stop mulhi from doing 24 bit mul for uniform values
ClosedPublic

Authored by dstuttard on May 28 2021, 8:34 AM.

Details

Reviewers

foad

Commits

rGb8173c317812: [AMDGPU] Stop mulhi from doing 24 bit mul for uniform values

Summary

Added a subtarget check for whether the architecture supports s_mul_hi. The check
is part of the decision whether to use the VALU 24-bit mul: if the mulhi would be
transformed to a VALU op anyway, we may as well use the 24-bit form.

This is an extension of the work in D97063
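
The decision described in the summary can be sketched as a small predicate. This is a minimal standalone model, not the LLVM code: `useMulHi24` and its parameters `hasMul24`, `hasSMulHi`, `isDivergent`, and `operandsFit24` are hypothetical names standing in for the subtarget queries and DAG checks made by the patch.

```cpp
#include <cassert>

// Hypothetical model of the combine's decision. Returns true when the
// 24-bit VALU mulhi should be formed.
bool useMulHi24(bool hasMul24, bool hasSMulHi, bool isDivergent,
                bool operandsFit24) {
  if (!hasMul24)
    return false;
  // Uniform value on a target with a scalar mulhi: leave the generic node
  // alone so it can be selected as a scalar instruction.
  if (hasSMulHi && !isDivergent)
    return false;
  // Otherwise the 24-bit form is only valid for operands known to fit.
  return operandsFit24;
}
```

Under this model a uniform mulhi on a target without s_mul_hi still takes the 24-bit path, which is the case the summary calls out.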

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dstuttard created this revision. May 28 2021, 8:34 AM

Herald added subscribers: foad, kerbowa, hiraditya and 3 others. May 28 2021, 8:34 AM

dstuttard requested review of this revision. May 28 2021, 8:34 AM

Herald added a project: Restricted Project. May 28 2021, 8:34 AM

Herald added a subscriber: llvm-commits.

dstuttard retitled this revision from Stop mulhi from doing 24 bit mul for uniform values to [AMDGPU] Stop mulhi from doing 24 bit mul for uniform values. May 28 2021, 8:35 AM

Herald added subscribers: t-tye, tpr, yaxunl and 2 others. May 28 2021, 8:35 AM

Wasn't sure about the new HasSMulHi - any thoughts?

Harbormaster completed remote builds in B106721: Diff 348537. May 28 2021, 9:05 AM

foad added inline comments. May 28 2021, 9:08 AM

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
166	I would just put the >= GFX9 test in here. No need for a HasSMulHi field. There are plenty of precedents for this.

dstuttard added inline comments. Jun 18 2021, 3:39 AM

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
166	Having given this some thought I'm not sure that I agree - the precedent in the implementation is to do something like I've implemented. That's the consistent thing to do. Is there a good reason not to (other than it is slightly less code?) - implemented this way it is a lot clearer too.

foad added inline comments. Jun 18 2021, 4:01 AM

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
166	Both implementations ultimately say that s_mul_hi is supported on gfx9+. Mine just has fewer moving parts. I don't find yours clearer.

dstuttard added inline comments. Jun 18 2021, 9:17 AM

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
166	Until there's a variant at some point in the future that doesn't have SMulHi (although I guess the implementation can be changed if that ever occurs).

dstuttard marked an inline comment as done. Jun 21 2021, 6:07 AM

dstuttard added inline comments.

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
166	Actually - this is the way that this is handled in other cases. Because this test is inside AMDGPUISelLowering it has to use the SubTarget - so the test function has to be exposed from AMDGPUSubtarget - the getGeneration is only supported in GCNSubtarget (and defaults to false for the others - which is correct).
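
The arrangement argued for in this thread can be sketched as follows. This is a simplified standalone model with hypothetical class names (`SubtargetBase`, `GCNLike`, `R600Like`) and a reduced, hypothetical generation enum. It only illustrates why a base-class field that defaults to false lets common lowering code ask the question without knowing about generations.

```cpp
#include <cassert>

// Reduced, hypothetical generation list; the ordering is what makes the
// ">= GFX9" comparison meaningful.
enum class Generation { SouthernIslands, SeaIslands, VolcanicIslands, GFX9, GFX10 };

class SubtargetBase {
protected:
  bool HasSMulHi = false; // default for subtargets that never set it
public:
  virtual ~SubtargetBase() = default;
  bool hasSMulHi() const { return HasSMulHi; }
};

// Only this derived class knows about generations.
class GCNLike : public SubtargetBase {
  Generation Gen;
public:
  explicit GCNLike(Generation G) : Gen(G) {
    HasSMulHi = Gen >= Generation::GFX9;
  }
  Generation getGeneration() const { return Gen; }
};

// A subtarget with no generation query keeps the default.
class R600Like : public SubtargetBase {};

// Common code sees only the base class.
bool skipMul24ForUniform(const SubtargetBase &ST, bool isDivergent) {
  return ST.hasSMulHi() && !isDivergent;
}
```

The alternative raised earlier in the thread, computing the answer from the generation inside the getter, would give the same result for the GCN-like class; the field is what lets the base class answer for subtargets without a generation.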

LGTM with some extra test coverage for s_mul_hi_u32.

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
166	Fair enough.
llvm/test/CodeGen/AMDGPU/mul_int24.ll
3	Could you also add GFX9 coverage in mul_uint24-amdgcn.ll?

This revision is now accepted and ready to land. Jun 21 2021, 6:30 AM

Adding mul_uint24-amdgcn.ll gfx9 variant

dstuttard marked 2 inline comments as done. Jun 21 2021, 7:08 AM

Harbormaster completed remote builds in B110193: Diff 353360. Jun 21 2021, 9:39 AM

This revision was landed with ongoing or failed builds. Jul 5 2021, 2:34 AM

Closed by commit rGb8173c317812: [AMDGPU] Stop mulhi from doing 24 bit mul for uniform values (authored by dstuttard).

This revision was automatically updated to reflect the committed changes.

dstuttard added a commit: rGb8173c317812: [AMDGPU] Stop mulhi from doing 24 bit mul for uniform values.

Revision Contents

Path                                              Size
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp     18 lines
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h          5 lines
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp        2 lines
llvm/test/CodeGen/AMDGPU/mul_int24.ll             40 lines
llvm/test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll     58 lines

Diff 356455

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

 SDValue AMDGPUTargetLowering::performMulhsCombine(SDNode *N,
                                                   DAGCombinerInfo &DCI) const {
   EVT VT = N->getValueType(0);

   if (!Subtarget->hasMulI24() || VT.isVector())
     return SDValue();

+  // Don't generate 24-bit multiplies on values that are in SGPRs, since
+  // we only have a 32-bit scalar multiply (avoid values being moved to VGPRs
+  // unnecessarily). isDivergent() is used as an approximation of whether the
+  // value is in an SGPR.
+  // This doesn't apply if no s_mul_hi is available (since we'll end up with a
+  // valu op anyway)
+  if (Subtarget->hasSMulHi() && !N->isDivergent())
+    return SDValue();
+
   SelectionDAG &DAG = DCI.DAG;
   SDLoc DL(N);

   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);

   if (!isI24(N0, DAG) || !isI24(N1, DAG))
     return SDValue();

   N0 = DAG.getSExtOrTrunc(N0, DL, MVT::i32);
   N1 = DAG.getSExtOrTrunc(N1, DL, MVT::i32);

   SDValue Mulhi = DAG.getNode(AMDGPUISD::MULHI_I24, DL, MVT::i32, N0, N1);
   DCI.AddToWorklist(Mulhi.getNode());
   return DAG.getSExtOrTrunc(Mulhi, DL, VT);
 }

 SDValue AMDGPUTargetLowering::performMulhuCombine(SDNode *N,
                                                   DAGCombinerInfo &DCI) const {
   EVT VT = N->getValueType(0);

   if (!Subtarget->hasMulU24() || VT.isVector() || VT.getSizeInBits() > 32)
     return SDValue();

+  // Don't generate 24-bit multiplies on values that are in SGPRs, since
+  // we only have a 32-bit scalar multiply (avoid values being moved to VGPRs
+  // unnecessarily). isDivergent() is used as an approximation of whether the
+  // value is in an SGPR.
+  // This doesn't apply if no s_mul_hi is available (since we'll end up with a
+  // valu op anyway)
+  if (Subtarget->hasSMulHi() && !N->isDivergent())
+    return SDValue();
+
   SelectionDAG &DAG = DCI.DAG;
   SDLoc DL(N);

   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);

   if (!isU24(N0, DAG) || !isU24(N1, DAG))
     return SDValue();
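
Why bailing out of the combine does not change the result can be checked with a small standalone model. It assumes that `MULHI_I24` yields the high 32 bits of the product of the operands' sign-extended low 24 bits; that is an assumption about the node's semantics, not something stated on this page. Under that assumption it agrees with an ordinary 32-bit signed mulhi whenever both operands already fit in 24 bits. All helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 24 bits of x to 32 bits.
int32_t sext24(int32_t x) {
  uint32_t low = static_cast<uint32_t>(x) & 0xFFFFFFu;
  return static_cast<int32_t>(low ^ 0x800000u) - 0x800000;
}

// True if x is representable as a signed 24-bit value.
bool isI24(int32_t x) { return sext24(x) == x; }

// High 32 bits of the full 64-bit signed product. The right shift of a
// negative value is arithmetic on mainstream compilers (guaranteed in C++20).
int32_t mulhi32(int32_t a, int32_t b) {
  int64_t p = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  return static_cast<int32_t>(p >> 32);
}

// Assumed model of a signed 24-bit mulhi: only the low 24 bits of each
// operand participate.
int32_t mulhi24(int32_t a, int32_t b) { return mulhi32(sext24(a), sext24(b)); }
```

For operands that pass `isI24`, `mulhi24` and `mulhi32` return the same value, so selecting a scalar 32-bit mulhi for uniform inputs computes the same result as the 24-bit vector form.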

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h

@@ ... @@ protected:
   bool Has16BitInsts;
   bool HasMadMixInsts;
   bool HasMadMacF32Insts;
   bool HasDsSrc2Insts;
   bool HasSDWA;
   bool HasVOP3PInsts;
   bool HasMulI24;
   bool HasMulU24;
+  bool HasSMulHi;
   bool HasInv2PiInlineImm;
   bool HasFminFmaxLegacy;
   bool EnablePromoteAlloca;
   bool HasTrigReducedRange;
   unsigned MaxWavesPerEU;
   unsigned LocalMemorySize;
   char WavefrontSizeLog2;

@@ ... @@ public:
   bool hasMulI24() const {
     return HasMulI24;
   }

   bool hasMulU24() const {
     return HasMulU24;
   }

+  bool hasSMulHi() const {
+    return HasSMulHi;
+  }
+
   bool hasInv2PiInlineImm() const {
     return HasInv2PiInlineImm;
   }

   bool hasFminFmaxLegacy() const {
     return HasFminFmaxLegacy;
   }

llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp

@@ ... @@ if (!HasMovrel && !HasVGPRIndexMode)
       HasMovrel = true;
   }

   // Don't crash on invalid devices.
   if (WavefrontSizeLog2 == 0)
     WavefrontSizeLog2 = 5;

   HasFminFmaxLegacy = getGeneration() < AMDGPUSubtarget::VOLCANIC_ISLANDS;
+  HasSMulHi = getGeneration() >= AMDGPUSubtarget::GFX9;

   TargetID.setTargetIDFromFeaturesString(FS);

   LLVM_DEBUG(dbgs() << "xnack setting for subtarget: "
                     << TargetID.getXnackSetting() << '\n');
   LLVM_DEBUG(dbgs() << "sramecc setting for subtarget: "
                     << TargetID.getSramEccSetting() << '\n');

   return *this;
 }

 AMDGPUSubtarget::AMDGPUSubtarget(const Triple &TT) :
   TargetTriple(TT),
   GCN3Encoding(false),
   Has16BitInsts(false),
   HasMadMixInsts(false),
   HasMadMacF32Insts(false),
   HasDsSrc2Insts(false),
   HasSDWA(false),
   HasVOP3PInsts(false),
   HasMulI24(true),
   HasMulU24(true),
+  HasSMulHi(false),
   HasInv2PiInlineImm(false),
   HasFminFmaxLegacy(true),
   EnablePromoteAlloca(false),
   HasTrigReducedRange(false),
   MaxWavesPerEU(10),
   LocalMemorySize(0),
   WavefrontSizeLog2(0)
   { }

llvm/test/CodeGen/AMDGPU/mul_int24.ll

-; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=SI -check-prefix=FUNC %s
-; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=FUNC %s
+; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,SI,FUNC,SIVI %s
+; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,VI,FUNC,SIVI %s
+; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,FUNC,GFX9 %s
 ; RUN: llc -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s
 ; RUN: llc -march=r600 -mcpu=cayman < %s | FileCheck -check-prefix=CM -check-prefix=FUNC %s
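
How the new `SIVI` and `GFX9` prefixes partition the checks between runs can be illustrated with a toy model. This is not FileCheck's implementation: `basePrefix` and `directiveApplies` are hypothetical helpers that only answer whether a check line's prefix is among those enabled for a run, treating the suffixes `-NOT`, `-DAG`, `-NEXT`, and `-LABEL` as part of the directive rather than the prefix.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Strip a known directive suffix, leaving the check prefix.
std::string basePrefix(const std::string &directive) {
  static const char *suffixes[] = {"-NOT", "-DAG", "-NEXT", "-LABEL"};
  for (const char *s : suffixes) {
    const std::string suf(s);
    if (directive.size() > suf.size() &&
        directive.compare(directive.size() - suf.size(), suf.size(), suf) == 0)
      return directive.substr(0, directive.size() - suf.size());
  }
  return directive;
}

// True if a directive such as "SIVI-NOT" is active for a run whose
// -check-prefixes list is `enabled`.
bool directiveApplies(const std::string &directive,
                      const std::vector<std::string> &enabled) {
  const std::string base = basePrefix(directive);
  for (const std::string &p : enabled)
    if (p == base)
      return true;
  return false;
}
```

With the prefix lists from the RUN lines above, a `SIVI` check is active for the default and tonga runs but not for gfx900, while `GCN` and `FUNC` checks stay shared by all three.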

 ; FUNC-LABEL: {{^}}test_smul24_i32:
 ; GCN: s_mul_i32

 ; Signed 24-bit multiply is not supported on pre-Cayman GPUs.
 ; EG: MULLO_INT

 ; CM: MULLO_INT
 define amdgpu_kernel void @test_smul24_i32(i32 addrspace(1)* %out, i32 %a, i32 %b) #0 {
 entry:
   %a.shl = shl i32 %a, 8
   %a.24 = ashr i32 %a.shl, 8
   %b.shl = shl i32 %b, 8
   %b.24 = ashr i32 %b.shl, 8
   %mul24 = mul i32 %a.24, %b.24
   store i32 %mul24, i32 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_smulhi24_i64:
-; GCN-NOT: bfe
+; SIVI-NOT: bfe
 ; GCN-NOT: ashr
-; GCN: v_mul_hi_i32_i24_e32 [[RESULT:v[0-9]+]],
-; GCN-NEXT: buffer_store_dword [[RESULT]]
+; SIVI: v_mul_hi_i32_i24_e32 [[RESULT:v[0-9]+]],
+; GFX9: s_mul_hi_i32 [[RES1:s[0-9]+]],
+; GFX9: v_mov_b32_e32 [[RESULT:v[0-9]+]], [[RES1]]
+; GCN: buffer_store_dword [[RESULT]]

 ; EG: ASHR
 ; EG: ASHR
 ; EG: MULHI_INT

 ; CM-NOT: ASHR
 ; CM: MULHI_INT24
 ; CM: MULHI_INT24
@@ ... @@
 ; multiple uses by the separate mul and mulhi.

 ; FUNC-LABEL: {{^}}test_smul24_i64:
 ; GCN: s_load_dword s
 ; GCN: s_load_dword s

 ; GCN-NOT: ashr

-; GCN-DAG: v_mul_hi_i32_i24_e32
-; GCN-DAG: s_mul_i32
+; SIVI-DAG: v_mul_hi_i32_i24_e32
+; SIVI-DAG: s_mul_i32
+; GFX9-DAG: s_mul_hi_i32
+; GFX9-DAG: s_mul_i32

 ; GCN: buffer_store_dwordx2
 define amdgpu_kernel void @test_smul24_i64(i64 addrspace(1)* %out, [8 x i32], i32 %a, [8 x i32], i32 %b) #0 {
   %shl.i = shl i32 %a, 8
   %shr.i = ashr i32 %shl.i, 8
   %conv.i = sext i32 %shr.i to i64
   %shl1.i = shl i32 %b, 8
   %shr2.i = ashr i32 %shl1.i, 8
   %conv3.i = sext i32 %shr2.i to i64
   %mul.i = mul i64 %conv3.i, %conv.i
   store i64 %mul.i, i64 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_smul24_i64_square:
 ; GCN: s_load_dword [[A:s[0-9]+]]
-; GCN-DAG: v_mul_hi_i32_i24_e64 v{{[0-9]+}}, [[A]], [[A]]
-; GCN-DAG: s_mul_i32 s{{[0-9]+}}, [[A]], [[A]]
+; SIVI-DAG: v_mul_hi_i32_i24_e64 v{{[0-9]+}}, [[A]], [[A]]
+; SIVI-DAG: s_mul_i32 s{{[0-9]+}}, [[A]], [[A]]
+; GFX9: s_bfe_i32 [[B:s[0-9]+]], [[A]]
+; GFX9-DAG: s_mul_hi_i32 s{{[0-9]+}}, [[B]], [[B]]
+; GFX9-DAG: s_mul_i32 s{{[0-9]+}}, [[B]], [[B]]
 ; GCN: buffer_store_dwordx2
 define amdgpu_kernel void @test_smul24_i64_square(i64 addrspace(1)* %out, i32 %a, i32 %b) #0 {
   %shl.i = shl i32 %a, 8
   %shr.i = ashr i32 %shl.i, 8
   %conv.i = sext i32 %shr.i to i64
   %mul.i = mul i64 %conv.i, %conv.i
   store i64 %mul.i, i64 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_smul24_i33:
 ; GCN: s_load_dword s
 ; GCN: s_load_dword s

 ; GCN-NOT: and
 ; GCN-NOT: lshr

-; GCN-DAG: s_mul_i32
-; GCN-DAG: v_mul_hi_i32_i24_e32
+; SIVI-DAG: s_mul_i32
+; SIVI-DAG: v_mul_hi_i32_i24_e32
 ; SI: v_lshl_b64 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, 31
 ; SI: v_ashr_i64 v{{\[[0-9]+:[0-9]+\]}}, v{{\[[0-9]+:[0-9]+\]}}, 31

 ; VI: v_lshlrev_b64 v{{\[[0-9]+:[0-9]+\]}}, 31, v{{\[[0-9]+:[0-9]+\]}}
 ; VI: v_ashrrev_i64 v{{\[[0-9]+:[0-9]+\]}}, 31, v{{\[[0-9]+:[0-9]+\]}}

+; GFX9-DAG: s_mul_i32
+; GFX9-DAG: s_mul_hi_i32
+; GFX9: s_lshl_b64 s{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 31
+; GFX9: s_ashr_i64 s{{\[[0-9]+:[0-9]+\]}}, s{{\[[0-9]+:[0-9]+\]}}, 31
+
 ; GCN: buffer_store_dwordx2
 define amdgpu_kernel void @test_smul24_i33(i64 addrspace(1)* %out, i33 %a, i33 %b) #0 {
 entry:
   %a.shl = shl i33 %a, 9
   %a.24 = ashr i33 %a.shl, 9
   %b.shl = shl i33 %b, 9
   %b.24 = ashr i33 %b.shl, 9
   %mul24 = mul i33 %a.24, %b.24
   %ext = sext i33 %mul24 to i64
   store i64 %ext, i64 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_smulhi24_i33:
 ; SI: s_load_dword s
 ; SI: s_load_dword s

 ; SI-NOT: bfe

 ; SI: v_mul_hi_i32_i24_e32 v[[MUL_HI:[0-9]+]],
 ; SI-NEXT: v_and_b32_e32 v[[HI:[0-9]+]], 1, v[[MUL_HI]]
 ; SI-NEXT: buffer_store_dword v[[HI]]
+
+; GFX9: s_mul_hi_i32 s[[MUL_HI:[0-9]+]],
+; GFX9-NEXT: s_and_b32 s[[HI:[0-9]+]], s[[MUL_HI]], 1
+; GFX9-NEXT: v_mov_b32_e32 v[[RES:[0-9]+]], s[[HI]]
+; GFX9-NEXT: buffer_store_dword v[[RES]]
 define amdgpu_kernel void @test_smulhi24_i33(i32 addrspace(1)* %out, i33 %a, i33 %b) {
 entry:
   %tmp0 = shl i33 %a, 9
   %a_24 = ashr i33 %tmp0, 9
   %tmp1 = shl i33 %b, 9
   %b_24 = ashr i33 %tmp1, 9
   %tmp2 = mul i33 %a_24, %b_24
   %hi = lshr i33 %tmp2, 32
@@ ... @@

llvm/test/CodeGen/AMDGPU/mul_uint24-amdgcn.ll

-; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SI,FUNC %s
-; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,VI,FUNC %s
+; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SI,SIVI,FUNC %s
+; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,VI,SIVI,FUNC %s
+; RUN: llc -march=amdgcn -mcpu=gfx900 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefixes=GCN,GFX9,FUNC %s

 declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
 declare i32 @llvm.amdgcn.workitem.id.y() nounwind readnone

 ; FUNC-LABEL: {{^}}test_umul24_i32:
 ; GCN: s_mul_i32
 define amdgpu_kernel void @test_umul24_i32(i32 addrspace(1)* %out, i32 %a, i32 %b) {
 entry:
@@ ... @@ entry:
   %ext = sext i16 %mul to i32
   store i32 %ext, i32 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_umul24_i16_vgpr_sext:
 ; SI: v_mul_u32_u24_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
 ; VI: v_mul_lo_u16_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
+; GFX9: v_mul_lo_u16_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
 ; GCN: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 16
 define amdgpu_kernel void @test_umul24_i16_vgpr_sext(i32 addrspace(1)* %out, i16 addrspace(1)* %in) {
   %tid.x = call i32 @llvm.amdgcn.workitem.id.x()
   %tid.y = call i32 @llvm.amdgcn.workitem.id.y()
   %ptr_a = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.x
   %ptr_b = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.y
   %a = load i16, i16 addrspace(1)* %ptr_a
   %b = load i16, i16 addrspace(1)* %ptr_b
@@ ... @@ entry:
   store i32 %ext, i32 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_umul24_i16_vgpr:
 ; SI: v_mul_u32_u24_e32
 ; SI: v_and_b32_e32
 ; VI: v_mul_lo_u16
+; GFX9: v_mul_lo_u16
 define amdgpu_kernel void @test_umul24_i16_vgpr(i32 addrspace(1)* %out, i16 addrspace(1)* %in) {
   %tid.x = call i32 @llvm.amdgcn.workitem.id.x()
   %tid.y = call i32 @llvm.amdgcn.workitem.id.y()
   %ptr_a = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.x
   %ptr_b = getelementptr i16, i16 addrspace(1)* %in, i32 %tid.y
   %a = load i16, i16 addrspace(1)* %ptr_a
   %b = load i16, i16 addrspace(1)* %ptr_b
   %mul = mul i16 %a, %b
   %val = zext i16 %mul to i32
   store i32 %val, i32 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_umul24_i8_vgpr:
 ; SI: v_mul_u32_u24_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
 ; VI: v_mul_lo_u16_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
+; GFX9: v_mul_lo_u16_e{{(32|64)}} [[MUL:v[0-9]]], {{[sv][0-9], [sv][0-9]}}
 ; GCN: v_bfe_i32 v{{[0-9]}}, [[MUL]], 0, 8
 define amdgpu_kernel void @test_umul24_i8_vgpr(i32 addrspace(1)* %out, i8 addrspace(1)* %a, i8 addrspace(1)* %b) {
 entry:
   %tid.x = call i32 @llvm.amdgcn.workitem.id.x()
   %tid.y = call i32 @llvm.amdgcn.workitem.id.y()
   %a.ptr = getelementptr i8, i8 addrspace(1)* %a, i32 %tid.x
   %b.ptr = getelementptr i8, i8 addrspace(1)* %b, i32 %tid.y
   %a.l = load i8, i8 addrspace(1)* %a.ptr
   %b.l = load i8, i8 addrspace(1)* %b.ptr
   %mul = mul i8 %a.l, %b.l
   %ext = sext i8 %mul to i32
   store i32 %ext, i32 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_umulhi24_i32_i64:
-; GCN-NOT: and
-; GCN: v_mul_hi_u32_u24_e32 [[RESULT:v[0-9]+]],
+; SIVI-NOT: and
+; SIVI: v_mul_hi_u32_u24_e32 [[RESULT:v[0-9]+]],
+; GFX9: s_mul_hi_u32 [[SRESULT:s[0-9]+]],
+; GFX9: v_mov_b32_e32 [[RESULT:v[0-9]+]], [[SRESULT]]
 ; GCN-NEXT: buffer_store_dword [[RESULT]]
 define amdgpu_kernel void @test_umulhi24_i32_i64(i32 addrspace(1)* %out, i32 %a, i32 %b) {
 entry:
   %a.24 = and i32 %a, 16777215
   %b.24 = and i32 %b, 16777215
   %a.24.i64 = zext i32 %a.24 to i64
   %b.24.i64 = zext i32 %b.24 to i64
   %mul48 = mul i64 %a.24.i64, %b.24.i64
   %mul48.hi = lshr i64 %mul48, 32
   %mul24hi = trunc i64 %mul48.hi to i32
   store i32 %mul24hi, i32 addrspace(1)* %out
   ret void
 }
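
The IR in `test_umulhi24_i32_i64` can be mirrored step by step in standalone C++ to see what value the test computes. The function name `umulhi24FromIR` is hypothetical; each statement corresponds to one IR instruction in the function above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ mirror of the IR body of test_umulhi24_i32_i64.
uint32_t umulhi24FromIR(uint32_t a, uint32_t b) {
  uint32_t a24 = a & 16777215u;      // %a.24 = and i32 %a, 16777215
  uint32_t b24 = b & 16777215u;      // %b.24 = and i32 %b, 16777215
  uint64_t a64 = a24;                // %a.24.i64 = zext i32 %a.24 to i64
  uint64_t b64 = b24;                // %b.24.i64 = zext i32 %b.24 to i64
  uint64_t mul48 = a64 * b64;        // %mul48 = mul i64 ...
  uint64_t hi = mul48 >> 32;         // %mul48.hi = lshr i64 %mul48, 32
  return static_cast<uint32_t>(hi);  // %mul24hi = trunc i64 %mul48.hi to i32
}
```

Because both inputs are masked to 24 bits, the product is below 2^48 and the returned high half never exceeds 0xFFFF. The same value results whether the hardware multiplies the masked operands with a 24-bit or a full 32-bit unsigned mulhi.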

 ; FUNC-LABEL: {{^}}test_umulhi24:
-; GCN-NOT: and
-; GCN: v_mul_hi_u32_u24_e32 [[RESULT:v[0-9]+]],
+; SIVI-NOT: and
+; SIVI: v_mul_hi_u32_u24_e32 [[RESULT:v[0-9]+]],
+; GFX9: s_mul_hi_u32 [[SRESULT:s[0-9]+]],
+; GFX9: v_mov_b32_e32 [[RESULT:v[0-9]+]], [[SRESULT]]
 ; GCN-NEXT: buffer_store_dword [[RESULT]]
 define amdgpu_kernel void @test_umulhi24(i32 addrspace(1)* %out, i64 %a, i64 %b) {
 entry:
   %a.24 = and i64 %a, 16777215
   %b.24 = and i64 %b, 16777215
   %mul48 = mul i64 %a.24, %b.24
   %mul48.hi = lshr i64 %mul48, 32
   %mul24.hi = trunc i64 %mul48.hi to i32
   store i32 %mul24.hi, i32 addrspace(1)* %out
   ret void
 }

 ; Multiply with 24-bit inputs and 64-bit output.
 ; FUNC-LABEL: {{^}}test_umul24_i64:
 ; GCN-NOT: lshr
-; GCN-DAG: s_mul_i32
-; GCN-DAG: v_mul_hi_u32_u24_e32
+; SIVI-DAG: s_mul_i32
+; SIVI-DAG: v_mul_hi_u32_u24_e32
+; GFX9-DAG: s_mul_i32
+; GFX9-DAG: s_mul_hi_u32
 ; GCN: buffer_store_dwordx2
 define amdgpu_kernel void @test_umul24_i64(i64 addrspace(1)* %out, i64 %a, i64 %b) {
 entry:
   %tmp0 = shl i64 %a, 40
   %a_24 = lshr i64 %tmp0, 40
   %tmp1 = shl i64 %b, 40
   %b_24 = lshr i64 %tmp1, 40
   %tmp2 = mul i64 %a_24, %b_24
   store i64 %tmp2, i64 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_umul24_i64_square:
 ; GCN: s_load_dword [[A:s[0-9]+]]
 ; GCN: s_and_b32 [[B:s[0-9]+]], [[A]], 0xffffff
-; GCN-DAG: s_mul_i32 s{{[0-9]+}}, [[B]], [[B]]
-; GCN-DAG: v_mul_hi_u32_u24_e64 v{{[0-9]+}}, [[A]], [[A]]
+; SIVI-DAG: s_mul_i32 s{{[0-9]+}}, [[B]], [[B]]
+; SIVI-DAG: v_mul_hi_u32_u24_e64 v{{[0-9]+}}, [[A]], [[A]]
+; GFX9-DAG: s_mul_i32 s{{[0-9]+}}, [[B]], [[B]]
+; GFX9-DAG: s_mul_hi_u32 s{{[0-9]+}}, [[B]], [[B]]
 define amdgpu_kernel void @test_umul24_i64_square(i64 addrspace(1)* %out, [8 x i32], i64 %a) {
 entry:
   %tmp0 = shl i64 %a, 40
   %a.24 = lshr i64 %tmp0, 40
   %tmp2 = mul i64 %a.24, %a.24
   store i64 %tmp2, i64 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_umulhi16_i32:
 ; GCN: s_and_b32
 ; GCN: s_and_b32
 ; GCN: s_mul_i32 [[MUL24:s[0-9]+]]
-; GCN: s_lshr_b32 s{{[0-9]+}}, [[MUL24]], 16
+; SIVI: s_lshr_b32 s{{[0-9]+}}, [[MUL24]], 16
+; GFX9: v_mov_b32_e32 [[RESULT:v[0-9]+]], [[MUL24]]
+; GFX9: global_store_short_d16_hi v{{[0-9]+}}, [[RESULT]]
 define amdgpu_kernel void @test_umulhi16_i32(i16 addrspace(1)* %out, i32 %a, i32 %b) {
 entry:
   %a.16 = and i32 %a, 65535
   %b.16 = and i32 %b, 65535
   %mul = mul i32 %a.16, %b.16
   %hi = lshr i32 %mul, 16
   %mulhi = trunc i32 %hi to i16
   store i16 %mulhi, i16 addrspace(1)* %out
   ret void
 }

 ; FUNC-LABEL: {{^}}test_umul24_i33:
 ; GCN: s_load_dword s
 ; GCN: s_load_dword s
 ; GCN-NOT: lshr
-; GCN-DAG: s_mul_i32 s[[MUL_LO:[0-9]+]],
-; GCN-DAG: v_mul_hi_u32_u24_e32 v[[MUL_HI:[0-9]+]],
-; GCN-DAG: v_and_b32_e32 v[[HI:[0-9]+]], 1, v[[MUL_HI]]
-; GCN-DAG: v_mov_b32_e32 v[[LO:[0-9]+]], s[[MUL_LO]]
+; SIVI-DAG: s_mul_i32 s[[MUL_LO:[0-9]+]],
+; SIVI-DAG: v_mul_hi_u32_u24_e32 v[[MUL_HI:[0-9]+]],
+; SIVI-DAG: v_and_b32_e32 v[[HI:[0-9]+]], 1, v[[MUL_HI]]
+; SIVI-DAG: v_mov_b32_e32 v[[LO:[0-9]+]], s[[MUL_LO]]
+; GFX9-DAG: s_mul_i32 s[[MUL_LO:[0-9]+]],
+; GFX9-DAG: s_mul_hi_u32 s[[MUL_HI:[0-9]+]],
+; GFX9-DAG: s_and_b32 s[[AND_HI:[0-9]+]], s[[MUL_HI]], 1
+; GFX9-DAG: v_mov_b32_e32 v[[LO:[0-9]+]], s[[MUL_LO]]
+; GFX9-DAG: v_mov_b32_e32 v[[HI:[0-9]+]], s[[AND_HI]]
 ; GCN: buffer_store_dwordx2 v{{\[}}[[LO]]:[[HI]]{{\]}}
 define amdgpu_kernel void @test_umul24_i33(i64 addrspace(1)* %out, i33 %a, i33 %b) {
 entry:
   %tmp0 = shl i33 %a, 9
   %a_24 = lshr i33 %tmp0, 9
   %tmp1 = shl i33 %b, 9
   %b_24 = lshr i33 %tmp1, 9
   %tmp2 = mul i33 %a_24, %b_24
   %ext = zext i33 %tmp2 to i64
   store i64 %ext, i64 addrspace(1)* %out
   ret void
 }
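
The checks for `test_umul24_i33` mask the high half with 1 because only bit 32 of the 33-bit product survives the `zext` to i64. A standalone sketch of that arithmetic follows; `umul24_i33` and `fromHalves` are hypothetical names, and the instruction names in the comments only indicate which checked instruction each step corresponds to.

```cpp
#include <cassert>
#include <cstdint>

// Mirror of the IR: 24-bit operands, i33 multiply, zext to i64.
// Only the low 33 bits of a and b are meaningful, as for an i33 argument.
uint64_t umul24_i33(uint64_t a, uint64_t b) {
  const uint64_t mask33 = (uint64_t{1} << 33) - 1;
  uint64_t a24 = ((a << 9) & mask33) >> 9;  // shl i33 %a, 9; lshr i33 ..., 9
  uint64_t b24 = ((b << 9) & mask33) >> 9;  // shl i33 %b, 9; lshr i33 ..., 9
  return (a24 * b24) & mask33;              // mul i33; zext i33 to i64
}

// The same value assembled from 32-bit low and high halves.
uint64_t fromHalves(uint32_t a24, uint32_t b24) {
  uint64_t full = uint64_t{a24} * b24;
  uint32_t lo = static_cast<uint32_t>(full);        // s_mul_i32
  uint32_t hi = static_cast<uint32_t>(full >> 32);  // s_mul_hi_u32
  return (uint64_t{hi & 1u} << 32) | lo;            // s_and_b32 ..., 1
}
```

Both functions agree for any pair of 24-bit inputs, which is the relationship the `LO`/`HI` register pair in the store check encodes.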

 ; FUNC-LABEL: {{^}}test_umulhi24_i33:
 ; GCN: s_load_dword s
 ; GCN: s_load_dword s
-; GCN-NOT: and
+; SIVI-NOT: and
 ; GCN-NOT: lshr
-; GCN: v_mul_hi_u32_u24_e32 v[[MUL_HI:[0-9]+]],
-; GCN: v_and_b32_e32 v[[HI:[0-9]+]], 1, v[[MUL_HI]]
+; SIVI: v_mul_hi_u32_u24_e32 v[[MUL_HI:[0-9]+]],
+; SIVI: v_and_b32_e32 v[[HI:[0-9]+]], 1, v[[MUL_HI]]
+; GFX9: s_mul_hi_u32 s[[MUL_HI:[0-9]+]],
+; GFX9: s_and_b32 s[[AND_HI:[0-9]+]], s[[MUL_HI]], 1
+; GFX9: v_mov_b32_e32 v[[HI:[0-9]+]], s[[AND_HI]]
 ; GCN-NEXT: buffer_store_dword v[[HI]]
 define amdgpu_kernel void @test_umulhi24_i33(i32 addrspace(1)* %out, i33 %a, i33 %b) {
 entry:
   %tmp0 = shl i33 %a, 9
   %a_24 = lshr i33 %tmp0, 9
   %tmp1 = shl i33 %b, 9
   %b_24 = lshr i33 %tmp1, 9
   %tmp2 = mul i33 %a_24, %b_24
@@ ... @@