Details

Reviewers

RKSimon
MatzeB
foad

Commits

rG83cb9632a13d: [DAGCombiner] Add support for mulhi const folding in DAGCombiner

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dstuttard created this revision. May 28 2021, 8:42 AM

Herald added subscribers: ecnelises, hiraditya. May 28 2021, 8:42 AM

dstuttard requested review of this revision. May 28 2021, 8:42 AM

Herald added a project: Restricted Project. May 28 2021, 8:42 AM

Herald added a subscriber: llvm-commits.

dstuttard added reviewers: RKSimon, MatzeB, foad. May 28 2021, 8:43 AM

RKSimon added inline comments. May 28 2021, 8:50 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5086	APInt::extractBits ?
5092	APInt::extractBits ?

Test cases would be useful as well of course

Harbormaster completed remote builds in B106723: Diff 348540. May 28 2021, 9:02 AM

foad added inline comments. May 28 2021, 9:03 AM

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
5084	You don't need "OrTrunc" here and below.

Thanks for reviews.
Made suggested changes.

I'll add some test cases as well - but I'm off for a few days. I'll do it on return.

dstuttard marked 3 inline comments as done. May 28 2021, 9:14 AM

Harbormaster completed remote builds in B106731: Diff 348550. May 28 2021, 9:59 AM

craig.topper added a subscriber: craig.topper. May 28 2021, 10:18 AM

craig.topper added inline comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4467	Use DL variable that already exists
4519	Use DL variable that already exists

Updating for review comments.

Tests still to be added...

dstuttard marked 2 inline comments as done. Jun 17 2021, 1:14 AM

Harbormaster completed remote builds in B109658: Diff 352637. Jun 17 2021, 9:37 AM

In D103323#2823907, @dstuttard wrote:

Updating for review comments.

Tests still to be added...

Do you have any in mind? I can add SSE2 vector examples if that would help, but scalars are trickier.

Add test

Herald added subscribers: kerbowa, nhaehnle, jvesely. Jun 18 2021, 3:22 AM

You can simplify the test case to:

define amdgpu_cs i64 @main(i64 %arg) {
entry:
  %d = udiv i64 %arg, 100000
  ret i64 %d
}

and it still shows the effect. Surely there are already some tests for i64 divide-by-constant that you could tweak, rather than adding a whole new file.

llvm/test/CodeGen/AMDGPU/dagcombine-mulhs-const.ll
5 ↗	(On Diff #352960)	Obviously folding the mul_hi is good, but the s_add that you check for looks like this:
  s_mov_b32 s0, 0x346d900
  ...
  s_add_u32 s0, 0x4237, s0
so it should also be folded to a constant!

Harbormaster completed remote builds in B109894: Diff 352960. Jun 18 2021, 7:20 PM

RKSimon mentioned this in rGcc38f8939da4: [X86][SSE] Add mulhu/mulhs constant folding tests. Jul 3 2021, 9:02 AM

@dstuttard Please can you rebase? rGcc38f8939da4aec85e7d0ef4de412e30d4de5a14 should give you vector coverage

Updated existing test based on feedback

Rebase and adjust X86 test that now folds

Herald added a subscriber: pengfei. Jul 5 2021, 1:47 AM

In D103323#2856828, @RKSimon wrote:

@dstuttard Please can you rebase? rGcc38f8939da4aec85e7d0ef4de412e30d4de5a14 should give you vector coverage

Thanks - have rebased and updated the test

dstuttard added inline comments. Jul 5 2021, 1:50 AM

llvm/test/CodeGen/AMDGPU/dagcombine-mulhs-const.ll
5 ↗	(On Diff #352960)	Yes - that could be another one to do - then fix up this test (or not worry about it at all given that there's now an X86 test that tests this, thanks to Simon)

Cheers!

llvm/test/CodeGen/AMDGPU/udiv.ll
206	Possibly pre-commit this to show current codegen? Do we need a GFX1030-NOT v_mul_hi_u32 check of some kind?
212	Fix missing newline

Harbormaster completed remote builds in B112412: Diff 356448. Jul 5 2021, 2:31 AM

dstuttard mentioned this in D105424: [DAGCombiner] Pre-commit test to demonstrate mulhi const folding. Jul 5 2021, 3:24 AM

Pre-committed test and rebased on top

See D105424

Harbormaster completed remote builds in B112422: Diff 356464. Jul 5 2021, 3:27 AM

dstuttard marked 2 inline comments as done. Jul 5 2021, 3:28 AM

dstuttard added inline comments.

llvm/test/CodeGen/AMDGPU/udiv.ll
206	Good idea. See latest change.

dstuttard marked an inline comment as done. Jul 5 2021, 3:28 AM

dstuttard mentioned this in rG4b125b23ba95: [DAGCombiner] Pre-commit test to demonstrate mulhi const folding. Jul 5 2021, 3:35 AM

LGTM - cheers!

This revision is now accepted and ready to land. Jul 5 2021, 3:58 AM

This revision was landed with ongoing or failed builds. Jul 5 2021, 4:08 AM

Closed by commit rG83cb9632a13d: [DAGCombiner] Add support for mulhi const folding in DAGCombiner (authored by dstuttard).

This revision was automatically updated to reflect the committed changes.

dstuttard added a commit: rG83cb9632a13d: [DAGCombiner] Add support for mulhi const folding in DAGCombiner.

Diff 356468

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

SDValue DAGCombiner::visitMULHS(SDNode *N) {
if (VT.isVector()) {		if (VT.isVector()) {
// fold (mulhs x, 0) -> 0		// fold (mulhs x, 0) -> 0
// do not return N0/N1, because undef node may exist.		// do not return N0/N1, because undef node may exist.
if (ISD::isConstantSplatVectorAllZeros(N0.getNode()) ||		if (ISD::isConstantSplatVectorAllZeros(N0.getNode()) ||
ISD::isConstantSplatVectorAllZeros(N1.getNode()))		ISD::isConstantSplatVectorAllZeros(N1.getNode()))
return DAG.getConstant(0, DL, VT);		return DAG.getConstant(0, DL, VT);
}		}

		// fold (mulhs c1, c2)
		if (SDValue C = DAG.FoldConstantArithmetic(ISD::MULHS, DL, VT, {N0, N1}))
		return C;

// fold (mulhs x, 0) -> 0		// fold (mulhs x, 0) -> 0
if (isNullConstant(N1))		if (isNullConstant(N1))
return N1;		return N1;
// fold (mulhs x, 1) -> (sra x, size(x)-1)		// fold (mulhs x, 1) -> (sra x, size(x)-1)
if (isOneConstant(N1))		if (isOneConstant(N1))
return DAG.getNode(ISD::SRA, DL, N0.getValueType(), N0,		return DAG.getNode(ISD::SRA, DL, N0.getValueType(), N0,
DAG.getConstant(N0.getScalarValueSizeInBits() - 1, DL,		DAG.getConstant(N0.getScalarValueSizeInBits() - 1, DL,
getShiftAmountTy(N0.getValueType())));		getShiftAmountTy(N0.getValueType())));
SDValue DAGCombiner::visitMULHU(SDNode *N) {
if (VT.isVector()) {		if (VT.isVector()) {
// fold (mulhu x, 0) -> 0		// fold (mulhu x, 0) -> 0
// do not return N0/N1, because undef node may exist.		// do not return N0/N1, because undef node may exist.
if (ISD::isConstantSplatVectorAllZeros(N0.getNode()) ||		if (ISD::isConstantSplatVectorAllZeros(N0.getNode()) ||
ISD::isConstantSplatVectorAllZeros(N1.getNode()))		ISD::isConstantSplatVectorAllZeros(N1.getNode()))
return DAG.getConstant(0, DL, VT);		return DAG.getConstant(0, DL, VT);
}		}

		// fold (mulhu c1, c2)
		if (SDValue C = DAG.FoldConstantArithmetic(ISD::MULHU, DL, VT, {N0, N1}))
		return C;

// fold (mulhu x, 0) -> 0		// fold (mulhu x, 0) -> 0
if (isNullConstant(N1))		if (isNullConstant(N1))
return N1;		return N1;
// fold (mulhu x, 1) -> 0		// fold (mulhu x, 1) -> 0
if (isOneConstant(N1))		if (isOneConstant(N1))
return DAG.getConstant(0, DL, N0.getValueType());		return DAG.getConstant(0, DL, N0.getValueType());
// fold (mulhu x, undef) -> 0		// fold (mulhu x, undef) -> 0
if (N0.isUndef() || N1.isUndef())		if (N0.isUndef() || N1.isUndef())

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

static llvm::Optional<APInt> FoldValue(unsigned Opcode, const APInt &C1,
case ISD::SDIV:		case ISD::SDIV:
if (!C2.getBoolValue())		if (!C2.getBoolValue())
break;		break;
return C1.sdiv(C2);		return C1.sdiv(C2);
case ISD::SREM:		case ISD::SREM:
if (!C2.getBoolValue())		if (!C2.getBoolValue())
break;		break;
return C1.srem(C2);		return C1.srem(C2);
		case ISD::MULHS: {
		unsigned FullWidth = C1.getBitWidth() * 2;
		APInt C1Ext = C1.sext(FullWidth);
		APInt C2Ext = C2.sext(FullWidth);
		return (C1Ext * C2Ext).extractBits(C1.getBitWidth(), C1.getBitWidth());
		}
		case ISD::MULHU: {
		unsigned FullWidth = C1.getBitWidth() * 2;
		APInt C1Ext = C1.zext(FullWidth);
		APInt C2Ext = C2.zext(FullWidth);
		return (C1Ext * C2Ext).extractBits(C1.getBitWidth(), C1.getBitWidth());
		}
}		}
return llvm::None;		return llvm::None;
}		}

SDValue SelectionDAG::FoldSymbolOffset(unsigned Opcode, EVT VT,		SDValue SelectionDAG::FoldSymbolOffset(unsigned Opcode, EVT VT,
const GlobalAddressSDNode *GA,		const GlobalAddressSDNode *GA,
const SDNode *N2) {		const SDNode *N2) {
if (GA->getOpcode() != ISD::GlobalAddress)		if (GA->getOpcode() != ISD::GlobalAddress)

llvm/test/CodeGen/AMDGPU/udiv.ll

bb:
%tmp5 = sdiv i32 %tmp1, %tmp4		%tmp5 = sdiv i32 %tmp1, %tmp4
%tmp6 = trunc i32 %tmp5 to i8		%tmp6 = trunc i32 %tmp5 to i8
store i8 %tmp6, i8 addrspace(1)* null, align 1		store i8 %tmp6, i8 addrspace(1)* null, align 1
ret void		ret void
}		}

define i64 @v_test_udiv64_mulhi_fold(i64 %arg) {		define i64 @v_test_udiv64_mulhi_fold(i64 %arg) {
; GCN-LABEL: v_test_udiv64_mulhi_fold		; GCN-LABEL: v_test_udiv64_mulhi_fold
; GFX1030: s_mov_b32 [[VAL1:s[0-9]+]], 0xa9000000		; GFX1030: s_add_u32 [[VAL:s[0-9]+]], 0x4237, s{{[0-9]+}}
; GFX1030: s_brev_b32 [[VAL2:s[0-9]+]], 6		; GFX1030-NOT: s_mul_hi_u32
; GFX1030: s_movk_i32 [[VAL3:s[0-9]+]], 0x500		; GFX1030: v_add_co_u32 v{{[0-9]+}}, [[VAL]], 0xa9000000, [[VAL]]
; GFX1030: s_mul_hi_u32 s7, [[VAL1]], [[VAL2]]
; GFX1030: s_mov_b32 [[VAL4:s[0-9]+]], 0xa7c5
; GFX1030: s_mul_hi_u32 s8, [[VAL1]], [[VAL3]]
; GFX1030: s_mul_hi_u32 s5, [[VAL4]], [[VAL2]]
; GFX1030: s_mul_hi_u32 s6, [[VAL4]], [[VAL3]]
; GFX1030: v_add_co_u32 v{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}, s{{[0-9]+}}
%d = udiv i64 %arg, 100000		%d = udiv i64 %arg, 100000
ret i64 %d		ret i64 %d
}		}
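
For context on why a plain `udiv i64 %arg, 100000` exercises this fold: division by a constant is commonly rewritten as a multiply-high by a precomputed "magic" constant, and the pre-patch CHECK lines above show `s_mul_hi_u32` instructions whose operands are both materialized constants. The sketch below illustrates the general idea with one textbook formulation (the round-up method of Granlund and Montgomery). It is not LLVM's actual expansion, and its constants are not claimed to match the ones in the test. The names `mulhu64` and `udiv_by_100000` are hypothetical, and `unsigned __int128` is assumed to be available as a GCC/Clang extension.

```cpp
#include <cstdint>

// High 64 bits of the 128-bit product a * b (the MULHU operation).
// Assumption: unsigned __int128 is supported by the compiler.
inline uint64_t mulhu64(uint64_t a, uint64_t b) {
  return static_cast<uint64_t>((static_cast<unsigned __int128>(a) * b) >> 64);
}

// Hypothetical helper: computes n / 100000 using a multiply-high instead of
// a divide. Magic = floor(2^64 * (2^L - D) / D) + 1 with L = ceil(log2(D)).
inline uint64_t udiv_by_100000(uint64_t n) {
  constexpr uint64_t D = 100000;
  constexpr unsigned L = 17; // 2^16 < 100000 <= 2^17
  constexpr uint64_t Magic =
      static_cast<uint64_t>(
          (static_cast<unsigned __int128>((uint64_t{1} << L) - D) << 64) / D) +
      1;
  uint64_t t = mulhu64(Magic, n);
  // t + ((n - t) >> 1) cannot overflow because t <= n.
  return (t + ((n - t) >> 1)) >> (L - 1);
}
```

In this sketch every multiply-high has a constant operand, so any multiply-high whose other operand is also a constant becomes a candidate for the kind of folding this revision adds.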

llvm/test/CodeGen/X86/pmulh.ll

; AVX512-NEXT: retq
%c = mul <8 x i64> %a1, %b1		%c = mul <8 x i64> %a1, %b1
%d = ashr <8 x i64> %c, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>		%d = ashr <8 x i64> %c, <i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16, i64 16>
ret <8 x i64> %d		ret <8 x i64> %d
}		}

define <8 x i16> @sse2_pmulh_w_const(<8 x i16> %a0, <8 x i16> %a1) {		define <8 x i16> @sse2_pmulh_w_const(<8 x i16> %a0, <8 x i16> %a1) {
; SSE-LABEL: sse2_pmulh_w_const:		; SSE-LABEL: sse2_pmulh_w_const:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movdqa {{.*#+}} xmm0 = [65535,65534,65533,65532,65531,65530,65529,0]		; SSE-NEXT: movaps {{.*#+}} xmm0 = [0,65535,65535,65535,65535,65535,65535,0]
; SSE-NEXT: pmulhw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: sse2_pmulh_w_const:		; AVX-LABEL: sse2_pmulh_w_const:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vmovdqa {{.*#+}} xmm0 = [65535,65534,65533,65532,65531,65530,65529,0]		; AVX-NEXT: vmovaps {{.*#+}} xmm0 = [0,65535,65535,65535,65535,65535,65535,0]
; AVX-NEXT: vpmulhw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%res = call <8 x i16> @llvm.x86.sse2.pmulh.w(<8 x i16> <i16 -1, i16 -2, i16 -3, i16 -4, i16 -5, i16 -6, i16 -7, i16 0>, <8 x i16> <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7>)		%res = call <8 x i16> @llvm.x86.sse2.pmulh.w(<8 x i16> <i16 -1, i16 -2, i16 -3, i16 -4, i16 -5, i16 -6, i16 -7, i16 0>, <8 x i16> <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7>)
ret <8 x i16> %res		ret <8 x i16> %res
}		}
declare <8 x i16> @llvm.x86.sse2.pmulh.w(<8 x i16>, <8 x i16>)		declare <8 x i16> @llvm.x86.sse2.pmulh.w(<8 x i16>, <8 x i16>)

define <8 x i16> @sse2_pmulhu_w_const(<8 x i16> %a0, <8 x i16> %a1) {		define <8 x i16> @sse2_pmulhu_w_const(<8 x i16> %a0, <8 x i16> %a1) {
; SSE-LABEL: sse2_pmulhu_w_const:		; SSE-LABEL: sse2_pmulhu_w_const:
; SSE: # %bb.0:		; SSE: # %bb.0:
; SSE-NEXT: movdqa {{.*#+}} xmm0 = [65535,65534,65533,65532,65531,65530,65529,0]		; SSE-NEXT: movaps {{.*#+}} xmm0 = [0,0,1,2,3,4,5,0]
; SSE-NEXT: pmulhuw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
; SSE-NEXT: retq		; SSE-NEXT: retq
;		;
; AVX-LABEL: sse2_pmulhu_w_const:		; AVX-LABEL: sse2_pmulhu_w_const:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vmovdqa {{.*#+}} xmm0 = [65535,65534,65533,65532,65531,65530,65529,0]		; AVX-NEXT: vmovaps {{.*#+}} xmm0 = [0,0,1,2,3,4,5,0]
; AVX-NEXT: vpmulhuw {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
; AVX-NEXT: retq		; AVX-NEXT: retq
%res = call <8 x i16> @llvm.x86.sse2.pmulhu.w(<8 x i16> <i16 -1, i16 -2, i16 -3, i16 -4, i16 -5, i16 -6, i16 -7, i16 0>, <8 x i16> <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7>)		%res = call <8 x i16> @llvm.x86.sse2.pmulhu.w(<8 x i16> <i16 -1, i16 -2, i16 -3, i16 -4, i16 -5, i16 -6, i16 -7, i16 0>, <8 x i16> <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7>)
ret <8 x i16> %res		ret <8 x i16> %res
}		}
declare <8 x i16> @llvm.x86.sse2.pmulhu.w(<8 x i16>, <8 x i16>)		declare <8 x i16> @llvm.x86.sse2.pmulhu.w(<8 x i16>, <8 x i16>)
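
The folded constants in the updated pmulh.ll CHECK lines can be reproduced by hand. The following is a minimal standalone sketch, not LLVM code: `mulhs16` and `mulhu16` are hypothetical helper names, and fixed 16-bit lanes stand in for the APInt `sext`/`zext` followed by `extractBits` sequence that the patch adds to `FoldValue`.

```cpp
#include <cstdint>

// Signed multiply-high: sign-extend both operands to twice the width,
// multiply, and keep the upper 16 bits of the 32-bit product.
constexpr uint16_t mulhs16(int16_t a, int16_t b) {
  return static_cast<uint16_t>(
      static_cast<uint32_t>(static_cast<int32_t>(a) * static_cast<int32_t>(b)) >>
      16);
}

// Unsigned multiply-high: zero-extend instead of sign-extend.
constexpr uint16_t mulhu16(uint16_t a, uint16_t b) {
  return static_cast<uint16_t>(
      (static_cast<uint32_t>(a) * static_cast<uint32_t>(b)) >> 16);
}

// Lane 1 of the signed test: -2 * 1 = -2, whose upper half is all ones.
static_assert(mulhs16(-2, 1) == 65535, "signed high half of -2 * 1");
// Lane 2 of the unsigned test: 65533 * 2 = 131066, whose upper half is 1.
static_assert(mulhu16(65533, 2) == 1, "unsigned high half of 65533 * 2");
```

Applied lane by lane to the operands `<-1, -2, -3, -4, -5, -6, -7, 0>` and `<0, 1, 2, 3, 4, 5, 6, 7>`, these helpers give `[0,65535,65535,65535,65535,65535,65535,0]` for the signed case and `[0,0,1,2,3,4,5,0]` for the unsigned case, matching the constants loaded by `movaps`/`vmovaps` after the fold.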

This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Add support for mulhi const folding in DAGCombiner
ClosedPublic
