This pattern computes the full 64-bit product of a 32x32 unsigned
multiply. Done naively, this requires two pairs of SLLI+SRLI to zero
the upper 32 bits of each input.
We can do better by using two SLLIs to move the lower 32 bits of each
input into the upper 32 bits, then using MULHU to compute the product.
MULHU returns the high half of a full 64x64 product. Since we put 32
zeros in the lower bits of each input, the 128-bit product is guaranteed
to have zeros in its lower 64 bits. So the upper 64 bits, which MULHU
computes, contain exactly the original 64-bit product we were after.
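
As a sanity check, here is a minimal C sketch of the equivalence,
modeling MULHU with __uint128_t (assuming the compiler provides it);
the function and variable names are illustrative only:

  #include <stdint.h>
  #include <stdio.h>

  /* Model of MULHU: upper 64 bits of the unsigned 128-bit product. */
  static uint64_t mulhu(uint64_t a, uint64_t b) {
    return (uint64_t)(((__uint128_t)a * b) >> 64);
  }

  int main(void) {
    uint64_t x = 0x123456789abcdef0ULL, y = 0xfedcba9876543210ULL;

    /* Naive lowering: zero the upper 32 bits of each input (SLLI+SRLI
       per input), then one MUL of the zero-extended values. */
    uint64_t naive = (x & 0xffffffffULL) * (y & 0xffffffffULL);

    /* Improved lowering: one SLLI per input to move the low 32 bits into
       the high 32 bits, then MULHU. The 128-bit product has 64 zero bits
       at the bottom, so its high half is the desired 64-bit product. */
    uint64_t trick = mulhu(x << 32, y << 32);

    printf("%d\n", naive == trick); /* prints 1 */
    return 0;
  }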
The same trick would work for (mul (sext_inreg X, i32), (sext_inreg Y, i32))
using MULHS, but sext_inreg lowers to sext.w, which is already a single
instruction, so there would be nothing to save.
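
For completeness, a similar sketch for the signed analogue, modeling
MULHS with __int128 (again an assumption about compiler support, with
the usual two's-complement conversions assumed in the casts):

  #include <stdint.h>
  #include <stdio.h>

  /* Model of MULHS: upper 64 bits of the signed 128-bit product. */
  static int64_t mulhs(int64_t a, int64_t b) {
    return (int64_t)(((__int128)a * b) >> 64);
  }

  int main(void) {
    uint64_t x = 0xffffffffdeadbeefULL, y = 0x0000000012345678ULL;

    /* sext_inreg of the low 32 bits is what sext.w computes, so the
       straightforward lowering is one sext.w per input plus one MUL. */
    int64_t full = (int64_t)(int32_t)x * (int64_t)(int32_t)y;

    /* Shift-then-high-half form: SLLI each input by 32, then MULHS. */
    int64_t trick = mulhs((int64_t)(x << 32), (int64_t)(y << 32));

    printf("%d\n", full == trick); /* prints 1; same result, but no
                                      instruction-count savings here. */
    return 0;
  }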