
Details

Reviewers

asb
lenary
luismarques
shiva0217
kito-cheng
MaskRay

Commits

rGcb82de296017: [RISCV] Optimize multiplication by constant

Summary

... to shift/add or shift/sub.

Diff Detail

Event Timeline

benshi001 created this revision. Jun 26 2020, 7:46 AM

Herald added a project: Restricted Project. Jun 26 2020, 7:46 AM

Herald added subscribers: llvm-commits, evandro, apazos and 23 others.

This patch cannot cover all cases (notably "call __mulsi3" on rv32 without the M extension), but it works well for most cases.

Maybe a better solution is to make ISD::MUL Custom, which I will try later. I would appreciate it if you could review and accept this partial optimization.

benshi001 edited the summary of this revision. Jun 26 2020, 8:11 AM

benshi001 edited the summary of this revision. Jun 26 2020, 8:14 AM

benshi001 edited the summary of this revision. Jun 26 2020, 8:25 AM

Thanks for the patch!

This optimisation is done by DAGCombine if you instead implement decomposeMulByConstant in RISCVTargetLowering. Read the comment on the TargetLoweringBase class to understand how to use it. This is preferable, because we don't want to maintain a target-specific copy of this optimisation if we can avoid it.

It would be sensible to base your implementation on the one in the x86 backend: X86TargetLowering::decomposeMulByConstant, which deals with some phase ordering issues around legalisation.
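To make the suggestion concrete, here is a minimal standalone sketch of the yes/no decision such a hook has to make. It does not use LLVM's types or the real decomposeMulByConstant signature; isPowerOfTwo and shouldDecomposeMul are hypothetical names chosen for illustration, and the sketch only considers positive constants of the form 2^k + 1 or 2^k - 1.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): true when V is a positive power of two.
bool isPowerOfTwo(int64_t V) {
  return V > 0 && (V & (V - 1)) == 0;
}

// Hypothetical stand-in for the decision a decomposeMulByConstant-style hook
// makes: accept constants that are one above or one below a power of two.
bool shouldDecomposeMul(int64_t Imm) {
  // Constants below 3 are trivial (0, 1, 2) or negative and are assumed to be
  // handled elsewhere. INT64_MAX is skipped so that Imm + 1 cannot overflow.
  if (Imm < 3 || Imm == INT64_MAX)
    return false;
  return isPowerOfTwo(Imm + 1) || isPowerOfTwo(Imm - 1);
}
```

With this predicate, 5, 63 and 65 qualify, while 6 and 10 do not. Returning true only tells the generic combine that the target wants the shift/add or shift/sub form; in this sketch nothing is rewritten.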

benshi001 edited the summary of this revision. Jun 26 2020, 8:27 AM

benshi001 updated this revision to Diff 273889. Jun 27 2020, 1:49 AM

benshi001 edited the summary of this revision.

Thanks. I have uploaded a new version according to your suggestion!

In D82660#2117161, @lenary wrote:

Thanks for the patch!

This optimisation is done by DAGCombine if you instead implement decomposeMulByConstant in RISCVTargetLowering. Read the comment on the TargetLoweringBase class to understand how to use it. This is preferable, because we don't want to maintain a target-specific copy of this optimisation if we can avoid it.

It would be sensible to base your implementation on the one in the x86 backend: X86TargetLowering::decomposeMulByConstant, which deals with some phase ordering issues around legalisation.

This is looking good.

I'm going to pre-commit the test additions today - if you could rebase your changes on top, that will allow us to see how this change affects the new testcases you added. I'll keep you as the author and let you know the sha so you can rebase on top of the commit.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2989 ↗	(On Diff #273889)	This TODO should not apply to RISC-V, yet.

In D82660#2126537, @lenary wrote:

This is looking good.

I'm going to pre-commit the test additions today - if you could rebase your changes on top, that will allow us to see how this change affects the new testcases you added. I'll keep you as the author and let you know the sha so you can rebase on top of the commit.

Done in rG003a086ffc0.

lenary mentioned this in rG003a086ffc0d: [RISCV][NFC] Pre-commit tests for D82660. Jul 1 2020, 3:09 PM

benshi001 updated this revision to Diff 274997. Jul 1 2020, 8:24 PM

benshi001 edited the summary of this revision.

benshi001 marked 2 inline comments as done. Jul 1 2020, 8:31 PM

benshi001 added inline comments.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2989 ↗	(On Diff #273889)	Thanks. I have rebased and fixed according to what you suggested.

I'm happy with this optimisation where this patch removes multiply libcalls.

Where the target has the M extension, and especially for 64-bit multiplies on rv32im, I'm not sure this is an optimisation.

I think that, for the moment, we should add a guard to the hook to avoid this transformation where we do have mul instructions:

if (Subtarget.hasStdExtM())
  return false;

What do you think?

llvm/test/CodeGen/RISCV/mul.ll
305–379	I think this is a pessimisation, though I realise that depends on how slow the 32-bit multiplier is compared to add/shift.
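The concern about 64-bit multiplies on RV32 is easier to follow with a model of how the product is assembled from 32-bit halves. The sketch below is an illustration only: mulhu32 and mul64By65OnRV32 are hypothetical names, and the instruction comments mirror the RV32IM output for muli64_65 in this revision's test diff. Only the two low-half products become shift/add; the carry into the upper half still needs mulhu.

```cpp
#include <cstdint>

// Upper 32 bits of the unsigned 32x32 product, i.e. what mulhu yields on RV32.
uint32_t mulhu32(uint32_t A, uint32_t B) {
  return static_cast<uint32_t>((static_cast<uint64_t>(A) * B) >> 32);
}

// Hypothetical model of a 64-bit multiply by 65 done on 32-bit registers.
uint64_t mul64By65OnRV32(uint64_t A) {
  const uint32_t Lo = static_cast<uint32_t>(A);
  const uint32_t Hi = static_cast<uint32_t>(A >> 32);
  const uint32_t Carry = mulhu32(Lo, 65);  // addi a2, zero, 65 ; mulhu a2, a0, a2
  const uint32_t HiPart = (Hi << 6) + Hi;  // slli a3, a1, 6 ; add a1, a3, a1
  const uint32_t NewHi = Carry + HiPart;   // add a1, a2, a1
  const uint32_t NewLo = (Lo << 6) + Lo;   // slli a2, a0, 6 ; add a0, a2, a0
  return (static_cast<uint64_t>(NewHi) << 32) | NewLo;
}
```

In that listing the decomposed form uses seven instructions before ret, against five for the earlier mul/mulhu sequence shown for mul64_constant, which is why whether it wins depends on the multiplier's latency.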

benshi001 updated this revision to Diff 275136. Jul 2 2020, 8:36 AM

benshi001 marked an inline comment as done.

benshi001 updated this revision to Diff 275139. Jul 2 2020, 8:44 AM

In D82660#2127626, @lenary wrote:
I'm happy with this optimisation where this patch removes multiply libcalls.

Where the target has the M extension, and especially for 64-bit multiplies on rv32im, I'm not sure this is an optimisation.

I think that, for the moment, we should add a guard to the hook to avoid this transformation where we do have mul instructions:
if (Subtarget.hasStdExtM())
  return false;
What do you think?

Shall we loosen the guard condition to the following?

if (!Subtarget.is64Bit() && Subtarget.hasStdExtM())
  return false;

This will prevent the optimization for RV32IM, but it will still apply to RV64IM.

I think a mul instruction's latency is sure to be >= 2, so none of the existing test cases will regress.
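The difference between the two guards under discussion can be tabulated with a small standalone model. SubtargetModel and both function names are hypothetical; the real checks are methods on the RISC-V subtarget object, as in the snippets quoted in this thread.

```cpp
// Hypothetical model of the two subtarget properties discussed in the review.
struct SubtargetModel {
  bool Is64Bit;
  bool HasStdExtM;
};

const SubtargetModel RV32I{false, false};
const SubtargetModel RV32IM{false, true};
const SubtargetModel RV64I{true, false};
const SubtargetModel RV64IM{true, true};

// Strict guard: never decompose when a hardware multiplier is available.
bool allowDecomposeStrict(const SubtargetModel &ST) {
  return !ST.HasStdExtM;
}

// Loosened guard: only RV32 with the M extension keeps its mul instructions.
bool allowDecomposeLoosened(const SubtargetModel &ST) {
  return !(!ST.Is64Bit && ST.HasStdExtM);
}
```

Both guards allow the transformation on RV32I and RV64I, where it replaces a libcall, and both reject RV32IM. They differ only on RV64IM, which the loosened guard allows.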

LGTM.
I'm not overly concerned about the occasional code size increases from doing the optimization for RV32IM, so the loosening of the condition is OK IMO.
Everything else seems to be in order now.
Maybe wait a couple of days more for @lenary's OK.

This revision is now accepted and ready to land. Jul 6 2020, 10:10 AM

One issue, then I'm happy.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2990 ↗	(On Diff #275139)	getSExtValue will assert if the value does not fit into 64 bits - you need to do a check before you get there. I think this hook can be called before legalisation, so you may not get only legal types in this call.
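The ordering being asked for, checking the width first and extracting the value second, can be sketched without LLVM as follows. ConstantModel and tryGetImm64 are hypothetical; the sketch conservatively rejects any constant whose type is wider than 64 bits, which is one way to make sure the extraction step is never reached for such a value.

```cpp
#include <cstdint>

// Hypothetical model of a constant that may be wider than 64 bits, for
// example an i128 value seen before type legalisation.
struct ConstantModel {
  unsigned BitWidth;  // declared width of the constant's type
  uint64_t LowBits;   // low 64 bits of the value
};

// Returns false without touching Out when the type is too wide; otherwise
// stores the sign-extended value in Out and returns true.
bool tryGetImm64(const ConstantModel &C, int64_t &Out) {
  if (C.BitWidth == 0 || C.BitWidth > 64)
    return false;  // bail out before any 64-bit extraction is attempted
  uint64_t Value = C.LowBits;
  if (C.BitWidth < 64) {
    // Sign-extend from BitWidth bits to 64 bits.
    const uint64_t SignBit = uint64_t{1} << (C.BitWidth - 1);
    const uint64_t Mask = (SignBit << 1) - 1;
    Value = ((Value & Mask) ^ SignBit) - SignBit;
  }
  Out = static_cast<int64_t>(Value);
  return true;
}
```

A caller would treat a false result as "do not decompose" and return early, so wide constants seen before legalisation fall through to the default handling.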

benshi001 updated this revision to Diff 276297. Jul 7 2020, 6:27 PM

benshi001 marked an inline comment as done.

MaskRay accepted this revision. Jul 7 2020, 6:38 PM

MaskRay retitled this revision from [RISCV] Optimize multiplication by specific immediates to [RISCV] Optimize multiplication by constant.

MaskRay edited the summary of this revision.

Closed by commit rGcb82de296017: [RISCV] Optimize multiplication by constant (authored by benshi001, committed by MaskRay). Jul 7 2020, 6:50 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: jrtc27. Jul 7 2020, 6:50 PM

Diff 273727

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp

[68 lines not shown]
 void RISCVDAGToDAGISel::Select(SDNode *Node) {
   // Instruction Selection not handled by the auto-generated tablegen selection
   // should be handled here.
   unsigned Opcode = Node->getOpcode();
   MVT XLenVT = Subtarget->getXLenVT();
   SDLoc DL(Node);
   EVT VT = Node->getValueType(0);

   switch (Opcode) {
+  // Optimize the node (mul r, imm) to below pattern if possible.
+  //   (mul r, imm) -> (add (shl r, log2(imm-1)), r)
+  //   (mul r, imm) -> (sub (shl r, log2(imm+1)), r)
+  case ISD::MUL: {
+    // The second operand must be an immediate.
+    if (auto *ConstOp = dyn_cast<ConstantSDNode>(Node->getOperand(1))) {
+      uint32_t Shift = 0;
+      unsigned Opc;
+      if (!Subtarget->is64Bit()) { // Calculate Opc/Shift for riscv32.
+        uint32_t Imm = ConstOp->getZExtValue();
+        if (isPowerOf2_32(Imm + 1)) {
+          Opc = RISCV::SUB;
+          Shift = Log2_32(Imm + 1);
+        } else if (isPowerOf2_32(Imm - 1)) {
+          Opc = RISCV::ADD;
+          Shift = Log2_32(Imm - 1);
+        }
+      } else { // Calculate Opc/Shift for riscv64.
+        uint64_t Imm = ConstOp->getZExtValue();
+        if (isPowerOf2_64(Imm + 1)) {
+          Opc = RISCV::SUB;
+          Shift = Log2_64(Imm + 1);
+        } else if (isPowerOf2_64(Imm - 1)) {
+          Opc = RISCV::ADD;
+          Shift = Log2_64(Imm - 1);
+        }
+      }
+      // Build the optimized DAG if applicable.
+      if (Shift > 0) {
+        SDLoc DL(Node);
+        EVT VT = Node->getValueType(0);
+        SDValue ShiftOp = CurDAG->getTargetConstant(Shift, DL, VT);
+        auto *NodeShl = CurDAG->getMachineNode(RISCV::SLLI, DL, VT,
+                                               Node->getOperand(0), ShiftOp);
+        auto *NodeAddSub = CurDAG->getMachineNode(Opc, DL, VT,
+                                                  SDValue(NodeShl, 0),
+                                                  Node->getOperand(0));
+        ReplaceNode(Node, NodeAddSub);
+        return;
+      }
+    }
+    break;
+  }
   case ISD::Constant: {
     auto ConstNode = cast<ConstantSDNode>(Node);
     if (VT == XLenVT && ConstNode->isNullValue()) {
       SDValue New = CurDAG->getCopyFromReg(CurDAG->getEntryNode(), SDLoc(Node),
                                            RISCV::X0, XLenVT);
       ReplaceNode(Node, New.getNode());
       return;
     }
[174 lines not shown]
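The riscv32 branch of the selection logic in the RISCVISelDAGToDAG.cpp hunk above can be modelled in plain C++ to check the arithmetic. This is a sketch, not code from the patch: MulOp, ShiftPlan, planMul32 and mulByConst are hypothetical names, and the fallback to an ordinary multiply stands in for leaving the mul node untouched.

```cpp
#include <cstdint>

enum class MulOp { None, Add, Sub };

struct ShiftPlan {
  MulOp Opc;
  unsigned Shift;
};

static bool isPow2(uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

static unsigned log2Floor(uint32_t V) {
  unsigned N = 0;
  while (V >>= 1)
    ++N;
  return N;
}

// Mirrors the order of checks in the hunk: try imm + 1 first (sub), then
// imm - 1 (add). Unsigned wrap-around is well defined, so 0 and 0xFFFFFFFF
// are safe inputs.
ShiftPlan planMul32(uint32_t Imm) {
  ShiftPlan Plan{MulOp::None, 0};
  if (isPow2(Imm + 1))
    Plan = {MulOp::Sub, log2Floor(Imm + 1)};
  else if (isPow2(Imm - 1))
    Plan = {MulOp::Add, log2Floor(Imm - 1)};
  if (Plan.Shift == 0)  // same role as the "Shift > 0" test in the hunk
    Plan.Opc = MulOp::None;
  return Plan;
}

// Applies the plan, or falls back to a real multiply when none applies.
uint32_t mulByConst(uint32_t X, uint32_t Imm) {
  const ShiftPlan Plan = planMul32(Imm);
  if (Plan.Opc == MulOp::Add)
    return (X << Plan.Shift) + X;
  if (Plan.Opc == MulOp::Sub)
    return (X << Plan.Shift) - X;
  return X * Imm;
}
```

For example, planMul32(65) selects an add with a shift of 6 and planMul32(63) selects a sub with a shift of 6, matching the slli/add and slli/sub pairs in the muli32_65 and muli32_63 tests.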

llvm/test/CodeGen/RISCV/mul.ll

[83 lines not shown]
 ; RV32I-NEXT: addi a1, zero, 5
 ; RV32I-NEXT: call __mulsi3
 ; RV32I-NEXT: lw ra, 12(sp)
 ; RV32I-NEXT: addi sp, sp, 16
 ; RV32I-NEXT: ret
 ;
 ; RV32IM-LABEL: mul_constant:
 ; RV32IM: # %bb.0:
-; RV32IM-NEXT: addi a1, zero, 5
-; RV32IM-NEXT: mul a0, a0, a1
+; RV32IM-NEXT: slli a1, a0, 2
+; RV32IM-NEXT: add a0, a1, a0
 ; RV32IM-NEXT: ret
 ;
 ; RV64I-LABEL: mul_constant:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi sp, sp, -16
 ; RV64I-NEXT: sd ra, 8(sp)
 ; RV64I-NEXT: addi a1, zero, 5
 ; RV64I-NEXT: call __muldi3
[82 lines not shown]
 ; RV32I-NEXT: call __muldi3
 ; RV32I-NEXT: lw ra, 12(sp)
 ; RV32I-NEXT: addi sp, sp, 16
 ; RV32I-NEXT: ret
 ;
 ; RV32IM-LABEL: mul64_constant:
 ; RV32IM: # %bb.0:
 ; RV32IM-NEXT: addi a2, zero, 5
-; RV32IM-NEXT: mul a1, a1, a2
-; RV32IM-NEXT: mulhu a3, a0, a2
+; RV32IM-NEXT: mulhu a2, a0, a2
+; RV32IM-NEXT: slli a3, a1, 2
 ; RV32IM-NEXT: add a1, a3, a1
-; RV32IM-NEXT: mul a0, a0, a2
+; RV32IM-NEXT: add a1, a2, a1
+; RV32IM-NEXT: slli a2, a0, 2
+; RV32IM-NEXT: add a0, a2, a0
 ; RV32IM-NEXT: ret
 ;
 ; RV64I-LABEL: mul64_constant:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi sp, sp, -16
 ; RV64I-NEXT: sd ra, 8(sp)
 ; RV64I-NEXT: addi a1, zero, 5
 ; RV64I-NEXT: call __muldi3
 ; RV64I-NEXT: ld ra, 8(sp)
 ; RV64I-NEXT: addi sp, sp, 16
 ; RV64I-NEXT: ret
 ;
 ; RV64IM-LABEL: mul64_constant:
 ; RV64IM: # %bb.0:
-; RV64IM-NEXT: addi a1, zero, 5
-; RV64IM-NEXT: mul a0, a0, a1
+; RV64IM-NEXT: slli a1, a0, 2
+; RV64IM-NEXT: add a0, a1, a0
 ; RV64IM-NEXT: ret
   %1 = mul i64 %a, 5
   ret i64 %1
 }

 define i32 @mulhs(i32 %a, i32 %b) nounwind {
 ; RV32I-LABEL: mulhs:
 ; RV32I: # %bb.0:
[75 lines not shown]
 ; RV64IM-NEXT: srli a0, a0, 32
 ; RV64IM-NEXT: ret
   %1 = zext i32 %a to i64
   %2 = zext i32 %b to i64
   %3 = mul i64 %1, %2
   %4 = lshr i64 %3, 32
   %5 = trunc i64 %4 to i32
   ret i32 %5
 }
+
+define i32 @muli32_65(i32 %a) nounwind {
+; RV32IM-LABEL: muli32_65:
+; RV32IM: # %bb.0:
+; RV32IM-NEXT: slli a1, a0, 6
+; RV32IM-NEXT: add a0, a1, a0
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: muli32_65:
+; RV64IM: # %bb.0:
+; RV64IM-NEXT: slli a1, a0, 6
+; RV64IM-NEXT: add a0, a1, a0
+; RV64IM-NEXT: ret
+  %1 = mul i32 %a, 65
+  ret i32 %1
+}
+
+define i32 @muli32_63(i32 %a) nounwind {
+; RV32IM-LABEL: muli32_63:
+; RV32IM: # %bb.0:
+; RV32IM-NEXT: slli a1, a0, 6
+; RV32IM-NEXT: sub a0, a1, a0
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: muli32_63:
+; RV64IM: # %bb.0:
+; RV64IM-NEXT: slli a1, a0, 6
+; RV64IM-NEXT: sub a0, a1, a0
+; RV64IM-NEXT: ret
+  %1 = mul i32 %a, 63
+  ret i32 %1
+}
+
+define i64 @muli64_65(i64 %a) nounwind {
+; RV32IM-LABEL: muli64_65:
+; RV32IM: # %bb.0:
+; RV32IM-NEXT: addi a2, zero, 65
+; RV32IM-NEXT: mulhu a2, a0, a2
+; RV32IM-NEXT: slli a3, a1, 6
+; RV32IM-NEXT: add a1, a3, a1
+; RV32IM-NEXT: add a1, a2, a1
+; RV32IM-NEXT: slli a2, a0, 6
+; RV32IM-NEXT: add a0, a2, a0
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: muli64_65:
+; RV64IM: # %bb.0:
+; RV64IM-NEXT: slli a1, a0, 6
+; RV64IM-NEXT: add a0, a1, a0
+; RV64IM-NEXT: ret
+  %1 = mul i64 %a, 65
+  ret i64 %1
+}
+
+define i64 @muli64_63(i64 %a) nounwind {
+; RV32IM-LABEL: muli64_63:
+; RV32IM: # %bb.0:
+; RV32IM-NEXT: addi a2, zero, 63
+; RV32IM-NEXT: mulhu a2, a0, a2
+; RV32IM-NEXT: slli a3, a1, 6
+; RV32IM-NEXT: sub a1, a3, a1
+; RV32IM-NEXT: add a1, a2, a1
+; RV32IM-NEXT: slli a2, a0, 6
+; RV32IM-NEXT: sub a0, a2, a0
+; RV32IM-NEXT: ret
+;
+; RV64IM-LABEL: muli64_63:
+; RV64IM: # %bb.0:
+; RV64IM-NEXT: slli a1, a0, 6
+; RV64IM-NEXT: sub a0, a1, a0
+; RV64IM-NEXT: ret
+  %1 = mul i64 %a, 63
+  ret i64 %1
+}

lenary (inline comment): I think this is a pessimisation, though I realise that depends on how slow the 32-bit multiplier is compared to add/shift.

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Optimize multiplication by constant
Closed, Public
