
Details

Reviewers

craig.topper
asb
luismarques
LevyHsu

Commits

rG9e5c5afc7ee2: [RISCV] Optimize multiplication in the zba extension with SH*ADD

Summary

This patch makes the following optimizations.

(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits)
(mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits)
(mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
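
The three rewrites above can be checked with a small arithmetic model. The following is only a sketch: shadd follows the spec semantics quoted later in this review, while trailingZeros and mulViaShadd are hypothetical helper names introduced here for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Zba semantics: shNadd(rs1, rs2) = (rs1 << N) + rs2, for N in {1, 2, 3}.
inline uint64_t shadd(unsigned n, uint64_t rs1, uint64_t rs2) {
  return (rs1 << n) + rs2;
}

// Hypothetical helper: count trailing zero bits of a non-zero constant.
inline unsigned trailingZeros(uint64_t c) {
  unsigned n = 0;
  while ((c & 1) == 0) {
    c >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical helper: multiply x by c when c is 3, 5 or 9 times a power of
// two, using one shNadd followed by one left shift. Returns false otherwise.
inline bool mulViaShadd(uint64_t x, uint64_t c, uint64_t &out) {
  if (c == 0)
    return false;
  unsigned bits = trailingZeros(c);  // shift amount for the SLLI
  uint64_t odd = c >> bits;          // remaining odd factor: 3, 5 or 9
  unsigned n = odd == 3 ? 1 : odd == 5 ? 2 : odd == 9 ? 3 : 0;
  if (n == 0)
    return false;
  out = shadd(n, x, x) << bits;      // (SLLI (SHnADD x, x), bits)
  return true;
}
```

For example, 24 = 3 * 8, so x * 24 becomes a sh1add (giving 3x) followed by a shift left by 3.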

Diff Detail

Unit Tests: Failed

	Time	Test
	2,890 ms	x64 debian > libarcher.critical::critical.c
	2,590 ms	x64 debian > libarcher.critical::lock-nested.c
	2,870 ms	x64 debian > libarcher.parallel::parallel-simple.c
	2,790 ms	x64 debian > libarcher.races::critical-unrelated.c
	2,630 ms	x64 debian > libarcher.races::lock-nested-unrelated.c
		View Full Test Results (20 Failed)

Event Timeline

benshi001 created this revision. Jul 12 2021, 12:36 AM

Herald added subscribers: vkmr, frasercrmck, evandro and 23 others. Jul 12 2021, 12:37 AM

benshi001 requested review of this revision. Jul 12 2021, 12:37 AM

Herald added a project: Restricted Project. Jul 12 2021, 12:37 AM

Herald added subscribers: llvm-commits, MaskRay.

benshi001 edited the summary of this revision. Jul 12 2021, 12:38 AM

The new optimization rules must be placed before the following generic SH*ADD rules; otherwise the new optimization does not take effect.

def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2),
          (SH1ADD GPR:$rs1, GPR:$rs2)>;
def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2),
          (SH2ADD GPR:$rs1, GPR:$rs2)>;
def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2),
          (SH3ADD GPR:$rs1, GPR:$rs2)>;

Harbormaster completed remote builds in B113446: Diff 357848. Jul 12 2021, 1:20 AM

benshi001 updated this revision to Diff 357914. Jul 12 2021, 5:59 AM

benshi001 edited the summary of this revision.

Consider the previous Pats, such as

def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2),
          (SH2ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

The pattern (rs1*12 + rs2) no longer exists after the changes in decomposeMulByConstant(), so it has to be changed to

PatFrag<shadd ...>
def : Pat<(shadd GPR:$rs1, (XLenVT 1), (XLenVT 2), non_imm12:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

to preserve the previous optimization.

What's more, these patterns must be placed before the generic rs1*2+rs2->(shadd rs1, rs2) ones; otherwise they cannot be matched.

benshi001 updated this revision to Diff 357918. Jul 12 2021, 6:16 AM

benshi001 edited the summary of this revision.

jrtc27 added inline comments. Jul 12 2021, 6:22 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
972–977	To me, shadd means something like `A << B + C`, not `A << B + A << C + D`

Harbormaster completed remote builds in B113490: Diff 357918. Jul 12 2021, 7:00 AM

benshi001 updated this revision to Diff 357937. Jul 12 2021, 7:08 AM

benshi001 marked an inline comment as done.

benshi001 added inline comments.

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
972–977	Changed the PatFrag's name to addshl.

Harbormaster completed remote builds in B113501: Diff 357937. Jul 12 2021, 7:46 AM

LevyHsu added inline comments. Jul 12 2021, 7:52 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
972–977	If I understand it correctly, (shl node:$A, node:$B) matches (A<<B) and (shl node:$A, node:$C) matches (A<<C), which makes the pattern what Jessica said: (A<<B) + (A<<C) + D. But in the spec those instructions are:

uint_xlen_t sh1add(uint_xlen_t rs1, uint_xlen_t rs2) { return (rs1 << 1) + rs2; }
uint_xlen_t sh2add(uint_xlen_t rs1, uint_xlen_t rs2) { return (rs1 << 2) + rs2; }
uint_xlen_t sh3add(uint_xlen_t rs1, uint_xlen_t rs2) { return (rs1 << 3) + rs2; }

LevyHsu added inline comments. Jul 12 2021, 8:23 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
972–977	Never mind... missed the update

benshi001 marked 2 inline comments as done. Jul 12 2021, 6:34 PM

benshi001 updated this revision to Diff 358125. Jul 12 2021, 7:26 PM

benshi001 edited the summary of this revision.

Harbormaster completed remote builds in B113634: Diff 358125. Jul 12 2021, 8:04 PM

benshi001 updated this revision to Diff 358496. Jul 13 2021, 7:40 PM

benshi001 edited the summary of this revision.

Harbormaster completed remote builds in B113904: Diff 358496. Jul 13 2021, 8:19 PM

I did not put the optimization of (mul x, 3 * power_of_2) in decomposeMulByConstant(). The reason is that using a PatLeaf is simpler than checking the pattern (ADD (SLLI), (SLLI)).

benshi001 updated this revision to Diff 359522. Jul 16 2021, 8:09 PM

I changed the $rs2 in the Pat from non_imm12 to GPR as follows:

def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), GPR:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

Because for a*6 + 10, non_imm12:$rs2 will lead to

addi Rb, zero, 6
mul Ra, Rb, Ra
addi Ra, Ra, 6

while GPR:$rs2 will lead to

addi Rb, zero, 10
sh1add Ra, Ra, Ra
sh1add Ra, Ra, Rb

And I think the latter one is better, so I changed non_imm12:$rs2 to GPR:$rs2.

Harbormaster completed remote builds in B114656: Diff 359522. Jul 16 2021, 8:42 PM


I'm not sure I follow this. How can a change to the isel pattern cause a mul to be created? Wasn't the mul already decomposed?

benshi001 added a comment. Jul 16 2021, 9:31 PM

This comment was removed by benshi001.

In D105796#2885130, @craig.topper wrote:
I'm not sure I follow this. How can a change to the isel pattern cause a mul to be created? Wasn't the mul already decomposed?

You are right. For non_imm12:$rs2 in a*10+6, the following is generated

slli    a1, a0, 1
sh3add  a0, a0, a1
addi    a0, a0, 6

For GPR:$rs2, the assembly is

sh2add  a0, a0, a0
addi    a1, zero, 6
sh1add  a0, a0, a1

I think maybe the first one is better, so I will roll back my patch.
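
Both sequences above compute a*10+6 in three instructions; they differ only in how the work is split. The sketch below models each instruction as one C++ statement to confirm the results agree. The function names seqNonImm12 and seqGPR are labels made up for this illustration, not names from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Zba semantics: shNadd(rs1, rs2) = (rs1 << N) + rs2.
inline uint64_t shadd(unsigned n, uint64_t rs1, uint64_t rs2) {
  return (rs1 << n) + rs2;
}

// Sequence produced with non_imm12:$rs2 (the constant 6 stays in an addi).
inline uint64_t seqNonImm12(uint64_t a0) {
  uint64_t a1 = a0 << 1;     // slli   a1, a0, 1   -> 2a
  a0 = shadd(3, a0, a1);     // sh3add a0, a0, a1  -> 8a + 2a
  return a0 + 6;             // addi   a0, a0, 6
}

// Sequence produced with GPR:$rs2 (the constant 6 is materialized first).
inline uint64_t seqGPR(uint64_t a0) {
  a0 = shadd(2, a0, a0);     // sh2add a0, a0, a0  -> 5a
  uint64_t a1 = 6;           // addi   a1, zero, 6
  return shadd(1, a0, a1);   // sh1add a0, a0, a1  -> 10a + 6
}
```

The first form keeps the constant as an immediate operand of the final addi, while the second needs an extra register to hold it.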

benshi001 updated this revision to Diff 359527. Jul 16 2021, 9:40 PM

Harbormaster completed remote builds in B114661: Diff 359527. Jul 16 2021, 10:13 PM

benshi001 updated this revision to Diff 360351. Jul 20 2021, 8:42 PM

benshi001 edited the summary of this revision.

This comment was removed by benshi001.

I have split the previous patch into smaller ones, which should be easier to review. The current patch only contains

(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits)
(mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits)
(mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)

Harbormaster completed remote builds in B115242: Diff 360351. Jul 20 2021, 9:27 PM

craig.topper added inline comments. Jul 21 2021, 9:56 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
1001	Can we rename BSETINVTwoBitsMaskLow to TrailingZerosXForm?

Do we need to change BSETINVTwoBitsMaskHigh to LeadingZerosXForm?

In D105796#2895304, @benshi001 wrote:

Do we need to change BSETINVTwoBitsMaskHigh to LeadingZerosXForm?

Since it does 63-leadingzeros, I wouldn't name it LeadingZerosXForm. You can leave it as is until we find another use for it.

LGTM

This revision is now accepted and ready to land. Jul 21 2021, 7:09 PM

This revision was landed with ongoing or failed builds. Jul 21 2021, 7:29 PM

Closed by commit rG9e5c5afc7ee2: [RISCV] Optimize multiplication in the zba extension with SH*ADD (authored by benshi001).

This revision was automatically updated to reflect the committed changes.

benshi001 added a commit: rG9e5c5afc7ee2: [RISCV] Optimize multiplication in the zba extension with SH*ADD.

Harbormaster completed remote builds in B115474: Diff 360682. Jul 21 2021, 8:27 PM

Jimerlife mentioned this in D116917: [RISCV] Optimize some mul operation using SH*ADDUW instruction. Jan 9 2022, 11:19 PM

Diff 358125

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

   if (VT.isScalarInteger()) {
     if (Subtarget.hasStdExtM() && VT.getSizeInBits() > Subtarget.getXLen())
       return false;
     if (auto *ConstNode = dyn_cast<ConstantSDNode>(C.getNode())) {
       // Break the MUL to a SLLI and an ADD/SUB.
       const APInt &Imm = ConstNode->getAPIntValue();
       if ((Imm + 1).isPowerOf2() || (Imm - 1).isPowerOf2() ||
           (1 - Imm).isPowerOf2() || (-1 - Imm).isPowerOf2())
         return true;
+      // Optimize x*(PowerOf2+[2|4|8]) to (SH[1|2|3]ADD x, (SLLI x, bits)).
+      if (Subtarget.hasStdExtZba() &&
+          ((Imm - 2).isPowerOf2() || (Imm - 4).isPowerOf2() ||
+           (Imm - 8).isPowerOf2()))
+        return true;
       // Omit the following optimization if the sub target has the M extension
       // and the data size >= XLen.
       if (Subtarget.hasStdExtM() && VT.getSizeInBits() >= Subtarget.getXLen())
         return false;
       // Break the MUL to two SLLI instructions and an ADD/SUB, if Imm needs
       // a pair of LUI/ADDI.
       if (!Imm.isSignedIntN(12) && Imm.countTrailingZeros() < 12) {
         APInt ImmS = Imm.ashr(Imm.countTrailingZeros());
         if ((ImmS + 1).isPowerOf2() || (ImmS - 1).isPowerOf2() ||
             (1 - ImmS).isPowerOf2())
           return true;
       }
     }
   }

   return false;
 }

 bool RISCVTargetLowering::allowsMisalignedMemoryAccesses(
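
To see which constants the added Zba check accepts, here is a simplified, stand-alone model of the first two tests in the hunk above. It is a sketch under assumptions: it uses a plain 64-bit integer instead of APInt, treats the power-of-two test as "exactly one bit set in the unsigned bit pattern", and leaves out the later M-extension guard and the LUI/ADDI case. The names isPow2 and decomposesCheaply, and the hasZba flag standing in for Subtarget.hasStdExtZba(), are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when exactly one bit is set in the unsigned bit pattern.
inline bool isPow2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Simplified model of the early-return checks: the generic +/-1 cases,
// followed by the Zba-only +2/+4/+8 cases added by this diff.
inline bool decomposesCheaply(int64_t imm, bool hasZba) {
  uint64_t u = static_cast<uint64_t>(imm);  // unsigned wrap-around arithmetic
  // x * (2^k - 1), x * (2^k + 1), x * (1 - 2^k), x * (-1 - 2^k)
  if (isPow2(u + 1) || isPow2(u - 1) || isPow2(1 - u) ||
      isPow2(uint64_t{0} - 1 - u))
    return true;
  // x * (2^k + 2), x * (2^k + 4), x * (2^k + 8): one SLLI plus one SHnADD
  if (hasZba && (isPow2(u - 2) || isPow2(u - 4) || isPow2(u - 8)))
    return true;
  return false;
}
```

With this model, 258, 260 and 264 are accepted only when hasZba is true, which matches the constants exercised by the tests in this diff.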

llvm/lib/Target/RISCV/RISCVInstrInfoB.td

 def : Pat<(i64 (and GPR:$rs, 0xFFFF)), (ZEXTH_RV64 GPR:$rs)>;

 // Pattern to exclude simm12 immediates from matching.
 def non_imm12 : PatLeaf<(XLenVT GPR:$a), [{
   auto *C = dyn_cast<ConstantSDNode>(N);
   return !C || !isInt<12>(C->getSExtValue());
 }]>;

+def addshl : PatFrag<(ops node:$A, node:$B, node:$C, node:$D),
+                     (add (add (shl node:$A, node:$B), (shl node:$A, node:$C)),
+                          node:$D), [{
+  return N->getOperand(0)->getOperand(0)->hasOneUse() &&
+         N->getOperand(0)->getOperand(1)->hasOneUse();
+}]>;
+
 let Predicates = [HasStdExtZba] in {
-def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2),
-          (SH1ADD GPR:$rs1, GPR:$rs2)>;
-def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2),
-          (SH2ADD GPR:$rs1, GPR:$rs2)>;
-def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2),
-          (SH3ADD GPR:$rs1, GPR:$rs2)>;
-
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 6)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), non_imm12:$rs2),
           (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 10)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 3), non_imm12:$rs2),
           (SH1ADD (SH2ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 18)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 4), non_imm12:$rs2),
           (SH1ADD (SH3ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 2), (XLenVT 3), non_imm12:$rs2),
           (SH2ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 20)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 2), (XLenVT 4), non_imm12:$rs2),
           (SH2ADD (SH2ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 36)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 2), (XLenVT 5), non_imm12:$rs2),
           (SH2ADD (SH3ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 24)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 3), (XLenVT 4), non_imm12:$rs2),
           (SH3ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 40)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 3), (XLenVT 5), non_imm12:$rs2),
           (SH3ADD (SH2ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 72)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 3), (XLenVT 6), non_imm12:$rs2),
           (SH3ADD (SH3ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
+
+def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2),
+          (SH1ADD GPR:$rs1, GPR:$rs2)>;
+def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2),
+          (SH2ADD GPR:$rs1, GPR:$rs2)>;
+def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2),
+          (SH3ADD GPR:$rs1, GPR:$rs2)>;
 } // Predicates = [HasStdExtZba]

 let Predicates = [HasStdExtZba, IsRV64] in {
 def : Pat<(i64 (SLLIUWPat GPR:$rs1, uimm5:$shamt)),
           (SLLIUW GPR:$rs1, uimm5:$shamt)>;
 def : Pat<(i64 (shl (and GPR:$rs1, 0xFFFFFFFF), uimm5:$shamt)),
           (SLLIUW GPR:$rs1, uimm5:$shamt)>;
 def : Pat<(i64 (add (and GPR:$rs1, 0xFFFFFFFF), non_imm12:$rs2)),
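
Each addshl pattern in this diff pairs a shape (A << B) + (A << C) + D with a nested SH*ADD result, and in every listed pair B equals the outer shift while C equals the outer plus the inner shift. A small model can confirm that identity. This is only a sketch; addshl here is a plain C++ function mirroring the PatFrag's arithmetic (not its one-use predicate), and nestedShadd is a hypothetical name for the replacement.

```cpp
#include <cassert>
#include <cstdint>

// Zba semantics: shNadd(rs1, rs2) = (rs1 << N) + rs2.
inline uint64_t shadd(unsigned n, uint64_t rs1, uint64_t rs2) {
  return (rs1 << n) + rs2;
}

// Arithmetic shape matched by the addshl PatFrag: (A << B) + (A << C) + D.
inline uint64_t addshl(uint64_t a, unsigned b, unsigned c, uint64_t d) {
  return (a << b) + (a << c) + d;
}

// Replacement emitted by the patterns: SH<outer>ADD (SH<inner>ADD a, a), d.
inline uint64_t nestedShadd(unsigned outer, unsigned inner, uint64_t a,
                            uint64_t d) {
  return shadd(outer, shadd(inner, a, a), d);
}
```

For instance, shifts (1, 2) correspond to a*6 + d and map to SH1ADD (SH1ADD a, a), d, while shifts (3, 6) correspond to a*72 + d and map to SH3ADD (SH3ADD a, a), d.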

llvm/test/CodeGen/RISCV/rv32zba.ll

 ; RV32I-LABEL: mul258:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 258
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul258:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 258
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: slli a1, a0, 8
+; RV32IB-NEXT: sh1add a0, a0, a1
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul258:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 258
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: slli a1, a0, 8
+; RV32IBA-NEXT: sh1add a0, a0, a1
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 258
   ret i32 %c
 }

 define i32 @mul260(i32 %a) {
 ; RV32I-LABEL: mul260:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 260
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul260:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 260
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: slli a1, a0, 8
+; RV32IB-NEXT: sh2add a0, a0, a1
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul260:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 260
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: slli a1, a0, 8
+; RV32IBA-NEXT: sh2add a0, a0, a1
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 260
   ret i32 %c
 }

 define i32 @mul264(i32 %a) {
 ; RV32I-LABEL: mul264:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 264
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul264:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 264
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: slli a1, a0, 8
+; RV32IB-NEXT: sh3add a0, a0, a1
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul264:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 264
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: slli a1, a0, 8
+; RV32IBA-NEXT: sh3add a0, a0, a1
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 264
   ret i32 %c
 }

llvm/test/CodeGen/RISCV/rv64zba.ll

 ; RV64I-LABEL: mul258:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 258
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul258:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 258
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: slli a1, a0, 8
+; RV64IB-NEXT: sh1add a0, a0, a1
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul258:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 258
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: slli a1, a0, 8
+; RV64IBA-NEXT: sh1add a0, a0, a1
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 258
   ret i64 %c
 }

 define i64 @mul260(i64 %a) {
 ; RV64I-LABEL: mul260:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 260
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul260:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 260
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: slli a1, a0, 8
+; RV64IB-NEXT: sh2add a0, a0, a1
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul260:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 260
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: slli a1, a0, 8
+; RV64IBA-NEXT: sh2add a0, a0, a1
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 260
   ret i64 %c
 }

 define i64 @mul264(i64 %a) {
 ; RV64I-LABEL: mul264:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 264
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul264:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 264
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: slli a1, a0, 8
+; RV64IB-NEXT: sh3add a0, a0, a1
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul264:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 264
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: slli a1, a0, 8
+; RV64IBA-NEXT: sh3add a0, a0, a1
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 264
   ret i64 %c
 }
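
The expected sequences in these tests all have the same two-instruction form: an slli by 8 followed by a shNadd. As a rough check that this form really multiplies by 258, 260 and 264, the sketch below models it in C++. The helper name mulPow2PlusShadd is made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Zba semantics: shNadd(rs1, rs2) = (rs1 << N) + rs2.
inline uint64_t shadd(unsigned n, uint64_t rs1, uint64_t rs2) {
  return (rs1 << n) + rs2;
}

// Models "slli a1, a0, bits; shNadd a0, a0, a1", i.e. a0 * (2^bits + 2^n).
inline uint64_t mulPow2PlusShadd(uint64_t a0, unsigned n, unsigned bits) {
  uint64_t a1 = a0 << bits;  // slli   a1, a0, bits
  return shadd(n, a0, a1);   // shNadd a0, a0, a1
}
```

With bits = 8, n = 1 gives 256 + 2 = 258, n = 2 gives 260 and n = 3 gives 264.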

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Optimize multiplication in the zba extension with SH*ADD
Closed, Public
