
Details

Reviewers

craig.topper
asb
luismarques
LevyHsu

Commits

rG9e5c5afc7ee2: [RISCV] Optimize multiplication in the zba extension with SH*ADD

Summary

This patch makes the following optimizations:

(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits)
(mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits)
(mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
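
As a quick sanity check of the three rewrites above, the sketch below models SH1ADD/SH2ADD/SH3ADD and SLLI on 64-bit values and compares the result with a plain multiply. The helper names `shadd` and `mulViaShadd` are made up for illustration and are not part of the patch; the model assumes XLEN = 64 and wrap-around arithmetic.

```cpp
#include <cstdint>

// Model of the Zba shNadd instructions: (rs1 << n) + rs2, wrapping at 64 bits.
constexpr uint64_t shadd(unsigned n, uint64_t rs1, uint64_t rs2) {
  return (rs1 << n) + rs2;
}

// Hypothetical helper: computes x * ((2^n + 1) << bits) as
// (SLLI (SHnADD x, x), bits). n = 1, 2, 3 gives the factors 3, 5, 9.
constexpr uint64_t mulViaShadd(uint64_t x, unsigned n, unsigned bits) {
  return shadd(n, x, x) << bits;
}

// 96 = 3 << 5, 160 = 5 << 5, 288 = 9 << 5.
static_assert(mulViaShadd(7, 1, 5) == 7u * 96u, "sh1add + slli 5");
static_assert(mulViaShadd(7, 2, 5) == 7u * 160u, "sh2add + slli 5");
static_assert(mulViaShadd(7, 3, 5) == 7u * 288u, "sh3add + slli 5");
```

The identity also holds when the product overflows, because shifts, adds and multiplies all wrap modulo 2^64 in the same way.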

Diff Detail

Unit Tests: Failed

	Time	Test
	2,610 ms	x64 debian > libarcher.critical::critical.c
	2,680 ms	x64 debian > libarcher.critical::lock-nested.c
	2,900 ms	x64 debian > libarcher.races::critical-unrelated.c
	2,740 ms	x64 debian > libarcher.races::lock-nested-unrelated.c
	2,790 ms	x64 debian > libarcher.races::lock-unrelated.c
	Full test results: 17 failed

Event Timeline

benshi001 created this revision. Jul 12 2021, 12:36 AM

Herald added subscribers: vkmr, frasercrmck, evandro and 23 others. Jul 12 2021, 12:37 AM

benshi001 requested review of this revision. Jul 12 2021, 12:37 AM

Herald added a project: Restricted Project. Jul 12 2021, 12:37 AM

Herald added subscribers: llvm-commits, MaskRay.

benshi001 edited the summary of this revision. Jul 12 2021, 12:38 AM

The new optimization rules must be placed before the following generic SH*ADD rules; otherwise the new rules never take effect.

def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2),
          (SH1ADD GPR:$rs1, GPR:$rs2)>;
def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2),
          (SH2ADD GPR:$rs1, GPR:$rs2)>;
def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2),
          (SH3ADD GPR:$rs1, GPR:$rs2)>;

Harbormaster completed remote builds in B113446: Diff 357848. Jul 12 2021, 1:20 AM

benshi001 updated this revision to Diff 357914. Jul 12 2021, 5:59 AM

benshi001 edited the summary of this revision.

For the previous patterns, such as

def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2),
          (SH2ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

The pattern (rs1*12 + rs2) no longer exists after the changes in decomposeMulByConstant(), so it has to be changed to

PatFrag<shadd ...>
def : Pat<(shadd GPR:$rs1, (XLenVT 1), (XLenVT 2), non_imm12:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

to preserve the previous optimization.

Moreover, they must be placed before the generic rs1*2+rs2->(shadd rs1, rs2) patterns; otherwise they cannot be matched.
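
To illustrate why an add of two shifts can replace the old `mul_oneuse` patterns, the sketch below checks that (A << B) + (A << C) + D equals a nested pair of shNadd operations with outer shift B and inner shift C - B. This mirrors the nine (B, C) pairs in the diff, but the function names are hypothetical and the model assumes 64-bit wrap-around arithmetic.

```cpp
#include <cstdint>

// Model of the Zba shNadd instructions: (rs1 << n) + rs2.
constexpr uint64_t shadd(unsigned n, uint64_t rs1, uint64_t rs2) {
  return (rs1 << n) + rs2;
}

// The DAG shape left behind once the multiply has been decomposed:
// (A << B) + (A << C) + D.
constexpr uint64_t addOfTwoShifts(uint64_t a, unsigned b, unsigned c,
                                  uint64_t d) {
  return ((a << b) + (a << c)) + d;
}

// Hypothetical replacement: (SH<b>ADD (SH<c-b>ADD a, a), d).
// Valid for b in {1,2,3} and c - b in {1,2,3}.
constexpr uint64_t nestedShadd(uint64_t a, unsigned b, unsigned c,
                               uint64_t d) {
  return shadd(b, shadd(c - b, a, a), d);
}

// B=2, C=3 is the a*12 + d case: (SH2ADD (SH1ADD a, a), d).
static_assert(addOfTwoShifts(5, 2, 3, 7) == 5u * 12u + 7u, "12a + d");
static_assert(nestedShadd(5, 2, 3, 7) == 5u * 12u + 7u, "12a + d");
```

For example, B=1, C=2 gives 6a + d via two SH1ADDs, and B=3, C=6 gives 72a + d via two SH3ADDs.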

benshi001 updated this revision to Diff 357918. Jul 12 2021, 6:16 AM

benshi001 edited the summary of this revision.

jrtc27 added inline comments. Jul 12 2021, 6:22 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
987–992	To me, shadd means something like `A << B + C`, not `A << B + A << C + D`

Harbormaster completed remote builds in B113490: Diff 357918. Jul 12 2021, 7:00 AM

benshi001 updated this revision to Diff 357937. Jul 12 2021, 7:08 AM

benshi001 marked an inline comment as done.

benshi001 added inline comments.

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
987–992	Changed the PatFrag's name to addshl.

Harbormaster completed remote builds in B113501: Diff 357937. Jul 12 2021, 7:46 AM

LevyHsu added inline comments. Jul 12 2021, 7:52 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
987–992	If I understand it correctly,
(shl node:$A, node:$B) matches (A<<B)
(shl node:$A, node:$C) matches (A<<C)
which makes the pattern, as Jessica said: (A<<B) + (A<<C) + D
But in the spec those instructions are:

uint_xlen_t sh1add(uint_xlen_t rs1, uint_xlen_t rs2)
{
    return (rs1 << 1) + rs2;
}
uint_xlen_t sh2add(uint_xlen_t rs1, uint_xlen_t rs2)
{
    return (rs1 << 2) + rs2;
}
uint_xlen_t sh3add(uint_xlen_t rs1, uint_xlen_t rs2)
{
    return (rs1 << 3) + rs2;
}

LevyHsu added inline comments. Jul 12 2021, 8:23 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
987–992	Nevermind...missed the update

benshi001 marked 2 inline comments as done. Jul 12 2021, 6:34 PM

benshi001 updated this revision to Diff 358125. Jul 12 2021, 7:26 PM

benshi001 edited the summary of this revision.

Harbormaster completed remote builds in B113634: Diff 358125. Jul 12 2021, 8:04 PM

benshi001 updated this revision to Diff 358496. Jul 13 2021, 7:40 PM

benshi001 edited the summary of this revision.

Harbormaster completed remote builds in B113904: Diff 358496. Jul 13 2021, 8:19 PM

I did not put the optimization of (mul x, 3 * power_of_2) in decomposeMulByConstant(). The reason is that using a PatLeaf is simpler than checking for the pattern (ADD (SLLI), (SLLI)).

benshi001 updated this revision to Diff 359522. Jul 16 2021, 8:09 PM

I changed $rs2 in the Pat from non_imm12 to GPR, as follows:

def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), GPR:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

Because for a*6 + 10, non_imm12:$rs2 will lead to

addi Rb, zero, 6
mul Ra, Rb, Ra
addi Ra, Ra, 6

while GPR:$rs2 will lead to

addi Rb, zero, 10
sh1add Ra, Ra, Ra
sh1add Ra, Ra, Rb

And I think the latter one is better, so I changed non_imm12:$rs2 to GPR:$rs2.

Harbormaster completed remote builds in B114656: Diff 359522. Jul 16 2021, 8:42 PM

In D105796#2885104, @benshi001 wrote:
I changed $rs2 in the Pat from non_imm12 to GPR, as follows:
def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), GPR:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
Because for a*6 + 10, non_imm12:$rs2 will lead to
addi Rb, zero, 6
mul Ra, Rb, Ra
addi Ra, Ra, 6
while GPR:$rs2 will lead to
addi Rb, zero, 10
sh1add Ra, Ra, Ra
sh1add Ra, Ra, Rb
And I think the latter one is better, so I changed non_imm12:$rs2 to GPR:$rs2.

I'm not sure I follow this. How can a change to the isel pattern cause a mul to be created? Wasn't the mul already decomposed?

benshi001 added a comment. Jul 16 2021, 9:31 PM

This comment was removed by benshi001.

In D105796#2885130, @craig.topper wrote:
I'm not sure I follow this. How can a change to the isel pattern cause a mul to be created? Wasn't the mul already decomposed?

You are right. For non_imm12:$rs2 in a*10+6, the following is generated

slli    a1, a0, 1
sh3add  a0, a0, a1
addi    a0, a0, 6

For GPR:$rs2, the assembly is

sh2add  a0, a0, a0
addi    a1, zero, 6
sh1add  a0, a0, a1

I think the first one may be better, so I will roll back that change.
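
The two instruction sequences above can be checked against each other with a small model. The sketch below only confirms that both compute a*10 + 6; it says nothing about which is faster on a given core. The function names are invented for illustration, and `shadd` is a plain C++ model of the Zba instructions assuming XLEN = 64.

```cpp
#include <cstdint>

// Model of the Zba shNadd instructions: (rs1 << n) + rs2.
constexpr uint64_t shadd(unsigned n, uint64_t rs1, uint64_t rs2) {
  return (rs1 << n) + rs2;
}

// non_imm12:$rs2 variant:
//   slli   a1, a0, 1
//   sh3add a0, a0, a1
//   addi   a0, a0, 6
constexpr uint64_t seqNonImm12(uint64_t a0) {
  return shadd(3, a0, a0 << 1) + 6;
}

// GPR:$rs2 variant:
//   sh2add a0, a0, a0
//   addi   a1, zero, 6
//   sh1add a0, a0, a1
constexpr uint64_t seqGPR(uint64_t a0) {
  return shadd(1, shadd(2, a0, a0), 6);
}

static_assert(seqNonImm12(7) == 76, "7*10 + 6");
static_assert(seqGPR(7) == 76, "7*10 + 6");
```

Both sequences are three instructions long and use one temporary register; they differ in where the constant 6 is consumed (as an `addi` immediate on the result versus materialized into a register first).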

benshi001 updated this revision to Diff 359527. Jul 16 2021, 9:40 PM

Harbormaster completed remote builds in B114661: Diff 359527. Jul 16 2021, 10:13 PM

benshi001 updated this revision to Diff 360351. Jul 20 2021, 8:42 PM

benshi001 edited the summary of this revision. (Show Details)

This comment was removed by benshi001.

I have split the previous patch into smaller ones, which should be easier to review. The current patch contains only:

(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits)
(mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits)
(mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)

Harbormaster completed remote builds in B115242: Diff 360351. Jul 20 2021, 9:27 PM

craig.topper added inline comments. Jul 21 2021, 9:56 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
1016	Can we rename BSETINVTwoBitsMaskLow to TrailingZerosXForm?

Do we need to change BSETINVTwoBitsMaskHigh to LeadingZerosXForm?

In D105796#2895304, @benshi001 wrote:

Do we need to change BSETINVTwoBitsMaskHigh to LeadingZerosXForm?

Since it does 63-leadingzeros, I wouldn’t name it LeadingZerosXForm. You can leave it as is until we find another use for it.
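
The naming point can be made concrete with a small sketch. It assumes, based on the proposed name and on the `slli ..., 5` in the mul96 tests, that the "low" transform yields the number of trailing zero bits of the immediate, and takes "63-leadingzeros" from the comment above at face value. The helpers below are portable stand-ins written for illustration, not the LLVM implementations, and need C++14 or later for the `constexpr` loops.

```cpp
#include <cstdint>

// Number of trailing zero bits; returns 0 for v == 0 in this sketch.
constexpr unsigned trailingZeros(uint64_t v) {
  unsigned n = 0;
  while (v != 0 && (v & 1) == 0) {
    v >>= 1;
    ++n;
  }
  return n;
}

// 63 - countLeadingZeros(v) for non-zero 64-bit v, i.e. the index of the
// highest set bit. This is not a leading-zero count, hence the objection
// to calling it LeadingZerosXForm.
constexpr unsigned highestSetBit(uint64_t v) {
  unsigned n = 0;
  while (v >>= 1)
    ++n;
  return n;
}

// 96 = 0b1100000: five trailing zeros, highest set bit at index 6.
static_assert(trailingZeros(96) == 5, "shift amount for mul by 96");
static_assert(highestSetBit(96) == 6, "63 - 57 leading zeros");
```

For an immediate of the form 3 << k, 5 << k or 9 << k, the trailing-zero count is exactly k, which is the shift amount the SLLI needs.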

LGTM

This revision is now accepted and ready to land. Jul 21 2021, 7:09 PM

This revision was landed with ongoing or failed builds. Jul 21 2021, 7:29 PM

Closed by commit rG9e5c5afc7ee2: [RISCV] Optimize multiplication in the zba extension with SH*ADD (authored by benshi001).

This revision was automatically updated to reflect the committed changes.

benshi001 added a commit: rG9e5c5afc7ee2: [RISCV] Optimize multiplication in the zba extension with SH*ADD.

Harbormaster completed remote builds in B115474: Diff 360682. Jul 21 2021, 8:27 PM

Jimerlife mentioned this in D116917: [RISCV] Optimize some mul operation using SH*ADDUW instruction. Jan 9 2022, 11:19 PM

Diff 359527

llvm/lib/Target/RISCV/RISCVISelLowering.cpp


...
   if (VT.isScalarInteger()) {
     if (Subtarget.hasStdExtM() && VT.getSizeInBits() > Subtarget.getXLen())
       return false;
     if (auto *ConstNode = dyn_cast<ConstantSDNode>(C.getNode())) {
       // Break the MUL to a SLLI and an ADD/SUB.
       const APInt &Imm = ConstNode->getAPIntValue();
       if ((Imm + 1).isPowerOf2() || (Imm - 1).isPowerOf2() ||
           (1 - Imm).isPowerOf2() || (-1 - Imm).isPowerOf2())
         return true;
+      // Optimize x*(PowerOf2+[2|4|8]) to (SH[1|2|3]ADD x, (SLLI x, bits)).
+      if (Subtarget.hasStdExtZba() &&
+          ((Imm - 2).isPowerOf2() || (Imm - 4).isPowerOf2() ||
+           (Imm - 8).isPowerOf2()))
+        return true;
       // Omit the following optimization if the sub target has the M extension
       // and the data size >= XLen.
       if (Subtarget.hasStdExtM() && VT.getSizeInBits() >= Subtarget.getXLen())
         return false;
       // Break the MUL to two SLLI instructions and an ADD/SUB, if Imm needs
       // a pair of LUI/ADDI.
       if (!Imm.isSignedIntN(12) && Imm.countTrailingZeros() < 12) {
         APInt ImmS = Imm.ashr(Imm.countTrailingZeros());
         if ((ImmS + 1).isPowerOf2() || (ImmS - 1).isPowerOf2() ||
             (1 - ImmS).isPowerOf2())
           return true;
       }
     }
   }

   return false;
 }

 bool RISCVTargetLowering::allowsMisalignedMemoryAccesses(
...
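
The added Zba condition in the hunk above can be tried out on plain integers. The sketch below is an approximation that uses `uint64_t` instead of `APInt`; `isPowerOf2`, `isShaddPlusShift` and `mul258` are illustrative names, not functions from the patch.

```cpp
#include <cstdint>

constexpr bool isPowerOf2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Mirrors the added check: Imm is a power of two plus 2, 4 or 8.
// Small values wrap around on subtraction and are rejected by isPowerOf2.
constexpr bool isShaddPlusShift(uint64_t imm) {
  return isPowerOf2(imm - 2) || isPowerOf2(imm - 4) || isPowerOf2(imm - 8);
}

// x * 258 = x * (256 + 2), i.e. slli a1, a0, 8 ; sh1add a0, a0, a1.
constexpr uint64_t mul258(uint64_t x) { return (x << 1) + (x << 8); }

static_assert(isShaddPlusShift(258), "256 + 2");
static_assert(isShaddPlusShift(260), "256 + 4");
static_assert(isShaddPlusShift(264), "256 + 8");
static_assert(!isShaddPlusShift(96), "96 is 3 << 5, a different shape");
static_assert(mul258(3) == 774, "3 * 258");
```

Constants such as 96, 160 and 288 do not pass this check; in the diff they are handled by the separate `mul` patterns in RISCVInstrInfoB.td.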

llvm/lib/Target/RISCV/RISCVInstrInfoB.td

...
 def BCLRIANDIMask : PatLeaf<(imm), [{
   return Subtarget->is64Bit() ? isPowerOf2_64(~I) : isPowerOf2_32(~I);
 }]>;

 def BCLRIANDIMaskLow : SDNodeXForm<imm, [{
   return CurDAG->getTargetConstant((N->getZExtValue() & 0x7ff) | ~0x7ffull,
                                    SDLoc(N), N->getValueType(0));
 }]>;

+def C3LeftShift : PatLeaf<(imm), [{
+  uint64_t C = N->getZExtValue();
+  return C > 3 && ((C % 3) == 0) && isPowerOf2_64(C / 3);
+}]>;
+
+def C5LeftShift : PatLeaf<(imm), [{
+  uint64_t C = N->getZExtValue();
+  return C > 5 && ((C % 5) == 0) && isPowerOf2_64(C / 5);
+}]>;
+
+def C9LeftShift : PatLeaf<(imm), [{
+  uint64_t C = N->getZExtValue();
+  return C > 9 && ((C % 9) == 0) && isPowerOf2_64(C / 9);
+}]>;
+
 //===----------------------------------------------------------------------===//
 // Instruction class templates
 //===----------------------------------------------------------------------===//

 // Some of these templates should be moved to RISCVInstrFormats.td once the B
 // extension has been ratified.

 let hasSideEffects = 0, mayLoad = 0, mayStore = 0 in
...
 def : Pat<(i64 (and GPR:$rs, 0xFFFF)), (ZEXTH_RV64 GPR:$rs)>;

 // Pattern to exclude simm12 immediates from matching.
 def non_imm12 : PatLeaf<(XLenVT GPR:$a), [{
   auto *C = dyn_cast<ConstantSDNode>(N);
   return !C || !isInt<12>(C->getSExtValue());
 }]>;

+def addshl : PatFrag<(ops node:$A, node:$B, node:$C, node:$D),
+                     (add (add (shl node:$A, node:$B), (shl node:$A, node:$C)),
+                          node:$D), [{
+  return N->getOperand(0)->getOperand(0)->hasOneUse() &&
+         N->getOperand(0)->getOperand(1)->hasOneUse();
+}]>;
+
 let Predicates = [HasStdExtZba] in {
-def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2),
-          (SH1ADD GPR:$rs1, GPR:$rs2)>;
-def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2),
-          (SH2ADD GPR:$rs1, GPR:$rs2)>;
-def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2),
-          (SH3ADD GPR:$rs1, GPR:$rs2)>;
-
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 6)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), non_imm12:$rs2),
           (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 10)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 3), non_imm12:$rs2),
           (SH1ADD (SH2ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 18)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 4), non_imm12:$rs2),
           (SH1ADD (SH3ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 2), (XLenVT 3), non_imm12:$rs2),
           (SH2ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 20)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 2), (XLenVT 4), non_imm12:$rs2),
           (SH2ADD (SH2ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 36)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 2), (XLenVT 5), non_imm12:$rs2),
           (SH2ADD (SH3ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 24)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 3), (XLenVT 4), non_imm12:$rs2),
           (SH3ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 40)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 3), (XLenVT 5), non_imm12:$rs2),
           (SH3ADD (SH2ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
-def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 72)), GPR:$rs2),
+def : Pat<(addshl GPR:$rs1, (XLenVT 3), (XLenVT 6), non_imm12:$rs2),
           (SH3ADD (SH3ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
+
+def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2),
+          (SH1ADD GPR:$rs1, GPR:$rs2)>;
+def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2),
+          (SH2ADD GPR:$rs1, GPR:$rs2)>;
+def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2),
+          (SH3ADD GPR:$rs1, GPR:$rs2)>;
+
+def : Pat<(mul GPR:$r, C3LeftShift:$i),
+          (SLLI (SH1ADD GPR:$r, GPR:$r),
+                (BSETINVTwoBitsMaskLow C3LeftShift:$i))>;
+def : Pat<(mul GPR:$r, C5LeftShift:$i),
+          (SLLI (SH2ADD GPR:$r, GPR:$r),
+                (BSETINVTwoBitsMaskLow C5LeftShift:$i))>;
+def : Pat<(mul GPR:$r, C9LeftShift:$i),
+          (SLLI (SH3ADD GPR:$r, GPR:$r),
+                (BSETINVTwoBitsMaskLow C9LeftShift:$i))>;
 } // Predicates = [HasStdExtZba]

 let Predicates = [HasStdExtZba, IsRV64] in {
 def : Pat<(i64 (SLLIUWPat GPR:$rs1, uimm5:$shamt)),
           (SLLIUW GPR:$rs1, uimm5:$shamt)>;
 def : Pat<(i64 (shl (and GPR:$rs1, 0xFFFFFFFF), uimm5:$shamt)),
           (SLLIUW GPR:$rs1, uimm5:$shamt)>;
 def : Pat<(i64 (add (and GPR:$rs1, 0xFFFFFFFF), non_imm12:$rs2)),
...
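
The immediate predicates C3LeftShift, C5LeftShift and C9LeftShift in this file share one shape, which the sketch below parameterises over the odd factor. It is a standalone approximation for experimenting with constants: `isCNLeftShift` and `shiftAmount` are hypothetical names, and `isPowerOf2` stands in for LLVM's `isPowerOf2_64`. The loop in `shiftAmount` needs C++14 or later to be `constexpr`.

```cpp
#include <cstdint>

constexpr bool isPowerOf2(uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Same test as the three PatLeafs, with n = 3, 5 or 9:
// c must be n times a power of two, and strictly greater than n.
constexpr bool isCNLeftShift(uint64_t c, uint64_t n) {
  return c > n && (c % n) == 0 && isPowerOf2(c / n);
}

// Shift amount for the trailing SLLI, log2(c / n).
// Only meaningful when isCNLeftShift(c, n) holds.
constexpr unsigned shiftAmount(uint64_t c, uint64_t n) {
  unsigned bits = 0;
  for (uint64_t q = c / n; q > 1; q >>= 1)
    ++bits;
  return bits;
}

static_assert(isCNLeftShift(96, 3) && shiftAmount(96, 3) == 5, "3 << 5");
static_assert(isCNLeftShift(160, 5) && shiftAmount(160, 5) == 5, "5 << 5");
static_assert(isCNLeftShift(288, 9) && shiftAmount(288, 9) == 5, "9 << 5");
static_assert(!isCNLeftShift(3, 3), "C > 3 excludes the bare factor");
```

The `c > n` condition means a multiply by exactly 3, 5 or 9 is left to other rules; those constants already satisfy the `(Imm - 1).isPowerOf2()` test shown in the RISCVISelLowering.cpp hunk.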

llvm/test/CodeGen/RISCV/rv32zba.ll

...
 ; RV32I-LABEL: mul96:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 96
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul96:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 96
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: sh1add a0, a0, a0
+; RV32IB-NEXT: slli a0, a0, 5
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul96:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 96
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: sh1add a0, a0, a0
+; RV32IBA-NEXT: slli a0, a0, 5
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 96
   ret i32 %c
 }

 define i32 @mul160(i32 %a) {
 ; RV32I-LABEL: mul160:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 160
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul160:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 160
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: sh2add a0, a0, a0
+; RV32IB-NEXT: slli a0, a0, 5
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul160:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 160
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: sh2add a0, a0, a0
+; RV32IBA-NEXT: slli a0, a0, 5
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 160
   ret i32 %c
 }

 define i32 @mul288(i32 %a) {
 ; RV32I-LABEL: mul288:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 288
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul288:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 288
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: sh3add a0, a0, a0
+; RV32IB-NEXT: slli a0, a0, 5
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul288:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 288
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: sh3add a0, a0, a0
+; RV32IBA-NEXT: slli a0, a0, 5
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 288
   ret i32 %c
 }

 define i32 @mul258(i32 %a) {
 ; RV32I-LABEL: mul258:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 258
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul258:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 258
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: slli a1, a0, 8
+; RV32IB-NEXT: sh1add a0, a0, a1
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul258:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 258
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: slli a1, a0, 8
+; RV32IBA-NEXT: sh1add a0, a0, a1
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 258
   ret i32 %c
 }

 define i32 @mul260(i32 %a) {
 ; RV32I-LABEL: mul260:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 260
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul260:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 260
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: slli a1, a0, 8
+; RV32IB-NEXT: sh2add a0, a0, a1
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul260:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 260
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: slli a1, a0, 8
+; RV32IBA-NEXT: sh2add a0, a0, a1
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 260
   ret i32 %c
 }

 define i32 @mul264(i32 %a) {
 ; RV32I-LABEL: mul264:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 264
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul264:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 264
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: slli a1, a0, 8
+; RV32IB-NEXT: sh3add a0, a0, a1
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul264:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 264
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: slli a1, a0, 8
+; RV32IBA-NEXT: sh3add a0, a0, a1
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 264
   ret i32 %c
 }

llvm/test/CodeGen/RISCV/rv64zba.ll

...
 ; RV64I-LABEL: mul96:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 96
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul96:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 96
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: sh1add a0, a0, a0
+; RV64IB-NEXT: slli a0, a0, 5
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul96:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 96
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: sh1add a0, a0, a0
+; RV64IBA-NEXT: slli a0, a0, 5
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 96
   ret i64 %c
 }

 define i64 @mul160(i64 %a) {
 ; RV64I-LABEL: mul160:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 160
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul160:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 160
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: sh2add a0, a0, a0
+; RV64IB-NEXT: slli a0, a0, 5
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul160:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 160
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: sh2add a0, a0, a0
+; RV64IBA-NEXT: slli a0, a0, 5
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 160
   ret i64 %c
 }

 define i64 @mul288(i64 %a) {
 ; RV64I-LABEL: mul288:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 288
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul288:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 288
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: sh3add a0, a0, a0
+; RV64IB-NEXT: slli a0, a0, 5
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul288:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 288
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: sh3add a0, a0, a0
+; RV64IBA-NEXT: slli a0, a0, 5
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 288
   ret i64 %c
 }

 define i64 @sh1add_imm(i64 %0) {
 ; RV64I-LABEL: sh1add_imm:
 ; RV64I: # %bb.0:
...
 ; RV64I-LABEL: mul258:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 258
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul258:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 258
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: slli a1, a0, 8
+; RV64IB-NEXT: sh1add a0, a0, a1
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul258:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 258
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: slli a1, a0, 8
+; RV64IBA-NEXT: sh1add a0, a0, a1
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 258
   ret i64 %c
 }

 define i64 @mul260(i64 %a) {
 ; RV64I-LABEL: mul260:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 260
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul260:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 260
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: slli a1, a0, 8
+; RV64IB-NEXT: sh2add a0, a0, a1
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul260:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 260
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: slli a1, a0, 8
+; RV64IBA-NEXT: sh2add a0, a0, a1
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 260
   ret i64 %c
 }

 define i64 @mul264(i64 %a) {
 ; RV64I-LABEL: mul264:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 264
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul264:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 264
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: slli a1, a0, 8
+; RV64IB-NEXT: sh3add a0, a0, a1
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul264:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 264
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: slli a1, a0, 8
+; RV64IBA-NEXT: sh3add a0, a0, a1
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 264
   ret i64 %c
 }

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Optimize multiplication in the zba extension with SH*ADD
Closed · Public
