Details

Reviewers

craig.topper
asb
luismarques
LevyHsu

Commits

rG9e5c5afc7ee2: [RISCV] Optimize multiplication in the zba extension with SH*ADD

Summary

This patch makes the following optimizations:

(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits)
(mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits)
(mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
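
As a rough illustration of the three rules (a sketch, not the patch's code), the C++ below checks whether a constant has the form 3, 5 or 9 times a power of two and evaluates the rewritten form. The names ShAddMul, splitMulConstant and mulViaShAdd are hypothetical and do not exist in LLVM; the predicate mirrors the C3LeftShift, C5LeftShift and C9LeftShift PatLeafs in the diff, and the SLLI amount is the trailing-zero count, as in TrailingZerosXForm.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: how to rebuild x * C from SHNADD and SLLI.
struct ShAddMul {
  unsigned ShAmt;   // N in "shNadd x, x": 1, 2 or 3 (x * 3, x * 5, x * 9)
  unsigned SlliAmt; // shift amount of the trailing SLLI
};

// True if V is a power of two (V != 0).
static bool isPow2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Count trailing zero bits; V must be non-zero.
static unsigned trailingZeros(uint64_t V) {
  unsigned N = 0;
  while ((V & 1) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Returns true and fills Out if C == {3, 5, 9} * 2^k with k >= 1.
bool splitMulConstant(uint64_t C, ShAddMul &Out) {
  const uint64_t Odd[3] = {3, 5, 9};
  for (unsigned I = 0; I < 3; ++I) {
    if (C > Odd[I] && C % Odd[I] == 0 && isPow2(C / Odd[I])) {
      Out.ShAmt = I + 1;
      Out.SlliAmt = trailingZeros(C);
      return true;
    }
  }
  return false;
}

// Evaluates (SLLI (SHNADD x, x), bits) with wrapping 64-bit arithmetic.
uint64_t mulViaShAdd(uint64_t X, const ShAddMul &M) {
  uint64_t T = (X << M.ShAmt) + X; // shNadd x, x
  return T << M.SlliAmt;           // slli
}
```

For example, 96 = 3 * 2^5, so splitMulConstant(96, M) yields ShAmt 1 and SlliAmt 5, which corresponds to the sh1add followed by slli a0, a0, 5 in the updated mul96 test.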

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

benshi001 created this revision. Jul 12 2021, 12:36 AM

Herald added subscribers: vkmr, frasercrmck, evandro and 23 others. Jul 12 2021, 12:37 AM

benshi001 requested review of this revision. Jul 12 2021, 12:37 AM

Herald added a project: Restricted Project. Jul 12 2021, 12:37 AM

Herald added subscribers: llvm-commits, MaskRay.

benshi001 edited the summary of this revision. Jul 12 2021, 12:38 AM

The new optimization rules must be placed before the following generic SH*ADD rules; otherwise the new optimization does not work.

def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2),
          (SH1ADD GPR:$rs1, GPR:$rs2)>;
def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2),
          (SH2ADD GPR:$rs1, GPR:$rs2)>;
def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2),
          (SH3ADD GPR:$rs1, GPR:$rs2)>;

Harbormaster completed remote builds in B113446: Diff 357848. Jul 12 2021, 1:20 AM

benshi001 updated this revision to Diff 357914. Jul 12 2021, 5:59 AM

benshi001 edited the summary of this revision.

For previous Pats, such as

def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2),
          (SH2ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

The pattern (rs1*12 + rs2) no longer exists after the changes in decomposeMulByConstant(), so it has to be changed to

PatFrag<shadd ...>
def : Pat<(shadd GPR:$rs1, (XLenVT 1), (XLenVT 2), non_imm12:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

to preserve the previous optimization.

What's more, they must be placed before the generic rs1*2+rs2->(shadd rs1, rs2) patterns; otherwise they cannot be matched.
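
The arithmetic behind the rewritten pattern can be checked with a small C++ sketch. It assumes the Zba semantics shNadd(rs1, rs2) = (rs1 << N) + rs2 and models registers as wrapping 64-bit integers; the function names twoShiftForm and selectedForm are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Zba shift-and-add semantics: (rs1 << N) + rs2.
uint64_t sh1add(uint64_t Rs1, uint64_t Rs2) { return (Rs1 << 1) + Rs2; }
uint64_t sh2add(uint64_t Rs1, uint64_t Rs2) { return (Rs1 << 2) + Rs2; }
uint64_t sh3add(uint64_t Rs1, uint64_t Rs2) { return (Rs1 << 3) + Rs2; }

// Shape matched by the pattern with shift amounts 1 and 2:
// (rs1 << 1) + (rs1 << 2) + rs2, which equals rs1 * 6 + rs2.
uint64_t twoShiftForm(uint64_t Rs1, uint64_t Rs2) {
  return (Rs1 << 1) + (Rs1 << 2) + Rs2;
}

// Instructions produced by the pattern: SH1ADD (SH1ADD rs1, rs1), rs2.
uint64_t selectedForm(uint64_t Rs1, uint64_t Rs2) {
  return sh1add(sh1add(Rs1, Rs1), Rs2);
}
```

Both functions return rs1 * 6 + rs2 for any inputs, since sh1add(rs1, rs1) is rs1 * 3 and the outer sh1add doubles that before adding rs2.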

benshi001 updated this revision to Diff 357918. Jul 12 2021, 6:16 AM

benshi001 edited the summary of this revision.

jrtc27 added inline comments. Jul 12 2021, 6:22 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
993	To me, shadd means something like `A << B + C`, not `A << B + A << C + D`

Harbormaster completed remote builds in B113490: Diff 357918. Jul 12 2021, 7:00 AM

benshi001 updated this revision to Diff 357937. Jul 12 2021, 7:08 AM

benshi001 marked an inline comment as done.

benshi001 added inline comments.

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
993	Changed the PatFrag's name to addshl.

Harbormaster completed remote builds in B113501: Diff 357937. Jul 12 2021, 7:46 AM

LevyHsu added inline comments. Jul 12 2021, 7:52 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
993	If I understand it correctly,
(shl node:$A, node:$B) matches (A<<B)
(shl node:$A, node:$C) matches (A<<C)
which makes the pattern like Jessica said:
(A<<B) + (A<<C) + D
But on spec those patterns are:
uint_xlen_t sh1add(uint_xlen_t rs1, uint_xlen_t rs2) { return (rs1 << 1) + rs2; }
uint_xlen_t sh2add(uint_xlen_t rs1, uint_xlen_t rs2) { return (rs1 << 2) + rs2; }
uint_xlen_t sh3add(uint_xlen_t rs1, uint_xlen_t rs2) { return (rs1 << 3) + rs2; }

LevyHsu added inline comments. Jul 12 2021, 8:23 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
993	Never mind, I missed the update.

benshi001 marked 2 inline comments as done. Jul 12 2021, 6:34 PM

benshi001 updated this revision to Diff 358125. Jul 12 2021, 7:26 PM

benshi001 edited the summary of this revision.

Harbormaster completed remote builds in B113634: Diff 358125. Jul 12 2021, 8:04 PM

benshi001 updated this revision to Diff 358496. Jul 13 2021, 7:40 PM

benshi001 edited the summary of this revision.

Harbormaster completed remote builds in B113904: Diff 358496. Jul 13 2021, 8:19 PM

I did not put the optimization of (mul x, 3 * power_of_2) in decomposeMulByConstant(), because using a PatLeaf is simpler than checking for the pattern (ADD (SLLI), (SLLI)).
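
To make the two shapes concrete, here is a minimal C++ sketch under the assumption that a multiply by 3 * 2^k, once decomposed, would appear as an add of two shifts. The function names are hypothetical, and registers are modeled as wrapping 64-bit integers.

```cpp
#include <cassert>
#include <cstdint>

// Decomposed shape: (ADD (SLLI x, k + 1), (SLLI x, k)), since
// 3 * 2^k == 2^(k + 1) + 2^k. Assumes k + 1 < 64.
uint64_t mul3Pow2AsAddOfShifts(uint64_t X, unsigned K) {
  return (X << (K + 1)) + (X << K);
}

// Shape selected by this patch: (SLLI (SH1ADD x, x), k).
uint64_t mul3Pow2AsShAddSlli(uint64_t X, unsigned K) {
  uint64_t T = (X << 1) + X; // sh1add x, x
  return T << K;             // slli
}
```

Matching the constant on the mul node only needs a predicate on the immediate, whereas recognizing the first shape would mean relating two shift amounts and a shared operand.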

benshi001 updated this revision to Diff 359522. Jul 16 2021, 8:09 PM

I changed $rs2 in the Pat from non_imm12 to GPR, as follows:

def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), GPR:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

Because for a*6 + 10, non_imm12:$rs2 will lead to

addi Rb, zero, 6
mul Ra, Rb, Ra
addi Ra, Ra, 6

while GPR:$rs2 will lead to

addi Rb, zero, 10
sh1add Ra, Ra, Ra
sh1add Ra, Ra, Rb

And I think the latter one is better, so I changed non_imm12:$rs2 to GPR:$rs2.

Harbormaster completed remote builds in B114656: Diff 359522. Jul 16 2021, 8:42 PM

In D105796#2885104, @benshi001 wrote:
I change the $rs2 in the Pat from non_imm12 to GPR as following
def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), GPR:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
Because for a*6 + 10, non_imm12:$rs2 will lead to
addi Rb, zero, 6
mul Ra, Rb, Ra
addi Ra, Ra, 6
while GPR:$rs2 will lead to
addi Rb, zero, 10
sh1add Ra, Ra, Ra
sh1add Ra, Ra, Rb
And I think the later one is better, so I changed non_imm12:$rs2 to GPR:$rs2.

I'm not sure I follow this. How can a change to the isel pattern cause a mul to be created? Wasn't the mul already decomposed?

benshi001 added a comment. Jul 16 2021, 9:31 PM

This comment was removed by benshi001.

In D105796#2885130, @craig.topper wrote:
I'm not sure I follow this. How can a change to the isel pattern cause a mul to be created? Wasn't the mul already decomposed?

You are right. For non_imm12:$rs2 in a*10+6, the following is generated

slli    a1, a0, 1
sh3add  a0, a0, a1
addi    a0, a0, 6

For GPR:$rs2, the assembly is

sh2add  a0, a0, a0
addi    a1, zero, 6
sh1add  a0, a0, a1

I think maybe the first one is better, so I will roll back my patch.
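
Both sequences quoted above compute a*10 + 6 in three instructions. The following C++ sketch emulates each one step by step, assuming the Zba semantics shNadd(rs1, rs2) = (rs1 << N) + rs2 and modeling registers as wrapping 64-bit integers; the function names are invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Sequence generated with non_imm12:$rs2.
uint64_t seqNonImm12(uint64_t A0) {
  uint64_t A1 = A0 << 1; // slli   a1, a0, 1   -> a * 2
  A0 = (A0 << 3) + A1;   // sh3add a0, a0, a1  -> a * 8 + a * 2
  return A0 + 6;         // addi   a0, a0, 6
}

// Sequence generated with GPR:$rs2.
uint64_t seqGPR(uint64_t A0) {
  A0 = (A0 << 2) + A0;   // sh2add a0, a0, a0  -> a * 5
  uint64_t A1 = 6;       // addi   a1, zero, 6
  return (A0 << 1) + A1; // sh1add a0, a0, a1  -> a * 10 + 6
}
```

The two functions agree for every input, so the choice between the patterns affects only which instructions are emitted, not the result.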

benshi001 updated this revision to Diff 359527. Jul 16 2021, 9:40 PM

Harbormaster completed remote builds in B114661: Diff 359527. Jul 16 2021, 10:13 PM

benshi001 updated this revision to Diff 360351. Jul 20 2021, 8:42 PM

benshi001 edited the summary of this revision.

This comment was removed by benshi001.

I have split the previous patch into smaller ones, which should be easier to review. The current patch contains only

(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits)
(mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits)
(mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)

Harbormaster completed remote builds in B115242: Diff 360351. Jul 20 2021, 9:27 PM

craig.topper added inline comments. Jul 21 2021, 9:56 AM

llvm/lib/Target/RISCV/RISCVInstrInfoB.td
1016	Can we rename BSETINVTwoBitsMaskLow to TrailingZerosXForm?

Do we need to change BSETINVTwoBitsMaskHigh to LeadingZerosXForm?

In D105796#2895304, @benshi001 wrote:

Do we need to change BSETINVTwoBitsMaskHigh to LeadingZerosXForm?

Since it computes 63 - leadingzeros, I wouldn't name it LeadingZerosXForm. You can leave it as is until we find another use for it.
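
The difference between the two transforms can be seen in a short C++ sketch. It uses hand-written bit counters rather than LLVM's countTrailingZeros and countLeadingZeros, and the names lowBitIndex and highBitIndex are hypothetical stand-ins for what TrailingZerosXForm and BSETINVTwoBitsMaskHigh compute on a 64-bit immediate.

```cpp
#include <cassert>
#include <cstdint>

// Count trailing zero bits; V must be non-zero.
unsigned trailingZeros(uint64_t V) {
  unsigned N = 0;
  while ((V & 1) == 0) {
    V >>= 1;
    ++N;
  }
  return N;
}

// Count leading zero bits of a 64-bit value; V must be non-zero.
unsigned leadingZeros(uint64_t V) {
  unsigned N = 0;
  for (uint64_t Bit = 1ull << 63; (V & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// Index of the lowest set bit: this is a plain trailing-zero count.
unsigned lowBitIndex(uint64_t Mask) { return trailingZeros(Mask); }

// Index of the highest set bit: 63 minus the leading-zero count, so the
// result is a bit position and not a count of leading zeros.
unsigned highBitIndex(uint64_t Mask) { return 63 - leadingZeros(Mask); }
```

For a two-bit mask with bits 3 and 40 set, lowBitIndex gives 3 and highBitIndex gives 40, while the raw leading-zero count is 23.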

LGTM

This revision is now accepted and ready to land. Jul 21 2021, 7:09 PM

This revision was landed with ongoing or failed builds. Jul 21 2021, 7:29 PM

Closed by commit rG9e5c5afc7ee2: [RISCV] Optimize multiplication in the zba extension with SH*ADD (authored by benshi001).

This revision was automatically updated to reflect the committed changes.

benshi001 added a commit: rG9e5c5afc7ee2: [RISCV] Optimize multiplication in the zba extension with SH*ADD.

Harbormaster completed remote builds in B115474: Diff 360682. Jul 21 2021, 8:27 PM

Jimerlife mentioned this in D116917: [RISCV] Optimize some mul operation using SH*ADDUW instruction. Jan 9 2022, 11:19 PM

Diff 360686

llvm/lib/Target/RISCV/RISCVInstrInfoB.td

@@ 93 lines not shown @@ if (!N->hasOneUse())
     return false;
   // The immediate should not be a simm12.
   if (isInt<12>(N->getSExtValue()))
     return false;
   // The immediate must have exactly two bits set.
   return countPopulation(N->getZExtValue()) == 2;
 }]>;

-def BSETINVTwoBitsMaskLow : SDNodeXForm<imm, [{
+def TrailingZerosXForm : SDNodeXForm<imm, [{
   uint64_t I = N->getZExtValue();
   return CurDAG->getTargetConstant(countTrailingZeros(I), SDLoc(N),
                                    N->getValueType(0));
 }]>;

 def BSETINVTwoBitsMaskHigh : SDNodeXForm<imm, [{
   uint64_t I = N->getZExtValue();
   return CurDAG->getTargetConstant(63 - countLeadingZeros(I), SDLoc(N),
@@ 55 lines not shown @@ def BCLRIANDIMask : PatLeaf<(imm), [{
   return Subtarget->is64Bit() ? isPowerOf2_64(~I) : isPowerOf2_32(~I);
 }]>;

 def BCLRIANDIMaskLow : SDNodeXForm<imm, [{
   return CurDAG->getTargetConstant((N->getZExtValue() & 0x7ff) | ~0x7ffull,
                                    SDLoc(N), N->getValueType(0));
 }]>;

+def C3LeftShift : PatLeaf<(imm), [{
+  uint64_t C = N->getZExtValue();
+  return C > 3 && ((C % 3) == 0) && isPowerOf2_64(C / 3);
+}]>;
+
+def C5LeftShift : PatLeaf<(imm), [{
+  uint64_t C = N->getZExtValue();
+  return C > 5 && ((C % 5) == 0) && isPowerOf2_64(C / 5);
+}]>;
+
+def C9LeftShift : PatLeaf<(imm), [{
+  uint64_t C = N->getZExtValue();
+  return C > 9 && ((C % 9) == 0) && isPowerOf2_64(C / 9);
+}]>;
+
 //===----------------------------------------------------------------------===//
 // Instruction class templates
 //===----------------------------------------------------------------------===//

 // Some of these templates should be moved to RISCVInstrFormats.td once the B
 // extension has been ratified.

 let hasSideEffects = 0, mayLoad = 0, mayStore = 0 in
@@ 622 lines not shown @@ def : Pat<(or GPR:$rs1, BSETINVMask:$mask),
           (BSETI GPR:$rs1, BSETINVMask:$mask)>;
 def : Pat<(xor GPR:$rs1, BSETINVMask:$mask),
           (BINVI GPR:$rs1, BSETINVMask:$mask)>;

 def : Pat<(and (srl GPR:$rs1, uimmlog2xlen:$shamt), (XLenVT 1)),
           (BEXTI GPR:$rs1, uimmlog2xlen:$shamt)>;

 def : Pat<(or GPR:$r, BSETINVTwoBitsMask:$i),
-          (BSETI (BSETI GPR:$r, (BSETINVTwoBitsMaskLow BSETINVTwoBitsMask:$i)),
+          (BSETI (BSETI GPR:$r, (TrailingZerosXForm BSETINVTwoBitsMask:$i)),
                  (BSETINVTwoBitsMaskHigh BSETINVTwoBitsMask:$i))>;
 def : Pat<(xor GPR:$r, BSETINVTwoBitsMask:$i),
-          (BINVI (BINVI GPR:$r, (BSETINVTwoBitsMaskLow BSETINVTwoBitsMask:$i)),
+          (BINVI (BINVI GPR:$r, (TrailingZerosXForm BSETINVTwoBitsMask:$i)),
                  (BSETINVTwoBitsMaskHigh BSETINVTwoBitsMask:$i))>;
 def : Pat<(or GPR:$r, BSETINVORIMask:$i),
           (BSETI (ORI GPR:$r, (BSETINVORIMaskLow BSETINVORIMask:$i)),
                  (BSETINVTwoBitsMaskHigh BSETINVORIMask:$i))>;
 def : Pat<(xor GPR:$r, BSETINVORIMask:$i),
           (BINVI (XORI GPR:$r, (BSETINVORIMaskLow BSETINVORIMask:$i)),
                  (BSETINVTwoBitsMaskHigh BSETINVORIMask:$i))>;
 def : Pat<(and GPR:$r, BCLRITwoBitsMask:$i),
@@ 146 lines not shown @@
 }]>;

 let Predicates = [HasStdExtZba] in {
 def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2),
           (SH1ADD GPR:$rs1, GPR:$rs2)>;
 def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2),
           (SH2ADD GPR:$rs1, GPR:$rs2)>;
 def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2),
           (SH3ADD GPR:$rs1, GPR:$rs2)>;

 def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 6)), GPR:$rs2),
           (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
 def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 10)), GPR:$rs2),
           (SH1ADD (SH2ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
 def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 18)), GPR:$rs2),
           (SH1ADD (SH3ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
 def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2),
           (SH2ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
 def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 20)), GPR:$rs2),
           (SH2ADD (SH2ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
 def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 36)), GPR:$rs2),
           (SH2ADD (SH3ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
 def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 24)), GPR:$rs2),
           (SH3ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
 def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 40)), GPR:$rs2),
           (SH3ADD (SH2ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
 def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 72)), GPR:$rs2),
           (SH3ADD (SH3ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;
+
+def : Pat<(mul GPR:$r, C3LeftShift:$i),
+          (SLLI (SH1ADD GPR:$r, GPR:$r),
+                (TrailingZerosXForm C3LeftShift:$i))>;
+def : Pat<(mul GPR:$r, C5LeftShift:$i),
+          (SLLI (SH2ADD GPR:$r, GPR:$r),
+                (TrailingZerosXForm C5LeftShift:$i))>;
+def : Pat<(mul GPR:$r, C9LeftShift:$i),
+          (SLLI (SH3ADD GPR:$r, GPR:$r),
+                (TrailingZerosXForm C9LeftShift:$i))>;
 } // Predicates = [HasStdExtZba]

 let Predicates = [HasStdExtZba, IsRV64] in {
 def : Pat<(i64 (shl (and GPR:$rs1, 0xFFFFFFFF), uimm5:$shamt)),
           (SLLIUW GPR:$rs1, uimm5:$shamt)>;
 def : Pat<(i64 (add (and GPR:$rs1, 0xFFFFFFFF), non_imm12:$rs2)),
           (ADDUW GPR:$rs1, GPR:$rs2)>;
 def : Pat<(i64 (and GPR:$rs, 0xFFFFFFFF)), (ADDUW GPR:$rs, X0)>;
@@ 91 lines not shown @@

llvm/test/CodeGen/RISCV/rv32zba.ll

@@ 300 lines not shown @@
 ; RV32I-LABEL: mul96:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 96
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul96:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 96
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: sh1add a0, a0, a0
+; RV32IB-NEXT: slli a0, a0, 5
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul96:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 96
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: sh1add a0, a0, a0
+; RV32IBA-NEXT: slli a0, a0, 5
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 96
   ret i32 %c
 }

 define i32 @mul160(i32 %a) {
 ; RV32I-LABEL: mul160:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 160
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul160:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 160
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: sh2add a0, a0, a0
+; RV32IB-NEXT: slli a0, a0, 5
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul160:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 160
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: sh2add a0, a0, a0
+; RV32IBA-NEXT: slli a0, a0, 5
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 160
   ret i32 %c
 }

 define i32 @mul288(i32 %a) {
 ; RV32I-LABEL: mul288:
 ; RV32I: # %bb.0:
 ; RV32I-NEXT: addi a1, zero, 288
 ; RV32I-NEXT: mul a0, a0, a1
 ; RV32I-NEXT: ret
 ;
 ; RV32IB-LABEL: mul288:
 ; RV32IB: # %bb.0:
-; RV32IB-NEXT: addi a1, zero, 288
-; RV32IB-NEXT: mul a0, a0, a1
+; RV32IB-NEXT: sh3add a0, a0, a0
+; RV32IB-NEXT: slli a0, a0, 5
 ; RV32IB-NEXT: ret
 ;
 ; RV32IBA-LABEL: mul288:
 ; RV32IBA: # %bb.0:
-; RV32IBA-NEXT: addi a1, zero, 288
-; RV32IBA-NEXT: mul a0, a0, a1
+; RV32IBA-NEXT: sh3add a0, a0, a0
+; RV32IBA-NEXT: slli a0, a0, 5
 ; RV32IBA-NEXT: ret
   %c = mul i32 %a, 288
   ret i32 %c
 }

 define i32 @mul258(i32 %a) {
 ; RV32I-LABEL: mul258:
 ; RV32I: # %bb.0:
@@ 238 lines not shown @@

llvm/test/CodeGen/RISCV/rv64zba.ll

@@ 590 lines not shown @@
 ; RV64I-LABEL: mul96:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 96
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul96:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 96
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: sh1add a0, a0, a0
+; RV64IB-NEXT: slli a0, a0, 5
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul96:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 96
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: sh1add a0, a0, a0
+; RV64IBA-NEXT: slli a0, a0, 5
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 96
   ret i64 %c
 }

 define i64 @mul160(i64 %a) {
 ; RV64I-LABEL: mul160:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 160
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul160:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 160
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: sh2add a0, a0, a0
+; RV64IB-NEXT: slli a0, a0, 5
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul160:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 160
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: sh2add a0, a0, a0
+; RV64IBA-NEXT: slli a0, a0, 5
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 160
   ret i64 %c
 }

 define i64 @mul288(i64 %a) {
 ; RV64I-LABEL: mul288:
 ; RV64I: # %bb.0:
 ; RV64I-NEXT: addi a1, zero, 288
 ; RV64I-NEXT: mul a0, a0, a1
 ; RV64I-NEXT: ret
 ;
 ; RV64IB-LABEL: mul288:
 ; RV64IB: # %bb.0:
-; RV64IB-NEXT: addi a1, zero, 288
-; RV64IB-NEXT: mul a0, a0, a1
+; RV64IB-NEXT: sh3add a0, a0, a0
+; RV64IB-NEXT: slli a0, a0, 5
 ; RV64IB-NEXT: ret
 ;
 ; RV64IBA-LABEL: mul288:
 ; RV64IBA: # %bb.0:
-; RV64IBA-NEXT: addi a1, zero, 288
-; RV64IBA-NEXT: mul a0, a0, a1
+; RV64IBA-NEXT: sh3add a0, a0, a0
+; RV64IBA-NEXT: slli a0, a0, 5
 ; RV64IBA-NEXT: ret
   %c = mul i64 %a, 288
   ret i64 %c
 }

 define i64 @sh1add_imm(i64 %0) {
 ; RV64I-LABEL: sh1add_imm:
 ; RV64I: # %bb.0:
@@ 453 lines not shown @@

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Optimize multiplication in the zba extension with SH*ADD
Closed, Public
