This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Optimize multiplication in the zba extension with SH*ADD
ClosedPublic

Authored by benshi001 on Jul 12 2021, 12:36 AM.

Details

Summary

This patch make the following optimization.

(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits)
(mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits)
(mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)

Diff Detail

Event Timeline

benshi001 created this revision.Jul 12 2021, 12:36 AM
benshi001 requested review of this revision.Jul 12 2021, 12:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2021, 12:37 AM
benshi001 edited the summary of this revision. (Show Details)Jul 12 2021, 12:38 AM
benshi001 added a comment.EditedJul 12 2021, 12:40 AM

The new optimization rules must be put before the following generic SH*ADD rules, otherwise the new optimization does not work.

def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2),
          (SH1ADD GPR:$rs1, GPR:$rs2)>;
def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2),
          (SH2ADD GPR:$rs1, GPR:$rs2)>;
def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2),
          (SH3ADD GPR:$rs1, GPR:$rs2)>;
benshi001 updated this revision to Diff 357914.Jul 12 2021, 5:59 AM
benshi001 edited the summary of this revision. (Show Details)
benshi001 edited the summary of this revision. (Show Details)

For previous Pats, such as

def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2),
          (SH2ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

The pattern (rs1*12 + rs2) no long existed after changes in decomposeMulByConstant(), so it has to be changed to

ParFrag<shadd ...>
def : Pat<(shadd GPR:$rs1, (XLenVT 1), (XLenVT 2), non_imm12:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

to preserve the previous optimization.

What's more, they must be put before the generic rs1*2+rs2->(shadd rs1, rs2) ones otherwise they can not be matched.

benshi001 updated this revision to Diff 357918.Jul 12 2021, 6:16 AM
benshi001 edited the summary of this revision. (Show Details)
jrtc27 added inline comments.Jul 12 2021, 6:22 AM
llvm/lib/Target/RISCV/RISCVInstrInfoB.td
993

To be, shadd means something like A << B + C, not A << B + A << C + D

benshi001 updated this revision to Diff 357937.Jul 12 2021, 7:08 AM
benshi001 marked an inline comment as done.
benshi001 added inline comments.
llvm/lib/Target/RISCV/RISCVInstrInfoB.td
993

change the ParFrag's name to addshl.

LevyHsu added inline comments.Jul 12 2021, 7:52 AM
llvm/lib/Target/RISCV/RISCVInstrInfoB.td
993

If I understand it correctly,
(shl node:$A, node:$B) matches (A<<B)
(shl node:$A, node:$C) matches (A<<C)
which makes the pattern like Jessica said: (A<<B) + (A<<C) + D

But on spec those patterns are:

uint_xlen_t sh1add(uint_xlen_t rs1, uint_xlen_t rs2)
{
return (rs1 << 1) + rs2;
}
uint_xlen_t sh2add(uint_xlen_t rs1, uint_xlen_t rs2)
{
return (rs1 << 2) + rs2;
}
uint_xlen_t sh3add(uint_xlen_t rs1, uint_xlen_t rs2)
{
return (rs1 << 3) + rs2;
}
LevyHsu added inline comments.Jul 12 2021, 8:23 AM
llvm/lib/Target/RISCV/RISCVInstrInfoB.td
993

Nevermind...missed the update

benshi001 marked 2 inline comments as done.Jul 12 2021, 6:34 PM
benshi001 updated this revision to Diff 358125.Jul 12 2021, 7:26 PM
benshi001 edited the summary of this revision. (Show Details)
benshi001 updated this revision to Diff 358496.Jul 13 2021, 7:40 PM
benshi001 edited the summary of this revision. (Show Details)
benshi001 edited the summary of this revision. (Show Details)

I did not put the optimization of (mul x, 3 * power_of_2) in decomposeMulByConstant(). The reason is that using a PatLeaf is more simple than checking the pattern (ADD (SLLI), (SLLI).

benshi001 updated this revision to Diff 359522.Jul 16 2021, 8:09 PM

I change the $rs2 in the Pat from non_imm12 to GPR as following

def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), GPR:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

Because for a*6 + 10, non_imm12:$rs2 will lead to

addi Rb, zero, 6
mul Ra, Rb, Ra
addi Ra, Ra, 6

while GPR:$rs2 will lead to

addi Rb, zero, 10
sh1add Ra, Ra, Ra
sh1add Ra, Ra, Rb

And I think the later one is better, so I changed non_imm12:$rs2 to GPR:$rs2.

I change the $rs2 in the Pat from non_imm12 to GPR as following

def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), GPR:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

Because for a*6 + 10, non_imm12:$rs2 will lead to

addi Rb, zero, 6
mul Ra, Rb, Ra
addi Ra, Ra, 6

while GPR:$rs2 will lead to

addi Rb, zero, 10
sh1add Ra, Ra, Ra
sh1add Ra, Ra, Rb

And I think the later one is better, so I changed non_imm12:$rs2 to GPR:$rs2.

I'm not sure I follow this. How can a change to the isel pattern cause a mul to be created? Wasn't the mul already decomposed?

This comment was removed by benshi001.

I change the $rs2 in the Pat from non_imm12 to GPR as following

def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), GPR:$rs2),
          (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>;

Because for a*6 + 10, non_imm12:$rs2 will lead to

addi Rb, zero, 6
mul Ra, Rb, Ra
addi Ra, Ra, 6

while GPR:$rs2 will lead to

addi Rb, zero, 10
sh1add Ra, Ra, Ra
sh1add Ra, Ra, Rb

And I think the later one is better, so I changed non_imm12:$rs2 to GPR:$rs2.

I'm not sure I follow this. How can a change to the isel pattern cause a mul to be created? Wasn't the mul already decomposed?

You are right. For non_imm12:$rs2 in a*10+6, the following is generated

slli    a1, a0, 1
sh3add  a0, a0, a1
addi    a0, a0, 6

For GPR:$rs2, the assembly is

sh2add  a0, a0, a0
addi    a1, zero, 6
sh1add  a0, a0, a1

I think maybe the first one is better, so I will rollback my patch.

benshi001 updated this revision to Diff 359527.Jul 16 2021, 9:40 PM
benshi001 updated this revision to Diff 360351.Jul 20 2021, 8:42 PM
benshi001 edited the summary of this revision. (Show Details)
This comment was removed by benshi001.

I have split previous patch to smaller ones, which would be easy to review. And current patch only contains

(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits)
(mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits)
(mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
craig.topper added inline comments.Jul 21 2021, 9:56 AM
llvm/lib/Target/RISCV/RISCVInstrInfoB.td
1016

Can we rename BSETINVTwoBitsMaskLow to TrailingZerosXForm?

benshi001 updated this revision to Diff 360682.Jul 21 2021, 6:50 PM
benshi001 marked an inline comment as done.

Do we need to change BSETINVTwoBitsMaskHigh to LeadingZerosXForm ?

craig.topper added a comment.EditedJul 21 2021, 6:59 PM

Do we need to change BSETINVTwoBitsMaskHigh to LeadingZerosXForm ?

Since it does 63-leadingzeros i wouldn’t name it LeadingZerosXForm. You can leave it as is until we find another use for it.

This revision is now accepted and ready to land.Jul 21 2021, 7:09 PM
This revision was landed with ongoing or failed builds.Jul 21 2021, 7:29 PM
This revision was automatically updated to reflect the committed changes.