This patch make the following optimization.
(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits) (mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits) (mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
Paths
| Differential D105796
[RISCV] Optimize multiplication in the zba extension with SH*ADD ClosedPublic Authored by benshi001 on Jul 12 2021, 12:36 AM.
Details Summary This patch make the following optimization. (mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits) (mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits) (mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
Diff Detail
Unit TestsFailed Event TimelineHerald added subscribers: vkmr, frasercrmck, evandro and 23 others. · View Herald TranscriptJul 12 2021, 12:37 AM Comment Actions The new optimization rules must be put before the following generic SH*ADD rules, otherwise the new optimization does not work. def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2), (SH1ADD GPR:$rs1, GPR:$rs2)>; def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2), (SH2ADD GPR:$rs1, GPR:$rs2)>; def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2), (SH3ADD GPR:$rs1, GPR:$rs2)>; Comment Actions For previous Pats, such as def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2), (SH2ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>; The pattern (rs1*12 + rs2) no long existed after changes in decomposeMulByConstant(), so it has to be changed to ParFrag<shadd ...> def : Pat<(shadd GPR:$rs1, (XLenVT 1), (XLenVT 2), non_imm12:$rs2), (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>; to preserve the previous optimization. What's more, they must be put before the generic rs1*2+rs2->(shadd rs1, rs2) ones otherwise they can not be matched.
Comment Actions I did not put the optimization of (mul x, 3 * power_of_2) in decomposeMulByConstant(). The reason is that using a PatLeaf is more simple than checking the pattern (ADD (SLLI), (SLLI). Comment Actions I change the $rs2 in the Pat from non_imm12 to GPR as following def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), GPR:$rs2), (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>; Because for a*6 + 10, non_imm12:$rs2 will lead to addi Rb, zero, 6 mul Ra, Rb, Ra addi Ra, Ra, 6 while GPR:$rs2 will lead to addi Rb, zero, 10 sh1add Ra, Ra, Ra sh1add Ra, Ra, Rb And I think the later one is better, so I changed non_imm12:$rs2 to GPR:$rs2. Comment Actions
I'm not sure I follow this. How can a change to the isel pattern cause a mul to be created? Wasn't the mul already decomposed? Comment Actions
You are right. For non_imm12:$rs2 in a*10+6, the following is generated slli a1, a0, 1 sh3add a0, a0, a1 addi a0, a0, 6 For GPR:$rs2, the assembly is sh2add a0, a0, a0 addi a1, zero, 6 sh1add a0, a0, a1 I think maybe the first one is better, so I will rollback my patch. This comment was removed by benshi001. Comment Actions I have split previous patch to smaller ones, which would be easy to review. And current patch only contains (mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits) (mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits) (mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
benshi001 marked an inline comment as done. Comment ActionsDo we need to change BSETINVTwoBitsMaskHigh to LeadingZerosXForm ? Comment Actions
Since it does 63-leadingzeros i wouldn’t name it LeadingZerosXForm. You can leave it as is until we find another use for it. This revision is now accepted and ready to land.Jul 21 2021, 7:09 PM This revision was landed with ongoing or failed builds.Jul 21 2021, 7:29 PM Closed by commit rG9e5c5afc7ee2: [RISCV] Optimize multiplication in the zba extension with SH*ADD (authored by benshi001). · Explain Why This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 359527 llvm/lib/Target/RISCV/RISCVISelLowering.cpp
llvm/lib/Target/RISCV/RISCVInstrInfoB.td
llvm/test/CodeGen/RISCV/rv32zba.ll
llvm/test/CodeGen/RISCV/rv64zba.ll
|
To be, shadd means something like A << B + C, not A << B + A << C + D