This patch make the following optimization.
(mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits) (mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits) (mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
Differential D105796
[RISCV] Optimize multiplication in the zba extension with SH*ADD benshi001 on Jul 12 2021, 12:36 AM. Authored by
Details This patch make the following optimization. (mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits) (mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits) (mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
Diff Detail
Event TimelineComment Actions The new optimization rules must be put before the following generic SH*ADD rules, otherwise the new optimization does not work. def : Pat<(add (shl GPR:$rs1, (XLenVT 1)), non_imm12:$rs2), (SH1ADD GPR:$rs1, GPR:$rs2)>; def : Pat<(add (shl GPR:$rs1, (XLenVT 2)), non_imm12:$rs2), (SH2ADD GPR:$rs1, GPR:$rs2)>; def : Pat<(add (shl GPR:$rs1, (XLenVT 3)), non_imm12:$rs2), (SH3ADD GPR:$rs1, GPR:$rs2)>; Comment Actions For previous Pats, such as def : Pat<(add (mul_oneuse GPR:$rs1, (XLenVT 12)), GPR:$rs2), (SH2ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>; The pattern (rs1*12 + rs2) no long existed after changes in decomposeMulByConstant(), so it has to be changed to ParFrag<shadd ...> def : Pat<(shadd GPR:$rs1, (XLenVT 1), (XLenVT 2), non_imm12:$rs2), (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>; to preserve the previous optimization. What's more, they must be put before the generic rs1*2+rs2->(shadd rs1, rs2) ones otherwise they can not be matched.
Comment Actions I did not put the optimization of (mul x, 3 * power_of_2) in decomposeMulByConstant(). The reason is that using a PatLeaf is more simple than checking the pattern (ADD (SLLI), (SLLI). Comment Actions I change the $rs2 in the Pat from non_imm12 to GPR as following def : Pat<(addshl GPR:$rs1, (XLenVT 1), (XLenVT 2), GPR:$rs2), (SH1ADD (SH1ADD GPR:$rs1, GPR:$rs1), GPR:$rs2)>; Because for a*6 + 10, non_imm12:$rs2 will lead to addi Rb, zero, 6 mul Ra, Rb, Ra addi Ra, Ra, 6 while GPR:$rs2 will lead to addi Rb, zero, 10 sh1add Ra, Ra, Ra sh1add Ra, Ra, Rb And I think the later one is better, so I changed non_imm12:$rs2 to GPR:$rs2. Comment Actions I'm not sure I follow this. How can a change to the isel pattern cause a mul to be created? Wasn't the mul already decomposed? This comment was removed by benshi001. Comment Actions You are right. For non_imm12:$rs2 in a*10+6, the following is generated slli a1, a0, 1 sh3add a0, a0, a1 addi a0, a0, 6 For GPR:$rs2, the assembly is sh2add a0, a0, a0 addi a1, zero, 6 sh1add a0, a0, a1 I think maybe the first one is better, so I will rollback my patch. This comment was removed by benshi001. Comment Actions I have split previous patch to smaller ones, which would be easy to review. And current patch only contains (mul x, 3 * power_of_2) -> (SLLI (SH1ADD x, x), bits) (mul x, 5 * power_of_2) -> (SLLI (SH2ADD x, x), bits) (mul x, 9 * power_of_2) -> (SLLI (SH3ADD x, x), bits)
Comment Actions Since it does 63-leadingzeros i wouldn’t name it LeadingZerosXForm. You can leave it as is until we find another use for it. |
To be, shadd means something like A << B + C, not A << B + A << C + D