... to shift/add or shift/sub.
Details
Diff Detail
Event Timeline
This patch can not cover all cases (especially "call __mulsi3" on rv32 without M extension), but at least it works well for most cases.
Maybe a better solution is make ISD::MUL as Custom, which I will try later. You are appreciated to review and accept such a partial optimization.
Thanks for the patch!
This optimisation is done by DAGCombine if you instead implement decomposeMulByConstant in RISCVTargetLowering. Read the comment on the TargetLoweringBase class to understand how to use it. This is preferrable, because we don't want to maintain a target-specific copy of this optimisation if we can avoid it.
It would be sensible to base your implementation on the one in the x86 backend: X86TargetLowering::decomposeMulByConstant, which deals with some phase ordering issues around legalisation.
This is looking good.
I'm going to pre-commit the test additions today - if you could rebase your changes on top, that will allow us to see how this change affects the new testcases you added. I'll keep you as the author and let you know the sha so you can rebase on top of the commit.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
2989 ↗ | (On Diff #273889) | This TODO should not apply to RISC-V, yet. |
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
2989 ↗ | (On Diff #273889) | Thanks. I have rebased and fixed according to what you suggested. |
I'm happy with this optimisation where this patch removes multiply libcalls.
Where the target has a the m extension, and especially for 64-bit multiplies on rv32im, I'm not sure this is an optimisation.
I think that, for the moment, we should add a guard to the hook to avoid this transformation where we do have mul instructions:
if (Subtarget.hasStdExtM()) return false;
What do you think?
llvm/test/CodeGen/RISCV/mul.ll | ||
---|---|---|
305–379 | I think this is a pessimisation, though I realise that depends on how slow the 32-bit multiplier is compared to add/shift. |
Shall we loose the guard condition to that ?
if (!Subtarget.is64Bit && Subtarget.hasStdExtM()) return false;
This will prevent the optimization for RV32IM, but still work for RV64IM。
I think a mul-instruction's latency is sure to be >=2, so all existing test cases will not have regresion.
LGTM.
I'm not overly concerned about the occasional code size increases from doing the optimization for RV32IM, so the loosening of the condition is OK IMO.
Everything else seems to be in order now.
Maybe wait a couple of days more for @lenary's OK.
One issue, then I'm happy.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
2990 ↗ | (On Diff #275139) | getSExtValue will assert if the value does not fit into 64 bits - you need to do a check before you get there. I think this hook can be called before legalisation, so you may not get only legal types in this call. |
I think this is a pessimisation, though I realise that depends on how slow the 32-bit multiplier is compared to add/shift.