Page MenuHomePhabricator

[RISCV] Optimize multiplication by constant
ClosedPublic

Authored by benshi001 on Jun 26 2020, 7:46 AM.

Diff Detail

Event Timeline

benshi001 created this revision.Jun 26 2020, 7:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 26 2020, 7:46 AM

This patch can not cover all cases (especially "call __mulsi3" on rv32 without M extension), but at least it works well for most cases.

Maybe a better solution is make ISD::MUL as Custom, which I will try later. You are appreciated to review and accept such a partial optimization.

benshi001 edited the summary of this revision. (Show Details)Jun 26 2020, 8:11 AM
benshi001 edited the summary of this revision. (Show Details)Jun 26 2020, 8:14 AM
benshi001 edited the summary of this revision. (Show Details)Jun 26 2020, 8:25 AM

Thanks for the patch!

This optimisation is done by DAGCombine if you instead implement decomposeMulByConstant in RISCVTargetLowering. Read the comment on the TargetLoweringBase class to understand how to use it. This is preferrable, because we don't want to maintain a target-specific copy of this optimisation if we can avoid it.

It would be sensible to base your implementation on the one in the x86 backend: X86TargetLowering::decomposeMulByConstant, which deals with some phase ordering issues around legalisation.

benshi001 edited the summary of this revision. (Show Details)Jun 26 2020, 8:27 AM
benshi001 updated this revision to Diff 273889.Jun 27 2020, 1:49 AM
benshi001 edited the summary of this revision. (Show Details)

Thanks. I have uploaded a new version according to your suggestion!

Thanks for the patch!

This optimisation is done by DAGCombine if you instead implement decomposeMulByConstant in RISCVTargetLowering. Read the comment on the TargetLoweringBase class to understand how to use it. This is preferrable, because we don't want to maintain a target-specific copy of this optimisation if we can avoid it.

It would be sensible to base your implementation on the one in the x86 backend: X86TargetLowering::decomposeMulByConstant, which deals with some phase ordering issues around legalisation.

lenary added a comment.Jul 1 2020, 2:34 PM

This is looking good.

I'm going to pre-commit the test additions today - if you could rebase your changes on top, that will allow us to see how this change affects the new testcases you added. I'll keep you as the author and let you know the sha so you can rebase on top of the commit.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2995

This TODO should not apply to RISC-V, yet.

lenary added a comment.Jul 1 2020, 2:47 PM

This is looking good.

I'm going to pre-commit the test additions today - if you could rebase your changes on top, that will allow us to see how this change affects the new testcases you added. I'll keep you as the author and let you know the sha so you can rebase on top of the commit.

Done in rG003a086ffc0.

benshi001 updated this revision to Diff 274997.Jul 1 2020, 8:24 PM
benshi001 edited the summary of this revision. (Show Details)
benshi001 marked 2 inline comments as done.Jul 1 2020, 8:31 PM
benshi001 added inline comments.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2995

Thanks. I have rebased and fixed according to what you suggested.

lenary added a comment.Jul 2 2020, 3:50 AM

I'm happy with this optimisation where this patch removes multiply libcalls.

Where the target has a the m extension, and especially for 64-bit multiplies on rv32im, I'm not sure this is an optimisation.

I think that, for the moment, we should add a guard to the hook to avoid this transformation where we do have mul instructions:

if (Subtarget.hasStdExtM())
  return false;

What do you think?

llvm/test/CodeGen/RISCV/mul.ll
546–547

I think this is a pessimisation, though I realise that depends on how slow the 32-bit multiplier is compared to add/shift.

benshi001 updated this revision to Diff 275136.Jul 2 2020, 8:36 AM
benshi001 marked an inline comment as done.
benshi001 updated this revision to Diff 275139.Jul 2 2020, 8:44 AM
benshi001 marked an inline comment as done.EditedJul 2 2020, 8:50 AM

I'm happy with this optimisation where this patch removes multiply libcalls.

Where the target has a the m extension, and especially for 64-bit multiplies on rv32im, I'm not sure this is an optimisation.

I think that, for the moment, we should add a guard to the hook to avoid this transformation where we do have mul instructions:

if (Subtarget.hasStdExtM())
  return false;

What do you think?

Shall we loose the guard condition to that ?

if (!Subtarget.is64Bit && Subtarget.hasStdExtM())
   return false;

This will prevent the optimization for RV32IM, but still work for RV64IM。

I think a mul-instruction's latency is sure to be >=2, so all existing test cases will not have regresion.

luismarques accepted this revision.Jul 6 2020, 10:10 AM

LGTM.
I'm not overly concerned about the occasional code size increases from doing the optimization for RV32IM, so the loosening of the condition is OK IMO.
Everything else seems to be in order now.
Maybe wait a couple of days more for @lenary's OK.

This revision is now accepted and ready to land.Jul 6 2020, 10:10 AM
lenary added a comment.Jul 7 2020, 1:17 AM

One issue, then I'm happy.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
2990

getSExtValue will assert if the value does not fit into 64 bits - you need to do a check before you get there.

I think this hook can be called before legalisation, so you may not get only legal types in this call.

benshi001 updated this revision to Diff 276297.Jul 7 2020, 6:27 PM
benshi001 marked an inline comment as done.
MaskRay accepted this revision.Jul 7 2020, 6:38 PM
MaskRay retitled this revision from [RISCV] Optimize multiplication by specific immediates to [RISCV] Optimize multiplication by constant.
MaskRay edited the summary of this revision. (Show Details)
This revision was automatically updated to reflect the committed changes.