This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
ClosedPublic

Authored by benshi001 on Aug 8 2021, 3:04 AM.

Diff Detail

Event Timeline

benshi001 created this revision.Aug 8 2021, 3:04 AM
benshi001 requested review of this revision.Aug 8 2021, 3:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2021, 3:04 AM
benshi001 updated this revision to Diff 365022.Aug 8 2021, 7:25 AM
benshi001 updated this revision to Diff 365026.Aug 8 2021, 8:02 AM
craig.topper added inline comments.Aug 8 2021, 2:06 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
537 ↗(On Diff #365026)

What about all the optimizations we have for mul by constant using SHXADD? Won't this miss out on those?

benshi001 added inline comments.Aug 8 2021, 6:48 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
537 ↗(On Diff #365026)

I do not think it will affect the optimization with SHxADD.

For the ones such as
(mul 11/13/25/41/73/37/21), (mul (3/5/9)*power_of_2), (mul power_of_2 + (2/4/8)), those are pure mul without add involved, so they won't be affected.

For the rules such as
(x + y * 4) -> (SH2ADD y, x)
(x + y * 20) -> (SH2ADD (SH2ADD x, x), y)
.......

My patch still generate better code.

c1 + x * 72 (c1 is a non-simm12 constant)

before current patch

lui Ry, higher-bits of c1
addi Ry, Ry, lower-12-bits of c1
sh3add Rz, x, x
sh3add Rz, Rz, Ry

after my patch

addi Ry, x, c1/72
sh3add Ry, Ry, Ry
sll Ry, Ry, 3

I will add those cases to the test file.

craig.topper added inline comments.Aug 8 2021, 7:13 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
537 ↗(On Diff #365026)

Isn’t it possible for the MUL you create here to have 11/13/25/41/73/37/21 as a constant that should use SHXADD?

benshi001 marked 2 inline comments as done.Aug 8 2021, 7:22 PM
benshi001 added inline comments.
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
537 ↗(On Diff #365026)

Yes, it is possible. And for mul with 11/13/25/41/73/37/21 which use shxadd, my change still generates better asm. I will add in the tests.

benshi001 marked an inline comment as done.Aug 8 2021, 7:38 PM
benshi001 added inline comments.Aug 8 2021, 7:38 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
537 ↗(On Diff #365026)

Sorry, your concern is right. I can not handle mul with 11/13/25/41/73/37/21.

benshi001 updated this revision to Diff 365073.Aug 8 2021, 9:42 PM
benshi001 added inline comments.
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
537 ↗(On Diff #365026)

I have skip those constants in the zba extension. And will figure out a better way for them in the future.

Thanks for your help.

benshi001 updated this revision to Diff 365091.Aug 8 2021, 11:35 PM
benshi001 updated this revision to Diff 365158.Aug 9 2021, 4:49 AM

Is there any better way to implement this optimization other than ISelDagToDag?

  1. It seems hard to calculate c1/c0 by writing TD mapping rules.
  1. I also tried DAG transform in RISCVTargetLowering::ReplaceNodeResults, it is also not easy, since it involves more code about legalization.

Is there any better way to implement this optimization other than ISelDagToDag?

  1. It seems hard to calculate c1/c0 by writing TD mapping rules.
  1. I also tried DAG transform in RISCVTargetLowering::ReplaceNodeResults, it is also not easy, since it involves more code about legalization.

It should probably be a DAG combine and the transform that's turning (mul (add X, C1), C2) into (add (mul X, C2), C1 * C2) should ask the target if it is profitable, or at least call isLegalAddImmediate for C1*C2. @spatel or @lebedev.ri, what do you think?

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
18 ↗(On Diff #365158)

#include <set>. and it should be after all llvm headers

534 ↗(On Diff #365158)

Use std::array so you can use begin()/end() on the std::set.

Though you could sort the array and use std::binary_search and avoid the set completely.

Is there any better way to implement this optimization other than ISelDagToDag?

  1. It seems hard to calculate c1/c0 by writing TD mapping rules.
  1. I also tried DAG transform in RISCVTargetLowering::ReplaceNodeResults, it is also not easy, since it involves more code about legalization.

It should probably be a DAG combine and the transform that's turning (mul (add X, C1), C2) into (add (mul X, C2), C1 * C2) should ask the target if it is profitable, or at least call isLegalAddImmediate for C1*C2. @spatel or @lebedev.ri, what do you think?

Putting it into DAGCombine SGTM

benshi001 updated this revision to Diff 365405.Aug 10 2021, 3:09 AM
benshi001 retitled this revision from [RISCV] Optimize (add (mul x, c0), c1) to [DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2).
benshi001 edited the summary of this revision. (Show Details)

Is there any better way to implement this optimization other than ISelDagToDag?

  1. It seems hard to calculate c1/c0 by writing TD mapping rules.
  1. I also tried DAG transform in RISCVTargetLowering::ReplaceNodeResults, it is also not easy, since it involves more code about legalization.

It should probably be a DAG combine and the transform that's turning (mul (add X, C1), C2) into (add (mul X, C2), C1 * C2) should ask the target if it is profitable, or at least call isLegalAddImmediate for C1*C2. @spatel or @lebedev.ri, what do you think?

Thanks for your help! I have added a new target hook function isMulAddWithConstNotProfitable to let the DAGCombiner consult the target before combining (mul (add X, C1), C2).

This solution seems more clear. And it really improve most riscv's assembly code, except two of them, which I have made inline comments.

benshi001 added inline comments.Aug 10 2021, 3:24 AM
llvm/test/CodeGen/RISCV/addimm-mulimm.ll
241

This should not be a regression, since 8 bytes are saved, and the mulw should cost no more than 3 cycles since 73 is a small integer, so does for 19/25/41/73/11/13...

305

This can be further optimized to

(SLLIW (SH1ADD a0, a0, a0), 6).

I will make another patch for this optimization.

benshi001 added inline comments.Aug 10 2021, 3:31 AM
llvm/test/CodeGen/RISCV/addimm-mulimm.ll
305

The best form should be

; RV64IM-NEXT: addi a0, a0, 1000
; RV64IM-NEXT: sh1add a0, a0, a0
; RV64IM-NEXT: slliw a0, a0, 6

It can be done via new rules in RISCVInstrInfoB.td

benshi001 added inline comments.Aug 10 2021, 5:27 AM
llvm/test/CodeGen/RISCV/addimm-mulimm.ll
305

I have submitted another patch to optimize that case.

https://reviews.llvm.org/D107820

In this solution, this is no need to concern the impact to optimization with SHXADD.

jrtc27 added inline comments.Aug 10 2021, 5:43 AM
llvm/test/CodeGen/RISCV/addimm-mulimm.ll
214

Why are RV32IM check lines, in a file whose name and path don't say bitmanip, mentioning bitmanip instructions?

benshi001 updated this revision to Diff 365474.Aug 10 2021, 7:33 AM
benshi001 marked an inline comment as done.
craig.topper added inline comments.Aug 10 2021, 8:34 AM
llvm/include/llvm/CodeGen/TargetLowering.h
2089

Drop the "Not" and return true. Remove the ! from the caller.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9071

Use cast instead of dyn_cast and drop this assert. cast asserts internally.

benshi001 updated this revision to Diff 365503.Aug 10 2021, 9:09 AM
benshi001 marked 2 inline comments as done.
benshi001 added inline comments.Aug 10 2021, 7:12 PM
llvm/test/CodeGen/RISCV/addimm-mulimm.ll
295

This regression will be fixed by D107708.

benshi001 updated this revision to Diff 365763.Aug 11 2021, 8:12 AM

Comments need updating

benshi001 updated this revision to Diff 365767.Aug 11 2021, 8:58 AM

Comments need updating

I have updated the inline comments. Thank you.

Comments need updating

llvm/include/llvm/CodeGen/TargetLowering.h
2086
llvm/lib/Target/RISCV/RISCVISelLowering.h
467
benshi001 added inline comments.Aug 11 2021, 5:40 PM
llvm/include/llvm/CodeGen/TargetLowering.h
2086

I think my origin default true is right, the default return value should not be false.

Since my hook is called as

if (AddNode.getNode()->hasOneUse() &&
    TLI.isMulAddWithConstProfitable(AddNode, ConstNode))
  return true;

So it should return default true for undetermined cases.

benshi001 added inline comments.Aug 11 2021, 5:49 PM
llvm/include/llvm/CodeGen/TargetLowering.h
2086

Actually my previous version is isMulAddWithConstNotProfitable, which return default false, and return true for clear regression on specific targets.

Craig suggested me to remove the Not, and inverse the condition when calling.

The core issue is, the original DAGCombiner will do the folding if the AddNode has only one use, which will harm performance in some situation. And the solution is adding another check (along with the hasOneUse) to let the target prevent the transform if the target does think there is regresssion.

But if the target is also not sure, what default value should be ? And the hook name should have Not or should not have a Not ?

jrtc27 added inline comments.Aug 11 2021, 6:04 PM
llvm/include/llvm/CodeGen/TargetLowering.h
2086

The default should be whatever makes SelectionDAG behave the same as it currently does unless the changes in behaviour turn out to be useful for the majority of targets.

TLI functions should be positive not negative; there are no hooks that have Not in them (other than when they refer to a not instruction).

I'm not sure what the issue is though? Whether you have a NotProfitable function that defaults to false or a Profitable function that defaults to true makes no semantic difference other than when the caller needs to put a ! in front of it, it's purely a stylistic issue.

benshi001 updated this revision to Diff 365898.Aug 11 2021, 7:11 PM
benshi001 added inline comments.Aug 11 2021, 7:13 PM
llvm/include/llvm/CodeGen/TargetLowering.h
2086

I will keep current form without Not, and still return default true to "make SelectionDAG behave the same as it currently does".

And I have improved my comments more clear as

/// Return true if it may be profitable to fold
/// (mul (add x, c1), c2) -> (add (mul x, c2), c1*c2), and return false
/// to prevent the folding for definite regression.
/// The target should check the cost of materializing c1, c2 and c1*c2 into
/// registers. If it is not sure about some cases, a default true
/// can be returned to let the DAGCombiner decide.
benshi001 marked 3 inline comments as done.Aug 11 2021, 7:19 PM

ping ... Can this patch be approved ? It seems there is no objection on adding a hook isMulAddWithConstProfitable

lebedev.ri accepted this revision.Aug 18 2021, 1:26 AM

LGTM unless other have further comments.
Thanks.

This revision is now accepted and ready to land.Aug 18 2021, 1:26 AM

LGTM unless other have further comments.
Thanks.

I would like to land on Sunday evening, unless there will be other objection.

jrtc27 added inline comments.Aug 18 2021, 5:04 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16858–16859

"finds no regression" doesn't make sense to me. A regression is a bug, but if the target says it's profitable then there is no regression; a regression would be if a target said it was not profitable but it in fact was, and so the emitted code got worse. So when a target returns false, it's not that it finds a regression, because nothing has happened yet.

This also feeds into a related point. When comments in DAGCombiner talk about regressions, they generally mean "this transformation should make sense on most targets, but many of them don't enable it currently because they have patterns or custom lowering that would need to be adapted to handle it otherwise they won't match important cases any more" (with the "won't match important cases" being the regression), generally marked as TODO (all but one, with the odd-one-out still saying that something should probably be improved). This is not that case, this is just asking the target whether the transformation is profitable, simply as a "does your instruction set benefit from this transformation?".

llvm/lib/Target/RISCV/RISCVISelLowering.h
464

What's the point of duplicating this doxygen comment? It's just going to get outdated, and anyone who wants to know what it does (which is pretty obvious from the name) can just look at TargetLowering.h. If we copied the documentation to every override the tree would be a mess.

benshi001 updated this revision to Diff 367200.Aug 18 2021, 7:14 AM
benshi001 marked 2 inline comments as done.Aug 18 2021, 7:16 AM
benshi001 added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16858–16859

Thanks. I have made the comments more clear.

llvm/lib/Target/RISCV/RISCVISelLowering.h
464

Thanks. I have removed the redundant comments.

benshi001 marked 2 inline comments as done.Aug 18 2021, 7:25 AM
jrtc27 added inline comments.Aug 18 2021, 7:27 AM
llvm/include/llvm/CodeGen/TargetLowering.h
2083

The comment's still not great, putting grammatical issues aside. A lot of it is just explaining the basics of TLI hooks, but also is overly prescriptive with what backends should do to evaluate it (and I also don't like "to avoid definite worse code generated", often TLI hooks end up being best-effort heuristics, unable to give a definitive answer, because that might require extremely expensive whole-function checks that depend on knowing what other transformations are going to be made).

I'd just go with something like (borrowing style from surrounding examples):

Return true if it may be profitable to transform
(mul (add x, c1), c2) -> (add (mul x, c2), c1*c2).
This may not be true if c1 and c2 can be represented as immediates but c1*c2 cannot, for example.

benshi001 updated this revision to Diff 367231.Aug 18 2021, 9:17 AM
benshi001 marked an inline comment as done.Aug 18 2021, 9:22 AM
benshi001 added inline comments.
llvm/include/llvm/CodeGen/TargetLowering.h
2083

Thanks, I have updated the comments according to your suggested expression.

One more issue, English is not my mother language and your are appreciated to help me fix any "grammatical issues" you mentioned. ^_^

benshi001 marked an inline comment as done.Aug 18 2021, 9:26 AM
luismarques added inline comments.Aug 19 2021, 1:51 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9062

Nit: IMO, it would be more intuitive to compare the other way around, swapping the operands and changing the condition to >=.

benshi001 added inline comments.Aug 19 2021, 5:13 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9062

I can swap the operands, but why changing the condition to >=, should using > be better? Since I think this should consider i64 on rv64 and i32 on rv32.

benshi001 updated this revision to Diff 367469.Aug 19 2021, 5:34 AM
benshi001 marked an inline comment as done.Aug 19 2021, 5:35 AM
jrtc27 added inline comments.Aug 19 2021, 5:53 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9062

Yes, > is correct, and I agree with Luis that this is the more natural way round to express it.

luismarques added inline comments.Aug 19 2021, 5:57 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9062

Sorry, ignore the part about the equals :) I mixed some thoughts when writing that.

benshi001 updated this revision to Diff 367486.Aug 19 2021, 6:53 AM
benshi001 marked 2 inline comments as done.
benshi001 added inline comments.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
9062

Thanks. The order has been changed.

benshi001 marked an inline comment as done.Aug 19 2021, 6:54 AM