Download Raw Diff

Details

Reviewers

steven.zhang
nemanjai
jsji
qiucf

Group Reviewers

Restricted Project

Commits

rG6e0ad5bc8c34: [PowerPC] Add an ISEL pattern for Mul with Imm.

Summary

If the multiplicand is a constant with following formats:

mul with (2^N * int16_imm) -> mulli + rldicr
mul with (2^N + 2^M) -> rldicr + add + rldicr
mul with (2^N - 2^M) -> rldicr + sub + rldicr

Scenario 2 and 3 are moved to DAGCombiner D88201.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	10 ms	linux > LLVM-Unit.ADT/_/ADTTests::APFloatTest.FMA
	20 ms	windows > LLVM-Unit.ADT/_/ADTTests_exe::APFloatTest.FMA

Event Timeline

Esme created this revision.Sep 9 2020, 8:12 AM

Herald added a project: Restricted Project. · View Herald TranscriptSep 9 2020, 8:12 AM

Herald added subscribers: llvm-commits, shchenz, kbarton, hiraditya. · View Herald Transcript

Esme requested review of this revision.Sep 9 2020, 8:12 AM

Harbormaster completed remote builds in B71095: Diff 290740.Sep 9 2020, 9:55 AM

Esme edited the summary of this revision. (Show Details)Sep 9 2020, 7:45 PM

LGTM. Thank you for doing this. I assume that you have run it with bmk to make sure that the functionality is fine.

This revision is now accepted and ready to land.Sep 16 2020, 1:15 AM

Why this can NOT be done in DAGCombiner by implementing decomposeMulByConstant target hook?

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4990	Can we add comments about all these scenarios? With simple examples.
5000	`exactly two bits set` ? What do you mean?

In D87384#2279886, @jsji wrote:

Why this can NOT be done in DAGCombiner by implementing decomposeMulByConstant target hook?

Do you have evidence of profitability of (mul (shl %a, N) M) being preferable over (mul %a, C) on other targets? If so, I suppose extending the DAG combine that handles:

mul x, (2^N + 1) --> add (shl x, N), x
mul x, (2^N - 1) --> sub (shl x, N), x

Perhaps it wold be good to gauge interest from other RISC targets.

Scenarios 2 and 3 are quite similar to the existing DAG combine so this would be a straightforward implementation. However I am not sure how likely it is that two shifts and an add/sub are better than a multiply on other targets.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4990	I agree 100%. We should illustrate this with some example bit patterns.
5000	2^N + 2^M has exactly two bits set. However, I don't think this is profitable as implemented. This implementation does not benefit from an ILP improvement since the instructions are dependent. What we probably want is to convert `(mul %a, (2^N + 2^M))` to `(add (shl %a, N), (shl %a, M))` since that improves ILP so the total latency of the sequence is actually reduced. That would also mean that for something like this: res1 = a 0x8800; res2 = a 0x8080; we only need three total shifts (we should also add that as a test case). Similar argument applies to the subtract case below.

Thank you for your comments! @jsji @nemanjai

For scenarios 2 and 3, I will modified it as @nemanjai 's hint, and move it to DAGCombiner since it is really similar to existing patterns and then implement our PPCTargetLowering::decomposeMulByConstant. If other targets are interested about this, they can add conditions for this in their decomposeMulByConstant.

But for scenario 1, I am not sure if there are benefits to other targets, since it is inspired by our HW instruction MULLI. So I prefer to keep it in ISEL.

Keep the scenario 1 implemented in ISEL.
For scenario 2 and 3, I will post another patch to implement them in DAGCombiner.

Harbormaster completed remote builds in B72499: Diff 293409.Sep 22 2020, 5:15 AM

I think we already have similar pattern for scenario 1 as well:

// Change (mul (shl X, C), Y) -> (shl (mul X, Y), C) when the shift has one
// use.

Just that (shl X,C) is not constant there..

So I would assume dealing with similar situation controlled by TLI.decomposeMulByConstantwill be easy and also no harm to other targets.
Targets can control and add this scenario in their TLI if they also want this.

BTW: looks like x86 also has imul.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
4989	Maybe it would be clearer if we use DAG expression in comments. eg: `(mul X, c2 << c1) -> (rldicr (mulli X, c2 >> c1) c1)`

Hi @jsji An infinite loop will occur if we handle the scenario 1 (mul X, c2 << c1) -> (shl (mul X, c2), c1) in DAGCombiner, because there exists a reverse conversion (shl (mul x, c1), c2) -> (mul x, c1 << c2).

In D87384#2289477, @Esme wrote:

Hi @jsji An infinite loop will occur if we handle the scenario 1 (mul X, c2 << c1) -> (mul (shl X, c1), c2) in DAGCombiner, because there exists a reverse conversion (mul (shl X, c1), c2) -> (mul X, c2 << c1).

Hmm.. then OK to keep this in ISEL, but please add comments about DAGcombiner prefer (mul X, c2 << c1). Thanks

Update comments.

Esme edited the summary of this revision. (Show Details)Sep 24 2020, 12:29 AM

Esme mentioned this in D88201: [DAGCombiner] Add decomposition patterns for Mul-by-Imm..Sep 24 2020, 12:54 AM

Harbormaster completed remote builds in B72765: Diff 293950.Sep 24 2020, 12:59 AM

Esme updated this revision to Diff 293968.Sep 24 2020, 1:32 AM

Esme edited the summary of this revision. (Show Details)

Thank you so much for your comments. :D @jsji @nemanjai
I have post another patch D88201 to handle scenario 2 and 3 in DAGCombiner.

Harbormaster completed remote builds in B72780: Diff 293968.Sep 24 2020, 2:24 AM

Format.

Harbormaster completed remote builds in B72791: Diff 293996.Sep 24 2020, 3:54 AM

Looking forward to your further comments :) @jsji @nemanjai