This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Strength reduction of multiply by a constant by shift and add/sub in place
ClosedPublic

Authored by wuzish on Mar 4 2019, 7:41 PM.

Details

Summary

A shift and add/sub sequence combination is faster in place of a multiply by constant. Because the cycle or latency of multiply is not huge, we only consider such following worthy patterns.

(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))

And the cycles or latency is subtarget-dependent so that we need consider the subtarget to determine to do or not do such transformation. Also data type is considered for different cycles or latency to do multiply.

Diff Detail

Event Timeline

wuzish created this revision.Mar 4 2019, 7:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2019, 7:42 PM
jsji added a comment.Mar 15 2019, 2:48 PM

Comments mostly related to comments and testcases.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14584

condion => condition, and move to before 'return'?

14588

This comments is confusing: all four cases are using vector mul, so "because vector mul has more cycles" can not explain why we want to do the case of IsAddone && IsNeg for vector type only.

Can we add comments about the cycles before and after combine, maybe a table so that it will clear about why the combine is profitable for specific type/pattern here ?
eg:

  type         mul     add  shl   add+shl .  sub+add+shl      sub+shl
 scalar         ?         ?       ? 
vector          ?          ?     ?
14601

Maybe we should move these comments after if similar to else if below?

llvm/test/CodeGen/PowerPC/mul-const-vector.ll
10

Looks like vsplitsb/xxsplitb is the only difference for P8/P9, you might be better using common prefixes to share (and reduce) checks.

... -mcpu=pwr9 ... --check-prefixes=CHECK,CHECK-P9
... -mcpu=pwr8 --check-prefixes=CHECK,CHECK-P8

; CHECK-LABEL: test1_v16i8:
; CHECK-P8:vspltisb v[[REG1:[0-9]+]], 4
; CHECK-P9: xxspltib v[[REG1:[0-9]+]], 4
; CHECK-NOT: vmul
; CHECK-NEXT: vslb v[[REG2:[0-9]+]], v2, v[[REG1]]
wuzish marked 4 inline comments as done.Mar 19 2019, 12:10 AM
wuzish added inline comments.
llvm/test/CodeGen/PowerPC/mul-const-vector.ll
10

Thank you. It's good to know the usage of two prefixes. --check-prefixes=CHECK,CHECK-P8

Address comments and simplify the check pattern by using multiple prefixes.

jsji added a comment.Mar 25 2019, 8:07 AM

Thanks for updating! @wuzish However, I am even more confused by your new comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14591

These are for P9 right? I believe P8 is different no?

14593

? I am confused: the combination here is doing strength reduction from mul to shl. Why we are comparing the cycles between vector type and scalar type? Should we compare the cycles for mul seq and shl seq for the SAME type?

wuzish marked 2 inline comments as done.Mar 25 2019, 7:23 PM
wuzish added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14591

Hmm, well. Actually I only have cycles information table about P9, so we can only do for P9 and leave P8 as TODO, and also change comments into // TODO: enhance the condition for subtarget before pwr9

14593

Because this transformation replacement are 3 instrs. So it's not profitable for scalar mul operation because scalar mul takes 5 cycles, but it's profitable for vector mul operation because vector mul takes 7 cycles.

jsji added inline comments.Mar 26 2019, 7:39 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14591

OK for me.

14593

yes, but my point is your comment here is misleading.

How about something like:

mul is 5(scalar)/7(vector) cycle, add/sub/shl are all 2 cycle. For 2 instrs patterns, add/sub + shl are 4 cycles, it is always profitable; but for 3 instrs patterns (mul x, -(2^N + 1)) => -(add (shl x, N), x) , sub+add+shl are 6 cycles, so we should only do it for vector type.

update code comments.

wuzish marked 2 inline comments as done and an inline comment as not done.Mar 26 2019, 11:30 PM

address comments.

wuzish marked 2 inline comments as done.Mar 28 2019, 2:42 AM

Gentle pin.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14591

I actually find P8 cycles information and update related test result to accept P8.

jsji accepted this revision.Mar 28 2019, 7:59 AM

LGTM. Thanks for exploiting.

This revision is now accepted and ready to land.Mar 28 2019, 7:59 AM
wuzish marked an inline comment as done.

rebase patch based trunk because of out-of-date.

This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/PowerPC/mul-const.ll