# [PowerPC] Strength reduction of multiply by a constant by shift and add/sub in placeClosedPublicActions

Authored by wuzish on Mar 4 2019, 7:41 PM.

# Details

Summary

A shift and add/sub sequence combination is faster in place of a multiply by constant. Because the cycle or latency of multiply is not huge, we only consider such following worthy patterns.

```(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))```

And the cycles or latency is subtarget-dependent so that we need consider the subtarget to determine to do or not do such transformation. Also data type is considered for different cycles or latency to do multiply.

# Diff Detail

### Event Timeline

wuzish created this revision.Mar 4 2019, 7:41 PM
Herald added a project: Restricted Project. Mar 4 2019, 7:42 PM

gentle pin...

jsji added a comment.Mar 15 2019, 2:48 PM

Comments mostly related to comments and testcases.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14587

condion => condition, and move to before 'return'?

14591

This comments is confusing: all four cases are using vector mul, so "because vector mul has more cycles" can not explain why we want to do the case of IsAddone && IsNeg for vector type only.

Can we add comments about the cycles before and after combine, maybe a table so that it will clear about why the combine is profitable for specific type/pattern here ?
eg:

```  type         mul     add  shl   add+shl .  sub+add+shl      sub+shl
scalar         ?         ?       ?
vector          ?          ?     ?```
14604

Maybe we should move these comments after if similar to else if below?

llvm/test/CodeGen/PowerPC/mul-const-vector.ll
10

Looks like vsplitsb/xxsplitb is the only difference for P8/P9, you might be better using common prefixes to share (and reduce) checks.

```... -mcpu=pwr9 ... --check-prefixes=CHECK,CHECK-P9
... -mcpu=pwr8 --check-prefixes=CHECK,CHECK-P8

; CHECK-LABEL: test1_v16i8:
; CHECK-P8:vspltisb v[[REG1:[0-9]+]], 4
; CHECK-P9: xxspltib v[[REG1:[0-9]+]], 4
; CHECK-NOT: vmul
; CHECK-NEXT: vslb v[[REG2:[0-9]+]], v2, v[[REG1]]```
wuzish marked 4 inline comments as done.Mar 19 2019, 12:10 AM
llvm/test/CodeGen/PowerPC/mul-const-vector.ll
10

Thank you. It's good to know the usage of two prefixes. --check-prefixes=CHECK,CHECK-P8

wuzish updated this revision to Diff 191253.Mar 19 2019, 12:13 AM

Address comments and simplify the check pattern by using multiple prefixes.

jsji added a comment.Mar 25 2019, 8:07 AM

Thanks for updating! @wuzish However, I am even more confused by your new comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14594

These are for P9 right? I believe P8 is different no?

14596

? I am confused: the combination here is doing strength reduction from mul to shl. Why we are comparing the cycles between vector type and scalar type? Should we compare the cycles for mul seq and shl seq for the SAME type?

wuzish marked 2 inline comments as done.Mar 25 2019, 7:23 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14594

Hmm, well. Actually I only have cycles information table about P9, so we can only do for P9 and leave P8 as TODO, and also change comments into // TODO: enhance the condition for subtarget before pwr9

14596

Because this transformation replacement are 3 instrs. So it's not profitable for scalar mul operation because scalar mul takes 5 cycles, but it's profitable for vector mul operation because vector mul takes 7 cycles.

jsji added inline comments.Mar 26 2019, 7:39 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14594

OK for me.

14596

yes, but my point is your comment here is misleading.

How about something like:

`mul is 5(scalar)/7(vector) cycle, add/sub/shl are all 2 cycle. For 2 instrs patterns, add/sub + shl are 4 cycles, it is always profitable; but for 3 instrs patterns (mul x, -(2^N + 1)) => -(add (shl x, N), x) , sub+add+shl are 6 cycles, so we should only do it for vector type.`
wuzish updated this revision to Diff 192413.Mar 26 2019, 11:27 PM

wuzish marked 2 inline comments as done and an inline comment as not done.Mar 26 2019, 11:30 PM
wuzish updated this revision to Diff 192587.Mar 28 2019, 2:39 AM

wuzish marked 2 inline comments as done.Mar 28 2019, 2:42 AM

Gentle pin.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14594

I actually find P8 cycles information and update related test result to accept P8.

jsji accepted this revision.Mar 28 2019, 7:59 AM

LGTM. Thanks for exploiting.

This revision is now accepted and ready to land.Mar 28 2019, 7:59 AM
wuzish updated this revision to Diff 192767.Mar 28 2019, 7:24 PM
wuzish marked an inline comment as done.

rebase patch based trunk because of out-of-date.

This revision was automatically updated to reflect the committed changes.