Page MenuHomePhabricator

[PowerPC] Strength reduction of multiply by a constant by shift and add/sub in place
ClosedPublic

Authored by wuzish on Mar 4 2019, 7:41 PM.

Details

Summary

A shift and add/sub sequence combination is faster in place of a multiply by constant. Because the cycle or latency of multiply is not huge, we only consider such following worthy patterns.

(mul x, 2^N + 1) => (add (shl x, N), x)
(mul x, -(2^N + 1)) => -(add (shl x, N), x)
(mul x, 2^N - 1) => (sub (shl x, N), x)
(mul x, -(2^N - 1)) => (sub x, (shl x, N))

And the cycles or latency is subtarget-dependent so that we need consider the subtarget to determine to do or not do such transformation. Also data type is considered for different cycles or latency to do multiply.

Diff Detail

Repository
rL LLVM

Event Timeline

wuzish created this revision.Mar 4 2019, 7:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2019, 7:42 PM

gentle pin...

jsji added a comment.Mar 15 2019, 2:48 PM

Comments mostly related to comments and testcases.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14584 ↗(On Diff #189264)

condion => condition, and move to before 'return'?

14588 ↗(On Diff #189264)

This comments is confusing: all four cases are using vector mul, so "because vector mul has more cycles" can not explain why we want to do the case of IsAddone && IsNeg for vector type only.

Can we add comments about the cycles before and after combine, maybe a table so that it will clear about why the combine is profitable for specific type/pattern here ?
eg:

  type         mul     add  shl   add+shl .  sub+add+shl      sub+shl
 scalar         ?         ?       ? 
vector          ?          ?     ?
14601 ↗(On Diff #189264)

Maybe we should move these comments after if similar to else if below?

llvm/test/CodeGen/PowerPC/mul-const-vector.ll
9 ↗(On Diff #189264)

Looks like vsplitsb/xxsplitb is the only difference for P8/P9, you might be better using common prefixes to share (and reduce) checks.

... -mcpu=pwr9 ... --check-prefixes=CHECK,CHECK-P9
... -mcpu=pwr8 --check-prefixes=CHECK,CHECK-P8

; CHECK-LABEL: test1_v16i8:
; CHECK-P8:vspltisb v[[REG1:[0-9]+]], 4
; CHECK-P9: xxspltib v[[REG1:[0-9]+]], 4
; CHECK-NOT: vmul
; CHECK-NEXT: vslb v[[REG2:[0-9]+]], v2, v[[REG1]]
wuzish marked 4 inline comments as done.Mar 19 2019, 12:10 AM
wuzish added inline comments.
llvm/test/CodeGen/PowerPC/mul-const-vector.ll
9 ↗(On Diff #189264)

Thank you. It's good to know the usage of two prefixes. --check-prefixes=CHECK,CHECK-P8

wuzish updated this revision to Diff 191253.Mar 19 2019, 12:13 AM

Address comments and simplify the check pattern by using multiple prefixes.

jsji added a comment.Mar 25 2019, 8:07 AM

Thanks for updating! @wuzish However, I am even more confused by your new comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14591 ↗(On Diff #191253)

These are for P9 right? I believe P8 is different no?

14593 ↗(On Diff #191253)

? I am confused: the combination here is doing strength reduction from mul to shl. Why we are comparing the cycles between vector type and scalar type? Should we compare the cycles for mul seq and shl seq for the SAME type?

wuzish marked 2 inline comments as done.Mar 25 2019, 7:23 PM
wuzish added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14591 ↗(On Diff #191253)

Hmm, well. Actually I only have cycles information table about P9, so we can only do for P9 and leave P8 as TODO, and also change comments into // TODO: enhance the condition for subtarget before pwr9

14593 ↗(On Diff #191253)

Because this transformation replacement are 3 instrs. So it's not profitable for scalar mul operation because scalar mul takes 5 cycles, but it's profitable for vector mul operation because vector mul takes 7 cycles.

jsji added inline comments.Mar 26 2019, 7:39 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14591 ↗(On Diff #191253)

OK for me.

14593 ↗(On Diff #191253)

yes, but my point is your comment here is misleading.

How about something like:

mul is 5(scalar)/7(vector) cycle, add/sub/shl are all 2 cycle. For 2 instrs patterns, add/sub + shl are 4 cycles, it is always profitable; but for 3 instrs patterns (mul x, -(2^N + 1)) => -(add (shl x, N), x) , sub+add+shl are 6 cycles, so we should only do it for vector type.
wuzish updated this revision to Diff 192413.Mar 26 2019, 11:27 PM

update code comments.

wuzish marked 2 inline comments as done and an inline comment as not done.Mar 26 2019, 11:30 PM
wuzish updated this revision to Diff 192587.Mar 28 2019, 2:39 AM

address comments.

wuzish marked 2 inline comments as done.Mar 28 2019, 2:42 AM

Gentle pin.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14591 ↗(On Diff #191253)

I actually find P8 cycles information and update related test result to accept P8.

jsji accepted this revision.Mar 28 2019, 7:59 AM

LGTM. Thanks for exploiting.

This revision is now accepted and ready to land.Mar 28 2019, 7:59 AM
wuzish updated this revision to Diff 192767.Mar 28 2019, 7:24 PM
wuzish marked an inline comment as done.

rebase patch based trunk because of out-of-date.

This revision was automatically updated to reflect the committed changes.