Page MenuHomePhabricator

[PowerPC] suboptimal vec_abs for some cases on P9
ClosedPublic

Authored by jedilyn on Nov 20 2018, 11:58 PM.

Details

Summary

The current code sequence & its cycles for vec_abs look like:

signed char:           tot cycs: 3+2+3 = 8
    xxspltib 35, 128  
    vaddubm 2, 2, 3  
    vabsdub 2, 2, 3  

signed short:          tot cycs: 2+2+3+2+3 = 12
    lis 3, -32768      
    ori 3, 3, 32768    
    mtvsrws 35, 3      
    vadduhm 2, 2, 3  
    vabsduh 2, 2, 3  

signed word:           tot cycs: 2 (parallel 2) + 2 + 3 = 7
    vxor 3, 3, 3       
    xvnegsp 34, 34
    xvnegsp 35, 35
    vabsduw 2, 2, 3

It can be better like:

signed char:           tot cycs: 2 + 2 + 3 = 7
    xxlxor 35, 35, 35 
    vsububm 3, 3, 2
    vmaxsb 2, 2, 3

signed short:          tot cycs: 2 + 2 + 3 = 7
    xxlxor 35, 35, 35
    vsubuhm 3, 3, 2
    vmaxsh 2, 2, 3

signed word:           tot cycs: 2 + 3 = 5
    vnegw 0,2
    vmaxsw 2,0,2

The current implementation is intended to use absolute different instructions introduced in Power9, but it's for unsigned integers and we won't see any benefits with more efforts to support it for signed.

In this past, we always expand abs for vector type excepting for power9, this patch is to recognize vector abs as custom, then even if the code doesn't call vec_abs built-in function but the same semantic code sequence, the general combine pass can check and create abs node for vector type. Later if there is any opportunity we can further combine it to others like vabsdu and the abs node can be removed, otherwise it's lowed to vsub + vmax as before.

Diff Detail

Repository
rL LLVM

Event Timeline

jedilyn created this revision.Nov 20 2018, 11:58 PM
jedilyn edited the summary of this revision. (Show Details)Nov 21 2018, 2:06 AM
jsji requested changes to this revision.Dec 13 2018, 10:25 AM

I like the idea here overall, but looks like we are squeezing several similar and related work into one patch. I would prefer we split them into different patches. Thanks!

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
657 ↗(On Diff #174874)

Can we add some comment here for why we need to handle v2i64 differently? due to current subtarget doesn't support smax v2i64?

9572 ↗(On Diff #174874)

Which SMAX patch? can you be specific? Has it landed now?

13267 ↗(On Diff #174874)

Although combineVSelect is also for VABSD, this looks like another independent optimization, can we split this into another patch?

14526 ↗(On Diff #174874)

Can we add some comments before this function, to show what it will transform from & to?

14562 ↗(On Diff #174874)

Can we add some comments before this function, to show what it will transform from & to?

llvm/lib/Target/PowerPC/PPCISelLowering.h
381 ↗(On Diff #174874)

Can we have some example code to show why we need to patch the bit, similar to old code for ISD::ABS

// For abs(sub(a, b)), we generate VABSDUW(a+0x80000000, b+0x80000000).
...
382 ↗(On Diff #174874)

Since this node will only handle vector types, maybe we should rename it to VABSD?

llvm/lib/Target/PowerPC/PPCInstrVSX.td
4048 ↗(On Diff #174874)

Can we add comment here for the last operand

llvm/test/CodeGen/PowerPC/ppc64-P9-vabsd.ll
1 ↗(On Diff #174874)

Can we add -ppc-vsr-nums-as-vr -ppc-asm-full-reg-names in a NFC patch before this?

5 ↗(On Diff #174874)

Why we remove Function Attrs here? If this is no relevant clean up work, please do it in a NFC patch before this. Thanks.

103 ↗(On Diff #174874)

This can be in the new patch for combineVSelect.

470 ↗(On Diff #174874)

This can be in new combineVselect patch as well.

llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
22 ↗(On Diff #174874)

are these CHECK -> CHECK-DAG relevant to this change? Or just non-relevant NFC change?If so , please commit before this patch. Thanks.

This revision now requires changes to proceed.Dec 13 2018, 10:25 AM
jedilyn marked 12 inline comments as done.Dec 14 2018, 12:35 AM

Hi Jinsong @jsji, thanks for your time! I'll break down it to some separate patches and add more comments later.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
657 ↗(On Diff #174874)

Yes, I'll add more comments.

9572 ↗(On Diff #174874)

The one is D47332, it isn't landed yet. I'll add more comments here.

13267 ↗(On Diff #174874)

Sure.

14526 ↗(On Diff #174874)

Sure.

14562 ↗(On Diff #174874)

Sure.

llvm/lib/Target/PowerPC/PPCISelLowering.h
381 ↗(On Diff #174874)

Good idea, I'll add more comments here.

382 ↗(On Diff #174874)

OK, I will use VABSDinstead.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
4048 ↗(On Diff #174874)

Since we have added more comments to describe the last operand usage for ABSD in target specific ISDNODE header file, does it make sense just leaving this here? Although to supplement more comments here is pretty easy, it probably brings more efforts to maintain the description and keep them always sync in different places.

llvm/test/CodeGen/PowerPC/ppc64-P9-vabsd.ll
1 ↗(On Diff #174874)

Good idea. I'll split it.

103 ↗(On Diff #174874)

Yes, thanks!

470 ↗(On Diff #174874)

OK

llvm/test/CodeGen/PowerPC/pre-inc-disable.ll
22 ↗(On Diff #174874)

Good point, thanks. It can be a NFC patch to make the case more flexible. Technically speaking, the original case isn't very robust, the order and assigned register may change.

jedilyn marked an inline comment as done.Dec 14 2018, 12:41 AM
jedilyn added inline comments.
llvm/test/CodeGen/PowerPC/ppc64-P9-vabsd.ll
5 ↗(On Diff #174874)

I was told that don't keep those irrelevant attributes in the ll case and only leave what is really necessary, I tried to follow it and believed they aren't required for this case, so I cleaned it when updated this file. Good idea to make it in a NFC patch.

inouehrs added inline comments.Dec 14 2018, 12:47 AM
llvm/test/CodeGen/PowerPC/ppc64-P9-vabsd.ll
4 ↗(On Diff #174874)

It is nice to have a test case for pwr7, i.e. hasP8Altivec = false, to confirm that v2i64 is not optimized but others are optimized.

jedilyn marked an inline comment as done.Dec 14 2018, 1:07 AM
jedilyn added inline comments.
llvm/test/CodeGen/PowerPC/ppc64-P9-vabsd.ll
4 ↗(On Diff #174874)

Good point, thanks @inouehrs ! I'll add one case for it.

jedilyn updated this revision to Diff 178429.EditedDec 17 2018, 12:08 AM

Addressed the comments of Jinsong and Hiroshi, will submit VSELECT related exploitation patch once this gets approved.
The irrelevant test case updates have been done by rL349325 and rL349329.

jsji accepted this revision.Dec 17 2018, 2:57 PM

LGTM, thanks for exploiting vabs!

This revision is now accepted and ready to land.Dec 17 2018, 2:57 PM
jsji retitled this revision from [Power9] suboptimal vec_abs for some cases to [PowerPC] suboptimal vec_abs for some cases on P9.Dec 17 2018, 3:05 PM
This revision was automatically updated to reflect the committed changes.