This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Custom lower rotl v1i128 to vector_shuffle.
ClosedPublic

Authored by Esme on Jun 3 2020, 3:19 AM.

Details

Summary

A bug is reported by bug-45628, where the swap_with_shift case can't be matched to a single HW instruction xxswapd as expected. Tests at master have been added in D81073.
We have MatchRotate to handle an 'or' of two operands and generate a rot[lr] if the case matches the idiom of rotate. While PPC doesn't support ROTL v1i128. We can custom lower ROTL v1i128 to the vector_shuffle. The vector_shuffle will be matched to a single HW instruction during the phase of instruction selection.

Diff Detail

Event Timeline

Esme created this revision.Jun 3 2020, 3:19 AM
lkail added a subscriber: lkail.Jun 3 2020, 4:00 AM
lkail added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9628

Can we use std::iota and std::rotate to simplify such simulated rotation?

Esme updated this revision to Diff 268352.Jun 3 2020, 7:28 PM

Use std::iota and std::rotate to simplify vector rotation.

Esme updated this revision to Diff 268360.Jun 3 2020, 9:47 PM
Esme marked an inline comment as done.

Change the diff base from master to D81073.

Esme edited the summary of this revision. (Show Details)Jun 3 2020, 9:49 PM
lkail added inline comments.Jun 3 2020, 10:18 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9624

I suppose we still need to peekThroughBitcasts Op0.

llvm/test/CodeGen/PowerPC/pr45628.ll
231–232

New sequence looks getting more memory ops than original one.

steven.zhang added inline comments.Jun 4 2020, 1:01 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9622

It will have problem if return SDValue() when lower the ROTL. I would change it as assertion.

9634

nit:
if (SDValue Shuffle = xxxx)

return Shuffle;
llvm/test/CodeGen/PowerPC/pr45628.ll
231–232

I believe they are using to move the VSR to two GPR. And yes, we need to take a more look to see if there is any better way to handle it.

shawnl added a subscriber: shawnl.Jun 5 2020, 1:43 PM
shawnl added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9622

assert(Op.getValueType() == MVT::v1i128 && "unexpected MVP type")

llvm/test/CodeGen/PowerPC/pr45628.ll
231–232

Is there any way to only lower to scalar if the value does not need to be in a vector register like it does here? Also, why does the vector code here look better than the version in my PR?:

swap_with_shift:                        # @swap_with_shift
        xxspltd 35, 34, 1
        xxswapd 34, 34
        xxlxor 0, 0, 0 <=== the version on left does not need this xor
        xxpermdi 35, 35, 0, 1
        xxpermdi 34, 0, 34, 1
        xxlor 34, 35, 34
        blr
Esme updated this revision to Diff 269048.Jun 7 2020, 6:10 AM

Add assertions.

Esme marked 7 inline comments as done.Jun 7 2020, 11:59 PM
Esme added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9622

Thx~ assertions have been added.

9624

Thanks for reminding.

llvm/test/CodeGen/PowerPC/pr45628.ll
231–232

Is there any way to only lower to scalar if the value does not need to be in a vector register like it does here?

I will take a look into it, thx.

why does the vector code here look better than the version in my PR?

Because the llc option -mcpu=pwr9 was added.

lkail added a comment.Jun 8 2020, 5:04 AM

Because the llc option -mcpu=pwr9 was added.

This reminds me we should not ignore pwr8. Please add RUN lines for pwr8, thanks.

Esme updated this revision to Diff 270089.Jun 11 2020, 2:59 AM
Esme marked an inline comment as done.

Add RUN lines for pwr8.

steven.zhang accepted this revision.Jun 11 2020, 3:52 AM

LGTM now, But please hold on several days to see if @shawnl and @lkail have more comments.

This revision is now accepted and ready to land.Jun 11 2020, 3:52 AM

What about shr and ashl?

a = b << 64;
or
a = b >> 64;

Esme added a comment.Jun 17 2020, 9:20 AM

What about shr and ashl?

a = b << 64;
or
a = b >> 64;

Yeah, they seem to be similar problems, I may post another patch to solve this problem, thanks for comment.

Also non-powers-of-8 can be handled https://godbolt.org/z/6Nww6p

lkail accepted this revision.Jun 17 2020, 6:06 PM

LGTM

This revision was automatically updated to reflect the committed changes.
Esme added a comment.Jun 23 2020, 11:59 PM

What about shr and ashl?

a = b << 64;
or
a = b >> 64;

It seams no gain to transform SHL/SRL v1i128 to vector_shuffle. https://godbolt.org/z/CfoPNP

It can be done with xor to produce a zero vector, however I can't get the code-gen to generate something that doesn't use the TOC.

It can be done with xor to produce a zero vector, however I can't get the code-gen to generate something that doesn't use the TOC.

We can use the xor for zero vector, but for the mask, we have to load it from constant pool. So, I don't see the benefit with the transformation for this case.