A bug is reported in bug-45628: the swap_with_shift case isn't matched to a single HW instruction, xxswapd, as expected. Tests have been added on master in D81073.
MatchRotate handles an 'or' of two operands and generates a rot[lr] if the case matches the rotate idiom. However, PPC doesn't support ROTL v1i128, so we custom lower ROTL v1i128 to a vector_shuffle. The vector_shuffle is then matched to a single HW instruction during instruction selection.
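To illustrate the idiom in question, here is a minimal C++ sketch that models a 128-bit value as two 64-bit doublewords and shows that the 'or' of a left shift by 64 and a logical right shift by 64 is just a doubleword swap. The `U128` struct and all function names are hypothetical and exist only for this illustration; they are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a v1i128 value: high and low 64-bit doublewords.
struct U128 {
    std::uint64_t hi;
    std::uint64_t lo;
};

// x << 64: the low doubleword moves to the high half, the low half becomes zero.
U128 shl64(U128 x) { return {x.lo, 0}; }

// x >> 64 (logical): the high doubleword moves to the low half, the high half becomes zero.
U128 lshr64(U128 x) { return {0, x.hi}; }

// Bitwise 'or' of two 128-bit values.
U128 bitOr(U128 a, U128 b) { return {a.hi | b.hi, a.lo | b.lo}; }

// The rotate idiom: (x << 64) | (x >> 64), i.e. a rotate left by 64 bits.
U128 swapWithShift(U128 x) { return bitOr(shl64(x), lshr64(x)); }

// The same result written directly as a swap of the two doublewords.
U128 swapDoublewords(U128 x) { return {x.lo, x.hi}; }

// Small self-check: both formulations agree on a sample value.
void checkSwapIdiom() {
    U128 a = swapWithShift({0x1111, 0x2222});
    U128 b = swapDoublewords({0x1111, 0x2222});
    assert(a.hi == b.hi && a.lo == b.lo);
}
```

Because the two formulations always agree, a rotate by 64 bits can be expressed as a pure permutation of the input, which is the kind of operation a vector shuffle describes.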
Details
- Reviewers: steven.zhang, jsji, nemanjai, lkail
- Group Reviewers: Restricted Project
- Commits: rGad6024e29fe7: [PowerPC] Custom lower rotl v1i128 to vector_shuffle.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
9628 | Can we use std::iota and std::rotate to simplify such simulated rotation? |
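As a rough sketch of what this suggestion might look like, the following builds a 16-entry byte shuffle mask by filling it with 0..15 via `std::iota` and then rotating it with `std::rotate`. The helper name `makeRotateMask` and the byte-granular rotate amount are assumptions made for illustration; the element order the actual patch needs (for example with respect to endianness) is not shown in this review and is not modeled here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <numeric>

// Hypothetical helper: build a 16-entry byte shuffle mask whose first entry
// is rotateBytes, followed by the remaining indices in wrapped order.
std::array<int, 16> makeRotateMask(unsigned rotateBytes) {
    std::array<int, 16> mask{};
    // Identity mask: 0, 1, ..., 15.
    std::iota(mask.begin(), mask.end(), 0);
    // Rotate so that element (rotateBytes % 16) becomes the first element.
    std::rotate(mask.begin(), mask.begin() + (rotateBytes % 16), mask.end());
    return mask;
}

// Small self-check: a rotate by 8 bytes swaps the two 8-byte halves.
void checkRotateMask() {
    std::array<int, 16> mask = makeRotateMask(8);
    assert(mask[0] == 8 && mask[7] == 15);
    assert(mask[8] == 0 && mask[15] == 7);
}
```

With a rotate amount of 8 bytes, the mask selects bytes 8..15 followed by bytes 0..7, which corresponds to swapping the two doublewords.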
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
9622 | Returning SDValue() when lowering the ROTL will cause problems. I would change it to an assertion. | |
9634 | nit: return Shuffle; | |
llvm/test/CodeGen/PowerPC/pr45628.ll | ||
231–232 | I believe they are used to move the VSR to two GPRs. And yes, we need to take a closer look to see if there is a better way to handle it. |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
9622 | assert(Op.getValueType() == MVT::v1i128 && "unexpected MVT type") | |
llvm/test/CodeGen/PowerPC/pr45628.ll | ||
231–232 | Is there any way to only lower to scalar if the value does not need to be in a vector register like it does here? Also, why does the vector code here look better than the version in my PR?: swap_with_shift: # @swap_with_shift xxspltd 35, 34, 1 xxswapd 34, 34 xxlxor 0, 0, 0 <=== the version on left does not need this xor xxpermdi 35, 35, 0, 1 xxpermdi 34, 0, 34, 1 xxlor 34, 35, 34 blr |
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
9622 | Thx~ assertions have been added. | |
9624 | Thanks for reminding. | |
llvm/test/CodeGen/PowerPC/pr45628.ll | ||
231–232 | I will take a look into it, thx. Because the llc option -mcpu=pwr9 was added. |
This reminds me that we should not ignore pwr8. Please add RUN lines for pwr8, thanks.
Yeah, they seem to be similar problems. I may post another patch to solve this; thanks for the comment.
It seems there is no gain in transforming SHL/SRL v1i128 to vector_shuffle. https://godbolt.org/z/CfoPNP
It can be done with xor to produce a zero vector; however, I can't get the code-gen to generate something that doesn't use the TOC.
We can use xor for the zero vector, but the mask has to be loaded from the constant pool. So I don't see the benefit of the transformation in this case.