PowerPC has instructions that load a vector in big-endian element order even on a little-endian target.
So we can combine a vector load followed by an element reverse into a single big-endian load, which eliminates the swap instruction. Likewise, we can combine a vector reverse followed by a store into a big-endian store.
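The equivalence behind the combine can be modelled in plain C++. This is only a sketch under stated assumptions: `Vec4`, `loadNaturalOrder`, `reverseElements` and `loadBigEndianElementOrder` are hypothetical names for illustration, not functions from the patch, and the model looks only at element order, not at byte order within an element.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 4 x i32 vector register model.
using Vec4 = std::array<std::uint32_t, 4>;

// Model of an ordinary vector load on a little-endian target:
// lane i receives memory element i.
Vec4 loadNaturalOrder(const std::uint32_t *mem) {
  Vec4 v{};
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = mem[i];
  return v;
}

// Model of the element reverse (the "swap" the patch wants to remove).
Vec4 reverseElements(Vec4 v) {
  std::reverse(v.begin(), v.end());
  return v;
}

// Model of a load in big-endian element order:
// lane i receives memory element N-1-i, with no separate reverse step.
Vec4 loadBigEndianElementOrder(const std::uint32_t *mem) {
  Vec4 v{};
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = mem[v.size() - 1 - i];
  return v;
}
```

In this model, `reverseElements(loadNaturalOrder(p))` and `loadBigEndianElementOrder(p)` produce the same lanes for any `p`, which is why the pair can be replaced by one operation. The store direction is symmetric.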
Details
- Reviewers
hfinkel nemanjai jsji
- Commits
rG66c320908ba0: recommit:[PowerPC] Eliminate loads/swap feeding swap/store for vector type by…
rL367516: recommit:[PowerPC] Eliminate loads/swap feeding swap/store for vector type by…
rG54d446f70e8a: revert r367382 because buildbot failure
rL367382: [PowerPC] Eliminate loads/swap feeding swap/store for vector type by using big…
Diff Detail
- Repository
rL LLVM
Event Timeline
Mostly good to me.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
13109 ↗ | (On Diff #211019) | Can this be a range-based for loop? |
13124 ↗ | (On Diff #211019) | Maybe we should check this before the more expensive IsElementReverse check? |
llvm/lib/Target/PowerPC/PPCISelLowering.h | ||
462 ↗ | (On Diff #211019) | This will overlap with LXVD2X above; it would be great to have a follow-up NFC patch to clean this up. |
llvm/test/CodeGen/PowerPC/load-shuffle-and-shuffle-store.ll | ||
8 ↗ | (On Diff #211019) | Can we add a RUN line for a big-endian target to make sure nothing is affected there? |
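The suggestion about check ordering can be illustrated with a small standalone sketch. The patch's actual `IsElementReverse` implementation is not shown in this thread, so `isElementReverseMask` below is an assumed definition (the shuffle mask selects the source elements in exactly reverse order), and `isCombineCandidate` with its `passesCheapCheck` parameter is a hypothetical stand-in for whatever inexpensive condition the comment refers to.

```cpp
#include <cstddef>
#include <vector>

// Assumed definition: a mask is an element reverse when entry i
// selects source element N-1-i for every i.
bool isElementReverseMask(const std::vector<int> &mask) {
  const std::size_t n = mask.size();
  if (n == 0)
    return false;
  for (std::size_t i = 0; i < n; ++i)
    if (mask[i] != static_cast<int>(n - 1 - i))
      return false;
  return true;
}

// Hypothetical candidate test: the cheap condition is evaluated first,
// so && short-circuits and the O(N) mask scan is skipped when it fails.
bool isCombineCandidate(bool passesCheapCheck, const std::vector<int> &mask) {
  return passesCheapCheck && isElementReverseMask(mask);
}
```

Putting the cheap test on the left of `&&` changes no results; it only avoids scanning the mask for nodes that would be rejected anyway.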
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
13109 ↗ | (On Diff #211019) | It's not easy, because the loop uses the reverse begin and end iterators. |
llvm/lib/Target/PowerPC/PPCISelLowering.h | ||
462 ↗ | (On Diff #211019) | Yes, it would be good to merge the implementation of LXVD2X/STXVD2X into LOAD_VEC_BE/STORE_VEC_BE. I will try to do this in another patch. |
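On the range-based for loop question: a loop over reverse iterators can still be written in range-based form with a small adapter. The sketch below uses only standard C++14; `ReversedView`, `reversed` and `collectReversed` are hypothetical names. LLVM's ADT library offers a comparable helper, `llvm::reverse`, in `llvm/ADT/STLExtras.h`, but whether it fits the loop at line 13109 is not something this thread establishes.

```cpp
#include <iterator>
#include <vector>

// Hypothetical adapter: exposes a container's reverse iterators
// through begin()/end() so that range-based for walks it backwards.
template <typename Container>
class ReversedView {
public:
  explicit ReversedView(Container &c) : c_(c) {}
  auto begin() const { return std::rbegin(c_); }
  auto end() const { return std::rend(c_); }

private:
  Container &c_;
};

template <typename Container>
ReversedView<Container> reversed(Container &c) {
  return ReversedView<Container>(c);
}

// Example use: copy the elements of `in` in reverse order.
std::vector<int> collectReversed(const std::vector<int> &in) {
  std::vector<int> out;
  for (int v : reversed(in))
    out.push_back(v);
  return out;
}
```

The adapter holds a reference, so the container must outlive the loop.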
Before P9, the PPCVSXSwapRemoval pass changes the element order for VSX, so the element order in a register is not always the standard order.
The optimization in this patch conflicts with PPCVSXSwapRemoval, so we cannot get a correct result when both are applied. The fix is to not perform this optimization before P9.
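The resulting guard can be sketched as a simple predicate. `TargetModel`, its fields and `mayCombineReverseWithMemOp` are hypothetical names, not the subtarget API used in the patch. The little-endian condition is an assumption inferred from the summary, which describes the combine only for little-endian targets.

```cpp
// Hypothetical description of the compilation target.
struct TargetModel {
  bool isLittleEndian;  // assumed: the combine targets little-endian only
  bool isPower9OrLater; // before P9 the combine is disabled, because
                        // PPCVSXSwapRemoval may alter the element order
};

// Returns true when the load+reverse / reverse+store combine may run.
bool mayCombineReverseWithMemOp(const TargetModel &target) {
  return target.isLittleEndian && target.isPower9OrLater;
}
```

With this guard, pre-P9 targets keep the original load/swap and swap/store sequences.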