The PowerPC code generator currently scalarizes vector truncates whose source fits in a single vector register, producing a sequence of vector extracts, scalar operations, and vector merges. This patch instead custom lowers such a truncate to a vector shuffle.
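The lowering described above can be sketched outside of LLVM as a shuffle-mask computation. The C++ below is a hypothetical helper, not the patch's code or an LLVM API. Given the source and result element widths and the element count, it builds a mask over the narrow element type, treating the (possibly widened) source as one 128-bit register. It assumes, as LLVM shuffle masks do, that -1 marks a don't-care lane, and that the least significant part of a source element is its first narrow lane on little-endian targets and its last narrow lane on big-endian ones.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): builds the mask that implements a
// vector truncate as a single shuffle over the narrow element type.
// Lanes the truncate does not need are set to -1 ("undef").
//
//   srcEltBits   - width of each source element (e.g. 16 for <2 x i16>)
//   trgEltBits   - width of each result element (e.g. 8 for <2 x i8>)
//   numElts      - element count (same for source and result)
//   littleEndian - element layout of the target
std::vector<int> buildTruncShuffleMask(unsigned srcEltBits, unsigned trgEltBits,
                                       unsigned numElts, bool littleEndian) {
  assert(srcEltBits > trgEltBits && srcEltBits % trgEltBits == 0);
  assert(srcEltBits * numElts <= 128 &&
         "source must fit in one 128-bit vector register");

  const unsigned wideNumElts = 128 / trgEltBits;  // narrow lanes per register
  const unsigned ratio = srcEltBits / trgEltBits; // narrow lanes per source elt

  std::vector<int> mask(wideNumElts, -1); // every lane starts as undef
  for (unsigned i = 0; i < numElts; ++i) {
    // Keep only the least significant narrow lane of source element i:
    // the first of its lanes on LE, the last of its lanes on BE.
    mask[i] = littleEndian ? static_cast<int>(i * ratio)
                           : static_cast<int>((i + 1) * ratio - 1);
  }
  return mask;
}
```

For trunc <2 x i16> to <2 x i8>, this yields a 16-lane mask beginning {0, 2} on LE and {1, 3} on BE, with every other lane undef.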
Repository: rL LLVM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
6920 | (On Diff #180946) | This may mislead the reader, as it suggests that the byte order within an element in the register differs on little-endian systems. It might be clearer to use LLVM-like vector notation and write these as: BE: <MSB1\|LSB1, MSB2\|LSB2, uu, uu, uu, uu, uu, uu> to <LSB1, LSB2, u, u, u, u, u, u, u, u, u, u, u, u, u, u>; LE: <uu, uu, uu, uu, uu, uu, MSB2\|LSB2, MSB1\|LSB1> to <u, u, u, u, u, u, u, u, u, u, u, u, u, u, LSB2, LSB1> |
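To make the notation above concrete, the C++ sketch below models a 128-bit register as 16 byte lanes numbered the way LLVM numbers elements after a bitcast to v16i8 (the numbering a shuffle mask uses, not the register bit order shown in the reviewer's notation). It performs trunc <2 x i16> to <2 x i8> as a single byte shuffle. In this lane numbering, only the position of each halfword's LSB within its two lanes depends on endianness, so the shuffle picks lanes 0 and 2 on LE and lanes 1 and 3 on BE, and both yield the same two result bytes. The helper names are hypothetical; this is an illustration, not the patch's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// 16 byte lanes, indexed like the elements of a v16i8 in LLVM.
using Reg = std::array<uint8_t, 16>;

// Hypothetical helper: place two i16 values in halfword elements 0 and 1,
// using the target's byte order. Other lanes stay zero ("don't care").
Reg packTwoHalfwords(uint16_t e0, uint16_t e1, bool littleEndian) {
  Reg r{};
  const uint16_t elts[2] = {e0, e1};
  for (unsigned i = 0; i < 2; ++i) {
    const uint8_t msb = static_cast<uint8_t>(elts[i] >> 8);
    const uint8_t lsb = static_cast<uint8_t>(elts[i] & 0xFF);
    r[2 * i] = littleEndian ? lsb : msb;
    r[2 * i + 1] = littleEndian ? msb : lsb;
  }
  return r;
}

// Hypothetical helper: single-source byte shuffle; -1 lanes are undef
// (modeled here as zero).
Reg shuffleBytes(const Reg &src, const std::array<int, 16> &mask) {
  Reg out{};
  for (unsigned i = 0; i < 16; ++i)
    if (mask[i] >= 0)
      out[i] = src[static_cast<unsigned>(mask[i])];
  return out;
}

// trunc <2 x i16> to <2 x i8> as one byte shuffle that keeps each LSB.
Reg truncTwoHalfwords(uint16_t e0, uint16_t e1, bool littleEndian) {
  std::array<int, 16> mask;
  mask.fill(-1);
  mask[0] = littleEndian ? 0 : 1; // LSB of halfword 0
  mask[1] = littleEndian ? 2 : 3; // LSB of halfword 1
  return shuffleBytes(packTwoHalfwords(e0, e1, littleEndian), mask);
}
```

With inputs 0x1234 and 0xABCD, the first two result lanes are 0x34 and 0xCD on both LE and BE, which is the point of testing both branches: the masks differ, the truncated values do not.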
LGTM. Thanks for exploiting this! Some minor comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
645 | (On Diff #180946) | Maybe add a comment here explaining why only these five target VTs are supported? |
6928 | (On Diff #180946) | Is the comment here a little misleading? |
6949 | (On Diff #180946) | Maybe we can just use WideNumElts + 1 here? |
llvm/test/CodeGen/PowerPC/vec-trunc.ll | ||
2 | (On Diff #180946) | Why do we require pwr9 here? I think this should apply to pwr8 and earlier as well. |
16 | (On Diff #180946) | Can we add this new test case in an NFC patch first, and then show ONLY the difference caused by this optimization here? It would be great for others to see that this patch reduces the number of instructions from 33 to 1! |
18 | (On Diff #180946) | Maybe we should add the check for xxswapd in BE as well? |
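The suggestion at line 6949 relies on LLVM's shuffle-mask semantics: for a two-operand shuffle of N-lane vectors, an index below N selects from the first operand, an index in [N, 2N) selects from the second, and -1 marks an undefined lane. Assuming the lowering shuffles against an undef second operand (an assumption here, since the diff itself is not shown), any index into that operand yields an undefined lane, so a single constant such as WideNumElts + 1 would fill every unused lane just as well as a per-lane index. The C++ sketch below models only that rule; `canonicalizeMask` is a hypothetical helper, not an LLVM API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API) modeling two-operand shuffle masks:
// with numElts lanes per operand, index i < numElts reads lane i of the first
// operand, index numElts + j reads lane j of the second, and -1 is undef.
// If the second operand is undef, every index into it also yields an undef
// lane, so such indices can be rewritten to -1 without changing the result.
std::vector<int> canonicalizeMask(std::vector<int> mask, unsigned numElts,
                                  bool secondOperandIsUndef) {
  if (secondOperandIsUndef) {
    for (int &m : mask)
      if (m >= static_cast<int>(numElts))
        m = -1; // lane comes from the undef operand
  }
  return mask;
}
```

For example, with 4-lane operands, filling the unused lanes per lane ({0, 2, 6, 7}) or with the constant numElts + 1 ({0, 2, 5, 5}) canonicalizes to the same mask {0, 2, -1, -1}.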
llvm/test/CodeGen/PowerPC/vec-trunc.ll | ||
---|---|---|
16 | (On Diff #180946) | If we're doing this, I think it would be nice to see the entire codegen. Just generate the checks with utils/update_llc_test_checks.py. |
18 | (On Diff #180946) | LowerTRUNCATEVector branches on whether the target is LE or BE, so even though the end result should be the same, we have to test both LE and BE to cover the whole function. |
18 | (On Diff #180946) | Yes, we need to test both BE and LE, but do we need separate CHECK-BE/CHECK-LE prefixes if the results are the same? |
18 | (On Diff #180946) | Oh, okay, I see what you mean now. I could have saved a bunch of check lines. |