The PowerPC code generator currently scalarizes vector truncates that would fit in a vector register, resulting in vector extracts, scalar operations, and vector merges. This patch custom lowers a vector truncate that would fit in a register to a vector shuffle instead.
This may be misleading to the reader as it suggests that the byte order within an element in the register is different on little endian systems. It might be clearer to use LLVM-like notation for vectors and write these as:
BE: <MSB1|LSB1, MSB2|LSB2, uu, uu, uu, uu, uu, uu> to <LSB1, LSB2, u, u, u, u, u, u, u, u, u, u, u, u, u, u>
LE: <uu, uu, uu, uu, uu, uu, MSB2|LSB2, MSB1|LSB1> to <u, u, u, u, u, u, u, u, u, u, u, u, u, u, LSB2, LSB1>
LGTM. Thanks for exploiting this! Some minor comments.
Maybe add a comment here about why only these 5 target VTs are supported?
Is the comment a little misleading here?
Maybe we can just use WideNumElts + 1 here?
Why do we require pwr9 here? I think this should apply to pwr8 and below as well.
Can we add this new testcase in an NFC patch first, then show ONLY the difference caused by this optimization here?
It would be great for others to see that this patch reduced the number of instructions from 33 to 1!
Maybe we should also add a check for xxswapd in BE?
LowerTRUNCATEVector has an if statement checking LE or BE, so even though we want the result to be the same in the end we have to test both LE and BE to test the whole function.
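To illustrate why the two endiannesses take different paths, here is a minimal standalone sketch of how a truncate can be expressed as a shuffle mask. It is not the code from this patch: `truncateShuffleMask` and its parameters are hypothetical names, the indices are in LLVM element order (element 0 at the lowest address), and using -1 for the don't-care lanes is an assumption made only for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not the patch's code: build a shuffle mask that keeps
// the least significant narrow lane of each wide source element.
//   trgNumElts  - number of elements being truncated
//   sizeMult    - source element size divided by target element size
//   wideNumElts - number of narrow lanes in a full vector register
std::vector<int> truncateShuffleMask(unsigned trgNumElts, unsigned sizeMult,
                                     unsigned wideNumElts,
                                     bool isLittleEndian) {
  assert(sizeMult >= 1 && trgNumElts * sizeMult <= wideNumElts);
  std::vector<int> mask;
  mask.reserve(wideNumElts);
  for (unsigned i = 0; i < trgNumElts; ++i) {
    // LE: the low part is the first narrow lane of wide element i.
    // BE: the low part is the last narrow lane of wide element i.
    unsigned lane = isLittleEndian ? i * sizeMult : (i + 1) * sizeMult - 1;
    mask.push_back(static_cast<int>(lane));
  }
  // The remaining lanes are don't-care; -1 marks them in this sketch.
  mask.resize(wideNumElts, -1);
  return mask;
}
```

For a two-element i16 to i8 truncate in a 16-byte register, `truncateShuffleMask(2, 2, 16, true)` selects lanes 0 and 2, while `truncateShuffleMask(2, 2, 16, false)` selects lanes 1 and 3. The truncated values are the same, but the masks differ, which is why each branch needs its own test coverage.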