Lower fixed-length vector truncates to a sequence of SVE UZP1 instructions.
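As a rough model of the lowering summarised above, the C++ sketch below treats a vector register as little-endian bytes and repeats a UZP1 step that halves the element size each time. It assumes that each step passes the same register as both UZP1 operands and that lanes are laid out little-endian; under those assumptions, reinterpreting at half the width puts the low half of every old lane in an even-numbered element, which UZP1 then packs to the bottom of the register. The helpers `packLanes64`, `uzp1` and `truncateViaUzp1` are hypothetical and model data movement only; they are not LLVM code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A vector register modelled as raw little-endian bytes (hypothetical model).
using Reg = std::vector<uint8_t>;

// Pack 64-bit lanes into a little-endian byte register.
Reg packLanes64(const std::vector<uint64_t>& lanes) {
    Reg r;
    for (uint64_t v : lanes)
        for (int b = 0; b < 8; ++b)
            r.push_back(static_cast<uint8_t>(v >> (8 * b)));
    return r;
}

// UZP1 with elements of `esize` bytes: the even-numbered elements of `n`,
// followed by the even-numbered elements of `m`.
Reg uzp1(const Reg& n, const Reg& m, size_t esize) {
    Reg out;
    size_t count = n.size() / esize;
    for (size_t i = 0; i < count; i += 2)
        out.insert(out.end(), n.begin() + i * esize, n.begin() + (i + 1) * esize);
    for (size_t i = 0; i < count; i += 2)
        out.insert(out.end(), m.begin() + i * esize, m.begin() + (i + 1) * esize);
    return out;
}

// Truncate 64-bit lanes to dstBytes-wide lanes, one UZP1 per halving step.
std::vector<uint64_t> truncateViaUzp1(const std::vector<uint64_t>& src,
                                      size_t dstBytes) {
    assert(dstBytes == 1 || dstBytes == 2 || dstBytes == 4 || dstBytes == 8);
    Reg r = packLanes64(src);
    for (size_t esize = 8; esize > dstBytes;) {
        esize /= 2;             // reinterpret the lanes at half the width...
        r = uzp1(r, r, esize);  // ...and keep the low half of each old lane
    }
    // The truncated values now occupy the lowest src.size() lanes.
    std::vector<uint64_t> out;
    for (size_t lane = 0; lane < src.size(); ++lane) {
        uint64_t v = 0;
        for (size_t b = 0; b < dstBytes; ++b)
            v |= static_cast<uint64_t>(r[lane * dstBytes + b]) << (8 * b);
        out.push_back(v);
    }
    return out;
}
```

For example, truncating four i64 lanes to i8 takes three UZP1 steps in this model (64 to 32, 32 to 16, then 16 to 8 bits), and each result lane equals the low byte of the corresponding source lane.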
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
It's probably worth adding a test case for truncating <4 x i64> to <4 x i8>.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1063
This specifically applies to the result type. You might want to note that you're implicitly depending on the fact that we do custom legalization for NEON TRUNCATE operations for other reasons.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:1063
I wondered about that. Do you think it would be better to be explicit and add the necessary setOperationAction calls, even though they're duplicates?
llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll:213
Just passing by, but I had to comment that this is an expensive truncate: it's around 5x slower than the equivalent on x86, with half the throughput. It might be a good candidate for a dedicated hardware instruction in future SVE revisions.
llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll:213
We could experiment with using TBL or COMPACT if this comes up in practice.
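The TBL idea is only floated as an experiment in this thread. As a hedged illustration, assuming byte-element TBL semantics (an out-of-range index yields zero) and an index vector that has already been materialised, a single table lookup can gather the low byte(s) of every lane in one permute, where the UZP1 chain needs one instruction per halving. The helpers `packLanes`, `tblBytes` and `truncIndices` are hypothetical; the sketch models data movement only and says nothing about the relative cost on real hardware, including the cost of building the index vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A vector register modelled as little-endian bytes (hypothetical model).
using Reg = std::vector<uint8_t>;

// Pack lanes of laneBytes bytes each into a little-endian byte register.
Reg packLanes(const std::vector<uint64_t>& lanes, size_t laneBytes) {
    Reg r;
    for (uint64_t v : lanes)
        for (size_t b = 0; b < laneBytes; ++b)
            r.push_back(static_cast<uint8_t>(v >> (8 * b)));
    return r;
}

// Byte-element TBL: out[i] = table[idx[i]], or 0 when idx[i] is out of range.
Reg tblBytes(const Reg& table, const std::vector<size_t>& idx) {
    Reg out(idx.size(), 0);
    for (size_t i = 0; i < idx.size(); ++i)
        if (idx[i] < table.size())
            out[i] = table[idx[i]];
    return out;
}

// Index vector selecting the low dstBytes of each srcBytes-wide lane; every
// other position gets an out-of-range index, so TBL writes zero there.
std::vector<size_t> truncIndices(size_t lanes, size_t srcBytes,
                                 size_t dstBytes, size_t regBytes) {
    std::vector<size_t> idx(regBytes, SIZE_MAX);
    for (size_t lane = 0; lane < lanes; ++lane)
        for (size_t b = 0; b < dstBytes; ++b)
            idx[lane * dstBytes + b] = lane * srcBytes + b;
    return idx;
}
```

For a <4 x i64> to <4 x i8> truncate, `tblBytes(src, truncIndices(4, 8, 1, src.size()))` leaves the four low bytes in the first four byte lanes and zeros elsewhere.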
I'm happy to add this, but I wanted to ask what it gives us. <4 x i8> is not a legal type, so the test just exercises the same truncate path as <4 x i64> to <4 x i16>. Or is that what you want protected (i.e. ensuring the bytes remain where they're expected to be)?
Made custom lowering for all truncates explicit. Added a test for trunc_v4i64_v4i8 and tightened up the register-based tests.
LGTM
Re the <4 x i8> question: that's fine; I just want to test that it doesn't get caught in the custom lowering and crash somehow.
clang-tidy: warning: invalid case style for function 'LowerFixedLengthVectorTruncateToSVE' [readability-identifier-naming]