This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Code generation for fixed length vector truncates.
ClosedPublic

Authored by paulwalker-arm on Jul 8 2020, 6:54 AM.

Details

Summary

Lower fixed length vector truncates to a sequence of SVE UZP1 instructions.

Diff Detail

Event Timeline

paulwalker-arm created this revision.Jul 8 2020, 6:54 AM

Probably worth adding a testcase for truncating from <4 x i64> to <4 x i8>.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1072

This specifically applies to the result type. You might want to note that you're implicitly depending on the fact that we do custom legalization for NEON TRUNCATE operations for other reasons.

paulwalker-arm marked an inline comment as done.Jul 8 2020, 1:23 PM
paulwalker-arm added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1072

I wondered about that. Do you think it would be better if I was just explicit and add the necessary setOperation calls even though they're duplicates?

cameron.mcinally marked an inline comment as done.Jul 8 2020, 1:30 PM
cameron.mcinally added a subscriber: cameron.mcinally.
cameron.mcinally added inline comments.
llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll
214

Just passing by and had to comment that this is an expensive truncate. It's around 5x slower than the equivalent on x86, with 1/2 the throughput.

Might be a good candidate for a dedicated hardware instruction on future SVE revisions...

efriedma added inline comments.Jul 8 2020, 2:09 PM
llvm/test/CodeGen/AArch64/sve-fixed-length-trunc.ll
214

We could experiment with using tbl or compact, if this comes up in practice.

Probably worth adding a testcase for truncating from <4 x i64> to <4 x i8>.

I'm happy to add this but just wanted to query what it gives. <4 x i8> is not a legal type so the test just exercises the same truncate path as <4 x i64> to <4 x i16>, or is this what you want protected (i.e. ensure the bytes remain where they're expected to be).

Made custom lowering for all truncates explicit. Added test for trunc_v4i64_v4i8 and tighten up the register based tests.

efriedma accepted this revision.Jul 9 2020, 12:16 PM

LGTM

I'm happy to add this but just wanted to query what it gives. <4 x i8> is not a legal type so the test just exercises the same truncate path as <4 x i64> to <4 x i16>, or is this what you want protected (i.e. ensure the bytes remain where they're expected to be).

That's fine; I just want to test that it doesn't get caught in the custom lowering and crash somehow.

This revision is now accepted and ready to land.Jul 9 2020, 12:16 PM
This revision was automatically updated to reflect the committed changes.