This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Custom lower vector_shuffle for v4i16/v4f16
ClosedPublic

Authored by arsenm on Jun 27 2019, 8:57 AM.

Details

Reviewers
rampitec
Summary

Ordinarily it is lowered as a build_vector of each extract_vector_elt,
which in turn get lowered to bitcasts and bit shifts. Very little
understands the lowered extract pattern, resulting in much worse
code. We treat concat_vectors of v2i16 as legal, so prefer that.
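
As a rough illustration of the idea in the summary, the sketch below models a 4-element shuffle mask in plain C++ (no LLVM headers) and describes the result as a concat_vectors of two 2-element halves. It is not the code from the patch: the names `HalfPlan`, `planHalf`, and `describeLowering` are hypothetical, the output strings are only a readable stand-in for DAG nodes, and undef mask elements are not handled.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical model, not the patch's code: how one 2-element half of a
// 4-element shuffle result could be produced.
struct HalfPlan {
  bool WholeSubvector; // true: the half is an aligned 2-element piece of an input
  int First;           // source element index for lane 0 of this half
  int Second;          // source element index for lane 1 of this half
};

// Mask holds source element indices (0..7 across the two 4-element inputs).
// Pair must be 0 or 1 and selects result lanes {0,1} or {2,3}.
HalfPlan planHalf(const std::array<int, 4> &Mask, int Pair) {
  assert(Pair == 0 || Pair == 1);
  int Lo = Mask[2 * Pair];
  int Hi = Mask[2 * Pair + 1];
  // An even starting index followed by its successor is an aligned pair,
  // so the whole 2-element piece can be taken at once.
  bool Contiguous = (Lo % 2 == 0) && (Hi == Lo + 1);
  return {Contiguous, Lo, Hi};
}

// Render the plan as text: a concat_vectors of two halves, each either a
// whole 2-element subvector or a 2-element build_vector of single elements.
std::string describeLowering(const std::array<int, 4> &Mask) {
  std::string Out = "concat_vectors(";
  for (int Pair = 0; Pair < 2; ++Pair) {
    HalfPlan P = planHalf(Mask, Pair);
    if (Pair != 0)
      Out += ", ";
    if (P.WholeSubvector)
      Out += "subvec" + std::to_string(P.First / 2);
    else
      Out += "build_vector(elt" + std::to_string(P.First) + ", elt" +
             std::to_string(P.Second) + ")";
  }
  return Out + ")";
}
```

In this model, a mask that swaps the two halves of one input maps to two whole subvectors, while a mask that reverses the first two lanes falls back to per-element extracts for that half only.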

Diff Detail

Event Timeline

arsenm created this revision. Jun 27 2019, 8:57 AM
rampitec accepted this revision. Jun 27 2019, 2:24 PM

LGTM with small suggestion.

lib/Target/AMDGPU/SIISelLowering.cpp
4750

It's better to swap the conditions. That way you will not read beyond the array even if you accidentally pass an odd index.
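
The code under review is not reproduced in this archive, so the following is only a guess at the kind of reordering meant here. It assumes a hypothetical helper `isAlignedContiguousPair` that takes a 4-element mask and a starting index in the range 0..3; because `&&` short-circuits, testing the parity of the index first means the `Mask[Elt + 1]` read never happens for an odd index such as 3.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper, not taken from the patch. Assumes 0 <= Elt < 4.
// The parity test on Elt is evaluated first; for an odd Elt the && stops
// there, so Mask[Elt + 1] (out of bounds when Elt == 3) is never read.
bool isAlignedContiguousPair(const std::array<int, 4> &Mask, int Elt) {
  return Elt % 2 == 0 && Mask[Elt + 1] == Mask[Elt] + 1;
}
```

With the operands in the opposite order, the array access would be evaluated before the parity check and an odd last index would read one element past the end.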

This revision is now accepted and ready to land. Jun 27 2019, 2:24 PM
arsenm updated this revision to Diff 207564. Jul 2 2019, 8:08 AM

Fix test failure

arsenm requested review of this revision. Jul 2 2019, 8:09 AM
This revision is now accepted and ready to land. Jul 2 2019, 11:37 AM
arsenm closed this revision. Jul 2 2019, 12:15 PM

r364959