This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Detect splats built with G_CONCAT_VECTORS
ClosedPublic

Authored by rovka on Jan 17 2023, 12:47 AM.

Details

Summary

Add support to the MI matching of vector splats for patterns that
consist of G_CONCAT_VECTORS of smaller splats with the same constant
value. With this, we would consider the following pseudo-MIR to be a splat:

%0 = G_[F]CONSTANT [...]
%1 = G_BUILD_VECTOR %0, %0, ..., %0
%2 = G_CONCAT_VECTORS %1, %1, ..., %1

Since it uses recursion for matching splats, it could match pretty
complicated patterns with all sorts of combinations of G_BUILD_VECTOR
and G_CONCAT_VECTORS (e.g. a G_CONCAT_VECTORS with
a G_BUILD_VECTOR_TRUNC and another G_CONCAT_VECTORS as operands),
and it should also look through copies etc.
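The recursive idea described above can be sketched roughly as follows. This is a self-contained toy model, not the code from this patch and not the LLVM API: `Op`, `Node`, and `getSplatValue` are hypothetical names made up for illustration, and undef elements are not modelled.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Toy stand-ins for generic MIR opcodes (hypothetical, not LLVM types).
enum class Op { Constant, Copy, BuildVector, ConcatVectors };

struct Node {
  Op Kind;
  int64_t Imm = 0;                    // Only meaningful for Op::Constant.
  std::vector<const Node *> Operands; // Sources for every other kind.
};

// Returns the constant that every lane of N holds, or std::nullopt if N is
// not a splat. A scalar constant is treated as the base case of the recursion.
std::optional<int64_t> getSplatValue(const Node *N) {
  if (N->Kind == Op::Constant)
    return N->Imm;

  if (N->Operands.empty())
    return std::nullopt;

  // Look through copies.
  if (N->Kind == Op::Copy)
    return getSplatValue(N->Operands.front());

  // BuildVector and ConcatVectors: every operand must itself be a splat
  // (or a constant), and all of them must agree on the same value.
  std::optional<int64_t> Splat;
  for (const Node *Operand : N->Operands) {
    std::optional<int64_t> Value = getSplatValue(Operand);
    if (!Value || (Splat && *Splat != *Value))
      return std::nullopt;
    Splat = Value;
  }
  return Splat;
}
```

Because vector-building nodes are handled uniformly, nesting such as a concat of a build-vector and another concat falls out of the recursion without special cases.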

This should make it easier to match complex immediates for certain
instructions on AMDGPU, where for instance a <8 x s16> will be split
before instruction selection into a G_CONCAT_VECTORS of <2 x s16>
splats.

Event Timeline

rovka created this revision. (Jan 17 2023, 12:47 AM)
Herald added a project: Restricted Project. (Jan 17 2023, 12:47 AM)
rovka requested review of this revision. (Jan 17 2023, 12:47 AM)
rovka added reviewers: Restricted Project, aemerson, paquette. (Jan 17 2023, 12:48 AM)
foad added a subscriber: foad. (Jan 17 2023, 3:20 AM)

> This should make it easier to match complex immediates for certain
> instructions on AMDGPU, where for instance a <8 x s16> will be split
> before instruction selection into a G_CONCAT_VECTORS of <2 x s16>
> splats.

I don't quite understand when this would be useful. If <8 x s16> is split up into <2 x s16> pieces during legalization, because only the latter is legal, then why would we still have a G_CONCAT_VECTORS with result type <8 x s16>? What operations would use that result? Do you have an example in mind?

rovka added a comment. (Jan 17 2023, 4:14 AM)

>> This should make it easier to match complex immediates for certain
>> instructions on AMDGPU, where for instance a <8 x s16> will be split
>> before instruction selection into a G_CONCAT_VECTORS of <2 x s16>
>> splats.
>
> I don't quite understand when this would be useful. If <8 x s16> is split up into <2 x s16> pieces during legalization, because only the latter is legal, then why would we still have a G_CONCAT_VECTORS with result type <8 x s16>? What operations would use that result? Do you have an example in mind?

There are some intrinsics that use large vector types, e.g. v16f16.

arsenm accepted this revision. (Jan 17 2023, 8:22 AM)
arsenm added a subscriber: arsenm.

This suggests to me that we're missing some useful vector combines, but I don't see a problem with having this handle this case as well.

llvm/unittests/CodeGen/GlobalISel/PatternMatchTest.cpp, line 810

Could use a case with some undefs in it (an undef simple element and an undef vector component).

This revision is now accepted and ready to land. (Jan 17 2023, 8:22 AM)
rovka added inline comments. (Jan 18 2023, 1:28 AM)
llvm/unittests/CodeGen/GlobalISel/PatternMatchTest.cpp, line 810

Will do, thanks.

rovka updated this revision to Diff 490075. (Jan 18 2023, 1:39 AM)

Add tests with undef

This revision was landed with ongoing or failed builds. (Jan 18 2023, 1:56 AM)
This revision was automatically updated to reflect the committed changes.