This is an archive of the discontinued LLVM Phabricator instance.

[x86] match concat of 128-bit high half vectors before lowering to VPERM2X128
ClosedPublic

Authored by spatel on Jan 22 2020, 7:36 AM.

Details

Summary

shuffle (ins ?, X, C1), (ins ?, Y, C2), Mask --> concat X, Y

This is another shuffle problem seen with PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024

We have a small gap across legalization, lowering, combining, and demanded-elements simplification that allows forming a vperm2f128 of high halves with AVX1, when we could do better by peeking through the insert_subvector nodes. AFAICT, it requires IR like that shown in the diffs (vectors much larger than the legal width) to avoid all of the usual folds.
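To illustrate why the fold is sound, here is a minimal standalone sketch that models the 128-bit lane selection on plain arrays. It is not the patch itself: the `Half`/`Vec` types, the helper names, and the choice of immediate 0x31 (low result lane from the first operand's high half, high result lane from the second operand's high half) are assumptions made for the example.

```cpp
#include <array>
#include <cstdint>

using Half = std::array<std::uint32_t, 4>; // one 128-bit lane (4 x i32)
using Vec = std::array<Half, 2>;           // 256-bit vector: [0] = low, [1] = high

// Model of the VPERM2F128 immediate: bits [1:0] select the low result lane,
// bits [5:4] select the high result lane, and bits 3 and 7 zero a lane.
Vec vperm2x128(const Vec &a, const Vec &b, unsigned imm) {
  auto pick = [&](unsigned ctrl) -> Half {
    if (ctrl & 0x8)
      return Half{}; // zeroing bit set
    switch (ctrl & 0x3) {
    case 0: return a[0];
    case 1: return a[1];
    case 2: return b[0];
    default: return b[1];
    }
  };
  Vec result{};
  result[0] = pick(imm & 0xF);
  result[1] = pick((imm >> 4) & 0xF);
  return result;
}

// Hypothetical stand-in for "ins ?, Sub, C" with C pointing at the high half.
Vec insertHigh(Vec base, const Half &sub) {
  base[1] = sub;
  return base;
}

// Hypothetical stand-in for "concat X, Y".
Vec concat(const Half &lo, const Half &hi) {
  Vec result{};
  result[0] = lo;
  result[1] = hi;
  return result;
}

// True if shuffling the two high halves gives the same value as concat X, Y.
// The base vectors play the role of "?": their contents must not matter.
bool foldHolds(const Half &x, const Half &y, const Vec &baseA, const Vec &baseB) {
  Vec a = insertHigh(baseA, x);
  Vec b = insertHigh(baseB, y);
  return vperm2x128(a, b, 0x31) == concat(x, y);
}
```

Because only the high halves are read, the base vectors never reach the result, which is what lets the combine look through both insert_subvector nodes.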

Another option might be to allow forming the 256-bit vperm here and then doing custom shuffle combining on that opcode.

Event Timeline

spatel created this revision. Jan 22 2020, 7:36 AM
Herald added a project: Restricted Project. Jan 22 2020, 7:36 AM

This looks OK, but I'd be curious if we'd hit more cases by adding a vperm2f128(insert_subvector,insert_subvector) combine in combineTargetShuffle instead.

I drafted a combineTargetShuffle() alternative, and it doesn't cause any difference on existing regression tests. I'll post it here for comparison.

spatel updated this revision to Diff 239645. Jan 22 2020, 10:22 AM

Patch updated:
Move the transform to combineTargetShuffle(); no test diffs.
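
As a rough picture of what a combine on the target shuffle opcode looks like, the following sketch pattern-matches a toy expression tree. Everything here is hypothetical: `Node`, `Op`, `combineVperm`, and the other helpers are illustration-only names, not LLVM's SDNode/SDValue API, and the 0x31 immediate is assumed to be the mask that selects both high halves.

```cpp
#include <memory>
#include <utility>

// Toy expression node; a hypothetical stand-in for a DAG node.
enum class Op { Leaf, InsertSubvector, Vperm2x128, Concat };

struct Node {
  Op op = Op::Leaf;
  std::shared_ptr<Node> lhs; // InsertSubvector: base vector
  std::shared_ptr<Node> rhs; // InsertSubvector: inserted subvector
  unsigned imm = 0;          // InsertSubvector: element index; Vperm2x128: immediate
};
using NodePtr = std::shared_ptr<Node>;

NodePtr leaf() { return std::make_shared<Node>(); }

NodePtr make(Op op, NodePtr lhs, NodePtr rhs, unsigned imm = 0) {
  auto n = std::make_shared<Node>();
  n->op = op;
  n->lhs = std::move(lhs);
  n->rhs = std::move(rhs);
  n->imm = imm;
  return n;
}

// Returns the inserted subvector if n inserts into the high half, else null.
NodePtr highHalfInsert(const NodePtr &n, unsigned halfElts) {
  if (n && n->op == Op::InsertSubvector && n->imm == halfElts)
    return n->rhs;
  return nullptr;
}

// vperm2x128 (ins ?, X, C1), (ins ?, Y, C2), 0x31 --> concat X, Y
// Returns null when the pattern does not match.
NodePtr combineVperm(const NodePtr &n, unsigned halfElts) {
  if (!n || n->op != Op::Vperm2x128 || n->imm != 0x31)
    return nullptr;
  NodePtr x = highHalfInsert(n->lhs, halfElts);
  NodePtr y = highHalfInsert(n->rhs, halfElts);
  if (!x || !y)
    return nullptr;
  return make(Op::Concat, x, y);
}
```

Matching on the already-formed shuffle opcode keeps the check local to one node and its two operands, which is the appeal of doing it in a target shuffle combine.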

RKSimon accepted this revision. Jan 22 2020, 10:34 AM

LGTM - cheers

This revision is now accepted and ready to land. Jan 22 2020, 10:34 AM
This revision was automatically updated to reflect the committed changes.