This is an archive of the discontinued LLVM Phabricator instance.

[x86] improve AVX lowering of vector zext
ClosedPublic

Authored by spatel on Mar 25 2019, 9:33 AM.

Details

Summary

If we know the 2 halves of an oversized zext-in-reg are the same, don't create those halves independently.

I tried several different approaches to fold this, but it's difficult to get right during legalization. In the default path, we are creating a generic shuffle that looks like an unpack high, but it can get transformed into a different mask (a blend), so it's not straightforward to match that. If we try to fold after it actually becomes an X86ISD::UNPCKH node, we can't be sure what the operand node is - it might be a generic shuffle, or it could be some x86-specific op.

I thought we had some utility to determine if a mask had an any-size splat subset pattern, but I don't see it, so I wrote a small match mask helper for this 1 case.

From the test output, we should be doing something like this for SSE4.1 as well, but I'd rather leave that as a follow-up since it involves changing lowering actions.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Mar 25 2019, 9:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2019, 9:33 AM
RKSimon added inline comments.Mar 25 2019, 9:42 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
9899 ↗(On Diff #192122)

This might fail if one half has an undef mask value and the other doesn't and we then use the mask half with the undef (there's probably a better way to phrase that....) - we need to return a merged mask with the mask value set to the non-undef case. IIRC we used to have a helper to do this but it evolved into isRepeatedShuffleMask which assumes sublane offsets.

spatel marked an inline comment as done.Mar 25 2019, 9:55 AM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
9899 ↗(On Diff #192122)

Yes, good catch. I'll add another test to try to show that. Returning the 'stronger' half mask would make this patch more complex if I understand correctly, and I haven't actually seen that pattern. Ok, to just make the matcher more strict and fail if the undef lanes don't line up?

spatel updated this revision to Diff 192158.Mar 25 2019, 11:04 AM

Patch updated:

  1. Don't match undef lanes unless they exist on both halves of the mask.
  2. New test to show that difference.
RKSimon accepted this revision.Mar 27 2019, 11:24 AM

LGTM - someday shuffle combining will handle different sized source/destination registers (and we can properly combine from vinsertf128) but this is good enough for now.

This revision is now accepted and ready to land.Mar 27 2019, 11:24 AM
This revision was automatically updated to reflect the committed changes.