This is an archive of the discontinued LLVM Phabricator instance.

[X86] Teach lowerShuffleAsBlend to use bit blend for v16i8/v32i8/v16i16 when avx512vl is enabled but not avx512bw.
ClosedPublic

Authored by craig.topper on Jul 3 2020, 11:06 PM.

Details

Summary

Probably not super important since there are no real CPUs with
avx512vl and not avx512bw. But vpternlog should be better than
vpblendvb.
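
As a rough illustration of why a single vpternlog can stand in for a bit blend, the sketch below models one 8-bit lane in portable C++ (no intrinsics). The helper names `ternlog8`, `bitBlend8`, `kSelectImm`, and `checkSelectImmediate` are hypothetical and not taken from the patch, and treating the mask as the first ternlog operand is an assumption made for the example. Under that assumption, the immediate 0xCA encodes the truth table for "first ? second : third".

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 8-bit lane of a ternary-logic op: at every bit
// position the three input bits form an index (a<<2 | b<<1 | c) into the
// 8-bit immediate, and the indexed immediate bit becomes the result bit.
inline uint8_t ternlog8(uint8_t a, uint8_t b, uint8_t c, uint8_t imm) {
  uint8_t result = 0;
  for (int bit = 0; bit < 8; ++bit) {
    unsigned idx = (((a >> bit) & 1u) << 2) | (((b >> bit) & 1u) << 1) |
                   ((c >> bit) & 1u);
    result |= static_cast<uint8_t>(((imm >> idx) & 1u) << bit);
  }
  return result;
}

// Classic bit blend: bits of x where mask is set, bits of y elsewhere.
inline uint8_t bitBlend8(uint8_t mask, uint8_t x, uint8_t y) {
  return static_cast<uint8_t>((mask & x) | (~mask & y));
}

// Assumption for this sketch: the mask is the first operand, so the
// immediate is the truth table of "a ? b : c", which is 0xCA.
constexpr uint8_t kSelectImm = 0xCA;

// The operation is bitwise, so checking the 8 single-bit input
// combinations is enough to show the two forms agree.
inline void checkSelectImmediate() {
  for (unsigned idx = 0; idx < 8; ++idx) {
    uint8_t m = static_cast<uint8_t>((idx >> 2) & 1u);
    uint8_t x = static_cast<uint8_t>((idx >> 1) & 1u);
    uint8_t y = static_cast<uint8_t>(idx & 1u);
    assert(ternlog8(m, x, y, kSelectImm) == bitBlend8(m, x, y));
  }
}
```

With a mask of 0xF0, x = 0xAA and y = 0x55, both helpers return 0xA5: the high nibble comes from x and the low nibble from y.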

I do wonder if we should use vpternlog even with BWI. We
currently use vpblendmb or vpblendmw by putting the mask into a GPR
and moving it to a k-register. But I don't think we hoist the
GPR to k-register copy in machine LICM. Using VPTERNLOG would use
a constant pool load, but has the advantage that we're pretty good
at hoisting and rematerializing those.
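
To make the two mask forms concrete, here is a minimal sketch for a 16-element byte blend. It is not the code in the patch: `blendMaskToKMask` and `blendMaskToByteMask` are hypothetical helpers, the convention that index `i + 16` selects the second input follows the usual shuffle-mask numbering, and undef mask elements are deliberately not handled. The first helper yields the integer that would be materialized in a GPR and copied to a k-register; the second yields the per-byte constant that a bit blend would load from the constant pool.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kNumElts = 16;  // v16i8, assumed for this sketch

// k-register form: one bit per element, set when the element is taken
// from the second input (shuffle index i + kNumElts).
inline uint16_t blendMaskToKMask(const std::array<int, kNumElts> &mask) {
  uint16_t k = 0;
  for (int i = 0; i < kNumElts; ++i) {
    // A blend keeps every element in its own lane.
    assert(mask[i] == i || mask[i] == i + kNumElts);
    if (mask[i] >= kNumElts)
      k |= static_cast<uint16_t>(1u << i);
  }
  return k;
}

// Vector-constant form: 0xFF for elements taken from the second input,
// 0x00 otherwise, suitable as the mask operand of a bitwise select.
inline std::array<uint8_t, kNumElts>
blendMaskToByteMask(const std::array<int, kNumElts> &mask) {
  std::array<uint8_t, kNumElts> bytes{};
  for (int i = 0; i < kNumElts; ++i) {
    assert(mask[i] == i || mask[i] == i + kNumElts);
    bytes[i] = mask[i] >= kNumElts ? 0xFF : 0x00;
  }
  return bytes;
}
```

The trade-off raised above is about where each form lives: the integer needs a GPR to k-register copy, while the byte array is an ordinary constant-pool load.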

Event Timeline

craig.topper created this revision. Jul 3 2020, 11:06 PM
Herald added a project: Restricted Project. Jul 3 2020, 11:07 PM
Herald added a subscriber: hiraditya.
RKSimon accepted this revision. Jul 4 2020, 12:21 AM

LGTM - but as you said this doesn't tend to occur in the real world - but I guess somebody might decide to disable avx512bw for "reasons".....

This revision is now accepted and ready to land. Jul 4 2020, 12:21 AM

Thanks Simon. Thoughts on whether we should do this with BWI too? I think the bit blend would also work better with shuffle combining?

RKSimon added a comment. Edited Jul 4 2020, 3:47 AM

Thanks Simon. Thoughts on whether we should do this with BWI too? I think the bit blend would also work better with shuffle combining?

Yes, my only concern is that we currently don't have anything to combine variable select patterns OTHER than bit blend - ternlog/vselect are neither handled as faux shuffles nor combined at the moment, which they probably need to be.

This revision was automatically updated to reflect the committed changes.