This is an archive of the discontinued LLVM Phabricator instance.

[X86] Teach lowerV4I32Shuffle to only use broadcasts if the mask has more than one undef element. Prioritize shifts over broadcast in lowerV8I16Shuffle.
ClosedPublic

Authored by craig.topper on Aug 12 2019, 12:04 AM.

Details

Summary

The motivating case is the set of changes in vector-reduce-add.ll, where
we were doing extra work in the scalar domain instead of shuffling.
There may be a one-use check that needs to be looked into there,
but this patch sidesteps the issue by avoiding broadcasts that
aren't really broadcasting.
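
As a rough illustration of the kind of mask check described above, here is a minimal standalone sketch. It is not the code from this patch. The helper name worthTryingBroadcast and the use of a plain std::vector<int> are assumptions made for illustration; the only LLVM convention borrowed is that a negative mask entry marks an undef lane. The title words the condition in terms of undef elements. Going by the summary's "broadcasts that aren't really broadcasting", the sketch assumes the intent is to skip the broadcast when at most one lane is actually defined.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not an LLVM API. A shuffle mask is modeled as plain
// ints, where a negative value marks an undef lane (LLVM uses -1 for this).
bool worthTryingBroadcast(const std::vector<int> &mask) {
  assert(mask.size() == 4 && "this sketch models a v4i32 mask");
  // Count the lanes whose value is actually demanded by the shuffle.
  auto defined = std::count_if(mask.begin(), mask.end(),
                               [](int m) { return m >= 0; });
  // With a single defined lane there is nothing to replicate, so a
  // cheaper shuffle or shift can be tried before a broadcast.
  return defined > 1;
}
```

Under these assumptions, a mask such as {0, -1, 0, -1} would still be considered for a broadcast, while {-1, 0, -1, -1} would fall through to the other lowering strategies.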

Event Timeline

craig.topper created this revision. Aug 12 2019, 12:04 AM
Herald added a project: Restricted Project. Aug 12 2019, 12:04 AM
Herald added a subscriber: hiraditya.

Is this going to interfere with folding AVX512 broadcast loads into an instruction at all?

More generally, broadcast is preferable if the input is a foldable load (immediate shifts can't fold), but I think combineX86ShuffleChain should handle this.

Are you asking about v8i16 where I switched the shift priority?

I wish we knew the original VT. In the motivating case the shuffle has been widened, so there’s a bitcast that makes folding unlikely.

RKSimon accepted this revision. Aug 19 2019, 5:49 AM

> Are you asking about v8i16 where I switched the shift priority?

Yes, but I think combineX86ShuffleChain will try to convert this to a foldable load + broadcast if it can, so we should be OK.

LGTM

This revision is now accepted and ready to land. Aug 19 2019, 5:49 AM
This revision was automatically updated to reflect the committed changes.