This is an archive of the discontinued LLVM Phabricator instance.

[X86] lowerShuffleAsLanePermuteAndRepeatedMask - retain the per-lane undef elements and don't just copy the repeated mask
ClosedPublic

Authored by RKSimon on Jan 25 2023, 6:35 AM.

Details

Summary

lowerShuffleAsLanePermuteAndRepeatedMask expands a shuffle from shuffle(x,y,mask) to shuffle(shuffle(x,y,lanemask1),shuffle(x,y,lanemask2),repeatedinlanemask)

However, we weren't making use of the fact that elements of the original mask might be undef - instead of fully applying the entire repeatedinlanemask to every lane, we can simplify the mask if we never demanded that element in the original mask.
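
To illustrate the idea, here is a minimal standalone sketch; it is not the code from this patch. The helper name buildPerLaneMask is hypothetical, and it uses std::vector rather than LLVM types. It assumes -1 marks an undef mask element. It also assumes that repeated-mask entries in [0, NumLaneElts) pick from the first lane-permuted operand and entries in [NumLaneElts, 2*NumLaneElts) pick from the second. An element that was undef in the original mask stays undef in the final mask instead of inheriting the repeated entry.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper for illustration only (not the LLVM implementation).
// OrigMask:     the original full-width shuffle mask; -1 marks an undef element.
// RepeatedMask: the in-lane mask with one entry per lane element.
//               Assumed convention: values in [0, NumLaneElts) select from the
//               first operand, values in [NumLaneElts, 2*NumLaneElts) from the
//               second operand.
// Returns the full-width mask for the final shuffle, keeping per-lane undefs.
std::vector<int> buildPerLaneMask(const std::vector<int> &OrigMask,
                                  const std::vector<int> &RepeatedMask) {
  const int NumElts = static_cast<int>(OrigMask.size());
  const int NumLaneElts = static_cast<int>(RepeatedMask.size());
  assert(NumLaneElts > 0 && NumElts % NumLaneElts == 0 &&
         "mask width must be a whole number of lanes");

  std::vector<int> NewMask(NumElts, -1);
  for (int i = 0; i < NumElts; ++i) {
    // The original shuffle never demanded this element: leave it undef
    // rather than copying the repeated mask entry.
    if (OrigMask[i] < 0)
      continue;

    const int M = RepeatedMask[i % NumLaneElts];
    if (M < 0)
      continue;

    const int LaneBase = (i / NumLaneElts) * NumLaneElts;
    const int OperandOffset = (M >= NumLaneElts) ? NumElts : 0;
    NewMask[i] = (M % NumLaneElts) + LaneBase + OperandOffset;
  }
  return NewMask;
}
```

With an 8-element original mask whose elements 1, 3, 6 and 7 are undef and a repeated mask of {0, 1, 4, 5}, this sketch yields {0, -1, 8, -1, 4, 5, -1, -1}. Blindly copying the repeated mask into every lane would instead give {0, 1, 8, 9, 4, 5, 12, 13}.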

Yet another improvement addressing regressions from D127115

Event Timeline

RKSimon created this revision. Jan 25 2023, 6:35 AM
Herald added a project: Restricted Project. Jan 25 2023, 6:35 AM
RKSimon requested review of this revision. Jan 25 2023, 6:35 AM
deadalnix accepted this revision.Jan 25 2023, 3:15 PM
deadalnix added a subscriber: deadalnix.

This is looking good. I'm going to accept as is, but it'd be better to wait a little to give someone more familiar with this the opportunity to chime in.

This revision is now accepted and ready to land. Jan 25 2023, 3:15 PM

Thanks - does anyone else have any comments? Otherwise I'll probably commit this over the weekend.

llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll