This is an archive of the discontinued LLVM Phabricator instance.

[X86] combineVectorTruncationWithPACKUS - remove split/concatenation of mask
ClosedPublic

Authored by RKSimon on Apr 7 2019, 5:22 AM.

Details

Summary

combineVectorTruncationWithPACKUS is currently splitting the upper-bit masking into 128-bit subregs and then concatenating them back together.

The split was originally done to avoid regressions in which existing subregs were concatenated to the larger type just for the AND masking before being extracted again. Those regressions have since been fixed by @spatel (notably rL303997 and rL347356).

Removing the split also lets SimplifyDemandedBits make some further improvements before it hits the recursive depth limit.

My only annoyance with this is that we were broadcasting some xmm masks but we seem to have lost them by moving to ymm - but that's a known issue as the logic in lowerBuildVectorAsBroadcast isn't great.
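For readers unfamiliar with the two shapes being compared, here is a minimal scalar sketch. It is not the SelectionDAG code from the patch: the types and the helpers `extractSubvector`, `concat`, `and128`, `and256`, `maskSplitThenConcat` and `maskWhole` are hypothetical names that model a 256-bit vector of eight 32-bit lanes. It only illustrates that masking each 128-bit half and concatenating produces the same lanes as one full-width AND.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar models of a 128-bit and a 256-bit vector of i32 lanes.
using Vec4 = std::array<uint32_t, 4>;
using Vec8 = std::array<uint32_t, 8>;

// Model of extracting the lower (half == 0) or upper (half == 1) 128 bits.
Vec4 extractSubvector(const Vec8 &v, std::size_t half) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = v[half * 4 + i];
  return r;
}

// Model of concatenating two 128-bit halves into a 256-bit vector.
Vec8 concat(const Vec4 &lo, const Vec4 &hi) {
  Vec8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = lo[i];
    r[i + 4] = hi[i];
  }
  return r;
}

// Lane-wise AND on a 128-bit vector.
Vec4 and128(const Vec4 &v, uint32_t mask) {
  Vec4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = v[i] & mask;
  return r;
}

// Lane-wise AND on a 256-bit vector.
Vec8 and256(const Vec8 &v, uint32_t mask) {
  Vec8 r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = v[i] & mask;
  return r;
}

// Old shape: split, mask each 128-bit half, concatenate back together.
Vec8 maskSplitThenConcat(const Vec8 &x) {
  return concat(and128(extractSubvector(x, 0), 0xFFFFu),
                and128(extractSubvector(x, 1), 0xFFFFu));
}

// New shape: a single full-width AND.
Vec8 maskWhole(const Vec8 &x) { return and256(x, 0xFFFFu); }
```

Both functions compute the same value; the difference in the real combine is only in how many nodes the DAG contains and how much of the recursion depth later simplifications have left to work with.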

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. Apr 7 2019, 5:22 AM
Herald added a project: Restricted Project. Apr 7 2019, 5:23 AM
spatel accepted this revision. Apr 18 2019, 8:50 AM

LGTM

lib/Target/X86/X86ISelLowering.cpp
39199

It would be good to add an example in the comment here or within the function. Something like:
trunc <8 x i32> X to <8 x i16> -->
MaskX = X & 0xffff (clear high bits to prevent saturation)
packus (extract_subv MaskX, 0), (extract_subv MaskX, 1)
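
The example above can be checked with a small scalar model. This is only a sketch under stated assumptions, not code from X86ISelLowering.cpp: `saturateU16`, `packus`, `truncViaPackus` and `truncReference` are hypothetical helpers, and `packus` models the PACKUSDW behaviour of packing signed 32-bit lanes to unsigned 16-bit lanes with unsigned saturation.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V4i32 = std::array<int32_t, 4>;
using V8i32 = std::array<int32_t, 8>;
using V8i16 = std::array<uint16_t, 8>;

// Unsigned saturation of a signed 32-bit value to 16 bits:
// negative inputs become 0, inputs above 0xFFFF become 0xFFFF.
uint16_t saturateU16(int32_t v) {
  if (v < 0)
    return 0;
  if (v > 0xFFFF)
    return 0xFFFF;
  return static_cast<uint16_t>(v);
}

// Scalar model of PACKUSDW: lanes 0..3 come from lo, lanes 4..7 from hi.
V8i16 packus(const V4i32 &lo, const V4i32 &hi) {
  V8i16 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = saturateU16(lo[i]);
    r[i + 4] = saturateU16(hi[i]);
  }
  return r;
}

// trunc <8 x i32> X to <8 x i16> via mask + packus.
V8i16 truncViaPackus(const V8i32 &x) {
  // MaskX = X & 0xffff: clear the high bits so packus never saturates.
  V8i32 masked{};
  for (std::size_t i = 0; i < 8; ++i)
    masked[i] = static_cast<int32_t>(static_cast<uint32_t>(x[i]) & 0xFFFFu);

  // extract_subv MaskX, 0 and extract_subv MaskX, 1
  V4i32 lo{}, hi{};
  for (std::size_t i = 0; i < 4; ++i) {
    lo[i] = masked[i];
    hi[i] = masked[i + 4];
  }
  return packus(lo, hi);
}

// Plain truncation, used as the reference result.
V8i16 truncReference(const V8i32 &x) {
  V8i16 r{};
  for (std::size_t i = 0; i < 8; ++i)
    r[i] = static_cast<uint16_t>(x[i]);
  return r;
}
```

Without the mask, a lane such as -1 or 70000 would saturate to 0 or 0xFFFF instead of keeping its low 16 bits, which is why the AND has to come before the pack.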

This revision is now accepted and ready to land. Apr 18 2019, 8:50 AM
This revision was automatically updated to reflect the committed changes.