This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add support for calling SimplifyDemandedBits on the input of PDEP with a constant mask.
ClosedPublic

Authored by craig.topper on Sep 17 2020, 11:07 PM.

Details

Summary

We can do several optimizations for PDEP using computeKnownBits and SimplifyDemandedBits

-If the MSBs of the output aren't demanded, those MSBs of the mask input aren't demanded either. We need to keep the most significant demanded bit of the mask and any mask bits before it.
-The number of possible ones in the mask determines how many bits of the lsbs of the other operand are demanded. Any bits of the mask we don't demand by the previous rule should not be counted.
-The result will have zeros in any position that the mask is zero.
-Since non-mask input bits can only be output in the original position or a higher bit position, the result will have at least as many trailing zeroes as the non-mask input.

Test cases have not been committed yet. But this patch was written to show the test diffs. Wanted to get feedback on the tests before committing.

Diff Detail

Event Timeline

craig.topper created this revision.Sep 17 2020, 11:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 17 2020, 11:07 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
craig.topper requested review of this revision.Sep 17 2020, 11:07 PM
RKSimon added inline comments.Sep 18 2020, 2:45 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
49499

Can you move this functionality inside X86TargetLowering::SimplifyDemandedBitsForTargetNode?

Hopefully we can then expand it in the future to handle incoming DemandedBits masks as well to adjust the PDEP mask.

X86TargetLowering::computeKnownBitsForTargetNode support would be useful eventually as well.

craig.topper added inline comments.Sep 18 2020, 4:19 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
49499

Sure, but I don’t think it makes sense to move it without calculating the known bits based on the mask. So we’ll need more tests.

reames added a subscriber: reames.Sep 18 2020, 8:44 AM

Check my logic

Demanded bits for mask = (1 << (bitwidth - lzcnt(input demanded bits)) - 1
Demanded bits for other operand = (1<< ctpop(~mask_known.zero & demanded_bits_for_mask)) - 1

Result known.zero = mask_known.zero

We can do several optimizations for PDEP using computeKnownBits and SimplifyDemandedBits

-If the MSBs of the output aren't demanded, those MSBs of the mask input aren't demanded either. We need to keep the most significant demanded bit of the mask and any mask bits before it.
-The number of possible ones in the mask determines how many bits of the lsbs of the other operand are demanded. Any bits of the mask we don't demand by the previous rule should not be counted.
-The result will have zeros in any position that the mask is zero.
-Since non-mask input bits can only be output in the original position or a higher bit position, the result will have at least as many trailing zeroes as the non-mask input.

Test cases have not been committed yet. But this patch was written to show the test diffs. Wanted to get feedback on the tests before committing.

craig.topper edited the summary of this revision. (Show Details)Sep 19 2020, 10:22 PM
RKSimon accepted this revision.Sep 28 2020, 3:33 AM

LGTM - the logic makes sense to me - maybe add equivalent code into X86TTIImpl::simplifyDemandedUseBitsIntrinsic ? We already have pdep/pext constant folding support in instcombine

llvm/lib/Target/X86/X86ISelLowering.cpp
34064

It might be worth moving this into a KnownBits::DepositBits helper - IIRC RISCV has a similar instruction being finalised - maybe also move the instcombine constant folding handling into APInt ?

This revision is now accepted and ready to land.Sep 28 2020, 3:33 AM