Page MenuHomePhabricator

[InstCombine] Optimize and of icmps with power-of-2 and contiguous masks
Needs ReviewPublic

Authored by jmciver on May 16 2022, 12:07 PM.

Details

Summary

Add an instance combine optimization for expressions of the form:

(%arg u< C1) & ((%arg & C2) != C2) -> %arg u< C2

Where C1 is a power-of-2 and C2 is a contiguous mask starting 1 bit below C1.

This commit resolves GitHub missed-optimization issue #54856.

Diff Detail

Event Timeline

jmciver created this revision.May 16 2022, 12:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2022, 12:07 PM
Herald added a subscriber: hiraditya. · View Herald Transcript

All six reported failing tests (due to timeouts) are passing locally in my build/test environment.

jmciver updated this revision to Diff 429934.May 16 2022, 9:55 PM

Fix capitzalation of function names containing ICMP

Based on functionality of the optimization I believe foldLogOpOfMaskedICmps to be a reasonable location. However, it maybe better to create a new function and call it from InstCombinerImpl::foldAndOrOfICmps based on the following.

The placement of the call to foldLogOpOfMaskedICmps_AllZeros_BMask_NotMixed_and_NotAllOnes in the function foldLogOpOfMaskedICmps breaks the consistency of the conditional testing based on the conjunction of left and right mask (see variable Mask line 586). It is possible to place calls to foldLogOpOfMaskedICmps_AllZeros_BMask_NotMixed_and_NotAllOnes in two locations in foldLogOpOfMaskedICmps:

  • In function foldLogOpOfMaskedIcmpsAsymmetric called from conditional block Mask == 0 (line 593)
  • In conditional block Mask & (Mask_NotAllZeros | BMask_NotAllOnes (line 655)

I opted against this due to readability.

Thanks in advance for any feedback!

jmciver published this revision for review.May 17 2022, 12:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2022, 12:00 AM
nikic edited the summary of this revision. (Show Details)May 17 2022, 1:20 AM

To reduce performance impact I am thinking that the call to foldLogOpOfMaskedICmps_AllZeros_BMask_NotMixed_and_NotAllOnes should be moved into a new function, foldPowerOf2AndWithLesserContinuous, and then call this function from foldAndOrOfICmps. This will help to keep the purpose of foldLogOpOfMaskedICmps clear and reduce the overhead by not testing for a less probable optimization early.

Please can you add vector test coverage - uniform / non-uniform cases and cases containing undef / poison elements

Thanks @RKSimon! I will work on adding the additional tests.