This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] canonicalize "extract lowest set bit" away from cttz intrinsic
ClosedPublic

Authored by spatel on Feb 18 2023, 6:05 AM.

Details

Summary

1 << (cttz X) --> -X & X
https://alive2.llvm.org/ce/z/qv3E9e

This creates an extra use of the input value, so that's generally not preferred, but there are advantages to this direction:

  1. 'negate' and 'and' allow for better analysis than 'cttz'.
  2. This is more likely to induce follow-on transforms (in the example from issue #60801, we'll get the decrement pattern).
  3. The more basic ALU ops are more likely to result in better codegen across a variety of targets.

This won't solve the motivating bugs (see issue #60799) because we do not recognize the redundant icmp+sel, and the x86 backend may not have the pattern-matching to produce the optimal BMI instructions.

Diff Detail

Event Timeline

spatel created this revision.Feb 18 2023, 6:05 AM
spatel requested review of this revision.Feb 18 2023, 6:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 18 2023, 6:05 AM
nikic accepted this revision.Feb 18 2023, 6:23 AM

LGTM. I think this is a reasonable canonical form, especially as @goldstein.w.n added KnownBits support for these patterns recently.

Worth noting that for pow2 detection we canonicalize in the reverse direction (towards ctpop and icmp), and I don't quite remember why we ended up going with that somewhat unusual choice.

This revision is now accepted and ready to land.Feb 18 2023, 6:23 AM
kazu accepted this revision.Feb 18 2023, 1:51 PM

LGTM. Thanks!

Worth noting that for pow2 detection we canonicalize in the reverse direction (towards ctpop and icmp), and I don't quite remember why we ended up going with that somewhat unusual choice.

For this pattern:
(-A & A) != A --> ctpop(A) > 1
It was a combination of less instructions, less uses, and being the more obvious spelling for, "Is more than one bit set?"

We can make the same argument about cttz with the low-set-bit in this pattern, but it's the same number of instructions and has the other noted benefits. Both ways have advantages, so we just have to make a choice.