This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold (ctpop(X) == 1) | (X == 0) into ctpop(X) < 2
ClosedPublic

Authored by hkmatsumoto on Mar 19 2022, 1:16 PM.

Diff Detail

Event Timeline

hkmatsumoto created this revision.Mar 19 2022, 1:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2022, 1:16 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
hkmatsumoto requested review of this revision.Mar 19 2022, 1:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2022, 1:16 PM
xbolva00 added inline comments.Mar 19 2022, 1:45 PM
llvm/test/Transforms/InstCombine/icmp-or.ll
371

Please add a test with vector type.

llvm/test/Transforms/InstCombine/ispow2.ll
538

Now reduced.

craig.topper added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2613

Should we also handle the opposite version ctpop(x) != 1 && x != 0?

llvm/test/Transforms/InstCombine/icmp-or.ll
399

Can you add a vector test as well?

RKSimon added inline comments.Mar 19 2022, 1:55 PM
llvm/test/Transforms/InstCombine/icmp-or.ll
382

What about if the ctpop has multi uses?

craig.topper added inline comments.Mar 19 2022, 2:03 PM
llvm/test/Transforms/InstCombine/icmp-or.ll
382

The ctpop isn't being changed. Does it matter if it has more uses?

What is interesting is if the (icmp eq (ctpop(x)), 1) has another user other than the or.

Address code reviews

  • Fold (icmp ne ctpop(X) 1) & (icmp ne X 0) into (icmp ugt ctpop(X) 1) as well

foldOrOfCtpop folds the aforementioned pattern into (icmp uge ctpop(X) 2)
and that is later transformed into (icmp ugt ctpop(X) 1). I think this is fine.

  • Add tests for vector type
hkmatsumoto marked 3 inline comments as done.Mar 20 2022, 1:21 AM
hkmatsumoto added inline comments.Mar 20 2022, 1:25 AM
llvm/test/Transforms/InstCombine/ispow2.ll
538

Since this is almost the first time contributing to LLVM, I'm not sure what I should do about this. Does it mean we should remove these reduced tests?

xbolva00 added inline comments.Mar 20 2022, 1:55 AM
llvm/test/Transforms/InstCombine/ispow2.ll
538

just remove "(but this could reduce)" from the comment I think.

Address code review, removing "but this could reduce" from comment

hkmatsumoto added inline comments.Mar 20 2022, 2:11 AM
llvm/test/Transforms/InstCombine/ispow2.ll
538

I get it, thanks!

craig.topper added inline comments.Mar 20 2022, 3:29 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
921

Since the m_ICmp matches are the same for And/Or can we share those and possibly early out from the function if they don't match. Then only use IsAnd to check the predicates and control the what we emit?

spatel added inline comments.Mar 21 2022, 8:11 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
916

This name is too specific now that we are handling 'and'. It could be called "foldIsPowerOf2OrZero" to match the transform below?

llvm/test/Transforms/InstCombine/ispow2.ll
540

It seems better to rename this test and the next one now that we can optimize it - "is_pow2_or_zero" or similar.

Reflect code reviews

  • Factor out pattern matching code and exit early if it didn't match
  • Rename foldOrOfCtpop to foldIsPowerOf2OrZero
  • Rename 2 tests in ispow2.ll to is_pow2_or_zero(_logical) to inform that this patch optimizes them
hkmatsumoto marked 3 inline comments as done.Mar 23 2022, 1:18 PM
hkmatsumoto added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
921

You are definitely right! Changed accordingly.

hkmatsumoto marked an inline comment as done.Mar 23 2022, 1:18 PM

A few changes for tests suggested inline.

There might be some generalization of ctpop analysis that we can make as a follow-up patch.
For example, I was looking at a "wrong predicate" combination and noticed that we miss possible optimizations like this:
https://alive2.llvm.org/ce/z/RRk_d9

llvm/test/Transforms/InstCombine/icmp-or.ll
382

This transform only replaces the logic op with a cmp, so I think we want to do it even if all of the intermediate values have other uses.
Either way, we should have at least one test where there are other uses of the icmps.

412

We can swap the operand order of the "or" for more coverage of the commuted pattern.

444

We are testing the 'and' pattern too now, so it doesn't match the name of the file. I think it would be better to add the new tests next to the changed tests in ispow2.ll.

478

Instead of checking the same wrong constant again, it would be better to change to a wrong predicate(s).

hkmatsumoto updated this revision to Diff 418405.EditedMar 26 2022, 11:04 AM

Reflect code reviews

  • Move all added tests to ispow2.ll from icmp-or.ll since now that they contain tests for "and" operand
  • Add tests to confirm extra uses don't affect the optimization
  • Add negative tests whose order of the op/and operand being swapped (named *_swap_cmp)
  • Add negative tests for wrong predicates
  • Test various wrong constants
hkmatsumoto added inline comments.Mar 26 2022, 11:06 AM
llvm/test/Transforms/InstCombine/ispow2.ll
540

Renamed to is_pow2or0_ctpop.

555

Renamed to is_pow2or0_ctpop_logical.

A few changes for tests suggested inline.

There might be some generalization of ctpop analysis that we can make as a follow-up patch.
For example, I was looking at a "wrong predicate" combination and noticed that we miss possible optimizations like this:
https://alive2.llvm.org/ce/z/RRk_d9

Very interesting! I'll look into it and (try to) send a follow-up PR.

Thank you for the making the test changes. I pushed the baseline tests on your behalf here:
ebaa28e0750b

Please rebase and update the patch here in Phabricator - it should only show changes in the CHECK lines after the update.

joerg added a subscriber: joerg.Mar 29 2022, 3:03 AM

Why is this fold preferable to (X & (X-1)) == 0? At least on all architectures without native population count, the binary-and based test is preferable and it might even be better with it.

Why is this fold preferable to (X & (X-1)) == 0? At least on all architectures without native population count, the binary-and based test is preferable and it might even be better with it.

Less IR instructions. I think SDAG already expands some ctpop patterns to logic (backends should decide about optimal form)

Sync with main branch

Why is this fold preferable to (X & (X-1)) == 0? At least on all architectures without native population count, the binary-and based test is preferable and it might even be better with it.

Less IR instructions. I think SDAG already expands some ctpop patterns to logic (backends should decide about optimal form)

Correct - we try to convert the setcc(ctpop) pattern here:
https://github.com/llvm/llvm-project/blob/f1d8e46258c6a08ca1a375dc9670dd5581d6cf65/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp#L3772

That doesn't check whether the target has a popcount instruction for scalar types, so it is potentially too aggressive. For x86, we have a bonus optimization after that, so it should not show up there:
https://godbolt.org/z/oc16Kdhcf

spatel added inline comments.Mar 29 2022, 5:25 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
915

Why create ">= 2" instead of "> 1" directly?

I don't think it makes the transform or code any clearer with ">= 2", and we will always canonicalize to the other form, so I would prefer to go directly to the final result for efficiency.

hkmatsumoto added inline comments.Mar 29 2022, 7:11 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
915

You're right. I thought it is fine to let another pass do further transformation but in retrospect, folding >= 2 to > 1 is so trivial that we should do it here.

Create ctpop(X) > 1 instead of ctpop(X) >= 2

spatel accepted this revision.Mar 29 2022, 7:15 AM

LGTM

This revision is now accepted and ready to land.Mar 29 2022, 7:15 AM
hkmatsumoto added a comment.EditedMar 29 2022, 7:17 AM

@spatel Since I don't have commit access, can you land this patch?
Please use "Hirochika Matsumoto <git@hkmatsumoto.com>" as my identity.

spatel edited the summary of this revision. (Show Details)Mar 29 2022, 7:18 AM
This revision was landed with ongoing or failed builds.Mar 29 2022, 8:32 AM
This revision was automatically updated to reflect the committed changes.