This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fold `(-1 + A) & B` into `A ? 0 : B` where A is effectively a bool
ClosedPublic

Authored by dtcxzyw on Jun 16 2023, 8:39 AM.

Details

Summary

Solves issue https://github.com/llvm/llvm-project/issues/63321.

This patch explicitly folds (-1 + A) & B into A ? 0 : B. Additional trunc will be created when A is neither i1 nor <N x i1>.

https://alive2.llvm.org/ce/z/pWv9jJ

Diff Detail

Event Timeline

dtcxzyw created this revision.Jun 16 2023, 8:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2023, 8:39 AM
dtcxzyw requested review of this revision.Jun 16 2023, 8:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 16 2023, 8:39 AM
goldstein.w.n added inline comments.Jun 16 2023, 1:06 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2657

This comment and the title are missing an important detail that the value but me 0/1 (either via load range as is your alive2 link or from i1)

2659

The add doesn't need to be nsw

2665

Instead of duplicating the logic for op1/op2 use 'match(&I, m_c_And(...))`

2667

Comment that 0 refers to depth.

2667

Dont cast op1 cxti, make a simplifyquery at I (sorry not computer at the moment, so can't lookup exact syntax)

goldstein.w.n added inline comments.Jun 16 2023, 1:29 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2667

I think using knownbits:::ule(1) would ne a lot clearer.

nikic added inline comments.Jun 17 2023, 12:55 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2659

Why does this need a zext?

dtcxzyw updated this revision to Diff 532391.Jun 17 2023, 6:11 AM
dtcxzyw retitled this revision from [InstCombine] Fold `(-1 + zext(B)) & X` into `B ? 0 : X` where B is effectively a bool to [InstCombine] Fold `(-1 + A) & B` into `A ? 0 : B` where A is effectively a bool.
dtcxzyw edited the summary of this revision. (Show Details)

fix comment and rebase

dtcxzyw marked 6 inline comments as done.Jun 17 2023, 6:13 AM
dtcxzyw added inline comments.Jun 17 2023, 6:17 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2659

It needs a zext to avoid creating redundant trunc (zext i1).

Polite ping :)

nikic added inline comments.Jun 25 2023, 12:52 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2659

You are always creating the trunc, so I don't understand how explicitly matching the zext avoids a redundant trunc. The trunc will get folded away later in either case.

2659

m_c_Add is unnecessary due to constant on RHS.

2659

You need a one use restriction on the add and a multiuse test.

2664

Use countMaxActiveBits() <= 1 instead of KnownBits::ule.

2666

This gets canonicalized to an icmp eq 0, so you should directly emit that.

dtcxzyw updated this revision to Diff 534416.Jun 25 2023, 8:28 PM
  • Rebase
  • Address comments
dtcxzyw marked 6 inline comments as done.Jun 25 2023, 8:29 PM
goldstein.w.n added inline comments.Jun 28 2023, 8:17 AM
llvm/test/Transforms/InstCombine/binop-cast.ll
309

Can you:

  1. Split the tests to a seperate patch (create a series so that patch1 is the tests, patch2 is the impl). This allows us to see exact what changes the impl causes
  2. Can you add a negative test where the load range is out of range (maybe 0-3)
  3. Can you add a test where %x is the same as %y and the 0/1 range is created using and.
goldstein.w.n added inline comments.Jul 4 2023, 11:53 AM
llvm/test/Transforms/InstCombine/binop-cast.ll
306

I'm not sure this isn't a regression.
Generally I'd think add; and is as easier to analyze than icmp; select and likely to get better codegen. I think maybe this should only be done if we can actuall save inst count (so above require the zext).

dtcxzyw marked an inline comment as done.Jul 4 2023, 11:26 PM
dtcxzyw added inline comments.
llvm/test/Transforms/InstCombine/binop-cast.ll
306

icmp; select seems to get better codegen on x86-64 and aarch64.

x86-64: https://godbolt.org/z/jv9v9xTMT
riscv64: https://godbolt.org/z/zTbbnn9x4
aarch64: https://godbolt.org/z/eK6xWKnrb

goldstein.w.n added inline comments.Aug 24 2023, 10:04 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2663

computeKnownBits(A, /* Depth */ 0, &I).countMaxActiveBits() <= 1 -> isKnownToBeAPowerOfTwo(A,..., /*OrZero*/true, ...)

nikic added inline comments.Aug 24 2023, 11:37 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2655

Please land this code cleanup as a separate NFC commit.

2663

That's not the same thing though? The current check only allows 0 and 1, which result in -1 and 0 after the add. If it's just any power of two we have 0 and 2^x instead, which become -1 and 2^x-1.

goldstein.w.n added inline comments.Aug 24 2023, 11:50 AM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2663

Yup ignore my incorrect comment. I misread as a population count check :(

nikic added inline comments.Aug 25 2023, 3:33 AM
llvm/test/Transforms/InstCombine/binop-cast.ll
301–302

This isn't a negative test?

dtcxzyw added inline comments.Aug 25 2023, 4:40 AM
llvm/test/Transforms/InstCombine/binop-cast.ll
301–302

It was added as a negative test since @goldstein.w.n thought it should be a regression.
However, I tested the inst seq on some backends and found that icmp; select seems to get better codegen.

x86-64: https://godbolt.org/z/jv9v9xTMT
riscv64: https://godbolt.org/z/zTbbnn9x4
riscv64 (sifive-u74 with SFB opt): https://godbolt.org/z/aKjEs5n1P (icmp; select only costs 2 cycles with SFB)
aarch64: https://godbolt.org/z/eK6xWKnrb

I think the pattern select i1 %x, i32 %y, i32 0 is more likely to get better codegen.

goldstein.w.n added inline comments.Sep 9 2023, 5:30 PM
llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
2655

This comment is outstanding.

2658

i1/!range {0,2} isn't really the necessary condition. The necessary condition is A is 0 or 1 which can be proven in a lot more cases.

Please also update the title/summary with this precondition of A being 0/1.

dtcxzyw marked 6 inline comments as done.Sep 10 2023, 11:01 PM
This revision is now accepted and ready to land.Sep 11 2023, 9:14 AM
dtcxzyw added inline comments.Sep 11 2023, 9:58 AM
llvm/test/Transforms/InstCombine/binop-cast.ll
301–302

@goldstein.w.n Any more comments here? I think it should not be a negative test.

Ping.

That was been approved.

dtcxzyw marked 2 inline comments as done.Sep 21 2023, 1:43 AM