This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Look through truncate to fold icmp with intrinsics
ClosedPublic

Authored by bcl5980 on Feb 5 2023, 10:11 PM.

Details

Summary

The output of intrinsic functions like ctpop, cttz, ctlz have limited range from 0 to bitwidth. So if the truncate destination type can hold the source bitwidth size, we can just ignore the truncate and use the truncate src to do combination.

Alive2 proofs:
https://alive2.llvm.org/ce/z/9D_-qP

Diff Detail

Event Timeline

bcl5980 created this revision.Feb 5 2023, 10:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2023, 10:11 PM
bcl5980 requested review of this revision.Feb 5 2023, 10:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2023, 10:11 PM
bcl5980 updated this revision to Diff 495031.Feb 6 2023, 1:31 AM

rebase with precommit tests.

General proofs with Alive2 are probably not possible for a bitwidth constrained transform like this, but you can include proofs for the specific changes here.
How can we test ctpop?
What happens if the trunc has an extra use?

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4952–4953

Require one less bit for cttz/ctlz when the "is_zero_poison" argument is true?

llvm/test/Transforms/InstCombine/cmp-intrinsic.ll
549

Why truncate to i15?

IIUC, the transform is valid on this example for truncate down to 6 bits, but not 5 bits. So we should have both of those variations.

The source type width should be tested similarly, so if we start with i32, it is ok to truncate to i5, but not i4?

goldstein.w.n added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4950

Imo it makes more sense to do this generically.

(icmp P (trunc(X), C)) if KnownBits(X)[OrigWidth:TruncWidth] == 0 just drop the truncate. That will cover all these intrins + any other cases that happen to come up.

bcl5980 added inline comments.Feb 8 2023, 6:43 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4950

I don't want to enable all cases for the trunc. That may cause a lot of potential regressions. By default we should shrink bits if possible not expand it. I prefer to limited the change to the case we can really get improvement.
Maybe I can move this change to SDAG to avoid these concern.

4952–4953

https://alive2.llvm.org/ce/z/pQQavA
It looks one less bit for cttz when is_zero_poison=true is wrong.

bcl5980 added inline comments.Feb 8 2023, 7:10 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4952–4953

ah, I'm wrong. Should be llvm::Log2_32_Ceil(SrcBits - is_zero_poison), not Op0 bitwidth - is_zero_poison

goldstein.w.n added inline comments.Feb 8 2023, 7:17 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4950

Seems to only match m_Oneuse(m_Trunc(...)) so seems like only case is the truncate is entirely eliminated, no? Not sure I see how that could cause regressions.

bcl5980 added inline comments.Feb 8 2023, 7:47 PM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4950

What I mean is not IR level regression. Think about the case source type is i256, the dest type is i32. Remove the trunc and restore the next instruct to be i256 may cause extra instruction on backend.

bcl5980 updated this revision to Diff 496014.Feb 8 2023, 8:34 PM
bcl5980 edited the summary of this revision. (Show Details)

Address comments.

bcl5980 updated this revision to Diff 496015.Feb 8 2023, 8:44 PM

add one use negative tests.

General proofs with Alive2 are probably not possible for a bitwidth constrained transform like this, but you can include proofs for the specific changes here.
How can we test ctpop?

It looks current code already optimized ctpop. So I just remove the ctpop here.

What happens if the trunc has an extra use?

Line 4920 already limit the code to one-use. And I also add tests for it now.

bcl5980 updated this revision to Diff 496019.Feb 8 2023, 9:10 PM

rebase tests.

spatel accepted this revision.Feb 9 2023, 12:47 PM

LGTM

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4950

This is a gray area. Ideally, if we can eliminate IR instructions, we do it, but it if it results in wider instructions, then it's potentially not an improvement for analysis in IR. The codegen concern is secondary, but yes, we do factor that into the decision because sometimes it is not easy to invert the transform.

For this one, it seems unlikely that we'll gain much by using a potentially expensive (in compile-time) ValueTracking API because we probably already have specialized folds for the common patterns where we'd know the high bits are already cleared ('and' or 'lshr').

4954–4955

This needs an explanation comment. Something like:

// If the "is_zero_poison" argument is set, then we know at least 
// one bit is set in the input, so the result is always at least one
// less than the full bitwidth of that input.
4957

Adjust word choice:

// Make sure the destination is wide enough to hold the largest output of the intrinsic.
This revision is now accepted and ready to land.Feb 9 2023, 12:47 PM