This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine]: Add cases for assume (X & Y != {0|Y})
ClosedPublic

Authored by goldstein.w.n on Jan 2 2023, 11:22 AM.

Details

Summary

If X or Y is known to be a non-zero power of 2 we can prove the known
bit.

I.e assume X is a variable and Y is a power of 2 constants.

  1. assume(X & Y == 0) -> Known(X).Zeros[Log2(Y)] = 1
  1. assume(X & Y == Y) -> Known(X).Ones[Log2(Y)] = 1

Diff Detail

Event Timeline

goldstein.w.n created this revision.Jan 2 2023, 11:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 2 2023, 11:22 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
goldstein.w.n requested review of this revision.Jan 2 2023, 11:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 2 2023, 11:22 AM

Note:

I have not verified the transformations in this patch individually, I verified the net changes from D140850, D140851, and D140852 in bulk.

nikic added inline comments.Jan 4 2023, 6:14 AM
llvm/lib/Analysis/ValueTracking.cpp
701

I believe the placement of this call was intentional, because it is more expansive than the icmp operand checks. That said, I don't see a compile-time impact on CTMark, though that is not exactly an assume-heavy workload.

We did address various inefficiencies in isValidAssumeForContext() over time though, so I would be fine with trying this change. It should be split out into a separate NFC patch though.

948

It looks like we are missing this canonicalization: https://alive2.llvm.org/ce/z/MtveLU With that done, this becomes v & b == 0 and is covered by existing handling.

952

Move this computeKnownBits() call into the isKnownToBeAPowerOfTwo() branch.

959

This should just check whether A is zero, no need to compute known bits.

goldstein.w.n added inline comments.Jan 4 2023, 9:24 AM
llvm/lib/Analysis/ValueTracking.cpp
948

It looks like we are missing this canonicalization: https://alive2.llvm.org/ce/z/MtveLU With that done, this becomes v & b == 0 and is covered by existing handling.

So would a better approach be to handle the v & b != a canonicalization in InstCombine and drop this case or leave this as is?

nikic added a reviewer: spatel.Jan 4 2023, 9:35 AM
nikic added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
948

Handling this in InstCombine would be preferred, it reduces the number of patterns other passes see.

goldstein.w.n added inline comments.Jan 4 2023, 10:10 AM
llvm/lib/Analysis/ValueTracking.cpp
948

Handling this in InstCombine would be preferred, it reduces the number of patterns other passes see.

Do you know where I should look in InstCombineCompares for where to put this?

959

This should just check whether A is zero, no need to compute known bits.

How can I check if a value is known zero w.o computeKnownBits? I see isKnownNonZero but not the inverse.

goldstein.w.n marked 2 inline comments as not done.Jan 4 2023, 10:12 AM

Remove case from computeKnownBits ICMP_NE and cannocilize instead

goldstein.w.n marked 4 inline comments as done.Jan 4 2023, 4:13 PM
goldstein.w.n added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
701

I believe the placement of this call was intentional, because it is more expansive than the icmp operand checks. That said, I don't see a compile-time impact on CTMark, though that is not exactly an assume-heavy workload.

We did address various inefficiencies in isValidAssumeForContext() over time though, so I would be fine with trying this change. It should be split out into a separate NFC patch though.

Fair enough. Once this is all through I'll make a patch and can discuss there.

spatel added inline comments.Jan 5 2023, 9:03 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4832–4834 ↗(On Diff #486423)

This should be split into an independent patch with several tests.

Double-check to make sure this is the same logic, but I think it'd be shorter and easier to read without the loop:

// If Op1 is a power-of-2 (has exactly one bit set):
// (Op1 & X) == Op1 --> (Op1 & X) != 0
// (Op1 & X) != Op1 --> (Op1 & X) == 0
if (match(Op0, m_c_And(m_Specific(Op1), m_Value())) &&
    isKnownToBeAPowerOfTwo(Op1, false, 0, &I)) {
  return new ICmpInst(CmpInst::getInversePredicate(Pred), Op0,
                      ConstantInt::getNullValue(Op0->getType()));
}
// If Op0 is a power-of-2 (has exactly one bit set):
// Op0 == (Op0 & X) --> (Op0 & X) != 0
// Op0 != (Op0 & X) --> (Op0 & X) == 0
if (match(Op1, m_c_And(m_Specific(Op0), m_Value())) &&
    isKnownToBeAPowerOfTwo(Op0, false, 0, &I)) {
  return new ICmpInst(CmpInst::getInversePredicate(Pred), Op1,
                      ConstantInt::getNullValue(Op0->getType()));
}
goldstein.w.n marked an inline comment as done.Jan 5 2023, 9:19 AM
goldstein.w.n added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4832–4834 ↗(On Diff #486423)

This should be split into an independent patch with several tests.

Sure, if I split it off will I be able to just revise the parent/child status of the surrounding reviews to link it in or will I need to make 5-new ones?

Double-check to make sure this is the same logic, but I think it'd be shorter and easier to read without the loop:

// If Op1 is a power-of-2 (has exactly one bit set):
// (Op1 & X) == Op1 --> (Op1 & X) != 0
// (Op1 & X) != Op1 --> (Op1 & X) == 0
if (match(Op0, m_c_And(m_Specific(Op1), m_Value())) &&
    isKnownToBeAPowerOfTwo(Op1, false, 0, &I)) {
  return new ICmpInst(CmpInst::getInversePredicate(Pred), Op0,
                      ConstantInt::getNullValue(Op0->getType()));
}
// If Op0 is a power-of-2 (has exactly one bit set):
// Op0 == (Op0 & X) --> (Op0 & X) != 0
// Op0 != (Op0 & X) --> (Op0 & X) == 0
if (match(Op1, m_c_And(m_Specific(Op0), m_Value())) &&
    isKnownToBeAPowerOfTwo(Op0, false, 0, &I)) {
  return new ICmpInst(CmpInst::getInversePredicate(Pred), Op1,
                      ConstantInt::getNullValue(Op0->getType()));
}

Will take a look for next version.

spatel added inline comments.Jan 5 2023, 9:30 AM
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4832–4834 ↗(On Diff #486423)

It's independent of anything else, so there's no need to update the other patches up for review? Just pull out this hunk and regenerate test diffs if needed.

I just added some tests here:
ad14cef1d530

nikic added inline comments.Jan 5 2023, 10:50 AM
llvm/lib/Analysis/ValueTracking.cpp
959

match(A, m_Zero()) would be the usual way.

goldstein.w.n added inline comments.Jan 5 2023, 10:52 AM
llvm/lib/Analysis/ValueTracking.cpp
959

match(A, m_Zero()) would be the usual way.

Won't that only match Constant types? Could miss a case no?

Propegating changes from rebase

goldstein.w.n marked 2 inline comments as done.Jan 5 2023, 2:33 PM
goldstein.w.n added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
4832–4834 ↗(On Diff #486423)

This should be split into an independent patch with several tests.

Double-check to make sure this is the same logic, but I think it'd be shorter and easier to read without the loop:

// If Op1 is a power-of-2 (has exactly one bit set):
// (Op1 & X) == Op1 --> (Op1 & X) != 0
// (Op1 & X) != Op1 --> (Op1 & X) == 0
if (match(Op0, m_c_And(m_Specific(Op1), m_Value())) &&
    isKnownToBeAPowerOfTwo(Op1, false, 0, &I)) {
  return new ICmpInst(CmpInst::getInversePredicate(Pred), Op0,
                      ConstantInt::getNullValue(Op0->getType()));
}
// If Op0 is a power-of-2 (has exactly one bit set):
// Op0 == (Op0 & X) --> (Op0 & X) != 0
// Op0 != (Op0 & X) --> (Op0 & X) == 0
if (match(Op1, m_c_And(m_Specific(Op0), m_Value())) &&
    isKnownToBeAPowerOfTwo(Op0, false, 0, &I)) {
  return new ICmpInst(CmpInst::getInversePredicate(Pred), Op1,
                      ConstantInt::getNullValue(Op0->getType()));
}

Done (and used your code, its much less verbose), added you as reviewer but see: https://reviews.llvm.org/D141090

nikic added inline comments.Jan 6 2023, 5:19 AM
llvm/lib/Analysis/AssumptionCache.cpp
131–132

nit: Drop newline

135

Could add match(B, m_Zero()) here and drop the case below, no longer relevant.

llvm/lib/Analysis/ValueTracking.cpp
944

and drop comments below, merge conditions, etc. We're only handling the one case now.

959

If all bits are known zero, uses should be replaced by a zero value. There is no need to handle non-constant case here.

In fact, you should also drop the isKnnownToBeAPowerOfTwo and computeKnownBits calls on B, because we don't just need a power of two, we need to know which power of two it is: It has to be constant. We can just use match(B, m_Power2(Pow2)) instead.

goldstein.w.n marked 6 inline comments as done.Jan 6 2023, 11:19 AM

Remove unnecessary code b.c of canonicalizations earlier / only using constants

nikic added a comment.Jan 10 2023, 8:22 AM

The implementation looks basically fine. I think it would be easiest to land this if it's detached from the patch stack, with some dedicated tests -- which would probably be pretty much just these?

declare void @llvm.assume(i1)

define i32 @pow2(i32 %x) {
  %and = and i32 %x, 4
  %cmp = icmp ne i32 %and, 0
  call void @llvm.assume(i1 %cmp)
  %and2 = and i32 %x, 4
  ret i32 %and2
}

define i32 @not_pow2(i32 %x) {
  %and = and i32 %x, 3
  %cmp = icmp ne i32 %and, 0
  call void @llvm.assume(i1 %cmp)
  %and2 = and i32 %x, 3
  ret i32 %and2
}
llvm/lib/Analysis/ValueTracking.cpp
943

Huh, I would have expected this to require {} due to the variable declaration.

946

The c_s here can be dropped due to canonicalization (constant on the right).

947

!BPow2->isZero() is not necessary, m_Power2 only matches power of twos (unlike m_Power2OrZero).

Seperate from series and add indepedent tests

goldstein.w.n retitled this revision from [Patch 3/4]: Add cases for assume (X & Y != {0|Y}) to [InstCombine]: Add cases for assume (X & Y != {0|Y}).Jan 10 2023, 10:59 AM
goldstein.w.n marked 3 inline comments as done.
goldstein.w.n added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
943

Added braces.

goldstein.w.n marked an inline comment as done.EditedJan 10 2023, 11:16 AM

The implementation looks basically fine. I think it would be easiest to land this if it's detached from the patch stack, with some dedicated tests -- which would probably be pretty much just these?

Done, test patch is: https://reviews.llvm.org/D141412

declare void @llvm.assume(i1)

define i32 @pow2(i32 %x) {
  %and = and i32 %x, 4
  %cmp = icmp ne i32 %and, 0
  call void @llvm.assume(i1 %cmp)
  %and2 = and i32 %x, 4
  ret i32 %and2
}

define i32 @not_pow2(i32 %x) {
  %and = and i32 %x, 3
  %cmp = icmp ne i32 %and, 0
  call void @llvm.assume(i1 %cmp)
  %and2 = and i32 %x, 3
  ret i32 %and2
}

Added those tests along with some other variations.
Some of which will be "fixed" in D140852 (the _br cases).

Also added some tests for when the power of 2 is non-constant but
we still "should" be able to prove we don't need the operation (will
look into that more later).

My plan is:

  1. get this in
  2. Split D140852 from the series and submit that (it will cleanup some cases here)
  3. Split D140850 and D140849 and independent patches (just the mul case)
  4. non-constant cases.

That sound reasonable?


Also, if you have a moment about an unrelated thing. I'm working on a patch
for SimplifyLibCalls to see if there is a previous dominating slen = strlen(s) call
that will allow transformation for str*(s, ...) -> mem*(s, ..., slen) even if the string is non-constant.
To do this, it needs to check that between the strlen call and lib-call the memory
*s can't have been clobbered. I think the "right" way to do this is memoryssa but
it seems ridiculous (and overly consuming of compile time) to add memoryssa to all
of instcombine just for this one relatively unimportant pass. I saw we have AAResults
already computed so started with that, but it only works on a single basic-block (and since
we don't have postdom tree there doesn't seem to be any cheap way to collect all possible
affected basic-blocks between the strlen call and lib-call. My thought for how to handle
this was either:

  1. Bite the bullet and use memoryssa throughout instcombine.
  2. Only handle strlen calls in the same basic-block as the lib-call and use AAResults
  3. Add a file to "canonicalize" lib calls before EarlyCSE by making ALL str*(s,...) functions mem*(s, ..., strlen(s)), let EarlyCSE remove all duplicate strlen calls, then make the SimplifyLibCall in instcombine replace mem*(s, ..., strlen(s)) with the str* equivilent.

Do you have any advise about which of these (if any) make the most sense. And if none
how you might approach it?

Propegate test changes

Propegate test changes

Propegate test changes

nikic accepted this revision.Jan 11 2023, 1:17 AM

LGTM

Also, if you have a moment about an unrelated thing. I'm working on a patch
for SimplifyLibCalls to see if there is a previous dominating slen = strlen(s) call
that will allow transformation for str*(s, ...) -> mem*(s, ..., slen) even if the string is non-constant.
To do this, it needs to check that between the strlen call and lib-call the memory
*s can't have been clobbered. I think the "right" way to do this is memoryssa but
it seems ridiculous (and overly consuming of compile time) to add memoryssa to all
of instcombine just for this one relatively unimportant pass. I saw we have AAResults
already computed so started with that, but it only works on a single basic-block (and since
we don't have postdom tree there doesn't seem to be any cheap way to collect all possible
affected basic-blocks between the strlen call and lib-call. My thought for how to handle
this was either:

  1. Bite the bullet and use memoryssa throughout instcombine.

Extremely unlikely. Both because of compile-time impact, and because this would add implementation complexity to all other transforms, that now need to keep MSSA up to date.

  1. Only handle strlen calls in the same basic-block as the lib-call and use AAResults

This is possible -- probably easiest if it covers your motivating cases.

  1. Add a file to "canonicalize" lib calls before EarlyCSE by making ALL str*(s,...) functions mem*(s, ..., strlen(s)), let EarlyCSE remove all duplicate strlen calls, then make the SimplifyLibCall in instcombine replace mem*(s, ..., strlen(s)) with the str* equivilent.

Handling this in EarlyCSE/GVN does seem most principled. I'm not sure it's necessary to actually perform the conversion from str to mem+strlen in advance -- shouldn't we be able to look up whether we have an available strlen call, and perform the transform from str to mem if we do? Of course, it would be a bit of an awkward special case...

This revision is now accepted and ready to land.Jan 11 2023, 1:17 AM
This revision was landed with ongoing or failed builds.Jan 11 2023, 1:21 AM
This revision was automatically updated to reflect the committed changes.

LGTM

Also, if you have a moment about an unrelated thing. I'm working on a patch
for SimplifyLibCalls to see if there is a previous dominating slen = strlen(s) call
that will allow transformation for str*(s, ...) -> mem*(s, ..., slen) even if the string is non-constant.
To do this, it needs to check that between the strlen call and lib-call the memory
*s can't have been clobbered. I think the "right" way to do this is memoryssa but
it seems ridiculous (and overly consuming of compile time) to add memoryssa to all
of instcombine just for this one relatively unimportant pass. I saw we have AAResults
already computed so started with that, but it only works on a single basic-block (and since
we don't have postdom tree there doesn't seem to be any cheap way to collect all possible
affected basic-blocks between the strlen call and lib-call. My thought for how to handle
this was either:

  1. Bite the bullet and use memoryssa throughout instcombine.

Extremely unlikely. Both because of compile-time impact, and because this would add implementation complexity to all other transforms, that now need to keep MSSA up to date.

  1. Only handle strlen calls in the same basic-block as the lib-call and use AAResults

This is possible -- probably easiest if it covers your motivating cases.

  1. Add a file to "canonicalize" lib calls before EarlyCSE by making ALL str*(s,...) functions mem*(s, ..., strlen(s)), let EarlyCSE remove all duplicate strlen calls, then make the SimplifyLibCall in instcombine replace mem*(s, ..., strlen(s)) with the str* equivilent.

Handling this in EarlyCSE/GVN does seem most principled. I'm not sure it's necessary to actually perform the conversion from str to mem+strlen in advance -- shouldn't we be able to look up whether we have an available strlen call, and perform the transform from str to mem if we do? Of course, it would be a bit of an awkward special case...

Thank you for the advise :)

The canonicalization wouldn't be necessary, it would be just be a simple way to avoid having to check the strlen case for each function (although
then again we would need to be able to reverse it later which may have the same level of complexity). Just searching for a dominating + valid
strlen call and using that isn't too much of a special case as is, most of the functions already check for a constant-time strlen so its really just
a matter of adding an extra branch in existing infrastructure.

Think I'll give EarlyCSE/GVN a shot (I was leaning towards EarlyCSE), if it doesn't work out i'll just do in the AAResults method.