This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] optimize unsigned icmp of increment
ClosedPublic

Authored by Deewiant on Sep 17 2016, 8:13 AM.

Details

Summary

Allows LLVM to optimize sequences like the following:

%add = add nuw i32 %x, 1
%cmp = icmp ugt i32 %add, %y

Into:

%cmp = icmp uge i32 %x, %y

Previously, only signed comparisons were being handled.

Decrements could also be handled, but 'sub nuw %x, 1' is currently canonicalized to 'add %x, -1' in InstCombineAddSub, losing the nuw flag. Removing that canonicalization seems like it might have far-reaching ramifications so I kept this simple for now.

Event Timeline

Deewiant updated this revision to Diff 71733.Sep 17 2016, 8:13 AM
Deewiant retitled this revision from to [InstCombine] optimize unsigned icmp of inc/dec like signed.
Deewiant updated this object.
Deewiant added reviewers: majnemer, spatel.
Deewiant added a subscriber: llvm-commits.
sanjoy requested changes to this revision.Sep 19 2016, 11:49 AM
sanjoy added a reviewer: sanjoy.
sanjoy added a subscriber: sanjoy.

I think we can be more aggressive here: the only non-poison value for (X +nuw -1) is -1, and we should be exploiting that.

lib/Transforms/InstCombine/InstCombineCompares.cpp
2832

This looks wrong -- if X and Y are both 0 then uge (X +nuw -1) Y is uge -1 0 == true but ugt X Y is ugt 0 0 = false. Same for some of the cases below.

This revision now requires changes to proceed.Sep 19 2016, 11:49 AM

Regarding aggressiveness, simplifying X +nuw -1 to -1 appears worth doing but I figured it deserves a separate change.

lib/Transforms/InstCombine/InstCombineCompares.cpp
2832

D'oh, you're right. I guess this relates to what I said initially about X +nuw -1 vs X -nuw 1: unless I screwed up even more, these properties do hold for subtractions, but we canonicalize them to nuwless additions so it wouldn't be very useful to match them. Would you rather have me remove these cases entirely or do the currently-useless subtraction matching in the hopes that the canonicalization is changed? I can also do that canonicalization change in InstCombineAddSub but I'm worried that it might have far-reaching effects.

sanjoy added inline comments.Sep 19 2016, 12:51 PM
lib/Transforms/InstCombine/InstCombineCompares.cpp
2832

I'd suggest removing these for now, and adding a TODO about supporting subtractions later if the canonicalization changes.

spatel edited edge metadata.Sep 19 2016, 1:40 PM

Not sure if this is easier to read, but if the existing code was rewritten with something like this:

auto getNewPred = [](CmpInst::Predicate P, Value *AddConstant) {
  if (!match(AddConstant, m_AllOnes()) && !match(AddConstant, m_One()))
    return ICmpInst::BAD_ICMP_PREDICATE;

  switch (P) {
  case CmpInst::ICMP_SLT: return CmpInst::ICMP_SLE;
  case CmpInst::ICMP_SGE: return CmpInst::ICMP_SGT;
  case CmpInst::ICMP_SLE: return CmpInst::ICMP_SLT;
  case CmpInst::ICMP_SGT: return CmpInst::ICMP_SGE;
  default: return CmpInst::BAD_ICMP_PREDICATE;
  }
};

Then your patch would just be to add some cases to the switch?

Not sure if this is easier to read, but if the existing code was rewritten with something like this:

auto getNewPred = [](CmpInst::Predicate P, Value *AddConstant) {
  if (!match(AddConstant, m_AllOnes()) && !match(AddConstant, m_One()))
    return ICmpInst::BAD_ICMP_PREDICATE;

  switch (P) {
  case CmpInst::ICMP_SLT: return CmpInst::ICMP_SLE;
  case CmpInst::ICMP_SGE: return CmpInst::ICMP_SGT;
  case CmpInst::ICMP_SLE: return CmpInst::ICMP_SLT;
  case CmpInst::ICMP_SGT: return CmpInst::ICMP_SGE;
  default: return CmpInst::BAD_ICMP_PREDICATE;
  }
};

Then your patch would just be to add some cases to the switch?

If I understood your meaning, I think using just a function like that would result in this matching too many cases. E.g. if the usage would be something like:

const auto NewPred = getNewPred(Pred);
if (A && NoOp0WrapProblem)
  return new ICmpInst(NewPred, A, Op1);

We'd be performing incorrect transformations like icmp sge (X + 1), Y -> icmp sgt X, Y. So we'd still have to combine specific predicates with specific match calls (and non-null A/C checks) somehow, and at that point I don't think that would really improve readability.

Deewiant updated this revision to Diff 71957.Sep 20 2016, 11:20 AM
Deewiant retitled this revision from [InstCombine] optimize unsigned icmp of inc/dec like signed to [InstCombine] optimize unsigned icmp of increment.
Deewiant updated this object.
Deewiant edited edge metadata.

Removed the incorrect X +nuw -1 identities, replaced with a TODO about X - 1 identities.

We'd be performing incorrect transformations like icmp sge (X + 1), Y -> icmp sgt X, Y. So we'd still have to combine specific predicates with specific match calls (and non-null A/C checks) somehow, and at that point I don't think that would really improve readability.

Ah, yes. We should probably have some negative regression tests for those cases. :)
There's code in canonicalizeCmpWithConstant() that does similar matching, so maybe there's still some way to shrink the logic, but it can wait.

sanjoy requested changes to this revision.Sep 26 2016, 11:56 AM
sanjoy edited edge metadata.

Please upload with full context. One easy way to do that is to use the arcanist tool.

This revision now requires changes to proceed.Sep 26 2016, 11:56 AM
Deewiant updated this revision to Diff 72533.Sep 26 2016, 12:05 PM
Deewiant edited edge metadata.

Sorry, I wasn't careful when I did the update. Trying arcanist now.

sanjoy accepted this revision.Sep 26 2016, 12:11 PM
sanjoy edited edge metadata.

lgtm with nits

test/Transforms/InstCombine/icmp.ll
2769

Add one or two negative checks.

2803

I'd drop the fptosi unless you need it for some reason -- it is distracting.

This revision is now accepted and ready to land.Sep 26 2016, 12:11 PM
Deewiant updated this revision to Diff 72547.Sep 26 2016, 12:54 PM
Deewiant edited edge metadata.

[InstCombine] add negative tests for icmp of inc/dec

test/Transforms/InstCombine/icmp.ll
2769

Added one for each combination of signed/unsigned, LHS/RHS, increment/decrement, for a total of eight.

2803

I copied the idea from the tests for the nsw case further above (e.g. cmp_sgt_rhs_dec). I haven't actually looked into what exactly is happening here: without fptosi the comparisons get flipped so that the increment is on the LHS, which would make these tests equivalent to the earlier ones.

spatel added inline comments.Sep 26 2016, 1:04 PM
test/Transforms/InstCombine/icmp.ll
2803

The commuting happens because of operand complexity canonicalization. It would be nice to add a comment above here to make that clear.

Please see:
https://llvm.org/bugs/show_bug.cgi?id=28296
https://reviews.llvm.org/D24419
...for more info.

lgtm, as stated earlier.

If you don't have commit access yet, let me know and I will commit this for you.

test/Transforms/InstCombine/icmp.ll
2803

Ok.

spatel added inline comments.Sep 26 2016, 1:14 PM
test/Transforms/InstCombine/icmp.ll
2803

Also note that fptosi is a unary operator, so if we fix PR28296, these tests will silently stop testing the cases that they were hoping to test (sigh).

It might be better if they used a non-nuw 'add' to match the complexity of 'add nuw'.

Deewiant added inline comments.Sep 26 2016, 9:34 PM
test/Transforms/InstCombine/icmp.ll
2803

If the canonicalization starts working differently the tests would rather start failing, since the icmp CHECKs would no longer match due to the reversed comparison.

Should I duplicate the comment next to every fptosi-using test here? There are 10 of them altogether.

spatel added inline comments.Sep 27 2016, 6:46 AM
test/Transforms/InstCombine/icmp.ll
2803

If the commuted operand cases don't fold, that's a logic hole (see D24419).
But I don't think that will happen in this case - we have duplicated all of the patterns using NoOp0WrapProblem / NoOp1WrapProblem?

I see now that the tests are scattered all over the place. One comment line is enough for now. Sooner or later, I'll get back to this chunk of code to make it vector-capable, so I would move all of the nsw/nuw tests to their own test file.

If you want to make that change ahead of this patch, feel free, but I wouldn't hold up this patch since it is already approved.

spatel added a comment.Jan 9 2017, 8:15 AM

This patch hasn't been committed to trunk yet although it is accepted. Any more changes wanted/needed?

D'oh, sorry about leaving this hanging for so long. Apparently I never received an e-mail for your last message about adding a comment for the tests and then never checked back on the web page either. I should be able to add the comment tomorrow and then this is hopefully good to go.

Deewiant updated this revision to Diff 84343.Jan 13 2017, 11:39 AM
  • [InstCombine] explain fptosi usage in tests
spatel accepted this revision.Jan 13 2017, 11:53 AM
spatel edited edge metadata.

LGTM - do you have commit access?

Nope, I do not have commit access.

This revision was automatically updated to reflect the committed changes.