This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Clean up after IndVarSimplify
Changes PlannedPublic

Authored by dmgreen on Sep 25 2018, 10:31 AM.

Details

Summary

This is an attempt to fix up the mess of code left over in:
https://godbolt.org/z/ZRUqCX

Where, when compiling this code (optionally having the loop do something):

unsigned test(unsigned A)
{
  while(A >= 32) {
    A -= 32;
  };
  return A;
}

We end up with this remainder code from IndVarSimplify calculating the exit value of A:

define i32 @test(i32 %A) {
entry:
  %l0 = icmp ult i32 %A, 31
  %l1 = select i1 %l0, i32 %A, i32 31
  %umax = xor i32 %l1, -1
  %l2 = add i32 %umax, %A
  %l3 = add i32 %l2, 32
  %l4 = and i32 %l3, -32
  %l5 = sub i32 %A, %l4
  ret i32 %l5
}

This:

(A - (-32 & (32 + A + ~min(A, 31))))

can be simplified to just (https://rise4fun.com/Alive/ARG).

A & 31

This looks like an instcombine problem to me, so here I've put in two new folds. One (here) takes us from:

A + ~min(A, 31)      or   A + ~select(P, A, B)

to

A < 31 ? -1 : A-32        select(P, -1, A+~B)

https://rise4fun.com/Alive/LvO. But can potentially break appart the min/max in the process. In this case we happen to fold back with the 32+ into another min/max, but that might not be the case in general.

The second part has been moved over to D53033. Together they get us down to A&31.

Diff Detail

Event Timeline

dmgreen created this revision.Sep 25 2018, 10:31 AM

Is there a bug report associated with this? I saw a comment about this case somewhere, but I'm not finding it now.

Definitely, let's split these up. I starting looking at the demanded bits half and convinced myself there must be some umin sibling that we miss. This is what I came up with:
https://rise4fun.com/Alive/TSb

Make sure that looks right - I adjusted the precondition from your proof trying to see how this works, then I propagated 'not' vals through that. We should be able to fold both versions (but doesn't have to be in 1 patch).

dmgreen planned changes to this revision.Oct 2 2018, 6:02 AM

Unfortunately, since I did this everything seems to have changed and we now end up with something like:
(S + -32) - (32 & (S + umax(31 - S, -32)))

Which at first glaces looks much tougher, but I'll have a look and see if I can find a way to simplify that.

Is there a bug report associated with this? I saw a comment about this case somewhere, but I'm not finding it now.

There was a bug I put a comment in about this a while back. That was PR38280, but I don't know if it's really related to this exactly. I've seen this problem come up in a few other places (a memcpy in our clibs, one of the cases in cmsis dsp, a customers crc function, etc)

Definitely, let's split these up. I starting looking at the demanded bits half and convinced myself there must be some umin sibling that we miss. This is what I came up with:
https://rise4fun.com/Alive/TSb

Will do, thanks for the examples.

It turns out the other case I ran into above ((S + -32) - (32 & (S + umax(31 - S, -32)))) was from do loops, not while loops. Signed will also be different to unsigned, with signed cases not having quite as small simplified forms.
These are the cases:
https://godbolt.org/z/SE-xhD
With some possible simplifications:
https://rise4fun.com/Alive/slxj

The unsigned_while case is the one fixed here. There others will need more work, with the signed ones not simplifying easily along some of those steps.

Another way to approach this might be to look at simplifying the SCEV directly, but I'm not sure if that will be any easier.

dmgreen updated this revision to Diff 168982.Oct 10 2018, 4:48 AM
dmgreen edited the summary of this revision. (Show Details)
dmgreen added a reviewer: lebedev.ri.

I've taken the demand parts parts out of this, and added some extra tests for the do / while and signed /unsigned cases.

This patch gets us the unsigned while case, and a lot of the signed while case. Like I said, it can split up max's though.

lebedev.ri added inline comments.Oct 10 2018, 5:02 AM
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1197–1202

This will most likely miss some complicated-and-hard-to-catch commutative cases.
First, you most certainly can use m_OneUse() matchers.
Second, those two extra matches will likely be a reason for misses.
I haven't tried it, please check *very* carefully, but i think this should be:

auto my_Select =
    m_CombineOr(m_Select(m_Value(Pred), m_Specific(A), m_Value(B)),
                m_Select(m_Value(Pred), m_Value(B), m_Specific(A)));
if (match(&I,
          m_c_BinOp(m_CombineAnd(m_OneUse(m_Not(m_CombineAnd(
                                     m_OneUse(my_Select), m_Value(Select)))),
                                 m_Value(Not)),
                    m_Value(A))) &&
    IsFreeToInvert(B, B->hasOneUse())) {

Third, IsFreeToInvert() may still cause some misses, so maybe if you really want it should be a matcher too?

dmgreen added inline comments.Oct 10 2018, 8:54 AM
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1197–1202

Do you mean because we may match an add(not(X), not(select)) and miss the select? If so then A would be not(X) and we would have not(select(p, not(X), B)), with B being freely invertible and the whole thing would be inverted to select(p, X, ~B).

Or are there other cases I'm not seeing?

I also ran into problems with what I think was using m_specific and m_value in the same pattern. We are maybe best off not using m_c_BinOp here?

Your point about the m_OneUse's is a good one, I'll use them to make this better.

lebedev.ri added inline comments.Oct 10 2018, 9:02 AM
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1197–1202

Or are there other cases I'm not seeing?

I don't know.
One thing i can say is that in general such a complicated all-on-one match()er
will miss less commutative cases than trying to manually match everything.

I also ran into problems with what I think was using m_specific and m_value in the same pattern.

That is expected.
If you need to catch the value and then check for it in the same match(),
you use m_Value() as usual, and then use m_Deferred() instead of m_Specific(),
and it will just work.

We are maybe best off not using m_c_BinOp here?

No.

dmgreen updated this revision to Diff 169214.Oct 11 2018, 8:02 AM

OK, now one bit ol' matcher. Thanks for the suggestions.

lebedev.ri added inline comments.Oct 11 2018, 8:21 AM
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1201

m_Value(A) will *always* match.
I wonder if would it make sense to swap them around?

1205–1210

How about

Value *TrueVal = ConstantInt::getAllOnesValue(A->getType());
Value *FalseVal = AmC;
if (cast<SelectInst>(Select)->getTrueValue() != A)
  std::swap(TrueVal, FalseVal)
return SelectInst::Create(Pred, TrueVal, FalseVal);
dmgreen updated this revision to Diff 169260.Oct 11 2018, 11:32 AM

Whilst we're here, can anyone think of a good way to simplify:
(S + -32) - (-32 & (S + umax(31 - S, -32)))
That's the "do" case. I think if we distribute the -32& through the add, that with the rest of instcombine + cse + instcombine again does get us down to:
S & 31

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1201

I believe we have to match A before the m_Deferred. I've added a test for matching both ways round.

1205–1210

Nice.

lebedev.ri added inline comments.Oct 11 2018, 11:37 AM
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1201

D'oh, right, sorry.

This looks good to me, one more nit.
I think we should deal with do while in another patch.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1202

Why is there free-invert restriction on B?
We had 3 instructions:

  1. add A, not(select); root, will always get replaced
  2. not(select); does not exist in the new pattern, should go away, thus one-use
  3. Select(P, A, B); does not exist in the new pattern, should go away, thus one-use

Now, we replace them with 3 instructions:

  1. Select(P, -1, (add A, not(B)))
  2. add A, not(B)
  3. not(B) which is free since B is freely invertible.

But we had 3 instructions and produce 3 instructions, even if B is not free to invert.
So why restrict to freely-invertivbe B?
If it is indeed needed, could you please add it as a comment in the diff ^.

dmgreen updated this revision to Diff 169667.Oct 15 2018, 2:51 AM

I think we should deal with do while in another patch.

Yeah, defo. I just need to come up with a sensible way to fix it. I feel some of this is pushing against the edges of what instcombine should be doing, but I'll keep pushing until someone tells me to stop.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1202

Sure, sounds about right. I was going for "makes things better" as opposed to "doesn't make things worse".

Consider the code:

%l0 = icmp sgt i32 %A, %B
%l1 = select i1 %l0, i32 %A, i32 %B
%not = xor i32 %l1, -1
%x = add i32 %not, %A

which is add(A, not(max(A, B))
and we (would) convert it to select(icmp(a>b), -1, A+~B). On the target I was looking at we don't tend to have max instructions, but that's obviously not universal. At least if B is invertible we can say this looks better even without the max. (I mean, materialising -1 isn't always free, but that's perhaps more me thinking about inlining than about inst combine.)

Does that sound plausible enough? I've added a comment. If you disagree, I can change things the other way. I don't have benchmarks that suggest that either one is better than the other.

lebedev.ri added inline comments.Oct 15 2018, 2:59 AM
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1202

Obvious counter-point - and what if you already have that bad pattern, which is suboptimal for the target?
If something isn't optimal for the target, then that should be optimized in the backend,
instead of handicapping middle-end. (also, there are more than one target).

dmgreen added inline comments.Oct 15 2018, 12:40 PM
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1202

Sorry, but I'm not sure if I exactly got your point. Do you mean, we should remove the IsFreeToInvert anyway, because it's the backend's job to turn it back into whatever's best for it?

We have a few places where we don't do optimisations on min/max's in instcombine, I think under the assumption that if we break them apart it can be hard to put them back together, but I'm not sure. I'm happy to take it out. Like I said, I ran some benchmarks and couldn't see a difference either way (but perhaps wasn't using the best target for that).

lebedev.ri added inline comments.Oct 15 2018, 12:59 PM
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1202

I'm not sure. I'm only saying that avoiding some transform just because it may hurt backends
is not *always* the right choice. @spatel will know best.

See e.g. D46760, which is actually a https://bugs.llvm.org/show_bug.cgi?id=37872,
which we are not doing because https://reviews.llvm.org/rL155136 / https://reviews.llvm.org/rL155136
so it is now waiting for funnel shift support (https://bugs.llvm.org/show_bug.cgi?id=38152)
And if instead of simply not doing that opt in middle-end, the back-end would have been fixed back then,
how far further would we be now? </rant>

Anyway, if we keep this one-use restriction,
i *think* we also need to make sure that we have the *opposite* transform.
I.e. select(icmp(a>b), -1, A+~B) -> add(A, not(max(A, B)) iff B is not free to invert.

Are you still working on this?

dmgreen planned changes to this revision.Jan 14 2019, 8:15 AM

Are you still working on this?

I'll hopefully get back to this, but not right now.

I have recently taken another looks at this, and came to the conclusion that, especially for some of the signed loops, we probably shouldn't be expanding these out in IndVars. They won't simplify even if we try to clear them up in instcombine. There is a check in there to bail if they are high cost SCEVs, but the logic seems a bit wrong and is treating them as low cost just because they are divided by a power of 2. (Which is SCEV's way of doing the mod).

There's a fix in D58435 to fix that. This may need to be attacked in SCEV too, if we want the "and" in the cases that simplify.

SCEV/IndVars or not, this is still terrible IR.

sanjoy removed a subscriber: sanjoy.Jun 30 2019, 10:12 AM
lebedev.ri resigned from this revision.Jan 12 2023, 4:42 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 4:42 PM
Herald added a subscriber: StephenFan. · View Herald Transcript