This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] remove use restriction for min/max with not operands (PR35875)
AbandonedPublic

Authored by spatel on Mar 8 2018, 10:59 AM.

Details

Summary

I looked around instcombine for precedence for this, but I don't see any existing folds where it comes up.
I'm proposing that we pull 'not' ops through min/max even if it means adding an instruction (because all of the 'not' values have other uses) .
The justification for this is that we're still eliminating all (4) of the 'not' value uses in the min/max, so it's still a net reduction of uses.
The bigger motivating case can be seen in the last PR35875 test. We end up eliminating instructions there because we can absorb 'not' into 'sub'.

Possible alternatives are:

  1. Do a very narrow/large pattern match for min/max+sub. To get the motivating example, it would actually have to be a match of 3-way min/max + sub.
  2. Defer this fold to some other pass (aggressive-instcombine, GVN?) where we can check users to make sure we won't increase the instruction count.

Diff Detail

Event Timeline

spatel created this revision.Mar 8 2018, 10:59 AM

Sorry for the delay. I obviously like this because it fixes my regression :)

Before I went away I ran a load of benchmarks. They eventually finished and showed no regressions except for this rgb to cmy case (which went up 48% on v6m targets!)

If this is unpalatable because it can increase instruction count, we may be able to do something in IsFreeOrProfitableToInvert in InstCombineSelect. Is checking the users in InstCombine (more than just counting them) not a done thing?

Before I went away I ran a load of benchmarks. They eventually finished and showed no regressions except for this rgb to cmy case (which went up 48% on v6m targets!)

Thanks for verifying that this change does what we intended on the motivating case (and does no noticeable harm otherwise).

If this is unpalatable because it can increase instruction count, we may be able to do something in IsFreeOrProfitableToInvert in InstCombineSelect. Is checking the users in InstCombine (more than just counting them) not a done thing?

It is done sometimes, but I view that code construct with suspicion. It's usually just covering for a misplaced pattern match. Ie, in this case it wouldn't be much different than starting a larger match from a sub.

I still don't know how we should handle this, but staring at a bit more, I see that this is just an application of DeMorgan's Law. So if we proceed down this path, we also want to loosen the use restrictions to allow:

Name: not_not_and_sub_with_not
%nx = xor i8 %x, -1
%ny = xor i8 %y, -1
%nz = xor i8 %z, -1
%a = and i8 %nx, %ny
%s = sub i8 %a, %nz
  =>
%o = or i8 %x, %y
%s = sub i8 %z, %o

https://rise4fun.com/Alive/cvv
...or if we decide to use larger matches starting from the sub, we'd want to match basic logic ops as well as min/max.

Ping * 3.

Anyone want to take a shot at this bit of IR canonicalization theory? :)

There are basically three equivalencies here: ~~a == a, (~a-~b)==(b-a), and UMIN(~a, ~b) == ~UMAX(a, b). The tricky bit is turning those three equivalencies into a profitable transform. instcombine is generally bad at this sort of transform; we don't keep track of potential transforms, so instead we usually just suppress transforms which are not immediately profitable. So maybe we miss some transforms, but at least we aren't making the user's code worse.

So I guess these are the options:

  1. Make the code worse, and hope it gets better later. That's what this patch does, but it's not something we like to do usually; often it happens to be profitable for the particular testcase someone is looking at, but it makes the code worse in other cases.
  2. Use a gigantic match for your exact testcase. This works, but obviously isn't very general.
  3. Make a new pass (or AggressiveInstCombine) which does some sort of "global" optimization (to, for example, minimize the total number of not operations in a function).
spatel abandoned this revision.Apr 6 2018, 8:03 AM

There are basically three equivalencies here: ~~a == a, (~a-~b)==(b-a), and UMIN(~a, ~b) == ~UMAX(a, b). The tricky bit is turning those three equivalencies into a profitable transform. instcombine is generally bad at this sort of transform; we don't keep track of potential transforms, so instead we usually just suppress transforms which are not immediately profitable. So maybe we miss some transforms, but at least we aren't making the user's code worse.

So I guess these are the options:

  1. Make the code worse, and hope it gets better later. That's what this patch does, but it's not something we like to do usually; often it happens to be profitable for the particular testcase someone is looking at, but it makes the code worse in other cases.
  2. Use a gigantic match for your exact testcase. This works, but obviously isn't very general.
  3. Make a new pass (or AggressiveInstCombine) which does some sort of "global" optimization (to, for example, minimize the total number of not operations in a function).

Thanks! I agree with all of this. I figured it was worth a discussion to see if there was support for the 'more instructions, but less uses' argument, but I knew that was shaky at best. Also, getting a perf win by removing a line of code from the compiler was worth a shot.

I think AggressiveInstCombine is a better place to house rare and narrow matchers like what we need here (see also the discussion in D45173). It seems likely that we could also transfer some existing InstCombine functionality over there to reduce the scope and cost of InstCombine. Yes, that pass becomes the new island of misfit folds, but it's easy to remove if people don't like it (only running at -O3 currently).