This is an archive of the discontinued LLVM Phabricator instance.

[PatternMatch] Create match method to track uses complexity
AbandonedPublic

Authored by rampitec on Nov 17 2021, 4:05 PM.

Details

Reviewers
spatel
lebedev.ri
Summary

This patch introduces new match function which can return
a number of matched operations with multiple uses. It also
can take an argument with a limit of such operands number
for match to succeed.

The function is needed if we want to limit a total number
of multiply used operands but do not necessarily care which.

Diff Detail

Event Timeline

rampitec created this revision.Nov 17 2021, 4:05 PM
rampitec requested review of this revision.Nov 17 2021, 4:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 17 2021, 4:05 PM
rampitec retitled this revision from [PatternMatch] Create match method to track uses complexity to [PatternMatch] Create match method to track uses complexity. WIP..Nov 17 2021, 4:05 PM

I'm somewhat unconvinced that we can solve this problem at the match stage.
Consider

v0 = a + c
v1 = v0 + c
v2 = v1 + c
r = v2 + c

if you do match(m_Add(m_Add(m_Add(m_Add(m_Value(A), m_Value(C)), m_Specific(C)), m_Specific(C)), m_Specific(C))),
i think UseComplexity will be 4(?), but that doesn't mean that you can create 4 instructions,
in reality it depends on the pattern you will produce. If you want to produce a + 4*c, sure, you can do that,
because none of the instructions you matched are used in the pattern. But if you want to produce
v2 + 1*c, you can not, because that would increase instruction count...

I'm somewhat unconvinced that we can solve this problem at the match stage.
Consider

v0 = a + c
v1 = v0 + c
v2 = v1 + c
r = v2 + c

if you do match(m_Add(m_Add(m_Add(m_Add(m_Value(A), m_Value(C)), m_Specific(C)), m_Specific(C)), m_Specific(C))),
i think UseComplexity will be 4(?), but that doesn't mean that you can create 4 instructions,
in reality it depends on the pattern you will produce. If you want to produce a + 4*c, sure, you can do that,
because none of the instructions you matched are used in the pattern. But if you want to produce
v2 + 1*c, you can not, because that would increase instruction count...

In that match complexity will be zero because none of the values have more than one use. Than you can rule what is your complexity to use as a limit based on your target transform, just like I did in the example transform with 4 instructions being a threshold just on the grounds of comparing source and target code.

The assumption is that all that instructions go away after transform except those which have uses outside of the source pattern. And then even if not you can adjust the limit. I guess I will have to do it where I am reusing a subexpression, because a reused value shall not count, it just stays, but that needs a proper handling of m_CombineAnd as a max of complexity of its operands, which I have not done yet. Then speaking of that m_CombineOr shall use a complexity of a selected leg.

A side note: I have initially tried to modify match() methods to accept and calculate complexity arguments and being stateless, just return them. I have then realized that turns into nightmare because a match can restart and retry, and that is nearly impossible to track statelessly. The benefit was obvious: an early termination of a match, but the accounting cost quickly became too high and not stateless anyway. That said I believe only a BinOp and UnOp will have to keep state. Well, combine and/or too likely. The leaf matches like m_Value or constants do not count and have complexity zero anyway. So the new match interface will have to sum final complexity after the whole match has succeeded and states recorded.

rampitec updated this revision to Diff 388278.Nov 18 2021, 11:32 AM

Use SFINAE to only add getUseComplexity() method to matchers which really need it. This will be much less code than previously anticipated.

rampitec edited the summary of this revision. (Show Details)Nov 18 2021, 11:35 AM
nikic added a subscriber: nikic.Nov 18 2021, 11:50 AM

Taking a step back here, I want to ask: Is this solving some kind of real problem? I get the impression that use counting in InstCombine has moved beyond a practical consideration into the realm of religious dogma. I don't think there is much value in always making exotic, complex folds maximally applicable in multi-use scenarios, if doing so does not happen to be straightforward.

Taking a step back here, I want to ask: Is this solving some kind of real problem? I get the impression that use counting in InstCombine has moved beyond a practical consideration into the realm of religious dogma. I don't think there is much value in always making exotic, complex folds maximally applicable in multi-use scenarios, if doing so does not happen to be straightforward.

The example use is one application, the rest of the cases in the same function too. In fact in a later experiments I came across a situation when we do not perform reassociation in the demorganing bacause of the multi use issue and I have a hard time solving it without such accounting.

The other practical improvement is that it shall just simplify the accounting. It is easier to write something like 4 - UseComplexity and omit all m_OneUse than counting unwrapped patterns.

rampitec updated this revision to Diff 392843.Dec 8 2021, 11:23 AM
rampitec retitled this revision from [PatternMatch] Create match method to track uses complexity. WIP. to [PatternMatch] Create match method to track uses complexity.
rampitec edited the summary of this revision. (Show Details)

The patch is ready now.

rampitec planned changes to this revision.Dec 9 2021, 11:55 AM

This is actually a bad idea at least as is. A use of an subexpression will preserve the whole subexpression which needs to be accounted in the complexity.

rampitec updated this revision to Diff 393299.Dec 9 2021, 3:21 PM
rampitec edited the summary of this revision. (Show Details)
  • Fixed subexpression cost handling, any use at root now marks each value in subexpression used.
  • Changed all patterns foldComplexAndOrPatterns.
  • Added tests.
rampitec updated this revision to Diff 394627.Dec 15 2021, 11:55 AM

Rebased.
Added handling of newly added case (~A & B & C) | ~(A | B) --> (C | ~B) & ~A.

spatel added a comment.Jan 7 2022, 6:33 AM

I'm not opposed to the direction, but I don't have enough C++ knowledge to review the implementation.

@nikic commented on the exactness of use counting that we've created and wondered if we need to try so hard. I sympathize/agree with that view.
But I've actually hoped that we could loosen the constraints on some patterns - that is, allow the creation of more instructions because the transform is still beneficial from an overall use count.
Take for example one of the motivating Demorgan patterns for this patch:
https://alive2.llvm.org/ce/z/4XGCm3
The 'not' ops have extra uses, so we don't do: (~A & ~B) --> ~(A | B) even though that would get the final result in 2 ops instead of 3. But we don't do the inverse transform either (it requires more than simple pattern-matching).
If we removed some strictness about uses, we probably wouldn't require this patch or the follow-ons for complicated bitwise logic optimizations. Those are only needed because we miss the simpler, intermediate folds in complex expressions with extra uses. I suppose we'd be more at risk for infinite combine loops if we allowed creating extra instructions, but it might be worth experimenting with removing some use checks on bitwise logic folds to see how that plays out.

If we removed some strictness about uses, we probably wouldn't require this patch or the follow-ons for complicated bitwise logic optimizations. Those are only needed because we miss the simpler, intermediate folds in complex expressions with extra uses. I suppose we'd be more at risk for infinite combine loops if we allowed creating extra instructions, but it might be worth experimenting with removing some use checks on bitwise logic folds to see how that plays out.

More than infinite loops (which is also possible) I am worried we may in fact make code worse without taking into account a wider pattern.
Anyhow, we used to discuss this approach in the very beginning of the ternary changes stack, so I thought to give it a try. It turns out to be more complicated change than I was initially anticipated unfortunately. I am not pushing this in any way.

I am going to abandon this and all other instcombine changes due to the lack of interest.

rampitec abandoned this revision.Feb 3 2022, 2:04 PM