This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] don't recursively compute known bits using multiple llvm.assumes
ClosedPublic

Authored by spatel on Apr 15 2021, 8:56 AM.

Details

Summary

This is an alternative to D99759 to avoid the compile-time explosion seen in:
https://llvm.org/PR49785

The suggestion was to make the exclusion logic stronger to avoid blowing up, but I'm not seeing how to do that. Note that we reduced the complexity of the exclusion mechanism in D16204 because it was too costly.

So I'm questioning the need for recursion/exclusion entirely - what is the optimization value vs. cost of recursively computing known bits based on assumptions? This was built into the implementation from the start with 60db058, and we have kept adding code/cost to deal with that capability.

By clearing the query's AssumptionCache inside computeKnownBitsFromAssume(), this patch retains all existing assume functionality except refining known bits based on even more assumptions.

We have 1 regression test that shows a difference in optimization power. Is that example representative of real-world llvm.assume usage?

Diff Detail

Event Timeline

spatel created this revision.Apr 15 2021, 8:56 AM
spatel requested review of this revision.Apr 15 2021, 8:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 15 2021, 8:56 AM
nikic accepted this revision.Apr 15 2021, 9:39 AM

The suggestion was to make the exclusion logic stronger to avoid blowing up, but I'm not seeing how to do that. Note that we reduced the complexity of the exclusion mechanism in D16204 because it was too costly.

What I had in mind was to exclude not a specific assume, but all assumes on a given value (the value passed to assumptionsFor). However, this change by itself did not fix the issue (only in conjunction with a limit on number of assumes visited in the loop).

I like your idea here though. Recursing over assumes does seem to be of rather questionable value. For the affected example, if we use less silly instruction ordering, GVN would canonicalize the variables (https://llvm.godbolt.org/z/1cn93nEh1) after which the recursion is no longer necessary.

So this LGTM. Maybe wait a bit in case someone has an alternative suggestion.

This revision is now accepted and ready to land.Apr 15 2021, 9:39 AM

The suggestion was to make the exclusion logic stronger to avoid blowing up, but I'm not seeing how to do that. Note that we reduced the complexity of the exclusion mechanism in D16204 because it was too costly.

What I had in mind was to exclude not a specific assume, but all assumes on a given value (the value passed to assumptionsFor). However, this change by itself did not fix the issue (only in conjunction with a limit on number of assumes visited in the loop).

I like your idea here though. Recursing over assumes does seem to be of rather questionable value.

Note that we canonicalize assume(x && y) to assume(x), assume(y), so we really ought to do better than just give up.

For the affected example, if we use less silly instruction ordering, GVN would canonicalize the variables (https://llvm.godbolt.org/z/1cn93nEh1) after which the recursion is no longer necessary.

So this LGTM. Maybe wait a bit in case someone has an alternative suggestion.

Note that we canonicalize assume(x && y) to assume(x), assume(y), so we really ought to do better than just give up.

Not sure I see what relation that has to this issue. We still process all applicable assumes. We'll just not make use of even more assumes when doing recursive queries while already handling an assume.

Let me try to state the core problem here: ValueTracking walks are depth-limited. This is fine as long as the branching factor is low. At a typical branching factor <= 2, going to a depth of 6 has maximum complexity of 2^64, which is reasonable. However, we need to be careful whenever the branching factor is higher than that. For a factor b=3, complexity is already 3^6 = 729, which is beyond reasonable bounds. I believe there are only two cases where we can run into this issue: One is phi nodes, where the branching factor is the number of phi node arguments. The other are assumes, where the branching factor is the number of applicable assumes for a value.

In such cases, we need to apply *some* kind of limit to avoid pathological compile-time costs. What we do for phi nodes is to perform the recursive known bits queries with a depth of MaxDepth-1, which aggressively limits the number of recursions we can do with high branching factor. This patch uses an alternative approach, which forbids further use a assumes in the recursive queries.

That does also suggest a possible alternative approach here, which is to do the same thing we do for phi nodes and do the recursive queries with MaxDepth-1.

That does also suggest a possible alternative approach here, which is to do the same thing we do for phi nodes and do the recursive queries with MaxDepth-1.

Right - I drafted something with that approach, so I can clean it up and post it if that's preferable.

That's a more limiting approach IIUC because that would cut off all recursive analysis for the computeKnownBits calls within computeKnownBitsFromAssume(). The proposal here still allows recursion through other methods like computeKnownBitsFromOperator(), but only clips before we descend via another round of computeKnownBitsFromAssume().

nikic added a comment.Apr 15 2021, 1:17 PM

That does also suggest a possible alternative approach here, which is to do the same thing we do for phi nodes and do the recursive queries with MaxDepth-1.

Right - I drafted something with that approach, so I can clean it up and post it if that's preferable.

That's a more limiting approach IIUC because that would cut off all recursive analysis for the computeKnownBits calls within computeKnownBitsFromAssume(). The proposal here still allows recursion through other methods like computeKnownBitsFromOperator(), but only clips before we descend via another round of computeKnownBitsFromAssume().

I think both variants would perform similarly in practice, because these recursive known bits calls are mostly just unnecessary over-generalization. In practice, we expect assume patterns to be things like (a & 15) == 0 to say that a value is aligned, not (a & k1) == k2, where we happen to be able to infer something based on known bits in k1 and k2.

As I don't expect either variant to have much practical impact, I'd be okay either way, but prefer the option you implemented in this patch, because it also gets rid of the assume exclusion mechanism. That's a nice cleanup.

This revision was landed with ongoing or failed builds.Apr 16 2021, 5:46 AM
This revision was automatically updated to reflect the committed changes.