This is an archive of the discontinued LLVM Phabricator instance.

[DagCombine] Increase depth by number of operands to avoid a pathological compile time.
ClosedPublic

Authored by asbirlea on Feb 2 2022, 11:46 PM.

Details

Summary

We're hitting a pathological compile-time case, profiled to be in
DagCombiner::visitTokenFactor and many inserts into a SmallPtrSet.
It looks like one of the paths around findBetterNeighborChains is not
capped and leads to this.

This patch resolves the issue. Looking for feedback if this solution
looks reasonable.

Diff Detail

Event Timeline

asbirlea created this revision.Feb 2 2022, 11:46 PM
asbirlea requested review of this revision.Feb 2 2022, 11:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2022, 11:46 PM

Both these have been reported in the past about large token counts affecting compile time:

https://github.com/llvm/llvm-project/issues/52858
https://github.com/llvm/llvm-project/issues/40519

AFAICT, this patch doesn't make any difference for https://github.com/llvm/llvm-project/issues/52858.

Can folks on this review suggest an alternative solution?

bjope added a comment.Feb 7 2022, 10:53 AM

I'm far from an expert at this (I don't know that much about TokenFactors and the DAGCombiner worklist).
But one thing that caught my attention was the code that https://github.com/llvm/llvm-project/issues/52858 is talking about in visitTokenFactor:

SDValue DAGCombiner::visitTokenFactor(SDNode *N) {
  ...
  // If the sole user is a token factor, we should make sure we have a
  // chance to merge them together. This prevents TF chains from inhibiting
  // optimizations.
  if (N->hasOneUse() && N->use_begin()->getOpcode() == ISD::TokenFactor)
    AddToWorklist(*(N->use_begin()));

I'm wondering why, or rather if it is good idea that, we always add the user to the worklist here. Regardless if the current TokenFactor is simplified or not.
Normally I think we visit nodes bottom-up (or rather in topological order, but well that has puzzled me before because it seems a bit random, see https://discourse.llvm.org/t/shouldnt-dag-combines-be-applied-in-topological-order/59458), so we have probably just visited the using TokenFactor. Visiting that node again seems a bit redundant unless we actually perform some optimizations on the current TokenFactor.

So maybe one shouldn't just add the user to the worklist directly here, instead postponing it to the ends of the function and make it conditional on if any changes were made.

Afaict that gives a speedup to the example in this ticket. When it comes to the code example from https://github.com/llvm/llvm-project/issues/52858, then I couldn't really see any significant differences when using -time-passes even if removing the AddToWorklist completely (so either that reproducer won't work for the commit/target I used when testing, or I might have messed up my testing somehow).

rnk accepted this revision.Feb 9 2022, 1:28 PM
rnk added a subscriber: rnk.

lgtm

I understand that this is not a complete solution to these pathological compile time cases, but this does address some of them, and I think it's important to put correctness before performance. These timeouts are affecting our users, so we have some urgency and can't wait to develop a perfect solution.

I see the repro, but I guess it's still too large to make a good regression test.

This revision is now accepted and ready to land.Feb 9 2022, 1:28 PM
This revision was landed with ongoing or failed builds.Feb 9 2022, 1:35 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Feb 9 2022, 1:42 PM

I'm seeing a whole bunch of test failures after this change: http://45.33.8.238/linux/68015/step_12.txt Do you think this could've been caused by your change?