This is an archive of the discontinued LLVM Phabricator instance.

Improve ScalarEvolution::forgetLoop() performance
ClosedPublic

Authored by kariddi on Sep 9 2017, 2:47 AM.

Details

Summary

In some tests I found forgetLoop() in ScalarEvolution can be pretty slow.

Specifically in one test I measured performance and I found that UnrollLoop took 17% of total compile time with a good 10% of the 17% being SCEV::forgetLoop().

The code seems to be churning quite a bit of instructions in some worse case scenario. The fact also that the algorithm is recursive and that we don't reuse the data structures with the child loops don't help.

I refactored it to be iterative which enables us to reuse the data structures for computing the children loops and I made it such that the "Visited" set is shared between the processing of all the sub loops .

I'm not sure if that is correct because I'm not a SCEV expert and I would like feedback on that. At a naive look it seems like that if you processed an instruction already I don't see why , if you end up encountering it again while processing a child loop, you should process it again, as it doesn't seem the information of being part of a specific loop is used for "forgetMemoizedResults()" or "eraseValueFromMap()"

These changes pretty much halve the time taken by the method in my case which is still not ideal but it's much better. It passes all the LLVM and Clang make tests like this.

I also experimented in adding this:

    if (It != ValueExprMap.end()) {
      eraseValueFromMap(It->first);
      forgetMemoizedResults(It->second);
      if (PHINode *PN = dyn_cast<PHINode>(I))
        ConstantEvolutionLoopExitValue.erase(PN);
-  }
+ } else
+    continue;

Which basically makes the algorithm to not continue following the def-use chain if we couldn't find a value in the ValueExprMap. I'm not sure this change is correct either, but it is worth noting that with this change the time of the method goes to 0.1% compile time in my case (from 10% originally) and still passes all the LLVM and Clang make tests.
I tried this change only from intuition though. Does anybody know if it is actually valid to do so?

Diff Detail

Event Timeline

kariddi created this revision.Sep 9 2017, 2:47 AM
kariddi edited the summary of this revision. (Show Details)
kariddi edited the summary of this revision. (Show Details)
kariddi edited the summary of this revision. (Show Details)Sep 9 2017, 2:51 AM
kariddi updated this revision to Diff 114489.Sep 9 2017, 9:44 AM

Fixed a typo

kariddi updated this revision to Diff 114490.Sep 9 2017, 9:46 AM

Corrected the wrong patch :P

sanjoy accepted this revision.Sep 10 2017, 12:37 PM

This patch LGTM.

I don't think the modification you stated in the description is safe, but thinking about this a bit, I think ScalarEvolution::forgetLoop is fundamentally broken. It only looks at IR level PHI nodes and their users and evicts them from the caches, but not all things that need to be evicted can be found this way.

I think we'll need some sort of a per Loop use-list to implement forgetLoop correctly (which should also speed it up, at the cost of using more memory). I'll give this a shot.

lib/Analysis/ScalarEvolution.cpp
6370

Use auto *.

6414

This can be LoopWorklist.append(CurrL->begin(), CurrL->end()).

This revision is now accepted and ready to land.Sep 10 2017, 12:37 PM
This revision was automatically updated to reflect the committed changes.

Thanks, addressed the nitpicks