Recently we run into a case with compile time problem (27 mins in O3 but a few seconds in O0). In that case, we have a huge block containing many stores. Each time a store is visited, dagcombiner will find a candidate set all chained by the same root node for the store, then try to merge the nodes in the candidate set into a wider store. This is a quadratic algorithm, because for the same candidate set, everytime a store inside of the set is visited, the same set will be considered for merging.
In this patch, we add a map to record how many times the pair of root node and the store node is seen before a store node is added into candidate set. If the count is above a threshold (set to 10 by default now), the store node won't be added to candidate set again. With the patch, the compile time drops to 20 seconds.
I verify that the patch won't miss store merging opportunities in most cases in the way that by building a large internal server application and bootstrapping clang w/wo the patch, I find no difference for the binaries.
Visited.size() and maxsteps are in scope for any call of the help. Just do the comparision at the call point instead of adding an argument.