Recently we run into a case with compile time problem (27 mins in O3 but a few seconds in O0). In that case, we have a huge block containing many stores. Each time a store is visited, dagcombiner will find a candidate set all chained by the same root node for the store, then try to merge the nodes in the candidate set into a wider store. This is a quadratic algorithm, because for the same candidate set, everytime a store inside of the set is visited, the same set will be considered for merging.
In this patch, we add a map to record how many times the pair of root node and the store node is seen before a store node is added into candidate set. If the count is above a threshold (set to 10 by default now), the store node won't be added to candidate set again. With the patch, the compile time drops to 20 seconds.
I verify that the patch won't miss store merging opportunities in most cases in the way that by building a large internal server application and bootstrapping clang w/wo the patch, I find no difference for the binaries.