FindBetterNeighborChains simultaneously improves the chain
dependencies of a chain of related stores, avoiding the generation of
extra token factors. For chains longer than the GatherAllAliasDepths,
stores further down in the chain will necessarily fail, which is a
potentially significant waste and prevents otherwise trivial
parallelization. This patch directly parallelizes the chains of stores
before improving each store. This generally improves DAG-level
parallelism.
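
As a rough illustration of why this helps, the sketch below uses a toy model rather than the actual SelectionDAG code. ToyStore, criticalPath, makeSerialChain and parallelize are hypothetical names invented for this example, and the model simply assumes the stores are known not to alias each other. It shows only the effect on the chain-dependency depth when serially chained stores are rehung on a common entry chain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a store node (hypothetical, not LLVM's StoreSDNode).
// Chain is the index of the store this one is chained to, or -1 for the
// entry chain.
struct ToyStore {
  int Chain;
};

// Length of the longest chain-dependency path ending at any store.
// Assumes each store's Chain refers to an earlier index or is -1.
std::size_t criticalPath(const std::vector<ToyStore> &Stores) {
  std::vector<std::size_t> Depth(Stores.size(), 0);
  std::size_t Longest = 0;
  for (std::size_t I = 0; I < Stores.size(); ++I) {
    int C = Stores[I].Chain;
    Depth[I] = (C < 0 ? 0 : Depth[static_cast<std::size_t>(C)]) + 1;
    if (Depth[I] > Longest)
      Longest = Depth[I];
  }
  return Longest;
}

// Build N stores chained serially: store I depends on store I-1.
std::vector<ToyStore> makeSerialChain(std::size_t N) {
  std::vector<ToyStore> Stores;
  for (std::size_t I = 0; I < N; ++I)
    Stores.push_back({static_cast<int>(I) - 1});
  return Stores;
}

// Rehang every store on the entry chain. This is valid only if the stores
// do not alias each other, which this toy model assumes without checking.
void parallelize(std::vector<ToyStore> &Stores) {
  for (ToyStore &S : Stores)
    S.Chain = -1;
}
```

In this model a serial chain of N stores has a dependency depth of N, and after parallelize every store sits at depth 1. In the real DAG the parallel stores would still need to be joined again for later users of the chain, which the toy model leaves out.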

I think readability would be much better if this was split a bit:

  bool DAGCombiner::findBetterNeighborChains(StoreSDNode *St) {
    if (OptLevel == CodeGenOpt::None)
      return false;

    // This holds the base pointer, index, and the offset in bytes from the base
    // pointer.
    BaseIndexOffset BasePtr = BaseIndexOffset::match(St, DAG);

    // We must have a base and an offset.
    if (!BasePtr.getBase().getNode())
      return false;

    // Do not handle stores to undef base pointers.
    if (BasePtr.getBase().isUndef())
      return false;

    // First try to merge chained stores.
    StoreSDNode *STChain = St;
    SmallVector<StoreSDNode *, 8> ChainedStores =
        findChainedStores(STChain, BasePtr);
    if (ChainedStores.size() > 0) {
      mergeChainedStores(ChainedStores);
      return true;
    }

    // Improve St's Chain.
    SDValue BetterChain = FindBetterChain(St, St->getChain());
    if (St->getChain() != BetterChain) {
      replaceStoreChain(St, BetterChain);
      return true;
    }
    return false;
  }

(maybe even merge findChainedStores into mergeChainedStores)