While working on a project I wound up generating a fairly large lookup table (10k entries) of callbacks inside of a static constructor. Clang was taking upwards of 10 minutes to compile the lookup table. I generated a smaller test case (http://www.inolen.com/static_initializer_test.ll) and ran it with -ftime-report, which pointed at GlobalOpt and MemCpyOptimizer.
Running memcpyopt through opt accounted for about 1 minute. The main culprit was MemCpyOptimizer insertion-sorting the ranges as it discovered them in tryMergingIntoMemset. I've changed this so that ranges are always appended to the list, and once they have all been added they are sorted and merged in one go (O(n log n) instead of O(n^2)).
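To illustrate the append-then-sort idea, here is a minimal standalone sketch. It is not the code from the patch: ByteRange and sortAndMerge are hypothetical names, and the ranges are assumed to be half-open [Start, End) byte intervals that should be merged when they overlap or touch.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one store's byte range, half-open: [Start, End).
struct ByteRange {
  int64_t Start;
  int64_t End;
};

// Takes ranges in discovery order (appended, unsorted), sorts them once by
// start offset, then merges overlapping or adjacent ranges in a single pass.
// Cost is O(n log n) for the sort plus O(n) for the merge.
std::vector<ByteRange> sortAndMerge(std::vector<ByteRange> Ranges) {
  std::sort(Ranges.begin(), Ranges.end(),
            [](const ByteRange &A, const ByteRange &B) {
              return A.Start < B.Start;
            });

  std::vector<ByteRange> Merged;
  for (const ByteRange &R : Ranges) {
    if (!Merged.empty() && R.Start <= Merged.back().End) {
      // R overlaps or touches the last merged range: extend it.
      Merged.back().End = std::max(Merged.back().End, R.End);
    } else {
      // Gap before R: start a new merged range.
      Merged.push_back(R);
    }
  }
  return Merged;
}
```

By contrast, keeping the list sorted on every insertion costs O(n) per insert in the worst case, which is where the O(n^2) total comes from.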
I'm not really sure who to tag as a reviewer; Lang mentioned that Chandler may be appropriate.
Optional: consider hoisting "Ranges.end()" out of the loop condition, e.g. "range_iterator E = Ranges.end();". See http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop .
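For reference, the hoisted form could look roughly like the sketch below. MemRange and totalBytes are hypothetical names used only for illustration, and range_iterator is assumed here to be a typedef for the container's const iterator; the actual types in the patch may differ.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical range type, half-open: [Start, End).
struct MemRange {
  int64_t Start;
  int64_t End;
};

// Assumed typedef; the patch's range_iterator may be defined differently.
typedef std::vector<MemRange>::const_iterator range_iterator;

// Sums the lengths of all ranges. end() is evaluated once, in the loop
// initializer, rather than on every iteration of the loop condition.
int64_t totalBytes(const std::vector<MemRange> &Ranges) {
  int64_t Total = 0;
  for (range_iterator I = Ranges.begin(), E = Ranges.end(); I != E; ++I)
    Total += I->End - I->Start;
  return Total;
}
```

This form is only safe when the loop body does not add or remove elements, since that would invalidate the cached end iterator.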