The PhaseOrdering test example is based on:
The test goes over the complexity cliff somewhere between -O2 and -O3 (more unrolling is done at -O3).
The initial assume is created by SimplifyCFG, multiplied by the LoopVectorizer, and then copied across numerous clone blocks by LoopUnroll. LoopUnroll then calls SimplifyInstruction() which calls simplifyICmpWithDominatingAssume() and we seem to be spending all of our time in there computing known bits, etc.
It seems unlikely that we have much to gain by cloning assumes late in the opt pipeline, so I'm proposing to stub that out when called during LoopUnroll.