Do the cloning in two steps, first allocate all the new loops, then clone the basic blocks in the same order as the original loop.
Do you have any kind of test case? I suppose that you can check the output in a test in test/Analysis/LoopInfo to show the ordering?
Is this logic equivalent? The original code is going a pre-order traversal, and for each loop, it excludes blocks in an inner loop (when CurLoop != LI->getLoopFor(BB)). Here we're doing something for all blocks in the loop including those also in inner loops?
The resulting cloned basic blocks will be the same, which are all the blocks in OrigLoop. The different is the order of the basic block. Example:
OuterHeader: br InnerHeader InnerHeader: br InnerHeader or OuterLatch OuterLatch: br OuterHeader or OuterExit
Output before this change:
ClonedOuterHeader: br ClonedInnerHeader ClonedOuterLatch: br ClonedOuterHeader or ClonedOuterExit ClonedInnerHeader: br ClonedInnerHeader or ClonedOuterLatch
Output after this change:
ClonedOuterHeader: br ClonedInnerHeader ClonedInnerHeader: br ClonedInnerHeader or ClonedOuterLatch ClonedOuterLatch: br ClonedOuterHeader or ClonedOuterExit
FYI - this function is extended by me in https://reviews.llvm.org/rG7c1deeff4a67296654823a871fea5c1a2aef3b8a to support cloning a loop nest instead of only allow the innermost loop. But I didn't think properly about the ordering of the basic block at that time.