InstCombine removes pairs of start/end intrinsics that have nothing in between them. Currently this is done by starting at the start intrinsic and scanning forwards. This patch changes it to start at the end intrinsic and scan backwards.
The motivation is as follows: when we process the start intrinsic, we have not yet visited the instructions that follow it, and those may still get folded or removed. If they are, we can only remove the start/end pair on the next iteration. When we process the end intrinsic, all the instructions before it have already been visited, so we don't run into this problem.
The highlighted test case drops from 4 to 2 iterations, because we can pick up all four start/end pairs in one pass, instead of doing one on each iteration.
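The ordering argument can be illustrated with a small toy model. This is only a sketch and not the InstCombine code: `Kind`, `runOnePass`, and `countIterations` are hypothetical names, a "block" is just a vector of instruction kinds, and a `Dead` entry stands in for any instruction that gets folded away when it is visited. The model ignores everything else a real implementation has to check (for example, that the start and end operands match).

```cpp
#include <cstddef>
#include <vector>

// Toy instruction kinds (hypothetical; not LLVM types).
enum class Kind { Start, End, Dead, Other };

// Visit the block once in program order. Returns true if anything changed.
// scanBackFromEnd == false: act on Start and look forward for an adjacent End.
// scanBackFromEnd == true:  act on End and look backward for an adjacent Start.
bool runOnePass(std::vector<Kind> &block, bool scanBackFromEnd) {
  bool changed = false;
  std::size_t i = 0;
  while (i < block.size()) {
    Kind k = block[i];
    if (k == Kind::Dead) {
      // Stand-in for an instruction that is folded/removed when visited.
      block.erase(block.begin() + i);
      changed = true;
      continue; // index i now refers to the next instruction
    }
    if (!scanBackFromEnd && k == Kind::Start && i + 1 < block.size() &&
        block[i + 1] == Kind::End) {
      // Forward scan: the pair is only empty if what follows was already
      // cleaned up, which may not have happened yet in this pass.
      block.erase(block.begin() + i, block.begin() + i + 2);
      changed = true;
      continue;
    }
    if (scanBackFromEnd && k == Kind::End && i > 0 &&
        block[i - 1] == Kind::Start) {
      // Backward scan: everything before the End has already been visited.
      block.erase(block.begin() + i - 1, block.begin() + i + 1);
      --i; // continue with the instruction that followed the removed pair
      changed = true;
      continue;
    }
    ++i;
  }
  return changed;
}

// Repeat passes until one makes no change; the final no-op pass is counted.
unsigned countIterations(std::vector<Kind> block, bool scanBackFromEnd) {
  unsigned iterations = 0;
  bool changed;
  do {
    changed = runOnePass(block, scanBackFromEnd);
    ++iterations;
  } while (changed);
  return iterations;
}
```

In this model, a block of `Start, Dead, End` needs three passes with the forward scan (the pair only becomes removable after the first pass has deleted `Dead`), but two passes when the removal is triggered from `End`.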
Please add a code comment explaining why we start from the end instruction and use a reverse_iterator (the text can be copied or adapted from this patch description).