The LSR pass can generate an undesired pattern which hurts tail predication in some cases.
However, fixing LSR directly isn't easy and could break other targets, so we decided to handle it in the MVETailPredication pass instead.
This patch improves the MVETailPredication pass so it can detect the undesirable pattern and rewrite it in a tail-predication-friendly form.
Here is an example of the IR that LSR can generate:
  loopbody:
    %lsr.iv = phi i32 [ %lsr.iv.next, %loopbody ], [ %42, %pred ]
    %44 = add i32 %lsr.iv, -4
    %45 = call <4 x i1> @llvm.arm.mve.vctp32(i32 %44) #5
    ; ... etc
    %lsr.iv.next = add nsw i32 %lsr.iv, -4
That can't be tail-predicated because the VCTP's operand is defined inside the loop, so this patch will rewrite it like this:
  pred:
    %44 = add i32 %42, -4
  loopbody:
    %lsr.iv = phi i32 [ %lsr.iv.next, %loopbody ], [ %42, %pred ]
    %lsr.fixed = phi i32 [ %lsr.iv.next, %loopbody ], [ %44, %pred ]
    %45 = call <4 x i1> @llvm.arm.mve.vctp32(i32 %lsr.fixed)
    ; ... etc
    %lsr.iv.next = add nsw i32 %lsr.iv, -4
That IR is functionally the same as before, but causes no issue for tail-predication.
I think getLoopPreheader() would be preferable.