Currently, CodeGenPrepare (CGP) gets rid of mostly empty blocks by merging them with their successors. However, when a block BB simply branches unconditionally to a block that contains nothing but a return, it can help to have BB return directly, since that gives shrink-wrapping another candidate block for the epilogue.
Motivating test case:
; Function Attrs: noinline nounwind readnone
define signext i32 @callee(i32 signext %a, i32 signext %lim) local_unnamed_addr #0 {
entry:
  %cmp5 = icmp sgt i32 %lim, 0
  br i1 %cmp5, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body, %entry
  %Ret.0.lcssa = phi i32 [ 0, %entry ], [ %0, %for.body ]
  ret i32 %Ret.0.lcssa

for.body:                                         ; preds = %for.body.preheader, %for.body
  %i.07 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %Ret.06 = phi i32 [ %0, %for.body ], [ 0, %for.body.preheader ]
  %0 = tail call i32 asm "add $0, $1, $2", "=r,r,r,~{r14},~{r15},~{r16},~{r17},~{r18},~{r19},~{r20},~{r21},~{r22},~{r23},~{r24},~{r25},~{r26},~{r27},~{r28},~{r29},~{r30},~{r31}"(i32 %a, i32 %Ret.06)
  %inc = add nuw nsw i32 %i.07, 1
  %exitcond = icmp eq i32 %inc, %lim
  br i1 %exitcond, label %for.cond.cleanup, label %for.body
}
IR into Codegen Prepare:
; Function Attrs: noinline nounwind readnone
define signext i32 @caller(i32 signext %a, i32 signext %lim) local_unnamed_addr #0 {
entry:
  %cmp5 = icmp sgt i32 %lim, 0
  br i1 %cmp5, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  %.lcssa = phi i32 [ %0, %for.body ]
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  %Ret.0.lcssa = phi i32 [ 0, %entry ], [ %.lcssa, %for.cond.cleanup.loopexit ]
  ret i32 %Ret.0.lcssa

for.body:                                         ; preds = %for.body, %for.body.preheader
  %lsr.iv = phi i32 [ %lsr.iv.next, %for.body ], [ %lim, %for.body.preheader ]
  %Ret.06 = phi i32 [ %0, %for.body ], [ 0, %for.body.preheader ]
  %0 = tail call i32 asm "add $0, $1, $2", "=r,r,r,~{r14},~{r15},~{r16},~{r17},~{r18},~{r19},~{r20},~{r21},~{r22},~{r23},~{r24},~{r25},~{r26},~{r27},~{r28},~{r29},~{r30},~{r31}"(i32 %a, i32 %Ret.06) #3, !srcloc !3
  %lsr.iv.next = add i32 %lsr.iv, -1
  %exitcond = icmp eq i32 %lsr.iv.next, 0
  br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
}
Block for.cond.cleanup.loopexit can be used for the epilogue if control flow has gone through for.body, whereas block for.cond.cleanup can be used for the epilogue if control flow never entered the loop.
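For illustration, the two exit blocks might look roughly like this once for.cond.cleanup.loopexit returns directly instead of branching to for.cond.cleanup. This is a hand-written sketch based on the IR above, not output produced by the patch; in particular, keeping the single-entry phis is an assumption, and they may well be folded away.

```llvm
; Sketch only: assumed shape of the exit blocks after the change.
for.cond.cleanup.loopexit:                        ; preds = %for.body
  %.lcssa = phi i32 [ %0, %for.body ]
  ret i32 %.lcssa                                 ; returns directly, no branch

for.cond.cleanup:                                 ; preds = %entry
  %Ret.0.lcssa = phi i32 [ 0, %entry ]
  ret i32 %Ret.0.lcssa                            ; reached only when the loop is skipped
```

With two separate returning blocks, only the one reached through for.body comes after code that clobbers the callee-saved registers.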
The patch certainly increases the amount of shrink-wrapping we do. I'm currently running SPEC on PPC and will post the net change once the results are ready.
I initially meant to add a target hook to see if this is desired by the target, but decided against it as I don't see an issue with always doing so. Depending on the review comments, I can either remove this comment or re-add the target hook.