When FastISel fails to translate an instruction, it hands off code generation to SelectionDAG. Before it does so, it may already have generated local value instructions to feed phi nodes in successor blocks. SelectionDAG then generates these instructions again, causing duplication and less efficient code, including extra spill instructions.
Consider the following example:
define zeroext i1 @_Z3fooee(x86_fp80 %x, x86_fp80 %y) {
entry:
  %x.addr = alloca x86_fp80, align 16
  %y.addr = alloca x86_fp80, align 16
  store x86_fp80 %x, x86_fp80* %x.addr, align 16
  store x86_fp80 %y, x86_fp80* %y.addr, align 16
  %0 = load x86_fp80, x86_fp80* %x.addr, align 16
  %1 = load x86_fp80, x86_fp80* %y.addr, align 16
  %cmp = fcmp oeq x86_fp80 %0, %1
  br i1 %cmp, label %lor.end, label %lor.rhs
lor.rhs: ; preds = %entry
  %2 = load x86_fp80, x86_fp80* %x.addr, align 16
  %call = call zeroext i1 @_Z3bare(x86_fp80 %2)
  br label %lor.end
lor.end: ; preds = %lor.rhs, %entry
  %3 = phi i1 [ true, %entry ], [ %call, %lor.rhs ]
  ret i1 %3
}
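For orientation, the IR above looks like unoptimized clang output for a short-circuit `||` on two long double values. The following is a reconstruction, not the original test source: the mangled names @_Z3fooee and @_Z3bare demangle to foo(long double, long double) and bar(long double), and the lor.rhs/lor.end labels suggest a logical OR. The body of bar is not part of the example, so the definition below is a hypothetical stand-in that only makes the sketch self-contained.

```cpp
// Hypothetical stand-in for the callee @_Z3bare; its real body is not shown
// in the example, so this definition is an assumption.
bool bar(long double x) { return x > 0.0L; }

// Demangled form of @_Z3fooee: foo(long double, long double).
bool foo(long double x, long double y) {
    // `||` short-circuits: when x == y, control goes straight to the join
    // block (lor.end) and the phi takes the constant `true` from entry.
    // Otherwise lor.rhs calls bar(x) and the phi takes the call result.
    return x == y || bar(x);
}
```

The constant `true` flowing into the phi from the entry block is the value that FastISel materializes early (the movb $1 in the assembly that follows).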
FastISel fails to translate one of the instructions in the entry block and leaves the rest of code generation to SelectionDAG. However, it fails to remove the instructions it generated to supply the phi node in the lor.end block:
  ...
  subq    $64, %rsp
  fldt    32(%rbp)
  fldt    16(%rbp)
  movb    $1, %al            <=======
  fstpt   -16(%rbp)
  fld     %st(0)
  fstpt   -32(%rbp)
  fldt    -16(%rbp)
  movb    $1, %cl            <=======
  fucompi %st(1)
  fstp    %st(0)
  movb    %al, -33(%rbp)     # 1-byte Spill <======== not used later
  movb    %cl, -34(%rbp)     # 1-byte Spill <======== used
  jne     .LBB0_1
  jp      .LBB0_1
  jmp     .LBB0_2
.LBB0_1:
...
The patch proposes to remove all phi-node handling instructions as dead code when FastISel quits.
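The idea can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyBlock, ToyFastSelector, emitForPhi and abandonPhiCode are hypothetical names invented for illustration and are not LLVM APIs, and the model ignores the order in which the real selector visits instructions. It shows only the bookkeeping: remember which instructions were emitted purely to feed successor phis, and erase exactly those when fast selection gives up.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a block's machine instruction list (not an LLVM type).
struct ToyBlock {
    std::vector<std::string> insts;
};

// Hypothetical fast selector that tracks which instructions exist only to
// supply phi nodes in successor blocks.
class ToyFastSelector {
public:
    explicit ToyFastSelector(ToyBlock &block) : block_(block) {}

    // Emit an ordinary instruction.
    void emit(const std::string &inst) { block_.insts.push_back(inst); }

    // Emit an instruction that only feeds a successor phi, and record its
    // position so it can be found again later.
    void emitForPhi(const std::string &inst) {
        phiIndices_.push_back(block_.insts.size());
        block_.insts.push_back(inst);
    }

    // Called when fast selection gives up: erase every recorded phi-feeding
    // instruction, since the fallback selector will emit its own copies.
    // Returns the number of instructions removed.
    std::size_t abandonPhiCode() {
        std::size_t removed = 0;
        // Indices were recorded in ascending order; erasing from the back
        // keeps the earlier indices valid.
        for (auto it = phiIndices_.rbegin(); it != phiIndices_.rend(); ++it) {
            block_.insts.erase(block_.insts.begin() +
                               static_cast<std::ptrdiff_t>(*it));
            ++removed;
        }
        phiIndices_.clear();
        return removed;
    }

private:
    ToyBlock &block_;
    std::vector<std::size_t> phiIndices_;
};
```

In terms of the example, the first movb $1, %al would be recorded through emitForPhi, and calling abandonPhiCode at the hand-off would drop it, so the unused 1-byte spill of %al would not be needed either.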