When branching to a block that immediately tail calls, it is possible to fold the call directly into the branch if the call is direct and there is no stack adjustment, saving one byte.
Example:
define void @f(i32 %x, i32 %y) { entry: %p = icmp eq i32 %x, %y br i1 %p, label %bb1, label %bb2 bb1: tail call void @foo() ret void bb2: tail call void @bar() ret void }
before:
f: movl 4(%esp), %eax cmpl 8(%esp), %eax jne .LBB0_2 jmp foo .LBB0_2: jmp bar
after:
f: movl 4(%esp), %eax cmpl 8(%esp), %eax jne bar .LBB0_1: jmp foo
I don't expect any significant size savings from this (on a Clang bootstrap I saw 288 bytes), but it does make the code a little tighter.
This patch only does 32-bit, but 64-bit would work similarly.
Perhaps add an assert that we never get here?