This patch fixes https://bugs.llvm.org/show_bug.cgi?id=42219.
Related review: D62639.
In noimplicitfloat mode the compiler must not generate floating-point code unless it is explicitly asked to do so.
Currently this rule is broken for variadic function arguments.
Although the compiler correctly guards the block of code that copies the xmm vararg parameters with a check of %al,
it does not protect the spills of the xmm registers. Such spills are generated in unguarded areas and can break code that does not expect floating-point instructions.
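For reference, here is a reduced IR sketch of the scenario (the @use_va function is a hypothetical example, not a test from this patch). A variadic function marked noimplicitfloat that calls llvm.va_start forces codegen to populate the xmm part of the register save area:

define void @use_va(i8* %fmt, ...) noimplicitfloat {
entry:
  ; llvm.va_start makes codegen copy the incoming xmm0-xmm7 into the
  ; register save area; that copy is guarded by a "testb %al, %al" check,
  ; but the spills feeding it were emitted outside the guard.
  %ap = alloca [4 x i8*], align 16
  %ap_i8 = bitcast [4 x i8*]* %ap to i8*
  call void @llvm.va_start(i8* %ap_i8)
  call void @llvm.va_end(i8* %ap_i8)
  ret void
}

declare void @llvm.va_start(i8*)
declare void @llvm.va_end(i8*)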
The problem shows up at the -O0 optimization level, where the fast register allocator is used. It spills virtual registers at basic-block boundaries and does not protect those spills with additional control flow.
To resolve this, the patch stops copying the incoming physical registers into virtual registers and instead stores the incoming physical xmm registers directly to memory.
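Below is a lit-style sketch of how the -O0 behavior could be checked; the RUN/CHECK lines are illustrative assumptions rather than the actual test added by the patch. After the change, the stores of the incoming xmm registers should appear only behind the %al guard:

; RUN: llc -O0 -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
;
; CHECK-LABEL: use_va:
; CHECK:       testb %al, %al
; CHECK:       je
; CHECK:       movaps %xmm0
define void @use_va(i8* %fmt, ...) noimplicitfloat {
entry:
  %ap = alloca [4 x i8*], align 16
  %ap_i8 = bitcast [4 x i8*]* %ap to i8*
  call void @llvm.va_start(i8* %ap_i8)
  call void @llvm.va_end(i8* %ap_i8)
  ret void
}

declare void @llvm.va_start(i8*)
declare void @llvm.va_end(i8*)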
Another variant of this problem occurs at higher optimization levels with thunk functions, where the greedy register allocator is used.
Consider the following test case (CodeGen/X86/musttail-varargs.ll):

define void @f_thunk(i8* %this, ...) {
  ; Use va_start so that we exercise the combination.
  %ap = alloca [4 x i8*], align 16
  %ap_i8 = bitcast [4 x i8*]* %ap to i8*
  call void @llvm.va_start(i8* %ap_i8)

  %fptr = call void(i8*, ...)*(i8*) @get_f(i8* %this)    <<<<<<<<<<<<<<<<<<<
  musttail call void (i8*, ...) %fptr(i8* %this, ...)
  ret void
}
The f_thunk function needs to forward all of its parameters to %fptr, but it has to save and restore the virtual registers
corresponding to the incoming xmm registers around the call to get_f(). As a result, the final code contains unguarded stores/restores of xmm registers. The solution for this case is again to avoid virtual registers: copy the incoming physical xmm registers to memory at function entry and restore them from memory right before the tail-call jump. The new asm code for this case looks like this:
f_thunk:
        testb   %al, %al
        je      .LBB0_2
# %bb.1:
        movaps  %xmm0, 96(%rsp)       <<< store incoming xmm registers onto the stack
.LBB0_2:
        callq   get_f
        testb   %al, %al              <<< check for the presence of xmm vararg parameters
        je      .LBB0_4
# %bb.3:
        movaps  96(%rsp), %xmm0       <<< restore xmm vararg parameters before the tail-call jump
        jmpq    *%r11                 # TAILCALL
.LBB0_4:
        jmpq    *%r11                 # TAILCALL
This looks like a lot of very fragile pattern matching. I would greatly prefer it if we didn't have to do this.