Related issues: https://bugs.llvm.org/show_bug.cgi?id=28431 https://bugs.llvm.org/show_bug.cgi?id=27190 https://bugs.llvm.org/show_bug.cgi?id=21183 and the original issue that brought me here https://github.com/JuliaLang/julia/issues/17288
Presumably there is a related Apple issue at rdar://problem/8007500, since an existing workaround refers to it; however, I don't work at Apple, and the linked issue is not publicly accessible.
Quentin Colombet posted the workaround/hint in 2014 in https://bugs.llvm.org/show_bug.cgi?id=21183: disable join-liveintervals (register coalescing).
Here, I merely formalize his tip by applying it only in functions that expose returns-twice calls (ExposesReturnsTwice). That alone fixes the bug in my reproducers.
Next, I went over the other passes that handle this property and noted the old workaround in StackSlotColoring.cpp. Extrapolating from that, I suspect StackSlotColoring is no better at handling the problem, so I disabled it as well in this context, out of an abundance of caution.
The final change is to the inlining heuristic: inlining perfectly fine code into an ExposesReturnsTwice context is a very bad idea until we get much better at optimizing, without miscompiling, the code surrounding setjmp. Hence that change. A further advantage is that the bug should trigger less often: it is related to register spilling, and less inlined code usually means fewer register spills.
The changes are pretty unprincipled. But we have been miscompiling setjmp code for 6 years, and nobody qualified seems motivated to fix the underlying issues in the machine-optimizer passes. So I think a temporary band-aid is warranted. I am not qualified ;)
I compared the example below against GCC and MSVC: MSVC emits very ugly but risk-free, correct code by creating tons of volatile loads/stores. GCC generates very nice, correct code, but I am not well-acquainted enough with the GCC code base to understand what it does and whether we could do the same.
The following reproducer is adapted from https://bugs.llvm.org/show_bug.cgi?id=28431
```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

extern jmp_buf env;
extern int ff(int v);
extern int gg();
extern void dojump();

__attribute__((noinline)) int f(int a) {
  printf("pre longjump: a = %d\n", a);
  int b = gg(); int c = gg(); int d = gg(); int e = gg();
  int f = gg(); int g = gg(); int h = gg(); int i = gg();
  double k = ff(b) + ff(c + ff(d + ff(e + ff(f + ff(g + ff(h + i))))));
  k *= b; k -= c; k += i;
  if (setjmp(env) == 0) {
    printf("Longjump set: a+4 = %d\n", a + 4);
    dojump();
    b = gg(); c = gg(); d = gg(); e = gg();
    f = gg(); g = gg(); h = gg(); i = gg();
    k = ff(b) + ff(c + ff(d + ff(e + ff(f + ff(g + ff(h + i))))));
    k *= b; k -= c; k += i;
    printf("%d\n", a + 4);
    b = gg(); c = gg(); d = gg(); e = gg();
    f = gg(); g = gg(); h = gg(); i = gg();
    k = ff(b) + ff(c + ff(d + ff(e + ff(f + ff(g + ff(h + i))))));
    k *= b; k -= c; k += i;
    printf("%d\n", a + 4);
    b = gg(); c = gg(); d = gg(); e = gg();
    f = gg(); g = gg(); h = gg(); i = gg();
    k = ff(b) + ff(c + ff(d + ff(e + ff(f + ff(g + ff(h + i))))));
    k *= b; k -= c; k += i;
    printf("%d\n", a + 4);
    printf("%f\n", k);
  } else {
    printf("Returned from Longjump: a = %d\n", a);
  }
  return -1;
}

int main() {
  f(0);
  return 0;
}
```
This miscompiles with `clang -O3`, which becomes evident when it is linked against:
```c
#include <setjmp.h>

jmp_buf env;

int ff(int v) { return 0; }
int gg() { return 0; }
void dojump() { longjmp(env, 1); }
```
I get output:
```
pre longjump: a = 0
Longjump set: a+4 = 4
Returned from Longjump: a = 4
```

Note the last line: `a` should still be 0, since it is never modified.
I don't have proper regression tests yet. Could anyone comment on how to proceed on that (this is my first PR to LLVM)? Since this is a machine-IR issue, one could write an .ll fixture, but I'm not sure how to make that reliable.