Related issues: https://bugs.llvm.org/show_bug.cgi?id=28431, https://bugs.llvm.org/show_bug.cgi?id=27190, https://bugs.llvm.org/show_bug.cgi?id=21183, and the original issue that brought me here, https://github.com/JuliaLang/julia/issues/17288
Presumably there is a related Apple issue at rdar://problem/8007500, since a related workaround refers to it, but I don't work at Apple and the linked issue is not publicly accessible.
Quentin Colombet posted the workaround/hint in 2014, in https://bugs.llvm.org/show_bug.cgi?id=21183: Disable join-liveintervals.
Here, I just formalized his tip by applying it only in ExposesReturnsTwice functions. That alone fixes the bug in my reproducers.
Next I went over the other passes that handle this, and noted the old workaround in StackSlotColoring.cpp. Extrapolating from that, I suspect that StackSlotColoring.cpp is no better at handling the problem, so I disabled it as well, out of an abundance of caution.
The final change is to the inlining heuristic: inlining perfectly fine code into an ExposesReturnsTwice context is a very bad idea until we get much better at optimizing, and not miscompiling, code surrounding setjmp. Hence that change. A further advantage is that we should trigger the bug less often: it is related to register spilling, and less inlined code often means fewer register spills.
The changes are pretty unprincipled. But we have been miscompiling setjmp code for 6 years, and nobody qualified seems motivated to fix the underlying issues in the machine-optimizer passes. So I think a temporary band-aid is warranted. I am not qualified ;)
I compared the example below against GCC and MSVC. MSVC emits very ugly but risk-free, correct code by creating tons of volatile loads/stores. GCC generates very nice correct code, but I am not well acquainted enough with the GCC code base to understand what they are doing, or whether we could do the same.
The following reproducer is adapted from https://bugs.llvm.org/show_bug.cgi?id=28431
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
extern jmp_buf env;
extern int ff(int v);
extern int gg();
extern void dojump();
__attribute__((noinline)) int f(int a)
{
    printf("pre longjump: a = %d\n", a);
    int b = gg();
    int c = gg();
    int d = gg();
    int e = gg();
    int f = gg();
    int g = gg();
    int h = gg();
    int i = gg();
    double k = ff(b) + ff(c + ff(d + ff(e + ff(f + ff(g + ff(h + i))))));
    k *= b;
    k -= c;
    k += i;
    if (setjmp(env) == 0) {
        printf("Longjump set: a+4 = %d\n", a + 4);
        dojump();
        b = gg();
        c = gg();
        d = gg();
        e = gg();
        f = gg();
        g = gg();
        h = gg();
        i = gg();
        k = ff(b) + ff(c + ff(d + ff(e + ff(f + ff(g + ff(h + i))))));
        k *= b;
        k -= c;
        k += i;
        printf("%d\n", a + 4);
        b = gg();
        c = gg();
        d = gg();
        e = gg();
        f = gg();
        g = gg();
        h = gg();
        i = gg();
        k = ff(b) + ff(c + ff(d + ff(e + ff(f + ff(g + ff(h + i))))));
        k *= b;
        k -= c;
        k += i;
        printf("%d\n", a + 4);
        b = gg();
        c = gg();
        d = gg();
        e = gg();
        f = gg();
        g = gg();
        h = gg();
        i = gg();
        k = ff(b) + ff(c + ff(d + ff(e + ff(f + ff(g + ff(h + i))))));
        k *= b;
        k -= c;
        k += i;
        printf("%d\n", a + 4);
        printf("%f\n", k);
        
    }
    else {
        printf("Returned from Longjump: a = %d\n", a);
    }
    return -1;
}
int main()
{
    f(0);
    return 0;
}
This miscompiles with clang -O3, which becomes evident when linked against:
#include <setjmp.h>
jmp_buf env;
int ff(int v) { return 0; }
int gg() { return 0; }
void dojump() { longjmp(env, 1); }
I get output:
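pre longjump: a = 0
Longjump set: a+4 = 4
Returned from Longjump: a = 4
The last line should report a = 0: a is never modified, yet after the longjmp it reads back with the value of a + 4.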
I don't have proper regression tests yet. Could anyone comment on how to proceed on that (this is my first PR to LLVM)? Since this is a machine-IR issue, one could just make a .ll fixture, but I'm not sure how to make that reliable.