This is an archive of the discontinued LLVM Phabricator instance.

[WIP] [RISCV] Emit cfi directives for function epilogue
AbandonedPublic

Authored by xgupta on Oct 10 2021, 9:08 PM.

Details

Reviewers
luismarques
asb
Summary

This patch attempt to fix https://bugs.llvm.org/show_bug.cgi?id=51864. Which is related to debugging of a binary. This patch is actually reverted https://reviews.llvm.org/D69723 with a minor change.

Diff Detail

Event Timeline

xgupta created this revision.Oct 10 2021, 9:08 PM
xgupta updated this revision to Diff 378564.Oct 10 2021, 9:29 PM

minor fix

xgupta updated this revision to Diff 381303.Oct 21 2021, 9:43 AM

update one testcase

xgupta edited the summary of this revision. (Show Details)Oct 21 2021, 9:51 AM
xgupta added a reviewer: luismarques.

IIUC the issue with previous version that .cfi_def_cfa_offset 0 should be .cfi_def_cfa_offset 16? If that is still the case I will try to fix this.
I have one question: - By applying this patch 304 test cases are failing. I only corrected one test case. Is there any script to update other test cases or do they need to be manually updated?

xgupta published this revision for review.Oct 21 2021, 9:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2021, 9:58 AM

As I told you in various bug reports, the issue is that LLVM's CFI epilogue directives are utterly broken when shrink-wrapping is used. Please go re-read https://reviews.llvm.org/D69723 for the rationale behind reverting the original commit.

Thanks for letting me know on various reports :). But I completely don't know what is shrink wrapping. I will read it more. My only understanding it as LLVM x86 is emitting these CFI directives so there should be no problem in RISCV. If shrink wrapping is the problem then we should revert that patch: https://reviews.llvm.org/D62190. If .cfi_restore ra is the problem then does that need to be fixed?
I only want watchpoints to work correctly in that binary. If it is not possible here, I think LLVM developers recommending me to "go" and add support for hardware watchpoint in RISCV gdb port. Is that correct?

xgupta updated this revision to Diff 381713.Oct 22 2021, 11:45 PM
xgupta edited the summary of this revision. (Show Details)

update few more testcase

xgupta retitled this revision from [RISCV] Emit cfi directives for function epilogue to [WIP] [RISCV] Emit cfi directives for function epilogue.Oct 22 2021, 11:47 PM
xgupta abandoned this revision.Oct 26 2021, 3:06 AM

Hi @HsiangKai, I want to know how you have written 12877+ lines of test cases such as in https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/RISCV/rvv/vluxseg-rv32.ll . I need to update them in this patch to pass check-llvm. It is some script? can you please tell me how to use that.

Hi @HsiangKai, I want to know how you have written 12877+ lines of test cases such as in https://github.com/llvm/llvm-project/blob/main/llvm/test/CodeGen/RISCV/rvv/vluxseg-rv32.ll . I need to update them in this patch to pass check-llvm. It is some script? can you please tell me how to use that.

Read the very first line:

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py

Almost every test in llvm/test/CodeGen/RISCV uses that, you should rarely have to update them manually (I hope you didn't do so for all these tests here... if you did, please re-run the script on all of them, because the script can include things a human might not, e.g. you can see comments on many of the lines that technically aren't needed, but using the script ensures that when someone re-runs it they don't see spurious diffs).

That is:

utils/update_llc_test_checks.py --llc-binary build/bin/llc -u `find llvm/test/CodeGen/RISCV \( -name '*.ll' -o -name '*.mir' \)`
utils/update_mir_test_checks.py --llc-binary build/bin/llc -u `find llvm/test/CodeGen/RISCV \( -name '*.ll' -o -name '*.mir' \)`
xgupta updated this revision to Diff 383026.Oct 28 2021, 7:15 AM

update all test cases

That is:

utils/update_llc_test_checks.py --llc-binary build/bin/llc -u `find llvm/test/CodeGen/RISCV \( -name '*.ll' -o -name '*.mir' \)`
utils/update_mir_test_checks.py --llc-binary build/bin/llc -u `find llvm/test/CodeGen/RISCV \( -name '*.ll' -o -name '*.mir' \)`

Thanks, I think it is not mention in the docs, I will write something about it in TestingGuide.rst .

luke957 resigned from this revision.Oct 28 2021, 7:32 AM

So sorry for my bad herald script.

xgupta edited reviewers, added: asb; removed: luke957.Oct 28 2021, 8:54 AM

Hi @luismarques, Do you have any comments here? Or what happens(break) when this CFI directive is emitted and shrink wrapping is used, an example? Stack unwinding problem, still it is here, what it needed to be fixed, maybe some hint.

I think this problem needs to be solved soon, and I am not an expert here to fix it by myself.

xgupta added a comment.Nov 8 2021, 7:12 AM

Hello @luismarques,

can you please tell me how you have generated this assembly mentioned in the commit summary(.c file)?

define void @branch_and_tail_call(i1 %a) {
; RV32-LABEL: branch_and_tail_call:
; RV32:       # %bb.0:
; RV32-NEXT:    addi sp, sp, -16
; RV32-NEXT:    .cfi_def_cfa_offset 16
; RV32-NEXT:    sw ra, 12(sp)
; RV32-NEXT:    .cfi_offset ra, -4
; RV32-NEXT:    andi a0, a0, 1
; RV32-NEXT:    beqz a0, .LBB2_2
; RV32-NEXT:  # %bb.1: # %blue_pill
; RV32-NEXT:    lw ra, 12(sp)
; RV32-NEXT:    .cfi_restore ra
; RV32-NEXT:    addi sp, sp, 16
; RV32-NEXT:    .cfi_def_cfa_offset 0
; RV32-NEXT:    tail foo
; RV32-NEXT:  .LBB2_2: # %red_pill
; RV32-NEXT:    call bar
; RV32-NEXT:    lw ra, 12(sp)
; RV32-NEXT:    .cfi_restore ra
; RV32-NEXT:    addi sp, sp, 16
; RV32-NEXT:    .cfi_def_cfa_offset 0
; RV32-NEXT:    ret
}

For instance, without this patch the following program will not correctly unwind, and will get stuck (compile with clang, link with a gnu toolchain, test with qemu):

void three() {
    throw 7;
}

void two(void) {
    try {
        three();
    } catch(int &c) {
        throw 42;
    }
}

int main() {
    try {
        two();
    }
    catch(...) {
    }
    return 0;
}

When you say it get stuck, can you please tell me what steps you have run?

PS I want to revert this patch to fix https://bugs.llvm.org/show_bug.cgi?id=51864 in https://reviews.llvm.org/D111519.

xgupta added a comment.Nov 8 2021, 7:13 AM

Oops, commented on wrong review.

jrtc27 added a comment.Nov 8 2021, 7:22 AM

Hello @luismarques,

can you please tell me how you have generated this assembly mentioned in the commit summary(.c file)?

define void @branch_and_tail_call(i1 %a) {
; RV32-LABEL: branch_and_tail_call:
; RV32:       # %bb.0:
; RV32-NEXT:    addi sp, sp, -16
; RV32-NEXT:    .cfi_def_cfa_offset 16
; RV32-NEXT:    sw ra, 12(sp)
; RV32-NEXT:    .cfi_offset ra, -4
; RV32-NEXT:    andi a0, a0, 1
; RV32-NEXT:    beqz a0, .LBB2_2
; RV32-NEXT:  # %bb.1: # %blue_pill
; RV32-NEXT:    lw ra, 12(sp)
; RV32-NEXT:    .cfi_restore ra
; RV32-NEXT:    addi sp, sp, 16
; RV32-NEXT:    .cfi_def_cfa_offset 0
; RV32-NEXT:    tail foo
; RV32-NEXT:  .LBB2_2: # %red_pill
; RV32-NEXT:    call bar
; RV32-NEXT:    lw ra, 12(sp)
; RV32-NEXT:    .cfi_restore ra
; RV32-NEXT:    addi sp, sp, 16
; RV32-NEXT:    .cfi_def_cfa_offset 0
; RV32-NEXT:    ret
}

This is clearly the branch_and_tail_call test case in frame-info.ll that had the ; FIXME: fix use of .cfi_restore with wrong CFAs.

For instance, without this patch the following program will not correctly unwind, and will get stuck (compile with clang, link with a gnu toolchain, test with qemu):

void three() {
    throw 7;
}

void two(void) {
    try {
        three();
    } catch(int &c) {
        throw 42;
    }
}

int main() {
    try {
        two();
    }
    catch(...) {
    }
    return 0;
}

When you say it get stuck, can you please tell me what steps you have run?

I'm going to assume as he said: "compile with clang, link with a gnu toolchain, test with qemu". Presumably the bogus DWARF leads it to get stuck in an unwind loop, with an early prologue meaning the call site is incorrectly (by correctly decoding the DWARF) determined to be a leaf, but RA is the call site itself, not the caller, so every iteration does nothing.

PS I want to revert this patch to fix https://bugs.llvm.org/show_bug.cgi?id=51864 in https://reviews.llvm.org/D111519.

I'm getting tired of saying this, but the underlying target-independent deficiency in LLVM needs fixing before you can think about reverting this.

xgupta added a comment.EditedNov 14 2021, 9:40 AM

Hello @luismarques,

can you please tell me how you have generated this assembly mentioned in the commit summary(.c file)?

define void @branch_and_tail_call(i1 %a) {
; RV32-LABEL: branch_and_tail_call:
; RV32:       # %bb.0:
; RV32-NEXT:    addi sp, sp, -16
; RV32-NEXT:    .cfi_def_cfa_offset 16
; RV32-NEXT:    sw ra, 12(sp)
; RV32-NEXT:    .cfi_offset ra, -4
; RV32-NEXT:    andi a0, a0, 1
; RV32-NEXT:    beqz a0, .LBB2_2
; RV32-NEXT:  # %bb.1: # %blue_pill
; RV32-NEXT:    lw ra, 12(sp)
; RV32-NEXT:    .cfi_restore ra
; RV32-NEXT:    addi sp, sp, 16
; RV32-NEXT:    .cfi_def_cfa_offset 0
; RV32-NEXT:    tail foo
; RV32-NEXT:  .LBB2_2: # %red_pill
; RV32-NEXT:    call bar
; RV32-NEXT:    lw ra, 12(sp)
; RV32-NEXT:    .cfi_restore ra
; RV32-NEXT:    addi sp, sp, 16
; RV32-NEXT:    .cfi_def_cfa_offset 0
; RV32-NEXT:    ret
}

This is clearly the branch_and_tail_call test case in frame-info.ll that had the ; FIXME: fix use of .cfi_restore with wrong CFAs.

I am still looking for .cpp file so that I can test/diff it (clang riscv assembly) with gcc riscv generated assembly. Side question if someone knows why clang riscv emits .cfi_restore in epilogue (for frame-info.ll after applying this patch) while clang x86 does not. What is the use of .cf_restore in riscv that is not needed in x86? I have read .cfi_restore directive in https://sourceware.org/binutils/docs/as/CFI-directives.html but definition didn't give the answer.

xgupta added a comment.EditedNov 14 2021, 8:49 PM

I have taken this example from https://reviews.llvm.org/D69723 to understand the problem with stack unwinding.

cat test.cpp

void three() {
    throw 7;
}

void two(void) {
    try {
        three();
    } catch(int &c) {
        throw 42;
    }
}

int main() {
    try {
        two();
    }
    catch(...) {
    }
    return 0;
}

If I see the output of the generated assembly in both GCC RISCV and Clang RISCV cases, differences can be seen in three directive lines sequence in epilogue.
clang RISCV epilogue:

.cfi_def_cfa sp, 48
 ld      ra, 40(sp)            
.cfi_restore ra
.cfi_restore s0

vs

GCC RISCV epilogue where 1= ra and 8= s0:

.cfi_restore 1
ld      s0,0(sp)
.cfi_restore 8
.cfi_def_cfa 2, 16

So I will assume these three lines need to be corrected to match GCC and fix the issue.

jrtc27 added a comment.EditedNov 14 2021, 10:41 PM

No. That is -O0 code and won't exhibit the issue, because the issue happens due to optimisation. Nor is main the function I believe to be the problem, since once you've unwound to main (which depends on the callee to have correct CFI, not the caller) there's no more unwinding to do, either implicitly (since the exception has been caught) or explicitly (since main itself performs no further actions that will request unwinding).

See https://godbolt.org/z/5xvTY9EKf. Observe that two has had its epilogue scheduled before its landing pad (corresponding to the call site that is the call to three):

.Ltmp10:
.Ltmp4:
        .loc    1 15 1                          # example.cpp:15:1
        ld      s0, 0(sp)                       # 8-byte Folded Reload
        # after this patch: .cfi_def_cfa sp, 16
        ld      ra, 8(sp)                       # 8-byte Folded Reload
        # after this patch: .cfi_restore ra
        # after this patch: .cfi_restore s0
        addi    sp, sp, 16
        # after this patch: .cfi_def_cfa_offset 0
        ret
.LBB1_2:
.Ltmp5:
.Ltmp11:
        .loc    1 15 1 is_stmt 0                # example.cpp:15:1
        sext.w  a1, a1
        addi    a2, zero, 1
        mv      s0, a0
        .loc    1 12 5 is_stmt 1                # example.cpp:12:5
        bne     a1, a2, .LBB1_6
        mv      a0, s0
        call    __cxa_begin_catch
...

This means that, if you have CFI in the epilogue as with this patch (annotated above with what I believe would be emitted), it is bogusly applied to the call to __cxa_throw (for throwing the 42 in the catch).

Now let's consider how that unwinds (during the search phase). The runtime and unwinder will unwind fine all the way back to the call to __cxa_throw (specifically the byte after; all the range checks are start < x <= end not the normal start <= x < end since jalr writes the next instruction to the link register) during the search phase. Once there, the runtime will ask the unwinder to unwind another frame, as there is no exception handler there for an int. But the unwinder, trusting the CFI which says ra has already been restored (and s0), will leave ra (and s0) untouched, so after one iteration of unwinding ra still points at the byte after two's call to __cxa_throw. So it sees if there's an exception handler there for an int, and the answer is still no. So it unwinds again, trusting the CFI, and again leaves ra untouched, which again still points at the byte after the call to __cxa_throw. And so on and so on, forever trying to unwind out of two to find an exception handler but stuck in it with no escape. It's possible I've got the details slightly wrong, but they aren't really the point, the general idea of how this breaks is, and why it is crucial that all C++ call sites have CFI that will unwind accurately if followed.

GCC's code for the same sequence is:

.LVL2:
.LEHE0:
        .loc 1 15 1
        ld      ra,8(sp)
        .cfi_remember_state
        .cfi_restore 1
        ld      s0,0(sp)
        .cfi_restore 8
        addi    sp,sp,16
        .cfi_def_cfa_offset 0
        jr      ra
.L7:
        .cfi_restore_state
        .loc 1 12 7
        li      a5,1
        bne     a1,a5,.L11
.LBB2:
        .loc 1 12 18 discriminator 1
        call    __cxa_begin_catch
...

This is almost the same CFI as what this patch produces (GCC interleaves the .cfi_restores with the corresponding loads but that doesn't really matter, so long as the stack slot and stack pointer aren't clobbered it's valid to defer a .cfi_restore) with the key difference that, because the epilogue has been scheduled before the landing pad, a .cfi_remember_state is emitted before any other CFI directives in the epilogue and a .cfi_restore_state is emitted as the first thing in the landing pad, restoring the state to the remembered state, namely that before the epilogue, and thus ensuring that the unwinder sees CFI that says ra is still stored on the stack (along with s0). That is the crucial part missing in LLVM, and the reason that we cannot emit CFI in epilogues.

(In case you're looking at main's assembly and notice that it too has its epilogue scheduled before its landing pad, yes that results in incorrect CFI for the landing pad, but no it doesn't matter in this case unless you're debugging, since all it does is call __cxa_begin_catch and __cxa_end_catch that do reference counting of the exception object behind the scenes, they don't need to unwind)

No. That is -O0 code and won't exhibit the issue, because the issue happens due to optimisation. Nor is main the function I believe to be the problem, since once you've unwound to main (which depends on the callee to have correct CFI, not the caller) there's no more unwinding to do, either implicitly (since the exception has been caught) or explicitly (since main itself performs no further actions that will request unwinding).

See https://godbolt.org/z/5xvTY9EKf. Observe that two has had its epilogue scheduled before its landing pad (corresponding to the call site that is the call to three):

.Ltmp10:
.Ltmp4:
        .loc    1 15 1                          # example.cpp:15:1
        ld      s0, 0(sp)                       # 8-byte Folded Reload
        # after this patch: .cfi_def_cfa sp, 16
        ld      ra, 8(sp)                       # 8-byte Folded Reload
        # after this patch: .cfi_restore ra
        # after this patch: .cfi_restore s0
        addi    sp, sp, 16
        # after this patch: .cfi_def_cfa_offset 0
        ret
.LBB1_2:
.Ltmp5:
.Ltmp11:
        .loc    1 15 1 is_stmt 0                # example.cpp:15:1
        sext.w  a1, a1
        addi    a2, zero, 1
        mv      s0, a0
        .loc    1 12 5 is_stmt 1                # example.cpp:12:5
        bne     a1, a2, .LBB1_6
        mv      a0, s0
        call    __cxa_begin_catch
...

This means that, if you have CFI in the epilogue as with this patch (annotated above with what I believe would be emitted), it is bogusly applied to the call to __cxa_throw (for throwing the 42 in the catch).

Now let's consider how that unwinds (during the search phase). The runtime and unwinder will unwind fine all the way back to the call to __cxa_throw (specifically the byte after; all the range checks are start < x <= end not the normal start <= x < end since jalr writes the next instruction to the link register) during the search phase. Once there, the runtime will ask the unwinder to unwind another frame, as there is no exception handler there for an int. But the unwinder, trusting the CFI which says ra has already been restored (and s0), will leave ra (and s0) untouched, so after one iteration of unwinding ra still points at the byte after two's call to __cxa_throw. So it sees if there's an exception handler there for an int, and the answer is still no. So it unwinds again, trusting the CFI, and again leaves ra untouched, which again still points at the byte after the call to __cxa_throw. And so on and so on, forever trying to unwind out of two to find an exception handler but stuck in it with no escape. It's possible I've got the details slightly wrong, but they aren't really the point, the general idea of how this breaks is, and why it is crucial that all C++ call sites have CFI that will unwind accurately if followed.

GCC's code for the same sequence is:

.LVL2:
.LEHE0:
        .loc 1 15 1
        ld      ra,8(sp)
        .cfi_remember_state
        .cfi_restore 1
        ld      s0,0(sp)
        .cfi_restore 8
        addi    sp,sp,16
        .cfi_def_cfa_offset 0
        jr      ra
.L7:
        .cfi_restore_state
        .loc 1 12 7
        li      a5,1
        bne     a1,a5,.L11
.LBB2:
        .loc 1 12 18 discriminator 1
        call    __cxa_begin_catch
...

This is almost the same CFI as what this patch produces (GCC interleaves the .cfi_restores with the corresponding loads but that doesn't really matter, so long as the stack slot and stack pointer aren't clobbered it's valid to defer a .cfi_restore) with the key difference that, because the epilogue has been scheduled before the landing pad, a .cfi_remember_state is emitted before any other CFI directives in the epilogue and a .cfi_restore_state is emitted as the first thing in the landing pad, restoring the state to the remembered state, namely that before the epilogue, and thus ensuring that the unwinder sees CFI that says ra is still stored on the stack (along with s0). That is the crucial part missing in LLVM, and the reason that we cannot emit CFI in epilogues.

(In case you're looking at main's assembly and notice that it too has its epilogue scheduled before its landing pad, yes that results in incorrect CFI for the landing pad, but no it doesn't matter in this case unless you're debugging, since all it does is call __cxa_begin_catch and __cxa_end_catch that do reference counting of the exception object behind the scenes, they don't need to unwind)

my command: riscv64-unknown-elf-g++ xxx.cpp -g -S xxx.s, but my assemble don't have .cfi_remember_state and .cfi_restore_state directives.

ToolChain down load from here:

I was wondering how your CFI directive above came about? Is there something wrong with my command?

I used the GCC 10.2.0 on godbolt.org. Basic block scheduling in newer GCCs may result in the directives being not needed for its code generation, I don't know; the necessary CFI is a function of what code is being generated, so if that changes then so does the CFI.