Page MenuHomePhabricator

[libunwind] Fix handling of DW_CFA_GNU_args_size
ClosedPublic

Authored by joerg on Oct 9 2017, 2:15 AM.

Details

Summary

This effectively reverts SVN r204290 and r204292 (back when this code was part of libcxxabi).

According to SVN r204290, the primary architecture using the DW_CFA_GNU_args_size opcode is VAX.

However, clang also produces it on X86 when it has done X86CallFrameOptimization, which gets done much more frequently if the stack is aligned to 4 bytes (which is the default when targeting windows).

This issue can be tested by building code for x86 linux (at least for 32 bit) with clang with -mstack-alignment=4.

I'm not sure if this code should be handled differently for VAX with some ifdef, or what the correct interpretation of this opcode is. I ran into this issue while trying to use libunwind for 32 bit windows (where clang uses a 4 byte stack alignment by default), found this commit, and noticed I got it working by reverting it. And later noticed I could reproduce the same issue on 32 bit x86 linux as well, by forcing -mstack-alignment=4.

Diff Detail

Repository
rL LLVM

Event Timeline

mstorsjo created this revision.Oct 9 2017, 2:15 AM
compnerd requested changes to this revision.Oct 9 2017, 2:19 PM

I think that the problem is that we are using the generic register name, but we need to use the target specific register name. On x86, EIP/ESP are swapped. We should also have a test case for this. I had reduced this down to a simpler test case of:

void f(int,int,int,int,int,int,int,int,int);

int main() {
  try {
    f(0,1,2,3,4,5,6,7,8);
  } catch (int) {
    return 0;
  }
  return 1;
}
This revision now requires changes to proceed.Oct 9 2017, 2:19 PM

I think that the problem is that we are using the generic register name, but we need to use the target specific register name. On x86, EIP/ESP are swapped.

You mean EBP/ESP? I think the code here does the right thing, with UNW_REG_SP always mapping to the actual ESP.

We should also have a test case for this. I had reduced this down to a simpler test case of:

void f(int,int,int,int,int,int,int,int,int);
 
int main() {
  try {
    f(0,1,2,3,4,5,6,7,8);
  } catch (int) {
    return 0;
  }
  return 1;
}

Oh, great if you can reproduce the issue with that! Feel free to try to dig into the issue and figure out a better fix then.

rnk added inline comments.Oct 11 2017, 2:53 PM
src/libunwind.cpp
188 ↗(On Diff #118188)

I think it makes sense to have this here: the contract is that if the personality sets the IP when the cursor pointed to a PC with a non-zero arg size, we should adjust SP for the personality.

However, it's not clear to me that we don't need the same adjustment when stepping across frames that use arg size without a frame pointer.

mstorsjo added inline comments.Oct 11 2017, 9:45 PM
src/libunwind.cpp
188 ↗(On Diff #118188)

When stepping across frames, the gnu args size is already included in, as part of the CFA offset, so with the current code, it steps too far.

rnk accepted this revision.Oct 12 2017, 2:07 PM

lgtm

src/libunwind.cpp
188 ↗(On Diff #118188)

OK, yes, now I see the code that does that. :)

Wow, I've finally debugged some unwind failures to this on x86_64 Linux ...

  1. Call-frame optimization introduces it
  2. This fixes the cases I was investigating!!

Thank you!

Wow, I've finally debugged some unwind failures to this on x86_64 Linux ...

  1. Call-frame optimization introduces it
  2. This fixes the cases I was investigating!!

That's good to hear! FWIW, I've talked to @joerg off-list and he said he'll try to look into it soon.

joerg edited edge metadata.Oct 22 2017, 2:18 PM

I've looked at this in some detail now. I'm not exactly sure yet why it is broken. The patch seems quite wrong to me. DW_CFA_GNU_args_size should be applied only when unwinding a call instruction and that regard, the commit message of the original change is quite correct. What I am still trying to understand is how the precise unwind frame disagrees with the unwinder.

I've looked at this in some detail now. I'm not exactly sure yet why it is broken. The patch seems quite wrong to me. DW_CFA_GNU_args_size should be applied only when unwinding a call instruction and that regard, the commit message of the original change is quite correct. What I am still trying to understand is how the precise unwind frame disagrees with the unwinder.

@joerg Any update on this matter, how should we move forward with unbreaking the unwinding on i386?

rnk added a comment.Nov 6 2017, 2:27 PM

I've looked at this in some detail now. I'm not exactly sure yet why it is broken. The patch seems quite wrong to me. DW_CFA_GNU_args_size should be applied only when unwinding a call instruction and that regard, the commit message of the original change is quite correct. What I am still trying to understand is how the precise unwind frame disagrees with the unwinder.

If you look at Clang's output here https://godbolt.org/g/jFcSxz, you can see that we emit precise CFA adjustments for each push. We don't need to adjust the CFA by gnu arg size in UnwindCursor::step, which unwinds through frames. We only apply it when setting up the register context before transitioning to the landingpad. That's why unw_set_reg UNW_REG_IP is at least approximately the right place to do this SP adjustment, IMO.

Basically, a general purpose unwinder that collects return addresses can ignore gnu args size because we already have CFA adjustments, but an unwinder that implements landingpad transitions must adjust SP by gnu arg size as part of that context switch.

At least, that's how I understand this change. Maybe older (VAX?) compilers didn't implement gnu arg size this way, but at least with this patch, we handshake with ourselves correctly.

Ping @joerg and @rnk - what's a sensible path forward with respect to this? @rnk thinks this change is correct, while @joerg disagrees, but hasn't had time to look into it yet. Is it acceptable to apply this now and then revert later once @joerg has analyzed the issue?

I'm back to the point where I can't reproduce the problem :( Can we start providing an actual failing test case? It's annoying to debug a problem when you can't reproduce it.

I'm back to the point where I can't reproduce the problem :( Can we start providing an actual failing test case? It's annoying to debug a problem when you can't reproduce it.

My testcase that triggers the issue is available at https://martin.st/temp/hello-exception.cpp. With this input file, I can produce the DW_CFA_GNU_args_size opcodes with this command: clang -target i386-netbsd -O3 -fomit-frame-pointer -S -o - hello-exception.cpp (tested with both clang 4.0 and current trunk).

joerg commandeered this revision.Jun 5 2018, 3:03 PM
joerg edited reviewers, added: mstorsjo; removed: joerg.
joerg updated this revision to Diff 150049.Jun 5 2018, 3:10 PM

After a careful review of newer GCC / libgcc and the assembler annotations from LLVM, I have come to the following conclusions:

(1) The semantics have been somewhat changed by GCC in recent years. There is no actual specification, so we have to go by what behavior actually makes sense.
(2) The primary motivation is still that the DW_CFA_GNU_args_size is a call-site specific annotation. It is expected to be applied when the IP is moved by the personality routine to compensate for the call site specific (temporary) adjustment.
(3) It is not clear with plain unw_set_ip outside the scope of the Itanium EH handling should have this behavior, so it might need to be split into an internal routine.
(4) LLVM does not produce correct CFA annotation for stdcall and similar cases where the callee removes additional stack space.

The patch covers the first two items.

mstorsjo accepted this revision.Jun 5 2018, 11:40 PM

Sounds sensible to me; I've tested the patch in the setup that showed the issue this original patch tried to fix, and it seems to work fine there.

compnerd accepted this revision.Jun 11 2018, 2:09 PM

This makes much more sense. Thanks @joerg

This revision is now accepted and ready to land.Jun 11 2018, 2:09 PM
rnk added a comment.Jun 13 2018, 2:22 PM

After a careful review of newer GCC / libgcc and the assembler annotations from LLVM, I have come to the following conclusions:

(1) The semantics have been somewhat changed by GCC in recent years. There is no actual specification, so we have to go by what behavior actually makes sense.
(2) The primary motivation is still that the DW_CFA_GNU_args_size is a call-site specific annotation. It is expected to be applied when the IP is moved by the personality routine to compensate for the call site specific (temporary) adjustment.

Right.

(3) It is not clear with plain unw_set_ip outside the scope of the Itanium EH handling should have this behavior, so it might need to be split into an internal routine.

I don't know enough about this code to really respond to this.

(4) LLVM does not produce correct CFA annotation for stdcall and similar cases where the callee removes additional stack space.

Here's what we generate for that case today: https://godbolt.org/g/33cNJy
The important part is:

.cfi_escape 0x2e, 0x0c
pushl   $3
.cfi_adjust_cfa_offset 4
pushl   $2
.cfi_adjust_cfa_offset 4
pushl   $1
.cfi_adjust_cfa_offset 4
calll   __Z13may_throw_stdiii@12
.cfi_adjust_cfa_offset -12

Are you saying that the runtime will calculate the wrong CFA because it will include the .cfi_adjust_cfa_offset -12? As in, adding a nop after the call would fix the glitch? If so, I think the right thing to do would be to fix libunwind to use return_address - 1 when unwinding.

This revision was automatically updated to reflect the committed changes.