This is an archive of the discontinued LLVM Phabricator instance.

[lldb/Unwind] Use eh_frame plan directly when it doesn't need to be augmented
ClosedPublic

Authored by labath on Jun 23 2020, 6:51 AM.

Details

Summary

This fixes a bug in the logic for choosing the unwind plan. Based on the
comment in UnwindAssembly-x86, the intention was that a plan which
describes the function epilogue correctly does not need to be augmented
(and it should be used directly). However, the way this was implemented
(by returning false) meant that the higher level code
(FuncUnwinders::GetEHFrameAugmentedUnwindPlan) interpreted this as a
failure to produce _any_ plan and proceeded with other fallback options.
The fallback usually chosed for "asynchronous" plans was the
"instruction emulation" plan, which tended to fall over on certain
functions with multiple epilogues (that's a separate bug).

This patch simply changes the function to return true, which signals the
caller that the unmodified plan is ready to be used.

The attached test case demonstrates the case where we would previously
fall back to the instruction emulation plan, and unwind incorrectly --
the test asserts that the "augmented" eh_frame plan is used, and that
the unwind is correct.

Diff Detail

Event Timeline

labath created this revision.Jun 23 2020, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2020, 6:51 AM

I do not like much this approach but it fixes this case. I would prefer checking DW_AT_producer together with -grecord-gcc-switches.
I was trying GCC and clang and all the IMO relevant options and I can get either no .eh_frame/.debug_frame or an asynchronous one.:

(set -ex;rm -rf udir;mkdir udir;cd udir;rm -f *;for e in -f{,no-}exceptions;do for u in -f{,no-}asynchronous-unwind-tables;do for a in -f{,no-}unwind-tables;do for c in clang gcc;do echo 'int main(void){return 0;}'|$c $e $a $u -o $c$e$u$a -x c -;done;done;done;done;for i in $(sha256sum *|sort|awk '{x[$1]=$2}END{for (y in x) print x[y]}');do nm $i|grep -w main;readelf -wf $i;done) 2>&1|less

So I think current compilers no longer produce non-asynchronous unwind tables (that should be verified in the gcc+clang code which I did not) and a simple compiler version check in DW_AT_producer would be enough (even without checking compilation options).

lldb/test/Shell/Unwind/eh-frame-augment-noop.test
20

trailing whitespace. (5x)

I do not like much this approach but it fixes this case. I would prefer checking DW_AT_producer together with -grecord-gcc-switches.
I was trying GCC and clang and all the IMO relevant options and I can get either no .eh_frame/.debug_frame or an asynchronous one.:

(set -ex;rm -rf udir;mkdir udir;cd udir;rm -f *;for e in -f{,no-}exceptions;do for u in -f{,no-}asynchronous-unwind-tables;do for a in -f{,no-}unwind-tables;do for c in clang gcc;do echo 'int main(void){return 0;}'|$c $e $a $u -o $c$e$u$a -x c -;done;done;done;done;for i in $(sha256sum *|sort|awk '{x[$1]=$2}END{for (y in x) print x[y]}');do nm $i|grep -w main;readelf -wf $i;done) 2>&1|less

So I think current compilers no longer produce non-asynchronous unwind tables (that should be verified in the gcc+clang code which I did not) and a simple compiler version check in DW_AT_producer would be enough (even without checking compilation options).

I would be the first person to jump with joy of all compilers produced unwind info which is correct at every location, but I'm afraid we're just not there yet. The simplest way to disprove that is to look at the calls to __stdcall functions:

$ bin/clang -m32 -x c - -o - -S <<<"__stdcall int e(int); void f() { e(47); }" -O3 -fasynchronous-unwind-tables
...
f:                                      # @f
	.cfi_startproc
# %bb.0:                                # %entry
	subl	$12, %esp
	.cfi_def_cfa_offset 16
	movl	$47, (%esp)
	calll	e
#        .cfi_def_cfa_offset 8  <=== THIS IS MISSING
  	addl	$8, %esp
	.cfi_def_cfa_offset 4
	retl
...

Clang does not correctly describe the fact that the __stdcall function pops its arguments, so unwind will be incorrect immediately upon return from the function. Gcc handles this correctly, though I don't guarantee that it doesn't miss some other corner case. Now this situation is not actually handled by lldb's "augmenter", so the example somewhat shaky, but it does show that there are gaps in clang/llvm modelling of unwind info.

As for the DW_AT_producer idea, the main gotcha I see there is that eh/debug_frame is supposed to be usable even without the rest of the debug info. Now, we could treat DW_AT_producer as optional and fall back to some default behavior in case it's not present -- that would mean that the unwind sections are still usable independently, but the need for a fallback makes the idea less appealing.

The other problem with that is that we would actually have to gather information about which compilers report the unwind info correctly. I think gathering that information would be very worthwhile, but what I would do with that data is just make a note of the releases which started producing reasonable epilogue info, and when those releases reach a certain age (5 years?), just delete the code to support them.

Now this situation is not actually handled by lldb's "augmenter", so the example somewhat shaky, but it does show that there are gaps in clang/llvm modelling of unwind info.

Good to know but ... I do not see how is this stdcall unwinding bug related to this issue whether to use .eh_frame or not - UnwindAssembly_x86::AugmentUnwindPlanFromCallSite cannot verify any .eh_frame is 100% valid.

As for the DW_AT_producer idea, the main gotcha I see there is that eh/debug_frame is supposed to be usable even without the rest of the debug info.

Producer is also recorded in .gnu.build.attributes which is in the main binary (not in external .debug info):

$ annocheck --enable-notes /bin/bash
Notes: Range: 0x304e0 .. 0xe22e9.
  [O] Tool: running gcc 10.1.1 20200507
Notes: Range: 0x311f0 .. 0xe26a5.
  [O] Tool: running gcc 10.1.1 20200507
  
$ readelf -Wn /bin/bash
Displaying notes found in: .gnu.build.attributes
  Owner                                Data size        Description
  GA$<tool>running gcc 10.1.1 20200507 0x00000000       OPEN        Applies to region from 0x304e0 to 0xe22e9
  GA$<tool>running gcc 10.1.1 20200507 0x00000000       OPEN        Applies to region from 0x311f0 to 0xe26a5

Although in fact one does not need to parse the notes at all as if there exist such annobin notes one can already guarantee the compiler is new enough.

the need for a fallback makes the idea less appealing.

Fallback without installed external debuginfo would not work much but a fallback for old binaries still without annobin notes is not so bad IMO.

The other problem with that is that we would actually have to gather information about which compilers report the unwind info correctly.

non-clang/non-gcc compilers can contribute whitelist entries themselves. Such whitelisting would not regress the existing behavior, it would only further improve the whitelisted clang+gcc compilers.

I do not understand why the testcase has two epilogues. The UnwindAssembly_x86::AugmentUnwindPlanFromCallSite() code tests only beginning and end of CFI, it does not read anything in between.
The simplified testcase works the same for me: https://people.redhat.com/jkratoch/D82378.patch

Now this situation is not actually handled by lldb's "augmenter", so the example somewhat shaky, but it does show that there are gaps in clang/llvm modelling of unwind info.

Good to know but ... I do not see how is this stdcall unwinding bug related to this issue whether to use .eh_frame or not - UnwindAssembly_x86::AugmentUnwindPlanFromCallSite cannot verify any .eh_frame is 100% valid.

It's not related, at least not directly. I was mainly responding to the claim that all (current) compilers produce asynchronous unwind tables. This example disproves that (well, one could claim that this is an asynchronous unwind table, but that it is an incorrect one, but I don't think that distinction is relevant).

And if there's one bug/incompleteness, there could very well be others, so I was also alluding to the fact that we might need to have some sort of an "augmentation framework" for the forseeable future. However, it could very well be the case that we no longer need to augment epilogue data, and if that's the case deleting the code for doing that would be great (but for that someone would need to investigate the epilogue behavior of different compilers more closely).

As for the DW_AT_producer idea, the main gotcha I see there is that eh/debug_frame is supposed to be usable even without the rest of the debug info.

Producer is also recorded in .gnu.build.attributes which is in the main binary (not in external .debug info):

Well... I don't think that's an improvement. :/ This section seems to be present only on some flavours of linux (my distro doesn't have it), so one cannot even apply it to linux universally, much less other operating systems. So, I don't see the reason for changing the current detection algorithm. I think that a much more worthwhile use of time would be to check whether even need to detect something in the first place.

I do not understand why the testcase has two epilogues. The UnwindAssembly_x86::AugmentUnwindPlanFromCallSite() code tests only beginning and end of CFI, it does not read anything in between.
The simplified testcase works the same for me: https://people.redhat.com/jkratoch/D82378.patch

True. The reason I did that was because in the single-epilogue case, we would unwind correctly using pretty much any method (e.g. the instruction emulation method, which was used before this patch). But you're right that that is not important for this patch. I'll use the simplified version.

labath updated this revision to Diff 273043.Jun 24 2020, 8:37 AM
  • remove trailing whitespace
  • simplify test
jankratochvil accepted this revision.Jun 24 2020, 9:09 AM

It's not related, at least not directly.

My goal was to prove UnwindAssembly_x86::AugmentUnwindPlanFromCallSite is no longer useful nowadays. It was trying to detect situation where both:

  • .eh_frame exists
  • .eh_frame does not unwind prologue

I cannot force clang/gcc to produce such .eh_frame, IMO it should be produced with -funwind-tables -fno-asynchronous-unwind-tables but that does produce the same as -fasynchronous-unwind-tables.

Whether the compiler has some unwind info incompleteness/bugs is not important, the quality of .eh_frame from recent clang+gcc is always better than any fallbacks LLDB has.

So given UnwindAssembly_x86::AugmentUnwindPlanFromCallSite is no longer useful and it can (I think it still can sometimes) falsely discard a good .eh_frame section it would be best to disable UnwindAssembly_x86::AugmentUnwindPlanFromCallSite for new compilers. The question is how to detect "new compilers" without needing DWARF, my only idea is to do it by checking existence of section .gnu.build.attributes. If some Linux distribution still does not have .gnu.build.attributes does not matter, it is only a backward compatible improvement of current situation. I will post a patch for that as I have hijacked this bug.

And if there's one bug/incompleteness, there could very well be others, so I was also alluding to the fact that we might need to have some sort of an "augmentation framework" for the forseeable future. However, it could very well be the case that we no longer need to augment epilogue data, and if that's the case deleting the code for doing that would be great (but for that someone would need to investigate the epilogue behavior of different compilers more closely).

I do not see any specific case needing augmentation for code from new compilers.

"deleting the code" - if you mean one does not need to detect .gnu.build.attributes (or some other way) to detect new compilers and rather just assume any code is from a new enough compiler it is even better (and I agree).

Well... I don't think that's an improvement. :/ This section seems to be present only on some flavours of linux (my distro doesn't have it),

When it somewhere is an improvement and somewhere it has no effect I find it still as an improvement.

So, I don't see the reason for changing the current detection algorithm.

I mean to discard the current detection if there is possible a more safe detection.

This revision is now accepted and ready to land.Jun 24 2020, 9:09 AM

Hi Pavel, sorry I've been doing a bunch of random things today and haven't really had a chance to look at this yet. eh_frame is so problematic for lldb, we never know what we're getting. I keep thinking we should take over a few augmentation string characters so that the producer can definitively indicate one of:

Synchronous unwind only - at throwable locations, or when unwinding up the stack for a throw.

Fully describes prologue setup.

Fully describes epilogue(s) teardown.

Fully asynchronous - unwind information is correct at all instructions.

When we set up a stack frame (and base things off of the frame pointer), that can cover up a multitude of eh_frame issues as long as we get the prologue/epilogue. All the bugs tend to come out when we start using -fomit-frame-pointer, or look at leaf frames. ;) I don't know if gcc still does this, but I remember ages ago if you had i386 32-bit PIC code and needed to refer to pc-relative data it would do a CALLQ next-instruction and then pop the saved pc off the stack, without any unwind information at that point. Maybe it does that now.

With x86_64, I know gcc emits prologue & epilogue, and clang probably does too now (we rarely use eh_frame on macOS systems so it's been a while since I've seen it). I don't know what icc does. Given how rarely people bother with omit-frame-pointer codegen on x86_64 (it was more important on 32-bit i386), and prologue+epilogue were often present, I was fine with living on eh_frame as the primary unwind mechanism. It still makes me nervous in general, though.

(having augmentation strings would also allow us to remove a special-case in macOS support -- we have a couple of solibs with hand-written assembly and hand-written eh_frame for it, and we special case those libraries to say "trust the eh_frame from this one, it's cool".)

I've never pushed hard for the idea of adding augmentation strings from the producers because it seemed easier to depend primarily on the instruction analysis methods. But maybe it is something we should pursue.

None of this really comments specifically on the proposed patch, just talking out loud about my feelings on all of this. One thing you mentioned is that the x86 instruction analysis doesn't handle multiple epilogues -- it should do that? I fixed a bug in it three weeks ago where it didn't understand an epilogue-ending tail-call JMPQ as the final instruction in the epilogue (to a noreturn function). UnwindAssemblyInstEmulation takes a much better approach where it forwards the unwind state to branches in the function and doesn't bother trying to recognize the instructions in an epilogue; that's definitely superior and what the x86 unwinder should do (or just fold it into the InstEmulation system). But in general, multiple epilogues should be handled I believe..

Thanks for replying Jason.

I think having the augmentation string specify its validity would be great, though I am somewhat sceptical of being able to push that idea through -- given that eh_frame needs to be understood by a lot of consumers and is actually critical for the correctness of some applications (those using exceptions), making changes to it seems to be hard -- that's why it's still stuck at version 1 of debug_frame.

That said, given that current compilers do emit prologue/epilogue unwind info correctly, I'm not sure that we need the epilogue augmentation code, without additional header information. Given all of this discussion I'll try to find some time to do a detailed investigation of the prologue/epilogue state in various compiler versions (similar to the DW_AT_low_pc analysis in D78489), and see what comes out of it.

You're right that the instruction emulation not handling multiple epilogues is a bug. It's probably not about _all_ epilogues, only some special ones -- for example, the function where we ran into this was so simple it did not have any traditional prologue instructions and I suspect this is what threw the code off.

I was planning to look into that separately. My flow of thought which led to starting off with this patch was:

  • I knew that instruction emulation is broken, but the very first plan we try is the "eh_frame augmented" plan.
  • It seemed like the augmenter should also be able to handle this function (and it's job should be even easier). So I tried to find why that doesn't kick in.
  • While stepping through the code, I found this comment (UnwindAssembly-x86.cpp:127), which seems to imply that we don't need to augment it at all. It stop short of expliticly saying that we should actually use that plan, but that's seems implied to me. (it also makes sense -- the augmenter is there to add the epilogue instructions, and we already have them, so there's nothing to add..)
  • So, I figured I'd first fix it so that it actually does what it says it does.

From your comment, it's not clear to me whether you are ok with this patch, or if you want to do something differently.

BTW, the call-next-instruction trick is still used, but not by any gcc version available on godbolt.org -- clang uses this sequence on i386 PIC. However, since clang-3.8, the cfi instruction properly describe how to unwind out of that.

jasonmolenda accepted this revision.Jun 26 2020, 12:42 AM

Sorry yes this patch is fine. To be honest, on x86_64 at least, as you've pointed out, maybe we should just live on eh_frame completely without going through any of these augmentation checks at all. AFAIK gdb is living off of eh_frame exclusively, it may be that all the compilers have caught on to that and are emitting asynchronous unwind information in eh_frame.

My idea about adding an augmentation letter (most simply, just "is this accurate at every instruction") started to seem more reasonable to me when I learned about how armcc added "armcc+" to its augmentation string to indicate that a bug in their eh_frame generation has been fixed. (although all the other augmentation string entries I've seen have been single letter ones, so I'm not sure how smart it was to add a whole word there. It's too bad the augmentation string didn't define a separator.

Thanks for listening to all my eh_frame ramblings on this!

This revision was automatically updated to reflect the committed changes.
JDevlieghere added inline comments.
lldb/test/Shell/Unwind/eh-frame-augment-noop.test
19

On Darwin this returns

Asynchronous (not restricted to call-sites) UnwindPlan is 'assembly insn profiling'

I've skipped the test in b4180fe477bfe302778aaceee65faf69c5e7be76

labath marked an inline comment as done.Jun 29 2020, 2:07 AM
labath added inline comments.
lldb/test/Shell/Unwind/eh-frame-augment-noop.test
19

Thanks. It looks like what happens here is that when targetting mac, clang additionally emits unwind info in the compact unwind form and the linker then drops the eh_frame info for these functions. There doesn't seem to be a way to prevent the compact unwind generation (besides generating unwind info that cannot be translated into the compact format).

So, it seems skipping the test is the right thing to do here.

You're right that the instruction emulation not handling multiple epilogues is a bug. It's probably not about _all_ epilogues, only some special ones -- for example, the function where we ran into this was so simple it did not have any traditional prologue instructions and I suspect this is what threw the code off.

I was planning to look into that separately.

I started looking into it now. The problem seems to be that the pop %rcx instruction is not treated as part of the epilogue -- the code only treats volatile register restores as a part of the epilogue. However, in this case, the pop %rcx is not there because the program needs to restore the register values -- it's there simply to undo the stack alignment that was done in the prologue to satisfy the ABI. This then causes us to incorrectly restore the "before-prologue" state when going past the first ret, and adjusting the stack twice:

pushq %rax # align stack => CFA=rsp+16
# some function call
jcc 1f
# some code
pop %rcx # unalign stack => CFA=rsp+8
retq # should reset CFA=rsp+16, but does +8 instead
1: # more code, unwind info incorrect here
pop %rcx # => CFA=rsp+0, should be +8
retq

The simplest fix for that would be to treat _all_ pop instructions as the beginning of prologue (basically, move this line out of the enclosing if(non-volatile) block), but I don't know whether that isn't too aggressive (it should be mostly ok, because current compilers prefer movq ..., -N(%rsp) instead of push/pop sequences, but still...). What do you think?