Page MenuHomePhabricator

[libunwind] Add an interface for dynamic .eh_frame registration
ClosedPublic

Authored by housel on Oct 14 2021, 8:56 PM.

Details

Summary

The libgcc runtime library provides register_frame and deregister_frame functions, which can be used by dynamic code generators to register an .eh_frame section, which contains one or more Call Frame Information records, each consisting of a Common Information Entry record followed by one or more Frame Description Entry records. This libunwind library also provides register_frame
and
deregister_frame functions, but they are effectively aliases for unw_add_dynamic_fde and unw_remove_dynamic_fde and thus can only take a single FDE.

This patch adds unw_add_dynamic_eh_frame_section and unw_remove_dynamic_eh_frame_section functions which explicitly use the .eh_frame format. Clients such as the ORCv2 platform and runtime can check for these functions and use them if unwinding is being
provided by libunwind, or fall back to register_frame and deregister_frame if unwinding is provided by libgcc.

Diff Detail

Event Timeline

housel created this revision.Oct 14 2021, 8:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2021, 8:56 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
housel requested review of this revision.Oct 14 2021, 8:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2021, 8:56 PM
MaskRay added a subscriber: MaskRay.EditedOct 14 2021, 9:06 PM

I looked at the libgcc mechanism at one time. I remember that in most cases it just uses PT_GNU_EH_FRAME and these eh_frame boundary registry functions are not needed.
Can ORC just use PT_GNU_EH_FRAME?

Clang passes --eh-frame-hdr to ld even in -static mode, unlike GCC. I assume ORC doesn't need to support gcc -static (or: requires -Wl,--eh-frame-hdr).

I looked at the libgcc mechanism at one time. I remember that in most cases it just uses PT_GNU_EH_FRAME and these eh_frame boundary registry functions are not needed.
Can ORC just use PT_GNU_EH_FRAME?

ORC doesn't go through the system dynamic loader, so there's no program header (JITLink doesn't generate one). None of the dynamic loaders I've looked at have any way to dynamically register program headers so that they'll be returned by dl_iterate_phdr. With ORC, code and data segments, including .eh_frame segments, get allocated in the target execution space and initialized by the JITLink memory manager, and the ORC runtime calling these registration functions is the only way libunwind will know about these tables.

lhames accepted this revision.Oct 17 2021, 10:03 PM
lhames added a subscriber: lhames.

This looks good to me, but we should get a libunwind contributor to weigh in too.

I've been trying to think of a good way to test this, but it's awkward. The best strategy that I've come up with, at least for testing within libunwind itself, would be to compile a test case with eh-frame tables written in by hand (as C arrays, compile it with -fno-asynchronous-unwind-tables, patch up the tables manually with C code at runtime, and then register/use/deregister them. That's all pretty awkward, which is probably why there are no existing tests for the dynamic registration APIs.

I think the ORC runtime provides a much more natural way to test this. Did you manage to come up with some ORC-runtime based tests in the end?

I don't know enough about Dwarf unwinding but the implementation looks generally good. Can you please add a testcase indicating how ORCJIT is planning to use it?

libunwind/src/DwarfParser.hpp
158

Can you document how and when this parameter should be used?

I think the ORC runtime provides a much more natural way to test this. Did you manage to come up with some ORC-runtime based tests in the end?

My current plan is to automate what I've been doing manually, namely running a test of C++ exception handling using llvm-jitlink, both with the default configuration (using libgcc-provided unwinding), and with libunwind LD_PRELOADed to force it as the unwinding provider.

joerg added a subscriber: joerg.Oct 18 2021, 1:26 PM

I would strongly prefer if ORCv2 doesn't depend on this. It essentially forces libunwind to parse the whole section just to find the delimiters of the FDEs. That's a lot of unnecessary work as JIT use normally allows registering functions individually.

I think the ORC runtime provides a much more natural way to test this. Did you manage to come up with some ORC-runtime based tests in the end?

My current plan is to automate what I've been doing manually, namely running a test of C++ exception handling using llvm-jitlink, both with the default configuration (using libgcc-provided unwinding), and with libunwind LD_PRELOADed to force it as the unwinding provider.

@lhames The other possibility is to lift JIT runtime to top level in llvm-project so it can link libunwind into its runtime so you don't need to:

  • Create a public API in libunwind to support JIT
  • Worry about deploying a compatible libunwind together with JIT (or worry about back-deploy)
  • You can write all the tests in ORCJIT context.

I would strongly prefer if ORCv2 doesn't depend on this. It essentially forces libunwind to parse the whole section just to find the delimiters of the FDEs. That's a lot of unnecessary work as JIT use normally allows registering functions individually.

I don't follow this. Does libunwind provide some way to register FDEs without parsing the FDE content? If so we can definitely use that, but we should still process the whole section: ORC links objects (not functions), and we should register every FDE for an object when it's linked in.

It's also worth noting that ORC and MCJIT have always called __register_frame on every frame, which seems like it should be at least as much work as this.

It's also worth noting that FreeBSD's version of libgcc exception handling is actually based on the libunwind code, with a local patch that implements compatibility with libgcc __register_frame by changing it to parse an entire .eh_frame section (in a slightly more ad hoc fashion than this code). Having this new entry point in-tree would simplify the FreeBSD local changes.

MaskRay added a subscriber: dim.Oct 18 2021, 10:44 PM

It's also worth noting that FreeBSD's version of libgcc exception handling is actually based on the libunwind code, with a local patch that implements compatibility with libgcc __register_frame by changing it to parse an entire .eh_frame section (in a slightly more ad hoc fashion than this code). Having this new entry point in-tree would simplify the FreeBSD local changes.

@dim

dim added a subscriber: emaste.Oct 19 2021, 2:24 AM

It's also worth noting that FreeBSD's version of libgcc exception handling is actually based on the libunwind code, with a local patch that implements compatibility with libgcc __register_frame by changing it to parse an entire .eh_frame section (in a slightly more ad hoc fashion than this code). Having this new entry point in-tree would simplify the FreeBSD local changes.

Adding @emaste since he added this part in:
https://github.com/freebsd/freebsd-src/commit/77ac8927fda49722185dab219365be31daed8872

joerg added a comment.Oct 19 2021, 6:49 AM

I would strongly prefer if ORCv2 doesn't depend on this. It essentially forces libunwind to parse the whole section just to find the delimiters of the FDEs. That's a lot of unnecessary work as JIT use normally allows registering functions individually.

I don't follow this. Does libunwind provide some way to register FDEs without parsing the FDE content? If so we can definitely use that, but we should still process the whole section: ORC links objects (not functions), and we should register every FDE for an object when it's linked in.

It's also worth noting that ORC and MCJIT have always called __register_frame on every frame, which seems like it should be at least as much work as this.

__register_frame requires parsing the CIE header, but not the whole FDE program. E.g. that's the findPCRange logic. After that, the FDE is just added to the internal block list. Parsing a whole segment is more involved as it needs to look for the terminator of each block to find the next FDE.

joerg added a comment.Oct 19 2021, 6:50 AM

It's also worth noting that FreeBSD's version of libgcc exception handling is actually based on the libunwind code, with a local patch that implements compatibility with libgcc __register_frame by changing it to parse an entire .eh_frame section (in a slightly more ad hoc fashion than this code). Having this new entry point in-tree would simplify the FreeBSD local changes.

Are you mixing up of __register_frame and __register_frame_info?

Are you mixing up of __register_frame and __register_frame_info?

No? FreeBSD doesn't patch __register_frame_info, and so (like base libunwind) it does nothing in the code I linked.

__register_frame requires parsing the CIE header, but not the whole FDE program. E.g. that's the findPCRange logic. After that, the FDE is just added to the internal block list. Parsing a whole segment is more involved as it needs to look for the terminator of each block to find the next FDE.

Ok -- that makes sense. So I think __unw_add_dynamic_eh_frame_section just needs to walk the section and register the FDEs with __unw_add_dynamic_fde, rather than calling into DwarfFDECache up-front. Does that sound right?

From the JIT's perspective we're looking for two things: (1) to register a whole section with a single call (this avoids jumping in and out of libunwind for every FDE), and (2) a new symbol name whose semantics we can rely on, since __register_frame behaves differently in different unwinding libraries.

To be clear, this new code parses exactly as much of each FDE as the existing __register_frame/__unw_add_dynamic_fde does, including doing the same work to compute the record length. Neither needs to parse the instructions at registration time.

To be clear, this new code parses exactly as much of each FDE as the existing __register_frame/__unw_add_dynamic_fde does, including doing the same work to compute the record length. Neither needs to parse the instructions at registration time.

Right -- I should have actually looked at the implementation of __unw_add_dynamic_fde. ;)

@joerg -- We have always registered every FDE in the JIT. This is an improvement over what we have, not a regression. Is there some more efficient approach that you would recommend? Otherwise I think this should go in.

@joerg Any further thoughts on this?

I'm happy for it to go in as-is -- we can always make the implementation of __unw_add_dynamic_eh_frame_section / __unw_remove_dynamic_eh_frame_section later, but the API looks good to me, and I think this is a strict improvement in functionality for JIT clients.

housel updated this revision to Diff 382884.Oct 27 2021, 7:12 PM

Added additional comments to CFI_Parser<A>::decodeFDE.

housel marked an inline comment as done.Oct 27 2021, 7:13 PM
lhames accepted this revision.Nov 1 2021, 10:56 AM

@joerg @dim @emaste Any further comments on this?

I'd like @housel to be able to land this this week if possible. It will significantly improve the usability of libunwind for JIT clients by giving us an API with behavior that is consistent across platforms.

emaste added a comment.Nov 1 2021, 5:40 PM

@joerg @dim @emaste Any further comments on this?

I have no objection.

2016 (when I patched FreeBSD's in-tree libunwind) is long enough ago that I don't recall any of the context but indeed it does not appear that this will cause any trouble on our next import.

lhames added a comment.Nov 3 2021, 8:25 PM

I think the ORC runtime provides a much more natural way to test this. Did you manage to come up with some ORC-runtime based tests in the end?

My current plan is to automate what I've been doing manually, namely running a test of C++ exception handling using llvm-jitlink, both with the default configuration (using libgcc-provided unwinding), and with libunwind LD_PRELOADed to force it as the unwinding provider.

@lhames The other possibility is to lift JIT runtime to top level in llvm-project so it can link libunwind into its runtime so you don't need to:

  • Create a public API in libunwind to support JIT
  • Worry about deploying a compatible libunwind together with JIT (or worry about back-deploy)
  • You can write all the tests in ORCJIT context.

@steven_wu I missed this earlier, sorry!

so it can link libunwind into its runtime

I don't follow this. Are you suggesting that the ORC runtime would link in portions of libunwind? I don't think that'd give us the behavior that we want -- we need frames to be registered with the processes copy of libunwind so that unwinding is consistent across JIT and AOT-compiled code.

@joerg, @emaste -- I finally got a chance to experiment with a Linux docker container and confirmed that libgcc_s's __register_frame can handle individual frames. Unfortunately that does not help us on FreeBSD.

If we ignore FreeBSD for a moment we could imagine switch to registering single frames everywhere, but this would be an unrecoverable performance regression on Linux -- we would necessarily have to walk all CFI entries in the section to use the API.

One possible alternative solution would be to update libunwind's __register_ehframe function to also accept whole sections the same way libgcc_s does, but I don't think this is a good solution: We'd be overloading the behavior in a surprising way to match a libgcc_s behavior that doesn't seem well-documented, and we'd still be left needing some other way to detect whether the current unwinder supported the desired semantics.

I think the ideal solution would be to always pass in whole sections, as @housel's proposed solution does.

I think we can debate the right API signature (e.g. should it pass the section range so that we really can skip walking the CFI records, or is this cheap enough not to bother?), function name, and implementation, but does anyone have any objection to the direction? If so, why?

I've chatted about this with Joerg on a side thread and wanted to summarize my understanding of the situation.

The problem is that we have incompatible behaviors for __register_frame and __deregister_frame between unwinders:

  • libgcc_s and the FreeBSD port of libunwind accept either null-terminated whole frames or single FDEs. If called with single FDEs, FreeBSD's port of libunwind will have quadratic cost for registration, as it always walks the remainder of the section. I suspect libgcc_s's behavior is similar..
  • llvm-project's libunwind accepts only single FDEs.

Attempts to reliably detect the unwinder being used (so that we can call it the "best" way) have been flaky, leading to hard errors for clients. We're looking for a consistent behavior that we could rely on (at least when we can detect that it is available). Ideally, the proposed scheme should allow for efficient implementations.

The options that I have considered and discussed with others are:

  1. Always registering single frames.

The problem with this is that it forces up front costs on the caller: all FDE starts must be identified, and we have one libcall per FDE to register (and probably a heap allocation embedded in that call). On top of this, two major unwinders (libgcc_s and the FreeBSD port of libunwind) seem like they have poor implementations of single-FDE processing that may have quadratic overhead in the worst case.

  1. Teach libunwind's _register_frame to also accept whole frames.

We could update libunwind's __register_frame implementation to also be able to take a whole section. For this to be useful though, we would still need some reliable way to detect whether this behavior is available on the libunwind version being used. A new special symbol would do (__unw_register_frame_accepts_whole_sections?), but this feels awkward compared to @housel's proposal for a new function with clearly defined behavior.

  1. Register .eh_frame_hdr sections.

The .eh_frame_hdr section is a pre-built index of the .eh_frame section. Its format is documented in https://refspecs.linuxfoundation.org/LSB_1.3.0/gLSB/gLSB/ehframehdr.html. This section is not present in relocatable objects (which LLVM's JIT APIs use as input), but is synthesized by the static linker.

For ahead of time compilation this link-time synthesis provides a lot of value -- indexing the FDEs once at link times means that you never have to do it at launch time.

For a JIT I don't think this provides any value: Indexing of the .eh_frame will necessarily be happening at "launch" time. At the same time, forcing synthesis of an .eh_frame_hdr section for registration constrains unwinder implementations: they definitely can't index lazily (the caller already eagerly indexed for them), and if they don't want to use .eh_frame_hdr as their internal index data structure then the scheme has wasted two encoded pointers per FDE registered, plus 4 bytes and two more encoded values for the .eh_frame_hdr header.

  1. @housel's solution -- Add a new registration function that is defined to take an __eh_frame start address.

This requires no up-front computation, we just pass in the start of the .eh_frame section. It places no constraints on the unwinder: Implementations are free to index eagerly or lazily as they like, and to choose whatever internal representation best suits them.

I think that @housel's solution is the best option, and that we should proceed with it.

There doesn't seem to be any objection to this approach, so I think you're free to land @housel.

This revision was not accepted when it landed; it landed in state Needs Review.Nov 18 2021, 8:06 AM
This revision was automatically updated to reflect the committed changes.