This is an archive of the discontinued LLVM Phabricator instance.

[libunwind] Make .eh_frame scanning/caching optional
AbandonedPublic

Authored by rprichard on Aug 19 2020, 5:21 PM.

Details

Reviewers
saugustine
jgorbe
mstorsjo
compnerd
Group Reviewers
Restricted Project
Summary

Add another macro, _LIBUNWIND_SUPPORT_DWARF_SECTION, that indicates
that the location and length of .eh_frame is known, and that .eh_frame
should be scanned to find an FDE. There are a few reasons to turn this
code off:

  • When there is a valid .eh_frame_hdr section, scanning .eh_frame is unnecessary.
  • When .eh_frame is located using PT_GNU_EH_FRAME, the length of the section is unknown.
  • If the unwinder uses dl_iterate_phdr, then entries that are automatically added to DwarfFDECache would become invalid if the module containing the entry were unloaded. (On Apple systems, DwarfFDECache registers dyldUnloadHook to remove unloaded entries.)
  • Omitting the dwarf_section and dwarf_section_length fields reduces memory use in the FrameHeaderCache.

For a bare-metal target, the location of both .eh_frame and
.eh_frame_hdr is known, but maybe the .eh_frame_hdr is unused/empty,
so enable both methods. .eh_frame is only scanned if the index is
unavailable.

Add a new dso_length field to track the length of the PT_LOAD segment
used by the FrameHeaderCache. Previously, the FrameHeaderCache used the
dwarf_section_length field for this purpose.

Fix the calculation of ehSectionEnd in CFI_Parser<A>::findFDE.

Above the dso_base field declaration, remove the redundant reference to
_LIBUNWIND_SUPPORT_DWARF_INDEX. If _LIBUNWIND_SUPPORT_DWARF_INDEX is
defined, then _LIBUNWIND_SUPPORT_DWARF_UNWIND is also defined.

Fixes PR36005 and PR46829.

Diff Detail

Event Timeline

rprichard created this revision.Aug 19 2020, 5:21 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 19 2020, 5:21 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
rprichard requested review of this revision.Aug 19 2020, 5:21 PM
rprichard updated this revision to Diff 286922.Aug 20 2020, 5:02 PM

Stylistic changes.

Split out the chain-of-ifdefs change into D86768.

rprichard retitled this revision from [libunwind] Refactor DWARF FDE lookup to [libunwind] Make .eh_frame scanning/caching optional.Aug 28 2020, 12:56 AM
rprichard edited the summary of this revision. (Show Details)

FWIW, when compact unwinding defers to DWARF (compactSaysUseDwarf), the unwinder begins a scan of .eh_frame at a given offset. Is that really the right behavior, or will the offset always point directly at the appropriate FDE (as with the DWARF index .eh_frame_hdr)?

Aside: For most architectures, when UnwindCursor::setInfoBasedOnIPRegister calls compactSaysUseDwarf, dwarfOffset isn't initialized, but it is read. The value is properly initialized on [arm, arm64, i386, x86_64], though, and I'm guessing those are the architectures that are practically used with compact unwinding.

If the unwinder uses dl_iterate_phdr, then entries that are automatically added to DwarfFDECache would become invalid if the module containing the entry were unloaded. (On Apple systems, DwarfFDECache registers dyldUnloadHook to remove unloaded entries.)

AFAICT, this is also a problem for Windows DWARF EH, which relies on scanning an .eh_frame section. e.g. A stale cachedFDE could point into the middle of an FDE/CIE with an arbitrary outcome. Maybe decodeFDE crashes, or maybe it decodes something that looks like a match. I'm not sure if it can be fixed without disabling the caching. If so, then I'd wonder if the caching is worth keeping. (Bare-metal could use .eh_frame_hdr, and I'm not sure whether Apple systems actually need the full __eh_frame scan.)

Windows also has SEH exceptions, which I think don't have this problem.

rprichard edited the summary of this revision. (Show Details)Aug 28 2020, 3:04 AM

The Linux ABI Extensions spec describes the binary search table of .eh_frame_hdr as optional, so maybe falling back to .eh_frame scanning could be necessary to accommodate some ELF files? I wonder what situation results in an omitted search table. ld.bfd does have code for outputting .eh_frame_hdr without a search table.

If the unwinder uses dl_iterate_phdr, then entries that are automatically added to DwarfFDECache would become invalid if the module containing the entry were unloaded. (On Apple systems, DwarfFDECache registers dyldUnloadHook to remove unloaded entries.)

AFAICT, this is also a problem for Windows DWARF EH, which relies on scanning an .eh_frame section. e.g. A stale cachedFDE could point into the middle of an FDE/CIE with an arbitrary outcome. Maybe decodeFDE crashes, or maybe it decodes something that looks like a match. I'm not sure if it can be fixed without disabling the caching. If so, then I'd wonder if the caching is worth keeping. (Bare-metal could use .eh_frame_hdr, and I'm not sure whether Apple systems actually need the full __eh_frame scan.)

Indeed, that's correct. I guess a stale cache entry would only ever be hit if there's another DLL loaded into the same address range as the previous one, but if that happens and unwidning hits a cached entry, pretty much anything could happen. Not really sure if there's any good callbacks for when other DLLs are unloaded...

Disabling the cache in those cases might be safest - even though it probably gives a good speedup for all practical cases, the risks it imposes are pretty bad as well...

Windows also has SEH exceptions, which I think don't have this problem.

It's handled quite differently there, yeah.

Regarding the rest of the patch, I don't have enough experience with these details on ELF platforms to have much of an informed opinion... The windows specific bits look good though.

The commit description talks about making the caching optional, but I don't really see that aspect in the patch - or maybe I'm just not studying it closely enough?

The commit description talks about making the caching optional, but I don't really see that aspect in the patch - or maybe I'm just not studying it closely enough?

Oh, by optional, I just meant that the .eh_frame scanning and caching are both controlled by the new _LIBUNWIND_SUPPORT_DWARF_SECTION macro, which is left off for some targets.

The Linux ABI Extensions spec describes the binary search table of .eh_frame_hdr as optional, so maybe falling back to .eh_frame scanning could be necessary to accommodate some ELF files? I wonder what situation results in an omitted search table. ld.bfd does have code for outputting .eh_frame_hdr without a search table.

It looks like both ld.bfd and ld.gold have code paths that output an .eh_frame_hdr without a search table. This can happen if they're not able to parse .eh_frame. AFAICT, LLD always outputs the search table (EhFrameHeader::write), and aborts if it can't parse .eh_frame. One example of this: If a CIE has an unrecognized augmentation code, then the linker may be unable to parse the R code, and without that, it can't read the FDE initial_location field, because the R code specifies its encoding. LLD avoids this problem by aborting if it can't read the R code (EhReader::getFdeEncoding).

libgcc appears to fall back to scanning .eh_frame if .eh_frame_hdr doesn't have the FDE, and it can scan beyond the end of the segment containing the .eh_frame section, if the section doesn't end with a terminator.

Perhaps a more conservative fix for PR46829 is to match libgcc's behavior here, by disabling the end-of-section tracking for dl_iterate_phdr. e.g. by setting dwarf_section_length to -1. In that case, then PR36005's true fix is to link against a crtend.o with the .eh_frame terminator. (Or use LLD, which automatically inserts the terminator.)

Perhaps a more conservative fix for PR46829 is to match libgcc's behavior here, by disabling the end-of-section tracking for dl_iterate_phdr. e.g. by setting dwarf_section_length to -1. In that case, then PR36005's true fix is to link against a crtend.o with the .eh_frame terminator. (Or use LLD, which automatically inserts the terminator.)

Haven't read through the discussions. Just want to comment "Or use LLD, which automatically inserts the terminator" - the zero terminator CIE is to work around a glibc issue D46566

Just want to comment "Or use LLD, which automatically inserts the terminator" - the zero terminator CIE is to work around a glibc issue D46566

The need for the terminator seems more fundamental:

  • In theory (and in the LSB5 spec), .eh_frame_hdr's search table is optional, or perhaps it could be incomplete? (e.g. see this mailing list post)
  • libgcc (and LLVM's libunwind) accommodate an incomplete/omitted search table by falling back to scanning .eh_frame.
  • A PC might not be present in .eh_frame, so the unwinder needs to know when it reaches the end of .eh_frame.
  • .eh_frame_hdr records the location of .eh_frame, but not the size.
  • Therefore, a terminator is required.

If .eh_frame_hdr had also included the size of .eh_frame, then a terminator wouldn't be needed. My impression is that the terminator predates .eh_frame_hdr. e.g. The __register_frame_info function takes a pointer to the start of .eh_frame, but no size.

crtend.o normally provides the terminator, and bfd/gold appear to preserve it. So, if crtend.o isn't used, then .eh_frame is unterminated (e.g. PR36005). I noticed that bfd/gold remove a terminator if I try to add one before actual CIE/FDE records. LLD 6.0.0 seems to strip all terminators, including crtend.o's terminator, so the output is never terminated. Current LLD adds the terminator unconditionally, so the output is always terminated, even without crtend.o.

This Phabricator revision assumes that .eh_frame_hdr's search table is sufficient, and maybe it is, at least on some targets? I suspect the bfd/gold linkers issue a warning if they don't output a search table, and lld always outputs it.

Two more ways the search table can be omitted:

  • With the gold linker, if .eh_frame has only a terminator, then the generated .eh_frame_hdr's table is omitted. LLD instead outputs a 0-entry search table.
  • ld.bfd seems to omit the search table if it can't use relative relocations to refer to functions. I don't really understand what's happening there. Maybe HPPA is an example?

Maybe the .eh_frame fallback scan can be skipped when .eh_frame_hdr has a search table. This should work as long as the table is never incomplete, and I suspect it'd make the terminator unnecessary most of the time.

For now, I think I'll work on a patch that simply removes the broken .eh_frame size calculation.

rprichard abandoned this revision.Sep 16 2020, 7:03 PM

Abandoned in favor of D87750.