This is an archive of the discontinued LLVM Phabricator instance.

[COFF] Check table ptr more thoroughly and ignore empty sections
ClosedPublic

Authored by alvinhochun on Jun 2 2022, 10:35 AM.

Details

Summary

When loading split debug files for PE/COFF executables (produced with
objcopy --only-keep-debug), the tables or directories in such files
may point to data inside sections that may have been stripped.
COFFObjectFile shall detect and gracefully handle this, to allow the
object file be loaded without considering these tables or directories.
This is required for LLDB to load these files for use as debug symbols.

COFFObjectFile shall also check these pointers more carefully to account
for cases in which the section contains less raw data than the size
given by VirtualSize, to prevent going out of bounds.

This commit also changes COFFDump in llvm-objdump to reuse the pointers
that are already range-checked in COFFObjectFile. This fixes a crash
when trying to dump the TLS directory from a stripped file.

Fixes https://github.com/mstorsjo/llvm-mingw/issues/284

Diff Detail

Event Timeline

alvinhochun created this revision.Jun 2 2022, 10:35 AM
Herald added a reviewer: MaskRay. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
alvinhochun published this revision for review.Jun 2 2022, 10:41 AM
alvinhochun added reviewers: mstorsjo, rnk.
Herald added a project: Restricted Project. · View Herald TranscriptJun 2 2022, 10:41 AM
mstorsjo added inline comments.Jun 2 2022, 1:18 PM
llvm/lib/Object/COFFObjectFile.cpp
912

The patch overall looks reasonable to me. The changes to this function feel a bit clunky, but I don't really have any other good suggestions either.

@rnk, what do you think about this patch?

rnk accepted this revision.Jun 2 2022, 7:59 PM

lgtm

llvm/lib/Object/COFFObjectFile.cpp
912

To make it less clunky, I can suggest making a wrapper which consumes stripped section errors, and then call it like:

static Error ignoreStrippedErrors(Error E) {
  if (E.isA<...>()) {
    consumeError(...);
    return Error::success();
  }
  return std::move(E);
}

...
  if (Error E = ignoreStrippedErrors(initDelayImportTablePtr()))
    return E;
...
This revision is now accepted and ready to land.Jun 2 2022, 7:59 PM

Applied suggestion to make code less clunky.

alvinhochun marked 2 inline comments as done.Jun 3 2022, 7:24 AM
This revision was landed with ongoing or failed builds.Jun 3 2022, 8:31 AM
This revision was automatically updated to reflect the committed changes.
glandium added a subscriber: glandium.EditedJun 8 2022, 2:21 AM

I don't have time to provide more details right now, but this change broke Firefox mingw builds with: llvm-strip: error: '../../dist/firefox/uninstall/helper.exe': RVA 0x43000 for base reloc table found but data is incomplete

That helper.exe file is produced by NSIS.

I don't have time to provide more details right now, but this change broke Firefox mingw builds with: llvm-strip: error: '../../dist/firefox/uninstall/helper.exe': RVA 0x43000 for base reloc table found but data is incomplete

That helper.exe file is produced by NSIS.

Can you upload that binary somewhere?

Can you upload that binary somewhere?

This is not the exact binary this happened with, but it's reproducible with this one:

Can you upload that binary somewhere?

This is not the exact binary this happened with, but it's reproducible with this one:

Thanks! I can see what's happening here. Unfortunately, it seems like that binary is somewhat broken, and libObject used to accept it silently before.

Looking at the binary with llvm-readobj --sections --file-headers, I see this:

  BaseRelocationTableRVA: 0x43000
  BaseRelocationTableSize: 0xB68

Section {
  Number: 4
  Name: .ndata (2E 6E 64 61 74 61 00 00)
  VirtualSize: 0x1C000
  VirtualAddress: 0x40000
  RawDataSize: 512

Section {
  Number: 6
  Name: .reloc (2E 72 65 6C 6F 63 00 00)
  VirtualSize: 0xB68
  VirtualAddress: 0x64000
  RawDataSize: 3072

The base relocation table usually resides in the .reloc section, and as the exact size (0xB68) matches, it looks like it just has got the wrong RVA in the header. With the new, more thorough checks, libObject notices that the RVA points into the .ndata section (within 0x40000 - 0x5C000) but there's only raw data covering 0x40000 - 0x40200, and 0x43000 is outside of that. Previously we'd just set up a pointer into bogus/out of range data in these cases, now we error out.

So the file is clearly buggy here, but it could, in some cases, be quite valid to have RVAs pointing outside of the allocated raw data. (Normally, the data size outside of RawDataSize up until VirtualSize is allocated and readable at runtime, but zero-initialized instead of initialized with data from the file.)

In this case, the check was added to avoid crashes, and for the specific case of SizeOfRawData == 0 we return SectionStrippedError, which then is silently treated as not an error (but not initializing e.g. the BaseRelocHeader pointer, to avoid trying to read from it).

To avoid crashing from RVAs outside of the allocated bounds, but still not flat out erroring out on binaries that are somewhat broken (the same issue causes llvm-readobj to reject the binary, so I had to use an older llvm-readobj binary to look at it), should we just treat all those cases as a silent "section stripped" case?

E.g. this fixes the issue:

diff --git a/llvm/lib/Object/COFFObjectFile.cpp b/llvm/lib/Object/COFFObjectFile.cpp
index 65166b3710a0..f7d957ee2682 100644
--- a/llvm/lib/Object/COFFObjectFile.cpp
+++ b/llvm/lib/Object/COFFObjectFile.cpp
@@ -483,18 +483,12 @@ Error COFFObjectFile::getRvaPtr(uint32_t Addr, uintptr_t &Res,
       // fail, otherwise it will be impossible to use this object as debug info
       // in LLDB. Return SectionStrippedError here so that
       // COFFObjectFile::initialize can ignore the error.
-      if (Section->SizeOfRawData == 0)
-        return make_error<SectionStrippedError>();
+      // Somewhat common binaries may have RVAs pointing outside of the
+      // provided raw data. Instead of rejecting the binaries, just
+      // treat the section as stripped for these purposes.
       if (Section->SizeOfRawData < Section->VirtualSize &&
           Addr >= SectionStart + Section->SizeOfRawData) {
-        if (ErrorContext)
-          return createStringError(object_error::parse_failed,
-                                   "RVA 0x%" PRIx32
-                                   " for %s found but data is incomplete",
-                                   Addr, ErrorContext);
-        return createStringError(
-            object_error::parse_failed,
-            "RVA 0x%" PRIx32 " found but data is incomplete", Addr);
+        return make_error<SectionStrippedError>();
       }
       uint32_t Offset = Addr - SectionStart;
       Res = reinterpret_cast<uintptr_t>(base()) + Section->PointerToRawData +

Yeah, that seems fine as a workaround. https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#section-table-section-headers says:

If this (VirtualSize) value is greater than SizeOfRawData, the section is zero-padded.

This means the RVA should technically be pointing to a block of zeros. If we pretend the section does not exist at all, it is probably still good enough, right?

But I wonder if the code should always just use the .reloc section and ignore the Base Relocation Table RVA if it points to somewhere odd?

Yeah, that seems fine as a workaround. https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#section-table-section-headers says:

If this (VirtualSize) value is greater than SizeOfRawData, the section is zero-padded.

This means the RVA should technically be pointing to a block of zeros. If we pretend the section does not exist at all, it is probably still good enough, right?

Yep, that's probably good enough. For some tables, I guess it could be remotely useful to pretend it exists and allocate a temporary zero block to point into. (Previously you would have gotten random data from the neighbouring sections instead.) But that sounds like something to consider if that actually turns out to be needed...

But I wonder if the code should always just use the .reloc section and ignore the Base Relocation Table RVA if it points to somewhere odd?

No, I think it's safer to use the data directories - that'd be what windows itself uses. After linking, the section names are generally mostly cosmetic. Some things stay in their own sections up until linking but get merged into e.g. the .rdata section when linked. (For dwarf unwind info in PE/COFF we do have a hack in libunwind though - libunwind introspects the modules and locates the section named .eh_frame at runtime. Due to how PE/COFF string tables and all that works, the section name is truncated to .eh_fram in practice.)

I guess you're right. MSVC's dumpbin /relocations helper.exe listed nothing at all.

The base relocation table usually resides in the .reloc section, and as the exact size (0xB68) matches, it looks like it just has got the wrong RVA in the header. With the new, more thorough checks, libObject notices that the RVA points into the .ndata section (within 0x40000 - 0x5C000) but there's only raw data covering 0x40000 - 0x40200, and 0x43000 is outside of that. Previously we'd just set up a pointer into bogus/out of range data in these cases, now we error out.

From a cursory look, it seems to come from NSIS itself not handing BaseRelocationTableRVA. Its prebuild binaries come with stubs that actually have it set to 0 and thus don't have a .reloc section. The stubs we're using in the case of that helper.exe were compiled with mingw-clang and linked with mingw-lld, and do contain a .reloc section. The helper.exe binary ends up having the same BaseRelocationTableRVA as the stub, but NSIS moved the .reloc section, so it's not correct anymore.

So... maybe you shouldn't actually relax things as per D127345, because it seems like a genuine bug in NSIS that should be fixed rather than worked around.

So, looking a little bit deeper, the resource editor in NSIS doesn't handle relocations, and the .reloc section is right after the .rsrc section that the resource editor will grow... So they actually have a build rule to pass /FIXED to link.exe when building their stubs with MSVC. I suppose mingw-binutils defaults to non-relocatable .exes (because the problem doesn't appear when building with mingw-gcc/mingw-binutils), and we only end up with relocatable .exes because we're building with mingw-clang/lld and lld defaults to relocatable .exes.
I'd call this a build system bug.

The base relocation table usually resides in the .reloc section, and as the exact size (0xB68) matches, it looks like it just has got the wrong RVA in the header. With the new, more thorough checks, libObject notices that the RVA points into the .ndata section (within 0x40000 - 0x5C000) but there's only raw data covering 0x40000 - 0x40200, and 0x43000 is outside of that. Previously we'd just set up a pointer into bogus/out of range data in these cases, now we error out.

From a cursory look, it seems to come from NSIS itself not handing BaseRelocationTableRVA. Its prebuild binaries come with stubs that actually have it set to 0 and thus don't have a .reloc section. The stubs we're using in the case of that helper.exe were compiled with mingw-clang and linked with mingw-lld, and do contain a .reloc section. The helper.exe binary ends up having the same BaseRelocationTableRVA as the stub, but NSIS moved the .reloc section, so it's not correct anymore.

Ouch, that does indeed sound like something that should be fixed.

(I don’t remember offhand, but it might be possible to omit the reloc section for an executable by linking with ‘-Wl,-Xlink=-fixed` and maybe -Wl,--disable-dynamicbase. Not sure if it’s actually left out, or if it is just not used.)

Fixing NSIS definitely sounds like something that should be done here.

So... maybe you shouldn't actually relax things as per D127345, because it seems like a genuine bug in NSIS that should be fixed rather than worked around.

Well, while the NSIS bug is real, and it’s great that the restrictive object library caught it, I think we still should go with that patch too anyway - the PE executable isn’t entirely broken structurally, the table just lies in zero-initialized, allocated memory. And that lets the llvm tools operate on files that are potentially valid, instead of flat out rejecting them.

@rnk - what’s your take on this?

Passing -Wl,/fixed as a linker flag when building the NSIS stubs fixes the issue, FWIW.

So, looking a little bit deeper, the resource editor in NSIS doesn't handle relocations, and the .reloc section is right after the .rsrc section that the resource editor will grow... So they actually have a build rule to pass /FIXED to link.exe when building their stubs with MSVC. I suppose mingw-binutils defaults to non-relocatable .exes (because the problem doesn't appear when building with mingw-gcc/mingw-binutils), and we only end up with relocatable .exes because we're building with mingw-clang/lld and lld defaults to relocatable .exes.

Actually, binutils 2.36 and newer also create a reloc section by default now, since this commit: https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=514b4e191d5f46de8e142fe216e677a35fa9c4bb;hp=f2e5245f4169c2a1849ba998872e245c1b303950

It can be disabled by passing -Wl,--disable-reloc-section though.

Passing -Wl,/fixed as a linker flag when building the NSIS stubs fixes the issue, FWIW.

It seems like we should implement the --disable-reloc-section in the LLD mingw linker layer then, and pass it onto the lld-link interface as /fixed.

(I'm somewhat amazed that you can pass options directly via the mingw linker layer to the lld-link layer like that. The mingw options interface considers /fixed as an input path name and just passes it on to lld-link, which then considers it an option. The kosher way of doing it is -Wl,-Xlink=/fixed (or -Wl,-Xlink=-fixed if you prefer that form), but that's a bit unwieldy and clumsy. And for any case where one really wants to pass such options outside of just one off testing, we should add a proper mingw linker option for it.)