This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy][COFF] Fix section name encoding
ClosedPublic

Authored by npmiller on Feb 1 2022, 3:55 AM.

Details

Summary

This patch is bringing the section name encoding of llvm-objcopy to the same level as what's done in llvm/lib/MC/WinCOFFObjectWriter.cpp, the base64 encoding code is taken from there.

The section name encoding for llvm-objcopy had two main issues, the first is that the size used for the snprintf in the original code is incorrect because snprintf adds a null byte, so this code was only able to encode offsets of 6 digits, /, \0 and 6 digits of the offset, rather than the 7 digits it should support.

And the second part is that it didn't support the base64 encoding for offsets larger than 7 digits.

This issue specifically showed up when using the clang-offload-bundler with a binary containing a lot of symbols/sections, since it uses llvm-objcopy to add the sections containing the offload code.

This patch may need some modifications and/or follow up.

It would be good to de-duplicate some of this code, at least the base64 encoding code, there's a base64 encoding function in support however it encodes bytes, rather than re-interpret a number into base64, but I'm not sure where it would go best.

And I'm also a little unsure on how to approach testing for this, yaml2obj, however it is also missing base64 encoding which should definitely be added, but even then the yaml file ends up very large to be able to test all the different encodings. The MC counterpart of this uses llvm-mc and assembler macros to generate the very large string tables, would it be okay to do the same in llvm-objcopy testing?

Diff Detail

Event Timeline

npmiller created this revision.Feb 1 2022, 3:55 AM
npmiller requested review of this revision.Feb 1 2022, 3:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2022, 3:55 AM
mstorsjo added inline comments.
llvm/tools/llvm-objcopy/COFF/Writer.cpp
173 ↗(On Diff #404064)

Wouldn't this end up overwriting the first byte following S.Header.Name? (It might be benign to do that in practice here, but nevertheless, it would be better to not write outside of the intended struct fields.)

npmiller added inline comments.Feb 1 2022, 5:09 AM
llvm/tools/llvm-objcopy/COFF/Writer.cpp
173 ↗(On Diff #404064)

No because this is no writing to the struct field but to an intermediary string str. The correct amount of data is memcpy'd into the struct field from str a bit further down.

mstorsjo added inline comments.Feb 1 2022, 5:28 AM
llvm/tools/llvm-objcopy/COFF/Writer.cpp
173 ↗(On Diff #404064)

Oh, right, I see! Sorry for the confusion.

And I'm also a little unsure on how to approach testing for this, yaml2obj, however it is also missing base64 encoding which should definitely be added, but even then the yaml file ends up very large to be able to test all the different encodings. The MC counterpart of this uses llvm-mc and assembler macros to generate the very large string tables, would it be okay to do the same in llvm-objcopy testing?

If it's impractical to use yaml2obj, using llvm-mc is fine. Just make sure the appropriate REQUIRES: directive is included for the target.

llvm/tools/llvm-objcopy/COFF/Writer.cpp
159 ↗(On Diff #404064)

Please make sure comments follow LLVM coding standards with upper-case first letter, and period at the end.

163 ↗(On Diff #404064)

LLVM coding style uses UpperCamelCase for variable names. Please fix your variable names. Also, you can zero-initialise, without the need for memset.

npmiller updated this revision to Diff 405003.Feb 1 2022, 10:49 AM

This updated patch adds a test using llvm-mc to check the section name encoding, it is inspired from the MC test for this feature llvm/test/MC/COFF/section-name-encoding.s, but tweaked for llvm-objcopy.

It also fixes up the comment formatting, variable naming and zero-initialization as suggested.

jhenderson added inline comments.Feb 2 2022, 12:50 AM
llvm/test/tools/llvm-objcopy/COFF/section-name-encoding.s
2–3

llvm-objcopy generally uses # and ## in lit test files, the former for test directives, like RUN lines, and FileCheck patterns, the latter for pure comments.

(If llvm-mc doesn't like # as a comment marker, then I'd use //// still for true comments, to distinguish them from test directives).

llvm/tools/llvm-objcopy/COFF/Writer.cpp
126 ↗(On Diff #405003)

In terms of avoiding duplicated code, I think you should, in a separate prerequisite patch, move the duplicate code from WinCOFFObjectWriter.cpp to somewhere like the BinaryFormat or Object library. It's not immediately clear to me which is more appropriate though.

172 ↗(On Diff #405003)

I'd use static_cast rather than a C-style cast, whilst you're changing this line.

npmiller updated this revision to Diff 405276.Feb 2 2022, 7:50 AM

Comment prefixes in the test have been updated as suggested.

I've also filed a separate parent revision to move the section name encoding code from llvm/lib/MC/WinCOFFObjectWriter.cpp into BinaryFormat, and updated this patch to use that instead.

See: https://reviews.llvm.org/D118793

@jhenderson Can you have another look at this? I ran into the issue that this patch is fixing (discussion in https://github.com/msys2/MINGW-packages/pull/10555#issuecomment-1046014913), and if at all possible, I'd like to have the fix for this issue in the 14.0.x branch (as we're only just past rc1, there should still be time for valuable bug fixes IMO).

What do you think about that? Alternatively, if this feels too big to cherrypick (IMO it isn't), I could fix the simplest part of the issue with a smaller patch like this:

diff --git a/llvm/lib/ObjCopy/COFF/Writer.cpp b/llvm/lib/ObjCopy/COFF/Writer.cpp
index cbd0e4261238..c7928b009b5d 100644
--- a/llvm/lib/ObjCopy/COFF/Writer.cpp
+++ b/llvm/lib/ObjCopy/COFF/Writer.cpp
@@ -130,8 +130,10 @@ size_t COFFWriter::finalizeStringTable() {
   for (auto &S : Obj.getMutableSections()) {
     memset(S.Header.Name, 0, sizeof(S.Header.Name));
     if (S.Name.size() > COFF::NameSize) {
-      snprintf(S.Header.Name, sizeof(S.Header.Name), "/%d",
+      char Buf[COFF::NameSize + 1]{};
+      snprintf(Buf, sizeof(Buf), "/%d",
                (int)StrTabBuilder.getOffset(S.Name));
+      memcpy(S.Header.Name, Buf, COFF::NameSize);
     } else {
       memcpy(S.Header.Name, S.Name.data(), S.Name.size());
     }

(This would fix the bug I ran into, where the bug appears with string tables over 10^6 bytes. This minimal patch makes things work with string tables up to 10^7 bytes - while the patch under review here handles it even further than that.)

jhenderson accepted this revision.Feb 21 2022, 1:21 AM

LGTM. Sorry, not sure how this fell off my radar. I'm happy for it to be picked too. @mstorsjo, you are good with this patch, I take it?

This revision is now accepted and ready to land.Feb 21 2022, 1:21 AM

LGTM. Sorry, not sure how this fell off my radar. I'm happy for it to be picked too. @mstorsjo, you are good with this patch, I take it?

Yep, it looks good to me too. I'll go ahead and land D118793 and this one then, and then file the backport ticket. (Re backporting, the code has been moved between llvm/tools/llvm-objcopy and llvm/lib/ObjCopy between the releases/14.x and main branches, but git cherry-pick seems to handle it fine regardless - let's hope it works in the backport testing bot too.)

This revision was landed with ongoing or failed builds.Feb 21 2022, 3:51 AM
This revision was automatically updated to reflect the committed changes.

This is failing: https://lab.llvm.org/buildbot/#/builders/121/builds/16742

FAILED: lib/libLLVMObjCopy.so.15git 
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/./lib  -Wl,--gc-sections -shared -Wl,-soname,libLLVMObjCopy.so.15git -o lib/libLLVMObjCopy.so.15git lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/Archive.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/ObjCopy.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/ConfigManager.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/COFF/COFFObjcopy.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/COFF/Object.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/COFF/Reader.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/COFF/Writer.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/ELF/ELFObjcopy.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/ELF/Object.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/MachO/MachOObjcopy.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/MachO/MachOReader.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/MachO/MachOWriter.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/MachO/MachOLayoutBuilder.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/MachO/Object.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/wasm/Object.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/wasm/Reader.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/wasm/Writer.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/wasm/WasmObjcopy.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libLLVMObject.so.15git  lib/libLLVMMC.so.15git  lib/libLLVMSupport.so.15git  -Wl,-rpath-link,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib && :
lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/COFF/Writer.cpp.o: In function `llvm::objcopy::coff::COFFWriter::finalizeStringTable() [clone .localalias.6]':
Writer.cpp:(.text._ZN4llvm7objcopy4coff10COFFWriter19finalizeStringTableEv+0x1ec): undefined reference to `llvm::COFF::encodeSectionName(char*, unsigned long)'

This is failing: https://lab.llvm.org/buildbot/#/builders/121/builds/16742

FAILED: lib/libLLVMObjCopy.so.15git 
: && /usr/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wno-maybe-uninitialized -Wno-class-memaccess -Wno-noexcept-type -Wdelete-non-virtual-dtor -Wno-comment -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/./lib  -Wl,--gc-sections -shared -Wl,-soname,libLLVMObjCopy.so.15git -o lib/libLLVMObjCopy.so.15git lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/Archive.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/ObjCopy.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/ConfigManager.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/COFF/COFFObjcopy.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/COFF/Object.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/COFF/Reader.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/COFF/Writer.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/ELF/ELFObjcopy.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/ELF/Object.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/MachO/MachOObjcopy.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/MachO/MachOReader.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/MachO/MachOWriter.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/MachO/MachOLayoutBuilder.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/MachO/Object.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/wasm/Object.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/wasm/Reader.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/wasm/Writer.cpp.o lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/wasm/WasmObjcopy.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib"  lib/libLLVMObject.so.15git  lib/libLLVMMC.so.15git  lib/libLLVMSupport.so.15git  -Wl,-rpath-link,/home/buildbots/ppc64le-clang-multistage-test/clang-ppc64le-multistage/stage1/lib && :
lib/ObjCopy/CMakeFiles/LLVMObjCopy.dir/COFF/Writer.cpp.o: In function `llvm::objcopy::coff::COFFWriter::finalizeStringTable() [clone .localalias.6]':
Writer.cpp:(.text._ZN4llvm7objcopy4coff10COFFWriter19finalizeStringTableEv+0x1ec): undefined reference to `llvm::COFF::encodeSectionName(char*, unsigned long)'

Sorry about that - if you happen to have such a build available, does this diff help?

diff --git a/llvm/lib/ObjCopy/CMakeLists.txt b/llvm/lib/ObjCopy/CMakeLists.txt
index 1e516394c74a..ec1160e331c9 100644
--- a/llvm/lib/ObjCopy/CMakeLists.txt
+++ b/llvm/lib/ObjCopy/CMakeLists.txt
@@ -64,6 +64,7 @@ add_llvm_component_library(LLVMObjCopy
   intrinsics_gen

   LINK_COMPONENTS
+  BinaryFormat
   Object
   Support
   MC
svenvh added a subscriber: svenvh.Feb 21 2022, 6:03 AM

Sorry about that - if you happen to have such a build available, does this diff help?

diff --git a/llvm/lib/ObjCopy/CMakeLists.txt b/llvm/lib/ObjCopy/CMakeLists.txt
index 1e516394c74a..ec1160e331c9 100644
--- a/llvm/lib/ObjCopy/CMakeLists.txt
+++ b/llvm/lib/ObjCopy/CMakeLists.txt
@@ -64,6 +64,7 @@ add_llvm_component_library(LLVMObjCopy
   intrinsics_gen

   LINK_COMPONENTS
+  BinaryFormat
   Object
   Support
   MC

I have just committed this exact same patch, which did fix the shared libs build issue for me locally.

Sorry about that - if you happen to have such a build available, does this diff help?

diff --git a/llvm/lib/ObjCopy/CMakeLists.txt b/llvm/lib/ObjCopy/CMakeLists.txt
index 1e516394c74a..ec1160e331c9 100644
--- a/llvm/lib/ObjCopy/CMakeLists.txt
+++ b/llvm/lib/ObjCopy/CMakeLists.txt
@@ -64,6 +64,7 @@ add_llvm_component_library(LLVMObjCopy
   intrinsics_gen

   LINK_COMPONENTS
+  BinaryFormat
   Object
   Support
   MC

I have just committed this exact same patch, which did fix the shared libs build issue for me locally.

Thanks!