Page MenuHomePhabricator

[llvm-elfabi] Emit ELF header and string table section

Authored by haowei on May 9 2019, 3:54 PM.



This change serves to create the initial framework for outputting ELF files from llvm-elfabi.

Sorry this is so large but this, perhaps short of the code for outputting the string table, is about as small of a setup as I can get and still have something testable. You can see the full picture here: Had I tried to get to exactly this point more directly and not by splitting it probably wouldn't be quite as large or generically written but if possible I'd like to keep some of the generality here since I'll likely add it back soon.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Nov 12 2020, 1:52 AM

I think the wrong place was changed?


Same comment here re. default error handler function. You might want to change the TBEWriteError and other error bits similarly, in a separate patch.

This revision now requires changes to proceed.Nov 12 2020, 1:52 AM
jhenderson added inline comments.Nov 12 2020, 1:53 AM

Looking at the pre-merge checks this test is failing on Windows as-is.

grimar added inline comments.Nov 12 2020, 2:12 AM

It is only used for writing Elf_Shdr, Elf_Ehdr etc types, what I think should work fine on any hosts.
Is your concern that somebody might try to use write for regular types?


I believe this code is unreachable because readELFFile is only called for ELFs:

// First try to read as a binary (fails fast if not binary).
if (InputFileFormat.getNumOccurrences() == 0 ||
    InputFileFormat == FileFormat::ELF) {
  Expected<std::unique_ptr<ELFStub>> StubFromELF =
jhenderson added inline comments.Nov 12 2020, 2:28 AM

We're probably talking past each other, or I might be forgetting something, but the fields within Elf_Shdr, Elf_Ehdr etc will be in little/big endian order right depending on the host, right? This code here is essentially just doing a byte-wise copy, by my understanding.


InputFileFormat.getNumOccurrences() == 0 - doesn't that mean an unspecified input format will be attempted to be read as ELF, but there's no guarantee it actually is ELF...?

grimar added inline comments.Nov 12 2020, 2:41 AM

My understanding is the following:

E.g. Elf_Shdr consists of Elf_Word`/Elf_Addr/Elf_Off fields,
which are of support::detail::packed_endian_specific_integral type:

using Elf_Shdr_Base<ELFT>::sh_size;
template <endianness TargetEndianness>
struct Elf_Shdr_Base<ELFType<TargetEndianness, false>> {
  LLVM_ELF_IMPORT_TYPES(TargetEndianness, false)
  Elf_Word sh_name;      // Section name (index into string table)
  Elf_Word sh_type;      // Section type (SHT_*)
  Elf_Word sh_flags;     // Section flags (SHF_*)
  Elf_Addr sh_addr;      // Address where section is to be loaded
  Elf_Off sh_offset;     // File offset of section data, in bytes
  Elf_Word sh_size;      // Size of section, in bytes
  Elf_Word sh_link;      // Section type-specific header table index link
  Elf_Word sh_info;      // Section type-specific extra information
  Elf_Word sh_addralign; // Section address alignment
  Elf_Word sh_entsize;   // Size of records contained within the section

#define LLVM_ELF_IMPORT_TYPES_ELFT(ELFT)                                       \
  using Elf_Addr = typename ELFT::Addr;                                        \
  using Elf_Off = typename ELFT::Off;                                          \
  using Elf_Half = typename ELFT::Half;                                        \
  using Elf_Word = typename ELFT::Word;                                        \
  using Elf_Sword = typename ELFT::Sword;                                      \
  using Elf_Xword = typename ELFT::Xword;                                      \
  using Elf_Sxword = typename ELFT::Sxword;

  using Half = packed<uint16_t>;
  using Word = packed<uint32_t>;
  using Sword = packed<int32_t>;
  using Xword = packed<uint64_t>;
  using Sxword = packed<int64_t>;
  using Addr = packed<uint>;
  using Off = packed<uint>;
  template <typename Ty>
  using packed = support::detail::packed_endian_specific_integral<Ty, E, 1>;

This type respects endianess on assign:

struct packed_endian_specific_integral {
  void operator=(value_type newValue) {
    endian::write<value_type, endian, alignment>(
      (void*)Value.buffer, newValue);

/// Write a value to memory with a particular endianness.
template <typename value_type, std::size_t alignment>
inline void write(void *memory, value_type value, endianness endian) {
  value = byte_swap<value_type>(value, endian);
             memory, (detail::PickAlignment<value_type, alignment>::value)),
         &value, sizeof(value_type));

template <typename value_type>
inline value_type byte_swap(value_type value, endianness endian) {
  if ((endian != native) && (endian != system_endianness()))
  return value;

Oh. I missed that there is || InputFileFormat == FileFormat::ELF, not && InputFileFormat == FileFormat::ELF.
This place looks a bit wierd to me..

haowei updated this revision to Diff 305052.Nov 13 2020, 12:38 AM
haowei marked 24 inline comments as done.
haowei updated this revision to Diff 305053.Nov 13 2020, 12:49 AM
haowei marked an inline comment as done.
haowei added inline comments.

I checked writeEhdr() in obj-copy but it is not flexible enough to be used here. I agree it would be great to have a common function to write Ehdr into memory so it can be reused by these tools. But I feel it is out of scope of this change. I can draft a change to address this problem if you feel it is necessary to do so.


StrTab is renamed to DynStr.


The section alignment is set in this builder function so I guess the case you discussed would not happen. I have no preference in using 0 or 1 here. Using 1 seems to make code simpler but less robust.


I prefer not to do that. Using write function to determine the offset would look very messy on D89432.


Using loop would make more sense but it seems to be infeasible in this case. The "Section.Content.write" function is responsible for writing data to the MemoryBuffer but this function is not a virtual function derived by different classes (there are SymTab and Dynamic sections builder in D89432) so it would require some data structure changes to make it happen. Or if there is anything like "instanceof" that I can check and cast the types safely at run time, that would also work.

As shown in D89432, we only need to write 4 sections in a stub elf file. So it is not very cost efficient to invent some new classes just to make the loop happen. If you have better solutions please let me know.


I changed the order in the new diff


Test does not fail when writing elfheader in be or le. If you have better solution to make it more robust, please let me know.


I changed it to llvm_unreachable.


I changed the wrong place, sorry about that.


I have a windows workstation but it does not have development environment set up for llvm. I will see if I can set it up tomorrow. For now, let me test it on bots first.

1 ↗(On Diff #304652)

merged in latest diff.

1 ↗(On Diff #304652)

merged in latest diff.

1 ↗(On Diff #304652)

We intended to merge elfabi tool into elf-ifs tool once the we feel ELFObjHandler is stable. So I prefer not to add additional flag(feature) to llvm-elfabi at this point.


writeBinaryStub also use SoName in TargetStub variable. This change moves this logic outside of the EmitTBE branch so the assignment at L140 can be written once but used by both L147 and L159.


I added a fatalError(Error err) function.

jhenderson added inline comments.Nov 16 2020, 2:07 AM

This comment still has most of the typos/grammar problems I fixed. Please review my suggested edit and fix them.


FWIW, LLVM has a dyn_cast still mechanism that allows some degree of runtime type checking, but it only works within a class heirarchy (see any class that has the classof function, and how that works).

It looks like it would be straightforward to unify the section types somehow, with a virtual write function, but without actually trying it, I'm not certain it would be.


It's fine. I forgot that the Elf_* types had endianness magic built into them.

1 ↗(On Diff #305053)

Add descriptive comment at top of test. This should be in place for all tests.

I'd also consider renaming it output-target-error.test, which is a little more concise.

14 ↗(On Diff #305053)

Nit: no new line at EOF.

1 ↗(On Diff #305053)

Use '##' for comments, to make them stand out from other lit and FileCheck commands. This applies to all new comments you've added in this and other tests.

4–7 ↗(On Diff #305053)

These can all be combined into a single llvm-readobj execution, without the need for separate prefixes.

1 ↗(On Diff #305053)

It might make more sense to merge the ELF32/ELF64 and LE/BE tests all into a single test. Your test would look something like:

# RUN: llvm-elfabi %s --output-target=elf32-little %t32-little
# RUN: llvm-readobj -h -S --string-dump .shstrtab --string-dump .dynstr %t32-little | FileCheck %s -DCLASS="32-bit (0x1)" <other defines here for the other bits that are slightly different>
# RUN: llvm-elfabi %s --output-target=elf32-big %t32-b ig
# RUN: llvm-readobj -h -S --string-dump .shstrtab --string-dump .dynstr %t32-big | FileCheck %s -DCLASS="32-bit (0x1)" <other defines here for the other bits that are slightly different>
<same for 64-bit variants>
haowei updated this revision to Diff 305663.Nov 16 2020, 11:19 PM
haowei marked 3 inline comments as done.
haowei updated this revision to Diff 305673.Nov 17 2020, 12:19 AM
haowei marked 2 inline comments as done.
haowei marked 3 inline comments as done.Nov 17 2020, 12:22 AM
haowei added inline comments.
1 ↗(On Diff #305053)

I combined 4 test files into 1.

jhenderson added inline comments.Nov 17 2020, 12:59 AM

Did you get anywhere with resolving this? The test is still failing on the pre-merge bot, as noted near the top of this page.

haowei updated this revision to Diff 305965.Nov 17 2020, 8:35 PM
haowei marked an inline comment as done.
haowei added inline comments.Nov 17 2020, 8:39 PM

I finally get my windows workstation to build llvm successfully. So the reason this test failed on Windows is that, the chmod seems to be a no-op on windows. It didn't change the permission of the %t.TestDir at all. The failure at not llvm-elfabi... did not happen so the test failed.

I don't have a good workaround for this so I disabled the test under Windows for now. Do you have any good suggestion for testing this failure under Windows?

grimar added inline comments.Nov 18 2020, 12:00 AM

I guess you should be able to change the permission for the file, if not for directory.

Our LLD test (ELF\lto\thinlto-cant-write-index.ll) has the following lines:

; Ensure lld generates error if unable to write to index files
; RUN: rm -f %t2.o.thinlto.bc
; RUN: touch %t2.o.thinlto.bc
; RUN: chmod u-w %t2.o.thinlto.bc
; RUN: not ld.lld --plugin-opt=thinlto-index-only -shared %t1.o %t2.o -o /dev/null 2>&1 | FileCheck %s
; RUN: chmod u+w %t2.o.thinlto.bc
; CHECK: cannot open {{.*}}2.o.thinlto.bc: {{P|p}}ermission denied

This test is not disabled for windows amd I believe that the last line has {{P|p}} because the message
produced is different on windows/*nix.

There are other LLD tests that are using chmod on files and are not disabled on windows.

haowei added inline comments.Nov 18 2020, 12:08 AM

If I only change the file permission, it won't work under linux. Apparently the file write function used in elfabi will remove the file first and then recreate it. If the directory contains the file has wx permission, the file can be removed regardless of the file's own permission bits. That is why in this test, I changed the permission of the directory, not the file.

grimar added inline comments.Nov 18 2020, 12:28 AM

I see. Perhaps it is OK to keep it unsupported at least for now.
(may be someone else have ideas, I don't have any currently).

I am not sure how your way of doing it works well though.
Usually in tests the "system-windows" is commonly used:

UNSUPPORTED: system-windows

I also observe "windows-gnu", "windows-msvc" and just "windows" versions but,
don't see "-windows-" anywhere.

jhenderson added inline comments.Nov 18 2020, 12:37 AM

If you can get the test to work in one case for Linux and another case for Windows, how about you split this test into fail-file-write.test (for non-Windows) and fail-file-write-windows.test (for Windows) with the appropriate setup and REQUIERS/UNSUPPORTED checks according to the test?

haowei updated this revision to Diff 306244.Nov 18 2020, 3:49 PM
haowei updated this revision to Diff 306258.Nov 18 2020, 4:29 PM
haowei added inline comments.Nov 18 2020, 4:34 PM

I added a windows test. However, it does not trigger the error from L530 in ELFObjHandler.cpp . Instead, it triggers the error from L539 in the same file. Maybe FileOutputBuffer::create is implemented differently on 2 platforms.

MaskRay added inline comments.Nov 18 2020, 4:48 PM

Use static for these functions

The coding standard also says non-exported class declarations should be written in anonymous namespaces

haowei updated this revision to Diff 306292.Nov 18 2020, 8:31 PM
haowei added inline comments.

I added static.

This is not a class declaration I guess I don't need to put it into anonymous namespace.

jhenderson added inline comments.Nov 19 2020, 12:45 AM

Windows file permissions are tricky at best. The important thing for these tests is that the code in llvm-elfabi that handles errors from FileOutputBuffer::create is correct. It doesn't really matter what the error actually is here.


Ping this?


This is not a class declaration I guess I don't need to put it into anonymous namespace.

@MaskRay might be referring to the new classes in ElfObjHandler.cpp like OutputSection etc

haowei updated this revision to Diff 306527.Nov 19 2020, 1:48 PM
haowei marked 2 inline comments as done.
haowei added inline comments.Nov 19 2020, 1:55 PM

Sorry I missed that. Phabricator gray out the comments made me though I already fixed it.


I put those classes into anonymous namespace. Does it look good?

MaskRay added inline comments.Nov 19 2020, 2:04 PM

The coding standard uses // end anonymous namespace (no period)
Some code uses // namespace because sometimes clang-format inserts that.
Please don't invent a new style.


Please make all new functions static if not used outside the file.

haowei updated this revision to Diff 306536.Nov 19 2020, 2:25 PM
haowei marked an inline comment as done.
haowei marked an inline comment as done.Nov 19 2020, 2:27 PM
haowei added inline comments.

All new functions that are not used outside of the translation unit was added with static modifier, please let me know if you found anything I missed.

jhenderson accepted this revision.Nov 20 2020, 12:14 AM

Looks good from my point of view, but please wait for @MaskRay to confirm he is happy.

This revision is now accepted and ready to land.Nov 20 2020, 12:14 AM
MaskRay accepted this revision.Nov 20 2020, 12:22 AM
This revision was landed with ongoing or failed builds.Nov 23 2020, 11:34 AM
This revision was automatically updated to reflect the committed changes.
haowei marked an inline comment as done.