This is an archive of the discontinued LLVM Phabricator instance.

[ELF] --oformat=binary: use LMA to compute file offsets
ClosedPublic

Authored by MaskRay on Aug 1 2020, 10:42 PM.

Details

Summary

--oformat=binary is rare (used in a few places in FreeBSD, see stand/i386/mbr/Makefile LDFLAGS_BIN)
The result should be identical to a normal output transformed by objcopy -O binary.

The current implementation ignores addresses and lays out sections by
respecting output section alignments. It can fail when an output section
address is specified, e.g. .rodata ALIGN(16) : (PR33651).

Fix PR33651 by respecting LMA. The code is similar to
tools/llvm-objcop/ELF/Object.cpp BinaryWriter::finalize after D71035 and D79229.
Unforunately for an output section without PT_LOAD, we assume its LMA is equal
to its VMA. So the result is still incorrect when an output section LMA
(AT(...)) is specified

Also drop alignTo(off, config->wordsize). GNU ld does not round up the file size.

Diff Detail

Event Timeline

MaskRay created this revision.Aug 1 2020, 10:42 PM
MaskRay requested review of this revision.Aug 1 2020, 10:42 PM
grimar added a comment.Aug 3 2020, 1:26 AM

There is a GNU ld bug about --oformat=binary "Linking with --oformat=binary discards non-static global variables" (https://sourceware.org/bugzilla/show_bug.cgi?id=12524).
The last 2 comments are saying:

  1. "ld --oformat is a misfeature. Don't use it. As you have found, it doesn't work except for the simplest of object files, and only then on a very limited set of targets. You are much better off linking to your usual target, then converting the file with objcopy.",
  2. "I only added support for --oformat=binary to gold because the Linux kernel build uses it.".

So I am not sure that we want to add more complication for the existent logic of --oformat binary implemented in LLD?

MaskRay added a comment.EditedAug 3 2020, 9:02 AM

There is a GNU ld bug about --oformat=binary "Linking with --oformat=binary discards non-static global variables" (https://sourceware.org/bugzilla/show_bug.cgi?id=12524).
The last 2 comments are saying:

  1. "ld --oformat is a misfeature. Don't use it. As you have found, it doesn't work except for the simplest of object files, and only then on a very limited set of targets. You are much better off linking to your usual target, then converting the file with objcopy.",
  2. "I only added support for --oformat=binary to gold because the Linux kernel build uses it.".

So I am not sure that we want to add more complication for the existent logic of --oformat binary implemented in LLD?

I think the complexity is acceptable: 7 lines of code + 2 lines of comments + 1 blank line. For a misfeature, we either not implement it, approximate it, or fully implement it, but should not leave it in a known broken state if a reasonable fix is available. For this case I consider we have a perfect emulation. We cannot simply drop the feature because there are still use cases (in latest Linux kernel it appears to only exist in documentation; in FreeBSD it is still used).

I also question Alan's conclusion that "--oformat=binary is a misfeature". We can say it is redundant, because "objcopy -O binary" performs the task, but I can't find evidence that this feature was misdesigned. It was possible that the feature cannot be reasonable implemented on other binary formats, so Alan called it a misfeature, but for ELF --oformat=binary should be good.

FWIW, I don't really agree that --oformat=binary is necessarily a misfeature in a linker - it saves the extra objcopy step to produce binary output, and the logic in the linker should be much simpler, since the linker already knows everything about the program layout, without having to come up with complicated heuristics to figure out what to do in some edge cases (e.g. what to do with empty sections on segment edges). That being said, I don't have any use-case in my area, so I don't really care whether it is implemented or not.

lld/ELF/Writer.cpp
2558

laided -> laid

psmith added a comment.Aug 4 2020, 1:09 AM

I agree with MaskRay that if we do support the feature then we should make sure it works or document its limitations, even if our limitations are "Implemented to the degree that it supports the FreeBSD Kernel, we recommend producing an ELF file and using llvm-objcopy --output-target=binary."

FWIW the modifications here look like an improvement and are localised to a small error so no objections from me for the improvement. I'm surprised to see we don't generate Program Headers for oformat binary. I would expect us to generate but not allocate addresses or write out the ELF header or program headers to the file. I think this would be the equivalent of writing an ELF file then using objcopy on it.

MaskRay updated this revision to Diff 283019.Aug 4 2020, 1:49 PM
MaskRay marked an inline comment as done.

Fix a typo

grimar added a comment.Aug 5 2020, 2:07 AM

OK. I have no more objections about this then. Happy if others are happy.

lld/ELF/Writer.cpp
2564

Consider setting the offset field just once and deduplicating the
sec->type != SHT_NOBITS && (sec->flags & SHF_ALLOC) && sec->size > 0 condition:

E.g.

auto needsOffset = [](OutputSection &sec) {
  return sec.type != SHT_NOBITS && (sec.flags & SHF_ALLOC) && sec.size > 0;
};

uint64_t minAddr = UINT64_MAX;
for (OutputSection *sec : outputSections)
  if (needsOffset(*sec))
    minAddr = std::min(minAddr, sec->getLMA());

fileSize = 0;
for (OutputSection *sec : outputSections)
  if (needsOffset(*sec)) {
    sec->offset = sec->getLMA() - minAddr;
    fileSize = std::max(fileSize, sec->offset + sec->size);
  }

Or

uint64_t minAddr = UINT64_MAX;
std::vector<OutputSection *> sections;
for (OutputSection *sec : outputSections) {
  if (sec.type = SHT_NOBITS && (sec.flags & SHF_ALLOC) && sec.size > 0) {
    minAddr = std::min(minAddr, sec->getLMA());
    sections.push_back(sec);
  }
}

fileSize = 0;
for (OutputSection *sec : sections) {
  sec->offset = sec->getLMA() - minAddr;
  fileSize = std::max(fileSize, sec->offset + sec->size);
}
psmith added a comment.Aug 5 2020, 8:08 AM

No objections from me either. I guess LGTM if George's last comment can be resolved.

MaskRay updated this revision to Diff 283267.Aug 5 2020, 9:08 AM
MaskRay marked an inline comment as done.

Add lambda needsOffset

This revision was not accepted when it landed; it landed in state Needs Review.Aug 5 2020, 9:10 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.