This is an archive of the discontinued LLVM Phabricator instance.

llvm-objcopy: Implement --extract-partition and --extract-main-partition.
ClosedPublic

Authored by pcc on May 23 2019, 8:07 PM.

Details

Summary

This implements the functionality described in Partitions.rst in D60242
(eventually https://lld.llvm.org/Partitions.html). It works as follows:

  • Reads the section headers using the ELF header at file offset 0;
  • If extracting a loadable partition:
    • Finds the section containing the required partition ELF header by looking it up in the section table;
    • Reads the ELF and program headers from the section.
  • If extracting the main partition:
    • Reads the ELF and program headers from file offset 0.
  • Filters the section table according to which sections are in the program headers that it read:
    • If ParentSegment != nullptr or section is not SHF_ALLOC, then it goes in.
    • Sections containing partition ELF headers or program headers are excluded as there are no headers for these in ordinary ELF files.

Diff Detail

Repository
rL LLVM

Event Timeline

pcc created this revision.May 23 2019, 8:07 PM
Herald added a project: Restricted Project. · View Herald Transcript
MaskRay added inline comments.May 23 2019, 8:11 PM
llvm/tools/llvm-objcopy/ELF/Object.cpp
961 ↗(On Diff #201118)

const SectionBase &. Recently there were changes to spell auto types explicitly.

llvm/tools/llvm-objcopy/ObjcopyOpts.td
152 ↗(On Diff #201118)

Flag<["--"]. In llvm-objcopy land, we no longer use -long-option...

MaskRay removed a subscriber: MaskRay.
pcc updated this revision to Diff 201120.May 23 2019, 8:44 PM
pcc marked 2 inline comments as done.
  • Address review comments
MaskRay added inline comments.May 28 2019, 11:33 PM
llvm/test/tools/llvm-objcopy/ELF/Inputs/partitions.elf.test
5 ↗(On Diff #201120)

ld.lld can be used in a llvm-objcopy test. This may have to be moved to lld/test/ELF/

pcc marked an inline comment as done.May 28 2019, 11:54 PM
pcc added inline comments.
llvm/test/tools/llvm-objcopy/ELF/Inputs/partitions.elf.test
5 ↗(On Diff #201120)

(assuming you meant *can't* be used)

Yes, that's correct. That's why I'm checking in a binary (partitions.elf) to be used during this test. The purpose of this file is to document how I generated partitions.elf.

A few questions

  1. This direction means that we can only ever convert from a multi partition binary to a single partition binary. It seems like we should be able to remove each partition individually. The naming scheme I would imagine would be "--remove-partition=<name>" or "--remove-main-partition" and for the equivalent of this feature "--only-keep-partition=<name" and "--only-keep-main-partition". Are you sure we want to limit things to work like this?
  2. Can you explain the details a bit more? Like it appears that we have a section to point to partition headers but I'd expect this to be available at dynamic link time as well which should mean that there should be a program header based solution to this as well. I'm also trying to understand exactly what the resulting file format looks like and what effect other operations would have on the net result. Lots of things are not clear to me from the documentation posted.
llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
734 ↗(On Diff #201120)

Adding the CopyConfig here is antithetical to the goal of making this code into a library at some point in the future. We should find a way to avoid this. This goes for propagation of Config elsewhere in the code as well. Propogation of CopyConfig is a hard no from me personally.

pcc marked an inline comment as done.May 29 2019, 2:57 PM

A few questions

  1. This direction means that we can only ever convert from a multi partition binary to a single partition binary. It seems like we should be able to remove each partition individually. The naming scheme I would imagine would be "--remove-partition=<name>" or "--remove-main-partition" and for the equivalent of this feature "--only-keep-partition=<name" and "--only-keep-main-partition". Are you sure we want to limit things to work like this?
  2. Can you explain the details a bit more? Like it appears that we have a section to point to partition headers but I'd expect this to be available at dynamic link time as well which should mean that there should be a program header based solution to this as well. I'm also trying to understand exactly what the resulting file format looks like and what effect other operations would have on the net result. Lots of things are not clear to me from the documentation posted.

The combined output file (i.e. the file produced by the linker) is not intended to be loaded directly. It is a description of the program with all partitions loaded. As written in the doc, it is essentially an ELF file with all of the partitions concatenated together. Further postprocessing (by llvm-objcopy) is required to produce a loadable file.

In a combined output file, each partition has its own set of data contained within segments (i.e. ELF header, program headers and SHF_ALLOC sections), while data not contained within segments (e.g. debug info, section headers) is shared between partitions. To extract a loadable partition, the partition's ELF headers are found by searching the section table for an SHT_PART_EHDR section named after the partition. From there, the program headers are found, which serve as a description of which SHF_ALLOC sections pertain to the partition. Thus the entire description of each partition can be found starting from the section headers.

This is not unlike other types of ELF files that describe (parts of) a program without being directly loadable. Two other examples are core dumps and --only-keep-debug files.

Because the combined output file is not intended to be loaded directly, there is no need for a description of the loadable partitions in any other partition's program headers. When llvm-objcopy extracts a partition, the information required to load that partition, i.e. its ELF and program headers, end up moved into place at the front of the file so that they can be read by the dynamic loader. You can think of what llvm-objcopy is doing when it extracts a partition as taking a slice of the combined output file from the start of the partition's ELF headers until the start of the next partition (or the end of the file), except that the data not contained within segments is preserved where it makes sense (i.e. debug info, section headers pertaining to preserved sections).

I'm having trouble seeing a use case for --remove-partition or --remove-main-partition, and it doesn't really fit the pattern of "make this file loadable" of the other flags, so I'd be hesitant to implement it.

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
734 ↗(On Diff #201120)

In an earlier version of the code [1] I was just passing down the name of the partition to extract. Would that work better (maybe with the name passed as a ctor argument rather than as a field)? Alternatively, we could create something like a ReaderConfig to be passed down here that for the moment would just contain the partition name.

[1] https://github.com/pcc/llvm-project/blob/c9939239d7e829e2c779138aaedea061f4a095d0/llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp#L597

Ok I spoke offline about some of my questions with Peter. Here are some useful facts:

All partitions share the same section header table. This means that if you don't use --extract-* parameters llvm-objcopy should still copy everything as needed. Not that this is intended to work but --strip-all (but not --strip-sections) would actually work as I would expect it to if we told it about these new section types (e.g. treat them like GNU warnings). Debug information is shared between them. I thought this was odd as they all had different address spaces. Peter did some tests and debuggers seem to handle when debug information references in DWARF to outside of the DSO just fine (Peter also pointed out that --gc-sections causes this sort of issue as well). I think --strip-sections would effectively extract the main partition.

There is no dynamic linker support planned for this so before you can run these sorts of DSOs or link them into things you have to extract them.

I'm happy with the general direction here. I don't think there are significant barriers to tweaking this sort of thing in the future or even to doing multi-partition extraction.

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
734 ↗(On Diff #201120)

Either ReaderConfig or just passing the partition name makes sense to me.

jakehehrlich added inline comments.May 31 2019, 2:58 PM
llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
734 ↗(On Diff #201120)

I think using Option<StringRef> or having two constructors, one with and one without would be best actully. That way you can distinguish between the case for not extracting main and extracting main.

pcc updated this revision to Diff 202808.Jun 3 2019, 3:23 PM
pcc marked an inline comment as done.
  • Create ReaderConfig
pcc marked an inline comment as done.Jun 3 2019, 3:24 PM
pcc added inline comments.
llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
734 ↗(On Diff #201120)

I decided to go with Optional<StringRef> in a field of ReaderConfig.

I'm happy with this as is but I'd like @jhenderson, @rupprecht, and @alexshap to take a look since this represents a fairly major feature addition even if the code is pretty small. I don't believe this imposes any extra assumptions or requirements on the rest of the code however.

I've got a few minor comments, but structurally this looks reasonable to me. I could potentially see other benefits to this approach in the future too.

llvm/test/tools/llvm-objcopy/ELF/partitions.test
1–3 ↗(On Diff #202808)

It might be helpful briefly explaining what "part1" and "part2" are for those not massively familiar with the intended switch behaviour. This is probably best done by explaining what partitions.elf contains.

9–12 ↗(On Diff #202808)

I'd prefer this part of the test to be after the MAIN: bits further down. I'd also recommend putting the first error message after the first check and the second after the second check. Keeping the first three RUNs at the top together makes sense though, so I'd do:

RUN: extract main
RUN: extract 1
RUN: extract 2

CHECKs for the first three

RUN: error 1
ERROR1:
RUN: error 2
ERROR2:
14 ↗(On Diff #202808)

We usually add extra spacing to make all the content line up like it does in the real output, i.e:

MAIN:      ELF Header:
MAIN-NEXT:   Magic:
llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
516 ↗(On Diff #202808)

It's quite tricky looking at the test to see if this part of the code is actually tested. Could you confirm that it is, i.e. that for both switches there is a non-alloc section not in a segment that is stripped and that there is a non-alloc section in a segment that is not stripped, please?

llvm/tools/llvm-objcopy/ELF/Object.h
885 ↗(On Diff #202808)

Do you anticipate there being multiple members in this struct in the future? If not, it seems a bit superfluous.

900 ↗(On Diff #202808)

I think this needs a comment explaining what HdrElfFile is. Either that or the name needs changing, because it's not immediately clear to me what it's supposed to be.

pcc updated this revision to Diff 203271.Jun 5 2019, 5:53 PM
pcc marked 8 inline comments as done.
  • Address review comments
pcc added inline comments.Jun 5 2019, 5:56 PM
llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
516 ↗(On Diff #202808)

Non-alloc sections not in segments are not stripped. In the test, .comment is such a section and the test shows that it is kept in all output files.

The linker will not normally place non-alloc sections in segments (I think this is only possible with the SECTIONS directive, and lld forbids SECTIONS together with multiple partitions), so it would be tricky to construct such a test case, and I think we can consider a file with such a section to be invalid (as a multi-partition file, not in general) anyway. This condition returns false (i.e. keep) once it's determined that the section is non-alloc, so we wouldn't be increasing code coverage by having a non-alloc section in a segment.

The important case that is being tested is that alloc sections not in segments (i.e. not in segments for the current partition) are being stripped. The test shows that e.g. the main partition's .text0 is being stripped from the loadable partitions, likewise .bss1 in part1 is being stripped from the main partition and part2, etc.

I wonder whether it would be better (in a separate change) to flip the predicate here and have it return true if we want to keep the section. That seems like it would make at least this clause a little less confusing, but I'm not sure about the others.

llvm/tools/llvm-objcopy/ELF/Object.h
885 ↗(On Diff #202808)

Okay, I changed this to a ctor arg.

900 ↗(On Diff #202808)

Is the comment on lines 1434-1436 not enough? I think that if I were trying to understand why we're passing HdrElfFile here, I'd look at what the caller of readProgramHeaders was doing, which would lead me to that comment. Maybe there's a better place for it though?

jhenderson added inline comments.Jun 6 2019, 3:26 AM
llvm/test/tools/llvm-objcopy/ELF/partitions.test
186 ↗(On Diff #203271)

part3 -> part2 (?) to avoid confusion with the other error message?

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
516 ↗(On Diff #202808)

Thanks.

I don't think we should flip the predicate - the function is effectively "shouldRemove()" and so true is the correct return for sections that should be removed (and false otherwise).

I think I misread (Sec.Flags & SHF_ALLOC) != 0 as being "true if section is not-alloc", when it is the inverse. I'm not sure flipping the predicate would make this any less likely to happen.

llvm/tools/llvm-objcopy/ELF/Object.h
900 ↗(On Diff #202808)

I tend to go with a different approach (I guess everybody is different): if I'm in readProgramHeaders, and I want to know what my input argument is, I go to the declaration, because there's usually documentation comments there, if not at the definition, so maybe the comment needs moving here.

Alternatively, maybe it would be slightly less confusing renaming it slightly to HeadersFile?

pcc updated this revision to Diff 203408.Jun 6 2019, 11:40 AM
pcc marked 3 inline comments as done.
  • Address review comments
llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
516 ↗(On Diff #202808)

Ack

llvm/tools/llvm-objcopy/ELF/Object.h
900 ↗(On Diff #202808)

Renaming to HeadersFile works for me; done.

jhenderson accepted this revision.Jun 7 2019, 2:22 AM

Thanks, LGTM, though please wait for confirmation from some of the others.

This revision is now accepted and ready to land.Jun 7 2019, 2:22 AM
MaskRay accepted this revision.Jun 7 2019, 2:34 AM
MaskRay added inline comments.
llvm/test/tools/llvm-objcopy/ELF/Inputs/partitions.elf.test
5 ↗(On Diff #201120)

Sorry I didn't notice this file is in the Inputs/ directory so these lines do not run.

ruiu accepted this revision.Jun 7 2019, 3:47 AM
ruiu added a subscriber: ruiu.

I think jhenderson's LGTM is enough but in case you need one more, LGTM.

This revision was automatically updated to reflect the committed changes.