This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Attribute an empty section to a segment ending at its address
AbandonedPublic

Authored by MaskRay on Feb 17 2020, 10:10 PM.

Details

Summary

sectionWithinSegment() treats an empty section as having a size of 1. Due to the rule, an empty
.tdata will not be attributed to an empty PT_TLS. (The empty p_align=64 PT_TLS is for Android
Bionic's TCB compatibility. See https://reviews.llvm.org/D62055#1507426 for more context.)

--only-keep-debug will not layout a segment with no section (layoutSegmentsForOnlyKeepDebug()), thus
p_offset of PT_TLS can go past the end of the file. The strange p_offset can trigger validation
errors for successive tools, e.g. llvm-objcopy errors when reading back the separate debug file
(readProgramHeaders()).

For an empty section on the boundary of two segments, attributing it to the second segment is better
(see empty-sections.test). The (treating sh_size=0 as sh_size=1) rule is arguably to make this
happen. We can actually make the intention more explicit in readProgramHeaders() and delete the
(treating sh_size=0 as sh_size=1) rule.

In the case of nested segments, e.g. PT_LOAD⊇PT_GNU_RELRO⊇PT_TLS. We want ParentSegment to refer
to PT_LOAD so that layoutSectionsForOnlyKeepDebug() will advance sh_offset to make sh_offset≡sh_addr
(mod p_align), this is naturally satisfied because PT_LOAD comes before PT_TLS (GNU ld/gold/lld).

(If PT_LOAD comes after PT_TLS, the rule will break, but this is really impractical.)

A side effect is that an empty non-SHF_ALLOC section following SHF_ALLOC sections will be considered
included by a segment. The change to strip-non-alloc.test and strip-all.test demonstrates this
behavior change, which should be benign.

Diff Detail

Event Timeline

MaskRay created this revision.Feb 17 2020, 10:10 PM
MaskRay edited the summary of this revision. (Show Details)Feb 17 2020, 10:30 PM
jhenderson requested changes to this revision.Feb 19 2020, 2:06 AM

I don't think this is correct. There are a large number of parts of the existing behaviour that will be changed by this patch, and I'm not sure that's desirable. First and foremost, an empty section at the boundary point between two segments will go from being considered part of the second segmetn for layout purposes, to being in the first segment, as it impacts what segment is considered the section's parent. Here's a bit of YAML I threw together to illustrate the behaviour difference:

--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_EXEC
  Machine: EM_X86_64
Sections:
  - Name: gap
    Type: Fill
    Size: 0xE00
  - Name: .foo
    Type: SHT_PROGBITS
    Flags: [SHF_ALLOC]
    Size: 0x100
    AddressAlign: 0x100
    Address: 0
  - Name: .empty
    Type: SHT_PROGBITS
    Flags: [SHF_ALLOC]
    Address: 0x1000
  - Name: .baz
    Type: SHT_PROGBITS
    Flags: [SHF_ALLOC]
    Size: 0x100
    AddressAlign: 0x1000
    Address: 0x1000
ProgramHeaders:
  - Type: PT_LOAD
    VAddr: 0
    PAddr: 0
    Align: 0x100
    Sections:
      - Section: .foo
  - Type: PT_LOAD
    VAddr: 0x1000
    PAddr: 0x1000
    Align: 0x1000
    Sections:
      - Section: .empty
      - Section: .baz

Both llvm-objcopy and GNU objcopy will physically move the first segment earlier in the file, to fill the gap, and all contained sections will move with it (before the patch, this is just .foo). This means that .foo ends up at offset 0x100, and .empty and .baz are left at 0x1000, since the second segment can't move due to alignment constraints. With the patch, however, .empty is considered part of the first segment, and so is moved to offset 0x200, leaving it a long way from the segment it should be in according to the address.

Whilst in general this probably doesn't matter (section headers have no impact on runtime usually, whilst an empty section has no contents so it doesn't matter where it gets moved to), I could imagine rarer cases where sections are used as some kind of marker or similar. Moving the section to a different segment might cause unintended consequences. Furthermore, the effective address of that section will change as it is now in a different segment. This might cause problems for tools that want to lookup things, based on the address.

I think a different solution needs to be found.

Additional related thought - for an empty segment that is on the boundary between two segments, which should it move with? Arguably, the segment is in neither and can be moved independently (that is what GNU objcopy does). I extended the above example to try this out, by having a segment at the same location as .empty, and the segment moved independently of the other two, but .empty remained in the final segment. Weirdly, if I made that segment a PT_TLS segment, the behaviour changed completely - the PT_TLS was moved even before the other two segments, and its Virtual Address changed to 0. This sounds a little mad, and I'm not sure we should follow it. Further experimentation showed yet another different behaviour in GNU objcopy however - if the .empty section is given the SHF_TLS flag, the PT_TLS segment does not move (i.e. it stays with the .empty section, at the start of the second segment.

My conclusion from all of this is that the solution might be to special-case TLS segments somehow. I honestly don't really know though.

This revision now requires changes to proceed.Feb 19 2020, 2:06 AM
MaskRay updated this revision to Diff 245552.EditedFeb 19 2020, 4:19 PM

The following patch does not affect any tests.

--- i/llvm/tools/llvm-objcopy/ELF/Object.cpp
+++ w/llvm/tools/llvm-objcopy/ELF/Object.cpp
@@ -1294,3 +1294,3 @@ void ELFBuilder<ELFT>::readProgramHeaders(const ELFFile<ELFT> &HeadersFile) {
         Seg.addSection(&Sec);
-        if (!Sec.ParentSegment || Sec.ParentSegment->Offset > Seg.Offset)
+        if (!Sec.ParentSegment)
           Sec.ParentSegment = &Seg;

So we don't have enough coverage for the subtle "empty section" cases.


llvm::objcopy::elf::Segment::Sections is mostly unused. The first section is used by --only-keep-debug. llvm::objcopy::elf::Segment::ParentSegment is the main thing - it can affect layout decisions w/ or w/o --only-keep-debug.

I have updated the patch to change sectionWithinSegment() heuristics. Now, for an empty section Sec on the boundary between two segments, Sec->ParentSegment will be set to the right segment.

This right segment may be a PT_TLS or PT_GNU_RELRO, and we may want to set Sec->ParentSegment to the containing PT_LOAD. A later loop does this.

Your test case (included in D74879) does not change with the new heuristics.

MaskRay marked an inline comment as done.Feb 19 2020, 4:58 PM
MaskRay added inline comments.
llvm/tools/llvm-objcopy/ELF/Object.cpp
1322

I believe this code captures the intention of the original code. Though, no tests break with this change.

I find it difficult to construct a test case that will break by deleting this piece of code.

@jhenderson It seems you may have such tests?

MaskRay updated this revision to Diff 246080.Feb 22 2020, 9:43 AM
MaskRay edited the summary of this revision. (Show Details)

Simplify

MaskRay marked an inline comment as done.Feb 22 2020, 9:43 AM
MaskRay added inline comments.
llvm/tools/llvm-objcopy/ELF/Object.cpp
1322

Deleted. I think this is not useful.

jhenderson added inline comments.Feb 25 2020, 1:41 AM
llvm/test/tools/llvm-objcopy/ELF/only-keep-debug.test
273–274

Do we need these lines?

llvm/test/tools/llvm-objcopy/ELF/strip-all.test
83

I think this behaviour change shows the test needs changing, as it no longer tests the case "non-alloc not in segment is removed".

llvm/tools/llvm-objcopy/ELF/Object.cpp
1061

by -> in

1291

This change will mean that sections are no longer assigned to the "most parental" segment. For example in the below example, the Section's parent will be the PT_TLS segment rather than the PT_LOAD segment.

 | Section |
 | PT_TLS  |
| PT_LOAD   |

I had a quick look at the code, and there are some cases where the ParentSegment is important: in the IHexWriter::finalize function, the sectionPhysicalAddr static function, and the layoutSectionsForOnlyKeepDebug, there are checks against PT_LOAD segment types. There may be other issues too that I haven't found.

I haven't tried to create test cases that involve any of these, but it seems like it would be possible.

1322

FWIW, I didn't have any specific test that broke with this. I only had the example that I posted that I came up with after considering the code change.

MaskRay marked 5 inline comments as done.EditedFeb 25 2020, 12:28 PM

I haven't tried to create test cases that involve any of these, but it seems like it would be possible.

It is possible but it does not matter.

For the PT_LOAD which include the PT_TLS, note that p_offset(PT_LOAD) = p_offset(PT_TLS) on all versions of GNU ld and gold, and lld>=9.0 (precisely, after D56828).

We will set ParentSegment to the PT_TLS if p_offset(PT_LOAD) < p_offset(PT_TLS). GNU strip/objcopy<2.31 does not handle such PT_TLS (there are various other toolchain validation issues) => this makes such a layout quite problematic => it is not worth working around the layout in llvm-objcopy. (Alan Modra kindly worked around lld's old layout in binutils 2.31 but I hope they can drop that hack (https://sourceware.org/bugzilla/show_bug.cgi?id=22829))


It is also possible to place PT_TLS before PT_LOAD to set ParentSegment to the PT_TLS, but it is also an impractical scenario. The solution I pick here is consistent with:

static bool compareSegmentsByOffset(const Segment *A, const Segment *B) {
  // Any segment without a parent segment should come before a segment
  // that has a parent segment.
  if (A->OriginalOffset < B->OriginalOffset)
    return true;
  if (A->OriginalOffset > B->OriginalOffset)
    return false;
  return A->Index < B->Index;
}

Linkers don't place a PT_TLS before a PT_LOAD. One can use PHDRS command to place a PT_TLS before a PT_LOAD, but we can say that is unsupported by llvm-objcopy.

(A recent binutils change https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=30fe183248b2523ecff9da36853e2f893c4c4b91 introduced a fairly complex comparison function whose purpose is similar to our's. I think it is an overengineering.)

llvm/test/tools/llvm-objcopy/ELF/strip-all.test
83

Added Size: 1 to .blarg instead.

llvm/tools/llvm-objcopy/ELF/Object.cpp
1322

I think there are only artificial impractical tests that can break. See my comment above.

MaskRay updated this revision to Diff 246555.Feb 25 2020, 1:23 PM
MaskRay marked an inline comment as done.

Address comments

For the PT_LOAD which include the PT_TLS, note that p_offset(PT_LOAD) = p_offset(PT_TLS) on all versions of GNU ld and gold, and lld>=9.0 (precisely, after D56828).

Just because GNU ld, LLD and gold are the most common opensource linkers around doesn't mean that they're the only linkers in existence, and that all outputs will conform to their output styles. Indeed, in our downstream linker script, we have precisely the situation where p_offset(PT_LOAD) != p_offset(PT_TLS), although I admit that I don't think we use any of the functionality affected by this change currently (though we might in the future).

We will set ParentSegment to the PT_TLS if p_offset(PT_LOAD) < p_offset(PT_TLS). GNU strip/objcopy<2.31 does not handle such PT_TLS (there are various other toolchain validation issues) => this makes such a layout quite problematic => it is not worth working around the layout in llvm-objcopy. (Alan Modra kindly worked around lld's old layout in binutils 2.31 but I hope they can drop that hack (https://sourceware.org/bugzilla/show_bug.cgi?id=22829))

Just because there's a bug in GNU objcopy/strip doesn't mean we should try to emulate that bug. This is a good example of where we can (in fact, we already do) and should do better than GNU tools. I think I've been persuaded that an empty section at the end of another segment should be considered part of that segment IFF there is no segment to its right, but I don't agree with breaking parent layout calculations that currently work in all other cases.


It is also possible to place PT_TLS before PT_LOAD to set ParentSegment to the PT_TLS, but it is also an impractical scenario. The solution I pick here is consistent with:

static bool compareSegmentsByOffset(const Segment *A, const Segment *B) {
  // Any segment without a parent segment should come before a segment
  // that has a parent segment.
  if (A->OriginalOffset < B->OriginalOffset)
    return true;
  if (A->OriginalOffset > B->OriginalOffset)
    return false;
  return A->Index < B->Index;
}

I'm not sure it is consistent with the system as a whole. Note that segments are wholly resolved such that the nested tree is flattened (i.e. no parent segment has a parent segment itself) (see setParentSegment). This change would mean that sections don't get this same behaviour (their parent segment would be the "deepest" segment).

Linkers don't place a PT_TLS before a PT_LOAD. One can use PHDRS command to place a PT_TLS before a PT_LOAD, but we can say that is unsupported by llvm-objcopy.

Why are we at all justified in saying that a perfectly legal PHDRS command is unsupported by llvm-objcopy? This seems extremely unreasonable, especially when we can't even assume that all linkers output that style, as there is nothing in the ELF specification mandating it (PT_LOADs are the only things mentioned in the gABI in relation to program header table ordering). Worse yet, by making the change you're proposing, we're actually breaking behaviour that currently works.

(A recent binutils change https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=30fe183248b2523ecff9da36853e2f893c4c4b91 introduced a fairly complex comparison function whose purpose is similar to our's. I think it is an overengineering.)

The fix appears to solve a bug though (the tools couldn't handle a valid input). Whilst the solution might be more complex than required (I don't know whether it is or not), a solution was necessary.

For the PT_LOAD which include the PT_TLS, note that p_offset(PT_LOAD) = p_offset(PT_TLS) on all versions of GNU ld and gold, and lld>=9.0 (precisely, after D56828).

Just because GNU ld, LLD and gold are the most common opensource linkers around doesn't mean that they're the only linkers in existence, and that all outputs will conform to their output styles. Indeed, in our downstream linker script, we have precisely the situation where p_offset(PT_LOAD) != p_offset(PT_TLS), although I admit that I don't think we use any of the functionality affected by this change currently (though we might in the future).

I have rechecked usage of Section::ParentSegment. The only fuzzy thing I find is ihex output:

static uint64_t sectionPhysicalAddr(const SectionBase *Sec) {
  Segment *Seg = Sec->ParentSegment;
  if (Seg && Seg->Type != ELF::PT_LOAD)
    Seg = nullptr;
  return Seg ? Seg->PAddr + Sec->OriginalOffset - Seg->OriginalOffset
             : Sec->Addr;
}

This is related to the load memory address (LMA) concept. I started to know LMA with a previous llvm-objcopy -O binary change. My recent linker changes related to LMA regions (D74286, D74297) enhanced my understanding, but I confess I still don't know enough.

The situation you deal with (p_offset(PT_TLS) < p_offset(PT_LOAD)) is in my mind. I have tried hard thinking of several cases but I can't find a case where the patch can break copy/only-keep-debug usage.

I suggest we start with this patch, improve it along the way when we have more detailed description of the use cases.

We will set ParentSegment to the PT_TLS if p_offset(PT_LOAD) < p_offset(PT_TLS). GNU strip/objcopy<2.31 does not handle such PT_TLS (there are various other toolchain validation issues) => this makes such a layout quite problematic => it is not worth working around the layout in llvm-objcopy. (Alan Modra kindly worked around lld's old layout in binutils 2.31 but I hope they can drop that hack (https://sourceware.org/bugzilla/show_bug.cgi?id=22829))

Just because there's a bug in GNU objcopy/strip doesn't mean we should try to emulate that bug.

I don't think it is simply "a bug in GNU objcopy/strip". There is some inherent issues with this layout.

This is a good example of where we can (in fact, we already do) and should do better than GNU tools. I think I've been persuaded that an empty section at the end of another segment should be considered part of that segment IFF there is no segment to its right, but I don't agree with breaking parent layout calculations that currently work in all other cases.

Agreed. I have read some binutils code and I totally agree that the related code in bfd/elf.c (assign_file_positions*) and ld/ldlang.c can be largely simplified. For example, a PT_LOAD pass followed by a non-PT_LOAD pass should really be a segment(section) pass followed by a section(segment) pass.
(I came up with the current --only-keep-debug approach after thinking very hard about the lld Writer, the llvm-objcopy layout and bfd/elf.c assign_file_positions*)


It is also possible to place PT_TLS before PT_LOAD to set ParentSegment to the PT_TLS, but it is also an impractical scenario. The solution I pick here is consistent with:

static bool compareSegmentsByOffset(const Segment *A, const Segment *B) {
  // Any segment without a parent segment should come before a segment
  // that has a parent segment.
  if (A->OriginalOffset < B->OriginalOffset)
    return true;
  if (A->OriginalOffset > B->OriginalOffset)
    return false;
  return A->Index < B->Index;
}

I'm not sure it is consistent with the system as a whole. Note that segments are wholly resolved such that the nested tree is flattened (i.e. no parent segment has a parent segment itself) (see setParentSegment). This change would mean that sections don't get this same behaviour (their parent segment would be the "deepest" segment).

Linkers don't place a PT_TLS before a PT_LOAD. One can use PHDRS command to place a PT_TLS before a PT_LOAD, but we can say that is unsupported by llvm-objcopy.

Why are we at all justified in saying that a perfectly legal PHDRS command is unsupported by llvm-objcopy? This seems extremely unreasonable, especially when we can't even assume that all linkers output that style, as there is nothing in the ELF specification mandating it (PT_LOADs are the only things mentioned in the gABI in relation to program header table ordering). Worse yet, by making the change you're proposing, we're actually breaking behaviour that currently works.

Sorry, I can't agree with "this patch is breaking behavior that currently works."
(If you have a test case, I'll happily adapt. I can't think of a case this patch can break, though.)

Note that I was imagining how the changed rule could theoretically break future use cases.
I suggest we start thinking about them after we have concrete use cases.

  1. Adding a flattening rule can be hardly tested now. 2) Such a rule may be completely rewritten.

(A recent binutils change https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=30fe183248b2523ecff9da36853e2f893c4c4b91 introduced a fairly complex comparison function whose purpose is similar to our's. I think it is an overengineering.)

The fix appears to solve a bug though (the tools couldn't handle a valid input). Whilst the solution might be more complex than required (I don't know whether it is or not), a solution was necessary.

I am pleased this patch managed to support @yikong's use cases by deleting code:)

FYI, I'm not ignoring you here, I just want to spend some significant time trying to understand the exact implications of the behaviour change before responding further. Unfortunately, with my current workload, I haven't been able to do that yet.

MaskRay added a comment.EditedMar 2 2020, 9:13 AM

FYI, I'm not ignoring you here, I just want to spend some significant time trying to understand the exact implications of the behaviour change before responding further. Unfortunately, with my current workload, I haven't been able to do that yet.

No hurry! The logic (attributing sections to segments) is intricate, and part of the logic will be shared with obj2yaml (dumping program headers). We should be cautious here.

(I thought about several scenarios that may be similar to your use case, and I can't break default ELFWriter/--only-keep-debug output. If you have a use case that indeed needs to flatten Section::ParentSegment, I will be happy to help in the future.)

Gentle ping... We need this change for Android.

Gentle ping... We need this change for Android.

Could you clarify what exactly about this change you need for Android?

I've been struggling under an overly large backlog of reviews, so haven't been able to spend the time I need to on this yet. I'm hoping I'll have time to review it more later this week.

In the case of nested segments, e.g. PT_LOAD⊇PT_GNU_RELRO⊇PT_TLS. We want ParentSegment to refer
to PT_LOAD so that layoutSectionsForOnlyKeepDebug() will advance sh_offset to make sh_offset≡sh_addr
(mod p_align), this is naturally satisfied because PT_LOAD comes before PT_TLS (GNU ld/gold/lld).

(If PT_LOAD comes after PT_TLS, the rule will break, but this is really impractical.)

I still don't understand what is meant by "really impractical". It is not hard to write a linker script where this is the case and produce runnable output on my Linux VM, and whilst I wouldn't generally encourage people to write linker scripts, there are plenty of people out there who do for various perfectly valid reasons.

I've been thinking about alternatives that would a) get rid of the size 1, and b) solve the problem, without changing the existing behaviour in bad ways for non-empty segments/sections.

The problem as I understand it is that an empty section is not allocated to an empty segment at the same location. How about we just change the sectionWithinSegment check for empty sections to simply say something like:

if (Section.Size == 0 && Segment.FileSize == 0)
  return Segment.OriginalOffset == Section.OriginalOffset;
// Possibly add equivalent case for SHT_NOBITS using addresses.

This ensures empty sections are considered to be within empty segments if they share the same offset (I think this makes a reasonable amount of sense, although I could imagine some weird issues if you end up with multiple empty sections and segments at the same location with different alignments). We might want similar logic added to segmentOverlapsSegment, although I'm not sure.

MaskRay added a comment.EditedMar 18 2020, 9:24 AM

In the case of nested segments, e.g. PT_LOAD⊇PT_GNU_RELRO⊇PT_TLS. We want ParentSegment to refer
to PT_LOAD so that layoutSectionsForOnlyKeepDebug() will advance sh_offset to make sh_offset≡sh_addr
(mod p_align), this is naturally satisfied because PT_LOAD comes before PT_TLS (GNU ld/gold/lld).

(If PT_LOAD comes after PT_TLS, the rule will break, but this is really impractical.)

I still don't understand what is meant by "really impractical". It is not hard to write a linker script where this is the case and produce runnable output on my Linux VM, and whilst I wouldn't generally encourage people to write linker scripts, there are plenty of people out there who do for various perfectly valid reasons.

"really impratical" because - as my previous comment said, I tried to break the code but I can't. If you have such a use case, let's consider it.

By (If PT_LOAD comes after PT_TLS, the rule will break, but this is really impractical.), it was my hypothesis. I can't find a use case to prove it. If PT_TLS comes before PT_LOAD, I won't be surprised if some other binary manipulation tools will break.

I've been thinking about alternatives that would a) get rid of the size 1, and b) solve the problem, without changing the existing behaviour in bad ways for non-empty segments/sections.

The problem as I understand it is that an empty section is not allocated to an empty segment at the same location. How about we just change the sectionWithinSegment check for empty sections to simply say something like:

if (Section.Size == 0 && Segment.FileSize == 0)
  return Segment.OriginalOffset == Section.OriginalOffset;
// Possibly add equivalent case for SHT_NOBITS using addresses.

This ensures empty sections are considered to be within empty segments if they share the same offset (I think this makes a reasonable amount of sense, although I could imagine some weird issues if you end up with multiple empty sections and segments at the same location with different alignments). We might want similar logic added to segmentOverlapsSegment, although I'm not sure.

As I mentioned, I am happy that we achieve @kongyi's goal by deleting code. One fallacy of GNU is that they add a bunch of rules which cannot easily be simplified now. When a concrete use case surfaces, I am happy to adapt. We may find a better way to do that.

Sorry for the delay in coming back again to this. It's taken me quite some time to collect my thoughts and find time for it, amidst all the other reviews people want my input on, not to mention my actual own workload.

One of the things that @jakehehrlich and I tried hard to achieve when writing llvm-objcopy was to ensure that llvm-objcopy can handle cleanly any valid input. By "valid" I mean compliant with the ELF gABI. The intent of that was to ensure that the output could be consumed by any gABI-compliant tool, and can similarly consume any input generated by such tools. Hence, it is important to be able to handle any segment ordering, regardless of whether tools commonly emit it or not. The reasoning behind this is that there are many different usages of the tools out there, and we cannot account for all use-cases, especially if we are not accounting for valid things in the ELF standard (examples include segments that are not nested inside PT_LOADs, PT_TLS segments before PT_LOAD segments and so on). "Simplifying" code that removes support for a valid layout (by changing behaviour in an undesirable way) is not a good thing to do, no matter how much of an edge case that behaviour may be, in my opinion. Additionally, I don't think we should care about binary mainpulation tools that cannot handle valid input - any such tools are not gABI compliant. Anyway, that's all mostly just high-level comments, and not all that useful, given that there is a known issue here.

More specific to the issue at hand: whilst we can certainly do something in llvm-objcopy to improve the situation @kongyi faces, I wonder if we could (should?) workaround it by relaxing some of the checks that are failing with the output? An empty segment's p_offset to me doesn't seem to really matter. It could be 0, it could be 0xffffffffffffffff. Because it has no data, it doesn't really matter where the offset is, so perhaps the checks should ignore empty segments.

@kongyi, asides from llvm-objcopy, what other tools have you seen produce the failure?

Finally, back to the current fix in llvm-objcopy. I think, to avoid confusion, it would be good for nesting of segments within other segments and sections within segments to behave the same. I note that at the moment, an empty segment is considered to overlap (and therefore be nested inside) a segment that it is at the beginning of, but not at the end of. Empty segments are not considered to be part of other empty segments at the same offset ever. Segment parents are canonicalised such that no parent segment itself has a parent segment. This simplifies the layout algorithms, because segments with a parent simply follow their parent, rather than trying to figure out where they are supposed to go independently. Prior to this patch, sections follow the same rules: they have a most parental segment, and empty sections belong to a segment iff they are wholly within or at the start of a non-empty segment.

A relevant consideration here is that there is no way of identifying whether an empty section/segment should belong to another empty segment. For example, if there are two empty segments and an empty section, all with the same offset but different alignments, how should layout be performed? I don't think there's a clear correct answer here, without further context. Clearly it makes sense for a SHT_TLS section to belong to the TLS segment, but what about other sections and segment types? Currently, I believe llvm-objcopy will lay them all out independently, according to their respective alignment restrictions. With this patch, if I'm not mistaken the section will be associated with both segments, with its parent being the first of the two in the program header table, I believe. It will therefore get moved with that segment, which could result in incorrect output due to alignment violations or similar (before the section's alignment would be considered, but now it would not be).

A related note is that llvm-readelf does not consider empty sections to be part of a segment at the same location, unlike GNU readelf. I haven't identified the exact rules the two follow though. GNU objcopy doesn't have the same issue we do, because it moves at least an empty PT_TLS segment to offset 0, regardless of where it is placed in relation to anything else (the .tdata and .tbss sections remain in their PT_LOAD). I don't know why - it doesn't seem to do this for other segments.

Sorry, this is getting a bit rambly. I keep coming back to the issue being actually in the layoutSegmentsForOnlyKeepDebug code: should we really be completely ignoring segments without sections? I can't help but feel like we should either pad the output's file size to compensate for the empty segment's offset, if necessary, or change the empty segment's offset to 0 (or indeed any good offset). Alternatively, we should loosen the checks as mentioned earlier, so that the tools don't complain.

I've got to leave it here. I was hoping to come up with another reproducible example which this patch clearly breaks, rather than just relying on higher-level comments, but I've run out of time. I'll have to do that another time.

I've played around with the llvm-objcopy's with and without this patch to see the impact of the change. One situation I noticed is where using the --remove-section option, combined with the --only-keep-debug option. In the example I'm using, there is a PT_LOAD which contains a PT_GNU_RELRO, which contains, at its end, a PT_TLS, with another relro section before the TLS segment. (Note: the segment types, apart from the PT_LOAD are irrelevant here - a custom linker script could have all sorts of different segment types where this layout could happen). If the relro section is removed by objcopy, the output from only-keep-debug differs before and after this patch. I haven't experimented yet due to time constraints, but I wouldn't be entirely surprised if the same behaviour difference would be observed if --remove-section was performed in a prior run of llvm-objcopy, or if the section was replaced with segment padding of some variety.

Here is the input source I used:

// test.cpp
thread_local int tdata = 42;
extern void (* const ptr)();
extern "C" void _start (){}
void (* const ptr)() = &_start;

// test.script - derived from LLD's output without a linker script but with the relro and tls segments and sections swapped.
PHDRS
{
    ph_phdr           PT_PHDR PHDRS   FLAGS (0x4);
    ph_ro             PT_LOAD FILEHDR PHDRS FLAGS (0x4);
    ph_text           PT_LOAD           FLAGS (0x1 | 0x4);
    ph_data           PT_LOAD           FLAGS (0x2 | 0x4);
    ph_relro          PT_GNU_RELRO      FLAGS (0x4);
    ph_tls            PT_TLS            FLAGS (0x4);
}
SECTIONS
{
    . = 0x200000 + SIZEOF_HEADERS;
    .eh_frame : { *(.eh_frame) } : ph_ro
    .text : { *(.text .text.*) } : ph_text
    .data.rel.ro : { *(.data.rel.ro .data.rel.ro*) } : ph_data : ph_relro
    .tdata : { *(.tdata .tdata.*) } : ph_data : ph_relro : ph_tls
}

// Compile and link this using clang+LLD. I then ran the following commands:
C:\Work\D74755> .\patched-objcopy.exe test.elf test.okd-patched.elf --only-keep-debug --remove-section=.data.rel.ro
C:\Work\D74755> .\base-objcopy.exe test.elf test.okd-base.elf --only-keep-debug --remove-section=.data.rel.ro
C:\Work\D74755> C:\llvm\build\Debug\bin\llvm-readelf --sections -l .\test.okd-base.elf > 1.txt
C:\Work\D74755> C:\llvm\build\Debug\bin\llvm-readelf --sections -l .\test.okd-patched.elf > 2.txt

1.txt and 2.txt illustrate that the section header offsets have been modified, such that they do not match those of the equivalent ELF file (i.e. one only created with just --remove-section=.data.rel.ro applied). I do not know how debuggers use the only-keep-debug information, but if they use section offsets, the patched version of objcopy will break them, as they will no longer match up. I will look at the remaining differences another time.

MaskRay added a comment.EditedApr 3 2020, 6:57 PM

I've played around with the llvm-objcopy's with and without this patch to see the impact of the change. One situation I noticed is where using the --remove-section option, combined with the --only-keep-debug option. In the example I'm using, there is a PT_LOAD which contains a PT_GNU_RELRO, which contains, at its end, a PT_TLS, with another relro section before the TLS segment. (Note: the segment types, apart from the PT_LOAD are irrelevant here - a custom linker script could have all sorts of different segment types where this layout could happen). If the relro section is removed by objcopy, the output from only-keep-debug differs before and after this patch. I haven't experimented yet due to time constraints, but I wouldn't be entirely surprised if the same behaviour difference would be observed if --remove-section was performed in a prior run of llvm-objcopy, or if the section was replaced with segment padding of some variety.

Here is the input source I used:

// test.cpp
thread_local int tdata = 42;
extern void (* const ptr)();
extern "C" void _start (){}
void (* const ptr)() = &_start;

// test.script - derived from LLD's output without a linker script but with the relro and tls segments and sections swapped.
PHDRS
{
    ph_phdr           PT_PHDR PHDRS   FLAGS (0x4);
    ph_ro             PT_LOAD FILEHDR PHDRS FLAGS (0x4);
    ph_text           PT_LOAD           FLAGS (0x1 | 0x4);
    ph_data           PT_LOAD           FLAGS (0x2 | 0x4);
    ph_relro          PT_GNU_RELRO      FLAGS (0x4);
    ph_tls            PT_TLS            FLAGS (0x4);
}
SECTIONS
{
    . = 0x200000 + SIZEOF_HEADERS;
    .eh_frame : { *(.eh_frame) } : ph_ro
    .text : { *(.text .text.*) } : ph_text
    .data.rel.ro : { *(.data.rel.ro .data.rel.ro*) } : ph_data : ph_relro
    .tdata : { *(.tdata .tdata.*) } : ph_data : ph_relro : ph_tls
}

// Compile and link this using clang+LLD. I then ran the following commands:
C:\Work\D74755> .\patched-objcopy.exe test.elf test.okd-patched.elf --only-keep-debug --remove-section=.data.rel.ro
C:\Work\D74755> .\base-objcopy.exe test.elf test.okd-base.elf --only-keep-debug --remove-section=.data.rel.ro
C:\Work\D74755> C:\llvm\build\Debug\bin\llvm-readelf --sections -l .\test.okd-base.elf > 1.txt
C:\Work\D74755> C:\llvm\build\Debug\bin\llvm-readelf --sections -l .\test.okd-patched.elf > 2.txt

1.txt and 2.txt illustrate that the section header offsets have been modified, such that they do not match those of the equivalent ELF file (i.e. one only created with just --remove-section=.data.rel.ro applied). I do not know how debuggers use the only-keep-debug information, but if they use section offsets, the patched version of objcopy will break them, as they will no longer match up. I will look at the remaining differences another time.

Thanks for making the investigations! I tried your example:

clang -c test.cpp -O
ld.lld test.o -T test.script -o test
llvm-objcopy --only-keep-debug --remove-section=.data.rel.ro test test.dbg
 Section Headers:                                                                                                                               [6/62]
   [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al                                                             
@@ -6,11 +6,11 @@                                                                                                                                     
   [ 1] .eh_frame         NOBITS          0000000000200190 000190 00004c 00   A  0   0  8                                                             
   [ 2] .rodata           NOBITS          00000000002001e0 000190 000008 00   A  0   0  8                                                             
   [ 3] .text             NOBITS          00000000002001f0 0001f0 000021 00  AX  0   0 16                                                             
-  [ 4] .tdata            NOBITS          0000000000200214 000214 000004 00 WAT  0   0  4                                                             
-  [ 5] .comment          PROGBITS        0000000000000000 000214 00002a 01  MS  0   0  1
-  [ 6] .symtab           SYMTAB          0000000000000000 000240 0000a8 18      8   4  8
-  [ 7] .shstrtab         STRTAB          0000000000000000 0002e8 000043 00      0   0  1
-  [ 8] .strtab           STRTAB          0000000000000000 00032b 00002e 00      0   0  1
+  [ 4] .tdata            NOBITS          0000000000200214 0001f0 000004 00 WAT  0   0  4
+  [ 5] .comment          PROGBITS        0000000000000000 0001f0 00002a 01  MS  0   0  1
+  [ 6] .symtab           SYMTAB          0000000000000000 000220 0000a8 18      8   4  8
+  [ 7] .shstrtab         STRTAB          0000000000000000 0002c8 000043 00      0   0  1
+  [ 8] .strtab           STRTAB          0000000000000000 00030b 00002e 00      0   0  1
 Key to Flags:
   W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
   L (link order), O (extra OS processing required), G (group), T (TLS),
@@ -26,9 +26,9 @@
   PHDR           0x000040 0x0000000000200040 0x0000000000200040 0x000150 0x000150 R   0x8
   LOAD           0x000000 0x0000000000200000 0x0000000000200000 0x000190 0x0001e8 R   0x1000
   LOAD           0x0001f0 0x00000000002001f0 0x00000000002001f0 0x000000 0x000021 R E 0x1000
-  LOAD           0x000214 0x0000000000200211 0x0000000000200211 0x000000 0x000007 RW  0x1000
-  GNU_RELRO      0x000214 0x0000000000200211 0x0000000000200211 0x000000 0x000def R   0x1
-  TLS            0x000214 0x0000000000200214 0x0000000000200214 0x000000 0x000004 R   0x4
+  LOAD           0x0001f0 0x0000000000200211 0x0000000000200211 0x000000 0x000007 RW  0x1000
+  GNU_RELRO      0x0001f0 0x0000000000200211 0x0000000000200211 0x000000 0x000def R   0x1
+  TLS            0x0001f0 0x0000000000200214 0x0000000000200214 0x000000 0x000004 R   0x4

The sh_offset fields of sections after .data.rel.ro are shifted. The p_offset fields of the PT_TLS, the PT_GNU_RELRO and the RW PT_LOAD are shifted.

I am not concerned about the file offset changes with --only-keep-debug --remove-section.
First, in an --only-keep-debug produced debug file, section contents other than .debug_* are not used by debuggers.
Second, users will not combine the two options.

File offsets are expected to be rewritten to keep the file size small. The sh_offset field of a SHT_NOBITS section is not meaningful. (gdb/lldb does not even need program headers. For the original implementation of --only-keep-debug (D67137), I added program headers because ChromeOS's internal symbolization system needs program headers.)

🤔

Sorry, I had to put this on the backburner again last week, as I had some other more pressing internal work to look at (I carried on doing simple reviews, but even that throughput was low). I haven't forgotten about it and will continue looking later in the week hopefully.

Note for self: only places that still need to be checked where different Sec->ParentSegment value might have an impact are:

  1. sectionPhysicalAddr (used in various places IHexSectionWriter[Base]).
  2. IHexWriter::finalize::IsInPtLoad (used to identify how to write things).
  3. BinaryWriter::finalize used to populate list of segments for ordering.
  4. BinaryWriter::finalize used to calculate addresses of sections.

BinaryWriter::finalize used to calculate addresses of sections.

Analysis for this bit: assuming that segment addresses make sense (i.e. the address of a nested segment is the same relative distance to the top-level parent's address as the offsets are), this is not an issue, as the calculation Sec.Offset - Sec.ParentSegment->Offset + Sec.ParentSegment->PAddr should evaluate to the same regardless of which segment in the stack is picked.

BinaryWriter::finalize used to populate list of segments for ordering.

At first glance, I thought that this was going to cause a significant issue, as a different set of segments could end up getting picked, and therefore behaviour might end up being different. However, when looking at the code carefully, it looks like OrderedSegments in the function is only used to set the physical address to the virtual address when there are no physical addresses set amongst these sections. It looks like there's a lot of dead code in this area that can be tidied up (why order the segments? why does a comment refer to laying them out etc?). The only effect as far as I can tell of setting the physical address is so that it can be used in the calculation already highlighted above. As a result, as long as all ParentSegments have their address updated in this way, it doesn't really matter which version is used.

Consequently, I cannot see any behaviour change that could be caused in binary output by this change.

jhenderson added a comment.EditedApr 23 2020, 2:46 AM

IHexWriter::finalize::IsInPtLoad (used to identify how to write things).

This one is a problem. Because the parent segment is no longer the most parental, this means that any segment nested in another segment (e.g. PT_GNU_RELRO, PT_TLS etc) will be omitted from the IHex output. I couldn't find any testing for this case, but if there were, it would fail - I experimented with adding an additional segment to ihex-elf-segments.yaml containing one of the sections, and the ihex-writer.test started failing with this change (but not without it).

sectionPhysicalAddr (used in various places IHexSectionWriter[Base]).

Related to the previous point, but in the event that no section is found to have a PT_LOAD parent segment, then the behaviour is to fall back to using section virtual addresses rather than the physical addresses. This means that if all sections happen to be covered by an additional program header (admittedly relatively unlikely in normal usage, but certainly not impossible), the physical addresses will be ignored, and different address values will be emitted. This seems wrong to me, although I don't know how ihex is used in practice and therefore whether it matters that much. Still, given the previous point, I think it is important that we maintain the existing flattening algorithm.

So, is uint64_t SecSize = Sec.Size ? Sec.Size : 1; a subtle condition we may want to get rid of? Or is simplifying code can still be problematic? IMHO there are a few points regarding hypothetical use cases but: (1) they don't appear to be used (2) their reliance on the current behavior does not have a solid guarantee.

So, is uint64_t SecSize = Sec.Size ? Sec.Size : 1; a subtle condition we may want to get rid of? Or is simplifying code can still be problematic? IMHO there are a few points regarding hypothetical use cases but: (1) they don't appear to be used (2) their reliance on the current behavior does not have a solid guarantee.

I'm not sure I entirely follow. I'm open to the logic being simplified if it can be done without breaking behaviour, but as things stand, this patch breaks at least one behaviour as demonstrated in my previous two comments. ihex is not a "hyopthetical use-case" as far as I'm aware (if it were, why was support for it added?) and the layout I experimented with was not unreasonable.

There are plenty of users out there with their own linker scripts with layouts that look unusual compared to the "default" layout, and for various reasons. Just because they are not involved in this conversation does not mean they are "hypothetical use cases" that can be ignored. I've had to help one team with such a linker script on more than one occasion. Usually, it's because they've broken gABI rules in some way (e.g. PT_LOAD isn't in ascending address order). I'd like to avoid having to explain to them that they need to change their layout from something they thought worked, just because we decided to introduce an implicit restriction into llvm-objcopy.

If we accept the principle that an empty section should be assigned to an empty segment with the same offset to fix --only-keep-debug layout behaviour (but see my earlier comments where I'm not convinced by that), it seems to me like the issue is "what should the parent of an empty section sandwiched between two segments be?" We have 4 options:

  1. Both - behaviour of the current patch; works for empty segments, but which should be the parent segment? Current patch breaks ihex, whilst previous patch assigns it to the left segment which changes layout in a way inconsistent with GNU behaviour (it's admittedly debatable as to whether that behaviour matters, but I'm not comfortable with it, especially as I could see symbols being in those sections which are somehow being relied upon).
  2. Neither - won't solve the issue - segment will continue to have no section so won't be laid out.
  3. Segment before - won't solve the issue - an empty segment following a non-empty one will still not have any section.
  4. Segment after - the current behaviour (but broken for empty segments appearing after).

As previously mentioned, readelf in its section to segment mapping does not include empty sections at the end of empty segments, unless the segment is also empty (llvm-readelf does not do the "unless", but I think that's a bug). I think we should aim to match this algorithm for consistency because that a) fixes the problem, and b) doesn't change any other behaviour. It also matches what GNU objcopy does. The downside is we don't get to significantly simplify the code, although I don't think it complicates it any more than it was before - I think the slightly subtle size = 1 code gets replaced by the clearer if (size == 0) do something code.

If you aren't happy adding a size == 0 special case, then I think we are at an impasse and need more input from others.

MaskRay abandoned this revision.Nov 11 2020, 9:22 AM