This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Add support for --rename-section flags from gnu objcopy
ClosedPublic

Authored by rupprecht on Jul 26 2018, 11:54 AM.

Details

Summary

Add support for --rename-section flags from gnu objcopy.

Not all flags appear to have an effect for ELF objects, but allowing them would allow easier drop-in replacement. Other unrecognized flags are rejected.

This was only tested by comparing flags printed by "readelf -e <.o>" against the output of gnu vs llvm objcopy, it hasn't been tested to be valid beyond that.

Diff Detail

Event Timeline

rupprecht created this revision.Jul 26 2018, 11:54 AM
tools/llvm-objcopy/llvm-objcopy.cpp
121

maybe drop Config from the name ? doesn't seem to provide too much value, although i don't insist

196

I think in any case it should not be in the global namespace,
but more importantly, probably it would be better to use ELF* constants directly rather than introducing this enum.

655

above:
i think now things are getting a bit more complicated and this code deserves to be moved into a helper function (it would parse the passed StringRef and return SectionConfig / Error)

658

size_t + no C-style casts, please

679

std::move is unnecessary here - that struct essentially contains only integral types (pointers / numbers) - it doesn't have a move-ctor, thus std::move will only introduce an unnecessary cast here.

rupprecht marked 3 inline comments as done.Jul 26 2018, 1:57 PM

Thanks for the review! I'm clearly still ramping up on this codebase :)

tools/llvm-objcopy/llvm-objcopy.cpp
196

I think in any case it should not be in the global namespace,

Moved to the anonymous namespace above (w/ config classes)

but more importantly, probably it would be better to use ELF* constants directly rather than introducing this enum.

This isn't meant to replace the ELF SHF_* constants, it's meant to map to allowed command-line flags.
For instance, "readonly" maps to "unset SHF_WRITE"

655

and return SectionConfig / Error

Done -- although it felt more natural to make it return SectionConfig and just call error() directly if anything happens. Is that OK or would you prefer I return Expected<SectionConfig> and do the error() call up a level in here?

rupprecht updated this revision to Diff 157572.Jul 26 2018, 1:58 PM

Refactored/addressed code review comments

tools/llvm-objcopy/llvm-objcopy.cpp
196

yea, it's not a replacement, but why do we need this double indirection -
i mean now there is a switch (lines 215 - 228) and there is a chain of "if" statements (lines 255 - 265) -
it looks like the latter would be kinda redundant if ParseSectionName returned the flag (instead of the corresponding enum value) - am i missing smth ?

tools/llvm-objcopy/llvm-objcopy.cpp
480

here and in the other places:

  1. when the type is not obvious we usually don't use auto (for better readability),
  2. the names of the variable is kinda inconsistent with the type - the type is SectionRename - i'd name the variable smth like SR
alexander-shaposhnikov added inline comments.
tools/llvm-objcopy/llvm-objcopy.cpp
480

typo: the name of the variable

What is the behaviour of --rename-section when it touches a section with existing flags that don't conflict with the specified value? For example alloc when the section has SHF_EXECINSTR already? Does it preserve those flags? Can multiple modes be specified for the same section (e.g. code + alloc)? I don't think the tests really show these.

test/tools/llvm-objcopy/rename-section-flag.test
41

You could probably make this test a little more explicit, and reduce some of the duplication by using --check-prefixes:

--rename-section=.foo=.bar,alloc | FileCheck %s --check-prefixes=CHECK,ALLOC,WRITE
... --rename-section=.foo=.bar,load | FileCheck %s --check-prefixes=CHECK,WRITE

# CHECK: Name: .bar
# CHECK-NEXT: Type: SHT_PROGBITS
# CHECK-NEXT: Flags [
# ALLOC-NEXT: SHF_ALLOC (0x2)
# WRITE-NEXT: SHF_WRITE (0x1)
# CHECK-NEXT: ]
50

I'm a bit surprised by this one. I haven't checked, but I'd expect load to cause SHF_ALLOC.

tools/llvm-objcopy/llvm-objcopy.cpp
196

This function name is a bit misleading to me, since it doesn't do anything with the section name at all. Please rename it to something like ParseSectionRenameFlags or something equally descriptive.

213

I would say this should be called "ParseRenameSectionValue", not Flag, to avoid confusion with the flags portion of the value passed to the option.

247

Why not assign to this directly in the above set of ifs, rather than go via another variable?

rupprecht marked 9 inline comments as done.Jul 27 2018, 10:15 AM

What is the behaviour of --rename-section when it touches a section with existing flags that don't conflict with the specified value? For example alloc when the section has SHF_EXECINSTR already? Does it preserve those flags?

It looks like SHF_EXECINSTR is not preserved in that case, e.g.

$ cat /tmp/a.c 
char foo[] = "hello";
$ readelf -S /tmp/a.o | grep -A 1 .text
  [ 2] .text             PROGBITS         0000000000000000  00000040
       0000000000000000  0000000000000000  AX       0     0     4
$ objcopy --rename-section=.text=.foo,alloc /tmp/a.o /tmp/a-gnu.o; readelf -S /tmp/a-gnu.o | grep -A 1 .foo
  [ 1] .foo              NOBITS           0000000000000000  00000040
       0000000000000000  0000000000000000  WA       0     0     4

Can multiple modes be specified for the same section (e.g. code + alloc)? I don't think the tests really show these.

Yes, it's comma-delimited. I added a couple tests to show that.

test/tools/llvm-objcopy/rename-section-flag.test
41

Thanks, the test is much more readable now. I wasn't aware of multiple check-prefixes.

50

That's what I thought as well. However,

$ readelf -S /tmp/a.o | grep -A 1 .text
  [ 2] .text             PROGBITS         0000000000000000  00000040
       0000000000000000  0000000000000000  AX       0     0     4
$ objcopy --rename-section=.text=.foo,load /tmp/a.o /tmp/a-gnu.o; readelf -S /tmp/a-gnu.o | grep -A 1 .foo
  [ 1] .foo              PROGBITS         0000000000000000  00000040
       0000000000000000  0000000000000000   W       0     0     4
tools/llvm-objcopy/llvm-objcopy.cpp
196

So, I had it written w/o this, and it was pretty ugly. I added a separate type here, and I think it's better for a couple reasons:

  • It logically separates command line flags (which are really bfd names) against ELF section types.
  • It's not a simple map to ELF flag values. Most of them, sure -- for "alloc", just return SHF_ALLOC and have the caller |= the flag value. But there is no "readonly" flag; instead, there's SHF_WRITE, which is set _unless_ "readonly" is set. You could special case that one -- it's the only outlier here -- but I think it's easier to follow the code when written this way.
  • I've just started incrementally working on this, and I'm not convinced that what I have is complete w.r.t. gnu objcopy compatibility (or at least as far as we care about matching it). These might do things besides just SHF_* flags, so having separate types makes that more transparent.
rupprecht marked 2 inline comments as done.
  • Clean up long test case, fix some naming
tools/llvm-objcopy/llvm-objcopy.cpp
215

is there a test for this error ? if not - would you mind adding one ?

+ question (just in case): do the error messages match binutils' ones ?

I've added some inline comments, but otherwise this looks ok to me,
please, wait for James as well.

This revision is now accepted and ready to land.Jul 27 2018, 4:06 PM
tools/llvm-objcopy/llvm-objcopy.cpp
14–15

i think SmallVector should go before STLExtras

rupprecht marked 6 inline comments as done.Jul 27 2018, 4:32 PM
rupprecht added inline comments.
tools/llvm-objcopy/llvm-objcopy.cpp
14–15

git-clang-format wants it here, unfortunately. Spot checking other LLVM files that include both of these, STLExtras comes first ~95% of the time.

215

There are tests for these -- and actually this once deviating from binutils, so I'll revert it.
I've updated the other error messages to match binutils, with the exception of unrecognized section flag, which requires a minor change to the error() method.

$ objcopy --rename-section=.foo=.bar,xyz 
objcopy: unrecognized section flag `xyz'
objcopy: supported flags: alloc, load, noload, readonly, debug, code, data, rom, share, contents, merge, strings

(the error methods here only allow a single line error message)

rupprecht updated this revision to Diff 157804.Jul 27 2018, 4:33 PM
rupprecht marked 2 inline comments as done.
  • Change error messages to match gnu binutils

LGTM, I had the same confusion James did. I have independently confirmed this is the correct behavior. As much as I dislike this behavior I think we need to match what GNU objcopy does here because this might result in observable differences in behavior (like a segfault if this is run on a ,o file and then linked)

test/tools/llvm-objcopy/rename-section-flag.test
50

Ah ok, so if you add any flag at all *then* this is correct. if however you add no flag it does not change the flags. As much as I dislike this behavoir it's critical we match GNU objcopy here...sigh...

jakehehrlich accepted this revision.Jul 27 2018, 5:31 PM

Forgot to accept

My main concern is still the interaction between these switches and existing flags. Specifically, I'm worried about what will happen if this is used on a section with lots of flags that cannot be controlled via this switch. A quick look at the spec shows that this will clear the following flags unconditionally: SHF_INFO_LINK, SHF_GROUP, SHF_TLS, SHF_COMPRESSED, and any OS/processor-specific flags. I haven't checked what GNU objcopy does, but in basically all of those, changing the flags could be bad, and have a negative impact on the ability to link the resultant object. Could you investigate what GNU does in each of these cases, and write tests to show the behaviour, please.

Also, I feel like we should have two versions of the basic test: one with a section with no flags, and the other with a section with all the flags, if possible, to show which get set, which get unset, and which get ignored. Perhaps this would be best achieved with two sections, one each of the two different flag sets outlined in the previous sentence, and the main test just runs objcopy/readobj/FileCheck twice or similar.

test/tools/llvm-objcopy/rename-section-flag.test
16

I think this should be "EXEC" (or possibly even "EXECINSTR") not "CODE", just to make it completely clear what we mean.

50

Okay. I dug into GNU objcopy a little bit more, and that sort of makes sense, since these flags in theory affect more than the section flags, and look to be intended to be cross-platform.

tools/llvm-objcopy/llvm-objcopy.cpp
215

Honestly, I don't think we need to maintain error-message compatibility with objcopy, where the objcopy message is not as good as we can do.

You should also make sure things like capitalization of error messages etc is consistent with the rest of the code (and ideally the rest of LLVM, although I wouldn't be surprised if there's some variation on a wider scale). Specifically, I think we use upper-case in llvm-objcopy. If this differs from the wider LLVM, we might want to change that, but not in this change.

229

i -> I
In addition, LLVM style is to put the termination size value in the initializer portion of for loops like so:
for (size_t I = 1, Size = NameAndFlags.size(); I < Size; ++I)

233

Since, as I mentioned, exact identity on error messages is unimportant, in my opinion, why not just make this a long message with all the supported values listed either on the same line, or with an explicit new line?

jhenderson requested changes to this revision.Jul 30 2018, 2:12 AM

Marking as "Request Changes" to clear the LGTM until my above points about section flags are addressed.

This revision now requires changes to proceed.Jul 30 2018, 2:12 AM
rupprecht marked 10 inline comments as done.Jul 30 2018, 2:39 PM

My main concern is still the interaction between these switches and existing flags. Specifically, I'm worried about what will happen if this is used on a section with lots of flags that cannot be controlled via this switch. A quick look at the spec shows that this will clear the following flags unconditionally: SHF_INFO_LINK, SHF_GROUP, SHF_TLS, SHF_COMPRESSED, and any OS/processor-specific flags. I haven't checked what GNU objcopy does, but in basically all of those, changing the flags could be bad, and have a negative impact on the ability to link the resultant object. Could you investigate what GNU does in each of these cases, and write tests to show the behaviour, please.

Here's what I've found from some basic testing of gnu objcopy:

  • SHF_ALLOC, SHF_WRITE, SHF_EXECINSTR, SHF_MERGE, SHF_STRINGS: covered already, appear to always be cleared
  • SHF_INFO_LINK, SHF_OS_NONCONFORMING, SHF_TLS: appear to be always cleared
  • SHF_COMPRESSED, SHF_EXCLUDE, SHF_GROUP, SHF_LINK_ORDER: appear to be preserved
  • CPU/processor dependent flags (e.g. SHF_X86_64_LARGE) are preserved

I added a mask so that preserved flags are never changed. It works for the current test cases, but I do plan to do more thorough testing (once this patch, and others, land) to see if this is really good enough.

Also, I feel like we should have two versions of the basic test: one with a section with no flags, and the other with a section with all the flags, if possible, to show which get set, which get unset, and which get ignored. Perhaps this would be best achieved with two sections, one each of the two different flag sets outlined in the previous sentence, and the main test just runs objcopy/readobj/FileCheck twice or similar.

Done -- I couldn't figure out how to get the prefixes configured to have it all in one test case, so I split off the all-flags test into a separate test file.

rupprecht updated this revision to Diff 158078.Jul 30 2018, 2:40 PM
  • Preserve certain SHF_* flags and tidy up error messages

The code looks correct to me (barring a couple of nits) for what you are trying to do, but I do have a couple of outstanding concerns, that I'd like to discuss further:

I'm really worried about clearing the TLS flag, and to a lesser extent, the INFO_LINK flag. Clearing the TLS flag would almost certainly break the file, and I have no idea what the repercussions of clearing INFO_LINK might be. I'd be tempted to break with GNU compatibility here and not clear either of them. If we really don't want to do that, I'd recommend including a "tls" input option. I'm not sure what to do with INFO_LINK, because it affects other fields in the section header. Are you sure that the INFO_LINK and TLS flags are being cleared by the renaming and not by something else (e.g. implicit meaning of section names)? What happens if you rename e.g. .tdata.foo to .tdata.bar (thus maintaining the standard TLS naming scheme)?

I've been thinking about this a bit more, and I honestly feel like it doesn't make sense to publish the somewhat arcane options that GNU objcopy supports, except for legacy support. As such, whilst I believe we should support things like "contents", "debug" etc, in order to maintain compatibility, I also feel like they shouldn't be in the help text, to try to "persuade" people to adopt a more sensible and meaningful usage going forwards. Going on from this, how would you feel about extending the syntax to make more sense for ELF, i.e. to have a flag name for each modifiable individual ELF flag (i.e. alloc, write, execinstr, tls, ...)? I don't think this needs to happen in this change, but I do feel like we shouldn't be satisfied with just GNU compatibility.

test/tools/llvm-objcopy/rename-section-flag-preserved.test
1 ↗(On Diff #158078)

Since this uses an OS-specific flag, will this work without x86_64 target support? Or do we need a REQUIRES?

3 ↗(On Diff #158078)

Nit: no need to abbreviate this. Just use "with" explicitly.

52 ↗(On Diff #158078)

Nit: You should indent this to the same level as the previous line.

tools/llvm-objcopy/ObjcopyOpts.td
32–35

Is this clang-formatted? It looks weird...

rupprecht marked 4 inline comments as done.Jul 31 2018, 11:11 AM

The code looks correct to me (barring a couple of nits) for what you are trying to do, but I do have a couple of outstanding concerns, that I'd like to discuss further:

I'm really worried about clearing the TLS flag, and to a lesser extent, the INFO_LINK flag. Clearing the TLS flag would almost certainly break the file, and I have no idea what the repercussions of clearing INFO_LINK might be. I'd be tempted to break with GNU compatibility here and not clear either of them. If we really don't want to do that, I'd recommend including a "tls" input option. I'm not sure what to do with INFO_LINK, because it affects other fields in the section header. Are you sure that the INFO_LINK and TLS flags are being cleared by the renaming and not by something else (e.g. implicit meaning of section names)? What happens if you rename e.g. .tdata.foo to .tdata.bar (thus maintaining the standard TLS naming scheme)?

I'm still not able to get SHF_TLS propagated, even with a real object file (not just something artificially created w/ yaml2obj):

$ cat /tmp/t.c
static __thread char tls = 5;
char touch() { return ++tls; }
$ clang -c /tmp/t.c -o /tmp/t.o && bin/llvm-readobj -sections /tmp/t.o
  ...
  Section {
    Index: 4
    Name: .tdata (87)
    Type: SHT_PROGBITS (0x1)
    Flags [ (0x403)
      SHF_ALLOC (0x2)
      SHF_TLS (0x400)
      SHF_WRITE (0x1)
    ]
  ...
$ objcopy --rename-section .tdata=.tdata1,alloc /tmp/t.o /tmp/t2.o && bin/llvm-readobj -sections /tmp/t2.o
  ...
  Section {
    Index: 3
    Name: .tdata1 (38)
    Type: SHT_NOBITS (0x8)
    Flags [ (0x3)
      SHF_ALLOC (0x2)
      SHF_WRITE (0x1)
    ]
  ...

Although note that PROGBITS also changed to NOBITS. Maybe we should just not support renaming sections that having SHF_TLS, SHF_INFO_LINK, or other special flags? This could be reaching into unsupported objcopy territory which gnu objcopy is silently allowing.

I don't really have an immediate preference, so I just added those two to the preserve mask -- let me know if I should add others.

I've been thinking about this a bit more, and I honestly feel like it doesn't make sense to publish the somewhat arcane options that GNU objcopy supports, except for legacy support. As such, whilst I believe we should support things like "contents", "debug" etc, in order to maintain compatibility, I also feel like they shouldn't be in the help text, to try to "persuade" people to adopt a more sensible and meaningful usage going forwards. Going on from this, how would you feel about extending the syntax to make more sense for ELF, i.e. to have a flag name for each modifiable individual ELF flag (i.e. alloc, write, execinstr, tls, ...)? I don't think this needs to happen in this change, but I do feel like we shouldn't be satisfied with just GNU compatibility.

Yep, I was thinking something like objcopy --rename-section .foo=.bar,shf_alloc,shf_execinstr,noshf_write... might be a good way to build that into this.

BTW, the use case I'm currently trying to fix is objcopy -I binary -Bi386:x86-64 -Oelf64-x86-64 --rename-section .data=.rodata,alloc,load,readonly,data,contents, mentioned here: http://lists.llvm.org/pipermail/llvm-dev/2017-June/113655.html. It would make my life (and I think many others' lives) a lot easier if we could at least support those options for drop-in compatibility, but agree that we don't need to go full-on bug for bug compatibility.

test/tools/llvm-objcopy/rename-section-flag-preserved.test
1 ↗(On Diff #158078)

No, it appears this works cross platform; I've separated all flags into a separate test case for clarity. (And it works fine even though the machine I'm testing on is not mips/arm/hex).

rupprecht marked an inline comment as done.
  • Cover full OS/proc flags in test, fix other comments
jhenderson accepted this revision.Aug 1 2018, 2:14 AM

I don't really have an immediate preference, so I just added those two to the preserve mask -- let me know if I should add others.

I think this is a reasonable approach for now. I can't think of any situation where converting between TLS and non-TLS (and vice versa) makes sense, since the relocations needed are different.

Yep, I was thinking something like objcopy --rename-section .foo=.bar,shf_alloc,shf_execinstr,noshf_write... might be a good way to build that into this.

I'd probably drop the "shf" (alloc is equivalent to SHF_ALLOC as far as I'm aware), and I'd make them all explicitly set (so no "noshf_*" flags), to avoid weird behaviour.

BTW, the use case I'm currently trying to fix is objcopy -I binary -Bi386:x86-64 -Oelf64-x86-64 --rename-section .data=.rodata,alloc,load,readonly,data,contents, mentioned here: http://lists.llvm.org/pipermail/llvm-dev/2017-June/113655.html. It would make my life (and I think many others' lives) a lot easier if we could at least support those options for drop-in compatibility, but agree that we don't need to go full-on bug for bug compatibility.

Sure. I think the only thing I'd change is to remove mentioning the GNU objcopy supported flags from a) the error message and b) the help text, until we agree on the "ideal" interface. We should still support those flags, but if we make them hidden, it means people will (hopefully) be more likely to use any new interface we propose going forward.

Okay, maybe here's my best, hopefully uncontroversial, suggestion, since it would be good to get this in for the LLVM 7.0 cut today(?): leave them in but mark them as "legacy" or for GNU compatibility e.g: Flags supported for GNU compatibility: alloc, load... in both error message and help text. If and when we add more flags, for a better/clearer UI, we can make that a separate sentence like Supported Flags: alloc, write, execinstr,... Additional flags supported for GNU compatibility: load, noload... etc.

LGTM with those changes.

tools/llvm-objcopy/ObjcopyOpts.td
32–35

I just realised that this is in the .td file, so clang-format might not be applicable, but it looks better now anyway, thanks!

This revision is now accepted and ready to land.Aug 1 2018, 2:14 AM

Okay, maybe here's my best, hopefully uncontroversial, suggestion, since it would be good to get this in for the LLVM 7.0 cut today(?): leave them in but mark them as "legacy" or for GNU compatibility e.g: Flags supported for GNU compatibility: alloc, load... in both error message and help text. If and when we add more flags, for a better/clearer UI, we can make that a separate sentence like Supported Flags: alloc, write, execinstr,... Additional flags supported for GNU compatibility: load, noload... etc.

That wording sounds good to me. I've updated the errors/tests.

LGTM with those changes.

Thanks for the thorough review!

tools/llvm-objcopy/ObjcopyOpts.td
32–35

Actually, I did use clang-format (forcing it with "git-clang-format --extensions td master") to fix my ugly manual formatting. It's similar enough to C++ that clang-format sometimes works well on small diff regions, although it doesn't seem like it's 100% ready for full-file formatting.

rupprecht updated this revision to Diff 158551.Aug 1 2018, 9:17 AM

Tweak flag message string

This revision was automatically updated to reflect the committed changes.