This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] --set-section-flags: allow "large" to add SHF_X86_64_LARGE
ClosedPublic

Authored by tkoeppe on Jun 19 2023, 4:03 AM.

Details

Summary

Currently, objcopy cannot set the new flag SHF_X86_64_LARGE. This change introduces the named flag "large" which translates to that section flag.

An "invalid argument" error is produced if a user attempts to set the flag on an architecture other than X86_64.

Diff Detail

Event Timeline

tkoeppe created this revision.Jun 19 2023, 4:03 AM
tkoeppe requested review of this revision.Jun 19 2023, 4:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 19 2023, 4:03 AM

yeah we should check if Config.OutputArch is ELF::EM_X86_64, in line with https://reviews.llvm.org/D148358

rnk added a subscriber: rnk.Jun 23 2023, 9:20 AM
rnk added inline comments.
llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
86

I think the thing to do is to:

  • Pass in the elf machine
  • Make NewFlags an output parameter (out parameters seem more consistent with local code than ErrorOr)
  • Return an Error instead
  • While handling the generic SecLarge flag, check the ELF machine, produce an error for non x86_64 machine values
  • Propagate that up out of setSectionFlagsAndType, whose caller can return Error

Yes, sorry, I forgot about that. Will do that for the next revision for sure!

tkoeppe added a comment.EditedJun 23 2023, 9:31 AM

yeah we should check if Config.OutputArch is ELF::EM_X86_64, in line with https://reviews.llvm.org/D148358

And what do we do if the flag is requested but we're on the wrong architecture? A fatal error? That might be better than silently ignoring an nonsensical request.

I see rnk provided a comprehensive strategy, thanks!

yeah we should check if Config.OutputArch is ELF::EM_X86_64, in line with https://reviews.llvm.org/D148358

And what do we do if the flag is requested but we're on the wrong architecture? A fatal error? That might be better than silently ignoring an nonsensical request.

yes and we should say that it's only supported on x86-64 in the error message

tkoeppe updated this revision to Diff 534124.Jun 23 2023, 5:02 PM
tkoeppe marked an inline comment as done.

Made it an error to try to set the "large" flag when not on an x86_64 architecture.

tkoeppe updated this revision to Diff 534126.Jun 23 2023, 5:04 PM

Fixed return type of setSectionFlagsAndType.

jhenderson requested changes to this revision.Jun 26 2023, 1:47 AM

Could you add some tests please?

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
74

EM isn't a particularly self-descriptive name. Let's call it EMachine like it's called in other places. Same goes throughout the changes.

86

I would prefer this to return an Expected<uint64_t> with the new flags. It's easier to work with than an output parameter + Error return. Otherwise, I agree with the rest of these comments (which I see you've already addressed).

96

I'm not convinced this is the right error value - you could argue that the input flag type is invalid, rather than the file type, so this would end up being errc::invalid_argument, I think.

97

I don't have a good solution to this currently, but the use of the command-line term in a libary function is a little concerning. A future tool might use a different name for the SHF_X86_64_LARGE flag, which would then make this error non-sensical. We should avoid referring to the command-line term this deep. For now, you could probably get away with referring directly to the flag name "SHF_X86_64_LARGE", and we can then change it to something else (e.g. SHF_*_LARGE) if another machine gains support.

700–701

I don't think this is the correct way of finding the machine architecture at this point in the code. If Config.OutputArch isn't set, the result will be a value that doesn't support EM_X86_64_LARGE, even though the actual object being created might be an X86_64 object (because it was already one). Take a look at how OutputElfType is created in executeObjcopyOnBinary for example.

This revision now requires changes to proceed.Jun 26 2023, 1:47 AM
tkoeppe marked 3 inline comments as done.Jun 26 2023, 2:21 AM

Could you add some tests please?

Could you please advise how to add an objcpy test that only runs under x86_64?

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
86

Sure thing, changed to Expected.

96

Yes, makes sense, changed!

97

I see, makes sense. I was thinking of giving the user a clue what they did wrong, and both the command line flag and this code are particular to objcpy, but I see your point. Changed.

tkoeppe updated this revision to Diff 534468.Jun 26 2023, 2:22 AM
tkoeppe marked 2 inline comments as done.

Addressed some review comments.

tkoeppe updated this revision to Diff 534475.Jun 26 2023, 2:54 AM

Minor fixes.

tkoeppe updated this revision to Diff 534481.Jun 26 2023, 3:14 AM

Changed how EMachine is determined; now only using Config.OutputArch if that is available, otherwise Obj.Machine.

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
700–701

Aha, I'm now using Config.OutputArch ? Config.OutputArch->EMachine : Obj.Machine, PTAL.

tkoeppe updated this revision to Diff 534510.Jun 26 2023, 5:11 AM

Added test, updated PreserveMask to include ~SHF_X86_64_LARGE.

Could you add some tests please?

Could you please advise how to add an objcpy test that only runs under x86_64?

I just added it unconditionally to llvm/test/tools/llvm-objcopy/ELF/set-section-flags.test for now. PTAL.

tkoeppe edited the summary of this revision. (Show Details)Jun 26 2023, 5:12 AM

One more thing: you need to add the new flag value to the list of supported flags and their meaning in the llvm-objcopy command guide (llvm/docs/CommandGuide).

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
97

Yeah, I get what your point is, but as this is a library, we can't assume all future users will have the same interface. You could do some more significant changes, if you wanted, which would allow us to get the "textual" version of the flag reported. One idea would be to simply pass the map of SectionFlag enum values to their textual versions into this method. Alternatively, a callback that takes a SectionFlag and converts it to the relevant form. I don't think it's necessary for this patch, but it's an option, if you want to improve the message.

105–106

Please make sure there is testing for this change.

There's also a potential issue here, if I'm not mistaken: on other architectures, there might be a flag with the same value as SHF_X86_64_LARGE, but which should be handled independently. You'll want a test case for this to show what should happen.

120–123

I'll be honest, I don't consider this code readable. I'd prefer the NewFlags initialization to be outside the if, as per the inline edit. Also, the use of auto hdies that this is an Expected return, which doesn't seem desirable.

700–701

Looks better. Please make sure you have testing that shows the behaviour with and without Config.OutputArch set (both to a supporting architecture and not) and with and without an Obj.Machine that supports the new flag. I believe that means you need 6 tests?

  • Config.OutputArch not set, Obj.Machine is EM_X86_64.
  • Config.OutputArch not set, Obj.Machine is not EM_X86_64.
  • Config.OutputArch set to x86-64, Obj.Machine is EM_X86_64.
  • Config.OutputArch set to x86-64, Obj.Machine is not EM_X86_64.
  • Config.OutputArch set to other machine, Obj.Machine is EM_X86_64.
  • Config.OutputArch set to other machine, Obj.Machine is not EM_X86_64.
719–720

Rather than repeat this code, would it make more sense to pass in the final value from outside the function?

tkoeppe updated this revision to Diff 534952.Jun 27 2023, 6:12 AM

Review comments.

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
105–106

Hm, an interesting wrinke:

According to https://github.com/llvm/llvm-project/commit/d67c1e129bfd6afa756fed7e3988110bf3d70260#diff-38b4803de5adf661f3d9c7e5e1602a1f097bb79a7828cf31b9c629eb3af94fd1R18, it seems that we actually want to _preserve_ the flag, and not allow it to be changed by the user. That seems unfortunate.

What do you think, can we reverse that decision?

120–123

I can replace the auto with the proper type, which is definitely an improvement.

I'd say keeping the scope small is generally a win, since there's less one has to keep in mind for longer than necessary. But if you strongly prefer your style, we can of course use that.

719–720

Wait, yes, we don't need to compute this at all, since Obj.Machine is already set to the right value on L624, so we can just use that directly. PTAL.

tkoeppe marked 2 inline comments as not done.Jun 27 2023, 6:12 AM
tkoeppe added inline comments.Jun 27 2023, 11:45 AM
llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
105–106

And you're right that the same numeric value is used for other flags in other architectures!

I think I need some overall guidance here:

  • We currently _preserve_ some so-called "OS flags".
  • Several different archs use the same numeric value for one of their OS flags.
  • But now we actually _don't_ want to preserve the "LARGE" flag but make it configurable.

Is that even a workable idea? It'd potentially break existing users who have been silently propagating the flag (even if that's unlikely). But we really need a way to set this flag on our own custom sections that we assemble by other means.

There seems to be a limitation of the current objcpy interface: it doesn't explicitly allow distinguishing "set", "clear", and "leave as is"; rather the set of volatile/preserved flags is somehow nebulously hardcoded, and there doesn't seem an evolutionary path to promote a flag from preserved to volatile.

What do you reckon?

jhenderson added inline comments.Jun 28 2023, 1:18 AM
llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
105–106

First up, and most importantly, we should aim for compatibility with GNU objcopy. As this is a new feature (at least, I'm assuming it's not available in GNU objcopy yet), you should discuss with them a) the flag name in the command-line (i.e. "large") and b) how this should/should not be preserved.

Asssuming we implement this feature as you are planning, I think it's reasonable to do the following:

  1. Flags that are processor specific should be "preserved" when the machine doesn't have that numeric flag value as a known OS flag. It's unclear what should happen if you are trying to change the machine type in this case (in other words, is it the new or old OS that's important?).
  2. Flags that are processor specific that are known for the relevant machine should NOT be preserved. However, we would need to consider backwards compatibility here: if a user is currently modifying the flags of a section that has the SHF_X86_64_LARGE flag, if we no longer preserve it by default, they will suddenly find the flag is lost when they update their version of llvm-objcopy. This seems a little unfriendly. The alternative is to preserve the flag if it exists, or add it if it doesn't and is requested, but this means users won't have the ability to drop the flag. Again, it may be worth discussing with the GNU developers. A migration path might be to introduce a new command-line option to specifically remove flags, and then leave some legacy behaviour in --set-section-flags to maintain the current behaviour there (whilst still allowing that option to add the flag, as suggested).

@MaskRay, any thoughts on this?

120–123

I'm not convinced you have to keep anything in mind. Here's my rationale:

  1. A variable that isn't used has no impact on the later code (obviously), so it doesn't matter if it is still in scope.
  2. A variable that is used later needs to be available, so it obviously still needs to be in scope.
  3. A variable that is redeclared later on will result in a compiler error. This will suggest that you either need to start using the existing variable (i.e. transition it from 1) to 2)) or to give the new one (or the old one) a different name.

The only cases I can think of where an existing variable could trip you up are:
a) when you create a new variable in the same scope, but forget to give it a type, and therefore accidentally reuse an existing variable.
b) when you create a new variable in an inner scope, and therefore start hiding the existing variable.
In the first of these two cases either it doesn't matter - if the variable wasn't used after this point, then overwriting it isn't an issue, whilst if the variable was used after this point, we're not in a position where limiting scope would be relevant. In the second case, at least with Visual Studio you can have warning levels turned up to warn you about this, but regardless, you don't impact the outer variables behaviour at all.

(Sorry, that ended up being a longer thought process than I intended!)

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
105–106

I think GNU objcopy's current behavior is to preserve all SHF_MASKOS | SHF_MASKPROC bits. For EM_X86_64 objects, it will make sense to not preserve SHF_X86_64_LARGE (if "large" is not specified with --set-section-flags, drop the section flag).

I think our behavior should behave like this as well. If a non-x86-64 object has the flag, we should preserve it, e.g. SHF_MIPS_GPREL == SHF_X86_64_LARGE.

I don't worry about compatibility here. The default -mlarge-data-threshold= is 65536 and it's difficult to get SHF_X86_64_LARGE sections with GCC (Clang doesn't emit this yet). I do not find real use cases of -mlarge-data-threshold= on https://sourcegraph.com/search .

llvm/test/tools/llvm-objcopy/ELF/set-section-flags.test
100

Consider adding a section with SHF_X86_64_FLAG and calling objcopy --set-section-flags on it but without specifying "large". Test that the flag is cleared.

A more descriptive subject may be: [llvm-objcopy] --set-section-flags: allow "large" to add SHF_X86_64_LARGE

tkoeppe retitled this revision from Add named section flag "large" to objcopy to [llvm-objcopy] --set-section-flags: allow "large" to add SHF_X86_64_LARGE.Jul 3 2023, 6:18 AM

Many thanks, @MaskRay!

I believe GNU objcopy doesn't have a "large" flag at the moment, but it does preserve any existing SHF_X86_64_LARGE sectopm flags. I also agree that it seems OK to change this, but I don't have a good industry-wise overview.

@jhenderson: Do you think this is acceptable?

tkoeppe updated this revision to Diff 536757.Jul 3 2023, 7:04 AM
tkoeppe marked an inline comment as done.

Made SHF_X86_64_LARGE flag non-preserved only on x86_64; added tests that the section flag is unset when the corresponding command line flag is absent.

llvm/test/tools/llvm-objcopy/ELF/set-section-flags.test
100

Done, I'm just using the previous file and calling --set-section-flags again without large.

clang-format is complaining in the pre-merge checks. Please make sure you have run it on all your new code.

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
105–106

All sounds good to me, thanks @MaskRay.

700–701

Pinging this comment about testing.

llvm/test/tools/llvm-objcopy/ELF/rename-section-flag-osproc-mask.test
30

Probably should have a blank line between the two blocks, for readability.

tkoeppe marked an inline comment as done.Jul 4 2023, 12:53 AM

clang-format is complaining in the pre-merge checks. Please make sure you have run it on all your new code.

Sorrry, I thought I had done that, but I must have missed a final pass -- fixing in the next revision!

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
700–701

Yes, will do, I just wanted to get agreement on the overall plan first!

MaskRay added inline comments.Jul 5 2023, 12:20 PM
llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
89

This can be tested when you add a non-EM_X86_64 test.

section flag 'large' ... may be more appropriate since for a non-EM_X86_64 machine, the flag is not called SHF_X86_64_LARGE.

jhenderson added inline comments.Jul 6 2023, 12:26 AM
llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
89

At the moment, the "large" option is only valid for X86_64, hence there's a one-to-one mapping of the option name to the flag name, and it's therefore safe to use the term on non-X86-64 machines (because "large" means "set the SHF_X86_64_LARGE" flag specifically).

As stated earlier, I don't think mentioning a command-line term in a library is a great idea, because another tool might use another mechanism to configure SecLarge. We could pass in a mapping of flag names (e.g. SecLarge) to their textual representation (e.g. large) so that the calling code could configure it appropriately, but that seems a bit overkill at the moment.

tkoeppe updated this revision to Diff 540193.Jul 13 2023, 3:10 PM

Updated both the "guide" manual and the usage help text.

Added a new "multiarch" test file that tests converting from x86_64 to MIPS and back.

One more thing: you need to add the new flag value to the list of supported flags and their meaning in the llvm-objcopy command guide (llvm/docs/CommandGuide).

Done.

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
700–701

I've added a new file with a set of tests now. Notes:

  • When the target is non-EM_X86_64, it is an error to set the "large" flag (this is tested).
  • When not setting a target, the preservation or lack thereof of the "large" flag is tested by rename-section-flag-osproc-mask.test.
  • When converting from EM_X86_64 to non-EM_X86_64, any present "large" flags are reinterpreted to the target's interpretation of that flag value.
  • When converting from non-EM_X86_64 to EM_X86_64 and --set-section-flag is used, the "large" flag is set based on whether it is specified on the command line. When --set-section-flag is not used, existing flags are reinterpreted.
tkoeppe marked 2 inline comments as done.Jul 13 2023, 3:25 PM
MaskRay accepted this revision.Jul 13 2023, 3:33 PM
MaskRay added inline comments.
llvm/test/tools/llvm-objcopy/ELF/set-section-flags-large-multiarch.test
5

The long option --output-target is fine, but the short -O is much more common and can make the RUN line easier to read.

38

We may need two SHF_X86_64_LARGE sections, one operated on by --set-section-flags. This way we can demonstrate that the untouched section keeps having the SHF_X86_64_LARGE flag.

llvm/tools/llvm-objcopy/ObjcopyOpts.td
88

While here, remove the trailing period. I think for TableGen help messages the convention is to omit the period. Some options may be inconsistent, just ignore them:)

MaskRay added inline comments.Jul 13 2023, 3:39 PM
llvm/docs/CommandGuide/llvm-objcopy.rst
156

architecture is not x86_64.

I think the "target format" term is somewhat ambiguous. Sometimes it refers specifically to the object file format (LLVM may call it BinaryFormat): ELF, COFF, etc. Here we just mean the architecture (sometimes called processor or machine, but I find "architecture" better).

tkoeppe marked 2 inline comments as done.Jul 13 2023, 3:48 PM
tkoeppe added inline comments.
llvm/docs/CommandGuide/llvm-objcopy.rst
156

Done, changed to "architecture".

tkoeppe marked 2 inline comments as done.Jul 13 2023, 3:56 PM
tkoeppe updated this revision to Diff 540205.Jul 13 2023, 3:59 PM

Wording/punctuation tweaks in the help text. Added another, untouched section to the new unit test.

aeubanks added inline comments.Jul 18 2023, 2:30 PM
llvm/docs/CommandGuide/llvm-objcopy.rst
156
jhenderson added inline comments.Jul 20 2023, 12:21 AM
llvm/docs/CommandGuide/llvm-objcopy.rst
141

Perhaps worth extending this to say "all object file formats or target architectures"?

156

I found "this" somewhat awkward in the comment (the semi-colon is still needed though). How about "rejected if target architecture is not x86_64"?

llvm/test/tools/llvm-objcopy/ELF/set-section-flags-large-multiarch.test
11

As you don't have many different variant test cases which have different expected outputs, you don't need so many check-prefixes. I suggest reducing them down to just the REINTERPRET* prefixes and one other (e.g. CHECK).

22

As this is only used in one case, I recommend moving it to immediately after that case's RUN line. Also, I suggest including the "error: " prefix, as it helps ensure we've captured the whole message.

31–33

Is it worth having a version of this section (i.e. one without SHF_X86_64_LARGE) that isn't operated on by --set-section-flags, just to complete the test matrix? (Possibly not)

tkoeppe marked 13 inline comments as done.Jul 20 2023, 5:20 AM
tkoeppe marked 5 inline comments as done.Jul 20 2023, 5:28 AM

I cannot figure out why clang-format rejects llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp: when I run the formatter script (clang/tools/clang-format/git-clang-format llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp) it doesn't make any changes.

llvm/test/tools/llvm-objcopy/ELF/set-section-flags-large-multiarch.test
31–33

Yes, because there's a difference in behaviour: if you don't operate on the section, the flag is reinterpreserved when you change architectures, but if you do operate on it, then it might get dropped. I initially had a separate objcopy run for this, but maskray@ suggested just rolling that into a section that's not operated on.

38

@jhenderson (See this comment.)

tkoeppe marked 3 inline comments as done.Jul 20 2023, 5:32 AM
tkoeppe added inline comments.
llvm/test/tools/llvm-objcopy/ELF/set-section-flags-large-multiarch.test
11

Oh yes, I see, nice. Done.

tkoeppe updated this revision to Diff 542452.Jul 20 2023, 5:34 AM
tkoeppe marked an inline comment as done.

Review comments: Simplify tests, improve documentation

I cannot figure out why clang-format rejects llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp: when I run the formatter script (clang/tools/clang-format/git-clang-format llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp) it doesn't make any changes.

Is it possible there's a difference in versions between the clang-format being run by the pre-merge checks and the version you're using? Anyway, see my inline edit. Hopefully that solves the pre-merge check issue.

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
97–99

When I run my local clang-format on this code, I get the inline edit.

llvm/test/tools/llvm-objcopy/ELF/set-section-flags-large-multiarch.test
31–33

I think you misunderstood my comment. At the moment, you have the following three cases (one case per section):

section without flag, specified by --set-section-flags (.foo)
section with flag, specified by --set-section-flags (.bar)
section with flag, not specified by --set-section-flags (.untouched)

I'm suggesting you add the missing case:
section without flag, not specified by --set-section-flags

Hopefully that case is trivial, but given you've got the rest of the test matrix, I think having coverage for that final combination would make sense.

tkoeppe marked an inline comment as done.Jul 25 2023, 2:29 AM
tkoeppe added inline comments.
llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
97–99

Aha, I see! With my original format, clang-format makes no changes. But if I make a local edit to put everything on a single line and _then_ run clang-format, then it produces your version.

It seems like both versions are acceptable to clang-format, but if it needs to make a change, it prefers your version?

tkoeppe marked an inline comment as done.Jul 25 2023, 2:39 AM
tkoeppe added inline comments.
llvm/test/tools/llvm-objcopy/ELF/set-section-flags-large-multiarch.test
31–33

Ah yes, of course, I did indeed misread your message. Sorry! Adding the test.

tkoeppe updated this revision to Diff 543897.Jul 25 2023, 2:40 AM

Add another test case; fix formatting.

jhenderson accepted this revision.Jul 25 2023, 2:43 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jul 25 2023, 2:43 AM

LGTM, thanks!

Likewise, thank _you_ for the review! This is going to be great help for our large binaries, much appreciated.

This revision was landed with ongoing or failed builds.Jul 25 2023, 9:47 AM
This revision was automatically updated to reflect the committed changes.