Page MenuHomePhabricator

[llvm-objcopy][NFC] Move core implementation of llvm-objcopy into separate library.
ClosedPublic

Authored by avl on Oct 5 2020, 5:29 AM.

Details

Summary

This patch moves core implementation of llvm-objcopy into ObjCopy library
(http://lists.llvm.org/pipermail/llvm-dev/2020-September/145075.html).
The functionality for parsing input options is left inside tools/llvm-objcopy.
The interface of ObjCopy library:

ObjCopy/ELF/ELFObjcopy.h

Error executeObjcopyOnIHex(const CopyConfig &Config, MemoryBuffer &In,
                           Buffer &Out);
Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer &In,
                                Buffer &Out);
Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::ELFObjectFileBase &In, Buffer &Out);

ObjCopy/COFF/COFFObjcopy.h

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::COFFObjectFile &In, Buffer &Out);

ObjCopy/MachO/MachOObjcopy.h

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::MachOObjectFile &In, Buffer &Out);

ObjCopy/wasm/WasmObjcopy.h

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::WasmObjectFile &In, Buffer &Out);

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
In D88827#3118304, @avl wrote:

ping,

Apologies for not getting to this - a number of other reviews have used up what time I've had to do reviews.

This patch is looking very close to being ready, but not quite there yet.

llvm/include/llvm/ObjCopy/COFF/COFFObjcopy.h
26–28

I've no issue with adding comments like these, but could you spin them off into separate patches, please, as they are independently useful, and this will help reduce the size of this patch.

Same applies for other new doc comments.

llvm/include/llvm/Object/ObjCopy/ConfigManager.h
24–25 ↗(On Diff #372901)

This should be a doc comment if it is needed. I don't think the comment is particularly useful in its current form. The functionality is fairly obvious from its interface.

33 ↗(On Diff #372901)

The functionality in this and similar functions for wasm and Mach-O should be moved to a .cpp file. It's not unlikely that it will change over time, so we don't want to force rebuilds due to it being unnecessarily in the header.

llvm/include/llvm/Object/ObjCopy/ObjCopy.h
12–18 ↗(On Diff #372901)

I think you can forward declare most of these classes, and include the headerse in the .cpp rather than having this header include them all.

I think about the only ones you likely need are Error.h, and vector.

llvm/lib/ObjCopy/COFF/Object.cpp
1

Actually, I think the way to solve this is to use some CMake functionality to stick specific files in IDE folders, so that they don't end up clashing: in the Visual Studio IDE, for example, I see 4 Object.h files next to each other in the Header Files group, but there's no indication which is which. Instead, we could create COFF, ELF etc sub-groups for the headers and source files. I believe https://cmake.org/cmake/help/latest/command/source_group.html is the relevant piece of CMake.

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
171

As you're renaming files, it's probably worth doing a complete clang-format of the whole file at the same time. Applies to all files with clang-format issues in otherwise untouched code.

llvm/lib/Object/CMakeLists.txt
32–47 ↗(On Diff #372901)

I thought ObjCopy was going to be its own distinct library, not a part of the Object library? It would make more sense to me for it to be separate, as it's a fairly distinct piece of functionality.

llvm/unittests/Object/ObjCopyTest.cpp
9 ↗(On Diff #372901)

Here and in other new files, if you haven't already, please make sure the include set is minimal for the code in the new file.

35 ↗(On Diff #372901)

As this function is called from several places, you may want to add some tracing to it: https://github.com/google/googletest/blob/main/docs/advanced.md#adding-traces-to-assertions

If you don't, and the test fails, you may not be able to easily tell which test is causing the failure.

39 ↗(On Diff #372901)

I believe YAML is an acronym, so should be all-caps in comments.

49–50 ↗(On Diff #372901)

I was under the impression that temporary files are supposed to be deleted automatically (I might be wrong), so you wouldn't need the FileRemover?

More generally, the need to create a concrete file in the unit test makes me wonder whether we could enhance the objcopy API to take an object of some kind that could represent data in memory (as an alternative to on-disk). I believe such an object hierarchy already exists within LLVM, although the name escapes me right now.

avl added inline comments.Nov 18 2021, 2:54 AM
llvm/lib/Object/CMakeLists.txt
32–47 ↗(On Diff #372901)

It was directly requested to implement it as part of Object library during the previous review iteration:

https://reviews.llvm.org/D88827#2344900

jhenderson added inline comments.Nov 18 2021, 3:07 AM
llvm/lib/Object/CMakeLists.txt
32–47 ↗(On Diff #372901)

Right, there was some opposition from @sbc100 though. I think we need to hear their thoughts on that.

If we do end up putting it in libObject, should the namespace be changed? Not sure either way.

avl added inline comments.Nov 18 2021, 3:27 AM
llvm/lib/Object/CMakeLists.txt
32–47 ↗(On Diff #372901)

Do you mean:

objcopy->object?
objcopy->objcopybase?

I think it is fine to have objcopy(like in this patch). Changing objcopy->object creates some names clashes. Thus we need to have other namespace than object for the objcopy part. Having objcopy inside Object library and inside llvm-objcopy tool seems to be no problem.

jhenderson added inline comments.Nov 18 2021, 3:28 AM
llvm/lib/Object/CMakeLists.txt
32–47 ↗(On Diff #372901)

I was referring to the first of those (using object instead of objcopy). If using the same namespace would be a problem, I don't see any issue with leaving as-is.

avl added inline comments.Nov 18 2021, 3:38 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

I was under the impression that temporary files are supposed to be deleted automatically (I might be wrong), so you wouldn't need the FileRemover?

I do not see the place where automatic removing is implemented. Also, there are examples of FileRemover usages in other unit tests:

https://github.com/llvm/llvm-project/blob/7b6790850968031fe1c098ed6dcc196ddc547ea5/llvm/unittests/Support/MemoryBufferTest.cpp#L105

More generally, the need to create a concrete file in the unit test makes me wonder whether we could enhance the objcopy API to take an object of some kind that could represent data in memory (as an alternative to on-disk). I believe such an object hierarchy already exists within LLVM, although the name escapes me right now.

I think it would be a good enhancement. I propose to do it in a separate patch though.

jhenderson added inline comments.Nov 18 2021, 4:03 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

I do not see the place where automatic removing is implemented

For Windows, see references to the OF_Delete flag in Path.inc. For Linux, the behaviour is configured higher up the stack. See RemoveFileOnSignal referenced by TempFile::create.

At the location you cited, FileRemover is needed, because we're not working with a TempFile there.

I think it would be a good enhancement. I propose to do it in a separate patch though.

Agreed, but I think it would be useful to see it before this patch gets committed, to avoid a bad interface landing.

avl added inline comments.Nov 18 2021, 4:36 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

Ah, the suggestion is to use sys::fs::TempFile::create instead of sys::fs::createTemporaryFile(). I see, thanks.

jhenderson added inline comments.Nov 18 2021, 4:38 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

I didn't realise the two did different things, but it would make sense to do as you are saying, yes.

avl added inline comments.Nov 18 2021, 8:27 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

I think we could not use sys::fs::TempFile::create() for reusing automatic removing. Please look at the use case:

Expected<sys::fs::TempFile> Temp =
  sys::fs::TempFile::create("a.temp-unittest-%%%%%%", Mode);

Config.Common.OutputFilename = Temp.TmpName;

// Call executeObjcopyOnBinary()
std::error_code EC;
raw_fd_ostream OutStream(Temp->FD, false);
Error Err = objcopy::executeObjcopyOnBinary(Config, *Obj.get(), OutStream);
OutStream.flush();

// if we call Temp.keep() here then the autoremoving functionality would be lost.
// if we call Temp.discard() then the file would be removed.

// if we not call Temp.keep() or Temp.discard()
// then we would try to open the temporarily file second time
// (inside ObjectFile::createObjectFile) which is undesirable.

// Load and check copied file.
Expected<OwningBinary<ObjectFile>> Result =
    ObjectFile::createObjectFile(Config.Common.OutputFilename);
jhenderson added inline comments.Nov 19 2021, 12:00 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

Fair enough. Stick with this for now, although that maybe suggests we should change the executeObjcopyOnBinary interface sooner rather than later, so that it can take an in-memory buffer, instead of an on-disk file.

avl updated this revision to Diff 388956.Nov 22 2021, 9:53 AM

addressed comments:

  1. deleted doc comments(will add them with the follow-up patch).
  2. moved methods implementation into .cpp file.
  3. optimized headers.
  4. did not modify cmake to create IDE folders(will do it with the follow-up patch).
  5. added traces in the test assertions.
  6. rewrote test to use inmemory output file.

A couple of nits remaining, otherwise basically looks good. I'd still like:

  1. @sbc100's response to the library division. I have a very slight preference for separate, but don't care enough about it to push for it, if others are opposed. Adding them as a reviewer in case it gets their attention.
  2. I'd like to see patches based on this patch for the other issues that your deferring (particularly the IDE one).

Once those are dealt with, I'll give this an LGTM.

llvm/include/llvm/Object/ObjCopy/ObjCopy.h
12 ↗(On Diff #388956)

I think you can forward declare Binary, Archive and NewArchiveMember, right, so that you don't need this header?

llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
634–635

More clang-formatting that can be done as you rename the file.

Oh, ideally I'd also get a second reviewer to give this a once-over, in case I've missed anything too.

avl added a comment.Nov 23 2021, 2:23 AM

@sbc100's response to the library division. I have a very slight preference for separate, but don't care enough about it to push for it, if others are opposed. Adding them as a reviewer in case it gets their attention.

My own preference is also to make ObjCopy to be separate library, though I also do not want to push it if others want opposite. I read this message https://reviews.llvm.org/D88827#2344900 and this https://reviews.llvm.org/D88827#2346399 and this https://reviews.llvm.org/D88827#2346567 responses as a consensus on making it to be part of Object library. If we do not have a consensus, I am open to discuss it further.

llvm/include/llvm/Object/ObjCopy/ObjCopy.h
12 ↗(On Diff #388956)

std::vector<NewArchiveMember> wants to know sizeof (NewArchiveMember).
So, we could not have forward declaration here. i.e. we still need ArchiveWriter.h header.

avl updated this revision to Diff 389132.Nov 23 2021, 2:36 AM

addressed clang-format issue.

In D88827#3148315, @avl wrote:

@sbc100's response to the library division. I have a very slight preference for separate, but don't care enough about it to push for it, if others are opposed. Adding them as a reviewer in case it gets their attention.

My own preference is also to make ObjCopy to be separate library, though I also do not want to push it if others want opposite. I read this message https://reviews.llvm.org/D88827#2344900 and this https://reviews.llvm.org/D88827#2346399 and this https://reviews.llvm.org/D88827#2346567 responses as a consensus on making it to be part of Object library. If we do not have a consensus, I am open to discuss it further.

I changed my mind a bit since my early comment you linked, having gone cold on the review and now having come back to it. Here's my thinking: the object manipulation performed by the objcopy code is based on an internal Object class that has nothing to do with the object classes within the libObject library. By putting the two inside the same library, we risk confusion ("which type of Object do I need for this functionality?"). There are fundamental differences which would make reusing the libObject classes for the objcopy code less than ideal - it may not even be possible without a lot of work.

Regarding the Archive writing code being in libObject: it's my tentative opinion that archive functionality shouldn't be in libObject at all: an archive isn't really an object itself: it's a group of objects. The functionality for archives is completely unrelated to the functionality for other file types supported by the object library. As such, I think it should actually be moved into a separate libArchive library or similar.

Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.

@echristo, @alexander-shaposhnikov, do either of you have any further thoughts? What do you think of my points above? I'm not strongly opposed to it going in libObject, but just think it makes a little more sense to be separate. Also, would either of you be able to give this a review more generally?

llvm/include/llvm/Object/ObjCopy/ObjCopy.h
12 ↗(On Diff #388956)

You learn something new everyday. I thought the type used in std::vector could also be forward declared if it was just used in the return signature. Guess I was wrong.

avl added a comment.Dec 7 2021, 3:16 AM

@echristo, @alexander-shaposhnikov Do you have objections to doing Objcopy to be a separate library instead of putting it into the existing Object library?

I agree with concerns raised by @jhenderson (https://reviews.llvm.org/D88827#3148452) and @sbc100 (https://reviews.llvm.org/D88827#2348439): Doing Objcopy as a separate library would allow to reduce code inter-dependency and reduce size of resulting code/package(when objcopy functionality is not neccessary).

avl added a comment.Dec 20 2021, 3:41 AM

@echristo, @alexander-shaposhnikov Would you mind taking part in this review, please? Your opinion might help to make a progress with it. During previous iterations, you preferred the variant when ObjCopy code would be moved into the Object library. There are arguments to make ObjCopy be a separate library. Do you have objections to that?

avl added a comment.Jan 13 2022, 4:11 AM

@jhenderson James, what do you think would be right to do if we do not have a response from others?

Both solutions(make it be a separate library or make it be part of Object) are OK for me.
It would be good if we might have a progress with this patch. Probably we can make a decision and continue with it?

@avl, I'd suggest emailing them directly to prod them, as they aren't responding. If they don't respond after that, I think you should adopt what active reviewers (i.e. me!) are suggesting.

On the note of activity, I will be off for at least 2 weeks, starting tomorrow, so may not be around to continue reviewing in that time!

avl added a comment.Jan 13 2022, 4:17 AM

@avl, I'd suggest emailing them directly to prod them, as they aren't responding. If they don't respond after that, I think you should adopt what active reviewers (i.e. me!) are suggesting.

On the note of activity, I will be off for at least 2 weeks, starting tomorrow, so may not be around to continue reviewing in that time!

I see, thanks! will follow that advice.

In D88827#3148315, @avl wrote:

@sbc100's response to the library division. I have a very slight preference for separate, but don't care enough about it to push for it, if others are opposed. Adding them as a reviewer in case it gets their attention.

My own preference is also to make ObjCopy to be separate library, though I also do not want to push it if others want opposite. I read this message https://reviews.llvm.org/D88827#2344900 and this https://reviews.llvm.org/D88827#2346399 and this https://reviews.llvm.org/D88827#2346567 responses as a consensus on making it to be part of Object library. If we do not have a consensus, I am open to discuss it further.

I changed my mind a bit since my early comment you linked, having gone cold on the review and now having come back to it. Here's my thinking: the object manipulation performed by the objcopy code is based on an internal Object class that has nothing to do with the object classes within the libObject library. By putting the two inside the same library, we risk confusion ("which type of Object do I need for this functionality?"). There are fundamental differences which would make reusing the libObject classes for the objcopy code less than ideal - it may not even be possible without a lot of work.

The original idea is that these classes -could- be used for this. If that's not true we should go back and fix that rather than have a separate set. That said, we do have a separate set which complicates matters. I think the question I'd ask is "how can we get from here to there?". No straightforward answer here for sure.

Regarding the Archive writing code being in libObject: it's my tentative opinion that archive functionality shouldn't be in libObject at all: an archive isn't really an object itself: it's a group of objects. The functionality for archives is completely unrelated to the functionality for other file types supported by the object library. As such, I think it should actually be moved into a separate libArchive library or similar.

From my perspective: ideally an archive should just be a specific type of object that contains other objects (similar to perhaps a shared library if we did that). Reading it just means cracking it, sharing the archive format information, and then iterating through each object accordingly. Writing would be the reverse. Any thoughts on where we're on a different page? :)

Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.

Are you thinking about things just shipping .a or .o files or something else? Most platforms that I can think of can remove unneeded object code in a binary link so I'm curious about the concern here.

@echristo, @alexander-shaposhnikov, do either of you have any further thoughts? What do you think of my points above? I'm not strongly opposed to it going in libObject, but just think it makes a little more sense to be separate. Also, would either of you be able to give this a review more generally?

I'll do my best :)

-eric

llvm-objcopy has its own data models for some binary format specific things, so placing it under the llvm::object namespace may lead to some conflicts. Placing it into a sub-namespace does not avoid the conflict.
If it keeps using the llvm::objcopy namespace, then placing it under lib/Object may cause some confusion. A separate directory looks good to me.


An archive is a quite different container format. So I get the point that there is a question about whether it fits into lib/Object.
On the other hand, an archive is sometimes an input file type and used nearly indistinguishable with other file typed in syntax (ld.lld, llvm-readelf, ...). Having it in lib/Object seems fine to me...


By placing some less associated code in the same library, there is a risk that dependencies tend to become mixed.
Layered component design has some advantages but in practice it is easy to run into a situation that for all of {a,b,c}.cpp to include all of {a,b,c}.h...
Different lib/* directories tend to force developers to think harder on the layering.

I changed my mind a bit since my early comment you linked, having gone cold on the review and now having come back to it. Here's my thinking: the object manipulation performed by the objcopy code is based on an internal Object class that has nothing to do with the object classes within the libObject library. By putting the two inside the same library, we risk confusion ("which type of Object do I need for this functionality?"). There are fundamental differences which would make reusing the libObject classes for the objcopy code less than ideal - it may not even be possible without a lot of work.

The original idea is that these classes -could- be used for this. If that's not true we should go back and fix that rather than have a separate set. That said, we do have a separate set which complicates matters. I think the question I'd ask is "how can we get from here to there?". No straightforward answer here for sure.

I guess I might have misrepresented the internals of the llvm-objcopy classes: to populate them, they do use the libObject code. The llvm-objcopy Object class acts as another layer that provides manipulation functionality on top of the libObject object files. In some ways, it could actually be used purely for reading though, which raises some interesting thoughts which I can't exactly express. If we wanted to change the existing libObject code to support direct manipulation, without this extra layer, we'd need to change how those classes store data - at the moment, they just consist of references into memory buffers, rather than copies stored within the program. As such, it is impossible to change these references without changing how data is backed. The llvm-objcopy library provides this additional layer in essence. I think this shows that, at least at the moment, the llvm-objcopy Object code is a separate layer. Whethere that then means the separate layer should be in a separate library is probably a different point (it means it can be, but doesn't mean it must be.

Regarding the Archive writing code being in libObject: it's my tentative opinion that archive functionality shouldn't be in libObject at all: an archive isn't really an object itself: it's a group of objects. The functionality for archives is completely unrelated to the functionality for other file types supported by the object library. As such, I think it should actually be moved into a separate libArchive library or similar.

From my perspective: ideally an archive should just be a specific type of object that contains other objects (similar to perhaps a shared library if we did that). Reading it just means cracking it, sharing the archive format information, and then iterating through each object accordingly. Writing would be the reverse. Any thoughts on where we're on a different page? :)

I don't think we really disagree here (although I wouldn't call an archive an object in those words). The point I was largely addressing was the earlier point made by @alexander-shaposhnikov about how the object library already allows manipulation of archives, unlike the other object formats.

Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.

Are you thinking about things just shipping .a or .o files or something else? Most platforms that I can think of can remove unneeded object code in a binary link so I'm curious about the concern here.

Yeah, we have some cases where we distribute .a files (or more specifically Windows .lib files, but the principle is the same). A couple of concrete examples to illustrate the points: firstly, if I want to build LLD, I don't need the llvm-objcopy code. Let's say llvm-objcopy included 5 files. If those 5 files are included in libObject, they need to be compiled and added to that library, as part fo the build process for building LLD, even though those files aren't used in LLD's link, thus unnecessarily slowing down the build as a whole. Secondly, if I have a distribution that wants to ship LLVM libraries sufficient for developers to read objects, we'd end up distributing those same 5 files as part of libObject, unnecessarily. The end executables won't be any bigger in either case of course.

@echristo, @alexander-shaposhnikov, do either of you have any further thoughts? What do you think of my points above? I'm not strongly opposed to it going in libObject, but just think it makes a little more sense to be separate. Also, would either of you be able to give this a review more generally?

I'll do my best :)

-eric

FYI, I'm on leave for the next 2-3 weeks, so won't be able to respond to further points after this for a while.

llvm-objcopy has its own data models for some binary format specific things, so placing it under the llvm::object namespace may lead to some conflicts. Placing it into a sub-namespace does not avoid the conflict.

Not sure I follow how placing things into, say, llvm::object::objcopy would have potential for conflicts beyond the current state of the code?

avl added a comment.Jan 14 2022, 4:22 AM

Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.

Are you thinking about things just shipping .a or .o files or something else? Most platforms that I can think of can remove unneeded object code in a binary link so I'm curious about the concern here.

Yeah, we have some cases where we distribute .a files (or more specifically Windows .lib files, but the principle is the same). A couple of concrete examples to illustrate the points: firstly, if I want to build LLD, I don't need the llvm-objcopy code. Let's say llvm-objcopy included 5 files. If those 5 files are included in libObject, they need to be compiled and added to that library, as part fo the build process for building LLD, even though those files aren't used in LLD's link, thus unnecessarily slowing down the build as a whole. Secondly, if I have a distribution that wants to ship LLVM libraries sufficient for developers to read objects, we'd end up distributing those same 5 files as part of libObject, unnecessarily. The end executables won't be any bigger in either case of course.

there is also a scenario when end executables would be bigger(though it is not about shipping, this is for debug builds). It is assumed that --gc-sections or similar will remove unused code and then the final executable would be small. That is not exactly true for debug builds. --gc-sections will not remove unused debuginfo. Thus end debug executables would contain unnecessary debuginfo and then will be bigger.

In D88827#3243279, @avl wrote:

Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.

Are you thinking about things just shipping .a or .o files or something else? Most platforms that I can think of can remove unneeded object code in a binary link so I'm curious about the concern here.

Yeah, we have some cases where we distribute .a files (or more specifically Windows .lib files, but the principle is the same). A couple of concrete examples to illustrate the points: firstly, if I want to build LLD, I don't need the llvm-objcopy code. Let's say llvm-objcopy included 5 files. If those 5 files are included in libObject, they need to be compiled and added to that library, as part fo the build process for building LLD, even though those files aren't used in LLD's link, thus unnecessarily slowing down the build as a whole. Secondly, if I have a distribution that wants to ship LLVM libraries sufficient for developers to read objects, we'd end up distributing those same 5 files as part of libObject, unnecessarily. The end executables won't be any bigger in either case of course.

there is also a scenario when end executables would be bigger(though it is not about shipping, this is for debug builds). It is assumed that --gc-sections or similar will remove unused code and then the final executable would be small. That is not exactly true for debug builds. --gc-sections will not remove unused debuginfo. Thus end debug executables would contain unnecessary debuginfo and then will be bigger.

This isn't generally true: even in debug builds and without --gc-sections, linkers tend not to link in unused objects into the final executable.

avl added a comment.Jan 14 2022, 4:41 AM

This isn't generally true: even in debug builds and without --gc-sections, linkers tend not to link in unused objects into the final executable.

I remember there was a patch to make it possible - https://reviews.llvm.org/D54747 . Which finally was not applied and current behavior that it is not working(if I did not miss something new).

In D88827#3243298, @avl wrote:

This isn't generally true: even in debug builds and without --gc-sections, linkers tend not to link in unused objects into the final executable.

I remember there was a patch to make it possible - https://reviews.llvm.org/D54747 . Which finally was not applied and current behavior that it is not working(if I did not miss something new).

The difference here is that if you feed the linker a specific set of object files, it will include all of them (minus what --gc-sections removes). But if you feed the linker a static library, it will only include the object files that actually are referenced (unless the library is added with the --whole-archive option or something similar).

avl added a comment.Jan 14 2022, 5:34 AM
In D88827#3243298, @avl wrote:

This isn't generally true: even in debug builds and without --gc-sections, linkers tend not to link in unused objects into the final executable.

I remember there was a patch to make it possible - https://reviews.llvm.org/D54747 . Which finally was not applied and current behavior that it is not working(if I did not miss something new).

The difference here is that if you feed the linker a specific set of object files, it will include all of them (minus what --gc-sections removes). But if you feed the linker a static library, it will only include the object files that actually are referenced (unless the library is added with the --whole-archive option or something similar).

Ah, right. We are talking about library here.

@jhenderson accurately expressed my considerations / original reasoning. Sorry about the very late reply. Perhaps, it's worth adding a few words.
I kind of envisioned the following minimalistic interface exposed by the library (essentially just a single function):

Error copy(const Config &C, const object::Object &O, raw_ostream &Out)

My hope was that we would refactor Config significantly and clean it up with 2 major goals in mind: (1) have proper separation of concerns and structure: Config { ... <common options>, COFFConfig, ELFConfig, MachOConfig };
(in particular, the fields which are not required anymore would be removed) (2) the functionality which is specific to the tool (e.g. creating a .DWO file) would live in the tool and would not be a part of the library. The same applies e.g. to setting file attributes.
The question wether it should be a separate library of a part of libObject - I had mixed feelings, based on the interface I thought it was appropriate to have it in libObject, but making it a separate library also makes sense to me.

avl added a comment.Jan 17 2022, 9:03 AM

@alexander-shaposhnikov My understaning is that current state of this patch is pretty close to above description:

I kind of envisioned the following minimalistic interface exposed by the library (essentially just a single function):

Error copy(const Config &C, const object::Object &O, raw_ostream &Out)

Error executeObjcopyOnBinary(const MultiFormatConfig &Config,
                             object::Binary &In, raw_ostream &Out);

This function does dispatch on input formats and calls format-specific function.

It is not a single function though. I think we may refactor it further with separate patches.

My hope was that we would refactor Config significantly and clean it up with 2 major goals in mind: (1) have proper separation of concerns and structure: Config { ... <common options>, COFFConfig, ELFConfig, MachOConfig };
(in particular, the fields which are not required anymore would be removed)

Config was refactored by following patches :

https://reviews.llvm.org/D99055
https://reviews.llvm.org/D102277
https://reviews.llvm.org/D103260

(2) the functionality which is specific to the tool (e.g. creating a .DWO file) would live in the tool and would not be a part of the library. The same applies e.g. to setting file attributes.

these patches did it possible to leave creating a .DWO file, setting attributes, creating streams inside the tool:

https://reviews.llvm.org/D98582
https://reviews.llvm.org/D98511
https://reviews.llvm.org/D91028
https://reviews.llvm.org/D98426
https://reviews.llvm.org/D95478

avl added a comment.Feb 7 2022, 1:49 AM

@echristo Are you satisfied with response by @jhenderson ?

following is a reminder, if messages are lost in the history:

That is @echristo message - https://reviews.llvm.org/D88827#3242326
That is @jhenderson response - https://reviews.llvm.org/D88827#3243266

avl updated this revision to Diff 409152.Feb 15 2022, 10:01 PM

move implementation into the separate library ObjCopy.

jhenderson accepted this revision.Feb 16 2022, 1:35 AM

LGTM, with two nits.

llvm/include/llvm/ObjCopy/ObjCopy.h
23–28

This doesn't need to be in the header, right? I'd get rid of it, as it's not really part fo the public interface (executeObjcopyOnArchive would be the right thing to use, I believe(?)).

llvm/tools/llvm-objcopy/llvm-objcopy.cpp
9

Delete this blank line then resort the headers.

This revision is now accepted and ready to land.Feb 16 2022, 1:35 AM
avl added inline comments.Feb 16 2022, 2:19 AM
llvm/include/llvm/ObjCopy/ObjCopy.h
23–28

This also is used in lib/ObjCopy/MachO/MachOObjcopy.cpp. But, indeed, it might be removed from the public interface - include/llvm/ObjCopy/ObjCopy.h. Would it be OK if declaration of createNewArchiveMembers would be put into the new local header llvm/lib/ObjCopy/Archive.h ?

jhenderson added inline comments.Feb 16 2022, 2:20 AM
llvm/include/llvm/ObjCopy/ObjCopy.h
23–28

Yes, I think that should be okay.

avl updated this revision to Diff 409266.Feb 16 2022, 7:59 AM

addressed comments - moved createNewArchiveMembers into separate header.

avl edited the summary of this revision. (Show Details)Feb 17 2022, 1:55 AM
This revision was landed with ongoing or failed builds.Feb 17 2022, 2:12 AM
This revision was automatically updated to reflect the committed changes.
avl added a comment.Feb 17 2022, 2:14 AM

@jhenderson Thank you for the review!

thakis added a subscriber: thakis.Feb 17 2022, 5:53 AM

As far as I know, you also have to update the entries in clang/docs/tools/clang-formatted-files.txt.

llvm/lib/CMakeLists.txt
24

Can we be consistent about capitalization of "ObjCopy" or "Objcopy"? The new lib capitalizes the C, but e.g. ObjcopyOptions.cpp doesn't. (Don't care which way, just be consistent).

% git grep Objcopy | wc -l
     150
% git grep ObjCopy | wc -l
      77
llvm/unittests/ObjCopy/CMakeLists.txt
1

I think you have to add_subdirectory this in llvm/unittests/CMakeLists.txt to get this test binary actually built.

avl added a comment.Feb 17 2022, 6:25 AM

@thakis Thanks! will implement comments with followup commit.

avl added a comment.Feb 17 2022, 8:36 AM

@thakis addressed comments in https://reviews.llvm.org/rG0b57e6c46b707c0e7a123efe82abf3c1e7b5a503. As to the consistent naming - will do it in the separate patch.

I'm also seeing a bunch of warnings that look related:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/ELF/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/MachO/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/MachO/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Reader.o) in output file used for input files: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Reader.o and: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Reader.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Writer.o) in output file used for input files: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Writer.o and: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Writer.o (due to use of basename, truncation, blank padding or duplicate input files)
avl added a comment.Feb 22 2022, 6:28 AM

I'm also seeing a bunch of warnings that look related:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/ELF/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/MachO/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/MachO/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Reader.o) in output file used for input files: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Reader.o and: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Reader.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Writer.o) in output file used for input files: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Writer.o and: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Writer.o (due to use of basename, truncation, blank padding or duplicate input files)

My understanding is that it is not a problem for linking. But it may be a problem if one would want to extract the files from libLLVMObjCopy.a. To avoid this problem we need to rename files. Will do it with separate patch.