This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy][NFC] Move core implementation of llvm-objcopy into separate library.
ClosedPublic

Authored by avl on Oct 5 2020, 5:29 AM.

Details

Summary

This patch moves core implementation of llvm-objcopy into ObjCopy library
(http://lists.llvm.org/pipermail/llvm-dev/2020-September/145075.html).
The functionality for parsing input options is left inside tools/llvm-objcopy.
The interface of ObjCopy library:

ObjCopy/ELF/ELFObjcopy.h

Error executeObjcopyOnIHex(const CopyConfig &Config, MemoryBuffer &In,
                           Buffer &Out);
Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer &In,
                                Buffer &Out);
Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::ELFObjectFileBase &In, Buffer &Out);

ObjCopy/COFF/COFFObjcopy.h

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::COFFObjectFile &In, Buffer &Out);

ObjCopy/MachO/MachOObjcopy.h

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::MachOObjectFile &In, Buffer &Out);

ObjCopy/wasm/WasmObjcopy.h

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::WasmObjectFile &In, Buffer &Out);

Diff Detail

Event Timeline

avl created this revision.Oct 5 2020, 5:29 AM
avl requested review of this revision.Oct 5 2020, 5:30 AM
avl edited the summary of this revision. (Show Details)Oct 5 2020, 5:58 AM
grimar added a comment.Oct 6 2020, 1:02 AM

This looks fine to me generally. Have a suggestion though:
in few places there are changes that are related to clang-formatting of original code it seems.
I'd commit them as a separate NFC cleanup (you don't need a review for that) and then rebase this diff.

llvm/unittests/ObjCopy/ObjCopyTest.cpp
40

Perhaps just fail inside?

51

You can avod having this ASSERT_TRUE if you use cast<T>, I think.
Also, you can use T & instead of a pointer, because it is expected that the value is always non-null.

100

Do you need Machine: EM_X86_64? By default it is EM_NONE, so it should work without an explicit value I guess.

This change needs some analysis & review, it might take some time (especially because this week we have the LLVM conference).

A few considerations / observations / questions.

  1. While this diff moves the implementation of llvm-objcopy into a library it seems like the current interfaces / design are not ideal for a library.

For example, CopyConfig essentially represents parsed command-line options and it is "string-heavy".
Another manifestation of this problem is that the "public" headers (include/llvm/ObjCopy/...) expose a great deal of the internal implementation details etc.

  1. Designing a good interface is a nontrivial task. It is important to understand the scope of the problem and potentially dissect it into subparts.

For example if the plan (at least initially) is to reuse the code for reading / writing object files then one of the first steps would be factoring out (and, probably, cleaning up) the model ("class Object") and
exposing the minimal interface for reading / writing . In particular, the internal details of implementation and the associated complexity (e.g. class Reader, class Writer) would live inside the library.

avl updated this revision to Diff 296459.Oct 6 2020, 7:36 AM

addressed comments.

avl added a comment.Oct 6 2020, 7:40 AM

@alexshap

This refactoring tries to reuse this part of the functionality:

Error executeObjcopyOnBinary(const CopyConfig &Config, object::ELFObjectFileBase &In, Buffer &Out);

get source file "In", apply transformations described by "Config", write output into the "Out".

i.e. it is not trying to refactor low level of the objcopy code("class Object" and others). It still should be internal thing of ObjCopy library.

I agree that the patch currently makes internal details (class Object, class Reader, class Writer) to be public.
They should be hidden inside library code. I addressed this into the new version of the patch.

avl updated this revision to Diff 296524.Oct 6 2020, 12:44 PM

apply clang-tidy comments.

Looks pretty good from my point of view. Some general comments:

  1. The functions in the library header need doxygen style comments to explain the interface. Possibly these could be added in a follow-up patch.
  2. I think if we are moving files around, now is a good time to run a blanket clang-format on the files. Whether that should be done as part of the patch or a separate follow-on one, I don't know.
  3. Probably not this patch, but we should consider our testing strategy for the libray and llvm-objcopy going forward. Some examples for this dicussion: should we move the existing lit tests? Should we port/duplicate some of them as gtest unit tests? How should new library features be tested (gtest or lit)? How should new llvm-strip/objcopy/... options be tested (bearing in mind in theory there should be library testing etc etc)? I think this sort of discussion probably belongs on the mailing list.

I think keeping the surface area of the library down to just the executeObjcopyOn* functions plus the config struct is a pretty good example of a facade style design pattern, and I think makes sense in this context for at least the first version. This assumes that the use-case is a simple in-process read-in/transform/write-out (possibly to a memory buffer for further operations by other tools). We might want to expose other bits of the process and provide other ways of driving the objcopy process in the future (e.g. making it more interactive somehow), but I think that is an extension for later. We do need to nail down the initial use-case(s) though to make sure we don't produce something that isn't useful.

llvm/include/llvm/ObjCopy/Buffer.h
23–24 ↗(On Diff #296524)

Maybe as part of a separate patch, it would be worth taking a look at this TODO. It would be great if the Buffer could be removed from the library API and generic LLVM buffers used instead (for example an in-memory buffer or a file buffer, depending on what people want to do).

llvm/include/llvm/ObjCopy/CopyConfig.h
126–127 ↗(On Diff #296524)

This comment probably needs updating to better match the new usage - but see out-of-line comment.

llvm/include/llvm/ObjCopy/ObjCopy.h
20–23

I think details of the output format shouldn't be described in the comment - theoretically objcopy could even mutate from one object format to another (see e.g. the IHEX stuff).

27–29
llvm/lib/ObjCopy/COFF/Object.cpp
1

Whilst you're moving this and the equivalent files for other formats around, could you please rename them to be obvious from the filename which format they are for (same goes for their headers), please? For example, COFF/Object.cpp -> COFF/COFFObject.cpp. The reason for this is that when using the Visual Studio IDE, all the "Object.cpp" files end up listed next to each other in the file browser, and the only way of figuring out which is which is by opening them and seeing.

llvm/lib/ObjCopy/wasm/Reader.h
9–10
avl added a comment.Oct 8 2020, 12:13 AM

The functions in the library header need doxygen style comments to explain the interface. Possibly these could be added in a follow-up patch.

Ok.

I think if we are moving files around, now is a good time to run a blanket clang-format on the files. Whether that should be done as part of the patch or a separate follow-on one, I don't know.

Ok, but I think it is better to do in a separate patch.

Probably not this patch, but we should consider our testing strategy for the libray and llvm-objcopy going forward. Some examples for this dicussion: should we move the existing lit tests? Should we port/duplicate some of them as gtest unit tests? How should new library features be tested (gtest or lit)? How should new llvm-strip/objcopy/... options be tested (bearing in mind in theory there should be library testing etc etc)? I think this sort of discussion probably belongs on the mailing list.

OK, I will start that thread soon.

llvm/include/llvm/ObjCopy/Buffer.h
23–24 ↗(On Diff #296524)

agreed, but I think it is better to do in separate patch.

llvm/include/llvm/ObjCopy/CopyConfig.h
126–127 ↗(On Diff #296524)

ok.

llvm/include/llvm/ObjCopy/ObjCopy.h
20–23

ok.

llvm/lib/ObjCopy/COFF/Object.cpp
1

ok.

I have some general comments / concerns (in addition to the inline comment).
The interface of the library is important and once it's committed and people start using the library in multiple places it might be harder to make changes / fix issues
(unfortunately this has already happened in LLVM a few times in the past) .

  1. CopyConfig as a part of the interface in its current form seems to be not "in a perfect shape". E.g. some ELF-specific options are inside the struct ELF, some others are just regular fields.

Some fields make sense in the context of the tool but they don't (at least at quick glance) in the context of a library function.

  1. class Buffer - I agree with @jhenderson's comment. Just want to note that in general there are multiple ways how to address it, but this requires some thinking.

1, 2 - perhaps, this can be refactored / reorganized / better documented, but it would be great to have it done before the introduction of the library.

I'd like to point out that not all the operations supported by llvm-objcopy can be performed in-memory (e.g. splitting dwo will create a new file). If a library function has such side effects (creates new objects on the local file system) it would be good to have it documented.

cc: @mtrent , @dblaikie, @echristo

llvm/lib/ObjCopy/COFF/Object.cpp
1

@jhenderson, I'm sorry to disagree, but renaming files this way doesn't seem to be a good idea and the provided justification doesn't appear to be sufficient.
Since this file contains the implementation of what's declared in Object.h I would strongly prefer to have it named Object.cpp given it is already located in the corresponding folder. Visual Studio IDE might have some peculiarities but having consistent naming is important, adding such prefixes doesn't seem to be a good approach.

avl added a comment.Oct 8 2020, 5:32 AM

1, 2 - perhaps, this can be refactored / reorganized / better documented, but it would be great to have it done before the introduction of the library.

if possible - I propose to do these points as next patches after this(to make some progress on it). If not - I would work on preliminary patches.

llvm/lib/ObjCopy/COFF/Object.cpp
1

Would it be OK, If both of the files would be renamed Object.h -> COFFObject.h and Object.cpp->COFFObject.cpp ?

avl added inline comments.Oct 9 2020, 8:07 AM
llvm/include/llvm/ObjCopy/Buffer.h
23–24 ↗(On Diff #296524)

@alexshap @jhenderson

Speaking of class Buffer refactoring. I do not think that this comment is fully valid:

// The class Buffer abstracts out the common interface of FileOutputBuffer and
// WritableMemoryBuffer so that the hierarchy of Writers depends on this
// abstract interface and doesn't depend on a particular implementation.
// TODO: refactor the buffer classes in LLVM to enable us to use them here
// directly.

It suggests to create some common interface for FileOutputBuffer and WritableMemoryBuffer.
Which is assumed to look similar to this:

class Buffer {
  StringRef Name;

public:
  virtual ~Buffer();
  virtual Error allocate(size_t Size) = 0;
  virtual uint8_t *getBufferStart() = 0;
  virtual Error commit() = 0;

  explicit Buffer(StringRef Name) : Name(Name) {}
  StringRef getName() const { return Name; }
};

There exists a problem with methods commit() and allocate(). commit() is a redundant for WritableMemoryBuffer.
adding it to WritableMemoryBuffer would require to patch all current usages of WritableMemoryBuffer.
So it looks incorrectly to use it for common parent interface of FileOutputBuffer and WritableMemoryBuffer.
allocate() suggests another way of buffer creation. Currently, buffers are created by static creation methods:

static std::unique_ptr<WritableMemoryBuffer>
getNewMemBuffer(size_t Size, const Twine &BufferName = "");

static Expected<std::unique_ptr<FileOutputBuffer>>
create(StringRef FilePath, size_t Size, unsigned Flags = 0);

Adding "virtual Error allocate(size_t Size)" would lead to creation empty buffer by static creation method
and then call to allocate(). This does not seem a good addition to the already existed FileOutputBuffer and WritableMemoryBuffer. FileOutputBuffer and WritableMemoryBuffer assume another use cases than Buffer.

Actually, what is neccessary by objcopy is method:

uint8_t *createBuffer ( size_t Size );

All other functionality is redundant and could be removed from objcopy.

what do you think about following design?

LazyBuffer {
  StringRef Name;

  virtual StringRef getName() const { return Name; }
  virtual uint8_t *createBuffer ( size_t Size ) = 0;
};

MemoryLazyBuffer : public LazyBuffer {
  virtual uint8_t *createBuffer ( size_t Size ) {
    Buffer = WritableMemoryBuffer::getNewMemBuffer(Size, Name);
    return Buffer->getBufferStart();
  }

  std::unique_ptr<WritableMemoryBuffer> Buffer;
};

FileLazyBuffer : public LazyBuffer {
  virtual uint8_t *createBuffer ( size_t Size ) {
    Buffer = FileOutputBuffer::create(Name, Size);
    return Buffer->getBufferStart();
  }

  std::unique_ptr<FileOutputBuffer> Buffer;
};

Usage:

static Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
                                    LazyBuffer &Out);


MemoryLazyBuffer MB("name");
executeObjcopyOnBinary(Config, Input, MB);

FileLazyBuffer MB("name");
executeObjcopyOnBinary(Config, Input, MB);
if (MB.Buffer)
  MB.Buffer->commit();

Another alternative is that library always writes to general MemoryBuffer :

static Expected<std::unique_ptr<MemoryBuffer>> executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In);

and later this MemoryBuffer would be written into the file by llvm-objcopy.cpp.

What do you think?

I like the direction this is going, I'll take more of a deep look soon, but wanted to ask: "should this be in Object rather than a separate library?" When I'd originally asked for this to be split into its own library I'd thought that it would get added into libobject.

Thoughts?

-eric

I like the direction this is going, I'll take more of a deep look soon, but wanted to ask: "should this be in Object rather than a separate library?" When I'd originally asked for this to be split into its own library I'd thought that it would get added into libobject.

Thoughts?

-eric

One of the ideals would be to have writable versions of the various ObjectFile classes defined by the Object library. I know this was worked on last year as part of GSOC by @abrachet, but it didn't really end up getting to a usable point. I kind of like the separation of concerns here - the proposed Objcopy library would handle manipulation of object files, whilst the Object library is primarily for parsing and inspecting them. The former builds on the latter, but the latter doesn't need to care about the former. Thus a user who wrote an object dumping tool wouldn't need the Objcopy library. However, I don't have a strong opinion on this, so if a good design could be presented to resolve that, I'd be happy.

In D88827#2319015, @alexshap wrote:

I have some general comments / concerns (in addition to the inline comment).
The interface of the library is important and once it's committed and people start using the library in multiple places it might be harder to make changes / fix issues
(unfortunately this has already happened in LLVM a few times in the past) .

Could we maybe put a big note at the top of the library header saying that the API is still work-in-progress and shouldn't be used, until we've got to the final point? I feel like moving the code first is a good step and then we can iterate on the design once it's in place (I agree that CopyConfig probably needs more work).

llvm/include/llvm/ObjCopy/Buffer.h
23–24 ↗(On Diff #296524)

I'm not sure I've looked at the intricacies of the different buffer classes to know what the right approach is. However, if modifying the existing buffers seems like it won't work/will be too invasive, this seems like a fair approach (it's the "Adapter" design pattern in action, I believe).

I'm not sure I'd call it LazyBuffer, although I don't have a specific better name (maybe just ObjcopyBuffer, depending on how generic we want it to be). Also, unless the objcopy code actually needs the name for anything (aside from error messages, I'm not sure what that would be), I'd not include that in the interface.

avl added a comment.Oct 14 2020, 5:39 AM

I like the direction this is going, I'll take more of a deep look soon, but wanted to ask: "should this be in Object rather than a separate library?" When I'd originally asked for this to be split into its own library I'd thought that it would get added into libobject.

Thoughts?

I think, for the start, it is probably better to make it as a separate library. It could be refactored later and some part could probably be moved into Object library(Object.h/.cpp, Reader.h/.cpp, /Writer.h/.cpp).
If the ObjCopy library would be used in many other places in the end - then it would make sense to put it into the Object library, otherwise it would be better to live it as separate library.
Though, I am not against moving this ObjCopy library into the Object library, if there is opinion that it would be better.

avl added inline comments.Oct 20 2020, 8:59 AM
llvm/lib/ObjCopy/COFF/Object.cpp
1

@jhenderson, I'm sorry to disagree, but renaming files this way doesn't seem to be a good idea and the provided justification doesn't appear to be sufficient.
Since this file contains the implementation of what's declared in Object.h I would strongly prefer to have it named Object.cpp given it is already located in the corresponding folder. Visual Studio IDE might have some peculiarities but having consistent naming is important, adding such prefixes doesn't seem to be a good approach.

@alexshap Could you explain this renaming thing, please? i.e. if both header file COFF/Object.h and src file COFF/Object.cpp would be renamed(COFF/COFFObject.h, COFF/COFFObject.cpp), would it be OK?

avl updated this revision to Diff 299664.Oct 21 2020, 6:23 AM

addressed comments: renamed files, added doxygen comments into library headers(except CopyConfig.h).

I propose to go ahead with this patch and continue with follow-up patches:

  1. Remove Buffer.h/Buffer.cpp.
  2. cleanup CopyConfig.h.
  3. remove handling of SplitDWO from the library.

I really do think this needs to be under object. It can be separate in there as lib/Object/Copy if you want, but I don't think it should be a parallel directory. Hopefully the updates for the move aren't overly onerous here.

-eric

I really do think this needs to be under object. It can be separate in there as lib/Object/Copy if you want, but I don't think it should be a parallel directory. Hopefully the updates for the move aren't overly onerous here.

-eric

Won't that mean that any tool that just wants to read in object files (e.g. lld) will need to build this extra code for no reason? Not a huge deal I guess (especially for lld which due to LTO pulls in all of llvm!) but I imagine there are many tools that just want to read (and not modify) objects.

My 0.02$: (perhaps, this should have been mentioned earlier) the current class CopyConfig contains e.g. file names (again, imo it is good enough for a tool, but not good enough for a library) and this means that if somebody wants to add a section to an object file he won't be able to accomplish this task using the current interface without creating extra files. It kind of defeats the idea. To solve this problem proper abstractions should be introduced / the code needs to be refactored. Personally I would strongly prefer to see the following iterative approach here: refactor the current code in llvm-objcopy step by step until it's ready to be moved into a library with a clean and easy-to-use interface. Maybe I'm missing something, but doing refactoring post factum seems to be a less controllable process and might get us to the state where the code has been move out of the tool, the interface has been modified to accomplish a very specific task and the rest (burden) will stay there for years creating more issues than benefits, moreover, it would introduce some risks.
Regarding where to place these functions - into libObject or create a separate library - libObject already contains several write* functions, (e.g. for archives), so indeed, putting this group of functions (e.g. one can use a bit less verbose name - copy(...)) into libObject seems to be quite natural.

llvm/lib/ObjCopy/COFF/Object.cpp
1

I'm very sorry, but i still think that the old names were good, adding these prefixes is unnecessary and makes things less intuitive (e.g. class Object is described in Object.h).

I really do think this needs to be under object. It can be separate in there as lib/Object/Copy if you want, but I don't think it should be a parallel directory. Hopefully the updates for the move aren't overly onerous here.

-eric

Won't that mean that any tool that just wants to read in object files (e.g. lld) will need to build this extra code for no reason? Not a huge deal I guess (especially for lld which due to LTO pulls in all of llvm!) but I imagine there are many tools that just want to read (and not modify) objects.

That will depend on the archiver/linker behaviour, I expect, but at least for standard ld.lld behaviour (I can't speak for others), only the objects that actually are needed are pulled from the archive. Therefore, for example, if LLD didn't reference anything in the objcopy codebase, it wouldn't actually use those archive members, and therefore it wouldn't be added to the tools code. Even if it did, linker options like --gc-sections or equivalent likely would cause the unused code to be removed at link time, if the tool is built appropriately.

@alexshap's point about other files in the Object library allowing writing is persuasive to me. I think an Objcopy sub-directory would make sense though, to avoid the confusion of having both an ObjectFile and Object class/header file etc in Object. This is assuming of course we don't want to jump into refactoring the two classes so that they become one. I suspect that way might prove tricky to get right.

I'm somewhat ambivalent as to whether we move then refactor or refactor then move. If people feel that the latter approach is the right one, then so be it.

Regarding the naming of Object.h, that's fine, leave it as-is. I suspect with some small CMake changes, it should be possible to get the files to appear in a corresponding "solution folder" within the VS IDE, which would help disambiguate things.

I really do think this needs to be under object. It can be separate in there as lib/Object/Copy if you want, but I don't think it should be a parallel directory. Hopefully the updates for the move aren't overly onerous here.

-eric

Won't that mean that any tool that just wants to read in object files (e.g. lld) will need to build this extra code for no reason? Not a huge deal I guess (especially for lld which due to LTO pulls in all of llvm!) but I imagine there are many tools that just want to read (and not modify) objects.

That will depend on the archiver/linker behaviour, I expect, but at least for standard ld.lld behaviour (I can't speak for others), only the objects that actually are needed are pulled from the archive. Therefore, for example, if LLD didn't reference anything in the objcopy codebase, it wouldn't actually use those archive members, and therefore it wouldn't be added to the tools code. Even if it did, linker options like --gc-sections or equivalent likely would cause the unused code to be removed at link time, if the tool is built appropriately.

You are right, the size of resulting tools won't be increased. I was referring to the fact that such a tool would transitively depend on more sources and would take longer to compile, and would become dirty more often on average after a git pull. For example, its good that one can build llvm-objdump without compiling the whole of llvm.

@alexshap's point about other files in the Object library allowing writing is persuasive to me. I think an Objcopy sub-directory would make sense though, to avoid the confusion of having both an ObjectFile and Object class/header file etc in Object. This is assuming of course we don't want to jump into refactoring the two classes so that they become one. I suspect that way might prove tricky to get right.

I'm somewhat ambivalent as to whether we move then refactor or refactor then move. If people feel that the latter approach is the right one, then so be it.

Regarding the naming of Object.h, that's fine, leave it as-is. I suspect with some small CMake changes, it should be possible to get the files to appear in a corresponding "solution folder" within the VS IDE, which would help disambiguate things.

avl added a comment.Oct 22 2020, 11:17 PM

To summarize comments - I am going to:

  1. create preliminary patch for Buffers.
  2. create preliminary patch for CopyConfig.
  3. move handling of SplitDWO into llvm-objcopy.cpp.
  4. move this functionality into Object library.
  5. do not rename files.
avl added a comment.Nov 1 2020, 1:08 PM

@alexshap @jhenderson @echristo

Folks, Before creating a patch I would like to consult what would be the best option to refactor the Buffer class.
One of the alternatives is described here https://reviews.llvm.org/D88827#2321871 ("Adapter" approach).

I think the better solution would be to use raw_ostream instead of buffers:

current:

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::ELFObjectFileBase &In, Buffer &Out);

new:

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::ELFObjectFileBase &In, raw_ostream &Out);

Generally, using streams could allow us to reduce memory usages. No need to load all data into the memory - the data could be streamed through a smaller buffer. Opposite, the current WritableMemoryBuffer, used in llvm-objcopy, would allocate whole data into the memory. Thus replacing WritableMemoryBuffer with raw_ostream
would allow minimizing memory requirements.

FileOutputBuffer(used by llvm-objcopy) has an advantage over raw_fd_ostream(which might be used if we would like
to store data into the file). FileOutputBuffer::createOnDiskBuffer() allows to use memory mapped file. The similar functionality
could be implemented for raw_fd_ostream. It is possible to add preallocate() method into raw_ostream.

class raw_ostream {

void preallocate(uint64_t size);

}

That method, implemented for raw_fd_ostream, could create a memory-mapped file. The streamed data would be written
into that memory file then. Thus we would be able to use memory-mapped files with raw_fd_ostream.

So, it seems we could use raw_ostream instead of Buffer without losing functionality.
It seems to me that raw_ostream is a good abstraction here and it would be good to use it for llvm-objcopy.

So what is your opinion, would it be OK to use raw_ostream?
Or should we use "Adapter" approach, from https://reviews.llvm.org/D88827#2321871 ?

Using a stream approach sounds reasonable. I don't really know what the benefits are of using a memory mapped file versus other options (I vaguely recall from some older work that they improve performance, but am not sure if that is still the case or not). The one concern I'd have with a stream for writing output is if we ever need to jump back and forth within the object for some reason. Without looking at the existing objcopy code, I don't know if there are any instances where this happens though.

By the way, I think the preallocate method might be better termed reserve as it sounds like it solves a similar intent as std::vector::reserve to me.

avl added a comment.Nov 2 2020, 5:03 AM

Using a stream approach sounds reasonable. I don't really know what the benefits are of using a memory mapped file versus other options (I vaguely recall from some older work that they improve performance, but am not sure if that is still the case or not). The one concern I'd have with a stream for writing output is if we ever need to jump back and forth within the object for some reason. Without looking at the existing objcopy code, I don't know if there are any instances where this happens though.

If there is such necessity then instead of raw_ostream there could be used raw_pwrite_stream, which allows such seek&update functionality.

During this refactoring effort, I am not going to rewrite the existing objcopy writing code.
So I plan to do things this way:

Error executeObjcopyOnBinary(Config,In, raw_ostream &Out) {
  // TODO: refactor "writing" code to output into "raw_ostream &Out"
  // TOFO: directly, without MemBuffer in the middle.
  MemBuffer.allocate(Size)
  // existing writing code.
  MemBuffer.commit();
  Out.write(MemBuffer.getBufferStart(), MemBuffer.getBufferSize());
}

I propose to use raw_ostream now and replace it with raw_pwrite_stream later, if it would be necessary.
If it would be hard to use raw_ostream during such postponed rewriting then we could change raw_ostream into raw_pwrite_stream at that moment.

If it is already clear that we need to use raw_pwrite_stream then I will use it within my patch.

By the way, I think the preallocate method might be better termed reserve as it sounds like it solves a similar intent as std::vector::reserve to me.

Ok

avl updated this revision to Diff 372901.Sep 16 2021, 5:03 AM

addressed comments(usages of Buffers are replaced with streams, CopyConfig is refactored,
handling of SplitDWO is moved into llvm-objcopy.cpp, implementation of
llvm-objcopy is moved into the Object library).

avl edited the summary of this revision. (Show Details)Sep 16 2021, 6:18 AM
avl added a comment.Sep 16 2021, 6:30 AM

@jhenderson @alexander-shaposhnikov

Would you mind to take a look at this review, please? It implemented all requests which were done previously https://reviews.llvm.org/D88827#2349119.

In D88827#3003632, @avl wrote:

@jhenderson @alexander-shaposhnikov

Would you mind to take a look at this review, please? It implemented all requests which were done previously https://reviews.llvm.org/D88827#2349119.

Sorry I haven't got to this - there are seeral higher-priority items that I need to be focusing on. It's still on my radar though.

In D88827#3118304, @avl wrote:

ping,

Apologies for not getting to this - a number of other reviews have used up what time I've had to do reviews.

This patch is looking very close to being ready, but not quite there yet.

llvm/include/llvm/ObjCopy/COFF/COFFObjcopy.h
26–28

I've no issue with adding comments like these, but could you spin them off into separate patches, please, as they are independently useful, and this will help reduce the size of this patch.

Same applies for other new doc comments.

llvm/include/llvm/Object/ObjCopy/ConfigManager.h
24–25 ↗(On Diff #372901)

This should be a doc comment if it is needed. I don't think the comment is particularly useful in its current form. The functionality is fairly obvious from its interface.

33 ↗(On Diff #372901)

The functionality in this and similar functions for wasm and Mach-O should be moved to a .cpp file. It's not unlikely that it will change over time, so we don't want to force rebuilds due to it being unnecessarily in the header.

llvm/include/llvm/Object/ObjCopy/ObjCopy.h
12–18 ↗(On Diff #372901)

I think you can forward declare most of these classes, and include the headerse in the .cpp rather than having this header include them all.

I think about the only ones you likely need are Error.h, and vector.

llvm/lib/ObjCopy/COFF/Object.cpp
1

Actually, I think the way to solve this is to use some CMake functionality to stick specific files in IDE folders, so that they don't end up clashing: in the Visual Studio IDE, for example, I see 4 Object.h files next to each other in the Header Files group, but there's no indication which is which. Instead, we could create COFF, ELF etc sub-groups for the headers and source files. I believe https://cmake.org/cmake/help/latest/command/source_group.html is the relevant piece of CMake.

llvm/lib/ObjCopy/ELF/ELFObjcopy.cpp
171

As you're renaming files, it's probably worth doing a complete clang-format of the whole file at the same time. Applies to all files with clang-format issues in otherwise untouched code.

llvm/lib/Object/CMakeLists.txt
32–47 ↗(On Diff #372901)

I thought ObjCopy was going to be its own distinct library, not a part of the Object library? It would make more sense to me for it to be separate, as it's a fairly distinct piece of functionality.

llvm/unittests/Object/ObjCopyTest.cpp
9 ↗(On Diff #372901)

Here and in other new files, if you haven't already, please make sure the include set is minimal for the code in the new file.

35 ↗(On Diff #372901)

As this function is called from several places, you may want to add some tracing to it: https://github.com/google/googletest/blob/main/docs/advanced.md#adding-traces-to-assertions

If you don't, and the test fails, you may not be able to easily tell which test is causing the failure.

39 ↗(On Diff #372901)

I believe YAML is an acronym, so should be all-caps in comments.

49–50 ↗(On Diff #372901)

I was under the impression that temporary files are supposed to be deleted automatically (I might be wrong), so you wouldn't need the FileRemover?

More generally, the need to create a concrete file in the unit test makes me wonder whether we could enhance the objcopy API to take an object of some kind that could represent data in memory (as an alternative to on-disk). I believe such an object hierarchy already exists within LLVM, although the name escapes me right now.

avl added inline comments.Nov 18 2021, 2:54 AM
llvm/lib/Object/CMakeLists.txt
32–47 ↗(On Diff #372901)

It was directly requested to implement it as part of Object library during the previous review iteration:

https://reviews.llvm.org/D88827#2344900

jhenderson added inline comments.Nov 18 2021, 3:07 AM
llvm/lib/Object/CMakeLists.txt
32–47 ↗(On Diff #372901)

Right, there was some opposition from @sbc100 though. I think we need to hear their thoughts on that.

If we do end up putting it in libObject, should the namespace be changed? Not sure either way.

avl added inline comments.Nov 18 2021, 3:27 AM
llvm/lib/Object/CMakeLists.txt
32–47 ↗(On Diff #372901)

Do you mean:

objcopy->object?
objcopy->objcopybase?

I think it is fine to have objcopy(like in this patch). Changing objcopy->object creates some names clashes. Thus we need to have other namespace than object for the objcopy part. Having objcopy inside Object library and inside llvm-objcopy tool seems to be no problem.

jhenderson added inline comments.Nov 18 2021, 3:28 AM
llvm/lib/Object/CMakeLists.txt
32–47 ↗(On Diff #372901)

I was referring to the first of those (using object instead of objcopy). If using the same namespace would be a problem, I don't see any issue with leaving as-is.

avl added inline comments.Nov 18 2021, 3:38 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

I was under the impression that temporary files are supposed to be deleted automatically (I might be wrong), so you wouldn't need the FileRemover?

I do not see the place where automatic removing is implemented. Also, there are examples of FileRemover usages in other unit tests:

https://github.com/llvm/llvm-project/blob/7b6790850968031fe1c098ed6dcc196ddc547ea5/llvm/unittests/Support/MemoryBufferTest.cpp#L105

More generally, the need to create a concrete file in the unit test makes me wonder whether we could enhance the objcopy API to take an object of some kind that could represent data in memory (as an alternative to on-disk). I believe such an object hierarchy already exists within LLVM, although the name escapes me right now.

I think it would be a good enhancement. I propose to do it in a separate patch though.

jhenderson added inline comments.Nov 18 2021, 4:03 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

I do not see the place where automatic removing is implemented

For Windows, see references to the OF_Delete flag in Path.inc. For Linux, the behaviour is configured higher up the stack. See RemoveFileOnSignal referenced by TempFile::create.

At the location you cited, FileRemover is needed, because we're not working with a TempFile there.

I think it would be a good enhancement. I propose to do it in a separate patch though.

Agreed, but I think it would be useful to see it before this patch gets committed, to avoid a bad interface landing.

avl added inline comments.Nov 18 2021, 4:36 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

Ah, the suggestion is to use sys::fs::TempFile::create instead of sys::fs::createTemporaryFile(). I see, thanks.

jhenderson added inline comments.Nov 18 2021, 4:38 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

I didn't realise the two did different things, but it would make sense to do as you are saying, yes.

avl added inline comments.Nov 18 2021, 8:27 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

I think we could not use sys::fs::TempFile::create() for reusing automatic removing. Please look at the use case:

Expected<sys::fs::TempFile> Temp =
  sys::fs::TempFile::create("a.temp-unittest-%%%%%%", Mode);

Config.Common.OutputFilename = Temp.TmpName;

// Call executeObjcopyOnBinary()
std::error_code EC;
raw_fd_ostream OutStream(Temp->FD, false);
Error Err = objcopy::executeObjcopyOnBinary(Config, *Obj.get(), OutStream);
OutStream.flush();

// if we call Temp.keep() here then the autoremoving functionality would be lost.
// if we call Temp.discard() then the file would be removed.

// if we not call Temp.keep() or Temp.discard()
// then we would try to open the temporarily file second time
// (inside ObjectFile::createObjectFile) which is undesirable.

// Load and check copied file.
Expected<OwningBinary<ObjectFile>> Result =
    ObjectFile::createObjectFile(Config.Common.OutputFilename);
jhenderson added inline comments.Nov 19 2021, 12:00 AM
llvm/unittests/Object/ObjCopyTest.cpp
49–50 ↗(On Diff #372901)

Fair enough. Stick with this for now, although that maybe suggests we should change the executeObjcopyOnBinary interface sooner rather than later, so that it can take an in-memory buffer, instead of an on-disk file.

avl updated this revision to Diff 388956.Nov 22 2021, 9:53 AM

addressed comments:

  1. deleted doc comments(will add them with the follow-up patch).
  2. moved methods implementation into .cpp file.
  3. optimized headers.
  4. did not modify cmake to create IDE folders(will do it with the follow-up patch).
  5. added traces in the test assertions.
  6. rewrote test to use inmemory output file.

A couple of nits remaining, otherwise basically looks good. I'd still like:

  1. @sbc100's response to the library division. I have a very slight preference for separate, but don't care enough about it to push for it, if others are opposed. Adding them as a reviewer in case it gets their attention.
  2. I'd like to see patches based on this patch for the other issues that your deferring (particularly the IDE one).

Once those are dealt with, I'll give this an LGTM.

llvm/include/llvm/Object/ObjCopy/ObjCopy.h
12 ↗(On Diff #388956)

I think you can forward declare Binary, Archive and NewArchiveMember, right, so that you don't need this header?

llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
634–635

More clang-formatting that can be done as you rename the file.

Oh, ideally I'd also get a second reviewer to give this a once-over, in case I've missed anything too.

avl added a comment.Nov 23 2021, 2:23 AM

@sbc100's response to the library division. I have a very slight preference for separate, but don't care enough about it to push for it, if others are opposed. Adding them as a reviewer in case it gets their attention.

My own preference is also to make ObjCopy to be separate library, though I also do not want to push it if others want opposite. I read this message https://reviews.llvm.org/D88827#2344900 and this https://reviews.llvm.org/D88827#2346399 and this https://reviews.llvm.org/D88827#2346567 responses as a consensus on making it to be part of Object library. If we do not have a consensus, I am open to discuss it further.

llvm/include/llvm/Object/ObjCopy/ObjCopy.h
12 ↗(On Diff #388956)

std::vector<NewArchiveMember> wants to know sizeof (NewArchiveMember).
So, we could not have forward declaration here. i.e. we still need ArchiveWriter.h header.

avl updated this revision to Diff 389132.Nov 23 2021, 2:36 AM

addressed clang-format issue.

In D88827#3148315, @avl wrote:

@sbc100's response to the library division. I have a very slight preference for separate, but don't care enough about it to push for it, if others are opposed. Adding them as a reviewer in case it gets their attention.

My own preference is also to make ObjCopy to be separate library, though I also do not want to push it if others want opposite. I read this message https://reviews.llvm.org/D88827#2344900 and this https://reviews.llvm.org/D88827#2346399 and this https://reviews.llvm.org/D88827#2346567 responses as a consensus on making it to be part of Object library. If we do not have a consensus, I am open to discuss it further.

I changed my mind a bit since my early comment you linked, having gone cold on the review and now having come back to it. Here's my thinking: the object manipulation performed by the objcopy code is based on an internal Object class that has nothing to do with the object classes within the libObject library. By putting the two inside the same library, we risk confusion ("which type of Object do I need for this functionality?"). There are fundamental differences which would make reusing the libObject classes for the objcopy code less than ideal - it may not even be possible without a lot of work.

Regarding the Archive writing code being in libObject: it's my tentative opinion that archive functionality shouldn't be in libObject at all: an archive isn't really an object itself: it's a group of objects. The functionality for archives is completely unrelated to the functionality for other file types supported by the object library. As such, I think it should actually be moved into a separate libArchive library or similar.

Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.

@echristo, @alexander-shaposhnikov, do either of you have any further thoughts? What do you think of my points above? I'm not strongly opposed to it going in libObject, but just think it makes a little more sense to be separate. Also, would either of you be able to give this a review more generally?

llvm/include/llvm/Object/ObjCopy/ObjCopy.h
12 ↗(On Diff #388956)

You learn something new everyday. I thought the type used in std::vector could also be forward declared if it was just used in the return signature. Guess I was wrong.

avl added a comment.Dec 7 2021, 3:16 AM

@echristo, @alexander-shaposhnikov Do you have objections to doing Objcopy to be a separate library instead of putting it into the existing Object library?

I agree with concerns raised by @jhenderson (https://reviews.llvm.org/D88827#3148452) and @sbc100 (https://reviews.llvm.org/D88827#2348439): Doing Objcopy as a separate library would allow to reduce code inter-dependency and reduce size of resulting code/package(when objcopy functionality is not neccessary).

avl added a comment.Dec 20 2021, 3:41 AM

@echristo, @alexander-shaposhnikov Would you mind taking part in this review, please? Your opinion might help to make a progress with it. During previous iterations, you preferred the variant when ObjCopy code would be moved into the Object library. There are arguments to make ObjCopy be a separate library. Do you have objections to that?

avl added a comment.Jan 13 2022, 4:11 AM

@jhenderson James, what do you think would be right to do if we do not have a response from others?

Both solutions(make it be a separate library or make it be part of Object) are OK for me.
It would be good if we might have a progress with this patch. Probably we can make a decision and continue with it?

@avl, I'd suggest emailing them directly to prod them, as they aren't responding. If they don't respond after that, I think you should adopt what active reviewers (i.e. me!) are suggesting.

On the note of activity, I will be off for at least 2 weeks, starting tomorrow, so may not be around to continue reviewing in that time!

avl added a comment.Jan 13 2022, 4:17 AM

@avl, I'd suggest emailing them directly to prod them, as they aren't responding. If they don't respond after that, I think you should adopt what active reviewers (i.e. me!) are suggesting.

On the note of activity, I will be off for at least 2 weeks, starting tomorrow, so may not be around to continue reviewing in that time!

I see, thanks! will follow that advice.

In D88827#3148315, @avl wrote:

@sbc100's response to the library division. I have a very slight preference for separate, but don't care enough about it to push for it, if others are opposed. Adding them as a reviewer in case it gets their attention.

My own preference is also to make ObjCopy to be separate library, though I also do not want to push it if others want opposite. I read this message https://reviews.llvm.org/D88827#2344900 and this https://reviews.llvm.org/D88827#2346399 and this https://reviews.llvm.org/D88827#2346567 responses as a consensus on making it to be part of Object library. If we do not have a consensus, I am open to discuss it further.

I changed my mind a bit since my early comment you linked, having gone cold on the review and now having come back to it. Here's my thinking: the object manipulation performed by the objcopy code is based on an internal Object class that has nothing to do with the object classes within the libObject library. By putting the two inside the same library, we risk confusion ("which type of Object do I need for this functionality?"). There are fundamental differences which would make reusing the libObject classes for the objcopy code less than ideal - it may not even be possible without a lot of work.

The original idea is that these classes -could- be used for this. If that's not true we should go back and fix that rather than have a separate set. That said, we do have a separate set which complicates matters. I think the question I'd ask is "how can we get from here to there?". No straightforward answer here for sure.

Regarding the Archive writing code being in libObject: it's my tentative opinion that archive functionality shouldn't be in libObject at all: an archive isn't really an object itself: it's a group of objects. The functionality for archives is completely unrelated to the functionality for other file types supported by the object library. As such, I think it should actually be moved into a separate libArchive library or similar.

From my perspective: ideally an archive should just be a specific type of object that contains other objects (similar to perhaps a shared library if we did that). Reading it just means cracking it, sharing the archive format information, and then iterating through each object accordingly. Writing would be the reverse. Any thoughts on where we're on a different page? :)

Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.

Are you thinking about things just shipping .a or .o files or something else? Most platforms that I can think of can remove unneeded object code in a binary link so I'm curious about the concern here.

@echristo, @alexander-shaposhnikov, do either of you have any further thoughts? What do you think of my points above? I'm not strongly opposed to it going in libObject, but just think it makes a little more sense to be separate. Also, would either of you be able to give this a review more generally?

I'll do my best :)

-eric

llvm-objcopy has its own data models for some binary format specific things, so placing it under the llvm::object namespace may lead to some conflicts. Placing it into a sub-namespace does not avoid the conflict.
If it keeps using the llvm::objcopy namespace, then placing it under lib/Object may cause some confusion. A separate directory looks good to me.


An archive is a quite different container format. So I get the point that there is a question about whether it fits into lib/Object.
On the other hand, an archive is sometimes an input file type and used nearly indistinguishable with other file typed in syntax (ld.lld, llvm-readelf, ...). Having it in lib/Object seems fine to me...


By placing some less associated code in the same library, there is a risk that dependencies tend to become mixed.
Layered component design has some advantages but in practice it is easy to run into a situation that for all of {a,b,c}.cpp to include all of {a,b,c}.h...
Different lib/* directories tend to force developers to think harder on the layering.

I changed my mind a bit since my early comment you linked, having gone cold on the review and now having come back to it. Here's my thinking: the object manipulation performed by the objcopy code is based on an internal Object class that has nothing to do with the object classes within the libObject library. By putting the two inside the same library, we risk confusion ("which type of Object do I need for this functionality?"). There are fundamental differences which would make reusing the libObject classes for the objcopy code less than ideal - it may not even be possible without a lot of work.

The original idea is that these classes -could- be used for this. If that's not true we should go back and fix that rather than have a separate set. That said, we do have a separate set which complicates matters. I think the question I'd ask is "how can we get from here to there?". No straightforward answer here for sure.

I guess I might have misrepresented the internals of the llvm-objcopy classes: to populate them, they do use the libObject code. The llvm-objcopy Object class acts as another layer that provides manipulation functionality on top of the libObject object files. In some ways, it could actually be used purely for reading though, which raises some interesting thoughts which I can't exactly express. If we wanted to change the existing libObject code to support direct manipulation, without this extra layer, we'd need to change how those classes store data - at the moment, they just consist of references into memory buffers, rather than copies stored within the program. As such, it is impossible to change these references without changing how data is backed. The llvm-objcopy library provides this additional layer in essence. I think this shows that, at least at the moment, the llvm-objcopy Object code is a separate layer. Whethere that then means the separate layer should be in a separate library is probably a different point (it means it can be, but doesn't mean it must be.

Regarding the Archive writing code being in libObject: it's my tentative opinion that archive functionality shouldn't be in libObject at all: an archive isn't really an object itself: it's a group of objects. The functionality for archives is completely unrelated to the functionality for other file types supported by the object library. As such, I think it should actually be moved into a separate libArchive library or similar.

From my perspective: ideally an archive should just be a specific type of object that contains other objects (similar to perhaps a shared library if we did that). Reading it just means cracking it, sharing the archive format information, and then iterating through each object accordingly. Writing would be the reverse. Any thoughts on where we're on a different page? :)

I don't think we really disagree here (although I wouldn't call an archive an object in those words). The point I was largely addressing was the earlier point made by @alexander-shaposhnikov about how the object library already allows manipulation of archives, unlike the other object formats.

Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.

Are you thinking about things just shipping .a or .o files or something else? Most platforms that I can think of can remove unneeded object code in a binary link so I'm curious about the concern here.

Yeah, we have some cases where we distribute .a files (or more specifically Windows .lib files, but the principle is the same). A couple of concrete examples to illustrate the points: firstly, if I want to build LLD, I don't need the llvm-objcopy code. Let's say llvm-objcopy included 5 files. If those 5 files are included in libObject, they need to be compiled and added to that library, as part fo the build process for building LLD, even though those files aren't used in LLD's link, thus unnecessarily slowing down the build as a whole. Secondly, if I have a distribution that wants to ship LLVM libraries sufficient for developers to read objects, we'd end up distributing those same 5 files as part of libObject, unnecessarily. The end executables won't be any bigger in either case of course.

@echristo, @alexander-shaposhnikov, do either of you have any further thoughts? What do you think of my points above? I'm not strongly opposed to it going in libObject, but just think it makes a little more sense to be separate. Also, would either of you be able to give this a review more generally?

I'll do my best :)

-eric

FYI, I'm on leave for the next 2-3 weeks, so won't be able to respond to further points after this for a while.

llvm-objcopy has its own data models for some binary format specific things, so placing it under the llvm::object namespace may lead to some conflicts. Placing it into a sub-namespace does not avoid the conflict.

Not sure I follow how placing things into, say, llvm::object::objcopy would have potential for conflicts beyond the current state of the code?

avl added a comment.Jan 14 2022, 4:22 AM

Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.

Are you thinking about things just shipping .a or .o files or something else? Most platforms that I can think of can remove unneeded object code in a binary link so I'm curious about the concern here.

Yeah, we have some cases where we distribute .a files (or more specifically Windows .lib files, but the principle is the same). A couple of concrete examples to illustrate the points: firstly, if I want to build LLD, I don't need the llvm-objcopy code. Let's say llvm-objcopy included 5 files. If those 5 files are included in libObject, they need to be compiled and added to that library, as part fo the build process for building LLD, even though those files aren't used in LLD's link, thus unnecessarily slowing down the build as a whole. Secondly, if I have a distribution that wants to ship LLVM libraries sufficient for developers to read objects, we'd end up distributing those same 5 files as part of libObject, unnecessarily. The end executables won't be any bigger in either case of course.

there is also a scenario when end executables would be bigger(though it is not about shipping, this is for debug builds). It is assumed that --gc-sections or similar will remove unused code and then the final executable would be small. That is not exactly true for debug builds. --gc-sections will not remove unused debuginfo. Thus end debug executables would contain unnecessary debuginfo and then will be bigger.

In D88827#3243279, @avl wrote:

Finally, there's @sbc100's point earlier: whilst it's true that by having the code in the same library doesn't force tools to link in that code, it does require distributions to include the llvm-objcopy library code anytime it needs the libObject code, because it's part of that library, even though it may not need it. I've seen several instances internally where we generate packages of LLVM libraries, without including all of them, so including the llvm-objcopy code in the libObject library would bloat the size of the package.

Are you thinking about things just shipping .a or .o files or something else? Most platforms that I can think of can remove unneeded object code in a binary link so I'm curious about the concern here.

Yeah, we have some cases where we distribute .a files (or more specifically Windows .lib files, but the principle is the same). A couple of concrete examples to illustrate the points: firstly, if I want to build LLD, I don't need the llvm-objcopy code. Let's say llvm-objcopy included 5 files. If those 5 files are included in libObject, they need to be compiled and added to that library, as part fo the build process for building LLD, even though those files aren't used in LLD's link, thus unnecessarily slowing down the build as a whole. Secondly, if I have a distribution that wants to ship LLVM libraries sufficient for developers to read objects, we'd end up distributing those same 5 files as part of libObject, unnecessarily. The end executables won't be any bigger in either case of course.

there is also a scenario when end executables would be bigger(though it is not about shipping, this is for debug builds). It is assumed that --gc-sections or similar will remove unused code and then the final executable would be small. That is not exactly true for debug builds. --gc-sections will not remove unused debuginfo. Thus end debug executables would contain unnecessary debuginfo and then will be bigger.

This isn't generally true: even in debug builds and without --gc-sections, linkers tend not to link in unused objects into the final executable.

avl added a comment.Jan 14 2022, 4:41 AM

This isn't generally true: even in debug builds and without --gc-sections, linkers tend not to link in unused objects into the final executable.

I remember there was a patch to make it possible - https://reviews.llvm.org/D54747 . Which finally was not applied and current behavior that it is not working(if I did not miss something new).

In D88827#3243298, @avl wrote:

This isn't generally true: even in debug builds and without --gc-sections, linkers tend not to link in unused objects into the final executable.

I remember there was a patch to make it possible - https://reviews.llvm.org/D54747 . Which finally was not applied and current behavior that it is not working(if I did not miss something new).

The difference here is that if you feed the linker a specific set of object files, it will include all of them (minus what --gc-sections removes). But if you feed the linker a static library, it will only include the object files that actually are referenced (unless the library is added with the --whole-archive option or something similar).

avl added a comment.Jan 14 2022, 5:34 AM
In D88827#3243298, @avl wrote:

This isn't generally true: even in debug builds and without --gc-sections, linkers tend not to link in unused objects into the final executable.

I remember there was a patch to make it possible - https://reviews.llvm.org/D54747 . Which finally was not applied and current behavior that it is not working(if I did not miss something new).

The difference here is that if you feed the linker a specific set of object files, it will include all of them (minus what --gc-sections removes). But if you feed the linker a static library, it will only include the object files that actually are referenced (unless the library is added with the --whole-archive option or something similar).

Ah, right. We are talking about library here.

@jhenderson accurately expressed my considerations / original reasoning. Sorry about the very late reply. Perhaps, it's worth adding a few words.
I kind of envisioned the following minimalistic interface exposed by the library (essentially just a single function):

Error copy(const Config &C, const object::Object &O, raw_ostream &Out)

My hope was that we would refactor Config significantly and clean it up with 2 major goals in mind: (1) have proper separation of concerns and structure: Config { ... <common options>, COFFConfig, ELFConfig, MachOConfig };
(in particular, the fields which are not required anymore would be removed) (2) the functionality which is specific to the tool (e.g. creating a .DWO file) would live in the tool and would not be a part of the library. The same applies e.g. to setting file attributes.
The question wether it should be a separate library of a part of libObject - I had mixed feelings, based on the interface I thought it was appropriate to have it in libObject, but making it a separate library also makes sense to me.

avl added a comment.Jan 17 2022, 9:03 AM

@alexander-shaposhnikov My understaning is that current state of this patch is pretty close to above description:

I kind of envisioned the following minimalistic interface exposed by the library (essentially just a single function):

Error copy(const Config &C, const object::Object &O, raw_ostream &Out)

Error executeObjcopyOnBinary(const MultiFormatConfig &Config,
                             object::Binary &In, raw_ostream &Out);

This function does dispatch on input formats and calls format-specific function.

It is not a single function though. I think we may refactor it further with separate patches.

My hope was that we would refactor Config significantly and clean it up with 2 major goals in mind: (1) have proper separation of concerns and structure: Config { ... <common options>, COFFConfig, ELFConfig, MachOConfig };
(in particular, the fields which are not required anymore would be removed)

Config was refactored by following patches :

https://reviews.llvm.org/D99055
https://reviews.llvm.org/D102277
https://reviews.llvm.org/D103260

(2) the functionality which is specific to the tool (e.g. creating a .DWO file) would live in the tool and would not be a part of the library. The same applies e.g. to setting file attributes.

these patches did it possible to leave creating a .DWO file, setting attributes, creating streams inside the tool:

https://reviews.llvm.org/D98582
https://reviews.llvm.org/D98511
https://reviews.llvm.org/D91028
https://reviews.llvm.org/D98426
https://reviews.llvm.org/D95478

avl added a comment.Feb 7 2022, 1:49 AM

@echristo Are you satisfied with response by @jhenderson ?

following is a reminder, if messages are lost in the history:

That is @echristo message - https://reviews.llvm.org/D88827#3242326
That is @jhenderson response - https://reviews.llvm.org/D88827#3243266

avl updated this revision to Diff 409152.Feb 15 2022, 10:01 PM

move implementation into the separate library ObjCopy.

jhenderson accepted this revision.Feb 16 2022, 1:35 AM

LGTM, with two nits.

llvm/include/llvm/ObjCopy/ObjCopy.h
23–28

This doesn't need to be in the header, right? I'd get rid of it, as it's not really part fo the public interface (executeObjcopyOnArchive would be the right thing to use, I believe(?)).

llvm/tools/llvm-objcopy/llvm-objcopy.cpp
9

Delete this blank line then resort the headers.

This revision is now accepted and ready to land.Feb 16 2022, 1:35 AM
avl added inline comments.Feb 16 2022, 2:19 AM
llvm/include/llvm/ObjCopy/ObjCopy.h
23–28

This also is used in lib/ObjCopy/MachO/MachOObjcopy.cpp. But, indeed, it might be removed from the public interface - include/llvm/ObjCopy/ObjCopy.h. Would it be OK if declaration of createNewArchiveMembers would be put into the new local header llvm/lib/ObjCopy/Archive.h ?

jhenderson added inline comments.Feb 16 2022, 2:20 AM
llvm/include/llvm/ObjCopy/ObjCopy.h
23–28

Yes, I think that should be okay.

avl updated this revision to Diff 409266.Feb 16 2022, 7:59 AM

addressed comments - moved createNewArchiveMembers into separate header.

avl edited the summary of this revision. (Show Details)Feb 17 2022, 1:55 AM
This revision was landed with ongoing or failed builds.Feb 17 2022, 2:12 AM
This revision was automatically updated to reflect the committed changes.
avl added a comment.Feb 17 2022, 2:14 AM

@jhenderson Thank you for the review!

thakis added a subscriber: thakis.Feb 17 2022, 5:53 AM

As far as I know, you also have to update the entries in clang/docs/tools/clang-formatted-files.txt.

llvm/lib/CMakeLists.txt
24

Can we be consistent about capitalization of "ObjCopy" or "Objcopy"? The new lib capitalizes the C, but e.g. ObjcopyOptions.cpp doesn't. (Don't care which way, just be consistent).

% git grep Objcopy | wc -l
     150
% git grep ObjCopy | wc -l
      77
llvm/unittests/ObjCopy/CMakeLists.txt
1

I think you have to add_subdirectory this in llvm/unittests/CMakeLists.txt to get this test binary actually built.

avl added a comment.Feb 17 2022, 6:25 AM

@thakis Thanks! will implement comments with followup commit.

avl added a comment.Feb 17 2022, 8:36 AM

@thakis addressed comments in https://reviews.llvm.org/rG0b57e6c46b707c0e7a123efe82abf3c1e7b5a503. As to the consistent naming - will do it in the separate patch.

I'm also seeing a bunch of warnings that look related:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/ELF/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/MachO/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/MachO/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Reader.o) in output file used for input files: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Reader.o and: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Reader.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Writer.o) in output file used for input files: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Writer.o and: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Writer.o (due to use of basename, truncation, blank padding or duplicate input files)
avl added a comment.Feb 22 2022, 6:28 AM

I'm also seeing a bunch of warnings that look related:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/ELF/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/MachO/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Object.o) in output file used for input files: obj/llvm/lib/ObjCopy/MachO/ObjCopy.Object.o and: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Object.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Reader.o) in output file used for input files: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Reader.o and: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Reader.o (due to use of basename, truncation, blank padding or duplicate input files)
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/libtool: warning same member name (ObjCopy.Writer.o) in output file used for input files: obj/llvm/lib/ObjCopy/wasm/ObjCopy.Writer.o and: obj/llvm/lib/ObjCopy/COFF/ObjCopy.Writer.o (due to use of basename, truncation, blank padding or duplicate input files)

My understanding is that it is not a problem for linking. But it may be a problem if one would want to extract the files from libLLVMObjCopy.a. To avoid this problem we need to rename files. Will do it with separate patch.