Page MenuHomePhabricator

[llvm-objcopy][NFC] Move core implementation of llvm-objcopy into separate library.
Needs ReviewPublic

Authored by avl on Oct 5 2020, 5:29 AM.

Details

Summary

This patch moves core implementation of llvm-objcopy into separate library ObjCopy
(http://lists.llvm.org/pipermail/llvm-dev/2020-September/145075.html).
The functionality for parsing input options is left inside tools/llvm-objcopy.
The interface of ObjCopy library:

ObjCopy/ELF/ELFObjcopy.h

Error executeObjcopyOnIHex(const CopyConfig &Config, MemoryBuffer &In,
                           Buffer &Out);
Error executeObjcopyOnRawBinary(const CopyConfig &Config, MemoryBuffer &In,
                                Buffer &Out);
Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::ELFObjectFileBase &In, Buffer &Out);

ObjCopy/COFF/COFFObjcopy.h

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::COFFObjectFile &In, Buffer &Out);

ObjCopy/MachO/MachOObjcopy.h

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::MachOObjectFile &In, Buffer &Out);

ObjCopy/wasm/WasmObjcopy.h

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::WasmObjectFile &In, Buffer &Out);

Diff Detail

Unit TestsFailed

TimeTest
380 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp

Event Timeline

avl created this revision.Oct 5 2020, 5:29 AM
Herald added a project: Restricted Project. · View Herald Transcript
avl requested review of this revision.Oct 5 2020, 5:30 AM
avl edited the summary of this revision. (Show Details)Oct 5 2020, 5:58 AM
grimar added a comment.Oct 6 2020, 1:02 AM

This looks fine to me generally. Have a suggestion though:
in few places there are changes that are related to clang-formatting of original code it seems.
I'd commit them as a separate NFC cleanup (you don't need a review for that) and then rebase this diff.

llvm/unittests/ObjCopy/ObjCopyTest.cpp
40

Perhaps just fail inside?

51

You can avod having this ASSERT_TRUE if you use cast<T>, I think.
Also, you can use T & instead of a pointer, because it is expected that the value is always non-null.

100

Do you need Machine: EM_X86_64? By default it is EM_NONE, so it should work without an explicit value I guess.

This change needs some analysis & review, it might take some time (especially because this week we have the LLVM conference).

A few considerations / observations / questions.

  1. While this diff moves the implementation of llvm-objcopy into a library it seems like the current interfaces / design are not ideal for a library.

For example, CopyConfig essentially represents parsed command-line options and it is "string-heavy".
Another manifestation of this problem is that the "public" headers (include/llvm/ObjCopy/...) expose a great deal of the internal implementation details etc.

  1. Designing a good interface is a nontrivial task. It is important to understand the scope of the problem and potentially dissect it into subparts.

For example if the plan (at least initially) is to reuse the code for reading / writing object files then one of the first steps would be factoring out (and, probably, cleaning up) the model ("class Object") and
exposing the minimal interface for reading / writing . In particular, the internal details of implementation and the associated complexity (e.g. class Reader, class Writer) would live inside the library.

avl updated this revision to Diff 296459.Oct 6 2020, 7:36 AM

addressed comments.

avl added a comment.Oct 6 2020, 7:40 AM

@alexshap

This refactoring tries to reuse this part of the functionality:

Error executeObjcopyOnBinary(const CopyConfig &Config, object::ELFObjectFileBase &In, Buffer &Out);

get source file "In", apply transformations described by "Config", write output into the "Out".

i.e. it is not trying to refactor low level of the objcopy code("class Object" and others). It still should be internal thing of ObjCopy library.

I agree that the patch currently makes internal details (class Object, class Reader, class Writer) to be public.
They should be hidden inside library code. I addressed this into the new version of the patch.

avl updated this revision to Diff 296524.Oct 6 2020, 12:44 PM

apply clang-tidy comments.

Looks pretty good from my point of view. Some general comments:

  1. The functions in the library header need doxygen style comments to explain the interface. Possibly these could be added in a follow-up patch.
  2. I think if we are moving files around, now is a good time to run a blanket clang-format on the files. Whether that should be done as part of the patch or a separate follow-on one, I don't know.
  3. Probably not this patch, but we should consider our testing strategy for the libray and llvm-objcopy going forward. Some examples for this dicussion: should we move the existing lit tests? Should we port/duplicate some of them as gtest unit tests? How should new library features be tested (gtest or lit)? How should new llvm-strip/objcopy/... options be tested (bearing in mind in theory there should be library testing etc etc)? I think this sort of discussion probably belongs on the mailing list.

I think keeping the surface area of the library down to just the executeObjcopyOn* functions plus the config struct is a pretty good example of a facade style design pattern, and I think makes sense in this context for at least the first version. This assumes that the use-case is a simple in-process read-in/transform/write-out (possibly to a memory buffer for further operations by other tools). We might want to expose other bits of the process and provide other ways of driving the objcopy process in the future (e.g. making it more interactive somehow), but I think that is an extension for later. We do need to nail down the initial use-case(s) though to make sure we don't produce something that isn't useful.

llvm/include/llvm/ObjCopy/Buffer.h
23–24

Maybe as part of a separate patch, it would be worth taking a look at this TODO. It would be great if the Buffer could be removed from the library API and generic LLVM buffers used instead (for example an in-memory buffer or a file buffer, depending on what people want to do).

llvm/include/llvm/ObjCopy/CopyConfig.h
126–127

This comment probably needs updating to better match the new usage - but see out-of-line comment.

llvm/include/llvm/ObjCopy/ObjCopy.h
20–23

I think details of the output format shouldn't be described in the comment - theoretically objcopy could even mutate from one object format to another (see e.g. the IHEX stuff).

27–29
llvm/lib/ObjCopy/COFF/COFFObject.cpp
1

Whilst you're moving this and the equivalent files for other formats around, could you please rename them to be obvious from the filename which format they are for (same goes for their headers), please? For example, COFF/Object.cpp -> COFF/COFFObject.cpp. The reason for this is that when using the Visual Studio IDE, all the "Object.cpp" files end up listed next to each other in the file browser, and the only way of figuring out which is which is by opening them and seeing.

llvm/lib/ObjCopy/wasm/WasmReader.h
9–10
avl added a comment.Oct 8 2020, 12:13 AM

The functions in the library header need doxygen style comments to explain the interface. Possibly these could be added in a follow-up patch.

Ok.

I think if we are moving files around, now is a good time to run a blanket clang-format on the files. Whether that should be done as part of the patch or a separate follow-on one, I don't know.

Ok, but I think it is better to do in a separate patch.

Probably not this patch, but we should consider our testing strategy for the libray and llvm-objcopy going forward. Some examples for this dicussion: should we move the existing lit tests? Should we port/duplicate some of them as gtest unit tests? How should new library features be tested (gtest or lit)? How should new llvm-strip/objcopy/... options be tested (bearing in mind in theory there should be library testing etc etc)? I think this sort of discussion probably belongs on the mailing list.

OK, I will start that thread soon.

llvm/include/llvm/ObjCopy/Buffer.h
23–24

agreed, but I think it is better to do in separate patch.

llvm/include/llvm/ObjCopy/CopyConfig.h
126–127

ok.

llvm/include/llvm/ObjCopy/ObjCopy.h
20–23

ok.

llvm/lib/ObjCopy/COFF/COFFObject.cpp
1

ok.

I have some general comments / concerns (in addition to the inline comment).
The interface of the library is important and once it's committed and people start using the library in multiple places it might be harder to make changes / fix issues
(unfortunately this has already happened in LLVM a few times in the past) .

  1. CopyConfig as a part of the interface in its current form seems to be not "in a perfect shape". E.g. some ELF-specific options are inside the struct ELF, some others are just regular fields.

Some fields make sense in the context of the tool but they don't (at least at quick glance) in the context of a library function.

  1. class Buffer - I agree with @jhenderson's comment. Just want to note that in general there are multiple ways how to address it, but this requires some thinking.

1, 2 - perhaps, this can be refactored / reorganized / better documented, but it would be great to have it done before the introduction of the library.

I'd like to point out that not all the operations supported by llvm-objcopy can be performed in-memory (e.g. splitting dwo will create a new file). If a library function has such side effects (creates new objects on the local file system) it would be good to have it documented.

cc: @mtrent , @dblaikie, @echristo

llvm/lib/ObjCopy/COFF/COFFObject.cpp
1

@jhenderson, I'm sorry to disagree, but renaming files this way doesn't seem to be a good idea and the provided justification doesn't appear to be sufficient.
Since this file contains the implementation of what's declared in Object.h I would strongly prefer to have it named Object.cpp given it is already located in the corresponding folder. Visual Studio IDE might have some peculiarities but having consistent naming is important, adding such prefixes doesn't seem to be a good approach.

avl added a comment.Oct 8 2020, 5:32 AM

1, 2 - perhaps, this can be refactored / reorganized / better documented, but it would be great to have it done before the introduction of the library.

if possible - I propose to do these points as next patches after this(to make some progress on it). If not - I would work on preliminary patches.

llvm/lib/ObjCopy/COFF/COFFObject.cpp
1

Would it be OK, If both of the files would be renamed Object.h -> COFFObject.h and Object.cpp->COFFObject.cpp ?

avl added inline comments.Oct 9 2020, 8:07 AM
llvm/include/llvm/ObjCopy/Buffer.h
23–24

@alexshap @jhenderson

Speaking of class Buffer refactoring. I do not think that this comment is fully valid:

// The class Buffer abstracts out the common interface of FileOutputBuffer and
// WritableMemoryBuffer so that the hierarchy of Writers depends on this
// abstract interface and doesn't depend on a particular implementation.
// TODO: refactor the buffer classes in LLVM to enable us to use them here
// directly.

It suggests to create some common interface for FileOutputBuffer and WritableMemoryBuffer.
Which is assumed to look similar to this:

class Buffer {
  StringRef Name;

public:
  virtual ~Buffer();
  virtual Error allocate(size_t Size) = 0;
  virtual uint8_t *getBufferStart() = 0;
  virtual Error commit() = 0;

  explicit Buffer(StringRef Name) : Name(Name) {}
  StringRef getName() const { return Name; }
};

There exists a problem with methods commit() and allocate(). commit() is a redundant for WritableMemoryBuffer.
adding it to WritableMemoryBuffer would require to patch all current usages of WritableMemoryBuffer.
So it looks incorrectly to use it for common parent interface of FileOutputBuffer and WritableMemoryBuffer.
allocate() suggests another way of buffer creation. Currently, buffers are created by static creation methods:

static std::unique_ptr<WritableMemoryBuffer>
getNewMemBuffer(size_t Size, const Twine &BufferName = "");

static Expected<std::unique_ptr<FileOutputBuffer>>
create(StringRef FilePath, size_t Size, unsigned Flags = 0);

Adding "virtual Error allocate(size_t Size)" would lead to creation empty buffer by static creation method
and then call to allocate(). This does not seem a good addition to the already existed FileOutputBuffer and WritableMemoryBuffer. FileOutputBuffer and WritableMemoryBuffer assume another use cases than Buffer.

Actually, what is neccessary by objcopy is method:

uint8_t *createBuffer ( size_t Size );

All other functionality is redundant and could be removed from objcopy.

what do you think about following design?

LazyBuffer {
  StringRef Name;

  virtual StringRef getName() const { return Name; }
  virtual uint8_t *createBuffer ( size_t Size ) = 0;
};

MemoryLazyBuffer : public LazyBuffer {
  virtual uint8_t *createBuffer ( size_t Size ) {
    Buffer = WritableMemoryBuffer::getNewMemBuffer(Size, Name);
    return Buffer->getBufferStart();
  }

  std::unique_ptr<WritableMemoryBuffer> Buffer;
};

FileLazyBuffer : public LazyBuffer {
  virtual uint8_t *createBuffer ( size_t Size ) {
    Buffer = FileOutputBuffer::create(Name, Size);
    return Buffer->getBufferStart();
  }

  std::unique_ptr<FileOutputBuffer> Buffer;
};

Usage:

static Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
                                    LazyBuffer &Out);


MemoryLazyBuffer MB("name");
executeObjcopyOnBinary(Config, Input, MB);

FileLazyBuffer MB("name");
executeObjcopyOnBinary(Config, Input, MB);
if (MB.Buffer)
  MB.Buffer->commit();

Another alternative is that library always writes to general MemoryBuffer :

static Expected<std::unique_ptr<MemoryBuffer>> executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In);

and later this MemoryBuffer would be written into the file by llvm-objcopy.cpp.

What do you think?

I like the direction this is going, I'll take more of a deep look soon, but wanted to ask: "should this be in Object rather than a separate library?" When I'd originally asked for this to be split into its own library I'd thought that it would get added into libobject.

Thoughts?

-eric

I like the direction this is going, I'll take more of a deep look soon, but wanted to ask: "should this be in Object rather than a separate library?" When I'd originally asked for this to be split into its own library I'd thought that it would get added into libobject.

Thoughts?

-eric

One of the ideals would be to have writable versions of the various ObjectFile classes defined by the Object library. I know this was worked on last year as part of GSOC by @abrachet, but it didn't really end up getting to a usable point. I kind of like the separation of concerns here - the proposed Objcopy library would handle manipulation of object files, whilst the Object library is primarily for parsing and inspecting them. The former builds on the latter, but the latter doesn't need to care about the former. Thus a user who wrote an object dumping tool wouldn't need the Objcopy library. However, I don't have a strong opinion on this, so if a good design could be presented to resolve that, I'd be happy.

I have some general comments / concerns (in addition to the inline comment).
The interface of the library is important and once it's committed and people start using the library in multiple places it might be harder to make changes / fix issues
(unfortunately this has already happened in LLVM a few times in the past) .

Could we maybe put a big note at the top of the library header saying that the API is still work-in-progress and shouldn't be used, until we've got to the final point? I feel like moving the code first is a good step and then we can iterate on the design once it's in place (I agree that CopyConfig probably needs more work).

llvm/include/llvm/ObjCopy/Buffer.h
23–24

I'm not sure I've looked at the intricacies of the different buffer classes to know what the right approach is. However, if modifying the existing buffers seems like it won't work/will be too invasive, this seems like a fair approach (it's the "Adapter" design pattern in action, I believe).

I'm not sure I'd call it LazyBuffer, although I don't have a specific better name (maybe just ObjcopyBuffer, depending on how generic we want it to be). Also, unless the objcopy code actually needs the name for anything (aside from error messages, I'm not sure what that would be), I'd not include that in the interface.

avl added a comment.Oct 14 2020, 5:39 AM

I like the direction this is going, I'll take more of a deep look soon, but wanted to ask: "should this be in Object rather than a separate library?" When I'd originally asked for this to be split into its own library I'd thought that it would get added into libobject.

Thoughts?

I think, for the start, it is probably better to make it as a separate library. It could be refactored later and some part could probably be moved into Object library(Object.h/.cpp, Reader.h/.cpp, /Writer.h/.cpp).
If the ObjCopy library would be used in many other places in the end - then it would make sense to put it into the Object library, otherwise it would be better to live it as separate library.
Though, I am not against moving this ObjCopy library into the Object library, if there is opinion that it would be better.

avl added inline comments.Oct 20 2020, 8:59 AM
llvm/lib/ObjCopy/COFF/COFFObject.cpp
1

@jhenderson, I'm sorry to disagree, but renaming files this way doesn't seem to be a good idea and the provided justification doesn't appear to be sufficient.
Since this file contains the implementation of what's declared in Object.h I would strongly prefer to have it named Object.cpp given it is already located in the corresponding folder. Visual Studio IDE might have some peculiarities but having consistent naming is important, adding such prefixes doesn't seem to be a good approach.

@alexshap Could you explain this renaming thing, please? i.e. if both header file COFF/Object.h and src file COFF/Object.cpp would be renamed(COFF/COFFObject.h, COFF/COFFObject.cpp), would it be OK?

avl updated this revision to Diff 299664.Oct 21 2020, 6:23 AM

addressed comments: renamed files, added doxygen comments into library headers(except CopyConfig.h).

I propose to go ahead with this patch and continue with follow-up patches:

  1. Remove Buffer.h/Buffer.cpp.
  2. cleanup CopyConfig.h.
  3. remove handling of SplitDWO from the library.

I really do think this needs to be under object. It can be separate in there as lib/Object/Copy if you want, but I don't think it should be a parallel directory. Hopefully the updates for the move aren't overly onerous here.

-eric

I really do think this needs to be under object. It can be separate in there as lib/Object/Copy if you want, but I don't think it should be a parallel directory. Hopefully the updates for the move aren't overly onerous here.

-eric

Won't that mean that any tool that just wants to read in object files (e.g. lld) will need to build this extra code for no reason? Not a huge deal I guess (especially for lld which due to LTO pulls in all of llvm!) but I imagine there are many tools that just want to read (and not modify) objects.

alexshap added a comment.EditedOct 21 2020, 11:07 PM

My 0.02$: (perhaps, this should have been mentioned earlier) the current class CopyConfig contains e.g. file names (again, imo it is good enough for a tool, but not good enough for a library) and this means that if somebody wants to add a section to an object file he won't be able to accomplish this task using the current interface without creating extra files. It kind of defeats the idea. To solve this problem proper abstractions should be introduced / the code needs to be refactored. Personally I would strongly prefer to see the following iterative approach here: refactor the current code in llvm-objcopy step by step until it's ready to be moved into a library with a clean and easy-to-use interface. Maybe I'm missing something, but doing refactoring post factum seems to be a less controllable process and might get us to the state where the code has been move out of the tool, the interface has been modified to accomplish a very specific task and the rest (burden) will stay there for years creating more issues than benefits, moreover, it would introduce some risks.
Regarding where to place these functions - into libObject or create a separate library - libObject already contains several write* functions, (e.g. for archives), so indeed, putting this group of functions (e.g. one can use a bit less verbose name - copy(...)) into libObject seems to be quite natural.

llvm/lib/ObjCopy/COFF/COFFObject.cpp
1

I'm very sorry, but i still think that the old names were good, adding these prefixes is unnecessary and makes things less intuitive (e.g. class Object is described in Object.h).

I really do think this needs to be under object. It can be separate in there as lib/Object/Copy if you want, but I don't think it should be a parallel directory. Hopefully the updates for the move aren't overly onerous here.

-eric

Won't that mean that any tool that just wants to read in object files (e.g. lld) will need to build this extra code for no reason? Not a huge deal I guess (especially for lld which due to LTO pulls in all of llvm!) but I imagine there are many tools that just want to read (and not modify) objects.

That will depend on the archiver/linker behaviour, I expect, but at least for standard ld.lld behaviour (I can't speak for others), only the objects that actually are needed are pulled from the archive. Therefore, for example, if LLD didn't reference anything in the objcopy codebase, it wouldn't actually use those archive members, and therefore it wouldn't be added to the tools code. Even if it did, linker options like --gc-sections or equivalent likely would cause the unused code to be removed at link time, if the tool is built appropriately.

@alexshap's point about other files in the Object library allowing writing is persuasive to me. I think an Objcopy sub-directory would make sense though, to avoid the confusion of having both an ObjectFile and Object class/header file etc in Object. This is assuming of course we don't want to jump into refactoring the two classes so that they become one. I suspect that way might prove tricky to get right.

I'm somewhat ambivalent as to whether we move then refactor or refactor then move. If people feel that the latter approach is the right one, then so be it.

Regarding the naming of Object.h, that's fine, leave it as-is. I suspect with some small CMake changes, it should be possible to get the files to appear in a corresponding "solution folder" within the VS IDE, which would help disambiguate things.

I really do think this needs to be under object. It can be separate in there as lib/Object/Copy if you want, but I don't think it should be a parallel directory. Hopefully the updates for the move aren't overly onerous here.

-eric

Won't that mean that any tool that just wants to read in object files (e.g. lld) will need to build this extra code for no reason? Not a huge deal I guess (especially for lld which due to LTO pulls in all of llvm!) but I imagine there are many tools that just want to read (and not modify) objects.

That will depend on the archiver/linker behaviour, I expect, but at least for standard ld.lld behaviour (I can't speak for others), only the objects that actually are needed are pulled from the archive. Therefore, for example, if LLD didn't reference anything in the objcopy codebase, it wouldn't actually use those archive members, and therefore it wouldn't be added to the tools code. Even if it did, linker options like --gc-sections or equivalent likely would cause the unused code to be removed at link time, if the tool is built appropriately.

You are right, the size of resulting tools won't be increased. I was referring to the fact that such a tool would transitively depend on more sources and would take longer to compile, and would become dirty more often on average after a git pull. For example, its good that one can build llvm-objdump without compiling the whole of llvm.

@alexshap's point about other files in the Object library allowing writing is persuasive to me. I think an Objcopy sub-directory would make sense though, to avoid the confusion of having both an ObjectFile and Object class/header file etc in Object. This is assuming of course we don't want to jump into refactoring the two classes so that they become one. I suspect that way might prove tricky to get right.

I'm somewhat ambivalent as to whether we move then refactor or refactor then move. If people feel that the latter approach is the right one, then so be it.

Regarding the naming of Object.h, that's fine, leave it as-is. I suspect with some small CMake changes, it should be possible to get the files to appear in a corresponding "solution folder" within the VS IDE, which would help disambiguate things.

avl added a comment.Oct 22 2020, 11:17 PM

To summarize comments - I am going to:

  1. create preliminary patch for Buffers.
  2. create preliminary patch for CopyConfig.
  3. move handling of SplitDWO into llvm-objcopy.cpp.
  4. move this functionality into Object library.
  5. do not rename files.
avl added a comment.Sun, Nov 1, 1:08 PM

@alexshap @jhenderson @echristo

Folks, Before creating a patch I would like to consult what would be the best option to refactor the Buffer class.
One of the alternatives is described here https://reviews.llvm.org/D88827#2321871 ("Adapter" approach).

I think the better solution would be to use raw_ostream instead of buffers:

current:

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::ELFObjectFileBase &In, Buffer &Out);

new:

Error executeObjcopyOnBinary(const CopyConfig &Config,
                             object::ELFObjectFileBase &In, raw_ostream &Out);

Generally, using streams could allow us to reduce memory usages. No need to load all data into the memory - the data could be streamed through a smaller buffer. Opposite, the current WritableMemoryBuffer, used in llvm-objcopy, would allocate whole data into the memory. Thus replacing WritableMemoryBuffer with raw_ostream
would allow minimizing memory requirements.

FileOutputBuffer(used by llvm-objcopy) has an advantage over raw_fd_ostream(which might be used if we would like
to store data into the file). FileOutputBuffer::createOnDiskBuffer() allows to use memory mapped file. The similar functionality
could be implemented for raw_fd_ostream. It is possible to add preallocate() method into raw_ostream.

class raw_ostream {

void preallocate(uint64_t size);

}

That method, implemented for raw_fd_ostream, could create a memory-mapped file. The streamed data would be written
into that memory file then. Thus we would be able to use memory-mapped files with raw_fd_ostream.

So, it seems we could use raw_ostream instead of Buffer without losing functionality.
It seems to me that raw_ostream is a good abstraction here and it would be good to use it for llvm-objcopy.

So what is your opinion, would it be OK to use raw_ostream?
Or should we use "Adapter" approach, from https://reviews.llvm.org/D88827#2321871 ?

Using a stream approach sounds reasonable. I don't really know what the benefits are of using a memory mapped file versus other options (I vaguely recall from some older work that they improve performance, but am not sure if that is still the case or not). The one concern I'd have with a stream for writing output is if we ever need to jump back and forth within the object for some reason. Without looking at the existing objcopy code, I don't know if there are any instances where this happens though.

By the way, I think the preallocate method might be better termed reserve as it sounds like it solves a similar intent as std::vector::reserve to me.

avl added a comment.Mon, Nov 2, 5:03 AM

Using a stream approach sounds reasonable. I don't really know what the benefits are of using a memory mapped file versus other options (I vaguely recall from some older work that they improve performance, but am not sure if that is still the case or not). The one concern I'd have with a stream for writing output is if we ever need to jump back and forth within the object for some reason. Without looking at the existing objcopy code, I don't know if there are any instances where this happens though.

If there is such necessity then instead of raw_ostream there could be used raw_pwrite_stream, which allows such seek&update functionality.

During this refactoring effort, I am not going to rewrite the existing objcopy writing code.
So I plan to do things this way:

Error executeObjcopyOnBinary(Config,In, raw_ostream &Out) {
  // TODO: refactor "writing" code to output into "raw_ostream &Out"
  // TOFO: directly, without MemBuffer in the middle.
  MemBuffer.allocate(Size)
  // existing writing code.
  MemBuffer.commit();
  Out.write(MemBuffer.getBufferStart(), MemBuffer.getBufferSize());
}

I propose to use raw_ostream now and replace it with raw_pwrite_stream later, if it would be necessary.
If it would be hard to use raw_ostream during such postponed rewriting then we could change raw_ostream into raw_pwrite_stream at that moment.

If it is already clear that we need to use raw_pwrite_stream then I will use it within my patch.

By the way, I think the preallocate method might be better termed reserve as it sounds like it solves a similar intent as std::vector::reserve to me.

Ok