- User Since
- Aug 17 2017, 6:34 AM (191 w, 6 d)
Perhaps using a different option name would be useful, if we had that choice, but we don't always, either because of compatibility with existing tools, or to avoid unnecessarily breaking people's existing build scripts that use llvm-objcopy.
I've not checked, but if memory serves me correctly, there are some options (--add-symbol, I think, at least) that are parsed differently depending on the file format. As such, we can't parse them upfront - if we did, one of the two parse*Config functions would fail. We'd only want that to happen if a user was providing both file formats and the option.
It seems the same problem might occur even with lazy parsing. What if we have an archive containing object files with different formats. We might not be able to specify "--add-symbol" option which would be successfully parsed for both of them:
Mon, Apr 19
What do you think?
Wed, Apr 14
ping. Colleagues, what is your opinion on the way how this refactoring should be done?
Mon, Apr 12
@manojgupta @cjdb Sorry for the inconvenience. Please check, whether the problem is really fixed on your side(The commit with the fix is https://reviews.llvm.org/rGee8a5e4bc2c986b8e6c07e81fb58dc1e5a5c2d17)
yep, will fix or revert.
Thu, Apr 8
Fri, Apr 2
The one more argument for design presented by this patch is that: it is assumed that part of the interface
would be moved into the future ObjCopy library. The part which would be moved is quite small and clear.
Second part would be left in the llvm-objcopy tool. If we do not like its implementation(ParsedMultiFormatConfig)
we still might be able to reimplement llvm-objcopy part using another approach, but the library interface
will remain the same. Currently, indeed, we do not have many cross-format bugs. But, when this code
would be done as a library then it becomes more possibilities to do such cross-format bugs.
So, it is generally helpful if the interface of the library would prevent cross-format bugs.
I refactored this patch so that it would be more clear which part is supposed to go into the library
and which part is supposed to be left inside llvm-objcopy:
Thu, Apr 1
So the general point of this change is that previously all the options were in one struct which meant that, say, the MachO implementation could accidentally end up using an ELF option value, etc? And this way the implementations can accept the common options (the base class) and their implementation-specific options?
Wed, Mar 31
Tue, Mar 30
if all the config property access went through virtual functions then it could probably be done without virtual inheritance, but with virtual functions and maybe some CRTP mix-in kind of implementations, but that may not be any better
I do have some concerns about the multiple inheritance stuff going on, if I'm honest, as usually this is a design smell
Mon, Mar 29
Fri, Mar 26
Changed a patch to solve following problems:
Wed, Mar 24
after some more thinking, it seems to me that nor this patch nor current llvm version is not a right solution.
Tue, Mar 23
Looking at the design for ELF parseable options more, I do not really understand why parsing of ELF options should be delayed. It seems that they might be set at the same time when all other options parsed(i.e. as it was before D67139):
Mar 22 2021
Thank you, for the review.
As for the main part of this change, I think this is essentially trying to redo D67139, which had a lot of back and forth discussion on it. Apart from anything else, the *Objcopy.cpp files are not where to put command-line parsing logic, which was an item from that review. Please go over the comments from D67139, and make sure to take the points raised in the comments on board. Once you have done that, feel free to propose a new solution (which could be similar to this, or something else entirely), if you feel it is important.
Could you do this in a separate patch upfront, please? As things stand, this patch is not NFC, because of this...! I have some concerns about the specific nature of the non-NFC part, which I'd like to comment on separately.
Mar 21 2021
Mar 19 2021
There is of course a completely different approach, which would avoid this issue, namely to change the function to just create and return the stream and then leave the clients to pass the stream to their write functions. If you decided to switch to this approach, you'd probably want a new stream class derived from raw_fd_ostream which does the tempfile stuff itself, in the constructor/destructor, so that clients don't need t worry about that logic.
Mar 18 2021
thanks for reviewing.
Thank you for the review.
Mar 17 2021
Thank you for the review.
Mar 13 2021
fix a mistake.
Mar 12 2021
addressed comments(removed ownership keeping code from FileOutputBuffer).
I'm not sure if this would be a big issue but restoreStatOnFile also runs on the split dwo file, and will keep its ownership with this change. I don't think GNU assembler does that currently.
Thank you for the review!
Mar 11 2021
Mar 10 2021
The problem our downstream user has run into is seen only when using llvm-objcopy to copy archives, not regular objects, likely because it is only triggered by the writing via raw_fd_ostream, which is not done for regular object files. This change appears to extend the problem to object files too, if I'm not mistaken, hence my concern. However, if the issue is fixed, then it is not an issue either way.
Mar 8 2021
It might be worth trying to get somebody to confirm the discussed issue has been fixed. Perhaps worth asking on llvm-dev. A simple reproducible is to do something like "llvm-objcopy test.a path/on/shared/drive/test.a".
Mar 6 2021
Mar 5 2021
That seems quite expensive :s
Yeah, having to keep the types in memory is the bit I'm getting at - but yes, it's not all CUs. If they're being unloaded/reloaded though, it might be less clear whether it's so costly to preserve the existing output behavior. Though I don't personally have a problem with creating a synthetic/arbitrary CU to put all the types in either. But other folks (Apple folks with a vested interest in the dsymutil behavior) might disagree.
Mar 4 2021
will address comments. as to checking the "Windows 10 VM running using Parallels on a Mac" configuration - unfortunately, I do not have that hardware/configuration. So I would not be able to check it in the nearest future.
Mar 3 2021
Sorry, I've been unwell, and not been able to look more at this yet.
Mar 2 2021
updated the test.
Mar 1 2021
@jhenderson Hi James, Is it OK now?
@craig.topper Would you mind to take a look at this updated patch once more, please?
Feb 28 2021
Feb 27 2021
Feb 25 2021
rebased. addressed comments.
Sounds like your proposal would require that too - or require reloading the CUs twice? I think either strategy (keeping them loaded, or reloading them) could be used for either of our proposed directions (creating a separate unit, or using the existing units)?
I agree with Fred in that it might be too expensive to keep all of the CUs around so that we can later copy type data out of them at the end into uniqued compile units.
Feb 24 2021
How would you guarantee that the artificial CU is always emitted in the same order? It seems like you need to stash the types somewhere, then create a deterministic ordering and then emit it, but I don't remember whether that's doable with LLVM's DIEs (It's been a while since I touched this code).
Will address all comments. I have a two questions/comments inside...
One idea is for each type we want to unique, each CU as it is being parsed on its own thread creates a type map that maps "type compile unit path" (which is DW_AT_decl_file of that type) to a list of uniqued types with decl context info. This would allow all types with the same DW_AT_decl_file to be stored in the same type compile unit. As we emit the debug info for a normal compile unit, any type references are changed to be DW_AT_ref_addr references with values that will need to be fixed up. After all normal CUs have been emitted, combine all of the CU specific type maps together, and now emit the type compile units (carefully to include all needed children in each type to make the most complete version of the type possible and emit them) in a sorted order to allow determinism. Then we fix up all DW_AT_ref_addr references in the normal CUs to point to the one type definition from the type compile units.
Comparing your proposal - avoiding nondeterminism by sorting the types by name in the new type-holding CU, we could do something similar, but instead sorting the CUs a type appears in to pick which of those existing locations to be the canonical home. (or, indeed, could sort by the original linear visitation order)
Feb 23 2021
I'm not sure why that would necessarily be better/faster - it'd still require two passes, right? One to collect the types, then another to emit the unit with the types and the units referencing that?
The final .debug_str should only have one copy of a string. It should be possible to create individual string tables and merge them in the last loop you have that fixes up offsets right?
Thanks! updated test accordingly.
Feb 22 2021
yep. this implementation is non-deterministic.
I tried to preserve the current ODR algorithm(which relies on concrete type sequences).
change reserve() to reserveExtraSpace(). That allows pre-allocating
the internal stream buffers without knowing its current size.
What do you think of this patch? Is it OK in general and Is it worth to divide it into smaller patches and integrate?
moved control-flow expansion for VASTART_SAVE_XMM_REGS until after regalloc.
Feb 19 2021
Feb 15 2021
short summary why that refactoring is useful: It replaces specialized interfaces Buffer/MemBuffer/FileBuffer with more general interface - raw_ostream. It allows easily redefine the type of destination. It also allows to work effectively for implementations which do not pre-allocate destination space.
Feb 8 2021
removed usages of memory mapped file. make reserve() method to
pre-allocate internal buffers only.