- User Since
- May 16 2017, 11:34 AM (125 w, 5 d)
Fri, Oct 11
I'm not sure why I was added but this looks fine to me. If it compiles and runs on everyone's system I can't see how this could be anything but good.
Wed, Oct 9
LGTM except a nit
Mon, Oct 7
Seems like a good idea to me!
Tue, Sep 24
"Interstitial data" is data covered by by a segment but not a section. Gap fill should IMO set the bytes for all of that space. This is consistent with James's recommendation how things work. --pad-to when padding to a page aligned address this can for instance ensure that all executable bytes contain either valid code or a trap instruction.
What's the use case for these two flags? Where has it come up in actual practice that these were needed? I think they basically exist to cover an issue left by the linker that lld no longer leaves. I'm particularly hesitant to add --pad-to
--gap-fill should likely overwrite all interstitial data which includes data at the end of loadable segments not just the data between sections that we previously preserved. I view the --gap-fill use case as being a use case for things like lld's trap filling. Yes this violates the strict interpretation of "don't overwrite segment contents" but its ok in this case I think. Making binaries more secure is a good thing!
Thu, Sep 19
To be clear though, we have a test ensuring that -B is ignored, correct?
Wed, Sep 18
Tue, Sep 17
Mon, Sep 16
Sep 13 2019
I'm happy with this since the votes seem to favor this behavior.
LGTM! Thanks for working on this.
What about having ELFcopyConfig inherit from CopyConfig?
Sep 4 2019
We should certainly check perf. I'd also like to wait on this until Roland gets back in order for him to chime in since he had specific examples of use cases where program headers are required as I recall. Specifically the section virtual addresses are not sufficient in the arcane case of prelinking but prelinking isn't really something I know very much about.
If our intent is to prefer using -O over -B when possible (as seems to be the case) then this all looks good to me. I just want an explanation if one exists.
So it seems like we've been preferring the architecture implied by -O instead of the one preferred by -B. My intuition is that it should be the opposite. Did we find a reason to prefer using -O?
Sep 3 2019
The desired behavior is to keep the program headers intact as much as possible. elf utils for instance just flatout ignores them in this case. Program headers are needed by many tools to matchup the virtual addresses with the sections. I think outside of virtual address and memory size there isn't much that's actully needed in practice however. Ideally we wouldn't even change the file size of segments but I think I've structured llvm-objcopy in a way that makes that difficult unfortunately. I'm 100% ok using a distinct layout algorithm for --only-keep-debug as its a special snowflake where the meaning of segments drastically changes and segments are critical to layout. I'm also ok changing the file size of segments as I don't think its likely to cause a problem with tools like debuggers (there also exist other tools that aren't debuggers that need to perform the same kinds of actions).
This is obsolete I think. Clang doesn't use llvm-objcopy for this anymore if I understand correct. @pcc should be able to confirm.
Aug 26 2019
Landing aside, this looks good to me
Aug 2 2019
This looks fine to me
LGTM other than a couple atomicity issues.
This looks awesome!
I think everything but the implementation of genIndex being confusing looks good to me. I haven't actually groked how that code works yet other than the fact that it generates the index tree that I expect and all the surrounding code looks good to me. I understand that some of the trickiness here comes from the fact that you're building from a list of values but trying to generate the tree structure from that list which is hard. I think we can structure that code better; lets see if we can't device a better algorithm.
The code looks fine to me but I don't understand the global details of this. Hopefully Julie can still review things from there but if not we don't get timely reviews then we can still land this.
What's the use case for this and how do we handle return codes when an error occurs but we continue on?
Jul 23 2019
Sorry there's a lot for me to catch up on it would seem. Would it be possible to get an of how having a mutable subclass of ELFObjectFile works us towards our goal of having a library?
Jul 18 2019
I'm blindly accepting the CSS but all other code looks fine. I still have that chrome bug where I can't LGTM sometimes but consider this reviewed and accepted by me as well
Jul 16 2019
Ok so I thought about endianness more and I think that's not an issue because computations are generally done in native endianness. I do however think you should definitely have either the 64 bit version be the default or have both a 32 and 64 bit version. This type is most useful in cases where tools like lld, llvm-objcopy, yaml2obj, etc... would be using an internal structure or temporary variable and generally you want to handle both 32-bit and 64-bit operations in these cases. The solution is always to default to the highest bit size which is 64-bits because it will handle the 32-bit case just fine. Unless there's a compelling reason to not use 64-bits I think you should just make it your default. Certainly before lld or llvm-objcopy can use this we need to have a 64-bit version.
Jul 12 2019
I think teaching more raw_ostreams about seeking is probably right. I started to propose a solution but it boiled down to make our own abstractions over existing ostreams and what amounts to raw_string_ostream. I think raw_string_ostream and raw_os_ostream should both support seeking
Jul 8 2019
(sorry I just realized this was already landed, I was out when it was bumped).
Sorry looking now.
The code looks good to me. I'll let Julie give the final architectural approval.
As in I'm not quite ready for this to land but It looks preety solid. I think if you respond to those comments I'll take one last look and then give an actual accept
This looks good to me!
I can't verify the windows parts today. If you can wait until tomorrow I can run it on a windows machine to test it.
Jul 2 2019
LGTM (sorry a chrome bug is preventing me from using drop down menus...sigh)
Overall I think this looks good. Others have been following the details far more closely that I've been so I'll defer to them.
Speaking with Roland on this it seems to be that the established protocol for how you match up vaddrs is to use the program headers. Roland invoked arcane scary references about prelinking and other sharp corners about other possible methods but I don't know or understand the details. The guidance I'm getting is that we need to keep the program headers virtual address space correct. He has stated the program headers are used by many existing tools and that the consumers are more varied than just the two major debuggers. If you need more than a "here be dragons" from Roland I'll probe for more information but I'm content enough with just knowing that the existing protocol requires using program headers.
My feeling is that segments are not important. The offsets of those NOBITS sections are not important. If segments are laid out weirdly after objcopy --only-keep-debug, I think it doesn't matter if llvm-objcopy also fails to layout the segments in a sensible way. The important thing here is that debuggers can correctly load debug info from such --only-keep-debug processed files. So I asked you test the workflow:
I think this change as is should not go in. I think I'd like to explain some details that I've learned about this flag and how it functions in practice since I've had to consume binaries produced by this and it's equivalent in eu-strip.
Jul 1 2019
How important is using ErrSaver. Isn't it possible to just use std::string in these error cases?
This is *awesome*. This is much more clear and precise that the GNU objcopy documentation!
Jun 28 2019
Can you upload git diff -U100000 <commit before genHTML> functions so that I can see the code without it being fragmented? It's hard to parse some of the more fragmented stuff here
This all LGTM now. Please wait long enough for others to get their say but overall I'm happy now.
I'm getting confused in all the different patches. I reviewed the HTML generator one like yesterday and was awaiting changes. Doesn't this do the same thing?
Jun 27 2019
Overall looks better. One of my comments causes a sweeping change to occur so I'll respond back after thats done.
Jun 26 2019
How invasive would it be to change this to not use an ostream but to instead use an in memory buffer? It would be useful to know where the use cases come from. Do we intend to use this inside of MC? Do we intend to output raw files? When outputting to MC is the space preallocated? Is there a reason that using an ostream would let us avoid a copy? If we can't avoid a copy or theres not an easy way to know the total size in advance then I'm pro using an std::vector<uint8_t> as the underlying storage mechanism instead of an ostream.
Jun 25 2019
Sorry about the lull here. This LGTM
Jun 24 2019
I got to the 'genHTML' functions and I decided that this general approach is not great.
Jun 20 2019
General idea looks good to me!
Jun 17 2019
Jun 14 2019
A bunch of nits but this looks good to me.
Jun 13 2019
Jun 12 2019
Sorry I wasn't very quick to get to this. This looks really good to me. I'm going to go over some of the this with a fine tooth comb tomorrow but It basically looks good as at high level glance.
Jun 10 2019
Addressed Armando's comments
Jun 6 2019
Actually if we can make Description a hash set that would be best. I think using std::set would require an awkward < operator to be defined. There should be standard ways of getting hashes out of all the string and vector types and you can use hash_combine to combine all of these into a single hash.
Jun 5 2019
This is just where I got tired today but I think I can recommend how to split this up so I could move faster and provide more useful high level review. Prior to splitting I'll keep chugging away for at least a bit each day.
"goes past the end of the file" is the phrasing that I've seen GNU tools use and gets more to the point so I agree with James. Other than that this looks good to me!
Jun 4 2019
I'm happy with this as is but I'd like @jhenderson, @rupprecht, and @alexshap to take a look since this represents a fairly major feature addition even if the code is pretty small. I don't believe this imposes any extra assumptions or requirements on the rest of the code however.
Neat! I didn't even know this was possible TBH
May 31 2019
Might label as NFC.
This looks good to me but please have someone else accept first as well.
Ok I spoke offline about some of my questions with Peter. Here are some useful facts:
May 30 2019
Alternatively we could only sort program headers in the layout algorithm and then just layout sections in whatever order we please. It will just mean that our layout won't mirror the input but it will be valid.
Alternatively we can sort by Index to allow ourselves to make mistakes in maintaining the order along the way but I think I'd rather not do that since that's just a hack.
Can we just sort in the layout code instead of sorting everything? Then we don't have to sort at all. I think we've largely been good about making operations stable so order should be maintained if just don't sort.
May 29 2019
A few questions
If it isn't too much work I'd prefer these be split up as well. Otherwise this looks like a strict improvement to me!
May 23 2019
I think this is good. The current behavior treats sectionWithinSegment as not mattering in this case because this predicate is used for getting file offsets right and ignores the SHT_NOBITS case. This is a strict improvement that just sets the predicate in such a way that llvm-objcopy will output the conventional offset now.
May 22 2019
Solid improvement. LGTM.