- User Since
- May 16 2017, 11:34 AM (108 w, 6 d)
Fri, Jun 14
A bunch of nits but this looks good to me.
Thu, Jun 13
Wed, Jun 12
Sorry I wasn't very quick to get to this. This looks really good to me. I'm going to go over some of the this with a fine tooth comb tomorrow but It basically looks good as at high level glance.
Mon, Jun 10
Addressed Armando's comments
Thu, Jun 6
Actually if we can make Description a hash set that would be best. I think using std::set would require an awkward < operator to be defined. There should be standard ways of getting hashes out of all the string and vector types and you can use hash_combine to combine all of these into a single hash.
Wed, Jun 5
This is just where I got tired today but I think I can recommend how to split this up so I could move faster and provide more useful high level review. Prior to splitting I'll keep chugging away for at least a bit each day.
"goes past the end of the file" is the phrasing that I've seen GNU tools use and gets more to the point so I agree with James. Other than that this looks good to me!
Tue, Jun 4
I'm happy with this as is but I'd like @jhenderson, @rupprecht, and @alexshap to take a look since this represents a fairly major feature addition even if the code is pretty small. I don't believe this imposes any extra assumptions or requirements on the rest of the code however.
Neat! I didn't even know this was possible TBH
Fri, May 31
Might label as NFC.
This looks good to me but please have someone else accept first as well.
Ok I spoke offline about some of my questions with Peter. Here are some useful facts:
Thu, May 30
Alternatively we could only sort program headers in the layout algorithm and then just layout sections in whatever order we please. It will just mean that our layout won't mirror the input but it will be valid.
Alternatively we can sort by Index to allow ourselves to make mistakes in maintaining the order along the way but I think I'd rather not do that since that's just a hack.
Can we just sort in the layout code instead of sorting everything? Then we don't have to sort at all. I think we've largely been good about making operations stable so order should be maintained if just don't sort.
Wed, May 29
A few questions
If it isn't too much work I'd prefer these be split up as well. Otherwise this looks like a strict improvement to me!
Thu, May 23
I think this is good. The current behavior treats sectionWithinSegment as not mattering in this case because this predicate is used for getting file offsets right and ignores the SHT_NOBITS case. This is a strict improvement that just sets the predicate in such a way that llvm-objcopy will output the conventional offset now.
Wed, May 22
Solid improvement. LGTM.
Tue, May 21
Thanks for fixing all of my mistakes!
Mon, May 20
Awesome. LGTM. Tell me if you want me or someone else to land it. You can also email Chris Latner to get access yourself.
May 16 2019
I'm still confused. If the behavior intended only has to do with the existence of relocations why are we making this change? A clear difference of behavior needs to be pointed out and that hasn't been made clear.
I have the same question as James with respect to requiring PT_DYNAMIC but otherwise the code looks good to me. I'm still looking at the tests.
Also what does GNU objdump do in this case?
When uploading changes make sure to upload the whole context. If using git diff use git diff -U100000 or some other large number to get all of the surrounding context.
link to the bug for the lazy: https://bugs.llvm.org/show_bug.cgi?id=41862
May 13 2019
I'm still a bit unclear. Does it strip symbols that are duplicates of symbols in .dynsym? Does it just get rid of the the whole symbol table no matter what?
May 10 2019
This shouldn't emit the .tbe format at all, the .tbe format is meant to be in near 1-1 correspondence with .so files, not with .o files. It has things like DT_NEEDED, and DT_SONAME related information that don't make sense for .o files for instance.
May 9 2019
The following is the first split. Subsequent splits will be much smaller (adding support for program headers, adding support for soname and needed, adding support for symbols, etc...): https://reviews.llvm.org/D61767#change-LCioCf1j7osU
I'm confused as to what this is trying to accomplish. This really just marks every symbol as unneeded in a DSO and keeps the behavior for other cases. In that case why not strip the symbol table all togethor? This is a very round about way to accomplish that.
May 6 2019
Can you explain how this occurred a bit more and how this is solving the issue?
May 3 2019
- No more TLS stuff but STT_TLS symbol types are preserved which appears to be sufficient for all known use cases
- Fixed all the tests, accidental extra tests, minor issues, and made all tests pass.
@MuskRay Have you checked other linkers or tools?
Ok so I went digging on the PT_TLS and tls section issues. Seems like removing both the segment and the section is the right way to go.
May 2 2019
Ok so I'm good with this. We would like to land this ASAP.
Jake, I am still not sure what you would prefer for moving forward. The current patch can output the yaml format I originally proposed or the one that looks similar to the one you proposed. What I need to know is:
Do you want the merging if the "ifo" files to happen inside of llvm-elfabi?
Do you care if we upstream the yaml we proposed as a first step (which is really a distilled version of that yaml2obj can consume anyways. this right now functions actually) ???
Or, would you rather the ifo files be merged by a different separate tool (something maybe called llvm-ifsogen)???
May 1 2019
Ok we're getting closer
I can't upload a new patch right now but I plan on making all recommended changes, even the ones I didn't reply to.
This code looks good to me and the use case seems valid. I'll give my LGTM but please wait for others.
Good idea! The code looks good as well as testing.
Apr 30 2019
Oh and this is the big picture. I'll be splitting this up somehow.
Ok so this is the rough idea that I have for doing this. I need to add program headers which turns out to require refactoring this a bit. The general idea takes some explanation. There are multiple ways to think about it however. The goal is for everything to "work itself out" automagically. I'll need to add program headers as many bits of functionality are not easily testable here unless we add them. You'll note that I put the NOBITS sections before the other sections. This doesn't jive well with using a single program header to cover them all and since space is a thing we'd like to constrain here we should support it. Unfortunately I did that as optimization as a means of avoiding the dynamic symbol table needing know the addresses of the NOBITS sections before starting. This is overcomeable using an additional loop however. I think an additional loop would be needed even if you hand rolled however and by adding this loop we can probably avoid maintaining an intermediate vector for the dynamic symbol table. Instead we can compute the overall size first, allocate the FileOutputBuffer, and then write directly into it. I'm not sure the complexity or redundancy is justified by the memory savings and I think the time savings will be pretty minimal.
Wanted to mention, from my testing with yaml2obj, I needed the section flags (SHF_EXECINSTR, SHF_WRITE, SHF_ALLOC, etc) otherwise nm was spitting the wrong thing. How are you going to handle those?
Oh and "ifo" is a good name.
The linker doesn't look at section permissions. In fact the only thing it even looks at the section index for is to check for alignment for copy relocations which is something most people don't even use.
Well I think it should be a seperate tool but because the code inside llvm-elfabi isn't a library yet (though it is written to be used like one) we can do that by adding a symlink and changing the behavior of the underlying binary to present as a different tool. Many different tools around llvm do this for similar reasons.
There are two parts to this. TextAPI is a public interface of textual representations for these sorts of files that already exists in llvm. There exist ELF and MachO textual representations currently. llvm-elfabi is a tool for 1) Creating Stubs 2) Creating .tbe (the TextAPI fully linked ELF format) files 3) Converting between them and 4) Analyzing the details of the formats.
If multiple note sections with unaligned sizes but larger than 4-byte alignment are used this solution doesn't work.
Does llvm-elfabi consume your proposed Schema format? Has it landed yet?
Apr 29 2019
With respect to group handeling. I think we should just check that the details of every symbol across all unmerged files that are being merged are consistent and then ignore duplicates.
I think the !tbd-elf-v1 versioning can help with any changes or alterations we want to make along the way too.
To clarify my format is for the unmerged format. The merged format should be the ELFStub internal representation found in llvm's TextAPI. The plan is to have at minimum .tbe and ELF stubs. I'm actively working on getting the ELF stub output which can then be converted to the yaml2obj format since they will just be ELF files.
I think I know the format that I'd like *roughly*. I haven't worked out how to handle COMDAT and section groups yet. Roland seems to think that COMDAT and section groups will be an issue but I'm working out the details with him to make sure I understand it. I'll get back to you soon.
I recall @ruiu telling me that mips had a really odd endianness that wasn't accounted for by the standard ELFT types. Is that still true?
Apr 26 2019
Gonna take this one too.
I'm picking this up. Looking for review on this again. I'm going to drive this to completion enough to use on Fuchsia and then maintain it. After it has the features we want for Fuchsia I'll work on adding features that other people want but at a slower pace (like I did after a while with llvm-objcopy) eventually I'll ramp down but honestly, unlike llvm-objcopy, I could see this tool stabilizing.
@ruiu and others. I'm still good to land this as is. Are you ok with me landing this?
The larger alignment isn't being set by mistake there's just an odd note that's being added to linux binaries right now and its causing this sort of headache.
Apr 25 2019
Why would you convert the actual format, which is already using YAML, to the yaml2obj format? That seems to have zero gain to me.
This appears to be correct. StripDebug is a preety simple and minimal field so we can go ahead and add this. The keep symbol thing still looks a bit funny.
It would be helpful to convert all of the listed tools over as well for consistency
Eh I guess this is fine. I'm kind of 50/50 on this myself. I don't think you'll break any major team or code base I know about but I could be wrong. I don't really like fragmenting the standard used across these tools but grouped options increases compatibility so I think on balance I support this.
As an example, .tbe and .tbd files use YAML but in a very specific well defined structure. yaml2obj also uses a specific format which is actually well defined, it just doesn't map on very well here.
There seems to be some confusion about the term "format" here. There's the YAML format but within that there are specific YAML formats. My proposal will be to have a specific YAML format that just isn't the one used by yaml2obj. YAML isn't the issue, it's the structure that that tool specifically uses.
Maybe we should land this and wait for someone to have that issue. Can someone file a bug about that issue now?
Hmm...right now we don't parse the dynamic symbol table but parsing it just enough to check if no symbols are needed and then not throwing an error if no symbol table is needed seems like a better option than using that flag. That flag should of course also be respected here however.
Apr 24 2019
To be clear I'm only really giving my thumbs up to a specific high level plan that I believe handles a case that llvm-elfabi wouldn't otherwise handle on its own. I consider the specifics like the output format still in the air right now. I strongly believe that a unique yaml or json format should be used for this. We should discuss the specifics of what should go in that format and have a discussion on that format. The trick is to capture everything in .o files that could wind up mattering to a .tbe file and nothing more. I haven't thought though what that entails completely yet but I'll think about it and throw up my proposal. Once we have at least one proposal we can start iterating on the specifics of it.
Apr 23 2019
The strong motivation here is to avoid having the list be maintained outside of the source code. I am not too strongly tied to the YAML approach as long as the binaries that we get at the end are equivalent to the pruned ELF binaries from the YAML2ELF approach. However, the trade off here is valuable - it allows for the source code to be the source of truth for the interface set. The intermediate emission is due to the fact that this requires program level visibility and having the intermediates an extra step to merge is relatively simpler. In fact, you could emit the TBE files from this and check that into the tree if you really wanted (but allowing the source to remain the source of truth I think is a valid and valuable trade off). The reason for the selection of YAML is simply pragmatism - that is already available in the tree. If the TBE linker is available now, I have no issue with that being the intermediate representation (assuming that the representation is easy to parse programatically).
I actually don't see why something like this couldn't be used in both scenarios (ie some users generating the files at build time, and others using -emit-ifso to periodically generate and land changes to checked in files).
Why is this better than producing output from the output of the static linker at the end? Also how do we plan on integrating this with llvm-elfabi so that we don't have unnecessarily duplicated efforts? Why does that tool not suffice?
Apr 22 2019
I'm not misunderstanding, please seem my post on llvm-dev where I layout the different issues involved. You're underestimating the complexities involved here I think. Trivializing what the linker does to "the linker just merges these" is a gross oversimplification that doesn't account for symbol consuming vs exporting, size, alignment, visibility, linkage, symbol type, symbol versioning (which is impossible to handle here without consuming version scripts as was pointed out by someone else), which symbols are untimely exported/consumed or even intermediate operations on .o files via tools like objcopy (which is uncommon and shouldn't be a hard blocker here but something to think about). Also all linkers behave just a bit differently with respect to these issues so 1) the reality is that the thing that merges these will behave just a bit differently than any existing linker but 2) even if it trys to match one linker it will fail to match the other so it will only be compaitable in all cases with atmost one linker (but probably none when edge cases are considered. I also already pointed out the issue with having to search all source files on the tree. I agree that if you were to use the linker to accomplish this that would would have to look at all source files. Further more many flags passed to the linker untimely affect these things making matters more complicated. The "linker" you describe for these cases is far from "cat".
There's a lot of code to review here. I'll keep reviewing it everyday but this is going to take a while to review. Any help on splitting this up and making into smaller chunks would be helpful. Splitting reading and writing up into two separate patches would be helpful and removing features that we can add later would be helpful.