- User Since
- Jan 18 2017, 2:49 AM (117 w, 2 d)
Thu, Apr 18
There's a counter-example to this though, which I'm concerned about. If we silently accept --special-syms, with it being the default always, there could be configure scripts that don't specify --special-syms that expect to NOT see those symbols (e.g. they're looking for all "normal" symbols). This in turn could result in a silent breakage.
In general, we should try to emulate GNU nm behaviour (that was the conclusion from the recent LLVM conference). However, I'm slightly concerned about changing behaviour breaking users of the LLVM tools. It might be worth pinging llvm-dev to see if there's anybody who would be unhappy with changing the default at this point, or even just to let people know that this will happen, unless somebody objects. Personally, I have no objections to it, but I don't have any use for llvm-nm and ARM currently.
LGTM, with two nits.
Add TODOs. Fix typos.
Wed, Apr 17
Rebase & address review comments:
- Remove unnecessary empty output checks.
- Add strip option and tests.
- Reformat one bit.
- Use distinct check prefixes for the error messages.
LGTM, with a couple of minor comments.
Refactor an if block missed in previous change. Rename switch to --allow-broken-links.
I have only one more minor comment. Happy for this to go in with or without it, but @probinson should give the final thumbs up.
LGTM, but I wouldn't mind a second pair of eyes on the new logic (@rupprecht?), as I'm not really familiar with how the DILineInfo class works.
Tue, Apr 16
Re-arrange some if/else statements and improve/add some comments and the option help text.
As I'll be away next week, I've added a couple of others to help with reviewing this too.
Code all looks good, but you should talk about the effect of --output-style on the output for both llvm-symbolizer and llvm-addr2line as appropriate.
Looks good, aside from one comment about the tests.
Mon, Apr 15
Could we have a test case that shows that llvm-symbolizer --output-style=GNU produces the same as llvm-addr2line for the inlining stuff, please?
LGTM. I'm slightly surprised the latter checks needed modifying (it's okay to though), but aside from that, this is fine.
FWIW, there is actually a build bot that does the conversion, and it fails if the syntax is bad, so we should aim to get it right first time if possible.
LGTM, with one nit.
Fri, Apr 12
LGTM, with nit. Might want to get @MaskRay to give thumbs up on the macro stuff, but it's probably fine.
Okay, basically looks fine. Just a couple of small comments.
There is no test for PRC, OS and RSV kind of symbols. Does anyone know how to generate them?
Use yaml2obj, with an explicit section index, but actually you don't need to test them as they are already tested by test/tools/llvm-readobj/elf-symbol-shndx.test.
LGTM, but it would be worth getting a second pair of eyes on the documentation who is more familiar with how the markup works, to make sure it is correct.
Yes, it's a bug. Where the original st_shndx field is between SHN_LORESERVE and SHN_HIRESERVE (and not SHN_XINDEX), it should be decorated in this way. In all other cases it shouldn't be. It's a little complicated to explain in detail, but I'll do my best. If we look at the section header table, there are sections with index 0xff00, 0xff01, etc. Thus when we generally refer to sections with index 0xff00 etc we are talking about those sections. The symbol st_shndx field (i.e. the field indicating the symbol's section) however is only 16 bits, and the upper few values are reserved for various purposes. The SHN_XINDEX value (0xffff) in this context means look up the section index in the SHT_SYMTAB_SHNDX section instead. Other values in this range have other meanings (e.g. absolute and common symbols). If the real section index for the symbol's section is >= SHN_LORESERVE, it is impossible to represent it in directly in st_shndx, so it will be SHN_XINDEX.
I don't really know much about llvm-as. Are all the filtered-out options irrelevant? In particular -color seems like it might be relevant.
I agree, although it's worth noting that llvm-symbolizer DOES have this documentation, so it may be worth piggy-backing on that rather than writing an entirely new doc.
Good tip, thanks. It's probably worth bearing in mind in case we have any more tests along these lines that the existing pre-canned binary can't be used for.
Basically looks good to me. Just a few small comments remaining.
Thu, Apr 11
LGTM. As I mentioned earlier, @jakehehrlich should review this before it gets committed. However, he may be away, so you might have to wait a few days.
You can use objcopy -R .dynamic to remove the .dynamic without resorting to hex-editing. I've seen developers do some pretty weird "corner case" things and still expect things to work, so it would be good if we can achieve it, assuming the logic isn't too hard. Since it just makes sense to ignore the ELF type entirely, I think this resolves the concern anyway.
Sorry, I skipped over that somehow (I think I jumped straight from the bug). We already have at least one test that uses many sections (see many-sections.test). That unzips a pre-built object to achieve this. One option might be to update that object file (and possibly the test), if it can satisfy the prerequisites you need.
Splitting on a character would make a lot of sense. Most tools use '=' or similar for multi-string arguments. However, if there is a Mach-O version of llvm-nm that inspired this option, we should be compatible with that if at all possible.
Have you looked at unit-testing the new code? It might be nice if you can.
It's not even a new issue. We've had this problem for years in the PlayStation proprietary linker, because ASLR-enabled executables have a 0 base address. We have special handling of discarded functions to work around this in the linker.
Having looked at this a bit more, I wonder if this should be part of the existing SectionsToRename loop. After all, there isn't much difference between the two behaviours, apart from the fact that this new switch doesn't change the flags. Would you mind folding the two loops together? As a side effect of this, I expect you might want to rename relocation sections when --rename-section alone is used, which I don't think is currently happening.
I've not looked at the implementation in depth, but cl::DefaultOption seems like a nice way of handling this. Please make sure that there is testing in llvm-readobj and llvm-objdump that test their own short alias interpretation somewhere though.
I'm trying to think where a NOBITS/PROGBITS distinction actually matters. Some relevant thoughts come to my mind:
- Where sections are in the middle of segments, they should always be PROGBITS, not NOBITS, otherwise addresses will get messed up (there is a special exception for .tbss and nested PT_TLS segments).
- If a section is PROGBITS when it doesn't need to be, then it will take up file space. Assuming it is filled with zeroes, this is harmless in terms of how the ELF behaves, and just impacts things like file size and probably load times.
- Converting a PROGBITS section to NOBITS can be harmful if the contents are important, as those contents will be lost.
- Non-alloc NOBITS sections are meaningless. Promoting them to PROGBITS will bloat the file size but otherwise have no effect.
- If there is a GNU binutils test for it, there's probably some good reason for any behaviour that I haven't thought of, so we should match GNU's behaviour, unless it is clearly dodgy.
Wed, Apr 10
Just chiming in from a LLVM binutils developer perspective. Some of the binutils are missing -h as an alias, when they really should have it for compatibility (specifically I checked llvm-nm and llvm-symbolizer). As a result, a solution that auto-adds -h as an alias would be good, as long as we have the ability to override it somehow. I wouldn't mind having logic in llvm-objdump/llvm-readobj to explicitly override the short alias if it is required.
A couple of examples where this might be useful:
- Removing just the dynamic symbols. This would cause an error if there are any dynamic relocation sections. However, not all dynamic relocations require symbols, so potentially NO dynamic relocations need symbols. The .dynsym is then unnecessary.
- Removing the .dynstr. This would cause an error due to .dynamic referencing it. This section doesn't necessarily have tags that reference strings. Additionally, users might want to strip the information, but still provide some of the ELF file for example to aid debugging an issue.
Thanks for the help text. I think we could add a simple test actually. It would test that "Generic Options:" and "llvm-nm Options:" are printed, but not "General Options:" (I think that's the name given to the irrelevant LLVM options, but I could be wrong).
The behaviour change looks good, in the sense that it does what you set out to do. I don't have any real objections to it aside from the fact that it is different from how LLVM tools typically work, but I personally also find it odd the LLVM tools work with a single-dash! I'd like others to comment on this though before giving the thumbs up, particularly @jakehehrlich as the code owner.
I don't think a test for every option is wise, but could you paste the output of the help text with your change applied, please, so I can review it easily?
Fri, Apr 5
Thu, Apr 4
Okay, no more comments from me.
Code basically looks fine, but somebody else should confirm that the format is correct.
I don't feel strongly about this change either way. I think it is correct in the sense that it does what it sets out to achieve, but I have no opinion on whether we should aim for compatibility here. I do agree that non-alloc nobits sections makes no sense, but could there be a crazy build system where somebody modifies the flags in different runs, relying on the later run to add the SHF_ALLOC section?
LGTM, but please wait for one of the other reviewers too.
Do you mean removing for e.g. --set-section-flags=data or whatever? I'm against dropping support for any flags that were added to be compatible with GNU, as most of them have been added with good reason, or are being relied on by end users to do what is expected. Certainly, I've run into some users of these switches, though I don't remember exactly which values.