This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Add support for large indexes
ClosedPublic

Authored by jakehehrlich on Jul 11 2018, 4:00 PM.

Details

Summary

This patch is an update of an older patch that never landed (see here: https://reviews.llvm.org/D42516)

Recently various users have run into this issue and it just 100% has to be solved at this point. The main difference in this patch is that I use gunzip instead of unzip which should hopefully allow tests to pass. Please review this as if it is a new patch however. I found some issues along the way and made some minor modifications.

The binary used in this patch for testing (a zip file to make it small) can be found here: https://drive.google.com/file/d/1UjsnTO9edLttZibbr-2T1bJl92KEQFAO/view?usp=sharing

Diff Detail

Repository
rL LLVM

Event Timeline

jakehehrlich created this revision.Jul 11 2018, 4:00 PM
jakehehrlich edited the summary of this revision. (Show Details)Jul 11 2018, 4:01 PM
llvm/test/tools/llvm-objcopy/auto-remove-shndx.test
1 ↗(On Diff #155075)

just in case - I don't see any other tests using gunzip - not sure if it's available by default (in particular, on the machines where build bots are running)

jakehehrlich added inline comments.Jul 11 2018, 4:46 PM
llvm/test/tools/llvm-objcopy/auto-remove-shndx.test
1 ↗(On Diff #155075)

There is no published list. I'll just have to try it and revert it if it doesn't work. This is the easy case that *might* just work. If this dosn't work then I'll add some python to each of these tests to unzip the file instead.

llvm/tools/llvm-objcopy/Object.h
404 ↗(On Diff #155075)

similarly to how it's done in the other similar cases I'd initialize Symbols with nullptr

I might be going overboard with the comments around "auto", and I didn't highlight all of the ones I spotted. But here's the link to the corresponding style point: https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable. Feel free to ignore what I said if you think they're valid uses.

llvm/test/tools/llvm-objcopy/auto-remove-shndx.test
1 ↗(On Diff #155075)

I just tested, and my Windows machine doesn't have it, so it's not part of the GnuWin32 tools, which are the requirement.

llvm/test/tools/llvm-objcopy/remove-shndx.test
2 ↗(On Diff #155075)

Would you mind first putting in a separate patch to LLVM to rename .symtab_shndxr to .symtab_shndx, which is the standard name? I'd rather not propagate the incorrect name further.

llvm/test/tools/llvm-objcopy/strict-no-add.test
1–2 ↗(On Diff #155075)

Just musing about this a bit. I wonder if this is the behaviour we should follow. It's certainly correct to do so, but it's unclear from the gABI whether it's okay to generate it under other circumstances. If we think it's okay to, maybe it would be simpler to just always generate the table if there are >= SHN_LORESERVE number of sections?

llvm/tools/llvm-objcopy/Object.cpp
164 ↗(On Diff #155075)

I'd rename this to something like "IndexesBuffer" or similar, just because I got confused between Indexes and Sec.Indexes on the line below.

203 ↗(On Diff #155075)

I'm not sure "proper value of st_shndx" is quite clear enough. Maybe "the value that will appear in st_shndx" might be better?

705 ↗(On Diff #155075)

What is the type of Symbols here? I think the LLVM coding style is not to use auto in these situations.

706 ↗(On Diff #155075)

Ditto.

723 ↗(On Diff #155075)

Ditto.

726 ↗(On Diff #155075)

Hmm... not sure about this error message, since the section isn't invalid, rather the section index is. So, maybe something like "Symbol 'bar' has an invalid section index 12345".

I assume that testing this would be unrealistic due to the size of the required test input?

857 ↗(On Diff #155075)

initilize -> initialize (x 2)

859 ↗(On Diff #155075)

refernece -> reference

1021 ↗(On Diff #155075)

See comments about auto above.

1233–1238 ↗(On Diff #155075)

Could you use std::any_of here?
Also, I might have missed something, but why the update to Sec.Index for large-index sections only here? You assign the indexes further down anyway.

1245 ↗(On Diff #155075)

but if -> but if we

1271 ↗(On Diff #155075)

I wouldn't use the function name in the comment here. I'd just say "prepare for layout"

1277 ↗(On Diff #155075)

"For instance" -> "For instance,"

llvm/tools/llvm-objcopy/Object.h
419 ↗(On Diff #155075)

I wonder if this should be the default for all SectionBase classes (i.e. initialise the field in the header)?

Now uses python to uncompress.
Responded to comments.

jakehehrlich marked 16 inline comments as done.Jul 12 2018, 3:00 PM
jakehehrlich added inline comments.
llvm/test/tools/llvm-objcopy/auto-remove-shndx.test
1 ↗(On Diff #155075)

siiiiigh. I'll rewrite the test to work using python then.

llvm/test/tools/llvm-objcopy/strict-no-add.test
1–2 ↗(On Diff #155075)

I think the cases where anyone would care about it too much are few and far between which is why I'm not breaking my back to solve the case where it overly eagerly emits a shndx table by 1 index. I started to write the code for doing this but it just winds up being that we delete some code that I already wrote so why not keep it I guess. If you think the system as a whole is strictly improved by removing that code I'll remove it.

llvm/tools/llvm-objcopy/Object.cpp
705 ↗(On Diff #155075)

Our options here are

ElfFile<ELFT>::Elf_Sym_Range
typename ELFT::SymRange
ArrayRef<typename ELFT::Sym>

I've presented these in order of unfolding. They're all shorter than I expected. The last is the most clear but also it means that if ElfFile or ELFT changes their SymRange type an issue could arise. I find most of these either verbose or opaque however.

706 ↗(On Diff #155075)

I'm going to push back here because a typename and collon is needed. e.g. there is extra difficulty.

726 ↗(On Diff #155075)

This is actually testable quite easily by uploading a specially constructed binary (which I'm getting pretty good at these days). Basically construct a binary that uses SHN_XINDEX and then puts an overly large index in the shndx table despite there being only 1 symbol. This would require just a bit of yaml2obj and some hex editing. The problem is that constructing such a binary is liable to take me 2-3 hours and I don't want to spend that time just to test one error message.

llvm/tools/llvm-objcopy/Object.h
419 ↗(On Diff #155075)

Sure, I'll go ahead and set it that way, why not.

jakehehrlich marked 3 inline comments as done.

Forgot to update error message for one of James' comments

jakehehrlich marked an inline comment as done.

Updated shndx name for this test.

to me looks good, though I'd wait for James.

This revision is now accepted and ready to land.Jul 12 2018, 4:08 PM
jhenderson accepted this revision.Jul 13 2018, 4:23 AM

LGTM, with the nits, assuming the tests aren't realistic without spending hours at it.

llvm/test/tools/llvm-objcopy/remove-shndx.test
1 ↗(On Diff #155283)

Probably worth putting a comment at the start of this test, to say what it is doing, because at the moment the test name is "remove-shndx", but the test doesn't actually do so!

llvm/tools/llvm-objcopy/Object.cpp
712–713 ↗(On Diff #155283)

Do we have a sensible way to test this, without binary editing something?

720–721 ↗(On Diff #155283)

Ditto. Is a test here feasible?

203 ↗(On Diff #155075)

Nit: will appear st_shndx -> will appear in st_shndx

726 ↗(On Diff #155075)

Okay, fair enough. I won't force it. I wonder if maybe this is an indication we need a "relaxed" mode for yaml2obj that allows us to write arbitrary values to the file? Anyway, not really part of this, more just idle musing again.

I think my ideal scenario would be able to write a symbol with a custom section index to the symbol table, even if that doesn't refer to a real section index.

@jakehehrlich - is there anything blocking this ?

Nope, I was just about to land it. I wasn't in office Friday and I was conducting an interview this morning. It will be in shortly.

This revision was automatically updated to reflect the committed changes.