This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Dwarf decompression support.
ClosedPublic

Authored by plotfi on Sep 9 2018, 1:41 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

plotfi created this revision.Sep 9 2018, 1:41 PM

So I was thinking that we might try and detect compressed section at section construction time instead. The we can implement decompression in terms of CompressedSection...the only blemish seems to be that CompressedSection always allocates. I'm not super sure what the right solution to that is.

On a much more complicated note, I'm thinking I have a nicer and more generic way of handling compressed sections but it's not something you should worry about in this change (unless it sounds awesome to you and you're up for a more complex challenge). If we "decompress" (air quotes explained later) every compressed section at construction regardless of weather it is a debug section then we can even handle compressed symbol tables or relocations! To avoid too much allocation and copying we can lazily decompress (e.g. the airquotes) only when needed. I've had an idea floating around in the back of my head about lazily constructing sections anyway. I'm going to need to think more about a design here but that's what I'm thinking would eventually be a better design.

tools/llvm-objcopy/Object.h
274 ↗(On Diff #164592)

Why do these need to be moved into SectionBase?

tools/llvm-objcopy/llvm-objcopy.cpp
426 ↗(On Diff #164592)

So section replacement is very tricky to do right. It's one of the fundamental operations (like remove and add) that I want on Object. I have not implemented it yet because, well, it's complicated. It's easy to do when no relocations, symbols, or other parts of the system reference the given section but we're almost never so lucky. In the previous change we dealt with relocations via this method but using a different, less generic name. It might be worth doing that again. So either the name should change, or we should bite the bullet and implement a proper replace method in Object. Implementing a proper replace method will require a wide scale change to llvm-objcopy but would have a lot of benefits. My thinking was that replace could be implemented by adding a new method to SectionBase called "replaceSection" much like "removeSection" works now. Architecturally that's pretty simple. The problem is that *every* section needs to be extended with this method and every detail has to be implemented correctly.

If you wanna do it, I'd be super happy to review it. If not, we can just rename this function to something less generic for now. Maybe something like "replaceDebugSections" since it really only handles relocations.

676 ↗(On Diff #164592)

Instead of modifying OwnedDataSection can you just do the decompression here?

tools/llvm-objcopy/Object.h
366 ↗(On Diff #164592)

I'm looking at the changes in OwnedDataSection, tbh, i'm not a big fan of them.
It looks like OwnedDataSection is not the right place for them (now we have a bool flag and a new branch everywhere + some details (like ".zdebug") have leaked into it).

tools/llvm-objcopy/Object.h
366 ↗(On Diff #164592)

one potential approach might be to create DecompressedSection class. cc: @jakehehrlich
(to avoid misunderstanding: ubiquitous "if(DoCompression)" or "if(Sec.DoCompression)" all over the pipeline suggest that this dispatching mechanism should not be in OwnedDataSection)

tools/llvm-objcopy/Object.h
366 ↗(On Diff #164592)

typo: DoDecompression

jakehehrlich added inline comments.Sep 10 2018, 2:26 PM
tools/llvm-objcopy/Object.h
366 ↗(On Diff #164592)

+1 on this general view point. OwnedDataSection is not the right place.

plotfi added inline comments.Sep 12 2018, 8:42 AM
tools/llvm-objcopy/Object.h
274 ↗(On Diff #164592)

Because I set them on "Sec" after makeSection is called. I think maybe if we reuse CompressedSection or add a DecompressedSection this could be moved elsewhere.

366 ↗(On Diff #164592)

Sounds good. @jakehehrlich Would you prefer I reuse CompressedSection or write a new DecompressedSection class? I'll send out a new patch after you let me know.

tools/llvm-objcopy/llvm-objcopy.cpp
426 ↗(On Diff #164592)

Yeah would be a good thing to think about. Alternatively I could maybe decompress the section into a "DecompressedSection" on read, and instead of replacing the decompressed sections I could possibly just flag them for decompressed write? I agree, better to not encourage people to reuse this outside of the context of debug sections.

676 ↗(On Diff #164592)

Sure. Would you prefer I reuse CompressedSection somehow (adding some decompression flag) or make a whole new DecompressedSection class?

plotfi updated this revision to Diff 165168.Sep 12 2018, 3:22 PM

Added DecompressedSection.

plotfi marked 11 inline comments as done.Sep 12 2018, 3:23 PM
tools/llvm-objcopy/Object.h
382 ↗(On Diff #165168)

oh, no, that's not what i meant tbh, and this looks kinda strange (i mean the class with this method), although maybe I'm missing smth.
When I said that probably it would make sense to introduce DecompressedSection I meant that the approach would be in some sense "dual" to what you did for compression in the previous review. What are your thoughts ?

plotfi added inline comments.Sep 13 2018, 9:25 AM
tools/llvm-objcopy/Object.h
382 ↗(On Diff #165168)

I need to read and store the decompressed size at makeSection time. So either I need a decompressed size field on the SectionBase or I need to make the decompressed section at makeSection time. So next I could either replace DecompressedSection with a DecompressedSection that’s flagged for decompression or I could just flag it. Alternatively I guess if I had access to the args that HandleArgs gets I could create a decompressed section at makeSection time that always decompresses when the Arg is set.

plotfi updated this revision to Diff 165331.Sep 13 2018, 10:47 AM
  • using two new classes: DecompressableSection and DecompressedSection.
  • When a compressed debug section is encountered in makeSection we create a DecompressableSection instead of a Section so that we can keep track of the DecompressedSize (which we need ELFT to read).
  • in handleArgs if the Decompression command line arg is set then we replace all of the DecompressableSections with DecompressedSections. The difference being that the decompressed sections unset the SHF_COMPRESSED flag, undo the .zdebug* naming, and set the Section size (and alignment to DecompressedSize and DecompressedAlign. DecompressedSections Writer function also does the decompressing whereas DecompressableSection just writes the original uncompressed data.
plotfi marked 2 inline comments as done.Sep 13 2018, 10:48 AM
plotfi added inline comments.
tools/llvm-objcopy/Object.h
405 ↗(On Diff #165331)

Oops, I think I need to add Align = DecompressedAlign here. What do you think @jakehehrlich

this version looks much cleaner to me than the previous one!

although I'd probably try to make the implementation more symmetric (i mean "compression" and "decompression"). Would that be possible ?

In D51841#1234283, @alexshap wrote:

although I'd probably try to make the implementation more symmetric (i mean "compression" and "decompression"). Would that be possible ?

What does that mean?

@plotfi , oh, sorry, i should have added more context. The thing is, that I've been thinking about this patch more and more, and it seems to me, that it's possible to eliminate DecompressableSection by making the class CompressedSection a bit more robust (probably what you mentioned at some point). For example (not insisting on this approach, but just one potential option): CompressedSection can store (internally) either the original content, or the already compressed content + you can add one more constructor to that class etc. In this case the full picture will probably look a bit nicer and easier to perceive: we will have two classes: DecompressedSection and CompressedSection which are responsible for managing this sort of transformation (replacing a given section with the "compressed" version of it or, instead, with the "decompressed" version of it).

jhenderson requested changes to this revision.Sep 14 2018, 2:22 AM
jhenderson added a reviewer: jhenderson.

Please remember to include me as a reviewer on all llvm-objcopy reviews.

test/tools/llvm-objcopy/compress-debug-sections.test
5 ↗(On Diff #165331)

As with other tests, this test (or possibly the specific compress and decompress tests) should show that the inputs are not modified when using the specified switch.

test/tools/llvm-objcopy/decompress-debug-sections-zlib-gnu.test
8 ↗(On Diff #165331)

Why is this check and the following one operating on the compressed object? That's tested elsewhere. In fact, I'm not sure this test is really testing the decompression at all. It looks like almost a copy-and-paste job from the compressed test.

9 ↗(On Diff #165331)

This check-prefix is badly named, since we are testing more than flags with it.

10 ↗(On Diff #165331)

What is this check achieving?

test/tools/llvm-objcopy/decompress-debug-sections-zlib.test
1 ↗(On Diff #165331)

The same comments apply to this test as the other test.

tools/llvm-objcopy/Object.cpp
155 ↗(On Diff #165331)

No need to use strlen on a string literal. In general, use char arrays and sizeof. However, in this case, make Magic an ArrayRef, and use the size method and std::equals or similar to compare, rather than strncmp (assuming that ArrayRefs don't have a startswith method). It will read much better, and avoids unnecessary casting from uint8_t to const char.

163 ↗(On Diff #165331)

Get rid of the blank lines in this function. They don't really improve readability, in my opinion.

164 ↗(On Diff #165331)

Don't use unsigned. Use size_t, since that's the return type of sizeof.

dataOffset -> DataOffset
Please remember to follow LLVM style.

166 ↗(On Diff #165331)

As "ZLIB" and its size are used everywhere, I suggest making it a constant global variable that you can query with .size(), as required. (See also my comments elsewhere about making it a uint8_t array/ArrayRef).

173 ↗(On Diff #165331)

This may not be your fault, but SmallVector is clearly the wrong thing for this here, since this won't be Small for the vast majority of uses.

175 ↗(On Diff #165331)

Is this explicit cast necessary? If it is (e.g. we support 32-bit hosts still), use static_cast.

1022 ↗(On Diff #165331)

Please add braces here to this case, so that the local variables scope is tied to the case (see e.g. case SHT_SYMTAB_SHNDX).

1026 ↗(On Diff #165331)

Don't use C-style casts. Also, this is the wrong thing to do here anyway. Compare the bytes against an array of uint8_t, rather than converting it to a string. Note that uint8_t does not necessarily convert trivially to char.

1027 ↗(On Diff #165331)

I feel like most of this new code should be in a different function, which would add the section to the object, the call to which would be guarded by this if.

1028 ↗(On Diff #165331)

Remove this blank line.

tools/llvm-objcopy/Object.h
405 ↗(On Diff #165331)

What is DecompressedAlign actually used for? You should certainly be setting Align.

tools/llvm-objcopy/llvm-objcopy.cpp
436 ↗(On Diff #165331)

doReplacement isn't clear what it means from the name alone. Perhaps shouldReplace would be clearer.

452 ↗(On Diff #165331)

NS? Use a clearer name, please, e.g. NewSection

676 ↗(On Diff #165331)

No need for the '{' and '}' here and in the else.

1019–1020 ↗(On Diff #165331)

"... as well as --decompress-debug-sections at the same time" -> "... at the same time as --decompress-debug-sectons"

This revision now requires changes to proceed.Sep 14 2018, 2:22 AM

I may have missed this somewhere, but what happens if you specify --decompress/--compress-debug-sections on a system without zlib?

plotfi marked an inline comment as done.Sep 14 2018, 8:21 AM

I may have missed this somewhere, but what happens if you specify --decompress/--compress-debug-sections on a system without zlib?

For decompress right now it probably hits an unreachable, looking at the current code. But for compress it hits an error message.

What you're supposed to do is call zlib::isAvailable() to see if zlib is available and if not just print an error. llvm/lib/Support/Compression.cpp has unreachable versions of all the other zlib library functions.

tools/llvm-objcopy/Object.cpp
155 ↗(On Diff #165331)

I don't want to use ArrayRefs like that again. They blow up when you compile with clang-8 and the bots fail. I'd rather just use StringRef.

173 ↗(On Diff #165331)

the choices are:

Error uncompress(StringRef InputBuffer, char *UncompressedBuffer,
                 size_t &UncompressedSize);

Error uncompress(StringRef InputBuffer,
                 SmallVectorImpl<char> &UncompressedBuffer,
                 size_t UncompressedSize);

I dont want to use the one that passes UncompressedSize by reference.

175 ↗(On Diff #165331)

Ah sorry about that.

plotfi marked an inline comment as done.Sep 14 2018, 8:21 AM
plotfi added inline comments.Sep 14 2018, 9:01 AM
tools/llvm-objcopy/Object.cpp
166 ↗(On Diff #165331)

I will not use an ArrayRef like that unless I can be convince something like ArrayRef<uint8_t> foo = { 'F', 'O', 'O' } points foo to an allocated array and that "{ 'F', 'O', 'O' }" isn't immediately deallocated. From what I can I have seen, this will not work on clang-8. I will use a std::vector for now.

plotfi updated this revision to Diff 165537.Sep 14 2018, 10:49 AM
plotfi marked 18 inline comments as done.Sep 14 2018, 10:53 AM
plotfi added inline comments.
tools/llvm-objcopy/Object.h
405 ↗(On Diff #165331)

I'm not actually completely sure. But it's in the ELF Chdr so I believe it's just encoding the alignment of the section from what be was before being compressed.

plotfi marked 3 inline comments as done.Sep 14 2018, 10:54 AM
plotfi updated this revision to Diff 165548.Sep 14 2018, 11:26 AM
plotfi marked 3 inline comments as done.Sep 14 2018, 11:28 AM
plotfi added inline comments.
test/tools/llvm-objcopy/compress-debug-sections.test
5 ↗(On Diff #165331)

I don't understand what you mean here. Could you please explain?

test/tools/llvm-objcopy/decompress-debug-sections-zlib-gnu.test
8 ↗(On Diff #165331)

Removed the test, simply added he decompression checks to an existing test that does most of this.

plotfi updated this revision to Diff 165581.Sep 14 2018, 1:31 PM
plotfi marked an inline comment as done.

-Cleanup: removing DecompressableSection, reusing CompressedSection.

tools/llvm-objcopy/Object.h
397 ↗(On Diff #165581)

explicit

plotfi updated this revision to Diff 165595.Sep 14 2018, 2:49 PM
plotfi updated this revision to Diff 165611.Sep 14 2018, 4:17 PM
plotfi marked an inline comment as done.
plotfi updated this revision to Diff 165670.Sep 15 2018, 5:30 PM

You've forgotten to include context in the latest diff. I can't review some aspects of this change without it.

test/tools/llvm-objcopy/compress-debug-sections.test
5 ↗(On Diff #165331)

If you look at a lot of our other feature tests, we make a copy of the input before running llvm-objcopy and then compare %t.o against that copy, to show that llvm-objcopy didn't do an in-place modification of the file.

tools/llvm-objcopy/Object.cpp
139 ↗(On Diff #165670)

This is okay, but you can actually do the following, which is probably even simpler:

static const uint8_t ZlibGnuMagic[4] = {'Z', 'L', 'I', 'B'};

You can then use sizeof to get the size, and std::begin/std::end for the start and end of the array.

172 ↗(On Diff #165670)

You're still using unsigned. Use size_t or uint64_t not unsigned. This will fail for very large ELF files.

tools/llvm-objcopy/Object.h
405 ↗(On Diff #165331)

Sorry, I meant, what is the DecompressedAlign member variable used for. Can we delete it?

tools/llvm-objcopy/llvm-objcopy.cpp
436 ↗(On Diff #165670)

You should probably use function_ref instead of std::function, if I'm not mistaken. Also, I think that shouldReplace and addSection should be upper-case (i.e. ShouldReplace etc), since they are technically variables here.

1023 ↗(On Diff #165670)

"can not" -> cannot

It feels to me like it would make sense to check both decompress and compress here, rather than in two completely different places, since they are closely related. I'd rather not rely on the library providing error messages, when we can error early and up-front.

1019–1020 ↗(On Diff #165331)

You can remove the last part of the error message after "--decompress-debug-sections"

plotfi updated this revision to Diff 165785.Sep 17 2018, 10:23 AM
jhenderson added inline comments.Sep 18 2018, 2:17 AM
test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test
10 ↗(On Diff #165785)

This still isn't a good enough test. All it does is show that we have a section named .debug_foo after decompression. It shows nothing about the properties of the section header or that the section has been decompressed. You need to check both (e.g. by comparing the contents of the section/section header before and after decompression). If you bring across the comparisons in compress-debug-sections.test, then that at least will cover the contents aspect.

tools/llvm-objcopy/Object.h
369 ↗(On Diff #165785)

Why this? You should just provide getters. Don't use friend here.

plotfi updated this revision to Diff 166514.Sep 21 2018, 10:53 AM
plotfi updated this revision to Diff 166515.Sep 21 2018, 10:59 AM
plotfi marked 3 inline comments as done.
plotfi added inline comments.
test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test
10 ↗(On Diff #165785)

Added checks for flags and content. Lemme know if these are sufficient.

tools/llvm-objcopy/llvm-objcopy.cpp
1023 ↗(On Diff #165670)

I could do that in an NFC commit after this one.

but, pls, also wait for Jake or James

jhenderson requested changes to this revision.Sep 24 2018, 1:58 AM

Still a few comments to address, but this is mostly okay now.

test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test
10 ↗(On Diff #165785)

Thanks. One more suggestion: change "CHECK-FLAGS-DECOMPRESSED" to "CHECK-HEADER" and check the size field too. Then also add a check for this against the original object file too. That way, you'll be showing that the decompressed version is the same as the original version pre-compression.

test/tools/llvm-objcopy/compress-debug-sections.test
5 ↗(On Diff #165331)

As with other tests, this test (or possibly the specific compress and decompress tests) should show that the inputs are not modified when using the specified switch.

This hasn't been addressed.

tools/llvm-objcopy/Object.h
405 ↗(On Diff #165331)

Sorry, I meant, what is the DecompressedAlign member variable used for. Can we delete it?

This hasn't been addressed yet either. Is the DecompressedSection::DecompressedAlign variable used for anything?

Similarly, why do we need a separate DecompressedSize member variable for this section?

What is wrong with just using the Size and Align variables available in the base class?

tools/llvm-objcopy/llvm-objcopy.cpp
436 ↗(On Diff #165670)

This comment hasn't been addressed yet either.

1023 ↗(On Diff #165670)

If by that, you mean move the compress/zlib available check to the decompress check's location, that's fine.

This revision now requires changes to proceed.Sep 24 2018, 1:58 AM
plotfi updated this revision to Diff 166701.Sep 24 2018, 9:08 AM
plotfi marked 3 inline comments as done.
plotfi marked an inline comment as done.
plotfi added inline comments.
test/tools/llvm-objcopy/compress-debug-sections.test
5 ↗(On Diff #165331)

I dont understand what you mean? Do you mean the decompressed section is the same as original?

jhenderson added inline comments.Sep 24 2018, 9:54 AM
test/tools/llvm-objcopy/compress-debug-sections.test
5 ↗(On Diff #165331)

No. I mean you should look at the examples in the testsuite, such as discard-all.test and do something similar to show that the input file (%t.o) is not modified, by copying it first and comparing it against the copy after you have run llvm-objcopy.

plotfi updated this revision to Diff 167668.Sep 30 2018, 5:45 PM
plotfi marked 4 inline comments as done.
  • changed std::function to llvm::function_ref
  • added cp %t %t.o to tests
plotfi marked 5 inline comments as done.Sep 30 2018, 5:46 PM
plotfi added inline comments.
test/tools/llvm-objcopy/compress-debug-sections.test
5 ↗(On Diff #165331)

I hope this is what you meant.

plotfi updated this revision to Diff 167669.Sep 30 2018, 6:05 PM
plotfi marked 2 inline comments as done.
plotfi marked an inline comment as done.
jhenderson accepted this revision.Oct 1 2018, 1:38 AM

LGTM, if you make the suggested test changes.

test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test
4 ↗(On Diff #167669)

Sorry, I guess I've confused you. You don't need to copy the input in this particular test, as long as one test (i.e. compress-debug-sections.test - see my inline comments) shows that llvm-objcopy does not change the input.

When you do llvm-readobj -relocations -s %t below, just do it on %t.o.

test/tools/llvm-objcopy/compress-debug-sections.test
5 ↗(On Diff #165331)

Nearly, I'll be more specific below.

4 ↗(On Diff #167669)

Rename %t to %t.o, and copy it to %t.copy.o. This is more self-documenting as to the purpose, I think.

8 ↗(On Diff #167669)

Add the following here:

# RUN: cp %tz.o %tz.copy.o
# RUN: cp %tzg.o %tzg.copy.o
19 ↗(On Diff #167669)

Add the following here:

# RUN: cmp %t.o %t.copy.o
# RUN: cmp %tz.o %tz.copy.o
# RUN: cmp %tzg.o %tzg.copy.o
This revision is now accepted and ready to land.Oct 1 2018, 1:38 AM
This revision was automatically updated to reflect the committed changes.