Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
So I was thinking that we might try and detect compressed section at section construction time instead. The we can implement decompression in terms of CompressedSection...the only blemish seems to be that CompressedSection always allocates. I'm not super sure what the right solution to that is.
On a much more complicated note, I'm thinking I have a nicer and more generic way of handling compressed sections but it's not something you should worry about in this change (unless it sounds awesome to you and you're up for a more complex challenge). If we "decompress" (air quotes explained later) every compressed section at construction regardless of weather it is a debug section then we can even handle compressed symbol tables or relocations! To avoid too much allocation and copying we can lazily decompress (e.g. the airquotes) only when needed. I've had an idea floating around in the back of my head about lazily constructing sections anyway. I'm going to need to think more about a design here but that's what I'm thinking would eventually be a better design.
tools/llvm-objcopy/Object.h | ||
---|---|---|
277 | Why do these need to be moved into SectionBase? | |
tools/llvm-objcopy/llvm-objcopy.cpp | ||
435 | So section replacement is very tricky to do right. It's one of the fundamental operations (like remove and add) that I want on Object. I have not implemented it yet because, well, it's complicated. It's easy to do when no relocations, symbols, or other parts of the system reference the given section but we're almost never so lucky. In the previous change we dealt with relocations via this method but using a different, less generic name. It might be worth doing that again. So either the name should change, or we should bite the bullet and implement a proper replace method in Object. Implementing a proper replace method will require a wide scale change to llvm-objcopy but would have a lot of benefits. My thinking was that replace could be implemented by adding a new method to SectionBase called "replaceSection" much like "removeSection" works now. Architecturally that's pretty simple. The problem is that *every* section needs to be extended with this method and every detail has to be implemented correctly. If you wanna do it, I'd be super happy to review it. If not, we can just rename this function to something less generic for now. Maybe something like "replaceDebugSections" since it really only handles relocations. | |
687 | Instead of modifying OwnedDataSection can you just do the decompression here? |
tools/llvm-objcopy/Object.h | ||
---|---|---|
366 | I'm looking at the changes in OwnedDataSection, tbh, i'm not a big fan of them. |
tools/llvm-objcopy/Object.h | ||
---|---|---|
366 | one potential approach might be to create DecompressedSection class. cc: @jakehehrlich |
tools/llvm-objcopy/Object.h | ||
---|---|---|
366 | typo: DoDecompression |
tools/llvm-objcopy/Object.h | ||
---|---|---|
366 | +1 on this general view point. OwnedDataSection is not the right place. |
tools/llvm-objcopy/Object.h | ||
---|---|---|
277 | Because I set them on "Sec" after makeSection is called. I think maybe if we reuse CompressedSection or add a DecompressedSection this could be moved elsewhere. | |
366 | Sounds good. @jakehehrlich Would you prefer I reuse CompressedSection or write a new DecompressedSection class? I'll send out a new patch after you let me know. | |
tools/llvm-objcopy/llvm-objcopy.cpp | ||
435 | Yeah would be a good thing to think about. Alternatively I could maybe decompress the section into a "DecompressedSection" on read, and instead of replacing the decompressed sections I could possibly just flag them for decompressed write? I agree, better to not encourage people to reuse this outside of the context of debug sections. | |
687 | Sure. Would you prefer I reuse CompressedSection somehow (adding some decompression flag) or make a whole new DecompressedSection class? |
tools/llvm-objcopy/Object.h | ||
---|---|---|
382 | oh, no, that's not what i meant tbh, and this looks kinda strange (i mean the class with this method), although maybe I'm missing smth. |
tools/llvm-objcopy/Object.h | ||
---|---|---|
382 | I need to read and store the decompressed size at makeSection time. So either I need a decompressed size field on the SectionBase or I need to make the decompressed section at makeSection time. So next I could either replace DecompressedSection with a DecompressedSection that’s flagged for decompression or I could just flag it. Alternatively I guess if I had access to the args that HandleArgs gets I could create a decompressed section at makeSection time that always decompresses when the Arg is set. |
- using two new classes: DecompressableSection and DecompressedSection.
- When a compressed debug section is encountered in makeSection we create a DecompressableSection instead of a Section so that we can keep track of the DecompressedSize (which we need ELFT to read).
- in handleArgs if the Decompression command line arg is set then we replace all of the DecompressableSections with DecompressedSections. The difference being that the decompressed sections unset the SHF_COMPRESSED flag, undo the .zdebug* naming, and set the Section size (and alignment to DecompressedSize and DecompressedAlign. DecompressedSections Writer function also does the decompressing whereas DecompressableSection just writes the original uncompressed data.
tools/llvm-objcopy/Object.h | ||
---|---|---|
400 | Oops, I think I need to add Align = DecompressedAlign here. What do you think @jakehehrlich |
although I'd probably try to make the implementation more symmetric (i mean "compression" and "decompression"). Would that be possible ?
@plotfi , oh, sorry, i should have added more context. The thing is, that I've been thinking about this patch more and more, and it seems to me, that it's possible to eliminate DecompressableSection by making the class CompressedSection a bit more robust (probably what you mentioned at some point). For example (not insisting on this approach, but just one potential option): CompressedSection can store (internally) either the original content, or the already compressed content + you can add one more constructor to that class etc. In this case the full picture will probably look a bit nicer and easier to perceive: we will have two classes: DecompressedSection and CompressedSection which are responsible for managing this sort of transformation (replacing a given section with the "compressed" version of it or, instead, with the "decompressed" version of it).
Please remember to include me as a reviewer on all llvm-objcopy reviews.
test/tools/llvm-objcopy/compress-debug-sections.test | ||
---|---|---|
6 | As with other tests, this test (or possibly the specific compress and decompress tests) should show that the inputs are not modified when using the specified switch. | |
test/tools/llvm-objcopy/decompress-debug-sections-zlib-gnu.test | ||
8 ↗ | (On Diff #165331) | Why is this check and the following one operating on the compressed object? That's tested elsewhere. In fact, I'm not sure this test is really testing the decompression at all. It looks like almost a copy-and-paste job from the compressed test. |
9 ↗ | (On Diff #165331) | This check-prefix is badly named, since we are testing more than flags with it. |
10 ↗ | (On Diff #165331) | What is this check achieving? |
test/tools/llvm-objcopy/decompress-debug-sections-zlib.test | ||
1 ↗ | (On Diff #165331) | The same comments apply to this test as the other test. |
tools/llvm-objcopy/Object.cpp | ||
155 | No need to use strlen on a string literal. In general, use char arrays and sizeof. However, in this case, make Magic an ArrayRef, and use the size method and std::equals or similar to compare, rather than strncmp (assuming that ArrayRefs don't have a startswith method). It will read much better, and avoids unnecessary casting from uint8_t to const char. | |
163 | Get rid of the blank lines in this function. They don't really improve readability, in my opinion. | |
164 | Don't use unsigned. Use size_t, since that's the return type of sizeof. dataOffset -> DataOffset | |
166 | As "ZLIB" and its size are used everywhere, I suggest making it a constant global variable that you can query with .size(), as required. (See also my comments elsewhere about making it a uint8_t array/ArrayRef). | |
173 | This may not be your fault, but SmallVector is clearly the wrong thing for this here, since this won't be Small for the vast majority of uses. | |
175 | Is this explicit cast necessary? If it is (e.g. we support 32-bit hosts still), use static_cast. | |
1038 | Please add braces here to this case, so that the local variables scope is tied to the case (see e.g. case SHT_SYMTAB_SHNDX). | |
1042 | Don't use C-style casts. Also, this is the wrong thing to do here anyway. Compare the bytes against an array of uint8_t, rather than converting it to a string. Note that uint8_t does not necessarily convert trivially to char. | |
1043 | I feel like most of this new code should be in a different function, which would add the section to the object, the call to which would be guarded by this if. | |
1044 | Remove this blank line. | |
tools/llvm-objcopy/Object.h | ||
400 | What is DecompressedAlign actually used for? You should certainly be setting Align. | |
tools/llvm-objcopy/llvm-objcopy.cpp | ||
436 | doReplacement isn't clear what it means from the name alone. Perhaps shouldReplace would be clearer. | |
452 | NS? Use a clearer name, please, e.g. NewSection | |
677–689 | No need for the '{' and '}' here and in the else. | |
1023–1024 | "... as well as --decompress-debug-sections at the same time" -> "... at the same time as --decompress-debug-sectons" |
I may have missed this somewhere, but what happens if you specify --decompress/--compress-debug-sections on a system without zlib?
For decompress right now it probably hits an unreachable, looking at the current code. But for compress it hits an error message.
What you're supposed to do is call zlib::isAvailable() to see if zlib is available and if not just print an error. llvm/lib/Support/Compression.cpp has unreachable versions of all the other zlib library functions.
tools/llvm-objcopy/Object.cpp | ||
---|---|---|
155 | I don't want to use ArrayRefs like that again. They blow up when you compile with clang-8 and the bots fail. I'd rather just use StringRef. | |
173 | the choices are: Error uncompress(StringRef InputBuffer, char *UncompressedBuffer, size_t &UncompressedSize); Error uncompress(StringRef InputBuffer, SmallVectorImpl<char> &UncompressedBuffer, size_t UncompressedSize); I dont want to use the one that passes UncompressedSize by reference. | |
175 | Ah sorry about that. |
tools/llvm-objcopy/Object.cpp | ||
---|---|---|
166 | I will not use an ArrayRef like that unless I can be convince something like ArrayRef<uint8_t> foo = { 'F', 'O', 'O' } points foo to an allocated array and that "{ 'F', 'O', 'O' }" isn't immediately deallocated. From what I can I have seen, this will not work on clang-8. I will use a std::vector for now. |
tools/llvm-objcopy/Object.h | ||
---|---|---|
400 | I'm not actually completely sure. But it's in the ELF Chdr so I believe it's just encoding the alignment of the section from what be was before being compressed. |
test/tools/llvm-objcopy/compress-debug-sections.test | ||
---|---|---|
6 | I don't understand what you mean here. Could you please explain? | |
test/tools/llvm-objcopy/decompress-debug-sections-zlib-gnu.test | ||
8 ↗ | (On Diff #165331) | Removed the test, simply added he decompression checks to an existing test that does most of this. |
tools/llvm-objcopy/Object.h | ||
---|---|---|
396 | explicit |
You've forgotten to include context in the latest diff. I can't review some aspects of this change without it.
test/tools/llvm-objcopy/compress-debug-sections.test | ||
---|---|---|
6 | If you look at a lot of our other feature tests, we make a copy of the input before running llvm-objcopy and then compare %t.o against that copy, to show that llvm-objcopy didn't do an in-place modification of the file. | |
tools/llvm-objcopy/Object.cpp | ||
139 | This is okay, but you can actually do the following, which is probably even simpler: static const uint8_t ZlibGnuMagic[4] = {'Z', 'L', 'I', 'B'}; You can then use sizeof to get the size, and std::begin/std::end for the start and end of the array. | |
172 | You're still using unsigned. Use size_t or uint64_t not unsigned. This will fail for very large ELF files. | |
tools/llvm-objcopy/Object.h | ||
400 | Sorry, I meant, what is the DecompressedAlign member variable used for. Can we delete it? | |
tools/llvm-objcopy/llvm-objcopy.cpp | ||
436 | You should probably use function_ref instead of std::function, if I'm not mistaken. Also, I think that shouldReplace and addSection should be upper-case (i.e. ShouldReplace etc), since they are technically variables here. | |
1023 | "can not" -> cannot It feels to me like it would make sense to check both decompress and compress here, rather than in two completely different places, since they are closely related. I'd rather not rely on the library providing error messages, when we can error early and up-front. | |
1023–1024 | You can remove the last part of the error message after "--decompress-debug-sections" |
test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test | ||
---|---|---|
11 | This still isn't a good enough test. All it does is show that we have a section named .debug_foo after decompression. It shows nothing about the properties of the section header or that the section has been decompressed. You need to check both (e.g. by comparing the contents of the section/section header before and after decompression). If you bring across the comparisons in compress-debug-sections.test, then that at least will cover the contents aspect. | |
tools/llvm-objcopy/Object.h | ||
369 | Why this? You should just provide getters. Don't use friend here. |
Still a few comments to address, but this is mostly okay now.
test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test | ||
---|---|---|
11 | Thanks. One more suggestion: change "CHECK-FLAGS-DECOMPRESSED" to "CHECK-HEADER" and check the size field too. Then also add a check for this against the original object file too. That way, you'll be showing that the decompressed version is the same as the original version pre-compression. | |
test/tools/llvm-objcopy/compress-debug-sections.test | ||
6 |
This hasn't been addressed. | |
tools/llvm-objcopy/Object.h | ||
400 |
This hasn't been addressed yet either. Is the DecompressedSection::DecompressedAlign variable used for anything? Similarly, why do we need a separate DecompressedSize member variable for this section? What is wrong with just using the Size and Align variables available in the base class? | |
tools/llvm-objcopy/llvm-objcopy.cpp | ||
436 | This comment hasn't been addressed yet either. | |
1023 | If by that, you mean move the compress/zlib available check to the decompress check's location, that's fine. |
test/tools/llvm-objcopy/compress-debug-sections.test | ||
---|---|---|
6 | I dont understand what you mean? Do you mean the decompressed section is the same as original? |
test/tools/llvm-objcopy/compress-debug-sections.test | ||
---|---|---|
6 | No. I mean you should look at the examples in the testsuite, such as discard-all.test and do something similar to show that the input file (%t.o) is not modified, by copying it first and comparing it against the copy after you have run llvm-objcopy. |
test/tools/llvm-objcopy/compress-debug-sections.test | ||
---|---|---|
6 | I hope this is what you meant. |
LGTM, if you make the suggested test changes.
test/tools/llvm-objcopy/compress-debug-sections-zlib-gnu.test | ||
---|---|---|
4 | Sorry, I guess I've confused you. You don't need to copy the input in this particular test, as long as one test (i.e. compress-debug-sections.test - see my inline comments) shows that llvm-objcopy does not change the input. When you do llvm-readobj -relocations -s %t below, just do it on %t.o. | |
test/tools/llvm-objcopy/compress-debug-sections.test | ||
5 | Rename %t to %t.o, and copy it to %t.copy.o. This is more self-documenting as to the purpose, I think. | |
6 | Nearly, I'll be more specific below. | |
9 | Add the following here: # RUN: cp %tz.o %tz.copy.o # RUN: cp %tzg.o %tzg.copy.o | |
20 | Add the following here: # RUN: cmp %t.o %t.copy.o # RUN: cmp %tz.o %tz.copy.o # RUN: cmp %tzg.o %tzg.copy.o |
Sorry, I guess I've confused you. You don't need to copy the input in this particular test, as long as one test (i.e. compress-debug-sections.test - see my inline comments) shows that llvm-objcopy does not change the input.
When you do llvm-readobj -relocations -s %t below, just do it on %t.o.