This is an archive of the discontinued LLVM Phabricator instance.

ObjectFileELF: Add support for compressed sections
ClosedPublic

Authored by labath on Nov 29 2017, 10:52 AM.

Details

Summary

We use the llvm decompressor to decompress SHF_COMPRESSED sections. This enables
us to read data from debug info sections, which are sometimes compressed,
particuarly in the split-dwarf case. This functionality is only available if
llvm is compiled with zlib support.

Event Timeline

labath created this revision.Nov 29 2017, 10:52 AM
clayborg accepted this revision.Nov 29 2017, 10:56 AM
This revision is now accepted and ready to land.Nov 29 2017, 10:56 AM
zturner edited edge metadata.Nov 29 2017, 11:01 AM

It's too bad this has to be written as a unit test, because this would be the perfect candidate for a FileCheck style test.

Probably a long shot, but have you tried the llvm-lit project? Last time I tried it, it basically worked, but there were only a handful of tests in it. It might be possible to write a test in such a way that it invokes lldb with a .lldbinit file which enables logging to a file, ends with the quit command, and then FileChecks the log file.

Why would we text scrape when we can test the API?

For the same reason that the entire rest of the LLVM project and all other subprojects do it, when it makes sense and the nature of the test lends itself to it.

Note that there's no interactivity here. This is "feed some input, get some output, make sure the output is correct". That's exactly what FileCheck is designed for. This isn't even testing the public API, it's testing the private API. We should prefer testing the actual program in this case.

davide requested changes to this revision.Nov 29 2017, 11:12 AM

This one is a little weird when written as unittest. Not the worst thing, but I agree it should use llvm-lit.
Can you give it a shot, Pavel? If that doesn't work, we should at least evaluate the amount of work needed to get llvm-lit to run with lldb before dismissing it entirely.
BTW, nice to see lldb getting more and more tests, regardless :)

This revision now requires changes to proceed.Nov 29 2017, 11:12 AM

Another very good reason for writing a FileCheck test rather than a unittest is that writing unittest is tedious :)
In particular for new contributors, FileCheck tests are much easier to write and in this case testing the API surface doesn't seem to add much value.

It does look a little weird as a unit test, but to me this is mostly because it would been much simpler to write it as a regular SB API test.

Anyway, I really don't want the details of the text output of lldb commands to become API. Our experience with gdb was that over time as you write more and more tests that scrape text output, you end up not being able to change command output because the burden of fixing up all the tests becomes too onerous. You can use text scraping in the current lldb testsuite. We discourage that for the reasons above, and try to isolate the tests that do so by having lldbutils interfaces to do the explicit scraping. But it is just as easy, and quite often much easier, to examine objects directly in the lldb testsuite, so this mechanism encourages virtue, even though it doesn't enforce it.

OTOH adding a test mechanism that explicitly relies only on command output scraping leads us down the path that ended up being a real PITN for the gdb testsuite. So for that reason I am not in favor of going this way.

labath updated this revision to Diff 124926.Nov 30 2017, 6:02 AM
labath edited edge metadata.

This rewrites the test in terms on the new lldb-test utility. It should be applied on top of D40636.

While doing that, I noticed a discrepancy in the data presented by the object
file interface -- for GetFileSize(), it would return the compressed size, but,
when reading the data, it would return the decompressed size. This seemed odd
and unwanted.

So now I fetch the decompressed size when constructing the Section object, and
make sure GetFileSize result matches what the GetSectionData returns. This is
slightly odd as well, because now if someone looks at individual section file
offsets and sizes, it will seem that multiple sections overlap. While
unfortunate, this is a situation that can arise in without the presence of
compressed sections (no linker will produce a file like that, but you can
certainly hand-craft one), and our elf parser will hapily accept these files.

labath added inline comments.Nov 30 2017, 7:28 AM
lit/Modules/compressed-sections.yaml
1

It's right here. (I'm open to suggestions where to place it).

zturner added inline comments.Nov 30 2017, 9:04 AM
lit/Modules/compressed-sections.yaml
1

I see. I think part of the reason I didn't notice it is because it has a .yaml extension just like the old one, so I didn't notice this was really a test.

LLVM is a little inconsistent here (it has tests that end in .ll and .s, but not for most other file extensions), so can you rename this to compressed-sections.test?

At some point I think we should inject another directory in this hierarchy (i.e. lit/test/Modules), but since this is not going to be the first directory here, I guess it doesn't need to happen now.

12–13

Can you separate the CHECK lines and the YAML content? I think it makes it easier to follow this way, and it gives a consistent paradigm (checks first, then input, or vice versa). Interspersing them doesn't always work (for example if the tool doesn't output things in the same order as the input description).

17–18

Can you use CHECK-NEXT for these two? As it stands, if we output:

Name: .hello_elf
File size: -1
Data: -1

Name: .hello_coff
File size: 8
Data: 2030405060708090

It would pass, as written.

20

You should probably put this as the very first check statement. Each successfully matching CHECK line will update an internal position and subsequent checks will only start from that position, so here you're only checking that after .bogus does not occur after .hello_elf, but this test would pass if .bogus occurred before .hello_elf. But putting the CHECK-NOT first, both will fail (this is also a good reason not to intersperse the check lines).

labath marked 2 inline comments as done.Dec 1 2017, 4:42 AM
labath added inline comments.
lit/Modules/compressed-sections.yaml
1

llvm (and lld) also have plenty of tests ending in .yaml. Since this is a yaml file, and plenty of editors have syntax highlighting for yaml, it seems a pitty not to take advantage of that.

20

Putting CHECK-NOT first will just make sure that .bogus does not appear *before* the first CHECK match. I put it last as it this is the place it is likely to be if it did we did end up outputting it, but if we want to be safe, I guess we have two options:

  • put it both at the front *and* back
  • have two FileCheck invocations

I chose the latter.

labath updated this revision to Diff 125110.Dec 1 2017, 4:43 AM

Rebase on master and update the test.

zturner added inline comments.Dec 1 2017, 7:07 AM
lit/Modules/compressed-sections.yaml
20

I don't believe this is correct, and if it is then someone has introduced a bug in FileCheck. matches do not succeed or fail based on what check lines come after. They only succeed or fail based on the current file position. If the file position is 0, and you say CHECK-NOT, then you are checking that it does not appear anywhere in the file (i.e. anywhere starting at position 0). Assuming the test passes (i.e. it does not find it), the file position is not updated and then the CHECK line continues by making sure that it does appear. And so on and so forth.

zturner added inline comments.Dec 1 2017, 7:10 AM
lit/Modules/compressed-sections.yaml
1

I don't feel too strongly about this, but I do have a mild preference for having it end in .test. Another alternative to still get syntax highlighting is to have a Inputs folder and put the .yaml file there adn have the test file reference it. I'll defer to davide for a second opinion. If he's ok with the .yaml extension, I guess that's fine.

labath added inline comments.Dec 1 2017, 7:12 AM
lit/Modules/compressed-sections.yaml
20

I don't know what you're basing your claim on, but this behavior is consistent with FileCheck documentation here https://llvm.org/docs/CommandGuide/FileCheck.html:

The “CHECK-NOT:” directive is used to verify that a string doesn’t occur between two matches (or before the first match, or after the last match).
zturner added inline comments.Dec 1 2017, 7:22 AM
lit/Modules/compressed-sections.yaml
20

Well I guess the best way to be sure is to test it, and... you're right. Weird. I've been using it wrong all this time.

I almost feel like we need a CHECK-NOT-DAG or something. Anyway, your solution looks fine.

@davide: Any thoughts on .yaml as a test file suffix?

@clayborg: What do you think about my comment about GetFileSize() of compressed sections

This rewrites the test in terms on the new lldb-test utility. It should be applied on top of D40636.

While doing that, I noticed a discrepancy in the data presented by the object
file interface -- for GetFileSize(), it would return the compressed size, but,
when reading the data, it would return the decompressed size. This seemed odd
and unwanted.

So now I fetch the decompressed size when constructing the Section object, and
make sure GetFileSize result matches what the GetSectionData returns. This is
slightly odd as well, because now if someone looks at individual section file
offsets and sizes, it will seem that multiple sections overlap. While
unfortunate, this is a situation that can arise in without the presence of
compressed sections (no linker will produce a file like that, but you can
certainly hand-craft one), and our elf parser will hapily accept these files.

I think GetFileSize() should remain the number of bytes of the section on disk and we should add new API if we need to figure out the decompressed size. Or maybe when we get bytes from a compressed section we are expected to always just get the raw bytes, then we check of the section is compressed, and if so, then we call another API on ObjectFile to decompress the data. So I would prefer GetFileSize() to return the file size of the section size in the file and not the decompressed size. Is there a way to make this work?

@davide: Any thoughts on .yaml as a test file suffix?

I think this is fine.

I think GetFileSize() should remain the number of bytes of the section on disk and we should add new API if we need to figure out the decompressed size. Or maybe when we get bytes from a compressed section we are expected to always just get the raw bytes, then we check of the section is compressed, and if so, then we call another API on ObjectFile to decompress the data. So I would prefer GetFileSize() to return the file size of the section size in the file and not the decompressed size. Is there a way to make this work?

Yes, that's possible. The first version of this patch had GetFileSize return the on-disk size, but it was weird because then GetSectionData returned a different size. I guess it would stop being "weird" if we add an extra GetDecompressedSize method and document that GetSectionData returns decompressed data. I don't think we can use GetByteSize to return the decompressed size, as we use this value to denote the size in the process memory, and expect it to be zero for non-loadable sections. It is true that the elf spec says no loadable section can be compressed, so we theoretically wouldn't have a conflict here, but I don't think we will be doing anyone a favour by overloading GetByteSize this way.

I don't like the idea of needing to do an extra call to decompress data, as it will complicate clients and I think all clients will want to use the data in the decompressed form.

Sounds good,. So the solution will be:

  • Section::GetFileSize() will return the size in bytes of the section data as it appears in the file
  • Section::GetByteSize() will return the size in bytes for when this section is loaded into process memory (we might consider renaming this to "GetLoadSize()" then?)
  • Getting section data might return more data that GetByteSize() if it needs to be decompressed and decompression will happen automatically

Does that sound right?

Does that sound right?

Yes, I'll get on it.

labath updated this revision to Diff 126975.Dec 14 2017, 8:47 AM

The version where Section::GetFileSize reports the on-disk (compressed) size. I
also like the idea of renaming Section::GetByteSize to something more
descriptive, and I'll make a follow-up patch to do that.

aprantl removed a subscriber: aprantl.Dec 14 2017, 8:49 AM
clayborg accepted this revision.Dec 14 2017, 9:19 AM

Move #include of "llvm/Object/Decompressor.h" into CPP file and this is good to go.

source/Plugins/ObjectFile/ELF/ObjectFileELF.h
24

Move to .cpp file? Nothing in header file seems like it is needed.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 15 2017, 6:24 AM
This revision was automatically updated to reflect the committed changes.
tzik added a subscriber: tzik.Dec 15 2017, 11:37 PM
tzik added inline comments.
lldb/trunk/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
3496 ↗(On Diff #127120)

This adds new dependency to LLVM Object component.
Could you add it into LINK_COMPONENTS section of CMakeLists.txt in this directory?

labath added inline comments.Dec 18 2017, 2:52 AM
lldb/trunk/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
3496 ↗(On Diff #127120)

Done in r320967. Thanks for pointing this out.