This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Support --{,de}compress-debug-sections for zstd
ClosedPublic

Authored by MaskRay on Jul 24 2022, 10:46 PM.

Diff Detail

Event Timeline

MaskRay created this revision.Jul 24 2022, 10:46 PM
MaskRay requested review of this revision.Jul 24 2022, 10:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2022, 10:46 PM
MaskRay updated this revision to Diff 447190.EditedJul 24 2022, 10:52 PM
MaskRay edited the summary of this revision. (Show Details)

To test a LLVM_ENABLE_ZSTD=on build:

git clone https://github.com/facebook/zstd --depth 1
cd zstd
cmake -Hbuild/cmake -Bout/release -GNinja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/opt/zstd
ninja -C out/release install

...
cmake ... -DLLVM_ENABLE_ZSTD=on -DCMAKE_PREFIX_PATH=/tmp/opt/zstd
MaskRay updated this revision to Diff 447192.Jul 24 2022, 11:08 PM

add more test

dblaikie added a subscriber: dblaikie.
dblaikie added inline comments.Jul 24 2022, 11:50 PM
llvm/docs/CommandGuide/llvm-objcopy.rst
300–301

Might be worth removing zlib-gnu in a separate patch before this one, so if zstd is reverted we don't lose the fix to remove zlib-gnu?

llvm/lib/ObjCopy/ELF/ELFObject.cpp
437–438

Could probably do this refactor/inlining in a separate commit.

443

uncompress V decompress? Seems most of the wording here uses decompress & by google search results it seems the more common term.

448–456

sounded like the folks contributing zstd support had some plans to make modular compression algorithm wrappers - that I'd guess would look more like "get the compression handler, given the compression type, then use it" so wouldn't need to have two separate calls to uncompress here?

451–461

Was there no error handling before & any compression type was treated as zlib & decompressed using zlib?

540–541

Might be better to add the case for DebugCompressionType::None here so that we get a compilation error (uncovered switch) next time a compression algorithm is added to the enum?

llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
720–735

Maybe make these a switch over DebugCompressionType so we can catch this as a compilation error if a new one is added?

MaskRay updated this revision to Diff 447206.Jul 25 2022, 12:39 AM
MaskRay marked 4 inline comments as done.

address comments

llvm/docs/CommandGuide/llvm-objcopy.rst
300–301

Will do. I wanted to check the wording here: whether style should be changed to format and how we should state Use zlib if <format> is omitted.

Just jumped the gun a bit and pushed a commit to remove zlib-gnu and switch to format.
style makes less sense now as the zlib-gnu style has been removed.

llvm/lib/ObjCopy/ELF/ELFObject.cpp
443

"decompress" makes sense to me. I used "uncompress" because the API and many compression libraries use the name.

https://datatracker.ietf.org/doc/html/rfc8478 says that "uncompressed" is related to the state and "decompress" seems preferred for the act of processing.

448–456

This can be a future improvement when the API is available.

451–461

Right. There was no test before.

dblaikie added inline comments.Jul 25 2022, 9:15 AM
llvm/lib/ObjCopy/ELF/ELFObject.cpp
448–456

Some coordination might be good if those folks already have outstanding patches/some planned direction, though?

MaskRay marked 3 inline comments as done.
MaskRay marked an inline comment as done.
jhenderson added inline comments.Jul 26 2022, 12:26 AM
llvm/lib/ObjCopy/ELF/ELFObject.h
539

There's only one ChType in this context, not an original and a modified one, so I'm not sure we need the Orig prefix? (Also applies to the getter obviously)

llvm/test/tools/llvm-objcopy/ELF/compress-debug-sections-zstd.test
3

To me, "decompression" sounds much more natural than "uncompression" (FWIW, I'm not sure I'd ever use the verb "uncompress").

Later on, UNCOMPRESSED should probably be the more specific DECOMPRESSED (see https://english.stackexchange.com/a/56483), since it's a reference to something that has undergone the reverse of compression.

30–31

This comment makes me think it might be good to have a test case that shows we don't convert a zstd section to a zlib section if specified with --compress-debug-sections=zlib and vice versa.

llvm/tools/llvm-objcopy/ObjcopyOptions.cpp
742

FWIW, it should be "cannot", to conform to modern English trends ("can not" means the same, but is much less common).

MaskRay updated this revision to Diff 447597.Jul 26 2022, 12:42 AM
MaskRay marked 4 inline comments as done.

address comments

llvm/test/tools/llvm-objcopy/ELF/compress-debug-sections-zstd.test
3

Thanks for the reference. It matches the terminology in https://datatracker.ietf.org/doc/html/rfc8478#section-2 , too. I fixed some places to use "decompress" but forgot this one.

30–31

# RUN: %if zlib provides the test.

jhenderson added inline comments.Jul 26 2022, 1:59 AM
llvm/test/tools/llvm-objcopy/ELF/compress-debug-sections-zstd.test
3

Looks like you didn't change the check prefixes?

30–31

Sorry, misread the test somehow.

MaskRay updated this revision to Diff 447752.Jul 26 2022, 10:23 AM
MaskRay marked 2 inline comments as done.

update test check prefix: UNCOMPRESSED=>DECOMPRESSED

jhenderson accepted this revision.Jul 27 2022, 1:47 AM

Looks good. Might want to give @dblaikie a bit in case he wants to add anything else.

This revision is now accepted and ready to land.Jul 27 2022, 1:47 AM

looks like the alternative (with other changes) is posted here: https://reviews.llvm.org/D130516 so would be good to resolve the design tradeoffs between these before committing either.

MaskRay added a comment.EditedJul 27 2022, 1:27 PM

Yes!

compression::Param P;
P.Format = CompressionType;
compression::compress(P, OriginalData, CompressedData);

is a lightweight use of the compression interface. I could move back the previous version which uses the slightly verbose switch, then this patch will have no dependency on D130506.

@ckissane What I want to demonstrate in this patch is that we should use multiple patches adding zstd support to different components, instead of having a monolithic D130516.

  • https://gregoryszorc.com/blog/2020/01/07/problems-with-pull-requests-and-how-to-fix-them/ "Previous studies have found that the number of useful comments decreases and the review latency increases as the size of the change increases."
  • When we add zstd support to various components, they should have dedicated tests.
  • Each component is cared by its maintainers/regular reviewers/users. Having a refactoring is fine, but bundling zstd features in the huge patch may surprise them. There are many factors to consider, including what to do when zstd support is unavailable.
  • Many components don't necessarily migrate to zstd.

Note: Having the big change including everything is still very useful, as it gives the full picture and may enlighten useful design discussions.

FWIW if I were to create all patches, I would do these. The ordering is somewhat loose, but I'll start with llvm-objcopy

  • add llvm-objcopy support
  • add MC support (this can be tested with llvm-objcopy)
  • add clang -gz=zstd support
  • add lld input support (D129406)
  • add lld output support (after MC support, I actually have a local patch)
  • add lldb support
  • ...

Yes!

compression::Param P;
P.Format = CompressionType;
compression::compress(P, OriginalData, CompressedData);

is a lightweight use of the compression interface. I could move back the previous version which uses the slightly verbose switch, then this patch will have no dependency on D130506.

@ckissane What I want to demonstrate in this patch is that we should use multiple patches adding zstd support to different components, instead of having a monolithic D130516.

  • https://gregoryszorc.com/blog/2020/01/07/problems-with-pull-requests-and-how-to-fix-them/ "Previous studies have found that the number of useful comments decreases and the review latency increases as the size of the change increases."
  • When we add zstd support to various components, they should have dedicated tests.
  • Each component is cared by its maintainers/regular reviewers/users. Having a refactoring is fine, but bundling zstd features in the huge patch may surprise them. There are many factors to consider, including what to do when zstd support is unavailable.
  • Many components don't necessarily migrate to zstd.

Note: Having the big change including everything is still very useful, as it gives the full picture and may enlighten useful design discussions.

FWIW if I were to create all patches, I would do these. The ordering is somewhat loose, but I'll start with llvm-objcopy

  • add llvm-objcopy support
  • add MC support (this can be tested with llvm-objcopy)
  • add clang -gz=zstd support
  • add lld input support (D129406)
  • add lld output support (after MC support, I actually have a local patch)
  • add lldb support
  • ...

I totally agree!
we could try to at least get in my compression class refactor and cl::opt<CompressionAlgorithm*> support as a patch though if you think that's appropriate?

Yeah, I'm all in favor of smaller incremental patches - but more a question of the general API direction here. Whether to go with multiple entry points that each take the compression scheme and switch over it, or something with inheritance that switches over compression scheme once and then uses dynamic dispatch for compression/decompression/etc. I do see the point of ( https://reviews.llvm.org/D130516#3677580 ) that the inheritance scheme might be overkill for two (maybe three if the default/null/no-op compression scheme is worth having) implementations, but might still be suitable - I think it'll be easier to compare once patches are split up/more isolated.

Thanks for the replys! Personally I think the object-oriented style design is a bit overkill. I think most call sites really don't care about constructing a compression/decompression class. A free function call like

compression::Param P;
P.Format = CompressionType;
// Some callers may want to set --threads here. The user needs to be aware that different threads may lead to different output. For the same number of threads, the output should be deterministic, as guaranteed by zstd.
compression::compress(P, OriginalData, CompressedData);

is just fine. If you can simplify the OO enough so that there is less boilerplate for each call site, happy to take another look. Note: even if an OO design is used, it's possible that we may want to introduce free functions and the end result looks exactly very similar to this patch.

If you consider that even if D130506 is not the final form, it will be close enough to the final form. We can push the two patches and you can continue refining the interface.
The llvm-objcopy serves as an example for you to add support for other components.

MaskRay updated this revision to Diff 448380.Jul 28 2022, 10:44 AM
MaskRay edited the summary of this revision. (Show Details)

remove dependency on the API change (i.e. switch to a previous revision)

This revision was landed with ongoing or failed builds.Jul 28 2022, 10:46 AM
This revision was automatically updated to reflect the committed changes.

remove dependency on the API change (i.e. switch to a previous revision)

Sorry, I made a mistake here. The patch did not depend on D130506. A second-to-last update added the dependency and made it subject to the D130506/D130516 API design debate.

@dblaikie Realized I didn't make sure you were fine with this incremental direction - would you like me to revert while we settle on free function/classe based design first?

remove dependency on the API change (i.e. switch to a previous revision)

Sorry, I made a mistake here. The patch did not depend on D130506. A second-to-last update added the dependency and made it subject to the D130506/D130516 API design debate.

I think I follow that/understood that to be what you were saying originally. Though I think either way this patch is part of that design discussion - not designing the API and using a switch is a design choice in this space/one option to consider, but I'd like to consider them before going ahead with any.

@dblaikie Realized I didn't make sure you were fine with this incremental direction - would you like me to revert while we settle on free function/classe based design first?

Yeah, I think it'd be good to revert this for now, thanks - this patch might be a good one to compare side-by-side with alternatives.

MaskRay reopened this revision.Sep 2 2022, 5:41 PM
This revision is now accepted and ready to land.Sep 2 2022, 5:41 PM

Thanks for review. (Doing a final round of testing like asan, .-DLLVM_ENABLE_ZSTD=FORCE_ON -DCMAKE_PREFIX_PATH=/tmp/opt/zstd, etc before re-landing)