This is an archive of the discontinued LLVM Phabricator instance.

[Support] Add llvm::compression::{getReasonIfUnsupported,compress,decompress}
ClosedPublic

Authored by MaskRay on Jul 25 2022, 11:26 AM.

Details

Summary

as high-level API on top of llvm::compression::{zlib,zstd}::*.

  • getReasonIfUnsupported: return nullptr if the specified format is supported, or (if unsupported) a string like LLVM was not built with LLVM_ENABLE_ZLIB ...
  • compress: dispatch to zlib::uncompress or zstd::uncompress
  • decompress: dispatch to zlib::uncompress or zstd::uncompress

Move llvm::DebugCompressionType from MC to Support to avoid Support->MC cyclic
dependency. There are 40+ uses in llvm-project.

Add another enum class llvm::compression::Format to represent supported
compression formats, which may be a superset of ELF compression formats.

See D130458 (llvm-objcopy --{,de}compress-debug-sections for zstd) for a use
case.


Note: this patch alone will cause -Wswitch to llvm/lib/ObjCopy/ELF/ELFObject.cpp

Diff Detail

Event Timeline

MaskRay created this revision.Jul 25 2022, 11:26 AM
MaskRay requested review of this revision.Jul 25 2022, 11:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2022, 11:26 AM

@ckissane sounded like you/your colleagues already had some outstanding patches/plans to generalize the compression support - should we be considering those patches/directions alongside this one?

I'd had in mind something like a compression algorithm base class, two (or three, if it's useful to have a default implementation for "None") derived classes, singleton instances of each and a function that takes the algorithm type and returns the handler - which can then be queried for "is available" and provides compress/decompress actions. That'd mean having only one switch instead of three, but the three are now so close together that it's not hugely painful to maintain (though having a class hierarchy would ensure the APIs were actually identical, whereas currently they're only identical by convention).

MaskRay added inline comments.Jul 25 2022, 11:59 AM
llvm/include/llvm/MC/MCTargetOptions.h
29

If llvm::CompressionFormat (or possibly under llvm::compression) looks better, I'll also rename Z to Zlib.

ckissane added a comment.EditedJul 25 2022, 1:46 PM

@ckissane sounded like you/your colleagues already had some outstanding patches/plans to generalize the compression support - should we be considering those patches/directions alongside this one?

I'd had in mind something like a compression algorithm base class, two (or three, if it's useful to have a default implementation for "None") derived classes, singleton instances of each and a function that takes the algorithm type and returns the handler - which can then be queried for "is available" and provides compress/decompress actions. That'd mean having only one switch instead of three, but the three are now so close together that it's not hugely painful to maintain (though having a class hierarchy would ensure the APIs were actually identical, whereas currently they're only identical by convention).

@dblaikie indeed I do!
I have some work on https://github.com/ckissane/llvm-project/tree/ckissane.compression-class-simple that I will put into a patch shortly!
It does exactly what you said, makes a base class, a none class, and zstd and zlib classes for compression, and refactors code to take advantage of this.
https://reviews.llvm.org/D130516

@ckissane sounded like you/your colleagues already had some outstanding patches/plans to generalize the compression support - should we be considering those patches/directions alongside this one?

I'd had in mind something like a compression algorithm base class, two (or three, if it's useful to have a default implementation for "None") derived classes, singleton instances of each and a function that takes the algorithm type and returns the handler - which can then be queried for "is available" and provides compress/decompress actions. That'd mean having only one switch instead of three, but the three are now so close together that it's not hugely painful to maintain (though having a class hierarchy would ensure the APIs were actually identical, whereas currently they're only identical by convention).

@dblaikie indeed I do!
I have some work on https://github.com/ckissane/llvm-project/tree/ckissane.compression-class that I will put into a patch shortly!
It does exactly what you said, makes a base class, a none class, and zstd and zlib classes for compression, and refactors code to take advantage of this.

(You may want to use rebase instead of merge. Merge makes it a bit difficult to see the real changes.)

jhenderson added inline comments.Jul 26 2022, 12:30 AM
llvm/include/llvm/Support/Compression.h
34

Since there's more than one member, it should be plural.

36

I don't like the magic -1 here. I think it should either be stored in a constant, or possibly this should be an Optional.

MaskRay added inline comments.Jul 26 2022, 12:32 AM
llvm/include/llvm/Support/Compression.h
36

I have thought about Optional. I pick -1 to be more similar with *::compress's Level and avoid an Optional include. Happy to switch though.

jhenderson added inline comments.Jul 26 2022, 12:35 AM
llvm/include/llvm/Support/Compression.h
36

I'm not too fussed either way - I'd be happy with -1, if it were a named constant, e.g. DefaultLevel or something like that.

MaskRay updated this revision to Diff 448251.Jul 28 2022, 12:42 AM
MaskRay marked 3 inline comments as done.

address comments

dblaikie added inline comments.Jul 28 2022, 12:27 PM
llvm/include/llvm/Support/Compression.h
44–51

It seems to me somewhat unfortunate that these will generally be called somewhat together (isAvailable then compress or isAvailable then decompress) but don't share state/end up switching over compression type, etc, twice.

I've mentioned on the other reviews (maybe we could/should have a discourse thread about this, rather than spreading conversation between alternatives, preliminary patches, etc) that I think something like:

CompressionAlgorithm *C = getCompressionAlgorithm(DebugCompressionType::Zlib);
if (C) {
  C->compress(level, input, output);
}

Would be pretty good to me - possible variant (mentioned on one of the other reviews) would be that this returns non-null even if the algorithm isn't available (in which case it could return by reference) so it can communicate other things (like the name of the build flag needed to enable that algorithm - so it can be written/mapped from the enum in one place, and used in various error messages for various tools, etc)

llvm/lib/Support/Compression.cpp
46

llvm_unreachable should be preferred over assert false ( https://llvm.org/docs/CodingStandards.html#assert-liberally discusses this)

MaskRay added inline comments.Jul 29 2022, 12:08 PM
llvm/include/llvm/Support/Compression.h
44–51

I posted a small-write up here:
https://reviews.llvm.org/D130516#3688123

llvm/lib/Support/Compression.cpp
46

Thanks for mentioning the doc. For this place I hesitated while thinking about llvm_unreachable but picked assert because the user can call the function with any arguments and P.Format==DebugCompressionType::None reachable. The function is supposed to be unreachable from call sites, though...

dblaikie added inline comments.Jul 29 2022, 1:26 PM
llvm/lib/Support/Compression.cpp
46

In the same way that a user can call a function that takes a non-null pointer with a null pointer, resulting in an assertion - that's OK. It's an invariant of the function that P.Format != DebugCompressionType::None, so it's OK/suitable to use unreachable. (using an assert and then having some code "in case the assert fails" is inconsistent - it's defining behavior when an invariant is violated, but that behavior is then untested and unreliable)

But, yeah, guess we can leave this discussion for now while we resolve the larger design discussions.

MaskRay updated this revision to Diff 457737.Sep 2 2022, 5:41 PM
MaskRay retitled this revision from [Support] Add llvm::compression::{isAvailable,compress,uncompress} to [Support] Add llvm::compression::{getReasonIfUnsupported,compress,decompress}.
MaskRay edited the summary of this revision. (Show Details)

redesign

ckissane accepted this revision.Sep 6 2022, 9:51 AM

Looks like my concerns about the enum coupling and none vs unsupported semantics are addressed, one nit though.
LGTM, however, @dblaikie may want to take a pass as well. (Though I think this should be fine with him, because even if the final state of compression api in llvm is more OO, this regardless moves it closer in that direction than the original api.)

llvm/include/llvm/Support/Compression.h
26–27

I think a renaming would be good to keep this extra clear.
Not a blocker though
it could also be good to keep this enum closer to elf-related code.

This revision is now accepted and ready to land.Sep 6 2022, 9:51 AM
ckissane added inline comments.Sep 6 2022, 9:56 AM
llvm/include/llvm/Support/Compression.h
26–27

Note that you state:

Move llvm::DebugCompressionType from MC to Support to avoid Support->MC cyclic
dependency. There are 40+ uses in llvm-project.

would moving both this enum and

inline Format formatFor(DebugCompressionType Type)

into MC also resolve this?

MaskRay marked 2 inline comments as done.Sep 6 2022, 10:45 AM
MaskRay added inline comments.
llvm/include/llvm/Support/Compression.h
26–27

The DebugCompressionType rename change is what I want to avoid. The type may be reused by another object file format and there are ~40 uses in llvm-project which would need updating.

MC depends on Support, so Support cannot depend on MC (cyclic). The type is therefore moved to the lower level: Support.

dblaikie accepted this revision.Sep 7 2022, 4:22 PM

I still have reservations, but the discussion doesn't seem productive, so let's go with this. Some minor tweaks to make, if you like.

llvm/include/llvm/Support/Compression.h
26–27

I think it's probably reasonable to understand DebugCompressionType as an API level enum, not necessarily 1:1 with the actual value in the ELF format - it probably wouldn't be a bad idea to separate those at some point, but not the end of the world.

78–81

I don't think it's worth adding this type - DebugCompressionType seems fine, and some functions that consume it (like compress/decompress) can have a precondition that it is not None for those functions.

94–106

Not sure if this structure is carrying its weight? The level would only be a parameter to compress and not decompress, right? (so decompress should only take Format, not Params (& maybe just take DebugCompressionKind if Format is dropped)) & then if the struct is only used to group two parameters (format and level) for one function call, maybe that's not worth it and we can just take two parameters in compress?

MaskRay added a subscriber: phosek.Sep 7 2022, 5:14 PM
MaskRay added inline comments.
llvm/include/llvm/Support/Compression.h
78–81

A separate type was @ckissane @leonardchan @phosek 's request.

(My original change was to reuse DebugCompressionType for all llvm-project compression. I just did not rename it.)

I think with the current code, this is a small-enough difference. Even if we decide to merge the types, the use sites will get minimum updates (just renaming).

94–106

I agree Level is not useful for decompress. Let me use Format for decompress.

I will keep this structure because compress will share another parameter expressing the degree of concurrency, so its 3 parameters for compress.

MaskRay updated this revision to Diff 458615.Sep 7 2022, 5:35 PM

simplify decompress as suggested

MaskRay marked an inline comment as done.Sep 7 2022, 11:45 PM
MaskRay added inline comments.
llvm/include/llvm/Support/Compression.h
78–81

In offline chat with dblaikie:

"""
So the ELF uses would need a new enum that only provides Zstd/Zlib, and the other users could have a wider enum with this new third compression type?

Yeah, fair enough I guess - I'm not super fussed about addressing that until we cross that bridge I suppose.
"""

Yes, that is the case. I consider this resolved and will land this soon:) Thanks for being open.

This broke compilation with GCC:

In file included from ../include/llvm/MC/MCTargetOptions.h:13,
                 from ../include/llvm/MC/MCAsmInfo.h:20,
                 from ../lib/MC/MCAsmStreamer.cpp:15:
../include/llvm/Support/Compression.h:100:10: error: declaration of ‘llvm::compression::Format llvm::compression::Params::Format’ changes meaning of ‘Format’ [-fpermissive]
  100 |   Format Format;
      |          ^~~~~~
../include/llvm/Support/Compression.h:78:12: note: ‘Format’ declared here as ‘enum class llvm::compression::Format’
   78 | enum class Format {
      |            ^~~~~~