Page MenuHomePhabricator

[llvm] compression classes
Needs ReviewPublic

Authored by ckissane on Mon, Jul 25, 2:26 PM.

Details

Summary

Adds a CompressionKind "enum" with values:
CompressionKind::Unknown ≈ 255
CompressionKind::Zlib ≈ 1
CompressionKind::ZStd ≈ 2

also note: OptionalCompressionKind is typedef'ed as Optional<CompressionKind>, and NoneType() is used to indicate no compression.

The CompressionKind "enum" has overrides for several operators:

CompressionKind::operator uint8_t() const
(Self-explanatory)

CompressionKind::operator bool() const
(true if supported, false otherwise)

bool operator==(CompressionKind a, CompressionKind b)

(Self-explanatory)

none compression is represented simply as a none type for use cases that will use Optional<CompressionKind>.
once you have a CompressionKind itself you can pass it around as a value (because a CompressionKind is just a struct containing one uint8_t (a fake "enum")).
due to my operator overloading, you can do stuff like this:

llvm::compression::OptionalCompressionKind OptionalCompressionScheme =
          llvm::compression::getOptionalCompressionKind(CompressionSchemeId);
      if (!OptionalCompressionScheme) {
        return llvm::MemoryBuffer::getMemBuffer(Blob, Name, true);
      }
      llvm::compression::CompressionKind CompressionScheme =
          *OptionalCompressionScheme;
      if (!CompressionScheme) {
        Error("compression class " +
              (CompressionScheme->Name + " is not available").str());
        return nullptr;
      }
      SmallVector<uint8_t, 0> Uncompressed;
      if (llvm::Error E = CompressionScheme->decompress(
              llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
        Error("could not decompress embedded file contents: " +
              llvm::toString(std::move(E)));
        return nullptr;
      }
      return llvm::MemoryBuffer::getMemBufferCopy(
          llvm::toStringRef(Uncompressed), Name);

(excerpt from ASTReader.cpp)

you can even do calls like

llvm::compression::CompressionKind::Zlib->decompress(...)
(this can be useful in cases like elf decompression where you might switch on debug section type and in a switch case for DebugCompressionType::Z)

I also believe similar semantics to the nullptr suggestion have been achieved, due to the bool cast returning supported status, and the Unknown type acting as a nullptr of sorts, being always unsupported and -1 (255) as a uint8. Optional(Unknown) is returned from getOptionalCompressionKind(uint8_t) when it is not 0 (NoneType()), 1 (Optional(Zlib)), or 2 (Optional(ZStd)).

In places where explict compression must be used CompressionKind can be passed around, and possibly optional (sometimes none) compression is represented as llvm::Optional<CompressionKind> which I have type aliased as OptionalCompressionKind

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
  • trim down compression api: remove supported()
ckissane edited the summary of this revision. (Show Details)Tue, Aug 2, 12:01 PM
ckissane edited the summary of this revision. (Show Details)Tue, Aug 2, 12:09 PM

I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
Let's discuss several use cases.

(a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
For each new use, the number of lines may not differ, but the involvement of a a static class member or an instance make the reader wonder whether the object will be reused or thrown away.
There is some slight cognitive burden.
The class design has a non-trivial one-shot cost to have a function returning the singleton instance.

Though there must've been a condition that dominates this use somewhere - I'd suggest that condition could be where the algorithm is retrieved, and then passed to this code to use unconditionally.

If the algorithm object is const and raw pointers/references are used, I think it makes it clear to the reader that there's no ownership here, and it's not stateful when compressing/decompressing.

A pointer to a singleton compression class is isomorphic to an enum class CompressionType variable.

I don't mean to suggest that either design is fundamentally more or less functional - I'm totally OK with/agree that both design directions allow the implementation of all the desired final/end-user-visible functionality.

I'm trying to make a point about which, I think, achieves that goal in a "better" way - that's the space of design discussions, I think - what kinds of (developer, maintenance, etc) costs different designs incur.

Using an enum variable doesn't lose any usage pattern we can do with a pointer to a singleton compression class.

I agree that either design doesn't change what's possible - I do, though, think that the "usage patterns" are meaningfully different between the two designs.

An enum variable allows more patterns, as the allowed values are enumerable (we don't need to worry about -Wswitch for the uses).

Say, we do

auto *algo = !compression::ZlibCompression;
if (!algo)
  ...


algo->compress(...);

either together or apart, the result is similar to the following but with (IMO) slightly larger cognitive burden:

if (!compression::isAvailable(format))
  ...

compression::compress(format);

Specifically two APIs that are related (it's important/necessary to check for availability before calling compress or decompress) in their contracts but unrelated in their API use makes it easier to misuse the APIs and have a situation where the availability check doesn't cover the usage. That's what I think is important/important to discuss here.

(b) zlib compress/uncompress immediately following an availability check.

// free function
if (!compression::zlib::isAvailable())
  errs() << "cannot compress: " << compression::zlib::buildConfigurationHint();

// class
auto *algo = !compression::ZlibCompression;
if (!algo->isAvailable()) {
  errs() << "cannot compress: " << algo->buildConfigurationHint();
}

I think maybe this code might end up looking like:

Algo *algo = getAlgo(Zlib)
if (!algo)
  errs() ...

It's possible that this function would return non-null even for a non-available algorithm if we wanted to communicate other things (like the cmake macro name to enable to add the functionality)

I think this is similarly achieved with an enum variable.
With the class based approach, a pointer has a static type of the ancestor compression class and a dynamic type of any possible algorithm.
This is not different from that: the enum variable may have a value the enum class supports.

I agree that the code is similar in either case, but with a small difference that is important to me - that accessing the algorithm necessarily (to some degree - you could still have code that doesn't test the condition/dereferences null, the same way that code can dereference an empty Optional without checking first - but at least the API I'm suggesting makes clear there's a connection between availability and usage).

(c) zlib/zstd compress/uncompress immediately following an availability check.

// free function
if (!compression::isAvailable(format))
  errs() << "cannot compress: " << compression::buildConfigurationHint(format);

// class
std::unique_ptr<Compression> algo = make_compression(format);
if (!algo->isAvailable()) {
  errs() << "cannot compress: " << algo->buildConfigurationHint();
}

I don't think there's a need for unique_ptr here - algorithms can be constant singletons, referenced via raw const pointers/references without ownership.

& this example doesn't include the code that does the compression/decompression, which seems part of the discussion & part I find nice in that the type of compression used matches the type used in the check necessarily rather than being passed into two APIs independently.

Thanks for clarification. Then this fits my "singleton compression classes are isomorphic to an enum CompressionType variable" argument :)

I don't understand what you're saying here. Could you rephrase/expand a bit?

ckissane updated this revision to Diff 449375.Tue, Aug 2, 12:11 PM
  • CompressionKind: clean up param names to == op
ckissane added a comment.EditedTue, Aug 2, 12:17 PM

@dblaikie, @MaskRay I think I have worked out something that is the best of both worlds:
none compression is represented simply as a none type for use cases that will use Optional<CompressionKind>.
once you have a CompressionKind itself you can pass it around as a value (because a CompressionKind is just a struct containing one uint8_t (a fake "enum")).
due to my operator overloading, you can do stuff like this:

llvm::compression::OptionalCompressionKind OptionalCompressionScheme =
          llvm::compression::getOptionalCompressionKind(CompressionSchemeId);
      if (!OptionalCompressionScheme) {
        return llvm::MemoryBuffer::getMemBuffer(Blob, Name, true);
      }
      llvm::compression::CompressionKind CompressionScheme =
          *OptionalCompressionScheme;
      if (!CompressionScheme) {
        Error("compression class " +
              (CompressionScheme->getName() + " is not available").str());
        return nullptr;
      }
      SmallVector<uint8_t, 0> Uncompressed;
      if (llvm::Error E = CompressionScheme->decompress(
              llvm::arrayRefFromStringRef(Blob), Uncompressed, Record[0])) {
        Error("could not decompress embedded file contents: " +
              llvm::toString(std::move(E)));
        return nullptr;
      }
      return llvm::MemoryBuffer::getMemBufferCopy(
          llvm::toStringRef(Uncompressed), Name);

(excerpt from ASTReader.cpp)

you can even do calls like

llvm::compression::CompressionKind::Zlib->decompress(...)
(this can be useful in cases like elf decompression where you might switch on debug section type and in a switch case for DebugCompressionType::Z)

ckissane edited the summary of this revision. (Show Details)Tue, Aug 2, 12:18 PM
ckissane edited the summary of this revision. (Show Details)Tue, Aug 2, 12:25 PM

The current code here still seems more complicated than I'd prefer - looks like currently the size/speed/default levels are currently unused, so maybe we can omit those for now, knowing they will be added?
And the CompressionKind with all its operator overloads seems like a lot of surface area that is pretty non-obvious for usage - boolean testable, logical operator overloads, etc.
Could we have only one decompress/compress function each, for now?
& maybe leave out the name/enum from the base class for now, add it in later (& I think I mentionted in another comment those properties can be non-virtual, maybe even direct const members - passed into the base through the ctor from the derived class)

Maybe it's easier if I either post a patch, or at least more explicitly flesh out what I'm picturing/proposing/suggesting:
Header:

struct CompressionAlgorithm {
  virtual void Compress(...);
  virtual void Decompress(...);
};
enum class CompressionType {
  Zlib, Zstd
};
CompressionAlgorithm *getCompressionAlgorithm(CompressionType);

Implementation:

#if LLVM_ENABLE_ZLIB
struct ZlibCompressionAlgorthim : CompressionAlgorithm {
  void Compress(...) { ... }
  void Decompress(...) { ...}
}
#endif
...
CompressionAlgorithm *getCompressionAlgorithm(CompressionType T) {
  switch (T) {
  case CompressionType::Zlib: {
#if LLVM_ENABLE_ZLIB
    static ZlibCompressionAlgorithm A;
    return &A;
#else
    break;
#endif
  }
...
  }
  return nullptr;
}

Usage:

if (CompressionAlgorithm *C = getCompressionAlgorithm(CompressionType::Zlib) {
  C->compress(...);
}

And, yeah, I think it'd be suitable to eventually add name, type, size/speed/default levels:

struct CompressionAlgorithm {
  const StringRef Name;
  const CompressionType Type;
  const int DefaultLevel;
  const int BestSizeLevel;
  const int BestSpeedLevel;
  virtual void Compress(...);
  virtual void Decompress(...);
protected:
  CompressionAlgorithm(StringRef Name, CompressionType Type, ...) : Name(Name), Type(Type), ... {}
}
struct ZlibCompressionAlgorithm : CompressionAlgorithm {
  ZlibCompressionAlgorithm() : CompressionAlgorithm("zlib", CompressionType::Zlib, 5, 10, 1) { }
  /* as before */
};
...

Though those can be added as needed - good to keep in mind that they're a useful direction to go, but might simplify the review/discussion to omit them for now.

I think I have worked out something that is the best of both worlds:

I think @MaskRay's main concern, which I share to a degree, is that there's a lot of code/complexity here that doesn't currently seem warranted by the size of the problem. So adding more implementation complexity to this patch, even if it does provide some of the benefits (though I don't think the ability to do boolean logic, etc, is the main concern either myself or @MaskRay have - either way we'll have the enum) it's adding a lot more implementation complexity, which is something we're trying to address/reduce.

I'd like to make a few arguments for the current namespace+free function design, as opposed to the class+member function design as explored in this patch (but thanks for the exploration!).
Let's discuss several use cases.

(a) if a use case just calls compress/uncompress. The class design has slightly more boilerplate as it needs to get the algorithm class, a new instance, or a singleton instance.
For each new use, the number of lines may not differ, but the involvement of a a static class member or an instance make the reader wonder whether the object will be reused or thrown away.
There is some slight cognitive burden.
The class design has a non-trivial one-shot cost to have a function returning the singleton instance.

Though there must've been a condition that dominates this use somewhere - I'd suggest that condition could be where the algorithm is retrieved, and then passed to this code to use unconditionally.

If the algorithm object is const and raw pointers/references are used, I think it makes it clear to the reader that there's no ownership here, and it's not stateful when compressing/decompressing.

A pointer to a singleton compression class is isomorphic to an enum class CompressionType variable.

I don't mean to suggest that either design is fundamentally more or less functional - I'm totally OK with/agree that both design directions allow the implementation of all the desired final/end-user-visible functionality.

I'm trying to make a point about which, I think, achieves that goal in a "better" way - that's the space of design discussions, I think - what kinds of (developer, maintenance, etc) costs different designs incur.

[...]

I don't understand what you're saying here. Could you rephrase/expand a bit?

Maybe it's easier if I either post a patch, or at least more explicitly flesh out what I'm picturing/proposing/suggesting:

I agree that it is easier if you post a patch so that we can have discussion on concrete code. I think we are now at the "a picture is worth a thousand words" stage:)

The current code here still seems more complicated than I'd prefer - looks like currently the size/speed/default levels are currently unused, so maybe we can omit those for now, knowing they will be added?
And the CompressionKind with all its operator overloads seems like a lot of surface area that is pretty non-obvious for usage - boolean testable, logical operator overloads, etc.
Could we have only one decompress/compress function each, for now?
& maybe leave out the name/enum from the base class for now, add it in later (& I think I mentionted in another comment those properties can be non-virtual, maybe even direct const members - passed into the base through the ctor from the derived class)

Yes, I can continue to trim down the implementation! I agree with your sentiment.

Maybe it's easier if I either post a patch, or at least more explicitly flesh out what I'm picturing/proposing/suggesting:
Header:

struct CompressionAlgorithm {
  virtual void Compress(...);
  virtual void Decompress(...);
};
enum class CompressionType {
  Zlib, Zstd
};
CompressionAlgorithm *getCompressionAlgorithm(CompressionType);

Implementation:

#if LLVM_ENABLE_ZLIB
struct ZlibCompressionAlgorthim : CompressionAlgorithm {
  void Compress(...) { ... }
  void Decompress(...) { ...}
}
#endif
...
CompressionAlgorithm *getCompressionAlgorithm(CompressionType T) {
  switch (T) {
  case CompressionType::Zlib: {
#if LLVM_ENABLE_ZLIB
    static ZlibCompressionAlgorithm A;
    return &A;
#else
    break;
#endif
  }
...
  }
  return nullptr;
}

I agree with some of this, I have some strong thoughts I would like to work out about the whole nullptr being none or unsupported a little preemptively IMO.

Usage:

if (CompressionAlgorithm *C = getCompressionAlgorithm(CompressionType::Zlib) {

C->compress(...);

}

currently, you can do

if (CompressionKind C = CompressionKind::Zlib) {
  C->compress(...);
}

And, yeah, I think it'd be suitable to eventually add name, type, size/speed/default levels:

struct CompressionAlgorithm {
  const StringRef Name;
  const CompressionType Type;
  const int DefaultLevel;
  const int BestSizeLevel;
  const int BestSpeedLevel;
  virtual void Compress(...);
  virtual void Decompress(...);
protected:
  CompressionAlgorithm(StringRef Name, CompressionType Type, ...) : Name(Name), Type(Type), ... {}
}
struct ZlibCompressionAlgorithm : CompressionAlgorithm {
  ZlibCompressionAlgorithm() : CompressionAlgorithm("zlib", CompressionType::Zlib, 5, 10, 1) { }
  /* as before */
};
...

Though those can be added as needed - good to keep in mind that they're a useful direction to go, but might simplify the review/discussion to omit them for now.

ckissane updated this revision to Diff 449421.EditedTue, Aug 2, 1:51 PM

compression api: greatly simplify implementation
+ cuts around 100 lines of code from compression.h and compression.cpp

ckissane edited the summary of this revision. (Show Details)Tue, Aug 2, 1:54 PM

Yes, I can continue to trim down the implementation! I agree with your sentiment.

Thanks! This update helps - though I think we'll still want to further isolate the different pieces of this change/reduce this further.

I agree with some of this, I have some strong thoughts I would like to work out about the whole nullptr being none or unsupported a little preemptively IMO.

Could you clarify what use cases you have in mind that require the nuance between none and unsupported? (arguably this accessor function could assert when passed None - would that make it simpler? Then the only null return would be unsupported)

Usage:

if (CompressionAlgorithm *C = getCompressionAlgorithm(CompressionType::Zlib) {

C->compress(...);

}

currently, you can do

if (CompressionKind C = CompressionKind::Zlib) {
  C->compress(...);
}

The implementation complexity is a concern too, though. I think having CompressionKind, boolean conversions and logical operator overloads, etc, in addition to the CompressionAlgorithm doesn't seem to provide enough to justify the complexity - but perhaps I'm missing some context/understanding of the values those features provide?

What sort of use cases do you have in mind that necessitate that complexity/functionality? (specifically I see a lot of || llvm::NoneType() which seems really obtuse/unclear why a user of the API would think to do that/understand that was the right/necessary thing to do)

Maybe for comparison purposes it'd be good not to replace this API but to add it on top of the underlying API so only one callsite can be updated, rather than all the changes necessary to update all clients in one go (& in this way maybe omit some of the functionality in this first patch, since it won't have to cover all use cases - eg: those extra fields (name/compression levels (best size/speed/default), etc) could possibly be omitted from the first version of this patch, so the patch only adds enough functionality for one of the compresion clients (like MC, to match/compare with @MaskRay's patch) rather than all of them)

llvm/include/llvm/Support/Compression.h
48–51

Could we skip this wrapper & just have the underlying function (& also we shouldn't be overloading by case like this anyway - please name all the decompress/compress functions with the same case/spelling)

ckissane added inline comments.Wed, Aug 3, 11:20 AM
llvm/include/llvm/Support/Compression.h
48–51

good point

ckissane updated this revision to Diff 449772.Wed, Aug 3, 1:30 PM
  • remove uppercase Compress, Decompress
ckissane updated this revision to Diff 450093.Thu, Aug 4, 12:05 PM
  • remove compression kind || && overload
ckissane edited the summary of this revision. (Show Details)Thu, Aug 4, 12:08 PM

@dblaikie @MaskRay I would like it if you could all take another look.

In response to @dblaikie 's comments about implementation weight I have greatly simplified the implementation, including removing extra capitalized function overloads (Compress, Decompress), and removing || and && operator overrides. Also adopting the recently suggested class impl in part.

I also believe similar semantics to the nullptr suggestion have been achieved, due to the bool cast returning supported status, and the Unknown type acting as a nullptr of sorts, being always unsupported and -1 (255) as a uint8. Optional(Unknown) is returned from getOptionalCompressionKind(uint8_t) when it is not 0 (NoneType()), 1 (Optional(Zlib)), or 2 (Optional(ZStd)).

In places where explict compression must be used CompressionKind can be passed around, and possibly optional (sometimes none) compression is represented as llvm::Optional<CompressionKind> which I have type aliased as OptionalCompressionKind

ckissane edited the summary of this revision. (Show Details)Thu, Aug 4, 2:56 PM
ckissane edited the summary of this revision. (Show Details)
ckissane updated this revision to Diff 450169.Thu, Aug 4, 4:12 PM
ckissane edited the summary of this revision. (Show Details)
  • fix some nits in Compression.h
ckissane updated this revision to Diff 450176.Thu, Aug 4, 4:23 PM
  • move if into switch
leonardchan added inline comments.Thu, Aug 4, 5:39 PM
clang-tools-extra/clangd/index/Serialization.cpp
251–252

nit: I think error accepts format-like arguments, so you could have something similar to above with:

return error("Compressed string table, but {0} is unavailable", CompressionScheme->Name);
clang/lib/Serialization/ASTReader.cpp
1469–1471
1475–1476

I think this Error takes a StringRef, so I think you could have:

return Error("compression class " + CompressionScheme->Name + " is not available");
lld/ELF/InputSection.cpp
1234

typename might not be needed here

llvm/include/llvm/Support/Compression.h
28–112
40

this-> might not be needed here

86

I think this cast might not be needed

106

I think this cast might not be needed

llvm/lib/ObjCopy/ELF/ELFObject.cpp
442–461

What's the explanation for having the llvm_unreachable branch and getting the compression type? I would've thought this section would just be:

if (Error Err1 = compression::CompressionKind::Zlib->decompress(
        Compressed, DecompressedContent, static_cast<size_t>(Sec.Size))) {
  return createStringError(errc::invalid_argument,
                           "'" + Sec.Name +
                               "': " + toString(std::move(Err1)));
}

which looks like it would have identical behavior to the old code.

530–537

Same here. Should this just be compression::CompressionKind::Zlib->compress(OriginalData, CompressedData);? If this is in preparation for the ELF+zstd changes, perhaps we should save those for another patch once that lands?

llvm/lib/Support/Compression.cpp
53–79

nit: add overrides to be more explicit these are virtual methods

83–95

If the llvm_unreachables should be the default implementation for all subclasses, perhaps the [de]compress methods should be regular virtual with these default implementations rather than abstract virtual.

152

I think this cast might not be needed

152–155

Same here

179–182
dblaikie added inline comments.Thu, Aug 4, 11:47 PM
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61

This still seems like a lot of hoops to jump through - why "noneIfUnsupported" rather than either having the compression scheme (I think it could be the CompressionAlgorithm itself, rather than having the separate OptionalCompressionKind abstraction) either be null itself, or expose an "isAvailable" operation directly on the CompressionAlgorithm?

Even if the CompressionKind/OptionalCompressionKind/CompressionAlgorithm abstractions are kept, I'm not sure why the above code is preferred over, say:

if (Compress && DoInstrProfNameCompression && OptionalCompressionScheme /* .isAvailable(), if we want to be more explicit */) {
  ...
}

What's the benefit that noneIfUnsupported is providing? (& generally I'd expect the Compress && DoInstrProfNameCompression to be tested/exit early before even naming/constructing/querying/doing anything with the compression scheme/algorithm/etc - so there'd be no need to combine the tests for availability and the tests for whether compression was requested)

Perhaps this API is motivated by a desire to implement something much closer to the original code than is necessary/suitable? Or some other use case/benefit I'm not quite understanding yet?

ckissane added inline comments.Fri, Aug 5, 11:09 AM
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61

I shall remove noneIfUnsupported. You express good points, we can simply check if(OptionalCompressionScheme && *OptionalCompressionScheme) where necessary.

ckissane added inline comments.Fri, Aug 5, 11:11 AM
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61

though it will make a lot of existing code patterns less clear, and more verbose

ckissane added inline comments.Fri, Aug 5, 11:13 AM
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61

and sometimes you really do need to re code the exact thing noneIfUnsupported encapsulates...

dblaikie added inline comments.Fri, Aug 5, 11:21 AM
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61

Are there examples within LLVM that you can show compare/contrast noneIfUnsupported helps?

ckissane updated this revision to Diff 450338.Fri, Aug 5, 11:32 AM
  • cleanup some compression nits
ckissane added inline comments.Fri, Aug 5, 11:36 AM
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61

yes, I'll paste a couple here

ckissane added inline comments.Fri, Aug 5, 11:59 AM
llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
55–61

Ok, So I believe I was mistaken.
In older versions of this patch there was a none compression implementation that just did a memcpy, this made a natural fall through state, which made this type of pattern advantageous.
However, this is no longer the case.
Hence I will remove this without further adue.
Thank you for your astute observation!

ckissane updated this revision to Diff 450343.Fri, Aug 5, 12:03 PM
  • remove OptionalCompressionKind noneIfUnsupported
ckissane added inline comments.Fri, Aug 5, 12:06 PM
clang/lib/Serialization/ASTReader.cpp
1475–1476

unfortunately that doesn't work (I tried)

ckissane updated this revision to Diff 450345.Fri, Aug 5, 12:09 PM
  • format error string
ckissane updated this revision to Diff 450392.Fri, Aug 5, 2:00 PM
  • fix InputSection decompress issue
ckissane updated this revision to Diff 450401.Fri, Aug 5, 2:08 PM
  • Merge remote-tracking branch 'origin/main' into ckissane.compression-class-simple

This is looking pretty close to what I've been picturing - the only thing remaining is that I think we could get rid of CompressionKind and access the CompressionAlgorithm directly - basically the contents of CompressionKind::operator-> could be a free/public function const CompressionAlgorithm &getCompressionAlgorithm(enum class CompressionKind); - and have it return a reference to the local static implementation, with a none implementation (for those profile cases where you want to pass in an algorithm if it's available, or none) and one implementation for each of zlib/zstd?

clang-tools-extra/clangd/index/Serialization.cpp
238

What purpose is this expression serving? (isn't it a tautology, given that CompressionScheme was initialized a couple of lines back with CompressionKind::Zlib?

clang/lib/Serialization/ASTWriter.cpp
2003–2008

Generally LLVM style rolls these together

llvm/lib/Object/Decompressor.cpp
41–50

Maybe leave this code more like it was before - it can turn into a switch over ELFCompressionSchemeId when Zstd is added here.

llvm/lib/ProfileData/Coverage/CoverageMappingReader.cpp
122–123

Be nice to share the same CompressionKind

llvm/lib/ProfileData/Coverage/CoverageMappingWriter.cpp
52–57

Why/how does OptionalCompressionKind end up in here, compared with this (suggested edit)?

llvm/lib/ProfileData/InstrProf.cpp
465

Yeah, this seems awkward, but I see what you're getting at - if you're going to pass around an algorithm to use, and this particular kind of use case wants to collapse the "algorithm not available" and "no compression was requested" - so, yeah, for this sort of use case I can see the valid in having a null compression algorithm implementation. That shouldn't add a lot of complexity to the implementation and simplify the usage so it doesn't have two layers of "not present" state.

494

Presumably this doesn't need to test OptionalCompressionScheme here, though, since it's tested in the implementation?

llvm/lib/ProfileData/SampleProfReader.cpp
880–883

Nice to pull out a common variable rather than accessing CompressionKind::Zlib twice independently (otherwise we probably might as well go with the more direct API @MaskRay has proposed)

ckissane updated this revision to Diff 451247.Tue, Aug 9, 12:58 PM
  • address review comments
ckissane added a comment.EditedTue, Aug 9, 2:10 PM

This is looking pretty close to what I've been picturing - the only thing remaining is that I think we could get rid of CompressionKind and access the CompressionAlgorithm directly - basically the contents of CompressionKind::operator-> could be a free/public function const CompressionAlgorithm &getCompressionAlgorithm(enum class CompressionKind); - and have it return a reference to the local static implementation, with a none implementation (for those profile cases where you want to pass in an algorithm if it's available, or none) and one implementation for each of zlib/zstd?

I can see what you are asking for, however since its behavior is essentially the same, and still uses both enum values and class implementations, I don't see any practical advantages (though if there is something I am failing to observe let me know).
Additionally, I can see some disadvantages in largely increasing code verbosity across the codebase.
A snippet of an example from elf: if(CompressionKind::Zlib) CompressionKind::Zlib->compress...
would turn into if(getCompressionAlgorithm(CompressionKind::Zlib)) getCompressionAlgorithm(CompressionKind::Zlib)->compress...
(assuming bool operator overload is also moved to class)
Because of this I am leaning away from such an implementation change.

I have only taken very brief look at the new version. Having an enum class CompressionKind with a parallel CompressionAlgorithm seems redundant.
friend CompressionAlgorithm *CompressionKind::operator->() const; looks magical.

I hope that someone insisting on object-oriented design can put up a version with less boilerplate to compete with D130506.

This is looking pretty close to what I've been picturing - the only thing remaining is that I think we could get rid of CompressionKind and access the CompressionAlgorithm directly - basically the contents of CompressionKind::operator-> could be a free/public function const CompressionAlgorithm &getCompressionAlgorithm(enum class CompressionKind); - and have it return a reference to the local static implementation, with a none implementation (for those profile cases where you want to pass in an algorithm if it's available, or none) and one implementation for each of zlib/zstd?

I can see what you are asking for, however since its behavior is essentially the same, and still uses both enum values and class implementations, I don't see any practical advantages (though if there is something I am failing to observe let me know).

The intent is to simplify both implementation and usage because this currently feels like overkill for a fairly small abstraction benefit (compared to @MaskRay's posted alternative, for instance) - we're abstracting only a handful of use cases over only 2 implementations, so this shouldn't be too complicated/overengineered.

Additionally, I can see some disadvantages in largely increasing code verbosity across the codebase.
A snippet of an example from elf: if(CompressionKind::Zlib) CompressionKind::Zlib->compress...
would turn into if(getCompressionAlgorithm(CompressionKind::Zlib)) getCompressionAlgorithm(CompressionKind::Zlib)->compress...

I think in either case this should be changed to something like this (specifically avoid writing CompressionKind::Zlib twice):

if (auto C = CompressionKind::Zlib)
  C->compress...

(assuming bool operator overload is also moved to class)
Because of this I am leaning away from such an implementation change.

I have only taken very brief look at the new version. Having an enum class CompressionKind with a parallel CompressionAlgorithm seems redundant.
friend CompressionAlgorithm *CompressionKind::operator->() const; looks magical.

I hope that someone insisting on object-oriented design can put up a version with less boilerplate to compete with D130506.

Posted something more comparable to D130506 in D131638 - hard to compare, though - D130506 is additive, whereas D131638 and this D130516 are more replacements - though a different with my change there is at least for the initial patch leaving the old APIs in place, with the intent to incrementally change the usages until the old API can be removed.