[compression] move to a enum->spec->impl approach
+ None compression is signified by a nullptr for a CompressionSpecRef
+ Lack of support (lack of impl) is signified by a nullptr for a CompressionImplRef
+ move away from fancy fake enum approach
An alternative/hybrid between D130516 and D130506 and D131638
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Probably easier to review if it's framed somewhat more like @MaskRay and my examples - backwards compatible, without fixing all the API users as we can discuss those separately.
Though it is useful to have one or two example uses - but having all the API uses being updated in one patch could lead us to discussing the same issues repeatedly if we try to review it all in one go.
llvm/include/llvm/Object/Decompressor.h | ||
---|---|---|
52–53 | I think the member should be the CompressionImpl, rather than the specref - null if no decompression is requested/needed, and non-null if it's required/provided by the consumeCompressedSectionHeader. | |
llvm/include/llvm/Support/Compression.h | ||
35–36 | Ref when it's actually a pointer might be confusing - not sure if these add more value over using the raw pointers themselves? | |
38–39 | Probably don't need both of these, just the one that takes the enum type? | |
40 | I don't think we need this function? | |
44 | This should probably be a const member & the CompressionImpl's probably immutable, but could be const too, just to be explicit | |
46 | Probably don't need this member - it should be communicated by the non-null-ness of Implementation? | |
48–50 | These could/should probably go in the Implementation since they're not useful when the algorithm isn't available anyway? | |
87 | the imple could have a pointer back to the spec, rather than having to do another lookup/need another function? (though, if the levels are moved to the impl, maybe this API is not needed?) | |
95–98 | Generally we don't want more variables that need global constructors in LLVM - so these should probably be function-local statics in functions instead. | |
llvm/lib/Object/Decompressor.cpp | ||
51 | This probably isn't a useful error message for a user. And this code is unreachable/untestable, right? The above code would've already errored out on "unsupported compression type"? |
llvm/lib/Object/Decompressor.cpp | ||
---|---|---|
51 | true, it is essentially a repeat of the above |
llvm/include/llvm/Support/Compression.h | ||
---|---|---|
48–50 | done |
llvm/include/llvm/Support/Compression.h | ||
---|---|---|
95–98 | these are just shortcuts to the function local statics of CompressionSpecRef getCompressionSpec(uint8_t Kind) |
(still a fair few unhandled comments from the last round - guess work to address those is still ongoing)
llvm/include/llvm/Support/Compression.h | ||
---|---|---|
95–98 | Yeah, though they're still globals with non-trivial construction, which is to be avoided in LLVM ( https://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors ) So they should either be removed (I think that's probably the right tradeoff, and the one I showed in my proposal - callers that want only one specific compression algorithm can pay the small runtime overhead of going through the switch unnecessarily) or replaced with global functions that use function local statics so the initialization is only paid when the functions are called, and not by every program that links in LLVM for any reason. |
llvm/include/llvm/Support/Compression.h | ||
---|---|---|
95–98 | I concur with removal |
llvm/include/llvm/Object/Decompressor.h | ||
---|---|---|
52–53 | note that null if no decompression is requested/needed falls under the pattern for a CompressionSpec* however as a decompressor *assumes* that it is not none, it could still make sense to make it a CompressionImpl* as you request, which is null if unsupported (and error) and non-null if it's provided by the consumeCompressedSectionHeader. | |
llvm/include/llvm/Support/Compression.h | ||
35–36 | removed typdefs |
llvm/include/llvm/Object/Decompressor.h | ||
---|---|---|
52–53 | I don't follow, sorry. All the error handling is done in consumeCompressedSectionHeader - the only thing done after/outside that is one unconditional CompressionScheme->Implementation->decompress - so it doesn't need the CompressionScheme only the impl, which it's going to use unconditionally anyway. The Decompressor::decompress will never be called if the header failed to parse/if there isn't a decompressor ready to handle it. At least so far as I can see from the code? | |
llvm/lib/ProfileData/InstrProf.cpp | ||
492–496 | looks like this could be changed to pass the implementation, without the spec? (the caller doesn't need/use the spec) | |
llvm/lib/Support/Compression.cpp | ||
191 | Looks like this comment ( https://reviews.llvm.org/D131992#inline-1270577 ) got lost, as there's still two getCompressionSpec functions - please remove the one taking a raw uint8_t, leaving only the enum version. Callers should deal with validating that they're passing a valid enum to that function, and then getCompressionSpec can return a reference, instead of a pointer - since it'd only be passed a valid compression kind? I think UnknownD (also, what's the D suffix meant to mean? I can't figure it out from context) can probably be removed, and callers can deal with that case themselves? | |
llvm/unittests/Support/CompressionTest.cpp | ||
83–84 | Probably don't need a named variable for this, since it's only used once on the next line? Roll the getCompressionSpec call into the line below? | |
85 | Across this whole patch I think long variable names are probably making things harder to read/glance at than helpful - especially for shorter lived variables (or ones where the scope cares mostly about the variable anyway, so it's not getting lost amongst other variables/intentions in the code) could you use shorter variable names? (even single letter names or acronyms would probably suffice here - but Spec and Impl might be good for the unit test?) But also, like I said previously - it'd be good to pull out all the use case changes maybe into another patch - so they can be reviewed separately & this review can focus on the API design (though I realize they're connected - it's helpful to se the use case to understand the design impact, but maybe with one or two small examples, rather than having to change every usage in one go) |
llvm/include/llvm/Support/Compression.h | ||
---|---|---|
38–39 | removed one | |
llvm/lib/ProfileData/InstrProf.cpp | ||
492–496 | (the caller doesn't need/use the spec)... |
@MaskRay I think this change is probably the best point of comparison/demonstration of this patch direction (taking some minor liberties with this patch to simplify things somewhat in ways that have already been discussed/seem quite reasonable - specifically having getCompresisonSpec return a reference and the enum having either no "unknown" value, or getCompressionSpec asserting on the unknown value):
// removing the hardcoded zlib compression kind parameter in favor of what the code would look like in the near future once it's parameterized on a compression kind CompressionImplementation *Compressor = getCompressionSpec(CompressionAlgorithmKind)->Implementation; bool DoCompression = Compress && DoInstrProfNameCompression && Compressor; if (DoCompression) { Compressor->compress(arrayRefFromStringRef(FilenamesStr), CompressedStr, Compressor->BestSizeLevel); }
I think a reasonable speculation about what the alternative would look like in the a non-OO design would be something like:
bool DoCompression = Compress && DoInstrProfNameCompression && isAvailable(CompressionAlgorithmKind); if (DoCompression) { compress(CompressionAlgorithmKind, arrayRefFromStringRef(FilenamesStr), CompressedStr, getBestSizeLevel(CompressionAlgorithmKind));
And having these three correlated calls (isAvailable, compress, and getBestSizeLevel) all made independently/have to have the same compression algorithm kind passed to them independently seems more error prone & redundant (not in an actual execution/runtime efficiency perspective - I don't think any of these different designs would have materially different runtime performance - but in terms of "duplicate work, and work that could diverge/become inconsistent" - essentially the duplicate work/repeated switch statements, one in each compression::* generic entry point seems like a code smell to me that points towards a design that doesn't have that duplication) rather than tying these calls together and making the lookup happen once and then using the features after that. Also exposing the compression algorithm along with the availability in one operation (not exposing compress/decompress/size APIs when the algorithm isn't available) seems to have similar benefits.
I agree it's somewhat overkill for two types of compression, but I don't think it's so burdensome as to be avoided either. I probably wouldn't've put quite this much consideration into the design if I were doing it myself amongst other things, but I appreciate that @ckissane has & I'm pretty happy with where this is now.
A few things still leftover that I think I've mentioned to @ckissane offline:
- getCompressionSpec returning by reference
- more incremental patches - implementing the generic API in terms of the old one (or keeping the old one as a wrapper around the generic one) with at most one or two uses of the API (if any) in the first commit - then subsequent API migrations in future commits, eventually removing the old API once the migration is complete
- having only the getCompressionSpec(CompressionKind) and dropping the version that takes uint8_t - callers should handle mapping from whatever domain-specific format/representation they have into the CompressionKind enum, there's no need for the raw version here
- Probably removing the 'unknown' CompressionKind - if a caller needs to handle having "maybe a compression kind, or none" they can wrap CompressionKind in Optional themselves and handle not looking up the algorithm if no compression kind is desired/requested
llvm/lib/ProfileData/InstrProf.cpp | ||
---|---|---|
492–496 | That should/can wait for the future change. & I think at that point we could get into what the right design might be - and I was thinking maybe the right design might be for the CompressionImplementation to have a pointer back to its spec. But in general, lets defer that design decision for later - it's not too expensive to change what the parameter types are once the data is needed later. (again, this might be even simpler addressed by not including this change at all up-front - keeping the old API and leaving these callers until you're patching the callers to add the new functionality of being able to choose between Zlib and Zstd - rather than, for now, plumbing in the new API without any change in functionality - the plumbing + functionality together might be easire to understand the motivation for and review, and doing it separately from most of the generic design work will isolate the usage design discussion so each piece can move forward separately, for the most part) |
(I was out of town last week so I did not respond then.)
I am sorry to disagree... As previously mentioned this is an expression problem.
When we maintain no state in the compression algorithms, the OO style has more boilerplate in llvm/lib/Support and in call sites, so I prefer the FP style (free function style) instead.
(Adding a compression algorithm has a significant barrier. I think we tend to add functions more than adding types. The FP style optimizes for adding new functions (while the OO style optimizes for adding new classes), so the FP style wins here.)
// removing the hardcoded zlib compression kind parameter in favor of what the code would look like in the near future once it's parameterized on a compression kind CompressionImplementation *Compressor = getCompressionSpec(CompressionAlgorithmKind)->Implementation; bool DoCompression = Compress && DoInstrProfNameCompression && Compressor; if (DoCompression) { Compressor->compress(arrayRefFromStringRef(FilenamesStr), CompressedStr, Compressor->BestSizeLevel); }I think a reasonable speculation about what the alternative would look like in the a non-OO design would be something like:
bool DoCompression = Compress && DoInstrProfNameCompression && isAvailable(CompressionAlgorithmKind); if (DoCompression) { compress(CompressionAlgorithmKind, arrayRefFromStringRef(FilenamesStr), CompressedStr, getBestSizeLevel(CompressionAlgorithmKind));
Most call sites do not specify the compression level, so getBestSizeLevel(CompressionAlgorithmKind) is omitted.
And having these three correlated calls (isAvailable, compress, and getBestSizeLevel) all made independently/have to have the same compression algorithm kind passed to them independently seems more error prone & redundant (not in an actual execution/runtime efficiency perspective - I don't think any of these different designs would have materially different runtime performance - but in terms of "duplicate work, and work that could diverge/become inconsistent" - essentially the duplicate work/repeated switch statements, one in each compression::* generic entry point seems like a code smell to me that points towards a design that doesn't have that duplication) rather than tying these calls together and making the lookup happen once and then using the features after that. Also exposing the compression algorithm along with the availability in one operation (not exposing compress/decompress/size APIs when the algorithm isn't available) seems to have similar benefits.
In the FP style (free function style), CompressionAlgorithmKind is passed in multiple places. It is as if, in the OO style, the Compressor object is passed in multiple places.
CompressionAlgorithmKind is an scoped enum and there is hardly another variable of the same type in the scope of CompressionAlgorithmKind. So I do not see how the FP style is more error-prone.
To me, not checking isAvailable(CompressionAlgorithmKind) is similar to not checking the nullness of Compressor.
I think the member should be the CompressionImpl, rather than the specref - null if no decompression is requested/needed, and non-null if it's required/provided by the consumeCompressedSectionHeader.